March 18, 2006

Tools and the SemWeb / SynWeb debate

Danny responds to my latest SemWeb rant. I really don't like being boring, but this subject interests me a lot (and I quite like all the rhetorical flourishes) so I'll indulge myself with some further responses. But to be clear, I'm not dissing Danny; I have a lot of respect for what he does both technically and as a SemWeb advocate.

But I would suggest that these things are well on their way. ... But the real application is the Semantic Web itself, in the same way that a browser isn't interesting without the web behind it.


I'd suggest that they're not well on their way at all, because he misses my point about what a tool "is".

A tool is not simply a program that manipulates a data-structure. It's something which solves an existing problem. Something which is the "best tool for the job". Or at least the preferred / preferable solution to the problem. I might be able to open a beer-bottle with a pen-knife, but a pen-knife, however well made, is not a tool for opening beer-bottles. The inner-technology has to, in some sense, match or "afford" the outer usage.

The trouble for the SemWeb technologies is that there are almost no problems for which they have any real distinguishing advantages over the SynWeb ones. (This is assuming my distinction between the SemWeb, as the place where the meaning of a piece of meta-data is fixed a priori by attaching a URI to it, and the SynWeb, as a place where meaning is determined later according to context and convention. As far as I can tell, Danny does still agree with this distinction?)

Regarding formats without tools, I think you've got a fallacy in suggesting RSS and OPML are somehow special. Which came first, the RSS aggregator or RSS?


No, I certainly wasn't trying to suggest RSS and OPML are special. They're interesting only in the sense that they are successful examples of the SynWeb, and what you could call the "Winer" strategy.

Heh, your choice of the word "parasitic" is a bit ironic, because you could say the OPML Editor and RSS apps are parasitic on the web, and the Semantic Web is an extension of the web.


Balderdash! ;-) The web is a wonderful example of doing the right thing. It's a "worse is better", funky little hack which combines an easy to use tool (the browser) with the simplest protocol which could possibly work, and a slightly fuzzy, problematic mark-up language, to solve a real-world problem : how to share documents over the internet. It disregarded mainstream academic Hypertext theory (how do you avoid broken links? answer : we don't) and the complexities of client-server architectures of the time. Consequently, it spread like wild-fire, adding many incremental improvements. It's survived (and thrived) for nearly 20 years with incompatible, buggy browsers and multiple specifications for HTML. (There are now three "official", incompatible versions of HTML from the W3C alone.)

The Semantic Web isn't an extension of this at all. It's basically a naive assumption about how meaning works ("hey! if we all use the same code-numbers for things then we'll know when we're talking about the same stuff") and a crowd of enthusiasts engaged in the Sisyphean task of trying to make anything useful from this assumption, all the while casting covetous glances at the meta-data which is being generated by people who aren't burdened with the responsibility of inventing common code-numbers and ontologies.

Final point on what I meant by "tools trump formats / processes" because both Danny and Scribe have picked up on it. I'm not saying you can do without common formats. Of course a format is absolutely essential. What I'm saying is that the "goodness" of the tools is more important than the "goodness" of the format for the adoption and survival of the combined pair. Because it's the tools which are the interface to the outside world : the users.

2 comments:

Danny said...

Nice post. I'm glad you posted this, because in my latest post I singled out a single line of yours which seemed an easy target, but it didn't feel fair. But naturally I still disagree ;-)

Three things I'd like to come back on. First, I'm not clear on what you have in mind when you say: "the SynWeb as a place where meaning is determined later according to context and convention". Example?

What you say of the web around "the simplest protocol which could possibly work" makes sense. But I'd argue that Semantic Web technologies offer the simplest model which could work for sharing arbitrary data on the web.

I do need to get a better picture of what you mean by SynWeb, but I think I can respond to: "The trouble for the SemWeb technologies is that there are almost no problems for which they have any real distinguishing advantages over the SynWeb ones." I have a couple of cases as day job stuff, both essentially data integration tasks. One is medical data, the other geospatial data. I'd better not describe them now (soon) - I've been procrastinating all day. But I believe the medical project would be a lot harder to do without SemWeb tech, and previous attempts at the geospatial project using other technologies have failed, but the SemWeb stuff looks like it stands a good chance.

Composing said...

Danny :

Three things I'd like to come back on. First, I'm not clear on what you have in mind when you say: "the SynWeb as a place where meaning is determined later according to context and convention". Example?

Well, I define the SynWeb in opposition to the SemWeb. It's the anti-SemWeb if you like. Or, given that the SemWeb is basically the place where "meaning == URIs", I take it as any strategy for inferring meaning which isn't based on URIs.

Take this piece of data :

<name>fred blogs</name>

The SemWeb and SynWeb are rival theories about how we should infer its meaning.

In the SemWeb it would, of course, be something like <dc:name>fred blogs</dc:name>

And how we would infer meaning is this : dc:name is actually given a "type" via a unique id (a URI), and the "dc" is just a local shorthand for the namespace part of that URI. Only this type determines its meaning. Whatever document or context we find this tag in, it means the same thing, because the URI type is considered to be both necessary and sufficient to give us all the information we need to infer meaning.
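To make that concrete, here's a rough Python sketch (mine, using the standard library's xml.etree.ElementTree) of how a namespace-aware parser expands the prefix. I'm assuming the Dublin Core elements namespace URI for "dc"; the wrapper elements person and account and the alternative prefix meta are made up for illustration, and "name" is just the example tag from above, not a term I'm claiming Dublin Core actually defines.

```python
import xml.etree.ElementTree as ET

# Assumption: the Dublin Core elements namespace stands in for "dc".
DC = "http://purl.org/dc/elements/1.1/"

# Two hypothetical documents from different contexts. They use different
# local prefixes ("dc" and "meta") bound to the same namespace URI.
doc_a = f'<person xmlns:dc="{DC}"><dc:name>fred blogs</dc:name></person>'
doc_b = f'<account xmlns:meta="{DC}"><meta:name>fred blogs</meta:name></account>'


def qualified_tags(xml_text):
    """Return the fully expanded tag of each child of the root element."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


tags_a = qualified_tags(doc_a)
tags_b = qualified_tags(doc_b)

print(tags_a)            # ['{http://purl.org/dc/elements/1.1/}name']
print(tags_a == tags_b)  # True
```

The prefix disappears once the document is parsed. What's left is the URI plus the local name, and on the SemWeb view that pair is the whole story about what the tag means, whichever file it turned up in.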

In the SynWeb, by contrast, there's no claim for one particular method of assigning meanings to data, although there are syntactic constraints to help parse it.

How then do we find out what <name>fred blogs</name> actually means? Well, we infer it from context and convention. We might look at the kind of file that the tag is in, the application that produced it, the web-server or page where we got it etc. And we defer to the conventions of the community. If our community is using "name" to mean legal, given name then we'll assume it's a legal, given name. If the community uses it for some kind of login or nick, then we'll treat it that way.
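As a rough sketch of what I mean (in Python, and entirely hypothetical: the community names, the CONVENTIONS table and the interpret function are invented for illustration), the consuming program might carry a table of conventions and look the tag up against the context it arrived in:

```python
import xml.etree.ElementTree as ET

# Hypothetical conventions, keyed by (community, tag). In practice this
# knowledge usually lives in the heads of the people writing the consumer.
CONVENTIONS = {
    ("address-book", "name"): "legal, given name",
    ("chat-server", "name"): "login or nick",
}


def interpret(xml_text, community):
    """Return (text, meaning), where meaning depends on the context."""
    element = ET.fromstring(xml_text)
    meaning = CONVENTIONS.get((community, element.tag), "unknown")
    return element.text, meaning


snippet = "<name>fred blogs</name>"

print(interpret(snippet, "address-book"))  # ('fred blogs', 'legal, given name')
print(interpret(snippet, "chat-server"))   # ('fred blogs', 'login or nick')
```

The same bytes get two different readings, and nothing in the data itself says which is right. That decision sits with the consumer and the community it belongs to.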

When I say "determined later" what I really mean is that the consumer of the information is having to make this judgement at the time of consumption. Or the author of the consuming program is making it when the consuming program is written. Although this is obviously not a hard and fast rule. The community might already have a standard when the producing program was written.

Actually, I'm starting to wonder if there's an analogy with the debate between static and dynamic typing. The SemWeb is the static typing approach to data : when you produce it, you have to say what it is by giving it the URI. The SynWeb is the dynamic typing approach : it's at the time of consumption that you decide what type it is, according to context.
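Here's a loose Python illustration of that analogy. It's only a sketch: the Typed class, the example.org URI and the context names are all made up, and it's meant to show where the decision gets made, not how any real SemWeb toolkit works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Typed:
    """A value whose type is declared by the producer and travels with it."""
    type_uri: str
    value: str


def produce_static(value):
    # "Static" style: the producer commits to a type up front.
    # The URI is a hypothetical placeholder.
    return Typed("http://example.org/terms/legalName", value)


def consume_dynamic(value, context):
    # "Dynamic" style: the bare value carries no type, so the consumer
    # decides what it is from the context it was found in.
    readings = {"payroll": "legal name", "forum": "nick"}
    return readings.get(context, "unknown")


record = produce_static("fred blogs")
print(record.type_uri)                          # http://example.org/terms/legalName
print(consume_dynamic("fred blogs", "forum"))   # nick
print(consume_dynamic("fred blogs", "payroll")) # legal name
```

In the first case the type is bound once, at production, and every consumer inherits it. In the second it's bound afresh each time somebody reads the value.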

But I'd argue that Semantic Web technologies offer the simplest model which could work for sharing arbitrary data on the web.

Well, I'm not sure what you mean by "arbitrary data". Doesn't HTTP already allow the sharing of arbitrary data? I can stick a picture, a MIDI file, a Word document etc. through it.

I believe the medical project would be a lot harder to do without SemWeb tech, and previous attempts at the geospatial project using other technologies have failed, but the SemWeb stuff looks like it stands a good chance.

Well, it's hard to make an assessment or response without seeing them :-) But I'm still willing to bet that *any* old UID would solve your problems equally well.