Linked Data for WWW2009 Online

Posted by Knud on April 16th, 2009

I don’t announce every new addition to the Semantic Web Dog Food Server, but this is a big one: based on the data available from EPrints, we managed to get information about papers and authors for the upcoming WWW2009 in Madrid up as linked data on the dog food server. You can get all the papers, authors and their affiliations, all nicely integrated with the rest of the dog food data from other conferences. You can start browsing here or get a dump of the data. Enjoy!

“There is probably no Semantic Web …

Posted by Knud on March 16th, 2009

… now stop infering and get lodding!” A great little (a great little?!) photoshop tribute to the Atheist Bus Campaign in London and elsewhere (now also in Germany). I don’t know exactly where this picture appeared originally – a friend of a friend saw it on Twitter somewhere, and I don’t use Twitter. Anyway, I love it! I also love the fact that we now have a new verb. I wonder how it is inflected? It’s probably regular, so it should look like this:

to lod (verb): lod, lodded, lodding – the act of publishing linked open data on the World Wide Web, adhering to the rules of linked data.

There is probably no Semantic Web - LOD Bus

Tim Berners-Lee on Linked Data at TED

Posted by Knud on March 16th, 2009

Tim Berners-Lee[1] gave an enthusiastic talk about linked data at TED, urging everybody to get their data out there or, if they don’t have any, to demand access to data in a proper format.

Interestingly, he didn’t mention the words “Semantic Web” once during the talk, nor did he ever say “RDF” or even “URI” – instead he spoke about “names starting with ‘http’”. Cool enough, his slides had the dog food data set in them! :)

A video of the talk and a link to the slides can be found on the ebiquity blog.

LOD Cloud with dogfood

[1] I wish this link would lead me to something nice when I go to it with a Web browser!

RDF for all of O’Reilly’s titles (with OPMI)

Posted by Knud on March 14th, 2009

I might be a bit late (one month) to discover this, but IT book publisher O’Reilly have recently started a service called O’Reilly Product Metadata Interface (OPMI), which provides RDF metadata for their whole catalogue of books. More details about this can be found on the O’Reilly Labs page.

I think it’s great news that a major publisher is starting to open up their data to the Semantic Web! Term-wise, they do the right thing and use vocabularies that have turned into de-facto standards (FOAF and DC (terms) in particular), as well as some newly coined terms in their own O’Reilly namespace. They also get brownie points for actually making their namespace dereferenceable. Good practice!

There are a few things that could be improved to make their data more useful, though:

  • They use non-http URIs like this: urn:x-domain:oreilly.com:agent:pdb:1210. That’s perfectly fine RDF, but it breaks the linked data rules – URIs like that are not dereferenceable, which means it is impossible for interested agents to find out more about those resources.
  • Both the book URIs and the ontology namespace URI lead only to RDF. It would be nice if, upon a request for HTML, their servers would provide something human-readable as well. They acknowledge this problem themselves, so hopefully it will be addressed soon. Content negotiation to the rescue? For their vocabulary, these vocabulary publishing recipes might help (in combination with a tool like VocDoc).
  • The ontology source looks a bit messy, with weird namespace declarations like xmlns:p3="http://purl.org/dc/terms/#". These might be artifacts from the ontology editor they used, though. Not really harmful, just ugly.
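The content negotiation idea from the second point can be sketched roughly as follows. This is only an illustration of the principle, not O’Reilly’s setup: the example.org URI, the "/rdf" and "/html" document suffixes and the function names are all made up, and a real server would also have to handle wildcards such as */* properly.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


# Hypothetical mapping from media type to document suffix.
VARIANTS = {
    "application/rdf+xml": "rdf",
    "text/html": "html",
}


def negotiate(resource_uri, accept_header, default="html"):
    """Pick the document URI that a redirect for this request would point to."""
    for media_type, q in parse_accept(accept_header):
        if q > 0 and media_type in VARIANTS:
            return f"{resource_uri}/{VARIANTS[media_type]}"
    return f"{resource_uri}/{default}"


book = "http://example.org/book/1"  # hypothetical resource URI
print(negotiate(book, "application/rdf+xml"))
print(negotiate(book, "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```

With something like this in place, a Semantic Web agent asking for application/rdf+xml and a person with a Web browser can both start from the same URI and each end up at a document they can use.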

Linked Data Access Analysis

Posted by Knud on February 4th, 2009

I’m currently working on an analysis of the log files of the Semantic Web Dog Food server. Apart from the obvious queries such as “How much traffic was there?”, “When were the peaks in traffic?” or “Where did the traffic come from?”, Semantic Web-type linked data inspires some other questions as well. Examples are how intensively the Semantic Web portion of the data was used (i.e., how often RDF was requested compared to HTML), how the visits were distributed between “semantic” and “conventional” user agents, and what kind of data was requested.

Using the techniques described earlier in a post on my Confused Development blog, I sifted through about 7 months’ worth of log files and generated some pretty pictures. Here is what I have come up with so far:

Linked data hit analysis (Data tail)

The serving of linked data on the dog food server works through content negotiation – basically, the first request by an agent goes to the URI of the resource (“plain” in the graph) and specifies in the header whether an RDF or HTML representation is desired. The server then redirects to either the HTML or the RDF document with the desired representation. In theory, this means that requests(rdf) + requests(html) = requests(plain). However, since it is perfectly feasible to request the HTML or RDF documents directly, the total of RDF+HTML is somewhat higher (by roughly 11%). The total numbers are:

HTML: 238486
RDF: 35491
HTML+RDF: 273977
Plain: 247576

As the graph and the numbers show, the usage in terms of RDF requests is relatively low at the moment, indicating that there is still a long way to go for the Semantic Web to really take off (and that we need to work on making the site more popular).
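To put the totals above into proportion, here is the arithmetic as a small sketch. The variable names are mine, and reading the surplus as direct document requests assumes that every plain request led to exactly one redirected document request, which the logs may not bear out exactly.

```python
# Totals as reported above
html_hits = 238486
rdf_hits = 35491
plain_hits = 247576

total_docs = html_hits + rdf_hits    # all HTML and RDF document requests
rdf_share = rdf_hits / total_docs    # fraction of document requests that asked for RDF
surplus = total_docs - plain_hits    # document requests not explained by a plain request

print(total_docs)                    # 273977
print(f"{rdf_share:.1%}")            # 13.0%
print(surplus)                       # 26401
print(f"{surplus / plain_hits:.1%}") # 10.7%
```

So roughly one in eight document requests was for RDF, and about 26,000 document requests did not come in via a redirect from the resource URI.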

Linked data hit analysis (Resource type)

This second graph shows the distribution of hits over time for the different kinds of resources which the server offers, as indicated by the requested namespace (dogfood:person, dogfood:conference, …). Interest in people resources is highest almost all of the time. Partially, this may be due to ego surfing of Semantic Web researchers. However, as the graphs below will show, bot traffic far exceeds traffic by human visitors, so my hunch is that the preference for people pages can be explained through the search strategies of the big search engine players out there – people information is probably considered more valuable. Of course, another factor is the fact that there are about three times as many people resources on the dog food server as e.g. conference resources.

Regarding the conference and workshop resources, those need to be examined in a more fine-grained fashion, since the respective namespaces cover everything connected to an event: papers, talks, chairs, the event itself, etc.

Linked data hit analysis (Agent tail)

No self-respecting analysis can live without a nice longtail graph these days. Looking at visiting agents, we get such a distribution (y-scale is logarithmic). The agents in the head are the big search engine crawlers (GoogleBot, Yahoo! Slurp and MSNBot), as well as the big name browsers. In the middle and long tail we find lots and lots of different other bots, crawlers and browsers, as well as various tools, data services and agents who didn’t give themselves a proper identifier and instead just show up as “Java” or “perl-libwww” (very naughty behaviour indeed…).

Linked data hit analysis (Agent types)

More interesting is probably this graph, which shows the agent distribution after I had sliced and diced it manually according to some criteria:

  • What type of agent is it: bot/crawler, browser (=human visitor), unspecified programming library, debugging or scripting tool (curl, wget, …) or data-service. The latter is Richard’s term for agents which provide a service for other agents by processing some data on the Web. In contrast to crawlers, the purpose here is not archiving or indexing. Examples are format converters, snapshot generators, etc.
  • What is the “semanticity” of the agent: is it a conventional agent, or one that operates in a Semantic Web-aware fashion?
  • Mobile or not: I noticed a (small) amount of visits by mobile browsers, which I thought could be interesting to record separately.
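I did this slicing by hand, but a rule-based first pass over the user agent strings could look something like the sketch below. The keyword lists are purely illustrative guesses, not the criteria I actually used, and the data-service and “semanticity” dimensions are left out because they need a hand-maintained list of known agents.

```python
from collections import Counter

# Hypothetical keyword rules; the first matching rule wins, so bots that
# also announce themselves as "Mozilla" are caught before the browser rule.
RULES = [
    ("bot", ("googlebot", "slurp", "msnbot", "crawler", "spider")),
    ("tool", ("curl", "wget")),
    ("library", ("java", "perl-libwww", "libwww-perl", "python-urllib")),
    ("browser", ("mozilla", "opera")),
]

MOBILE_HINTS = ("iphone", "mobile", "symbian")  # illustrative only


def classify(user_agent):
    """Return a coarse agent type for a raw User-Agent string."""
    ua = user_agent.lower()
    for label, keywords in RULES:
        if any(keyword in ua for keyword in keywords):
            return label
    return "unknown"


def is_mobile(user_agent):
    """Guess whether the agent is a mobile browser."""
    ua = user_agent.lower()
    return any(hint in ua for hint in MOBILE_HINTS)


def tally(user_agents):
    """Count hits per agent type."""
    return Counter(classify(ua) for ua in user_agents)


sample = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Java/1.6.0_07",
    "curl/7.18.2",
]
print(tally(sample))
```

Anything that ends up as “unknown” would still have to be looked at manually.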

All this and more will become part of my thesis and also (hopefully) make it into some sort of more polished publication soon.

VoCamp Galway 2008

Posted by Knud on December 3rd, 2008

Last week we organised a second VoCamp – a grass roots, BarCamp-style workshop for creating Semantic Web vocabularies – in Galway. The setup was much like the first one in Oxford: we as the organisers provided the room and coffee breaks, but otherwise only set a very basic schedule (start-coffee-lunch-coffee-wrapup). The real action was provided by the delegates, who divided up into groups according to interests and worked away. On several occasions throughout the two days we all came together again and every group had the chance to report on their progress, discuss problems with all VoCamp delegates, etc. It was all very relaxed and productive, and with an interesting mix of people. Apart from a good crowd from DERI, there were people from Talis, Yahoo (Peter Mika was luckily able to make it) and Edinburgh. Some people even came from as far as Germany and Florida!

Vocabulary Hacking

All the different groups and their results can be found on this wiki page, so I’ll just mention a few things here, such as vocabularies for meeting minutes, calls for papers or real estate (not forgetting the very important Ear Worm vocabulary), more work on a SW starter pack, discussions and work on Microformat-RDF mappings and RDFa in Drupal.

Luckily Galway was on its best behaviour – I think it didn’t rain at all during the two days. Looking forward to more VoCamps in other places soon!

“A New Way of Linking People to Places”

Posted by admin on November 26th, 2008

I recently discovered this very cool project here in Galway called murmur. “Linking People to Places”? Sounds a lot like some Semantic Web or Linked Data thing. In a way it is, only it’s live and doesn’t (directly) involve the internet or URIs. Instead, the people behind murmur have put up metal signs in different locations all over Galway. Each sign has a freefone number on it, which, if called, will get you to a recording of a story about the place where the sign was set up. The stories are all told by Galway locals, and were also recorded at the sign’s location.

Murmur Galway

I think this is a nice example of using technology to provide a better experience and understanding of a city. Conceptually it’s located somewhere in the vicinity of topics such as the internet of things and ubiquitous computing – only it doesn’t involve computing. :) It reminds me of a story where someone demonstrated the principle of topic maps with strings and other physical artifacts, thereby moving from the digital over to the physical domain.

Murmur has also been set up in other cities, such as Toronto, Edinburgh, Dublin, San José, Montreal, Calgary and São Paulo.

The Value of Advertising

Posted by admin on November 1st, 2008

So, ISWC2008 is over and I’m back in Galway. What did I learn this year?

  • There are more and more Semantic Web applications out there, and they are getting slicker and more user-friendly every year. The demo and poster session and the Semantic Web challenges clearly showed that. Some highlights were probably paggr (semantic widgets) by Benjamin Nowack and several different apps that make use of mobile technologies (on the iPhone, no less). Incidentally, those two also won the first and second prize in the challenge (Benjamin won this for the second time already, after having won with CONFOTO (seems to be offline at the moment) at ISWC2005).
  • Interestingly for me, a lot of people are working on solutions to make SPARQL-querying more accessible to end users. There is our own work on a SPARQL builder component for Konduit, there is the web-based graphical interface NITELIGHT, and some cool SPARQL extensions by Benjamin Nowack (again!). While those were all presented during the poster session, I also talked to some other people in the coffee breaks who told me about their work in this area – this clearly seems to be an area where a lot of developments and improvements are going to surface soon!
  • OpenCyc – this is of course not really a new development, but after having attended the tutorial on using OpenCyc for the Semantic Web, I’m starting to think that their ontology and knowledge base are, at the very least, a very interesting point of reference for linked open data. Those guys have worked on their ontologies for a long time, and a lot of reasoning technology is already in place. Therefore, if we hook up our linked data to (Open)Cyc terms, the hope is that we can finally have the inferencing magic that people are dreaming of for the Web.
  • And finally, to come to the title of this post. I learned the hard way this year that one cannot put enough effort into advertising one’s work and also oneself. I think Richard and I did a pretty good job with the conference metadata this year, and set up a very nice site with a lot of interesting functionality for developers and conference attendees. Unfortunately, we didn’t spend an equal amount of work on making the people at the conference aware of that, with the result that e.g. way too few knew that there was an option to discuss papers online and have those discussions become part of the metadata about the paper. Also, to my surprise, some people didn’t even seem to know that I had been acting as metadata co-chair at all. Note to self: be more proactive next year.

Semantic Web Dog Food

Posted by admin on October 16th, 2008

Hooray, the spanking new Semantic Web Dog Food site is finally ready for prime time at http://data.semanticweb.org! The site has been the central repository for conference metadata (people, papers, talks, organisations, etc.) from the major Semantic Web conferences (mainly ISWC and ESWC) in the past years, but so far has lacked a unified, cross-conference interface. Also, because different people had been responsible for generating the data for different conferences, the dataset wasn’t really as well interlinked as it could have been.

Semantic Web Dog Food

Now, with the help of funding from SWSA and the Nepomuk project, Richard Cyganiak, research intern Venkatram Yadav and I have managed to do a lot of data-cleaning and aligning and redo the whole site as a module on top of the Drupal CMS, with the result that everything is now a lot nicer looking, more user friendly, better interlinked and generally speaking cooler. Thanks a lot also to Stéphane Corlosquet, our local Drupal guru here at DERI, who helped us out with a lot of tricky Drupal questions.

Apropos Drupal: There is an interesting discussion going on at the moment in the Drupal community to add RDF export functionality to the Drupal Core system. What it means is basically exporting the Drupal DB as RDF (SIOC, FOAF, etc.). Somehow, our approach is the exact opposite – we export an RDF-DB through Drupal! Both approaches put together in a meaningful way would probably result in a very cool end product!

So, what can the Dog Food site do for you? Here is a list:

  • Browse thousands of people, papers and organisations in your Web browser, …
  • … or in a linked data browser – it’s all linked data!
  • SPARQL to your heart’s content, making use of the named graphs we have established for each event in the database.
  • To support your SPARQL needs, you can also use the snorql tool on the site.
  • Comment and discuss each paper. All papers and comments are good citizens of the SIOC-osphere!
  • Do a full-text search on the data on the site.
  • Enjoy eye-candy like the map of all organisations in the repository (provided we have their geo-coordinates).
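To give an idea of what querying one event’s named graph might look like, here is a small sketch that only builds the request URL and doesn’t send anything. The endpoint path and the graph URI are assumptions made for illustration (check the site for the real ones), as is the idea that people are typed as foaf:Person with a foaf:name.

```python
from urllib.parse import urlencode

# Both URIs below are assumptions for illustration, not confirmed values.
ENDPOINT = "http://data.semanticweb.org/sparql"
GRAPH = "http://data.semanticweb.org/conference/iswc/2008/complete"


def build_query(graph_uri, limit=10):
    """SPARQL query listing people and their names from one named graph."""
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT ?person ?name\n"
        "WHERE {\n"
        f"  GRAPH <{graph_uri}> {{\n"
        "    ?person a foaf:Person ;\n"
        "            foaf:name ?name .\n"
        "  }\n"
        "}\n"
        f"LIMIT {limit}"
    )


def query_url(endpoint, query):
    """URL for sending the query as an HTTP GET request."""
    return f"{endpoint}?{urlencode({'query': query})}"


query = build_query(GRAPH)
print(query)
print(query_url(ENDPOINT, query))
```

The GRAPH clause is what restricts the results to a single event; leaving it out would query across all conferences at once.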

I am not Alone

Posted by admin on October 3rd, 2008

I just read a post by Akshay Java which makes me feel less bad about posting so little to this blog. “We have all been there: started a blog and never kept up with it, got busy with other interests or simply do not find enough time to keep up with blogging.” He then goes on to give an analysis of a sample of 50,000 blog posts with respect to the date of their latest update. However, the main message for me is that I am not alone in being a lazy blogger…