Archive for the 'semanticweb' Category

Tim Berners-Lee on Linked Data at TED

Posted by Knud on March 16th, 2009

Tim Berners-Lee [1] gave an enthusiastic talk about linked data at TED, urging everybody to get their data out there or, if they don’t have any, to demand access to data in a proper format.

Interestingly, he didn’t mention the words “Semantic Web” once during the talk, nor did he ever say “RDF” or even “URI” – instead he spoke about “names starting with ‘http’”. Cool enough, his slides had the dog food data set in them! :)

A video of the talk and a link to the slides can be found on the ebiquity blog.

LOD Cloud with dogfood

[1] I wish this link would lead me to something nice when I go to it with a Web browser!

RDF for all of O’Reilly’s titles (with OPMI)

Posted by Knud on March 14th, 2009

I might be a bit late (one month) to discover this, but IT book publisher O’Reilly have recently started a service called O’Reilly Product Metadata Interface (OPMI), which provides RDF metadata for their whole catalogue of books. More details about this can be found on the O’Reilly Labs page.

I think it’s great news that a major publisher is starting to open up their data to the Semantic Web! Term-wise, they do the right thing and use vocabularies that have turned into de facto standards (FOAF and DC (terms) in particular), as well as some newly coined terms in their own O’Reilly namespace. They also get brownie points for actually making their namespace dereferenceable. Good practice!

There are a few things that could be improved to make their data more useful, though:

  • They use non-http URIs like this: urn:x-domain:oreilly.com:agent:pdb:1210. That’s perfectly fine RDF, but it breaks the linked data rules – URIs like that are not dereferenceable, which means it is impossible for interested agents to find out more about those resources.
  • Both the book URIs and the ontology namespace URI lead only to RDF. It would be nice if, upon a request for HTML, their servers would provide something human-readable as well. They acknowledge this problem themselves, so hopefully it will be addressed soon. Content negotiation to the rescue? For their vocabulary, these vocabulary publishing recipes might help (in combination with a tool like VocDoc).
  • The ontology source looks a bit messy, with weird namespace declarations like xmlns:p3="http://purl.org/dc/terms/#". These might be artifacts from the ontology editor they used, though. Not really harmful, just ugly.
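To make the first two points a bit more concrete, here is a minimal Python sketch. It only checks whether a URI uses a scheme that a plain Web client can look up, and builds (without sending) a request that asks for RDF/XML through content negotiation. The function names are my own illustration and have nothing to do with how OPMI is implemented.

```python
from urllib.parse import urlparse
from urllib.request import Request


def is_dereferenceable(uri: str) -> bool:
    """Return True if the URI uses a scheme a plain Web client can look up."""
    return urlparse(uri).scheme in ("http", "https")


def rdf_request(uri: str) -> Request:
    """Build (but do not send) a request asking for RDF/XML via the Accept header."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


# The URN from the O'Reilly data has no HTTP scheme, so there is nothing to fetch.
urn = "urn:x-domain:oreilly.com:agent:pdb:1210"
print(is_dereferenceable(urn))                           # False
print(is_dereferenceable("http://purl.org/dc/terms/"))   # True
```

A server that does content negotiation would look at the same Accept header and answer with HTML for browsers and RDF for Semantic Web agents.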

Linked Data Access Analysis

Posted by Knud on February 4th, 2009

I’m currently working on an analysis of the log files of the Semantic Web Dog Food server. Apart from the obvious queries such as “How much traffic was there?”, “When were the peaks in traffic?” or “Where did the traffic come from?”, Semantic Web-type linked data inspires some other questions as well. Examples are how intensively the Semantic Web portion of the data was used (i.e., how often RDF was requested compared to HTML), how requests were distributed between “semantic” and “conventional” user agents, and what kind of data was requested.

Using the techniques described earlier in a post on my Confused Development blog, I sifted through about seven months’ worth of log files and generated some pretty pictures. Here is what I have come up with so far:

Linked data hit analysis (Data tail)

The dog food server serves linked data through content negotiation: an agent first requests the URI of the resource (“plain” in the graph), specifying in the Accept header whether it wants an RDF or an HTML representation. The server then redirects to either the HTML or the RDF document. In theory, this means that requests(rdf) + requests(html) = requests(plain). However, since it is perfectly feasible to request the HTML or RDF documents directly, the total of RDF+HTML is somewhat higher (by roughly 11%). The total numbers are:

HTML: 238486
RDF: 35491
HTML+RDF: 273977
Plain: 247576

As the graph and the numbers show, the usage in terms of RDF requests is relatively low at the moment, indicating that there is still a long way to go for the Semantic Web to really take off (and that we need to work on making the site more popular).
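For anyone who wants to check the arithmetic, here is a small Python sketch using only the totals listed above. The reading of the surplus as “direct” requests is an assumption on my part: it holds only if every plain request was followed by exactly one redirected request.

```python
# Totals taken from the log analysis above.
html, rdf, plain = 238486, 35491, 247576

total = html + rdf          # all requests for a concrete representation
rdf_share = rdf / total     # fraction of those that asked for RDF
direct = total - plain      # surplus over the negotiated (plain) requests

print(total)                # 273977
print(f"{rdf_share:.1%}")   # 13.0%
print(direct)               # 26401
```

So roughly one in eight representation requests was for RDF, and at least about 26,000 requests skipped content negotiation altogether.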

Linked data hit analysis (Resource type)

This second graph shows the distribution of hits over time for the different kinds of resources which the server offers, as indicated by the requested namespace (dogfood:person, dogfood:conference, …). Interest in people resources is highest almost all of the time. Partially, this may be due to ego surfing by Semantic Web researchers. However, as the graphs below will show, bot traffic far exceeds traffic by human visitors, so my hunch is that the preference for people pages can be explained by the search strategies of the big search engine players out there: people information is probably considered more valuable. Of course, another factor is that there are about three times as many people resources on the dog food server as there are, e.g., conference resources.

Regarding the conference and workshop resources, those need to be examined in a more fine-grained fashion, since the respective namespaces cover everything connected to an event: papers, talks, chairs, the event itself, etc.

Linked data hit analysis (Agent tail)

No self-respecting analysis can live without a nice long tail graph these days. Looking at visiting agents, we get such a distribution (the y-scale is logarithmic). The agents in the head are the big search engine crawlers (GoogleBot, Yahoo! Slurp and MSNBot), as well as the big-name browsers. In the middle and long tail we find lots and lots of other bots, crawlers and browsers, as well as various tools, data services and agents that didn’t give themselves a proper identifier and instead just show up as “Java” or “perl-libwww” (very naughty behaviour indeed…).

Linked data hit analysis (Agent types)

More interesting is probably this graph, which shows the agent distribution after I had sliced and diced it manually according to some criteria:

  • What type of agent is it: bot/crawler, browser (=human visitor), unspecified programming library, debugging or scripting tool (curl, wget, …) or data-service. The latter is Richard’s term for agents which provide a service for other agents by processing some data on the Web. In contrast to crawlers, the purpose here is not archiving or indexing. Examples are format converters, snapshot generators, etc.
  • What is the “semanticity” of the agent: is it a conventional agent, or one that operates in a Semantic Web-aware fashion?
  • Mobile or not: I noticed a (small) amount of visits by mobile browsers, which I thought could be interesting to record separately.
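I did the slicing and dicing by hand, but a rough first pass over the type and mobile criteria could be automated along the following lines. This is only a sketch: the keyword lists are illustrative guesses, not the rules I actually applied, and the “semanticity” criterion is left out because it needs a hand-maintained list of known Semantic Web agents.

```python
# Keyword rules are checked in order; crawlers come first because many of
# them also advertise themselves as "Mozilla/...". All lists are assumptions.
RULES = [
    ("bot", ("googlebot", "slurp", "msnbot", "crawler", "spider")),
    ("tool", ("curl", "wget")),
    ("library", ("java", "perl-libwww", "libwww-perl", "python-urllib")),
    ("browser", ("mozilla", "opera")),
]


def classify_agent(user_agent: str) -> str:
    """Map a raw User-Agent string to a coarse agent type."""
    ua = user_agent.lower()
    for label, needles in RULES:
        if any(needle in ua for needle in needles):
            return label
    return "unknown"


def is_mobile(user_agent: str) -> bool:
    """Very rough mobile check based on two common substrings."""
    ua = user_agent.lower()
    return "iphone" in ua or "mobile" in ua


print(classify_agent("curl/7.19.0"))  # tool
```

Anything that ends up as “unknown” still needs a human to look at it, which is where most of the manual work went.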

All this and more will become part of my thesis and also (hopefully) make it into some sort of more polished publication soon.

The Value of Advertising

Posted by admin on November 1st, 2008

So, ISWC2008 is over and I’m back in Galway. What did I learn this year?

  • There are more and more Semantic Web applications out there, and they are getting slicker and more user-friendly every year. The demo and poster session and the Semantic Web challenges clearly showed that. Some highlights were probably paggr (semantic widgets) by Benjamin Nowack and several different apps that make use of mobile technologies (on the iPhone, no less). Incidentally, those two also won the first and second prize in the challenge. Benjamin has now won for the second time, after winning with CONFOTO (which seems to be offline at the moment) at ISWC2005.
  • Interestingly for me, a lot of people are working on solutions to make SPARQL-querying more accessible to end users. There is our own work on a SPARQL builder component for Konduit, there is the web-based graphical interface NITELIGHT, and some cool SPARQL extensions by Benjamin Nowack (again!). While those were all presented during the poster session, I also talked to some other people in the coffee breaks who told me about their work in this area – this clearly seems to be an area where a lot of developments and improvements are going to surface soon!
  • OpenCyc – this is of course not really a new development, but after having attended the tutorial on using OpenCyc for the Semantic Web, I’m starting to think that their ontology and knowledge base are, at the very least, a very interesting point of reference for linked open data. Those guys have worked on their ontologies for a long time, and a lot of reasoning technology is already in place. Therefore, if we hook up our linked data to (Open)Cyc terms, the hope is that we can finally have the inferencing magic that people are dreaming of for the Web.
  • And finally, to come to the title of this post. I learned the hard way this year that one cannot put enough effort into advertising one’s work and also oneself. I think Richard and I did a pretty good job with the conference metadata this year, and set up a very nice site with a lot of interesting functionality for developers and conference attendees. Unfortunately, we didn’t spend an equal amount of work on making the people at the conference aware of that, with the result that, e.g., way too few knew that there was an option to discuss papers online and have those discussions become part of the metadata about the paper. Also, to my surprise, some people didn’t even seem to know that I had been acting as metadata co-chair at all. Note to self: be more proactive next year.

Semantic Web Dog Food

Posted by admin on October 16th, 2008

Hooray, the spanking new Semantic Web Dog Food site is finally ready for prime time at http://data.semanticweb.org! The site has been the central repository for conference metadata (people, papers, talks, organisations, etc.) from the major Semantic Web conferences (mainly ISWC and ESWC) in the past years, but so far has lacked a unified, cross-conference interface. Also, because different people had been responsible for generating the data for different conferences, the dataset wasn’t really as well interlinked as it could have been.

Now, with the help of funding from SWSA and the Nepomuk project, Richard Cyganiak, research intern Venkatram Yadav and I have managed to do a lot of data cleaning and aligning and to redo the whole site as a module on top of the Drupal CMS, with the result that everything is now a lot nicer looking, more user-friendly, better interlinked and, generally speaking, cooler. Thanks a lot also to Stéphane Corlosquet, our local Drupal guru here at DERI, who helped us out with a lot of tricky Drupal questions.

Apropos Drupal: There is an interesting discussion going on at the moment in the Drupal community to add RDF export functionality to the Drupal Core system. What it means is basically exporting the Drupal DB as RDF (SIOC, FOAF, etc.). Somehow, our approach is the exact opposite – we export an RDF-DB through Drupal! Both approaches put together in a meaningful way would probably result in a very cool end product!

So, what can the Dog Food site do for you? Here is a list:

  • Browse thousands of people, papers and organisations in your Web browser, …
  • … or in a linked data browser – it’s all linked data!
  • SPARQL to your heart’s content, making use of the named graphs we have established for each event in the database.
  • To support your SPARQL needs, you can also use the snorql tool on the site.
  • Comment and discuss each paper. All papers and comments are good citizens of the SIOC-osphere!
  • Do a full-text search on the data on the site.
  • Enjoy eye-candy like the map of all organisations in the repository (provided we have their geo-coordinates).
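As a taste of what the SPARQL point in the list means in practice, here is a small Python sketch that builds a query restricted to one named graph. The graph URI is a made-up placeholder, and the use of foaf:Person and foaf:name is an assumption about how people are modelled; check the site for the real graph names and vocabulary before pasting the result into snorql.

```python
def people_in_graph(graph_uri: str, limit: int = 10) -> str:
    """Return a SPARQL query listing people and their names within one named graph."""
    return f"""PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {{
  GRAPH <{graph_uri}> {{
    ?person a foaf:Person ;
            foaf:name ?name .
  }}
}}
LIMIT {limit}"""


# Hypothetical graph URI, for illustration only.
query = people_in_graph("http://example.org/graph/some-conference")
print(query)
```

The GRAPH clause is what limits the match to the triples of a single event.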

VoCamp Oxford 2008

Posted by admin on September 30th, 2008

I just came back from the first VoCamp, held at Wolfson College in Oxford. It was the first in what will hopefully become a series of small, hands-on, community-driven events where people get together to build and work on vocabularies and ontologies for the Semantic Web. Peter Mika had a nice blog post recently on why such activity is badly needed.

VoCamp2008 Oxford

The whole event was pretty organic and loosely organised. Compared to big, official events with lots of pretty boring talks (not saying that _all_ talks are always boring), VoCamp was refreshingly fun and engaging. I actually had the feeling that I was doing something useful. Ad-hoc groups formed on the spot, working on varied topics such as an IRC vocabulary, a whiskey ontology, something which could be called a “vocabulary starter pack for SemWeb newbies”, an evidence ontology, bio-med vocabularies, etc. The idea is that we will have a number of VoCamps in rapid succession (the next one will be in November here in Galway), and so, even though probably none of the individual topics will have enormous impact just now, I think VoCamp can definitely create a lot of momentum over time.

On Thursday, we planned to take the opportunity to join the Oxford SWIG meeting, but unfortunately there didn’t seem to be a lot of Semantic Web interest just that evening in Oxford. However, I did manage to say hello to Kal Ahmed of TM4J (Topic Maps) fame!

Shift for KDE

Posted by Knud on August 8th, 2007

Over at the spanking new SMILE group blog, there is a post about Dragos’s work on porting Shift to KDE, as part of the Nepomuk project. Of course, it’s not quite as slick as on a Mac. ;)

Shift Binaries for Download

Posted by Knud on July 30th, 2007

Shift (as well as Kante and Knoten) has been available as source for a while now, but installing from source is not really a lot of fun if you just want to try it out.
So, now I finally got around to putting together an installer package for Shift that will just install the binaries on your computer! It contains the Shift application itself, as well as three plugins (AddressBook, iCal and BibDesk). So, no more excuses – just download the installer, install Shift, and start creating RDFa! :)
When I get the time, I will put together a proper readme file. In the meantime, I hope you will find Shift self-explanatory.

Oh, and please send plenty of bug reports.

Go back to start

Posted by Knud on May 31st, 2007

Arggghhhhh… by some unfortunate series of events, I deleted this whole blog. I won’t tell you the long, boring story, just the bottom line: I will have to start over again. The only good thing about it is that it gives me the opportunity to implement some changes I had planned on doing for quite some time. For starters, I changed the name from “semiblog” to “kantenwerk”, because I’m no longer working on the semiBlog software. I realized it was a bit pointless to try building a complete blog authoring tool, when all I really wanted to do was provide a tool to annotate blogs with all kinds of content-related metadata. There are lots of really good blogging tools out there that have functionality that semiBlog just couldn’t compete with.

So, I extracted the core functionality of semiBlog – creating RDF metadata from desktop objects – and threw away the rest. The result is an application called “Shift”, which simply lets you take objects from other desktop applications (contacts, calendar events, papers, …) and generate a snippet of metadata from them in a very convenient way. Drag&Drop, autocompletion, copy&paste, that’s all. Those snippets can be in various formats, such as RDFa, which makes it really easy to incorporate them into web pages – blog posts, but also any other web page. For WWW2007, I made a little video of Shift in action.
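To give an idea of what such a snippet can look like, here is a minimal Python sketch that turns a contact into a bit of RDFa using FOAF. It is not Shift’s actual code or output format; the function name, the choice of FOAF properties and the example values are assumptions made purely for illustration.

```python
from html import escape
from typing import Optional


def contact_to_rdfa(uri: str, name: str, homepage: Optional[str] = None) -> str:
    """Render a contact as an RDFa (XHTML) snippet describing a foaf:Person."""
    parts = [
        '<span xmlns:foaf="http://xmlns.com/foaf/0.1/" '
        f'about="{escape(uri, quote=True)}" typeof="foaf:Person">',
        f'<span property="foaf:name">{escape(name)}</span>',
    ]
    if homepage:
        # rel + href states a link from the person to the homepage resource
        parts.append(
            f' <a rel="foaf:homepage" href="{escape(homepage, quote=True)}">homepage</a>'
        )
    parts.append("</span>")
    return "".join(parts)


# Hypothetical contact, for illustration only.
print(contact_to_rdfa("http://example.org/people/jane", "Jane Doe",
                      "http://example.org/~jane/"))
```

The resulting markup can be pasted into any web page or blog post, where it reads as ordinary text to humans and as triples to an RDFa parser.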

Also, instead of using this blog as the web site for one software project, I think I will just turn it into my personal web site. It’s too annoying to have to juggle around with too many different sites.

Another one of those test posts, just to demonstrate how you can use Shift (formerly semiBlog) and SemClip to move data between desktops. The Shift application was written by Knud Möller, and allows you to produce RDFa code from desktop objects, which you can then use to annotate any web page. E.g., this blog post! The Semantic Clipboard (or SemClip) is a piece of software by Gerald Reif and his group, and allows you to do the exact opposite: you can copy RDFa (or other RDF) from web pages and paste it back into your desktop applications.
Both will be demonstrated at the Web Data 2 Session of the Developer’s Track at WWW2007 by Siegfried Handschuh.