… hello Talis!

Posted by Knud on October 16th, 2011

After saying goodbye to DERI in my previous post, now it’s time to say a few words about my new job: two weeks ago on the 3rd of October I joined the consulting team at Talis. As a company, Talis has a pretty amazing track record of being a market leader in Semantic Web and linked data technology (e.g., through their linked data platform, their Aspire e-learning system and recently through Kasabi). Beyond these products, the consulting team helps clients (see the case studies for a list of previous projects) to learn about the possibilities of the technology, and eventually design and develop individual linked data solutions.

I’m pretty excited about this move – after years in academia, I will finally be able to apply my know-how to real-world problems and use cases of actual, paying customers! And as I said in my previous post: now it is the perfect time to do this, as semantic technologies are more and more moving into the mainstream.

For the coming six months I’ll be based in Birmingham (coincidentally the home of metal!), but will eventually move to Berlin and work from there. Germany is still a little behind in the whole open data movement, but things are happening there as well. See you soon!

Bye Bye DERI…

Posted by Knud on October 9th, 2011

'So Irish...' by Dunkoman on Flickr

It feels strange, but last Friday, after a good 7 1/2 years (that’s 2829 days!), I finally had my last day at DERI, the Digital Enterprise Research Institute at the National University of Ireland in Galway.

Coming from a background as a linguist and knowing very little about the Semantic Web, I started as a fresh PhD student in January 2004, when DERI was still only a handful of researchers. Very few people had ever heard about this “Semantic Web” (let alone “linked data” – that label was only coined a few years later), and those who did mostly considered it to be a rather far-fetched, purely academic exercise. I experienced the somewhat crazy early years at DERI (read about it in the paper…), saw the institute grow, change management, change location and eventually turn into the largest (currently 137 members) and probably most successful SW research institute world-wide. I’m pretty sure that for many if not most SW or linked data-related projects and activities you will come across today, there will be someone involved who either did or does work at DERI. Or someone who will work at DERI in the future – during the almost 8 years I have spent there, many outstanding personalities I met in the community eventually joined our little institute.

I experienced DERI as a fantastic place to work: I learned an immense amount of things (skills and experiences that definitely helped me find my new job), made good friends from all over the world (some of them for life, I’m sure), had the opportunity to work and engage with some of the most interesting and influential people in the community (both at DERI and in collaboration with outside partners) and even managed to finish a PhD along the way. Of course, part of the DERI experience is the (mostly) beautiful city of Galway, where the institute is located – but that’s a whole different story. I feel privileged having been very close to the centre of a development which saw the idea of a meaningful, machine-interpretable, “smarter” Web evolve from something that was either ignored or laughed at, into something that is now (in one form or the other) on the agenda of virtually all the big players who define what the Web is today – to pick a few arbitrary examples, just look at schema.org (Google, Yahoo!, Microsoft), Opengraph (Facebook) or the adoption of linked data by the BBC.

So, now that my time at DERI is over, I’d like to say “thank you” once more to everyone I have met there, worked with, laughed with, argued with, drunk Guinness, whiskey and wine with (or coffee and tea), or walked through the rain with – go raibh míle maith agat! We’ll meet again!

NHS Jargon RDF

Posted by Knud on May 16th, 2011

Did you feel it? The LOD Cloud just grew again by a tiny fraction: https://kantenwerk.org/metadata/nhs_jargon.rdf. While playing around with triplifying some NHS data, I started making notes about the various acronyms used in there. I noted a link to the brilliantly named Jargon Buster and thought “Why don’t I triplify that?”. It would provide me with a good resource to link the actual data to.

This little project was an opportunity to try out the very cool ScraperWiki. One thing led to another: after a short while I arrived at this scraper, and eventually the final Jargon RDF was done. Now on to the actual task at hand…

NHS Jargon Buster Scraper

The Web of Data Grows and Grows…

Posted by Knud on September 21st, 2010

Back in September 2009, Bob DuCharme highlighted the growth of the Web of Linked Data by comparing versions of Richard Cyganiak’s LOD cloud diagramme. Now I’m sitting in Chris Bizer’s keynote at FIS2010 and just got to see the latest version of this diagramme. The amount of growth looks amazing; just by looking at it you get the impression that things are really happening now. 24.7 billion triples, 436 million links. Also, what I like about the diagramme is how it uses colour to show the different domains the various datasets belong to.

LOD Cloud, September 2010

The new version of the LOD cloud will be published later today or tomorrow, but you get a sneak peek here first! ;)

Close, but a Cigar Nevertheless

Posted by Knud on May 4th, 2010

I just came back from this year’s Web Science Conference in Raleigh, NC. The idea of the conference – as of Web Science in general – is to give a holistic, multi-disciplinary view on the Web, and while I’m still not sure if and exactly how this will work in the end (there was a heated discussion between social and computer scientists in the closing panel), I found the event very interesting and a lot of fun. Of course, the best surprise came right at the end, when our paper on Linked Data Usage (I had reported on early stages of this quite a while ago on this blog) was shortlisted as one of three papers for the best paper award! In the end we didn’t win (the prize went to the paper by Metaxas and Mustafaraj: From Obscurity to Prominence in Minutes: Political Speech and Real-Time Search), but just to get the nomination was pretty awesome. I really didn’t expect this, considering that this paper had been in the pipeline for more than a year now, but never quite made it for any submission deadline, and was therefore delayed time and time again. This is great encouragement for continuing our work in this area!

Semantic User Agents

Posted by Knud on October 8th, 2009

I’m still very much interested in the topic of analysing usage of linked data sites. To that end, an interesting question to ask is what kinds of agents access a linked data site. And here, apart from the usual categorisation into bots, browsers and such, it makes sense to differentiate between semantic and non-semantic agents. Very loosely, we could say that

Semantic agents are agents which are aware of RDF data and actively request it.

To know whether or not an agent requests RDF, we could look at the header of an individual HTTP request and check if the agent had specified Accept: application/rdf+xml. However, the Apache server log files unfortunately don’t tell us anything about the request header. Luckily though, there is an indirect way of finding out about this. If our linked data site uses best practice content negotiation and 303 redirects, we can look at pairs of requests in the log files. For example, the Semantic Web Dog Food site uses a particular URI pattern for resources and their HTML and RDF representations:


http://data.semanticweb.org/organization/deri-nui-galway

http://data.semanticweb.org/organization/deri-nui-galway/html

http://data.semanticweb.org/organization/deri-nui-galway/rdf

If the plain URI is requested, the server will either redirect to the HTML or the RDF representation, based on what was specified by the agent. Therefore, if we find a request for a plain URI and a request for the corresponding RDF URI, from the same IP address and the same agent, within a short time frame (e.g. 5 seconds), then we can infer that the agent had requested application/rdf+xml and can therefore be classified as a semantic agent.
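To make the redirect step concrete, here is a minimal Python sketch of how a server might pick between the two representations. How the dog food server actually implements this is not shown here, so the function names (`parse_accept`, `redirect_target`) and the tie-breaking rule (RDF wins only if it is explicitly accepted and ranked at least as high as HTML) are assumptions made purely for illustration.

```python
def parse_accept(header):
    """Return {media_type: q} for a raw Accept header value (simplified)."""
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a media type without an explicit q parameter defaults to 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media_type] = q
    return prefs


def redirect_target(resource_uri, accept_header):
    """Pick the URI a 303 redirect would point to (hypothetical policy)."""
    prefs = parse_accept(accept_header or "")
    rdf_q = prefs.get("application/rdf+xml", 0.0)
    html_q = prefs.get("text/html", 0.0)
    # Assumption: fall back to HTML unless RDF is explicitly requested.
    suffix = "/rdf" if rdf_q > 0 and rdf_q >= html_q else "/html"
    return resource_uri.rstrip("/") + suffix
```

With this policy, an agent sending only Accept: application/rdf+xml ends up at the /rdf URI, while a typical browser header that lists text/html ends up at /html.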

90.21.243.141 - - [06/Oct/2008:16:07:58 +0100] "GET /organization/vrije-universiteit-amsterdam-the-netherlands HTTP/1.1" 303 7592 "-" "rdflib-2.4.0 (http://rdflib.net/; )"
90.21.243.141 - - [06/Oct/2008:16:08:02 +0100] "GET /organization/vrije-universiteit-amsterdam-the-netherlands/rdf HTTP/1.1" 200 45358 "-" "rdflib-2.4.0 (http://rdflib.net/; )"

The example above shows this: the “rdflib.net” agent requested the plain URI .../organization/vrije-universiteit-amsterdam-the-netherlands and was 303 redirected to .../organization/vrije-universiteit-amsterdam-the-netherlands/rdf a few seconds later. From this we can automatically infer that “rdflib.net” is a semantic agent.
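The pairing heuristic can be sketched in a few lines of Python. This is only a rough illustration and not the script used for the analysis: the regular expression assumes log lines in the format shown above, the names `parse_line` and `find_semantic_agents` are made up for this example, and the 5-second window is simply the value mentioned earlier. Parsing the month abbreviation with `%b` also assumes an English (or C) locale.

```python
import re
from datetime import datetime, timedelta

# One log line: IP, two ignored fields, [timestamp], "request", status, size,
# "referer", "user agent" (the layout of the example lines above).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


def find_semantic_agents(lines, window=timedelta(seconds=5)):
    """Return user agents that got a 303 on a plain URI and then fetched
    the matching /rdf URI from the same IP within the time window."""
    pending = {}  # (ip, agent, plain path) -> time of the 303 response
    semantic = set()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        who = (entry["ip"], entry["agent"])
        path = entry["path"]
        if entry["status"] == "303":
            pending[who + (path,)] = entry["time"]
        elif entry["status"] == "200" and path.endswith("/rdf"):
            plain = path[: -len("/rdf")]
            redirected_at = pending.get(who + (plain,))
            if redirected_at is None:
                continue
            if timedelta(0) <= entry["time"] - redirected_at <= window:
                semantic.add(entry["agent"])
    return semantic
```

Feeding the function the lines of a log file (for instance by iterating over an open file object) yields the set of user agent strings classified as semantic. Agents that were redirected to the /html URI never match the second branch and are therefore left out.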

A list of 423 semantic agents found in this way for the dog food site from 10/2008-10/2009 is here. Looking at the list, we can find a lot of agents that are clearly “semantic”, such as the “SindiceFetcher” or a SIOC browser. However, most of them are actually not what I would normally consider “semantic”, such as hordes of “Mozilla”-branded agents or dodgy looking bots. More research is awaiting…

Growth of the Web of Linked Data

Posted by Knud on September 4th, 2009

Bob DuCharme points out nicely how much the Web of Linked Data has grown in the past year by comparing two versions of Richard Cyganiak’s LOD cloud diagram. It looks pretty impressive when you compare the two versions side by side!

Linked Data for WWW2009 Online

Posted by Knud on April 16th, 2009

I don’t announce every new addition to the Semantic Web Dog Food Server, but this is a big one: based on the data available from EPrints, we managed to get information about papers and authors for the upcoming WWW2009 in Madrid up as linked data on the dog food server. You can get all the papers, authors and their affiliations, all nicely integrated with the rest of the dog food data from other conferences. You can start browsing here or get a dump of the data. Enjoy!

“There is probably no Semantic Web …

Posted by Knud on March 16th, 2009

… now stop infering and get lodding!” A great little (a great little?!) photoshop tribute to the Atheist Bus Campaign in London and elsewhere (now also in Germany). I don’t know exactly where this picture appeared originally – a friend of a friend saw it on Twitter somewhere, and I don’t use Twitter. Anyway, I love it! I also love the fact that we now have a new verb. I wonder how it is inflected? It’s probably regular, so it should look like this:

to lod (verb): lod, lodded, lodding – the act of publishing linked open data on the World Wide Web, adhering to the rules of linked data.

There is probably no Semantic Web - LOD Bus

Tim Berners-Lee on Linked Data at TED

Posted by Knud on March 16th, 2009

Tim Berners-Lee1 gave an enthusiastic talk about linked data at TED, urging everybody to get their data out there or, if they don’t have any, to demand access to data in a proper format.

Interestingly, he didn’t mention the words “Semantic Web” once during the talk, nor did he ever say “RDF” or even “URI” – instead he spoke about “names starting with ‘http’”. Cool enough, his slides had the dog food data set in them! :)

A video of the talk and a link to the slides can be found on the ebiquity blog.

LOD Cloud with dogfood

1 I wish this link would lead me to something nice when I go to it with a Web browser!