Birmingham Swimming Pools on ScraperWiki

Posted by Knud on October 30th, 2011

After my first few weeks in Birmingham, I’m still looking for a decent public swimming pool. There is a webpage from Birmingham City Council which lists all the local pools, but it’s a little hard to figure out where each of them is when you don’t know your way around. So yesterday I sat down and wrote a little scraper to collect this data and then plot it on a Google Map. The city council page only has the street addresses, so in order to locate each pool on the map, the scraper uses Google’s geocoding API to derive latitude and longitude.
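The scraper's code isn't reproduced in this post, so the following is only a rough sketch of the geocoding step, not the actual implementation. It assumes the JSON flavour of Google's Geocoding API and an API key; the helper names `build_geocode_url` and `extract_lat_lng` are made up for illustration.

```python
import json
from urllib.parse import urlencode

# JSON endpoint of Google's Geocoding API (using it with a key is an assumption here).
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Build the request URL for one street address (hypothetical helper)."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})


def extract_lat_lng(response_text):
    """Return (lat, lng) of the first result, or None if nothing was found."""
    data = json.loads(response_text)
    if data.get("status") != "OK" or not data.get("results"):
        return None
    location = data["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the response body to `extract_lat_lng` would give the coordinates needed to place each pool on the map.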

The map below is a special version of the view, made to fit in a small iFrame. The larger view is here.

… hello Talis!

Posted by Knud on October 16th, 2011

After saying goodbye to DERI in my previous post, now it’s time to say a few words about my new job: two weeks ago, on the 3rd of October, I joined the consulting team at Talis. As a company, Talis has a pretty amazing track record of being a market leader in Semantic Web and linked data technology (e.g., through their linked data platform, their Aspire e-learning system and recently through Kasabi). Beyond these products, the consulting team helps clients (see the case studies for a list of previous projects) to learn about the possibilities of the technology, and eventually design and develop individual linked data solutions.

I’m pretty excited about this move – after years in academia, I will finally be able to apply my know-how to real-world problems and use cases of actual, paying customers! And as I said in my previous post: now it is the perfect time to do this, as semantic technologies are more and more moving into the mainstream.

For the coming six months I’ll be based in Birmingham (coincidentally the home of metal!), but will eventually move to Berlin and work from there. Germany is still a little behind in the whole open data movement, but things are happening there as well. See you soon!

Bye Bye DERI…

Posted by Knud on October 9th, 2011

'So Irish...' by Dunkoman on Flickr

It feels strange, but last Friday, after a good 7 1/2 years (that’s 2829 days!), I finally had my last day at DERI, the Digital Enterprise Research Institute at the National University of Ireland in Galway.

Coming from a background as a linguist and knowing very little about the Semantic Web, I started as a fresh PhD student in January 2004, when DERI was still only a handful of researchers. Very few people had ever heard about this “Semantic Web” (let alone “linked data” – that label was only coined a few years later), and those who did mostly considered it to be a rather far-fetched, purely academic exercise. I experienced the somewhat crazy early years at DERI (read about it in the paper…), saw the institute grow, change management, change location and eventually turn into the largest (currently 137 members) and probably most successful SW research institute world-wide. I’m pretty sure that for many if not most SW or linked data-related projects and activities you will come across today, there will be someone involved who either did or does work at DERI. Or someone who will work at DERI in the future – during the almost 8 years I have spent there, many outstanding personalities I met in the community eventually joined our little institute.

I experienced DERI as a fantastic place to work: I learned an immense amount of things (skills and experiences that definitely helped me find my new job), made good friends from all over the world (some of them for life, I’m sure), had the opportunity to work and engage with some of the most interesting and influential people in the community (both at DERI and in collaboration with outside partners) and even managed to finish a PhD along the way. Of course, part of the DERI experience is the (mostly) beautiful city of Galway, where the institute is located – but that’s a whole different story. I feel privileged to have been very close to the centre of a development which saw the idea of a meaningful, machine-interpretable, “smarter” Web evolve from something that was either ignored or laughed at, into something that is now (in one form or another) on the agenda of virtually all the big players who define what the Web is today – to pick a few arbitrary examples, just look at schema.org (Google, Yahoo!, Microsoft), Open Graph (Facebook) or the adoption of linked data by the BBC.

So, now that my time at DERI is over, I’d like to say “thank you” once more to everyone I have met there, worked with, laughed with, argued with, drank Guinness, whiskey and wine with (or coffee and tea), or walked through the rain with – go raibh míle maith agat! We’ll meet again!

NHS Jargon RDF

Posted by Knud on May 16th, 2011

Did you feel it? The LOD Cloud just grew again by a tiny fraction: http://kantenwerk.org/metadata/nhs_jargon.rdf. While playing around with triplifying some NHS data, I started making notes about the various acronyms used in there. I noted a link to the brilliantly named Jargon Buster and thought “Why don’t I triplify that?”. It would provide me with a good resource to link the actual data to.

This little project was an opportunity to try out the very cool ScraperWiki. One thing led to another: after a short while I arrived at this scraper, and eventually the final Jargon RDF was done. Now on to the actual task at hand…

NHS Jargon Buster Scraper

The Web of Data Grows and Grows…

Posted by Knud on September 21st, 2010

Back in September 2009, Bob DuCharme highlighted the growth of the Web of Linked Data by comparing versions of Richard Cyganiak’s LOD cloud diagramme. Now I’m sitting in Chris Bizer’s keynote at FIS2010 and just got to see the latest version of this diagramme. The amount of growth looks amazing; just by looking at it you get the impression that things are really happening now. 24.7 billion triples, 436 million links. Also, what I like about the diagramme is how it uses colour to show the different domains the various datasets belong to.

LOD Cloud, September 2010

The new version of the LOD cloud will be published later today or tomorrow, but you get a sneak peek here first! ;)

Close, but a Cigar Nevertheless

Posted by Knud on May 4th, 2010

I just came back from this year’s Web Science Conference in Raleigh, NC. The idea of the conference – as of Web Science in general – is to give a holistic, multi-disciplinary view on the Web, and while I’m still not sure if and exactly how this will work in the end (there was a heated discussion between social and computer scientists in the closing panel), I found the event very interesting and a lot of fun. Of course, the best surprise came right at the end, when our paper on Linked Data Usage (I had reported on early stages of this quite a while ago on this blog) was shortlisted as one of three papers for the best paper award! In the end we didn’t win (the prize went to the paper by Metaxas and Mustafaraj: From Obscurity to Prominence in Minutes: Political Speech and Real-Time Search), but just to get the nomination was pretty awesome. I really didn’t expect this, considering that this paper had been in the pipeline for more than a year, but never quite made it for any submission deadline, and was therefore delayed time and time again. This is great encouragement for continuing our work in this area!

Semantic Web Fridge Poetry

Posted by Knud on December 3rd, 2009

Someone in DERI brought back a set of Semantic Web fridge poetry magnets from a workshop! A joyous occasion for all SemWeb nerds, and there are plenty of those in DERI.

Fridge Poetry for Semantic Web Nerds

Semantic User Agents

Posted by Knud on October 8th, 2009

I’m still very much interested in the topic of analysing usage of linked data sites. To that end, an interesting question to ask is what kinds of agents access a linked data site. And here, apart from the usual categorisation into bots, browsers and such, it makes sense to differentiate between semantic and non-semantic agents. Very loosely, we could say that

Semantic agents are agents which are aware of RDF data and actively request it.

To know whether or not an agent requests RDF, we could look at the header of an individual HTTP request and check if the agent had specified Accept: application/rdf+xml. However, the Apache server log files unfortunately don’t tell us anything about the request header. Luckily though, there is an indirect way of finding out about this. If our linked data site uses best practice content negotiation and 303 redirects, we can look at pairs of requests in the log files. The Semantic Web Dog Food site, for instance, uses a particular URI pattern for resources and their HTML and RDF representations:


http://data.semanticweb.org/organization/deri-nui-galway

http://data.semanticweb.org/organization/deri-nui-galway/html

http://data.semanticweb.org/organization/deri-nui-galway/rdf

If the plain URI is requested, the server will either redirect to the HTML or the RDF representation, based on what was specified by the agent. Therefore, if we find a request for a plain URI and a request for the corresponding RDF URI, from the same IP address and the same agent, within a short time frame (e.g. 5 seconds), then we can infer that the agent had requested application/rdf+xml and can therefore be classified as a semantic agent.

90.21.243.141 - - [06/Oct/2008:16:07:58 +0100] "GET /organization/vrije-universiteit-amsterdam-the-netherlands HTTP/1.1" 303 7592 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"
90.21.243.141 - - [06/Oct/2008:16:08:02 +0100] "GET /organization/vrije-universiteit-amsterdam-the-netherlands/rdf HTTP/1.1" 200 45358 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"

The example above shows this: the “rdflib.net” agent requested the plain URI .../organization/vrije-universiteit-amsterdam-the-netherlands, was answered with a 303 redirect, and four seconds later requested .../organization/vrije-universiteit-amsterdam-the-netherlands/rdf. From this we can automatically infer that “rdflib.net” is a semantic agent.
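To make the pairing heuristic more concrete, here is a minimal sketch of how it could be run over a log file. It is not the script used for the actual analysis: it assumes Apache's combined log format (as in the two lines above), the /rdf suffix convention of the dog food site, and the 5-second window mentioned earlier. The names `parse_line` and `find_semantic_agents` are invented for this example.

```python
import re
from datetime import datetime, timedelta

# Apache "combined" format: ip, identity, user, [time], "request", status, size, "referer", "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def find_semantic_agents(lines, window=timedelta(seconds=5)):
    """Return user agents that followed a 303 on a plain URI to its /rdf URI."""
    redirects = {}  # (ip, agent, plain path) -> time of the most recent 303
    semantic = set()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        who = (record["ip"], record["agent"])
        if record["status"] == "303":
            redirects[who + (record["path"],)] = record["time"]
        elif record["status"] == "200" and record["path"].endswith("/rdf"):
            plain_path = record["path"][: -len("/rdf")]
            redirected_at = redirects.get(who + (plain_path,))
            if redirected_at is not None:
                if timedelta(0) <= record["time"] - redirected_at <= window:
                    semantic.add(record["agent"])
    return semantic


SAMPLE_LOG = [
    '90.21.243.141 - - [06/Oct/2008:16:07:58 +0100] "GET /organization/vrije-universiteit-amsterdam-the-netherlands HTTP/1.1" 303 7592 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"',
    '90.21.243.141 - - [06/Oct/2008:16:08:02 +0100] "GET /organization/vrije-universiteit-amsterdam-the-netherlands/rdf HTTP/1.1" 200 45358 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"',
]

print(find_semantic_agents(SAMPLE_LOG))
```

The sketch assumes the log is in chronological order and keeps every 303 in memory; for a year's worth of logs, old entries in `redirects` would need to be discarded as the scan moves on.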

A list of 423 semantic agents found in this way for the dog food site from 10/2008-10/2009 is here. Looking at the list, we can find a lot of agents that are clearly “semantic”, such as the “SindiceFetcher” or a SIOC browser. However, most of them are actually not what I would normally consider “semantic”, such as hordes of “Mozilla”-branded agents or dodgy looking bots. More research is awaiting…

Growth of the Web of Linked Data

Posted by Knud on September 4th, 2009

Bob DuCharme points out nicely how much the Web of Linked Data has grown in the past year by comparing two versions of Richard Cyganiak’s LOD cloud diagram. It looks pretty impressive when you compare the two versions side by side!

The Extended Semantic Web Conference

Posted by Knud on June 9th, 2009

Apparently, the European Semantic Web Conference will be renamed to Extended Semantic Web Conference. That is fantastic news: the original name was so boring. However, renaming it to “Extended” seems a lost opportunity to me: the organisers of all major Semantic Web conferences should come together and adopt far more exciting names. Some suggestions came up:

  • “International Semantic Web Conference” to “Incredible Semantic Web Conference”
  • “Asian Semantic Web Conference” to “Amazing Semantic Web Conference”
  • “European Semantic Web Conference” to “Extraordinary Semantic Web Conference”