Saturday, 21 November 2009

Free the data!

We have grown used to the idea that information is freely available on the internet, and there are two news items this week that either demolish or reinforce that presumption.

Rupert Murdoch and News Corporation have argued for some time that newspapers should charge for access to their stories. Added to this is the stance that news aggregators should also pay to reproduce headlines from stories on other companies' web sites (such as News Corp's).

Tom Foremski of the Silicon Valley Watcher has an interesting analysis of how this might play out and why the 'simple' technical tactic of blocking aggregators with a robots exclusion file has been ignored in favour of attacking through the media.
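For anyone who hasn't met a robots exclusion file, here is a minimal sketch of how the 'simple' tactic works, using Python's standard urllib.robotparser module. The robots.txt contents, the crawler name ExampleAggregatorBot and the news.example URLs are all made up for illustration; they are not taken from any real site, and a crawler only honours these rules if it chooses to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve. The user-agent name
# "ExampleAggregatorBot" and the paths are assumptions for illustration.
ROBOTS_TXT = """\
User-agent: ExampleAggregatorBot
Disallow: /

User-agent: *
Disallow: /subscribers/
"""


def can_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "http://news.example/story.html"
    # The named aggregator is shut out of the whole site...
    print(can_fetch("ExampleAggregatorBot", story))
    # ...while any other crawler may still read ordinary stories.
    print(can_fetch("SomeOtherCrawler", story))
```

The point is that the block is a couple of lines of plain text on the publisher's own server, which is why choosing a media campaign over this route looks like a deliberate decision.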

I thought the thorny issue of whether taking headlines for links back to the original story infringed copyright had been settled years ago; it's certainly a common approach and is actively encouraged by RSS news feeds.

Murdoch Snr seems to be accusing other organisations, the BBC in particular, of getting their news from the newspapers.

He is quoted in the Telegraph as saying "...most of their stuff is stolen from the newspapers now, and we’ll be suing them for copyright."

Oddly enough I see some resonance here, and not just with ITV's Michael Grade once saying that web sites that took broadcasters' content were 'parasites'. Broadcasters do scan the papers for stories and newspapers will regularly promote 'exclusive' stories into electronic media to build up interest in tomorrow's edition. TV coverage of the storm over Gordon Brown's sympathy letter was usually accompanied by extracts from an interview with the aggrieved mother which were brightly branded with the Sun logo. So the Murdochs (père et fils) must have a deeper concern, albeit one that I personally see brightly tagged 'self-interest' whenever the BBC is mentioned.

I doubt if many people think that traditional static-media-based notions of copyright can survive eternally in a connected world. I have shared the view of many that, as a classic example, Ordnance Survey's mapping data should be freely available. This includes the ability to map things to your location, usually by postcode. This isn't as simple as it might at first appear because OS get data from the Royal Mail to which they then map grid references. These dual rights are one reason it has been convoluted and expensive to access such data.

The OpenSpace project has opened OS data up a lot, but it still limits the number of times you can access the system, which has restricted large-scale use. Now commercial operations are starting to use the system, which exposes the OS data through an application programming interface (API). To top this, the UK Government has announced that OS mapping will be freely available online from 2010, although I have not as yet seen the fine print.

The scheme's advisors are Sir Tim Berners-Lee (who of course always comes with his own tag of inventor of the world wide web) and Professor Nigel Shadbolt. Professor Shadbolt has reinforced the use of this data for what he calls hyper-locality; finding out what relates to your own street or postcode. He is also quoted on the BBC Web Site as believing that "... OS maps are more comprehensive in their coverage than other open source competitors which are already freely available online". This presumably doesn't refer to Google Maps (although rights issues do seem to limit how you can interact with them for hyper-locality) but is certainly true of the open map sites that rely on users inputting information. Cartography and GIS are a lot more complicated than most of us think.

It'll be interesting to see how this pans out.