Friday, 1 April 2011

Should we worry about metadata?

What is metadata? One way to think about it is as data about data. Metadata is to your data file as the contents, index and 'back of the title page' are to a book.

On the web, every page has room for metadata in the form of meta tags in the head section of the HTML file. Here we can put keywords, a description of the page and a variety of other information that, to be honest, we often don't include. Image files, including JPEGs, can carry metadata too. When you take a photo with a digital camera, the camera automatically records metadata about its settings at the moment the shot was taken, and some cameras can be set to add the photographer's name and copyright information as well.
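
To make page-level metadata a little more concrete, here is a minimal Python sketch that uses only the standard library's html.parser module to pull the name/content pairs out of a page's meta tags. The names MetaTagCollector and read_meta, and the sample page, are hypothetical and made up for illustration. The sketch only looks at tags of the form <meta name="..." content="..."> and ignores others, such as the charset declaration.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names before calling us.
        # Self-closing tags like <meta ... /> also end up here by default.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name and attrs.get("content") is not None:
            self.meta[name.lower()] = attrs["content"]


def read_meta(html: str) -> dict:
    """Return a dict of meta name -> content for an HTML string."""
    parser = MetaTagCollector()
    parser.feed(html)
    parser.close()
    return parser.meta


if __name__ == "__main__":
    # Hypothetical sample page, for illustration only.
    sample = """<html><head>
    <title>Should we worry about metadata?</title>
    <meta charset="utf-8">
    <meta name="description" content="Why image metadata matters">
    <meta name="keywords" content="metadata, copyright, EXIF">
    </head><body></body></html>"""
    print(read_meta(sample))
```

Running something like this against your own pages is a quick way to see which of the description or keyword fields you have actually filled in.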

The use of metadata in online images is becoming increasingly contentious. One problem is that in some cases all metadata, including the data that might identify who owns the copyright in the image, is being removed, by accident or by design. Many in the photographic community worry that the widespread posting of images on websites, especially social media sites, risks creating so-called orphan works: images whose author is either unknown or cannot be traced. Part of the problem is that the internet is sometimes treated as a copyright-free zone. I particularly like the copyright notice on the middle photograph.

So, what should we be doing if we publish images on our websites?
  1. Make sure published metadata includes attribution data (i.e. who took the photo) and the URL of your website
  2. If you allow users to upload images, be sure to keep any metadata that was in the original file
  3. If you resize images, make sure any metadata is carried over to the resized copy
  4. Don't use images without permission unless you know they are out of copyright
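
Point 3 is the one most easily broken by software, since some resizing tools write a fresh file without the original's camera metadata. The Python sketch below, standard library only, shows one way to check for and carry over the Exif block by working directly on a JPEG's marker segments. It finds the APP1 segment that starts with "Exif\0\0" in the original, drops any Exif block the resizing tool wrote itself, and places a copy of the original block directly after the resized file's start-of-image marker. The function names are hypothetical, and the sketch assumes well-formed baseline JPEGs. It copies the Exif bytes unchanged, so fields such as the stored pixel dimensions or the embedded thumbnail will still describe the original image. It also ignores other metadata containers such as XMP or IPTC. For real work an established metadata library or tool is the safer choice.

```python
import struct
from typing import Iterator, Optional, Tuple

SOI = b"\xff\xd8"  # start-of-image marker
APP1, SOS, EOI = 0xE1, 0xDA, 0xD9
EXIF_HEADER = b"Exif\x00\x00"


def segments(data: bytes) -> Iterator[Tuple[int, int, int, bytes]]:
    """Yield (marker, start, end, payload) for each header segment of a JPEG.

    start and end are byte offsets of the whole segment, marker included.
    Scanning stops at start-of-scan (SOS), where compressed image data begins.
    """
    if not data.startswith(SOI):
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        start = pos
        while pos < len(data) and data[pos] == 0xFF:  # skip 0xFF fill bytes
            pos += 1
        if pos >= len(data):
            raise ValueError("truncated JPEG")
        marker = data[pos]
        if marker == EOI:
            return
        # Big-endian length that counts its own two bytes but not the marker.
        if pos + 3 > len(data):
            raise ValueError("truncated segment header")
        (length,) = struct.unpack(">H", data[pos + 1:pos + 3])
        end = pos + 1 + length
        if end > len(data):
            raise ValueError("truncated segment")
        yield marker, start, end, data[pos + 3:end]
        if marker == SOS:
            return
        pos = end


def is_exif(marker: int, payload: bytes) -> bool:
    return marker == APP1 and payload.startswith(EXIF_HEADER)


def find_exif(data: bytes) -> Optional[bytes]:
    """Return the payload of the first Exif APP1 segment, or None."""
    for marker, _start, _end, payload in segments(data):
        if is_exif(marker, payload):
            return payload
    return None


def copy_exif(original: bytes, resized: bytes) -> bytes:
    """Return `resized` with the Exif block from `original` carried over."""
    exif = find_exif(original)
    if exif is None:
        return resized  # nothing to carry over
    # Rebuild the resized file without any Exif block of its own.
    body = bytearray()
    pos = 2  # just after SOI
    for marker, start, end, payload in segments(resized):
        if is_exif(marker, payload):
            body += resized[pos:start]
            pos = end
    body += resized[pos:]
    # Insert the original Exif block directly after the SOI marker.
    segment = bytes([0xFF, APP1]) + struct.pack(">H", len(exif) + 2) + exif
    return SOI + segment + bytes(body)
```

In practice you would read both files in binary mode, pass the bytes to copy_exif, write the result out as the published image, and perhaps use find_exif afterwards to confirm the attribution data survived.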

There's a great online tool that shows the metadata for a bewildering range of file types, including the common image formats. Jeffrey Friedl wrote it in 2006; you can read about it on his blog and try it out for yourself.

[PS: Jeffrey Friedl wrote the O'Reilly book Mastering Regular Expressions, an essential read for programmers.]