Brussels / 31 January & 1 February 2015


Automating Attribution

Giving credit where credit is due

From blame logs to authors signing their names on their paintings, everyone appreciates and deserves credit for what they do. This has been codified in open licenses such as those from Creative Commons, which make attribution one of the requirements for re-use. Over the last two years, a number of libraries and tools have been developed to make attribution easier and more seamless. This talk demonstrates how they fit together, how you can use them in your programs, and how a browser plugin can help you both find and attribute openly licensed works.

Commons Machinery is an initiative supported by the Shuttleworth Foundation to give some much-needed attention to metadata support for creative works. So far, the project has worked primarily with images and with metadata standards for images. The software developed includes a catalog of openly licensed creative works, a browser plugin that interacts with that catalog, and several libraries that support managing metadata for creative works. This talk will skim across the following technologies and tools:

  • The Catalog - a free software backend for metadata storage, currently in production and seeded with metadata about all openly licensed works from Wikimedia Commons, as well as other collections. The catalog API supports querying for images by URL or by a perceptual hash, which helps you find images even if they have been resized.
  • Blockhash - a JavaScript, Python and C library and utility to calculate basic perceptual hashes of images, with particular emphasis on verbatim reuse scenarios. Images that have been rescaled (sometimes heavily) or converted to another format should produce identical or near-identical hashes, whereas images that have been modified to create derivative works should produce noticeably different hashes.
  • - a Firefox and Chromium plugin that can query the Catalog for information about images to determine if they're openly licensed or not, and show you which images on a web page are openly licensed, as well as give you source links and a way to automatically attribute those images when you re-use them.
  • hmsearch - a C++ implementation (with Node and Python interfaces) of the Hamming distance search algorithm HmSearch, built on Kyoto Cabinet and used in the Catalog to search for hashes which are nearly identical.
  • libgetmetadata - a library to retrieve metadata about images based on their URL
  • libcredit - formats appropriate credit statements from an RDF graph with metadata
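
To make the perceptual hashing and near-identical search ideas above more concrete, here is a minimal Python sketch. It is a simplified illustration only, not the Blockhash algorithm or the hmsearch index: the function names (block_mean_hash, hamming_distance) and the tiny synthetic images are hypothetical, and the hash simply compares the mean brightness of each block against the median of all block means. Two hashes are then compared by counting the bits in which they differ.

```python
from statistics import median

def block_mean_hash(pixels, bits=4):
    """Hash a grayscale image (list of rows of ints) into bits*bits bits.

    Simplified illustration: one bit per block, set when the block's
    mean brightness is above the median of all block means.
    """
    height, width = len(pixels), len(pixels[0])
    means = []
    for by in range(bits):
        for bx in range(bits):
            y0, y1 = by * height // bits, (by + 1) * height // bits
            x0, x1 = bx * width // bits, (bx + 1) * width // bits
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(sum(block) / len(block))
    threshold = median(means)
    value = 0
    for m in means:
        value = (value << 1) | (1 if m > threshold else 0)
    return value

def hamming_distance(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")

# Synthetic 16x16 gradient standing in for an image (assumption for the demo).
original = [[x * 12 + y * 4 for x in range(16)] for y in range(16)]

# Verbatim reuse: the same image scaled up 2x by pixel duplication.
upscaled = [[original[y // 2][x // 2] for x in range(32)] for y in range(32)]

# Derivative work: the right half of the image painted black.
edited = [[0 if x >= 8 else original[y][x] for x in range(16)] for y in range(16)]

h_original = block_mean_hash(original)
print("rescaled copy distance:", hamming_distance(h_original, block_mean_hash(upscaled)))
print("edited copy distance:  ", hamming_distance(h_original, block_mean_hash(edited)))
```

In this sketch the rescaled copy hashes to the same value as the original, while the edited copy differs in several bits. Comparing hashes pairwise like this only scales to small collections; an index for nearby hashes, which is the role hmsearch plays in the Catalog, is needed for large ones.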


Jonas Öberg