Ordnance Survey’s GB Postcode to Longitude / Latitude Dataset

Apart from the crippling UK debt, another legacy Gordon Brown’s government has left the British people is the rather wonderful data.gov.uk – credit where credit is due, the PM realises that the UK – the country that had the first industrial revolution, created the Royal Society, discovered & refined penicillin, created RADAR and the first modern computer, has of late been lacking on the innovation front; so he decided to have a chat with Sir Tim Berners-Lee and asked him how the Government could utilise the Web.

Berners-Lee has for the last few years been working on his vision of the future direction of the web – the Semantic Web – or more rather Semantic Data – so of course the first thing he asked the PM to do was liberate all the silos of data the Government has to allow the tax payers to have access and start to innovate. One of the biggest surprises was that the PM announced that he would be opening up the Ordnance Survey’s data sets (which generates several million pounds in revenue for the Government).

Ordnance Survey have the richest GIS data of the UK – the detail fair exceeds anything like Google or Bing Maps. While I’m sure the announcement sent shock waves through companies like Experian and many other who have built their businesses around supplying UK GIS data especially the converted Postcode / address information – the mother-load – has yet to be released – although in the first wave the Code-Point Open dataset has been published. Code-Point Open contains over 1.6 million GB postcodes (Isle of Man, The Channel Islands and Northern Ireland are not yet included) and their Longitude & Latitude. The data is available to download in CSV format.

This is a wonderful move and will benefit any UK developer who is interested in building in spatial services to their applications – but there is a massive difference between raw data and usable data. The Code-Point Open dataset comes in 120 separate CSV files, the data hasn’t been cleansed so some postcodes appear with ” and others don’t (which would affect searching) and there are many fields which wouldn’t be strictly relevant for developers doing spatial work.

So I set about the task of importing the data SSIS to deal with the bulk load – using some wonderful code supplied by SSIS MVP (and ex-colleague) Jamie Thompson. Once the data was in SQL Server it’s was much easier to convert it into various other formats:

  • SQL Server 2005* – with separate longitude / latitude columns.
  • SQL Server 2008* – utilising the Geography type for longitude / latitude.
  • MongoDB – in a format that works with MongoDB’s geospatial functions.

These datasets, the SSIS package and the console app for importing into MongoDB are available on my GitHub Repository.

Work smart, not hard.

* These are large SQL Files – I would advice using the SQL Command Line to import the data – as the files are large enough to crash Management Studio, should you try to use it to open the files.


About this entry