Friday, 24 January 2014

European word translator

I'm a big fan of the etymology maps on reddit. These gave me the idea of creating a map that would translate any English word (or two) to other European languages using Google Translate. The results are often far from perfect - the screenshot below shows that "a bug in a rug" becomes "an error in a rug" in Spanish. This may be partly because Google Translate has little context to work with when only one or two words are entered, and partly because it just isn't as smart as a human translator. Still, hopefully it's fun to play with!

http://ukdataexplorer.com/european-translator/


Treemaps of emissions and Reddit posts


A couple of my recent experiments have been treemap visualisations using D3. Treemaps are useful for showing part-to-whole relationships when there are many data points and the data points can be categorised (countries by continent, stocks by sector and sub-sector etc). The Map of the Market is a nice example.
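
To make the part-to-whole idea concrete, here is a minimal Python sketch of the simplest treemap layout, usually called slice-and-dice: one rectangle is cut into strips whose areas are proportional to each item's value. This is not the code behind the treemaps described in this post. The function name, the continent names and the figures are all made up for illustration.

```python
def slice_layout(values, x, y, width, height, horizontal=True):
    """Split one rectangle into strips with areas proportional to `values`.

    `values` maps a label to a non-negative number. Returns a dict mapping
    each label to an (x, y, width, height) tuple.
    """
    total = sum(values.values())
    if total <= 0:
        raise ValueError("values must sum to a positive number")

    rects = {}
    offset = 0.0
    for name, value in values.items():
        share = value / total
        if horizontal:
            # Strips are laid out left to right, each the full height.
            rects[name] = (x + offset, y, width * share, height)
            offset += width * share
        else:
            # Strips are laid out top to bottom, each the full width.
            rects[name] = (x, y + offset, width, height * share)
            offset += height * share
    return rects


# Hypothetical figures, purely for illustration.
continents = {"Asia": 50, "Europe": 30, "Africa": 20}
layout = slice_layout(continents, 0, 0, 100, 60)
for name, (rx, ry, rw, rh) in layout.items():
    print(name, round(rx, 1), round(ry, 1), round(rw, 1), round(rh, 1))
```

Calling the function again on each strip, with `horizontal=False`, would subdivide a category into its members, which is how the nested structure of a treemap arises.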

The interactive treemap I created of emissions and population is shown below. I think it does a reasonable job of showing the data, but perhaps it could be improved by dividing the countries into categories such as continents. I also wonder if side-by-side bar charts might be a simpler, better way to show the data.

My other treemap shows the top 200 reddit posts of all time. It's just a bit of fun, but I'm not sure if it's really a successful visualisation; to me, the design of a treemap implies the relationship of parts to a whole, but in this treemap "the whole" is arbitrarily chosen to be only the top 200 posts.

For more information and links on treemaps, see this article by their inventor, Ben Shneiderman.

Monday, 16 December 2013

Embeddable economic indicator charts

Update: Unfortunately these charts are not currently being updated.

I've re-created the Headline Economic Indicators page on UK Data Explorer using D3. Improvements include:
  • The charts can now show more than one data series each.
  • The number and date formats have been improved (for example, the dates for monthly unemployment rate data now correctly show three-month periods such as Feb-Apr 2013).
  • Individual charts can be embedded on any site or blog using an iframe; instructions are here, and an example is below.
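
As a rough illustration of the three-month date labels mentioned above, here is a short Python sketch (the charts themselves use D3, and this is not their code). It assumes the period is identified by its final month and that the label carries the year of that final month; both the function name and that year convention are assumptions.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def three_month_label(end_year, end_month):
    """Label the three-month period that ends in `end_month` (1-12)."""
    if not 1 <= end_month <= 12:
        raise ValueError("end_month must be between 1 and 12")
    # The first month is two months before the last, wrapping around January.
    start_month = (end_month - 3) % 12 + 1
    return f"{MONTHS[start_month - 1]}-{MONTHS[end_month - 1]} {end_year}"


print(three_month_label(2013, 4))
```

For a period ending in April 2013 this gives the "Feb-Apr 2013" form quoted in the list above.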

Wednesday, 23 October 2013

A few visualisation links

  • Andy Kirk's Visualising Data site has a useful section listing visualisation resources. The site also has an excellent blog, which includes a monthly selection of data visualisation links.
  • I've been learning ggplot2 recently; it is an excellent R library for creating static graphics. Both of the books listed on the ggplot2 home page are very good (although I haven't read either from cover to cover). Some of the information in the R Graphics Cookbook is available online at Winston Chang's Cookbook for R site.
  • If you haven't used R before, Paul Teetor's R Cookbook could be a good place to start. R in Action (Robert Kabacoff) and The Art of R Programming (Norman Matloff) also get good reviews, but I haven't read either of these two.
With coursework and wedding planning, I'm not likely to have time to write many posts over the next few months.

Wednesday, 9 October 2013

Experimental interactive Census maps of English wards

I've put maps for all of the English regions online here. They're at a very early stage, but hopefully there aren't too many bugs! I made use of Alex Singleton's Open Atlas Project code for downloading and reshaping the data.


Monday, 7 October 2013

Mapping 2011 Census Data for England and Wales

In addition to labour market statistics, Nomis includes detailed data tables for England and Wales from the last four censuses. The site has a handy built-in mapping tool. As an example, here's how to create a map of home ownership in the West Midlands. (The image is a screenshot using Nomis's medium size option. The map quality is even better in extra large size.)
  1. Go to the Census data page (also linked to from the Nomis home page).
  2. Choose Key Statistics, then select Tenure from the list.
  3. In the left column, under Explore, click Advanced Query.
  4. In the geography section, choose select areas within, then select 2011 super output areas - mid-layer in the second drop-down list. From the options that appear, choose regions and West Midlands.
  5. In the tenure section, select only Owned.
  6. In the percent section, tick percent.
  7. In the format/layout section, select Map.
  8. Click download data, then View map.
The Office for National Statistics has a page of Census visualisations. Alex Singleton at the University of Liverpool has created an atlas of Census maps for each local authority area using Nomis downloads and R.

ONS's Census data publishing strategy is here.

Monday, 30 September 2013

Renewable electricity sites mapped

Interactive maps of UK renewable electricity generating sites are published by the Department of Energy and Climate Change and RenewableUK. I've had a go at creating a slightly different style of interactive map with the DECC data, with circles sized in proportion to electricity generating capacity.
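
One common way to size circles on a map like this is to make each circle's area, rather than its radius, proportional to the value, so that a site with four times the capacity covers four times the space. The Python sketch below shows that calculation. Whether my map scales area or radius is not stated here, so treat the square-root scaling as an assumption. The function name, the megawatt units and the 30-pixel maximum are hypothetical.

```python
import math


def circle_radius(capacity_mw, max_capacity_mw, max_radius_px=30.0):
    """Radius such that circle area is proportional to capacity.

    The largest site gets `max_radius_px`; others scale by the square
    root of their share of the largest capacity.
    """
    if capacity_mw < 0 or max_capacity_mw <= 0:
        raise ValueError("capacities must be non-negative, with a positive maximum")
    return max_radius_px * math.sqrt(capacity_mw / max_capacity_mw)


# Hypothetical capacities: a quarter of the capacity gives half the radius.
print(circle_radius(100, 400))
print(circle_radius(400, 400))
```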

You can view one type of generation at a time by hovering over the legend. The north-south differences (for example in wind, hydro and solar generation) are striking but not completely surprising given the differences in climate and landscape.


The map uses D3 and Leaflet, and the code is based on this page by Mike Bostock.

Thursday, 26 September 2013

A big list of Office for National Statistics publications

I've put together a categorised list of over 100 regularly-published Office for National Statistics releases, covering topics including the economy, demography, and health. It's hopefully helpful as a quick overview of the breadth of information that's published.

Monday, 23 September 2013

Sankey Diagrams

Sankey diagrams are useful for showing flows, such as energy flows or movements of people. My favourite example is the Energy Flow Chart, produced annually by the UK's Department of Energy and Climate Change.
Versions of the chart going back to 1974 are available from the National Archives. It's interesting to see the shift from coal to gas and the reduction in energy consumption by industry over this period. Note that different units are used in the 1974 and 2012 charts; 1 toe equals approximately 397 therms (DUKES 2013 page 229). 
DUKES Annex H includes detailed flow charts for individual fuels.
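
The conversion factor quoted above can be checked with a few lines of Python. The sketch assumes the standard definitions of 41.868 GJ per tonne of oil equivalent and roughly 105.506 MJ per therm; these constants are not taken from the charts themselves, and the function names are only for illustration.

```python
GJ_PER_TOE = 41.868      # standard definition of one tonne of oil equivalent
MJ_PER_THERM = 105.506   # one therm is 100,000 Btu, roughly 105.506 MJ

# Convert GJ to MJ, then divide by the size of a therm.
THERMS_PER_TOE = GJ_PER_TOE * 1000 / MJ_PER_THERM


def toe_to_therms(toe):
    """Convert tonnes of oil equivalent to therms."""
    return toe * THERMS_PER_TOE


def therms_to_toe(therms):
    """Convert therms to tonnes of oil equivalent."""
    return therms / THERMS_PER_TOE


print(round(THERMS_PER_TOE, 1))
```

The result rounds to 397 therms per toe, matching the figure above, so values read from the older and newer charts can be put on a common scale in either direction.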

There's an entire blog on Sankey diagrams at sankey-diagrams.com, which is useful for inspiration and for advice on design. The site includes a list of software for creating the diagrams.

The D3 Sankey plugin is fairly easy to use if you know some JavaScript; see Mike Bostock's example. The 2012 UK flow chart (above) was created in Adobe InDesign.

Friday, 20 September 2013

A map of the Welsh Index of Multiple Deprivation

I published a map of deprivation in Wales yesterday, using data from the Welsh Index of Multiple Deprivation 2011.
In my previous posts on deprivation maps, I have focused on the design of the maps themselves rather than the important information that they contain. This is simply because I feel more qualified to comment on the presentation than on the data itself.

Alasdair Rae's site on the Scottish index has an explanation page which is useful for gaining a better understanding of deprivation maps in general. The Welsh Government's WIMD 2011 page includes a guidance document which explains how the deprivation indices are calculated.

The Joseph Rowntree Foundation yesterday published a report on poverty and social exclusion in Wales.

Wednesday, 18 September 2013

World Development Indicators

World Development Indicators is a database compiled by the World Bank, containing 1289 indicators (by my count) in the categories of world view, people, environment, economy, states and markets, and global links. The database covers the world, regions, and countries, contains annual data, and is updated in April, July, September and December each year. Edit: There are also some updates between these dates; see here for details.

Data visualisation tools are available for a selection of the indicators, including Google Public Data Explorer. There are also nice mobile apps for viewing the indicators.

There are a few options for downloading data:
  • The Databank is an online tool for selecting, viewing and downloading data.
  • The WDI package for R can be used to search for and download data directly to R. See the README page for a quick tutorial.
  • There is also a zip file containing all of the data and metadata in csv format (39 MB download). This could be useful if you plan to use the WDI often.

Tuesday, 17 September 2013

A map of the Scottish Index of Multiple Deprivation

The screenshot below shows my interactive map of SIMD 2012, which is based on the design I used previously for a map of London.

I am aware of three existing interactive maps that use this data set:
  • The Scottish Government's map has many features, including the ability to show changes over time. The map's downside is that it uses only a relatively small portion of the browser window.
  • Alasdair Rae's site was obviously a major source of inspiration for mine. The map's tooltips show change in rank over time, and the site also has some advice on interpreting the maps.
  • Oliver O'Brien's map uses an unusual and effective technique of shading only buildings.
I've also created a visualisation of the SIMD ranks of data zones in each local authority area, based on the barcode charts in the Scottish Government's SIMD publication.

Monday, 16 September 2013

Eurostat data in R

Eurostat, the statistical office of the European Union, collects and publishes national and regional data for Europe. The site's features include Regional Statistics Illustrated (a data tool with interactive maps and charts) and Statistics Explained articles.

It's possible to download data from the Eurostat database to carry out your own analysis and visualisation. There are two possible approaches to this:
  1. The browse/search page. Within this, the Database sections are customisable, while the Tables are ready-made. The advantage of downloading data by this route is that very little processing is required to get the data into the correct shape. The disadvantage is that you need to use a web browser to get the data rather than downloading it straight to software such as R. But the process can be made very quick by using bookmarks.
  2. From the bulk data downloads section, you can download full datasets for use in statistical software.
I'll begin by discussing option 1. When you click on a dataset from the browse/search page, a new window opens (see screenshot). In this window, you can choose the variables, countries etc that you are interested in.

To find out more about the dataset, click "Explanatory texts (metadata)".

It's possible to bookmark your data selection for future use.

To download the data for use in a stats package, click the Download button and choose to download in csv format.

This is an example script for plotting data downloaded from Eurostat. Here is the result:

For downloading from the bulk facility, this is a useful blog post by Johannes Kutsam. Here is an example script I wrote using Johannes's function. The result is very similar to the chart above, although for some reason estimated values seem to be missing from the downloaded data.

Wednesday, 11 September 2013

Using the Nomis API with R for regional labour market data

Nomis, a site run by the University of Durham on behalf of the Office for National Statistics, is a fantastic source for labour market statistics, including local data.

The site's features include regional profiles and web-based tools for querying the Nomis database. In this post, I'll briefly discuss the Nomis API, which is useful for downloading the latest data to stats software, web apps etc. Spencer Hedger, who works on the Nomis team, has written several helpful blog posts on using the API. Also see the API reference pages which are linked to from your Nomis account page.
Chart created using R and Nomis API

This is a quick example of downloading data to R using an API link. The API link was generated using the process described in Spencer Hedger's blog post; see the link above. The resulting chart is shown to the right.

The API has several format options in addition to CSV, such as Google Visualisation JSON and KML.