Where should an aspiring data journalist start? | Online Journalism Blog

In writing last week's Guardian Data Blog piece on How to be a data journalist, I asked various people involved in data journalism where they would recommend starting. The answers are so useful that I thought I'd publish them in full here.

The Telegraph's Conrad Quilty-Harper:

Start reading:


Keep adding to your knowledge and follow other data journalists/people who work with data on Twitter.

Look for sources of data:

ONS stats release calendar is a good start http://www.statistics.gov.uk/hub/release-calendar/index.html Look at the Government data stores (Data.gov, Data.gov.uk, Data.london.gov.uk etc).

Check out What do they know, Freebase, Wikileaks, Manyeyes, Google Fusion charts.

Find out where hidden data is and try to get hold of it: private companies looking for publicity, underappreciated research departments, public bodies that release data but not in a granular form (e.g. Met Office).

Test out cleaning/visualisation tools:

You want to be able to collect data, clean it, visualise it and map it.

Obviously you need to know basic Excel skills (pivot tables are how journalists efficiently get headline numbers from big spreadsheets).
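To make the pivot-table idea concrete, here is a minimal sketch of the same operation in plain Python. The rows, council names and column names are hypothetical stand-ins for a spreadsheet export, not real figures, and `pivot_sum` is an illustrative helper rather than anything Excel itself provides.

```python
from collections import defaultdict

# Hypothetical rows standing in for a spreadsheet export (not real figures).
rows = [
    {"council": "Northtown", "year": 2009, "spend": 120},
    {"council": "Northtown", "year": 2010, "spend": 90},
    {"council": "Southville", "year": 2009, "spend": 200},
    {"council": "Southville", "year": 2010, "spend": 210},
    {"council": "Northtown", "year": 2010, "spend": 15},
]


def pivot_sum(records, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) pair, like a pivot table."""
    table = defaultdict(lambda: defaultdict(int))
    for record in records:
        table[record[row_key]][record[col_key]] += record[value_key]
    # Convert to plain dicts so the result prints and compares cleanly.
    return {row: dict(cols) for row, cols in table.items()}


pivot = pivot_sum(rows, "council", "year", "spend")
for council, by_year in sorted(pivot.items()):
    print(council, by_year)
```

Each printed line is one row of the pivot: a council, with its spending totalled per year. That total per group is the "headline number" a pivot table gives you.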

For publishing just use Google Spreadsheets graphs, or ManyEyes or Timetric. Google MyMaps coupled with http://batchgeo.com is a great beginner mapping combo.

Further on from that you want to try out Google Spreadsheets importURL service, Yahoo Pipes for cleaning data, Freebase Gridworks and Dabble DB.

For more advanced stuff, you want to figure out a query language and be able to work with relational databases, Google BigQuery, Google Visualisation API (http://code.google.com/apis/charttools/), Google code playgrounds (http://code.google.com/apis/ajax/playground/?type=visualization#org_chart) and other Javascript tools. The advanced mapping equivalents are ArcGIS or GeoConcept, allowing you to query geographical data and find stories.
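As a rough illustration of what a query language and a relational database look like in practice, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names and numbers are all invented for the example; the point is only the shape of a join plus a grouped total.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two related tables (hypothetical schema and figures).
conn.execute("CREATE TABLE councils (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE visits (council_id INTEGER, year INTEGER, swims INTEGER)")
conn.executemany(
    "INSERT INTO councils VALUES (?, ?)",
    [(1, "Northtown"), (2, "Southville")],
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, 2009, 500), (1, 2010, 350), (2, 2009, 800), (2, 2010, 820)],
)

# Join the tables and total the visits per council, largest first.
query = """
    SELECT c.name, SUM(v.swims) AS total
    FROM visits AS v
    JOIN councils AS c ON c.id = v.council_id
    GROUP BY c.name
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
conn.close()

for name, total in totals:
    print(name, total)
```

The same `SELECT ... JOIN ... GROUP BY` pattern carries over, with minor dialect differences, to other relational databases.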

You could also learn some Ruby for building your own scrapers, or Python for ScraperWiki.
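For a sense of what a scraper does, here is a minimal sketch in Python using only the standard library's `html.parser`. It pulls the rows out of an HTML table held in a string; `SAMPLE_HTML` and the `TableScraper` class are made up for illustration, and this is not ScraperWiki's own API. A real scraper would first download the page and should respect the site's terms of use.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical page fragment; the figures are invented.
SAMPLE_HTML = """
<table>
  <tr><th>Area</th><th>Count</th></tr>
  <tr><td>Northtown</td><td>12</td></tr>
  <tr><td>Southville</td><td>7</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(SAMPLE_HTML)
header, *records = scraper.rows
print(header)
print(records)
```

Once the rows are plain lists, they can be written to a CSV file or loaded into a spreadsheet for cleaning.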

Get inspired:

Get the data behind some big data stories you admire, try to find a story, visualise it and blog about it. You'll find that the whole process starts with the data, and your interpretation of it. That needs to be newsworthy/valuable.

Look to the past!

Edward Tufte's work is very inspiring: http://www.edwardtufte.com/tufte/ His favourite data visualisation is from 1869! Or what about John Snow's Cholera map? http://www.york.ac.uk/depts/maths/histstat/snow_map.htm

And for good luck here's an assorted list of visualisation tutorials.

The Times' Jonathan Richards:

I'd say a couple of blogs.

Others that spring to mind are:

If people want more specific advice, tell them to come to the next London Hack/Hackers and track me down!

The Guardian's Charles Arthur:

Obvious thing: find a story that will be best told through numbers. (I'm thinking about quizzing my local council about the effects of stopping free swimming for children. Obvious way forward: get numbers for number of children swimming before, during and after free swimming offer.)

If someone already has the skills for data journalism (which I'd put at (1) understanding statistics and relevance (2) understanding how to manipulate data (3) understanding how to make the data visual) the key, I'd say, is always being able to spot a story that can be told through data, and only makes sense that way, and where being able to manipulate the data is key to extracting the story. It's like interviewing the data. Good interviewers know how to get what they want out from the conversation. Ditto good data journalists and their data.

The New York Times' Aron Pilhofer:

I would start small, and start with something you already know and already do. And always, always, always remember that the goal here is journalism. There is a tendency to focus too much on the skills for the sake of skills, and not enough on how those skills help enable you to do better journalism. Be pragmatic about it, and resist the tendency to think you need to know everything about the techy stuff before you do anything; nothing could be further from the truth.

Less abstractly, I would start out learning some basic computer-assisted reporting skills and then moving from there as your interests/needs dictate. A lot of people see the programmer/journalism thing as distinct from computer-assisted reporting, but I don't. I see it as a continuum. I see CAR as a "gateway drug" of sorts: Once you start working with small data sets using tools like Excel, Access, MySQL, etc., you'll eventually hit limits of what you can do with macros and SQL.

Soon enough, you'll want to be able to script certain things. You'll want to get data from the web. You'll want to do things you can only do using some kind of scripting language, and so it begins.

But again, the place to start isn't thinking about all these technologies. The place to start is thinking about how these technologies can enable you to tell stories you would otherwise never be able to tell. And you should start small. Look for little things to start, and go from there.



