If there’s one thing we’ve learned in the last few months, it’s just how much data is out there that’s not being used.
In fact, by aggregating data across multiple online sources, we were able to create the most comprehensive, interactive map of the Kenyan dairy industry. All for free and without ever leaving our front room.
This is just one example of how we can start investing in the data we’ve already got.
The Challenge
A Google search for “Kenya dairy industry” threw up multiple reports written over the past decade. Most of the reports centre on a value chain diagram, which shows the flow of produce from the farm to the consumer.
The reports estimate there are around 400 different organisations (co-ops, farmer groups, coolers, processors) working in the dairy industry in Kenya today. So rather than draw out another one of these value chain diagrams we wanted to go one stage further.
The challenge we set was to see if we could name and locate every single one of the 400 organisations in the dairy value chain just by mining the data that’s already out there online.
Finding the Data
When we started this challenge we had little idea as to where we could find this data. We started by pulling data from some of the reports we found online. However, most of these only listed the names of the biggest processors and dairies in the country.
The second place we turned to was Google Maps and we were amazed by the amount of data we could mine. Purely by searching for keywords such as “dairy” and “cooling centre” we were able to locate around 200 different co-operatives, farmer groups, dairies and processors.
Adding this together with the information we found in the reports we had a total of around 250 different organisations but we were still well short of our 400 target.
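Combining sources like this usually means the same organisation turns up more than once under slightly different names. The snippet below is a minimal sketch of one way to merge and de-duplicate such lists in Python. It is not the method used for the map; the record fields, the suffix list and the example organisations are all made up for illustration.

```python
import re

# Hypothetical suffixes to ignore when comparing organisation names.
IGNORED_WORDS = {"ltd", "limited", "co", "coop", "cooperative", "society", "the"}


def normalise(name: str) -> str:
    """Lower-case a name and drop punctuation and common suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(w for w in cleaned.split() if w not in IGNORED_WORDS)


def merge_sources(*sources):
    """Combine record lists, keeping one record per normalised name."""
    merged = {}
    for source in sources:
        for record in source:
            key = normalise(record["name"])
            if key not in merged:
                merged[key] = dict(record)
            else:
                # Fill in fields the earlier record was missing (e.g. GPS).
                for field, value in record.items():
                    if merged[key].get(field) is None:
                        merged[key][field] = value
    return list(merged.values())


# Made-up example records, not the real dataset.
from_reports = [
    {"name": "Example Dairy Ltd.", "type": "processor", "lat": None, "lon": None},
]
from_maps = [
    {"name": "Example Dairy", "type": "processor", "lat": -1.0, "lon": 37.0},
    {"name": "Sample Farmers Co-operative", "type": "co-op", "lat": -0.5, "lon": 36.5},
]

organisations = merge_sources(from_reports, from_maps)
print(len(organisations))
```

Matching on a normalised name is deliberately crude. It catches differences such as “Ltd.” versus no suffix, but two genuinely different organisations with the same name in different towns would be merged, so a real clean-up would also want to compare locations.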
The last step was the trickiest and involved using some of the latest data mining tools to comb the web for data on Kenyan dairy organisations. We searched through sources such as business directories and agriculture websites.
It took a little longer, but by the time we finished we realised that, in a couple of hours, we’d found the names and GPS co-ordinates of every single co-op, cooler, dairy and processor in the country.
Our database had 350 dairy organisations in it, almost 90% of the estimated total in the country. The only ones we were missing were some of the smaller farmer groups.
Visualising the Data
Rather than creating more slides and reports we wanted to create something that would be interactive and usable for people looking for information on this sector.
We plotted the points on a map using a free online tool called CartoDB, created a Sankey diagram of the value chain using SankeyMATIC and then wrote some JavaScript to filter the results at each stage of the value chain.
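To give a feel for the last two steps, here is a rough sketch of the same logic in Python (the original filtering was written in JavaScript, and this is not that code). It filters a list of organisations by value chain stage and formats flows in the `Source [amount] Target` line format that SankeyMATIC accepts as input. The stage labels, organisation names and flow amounts are all invented for illustration.

```python
# Made-up records; the "stage" labels are an assumption about how the
# value chain might be split, not the categories used in the real map.
organisations = [
    {"name": "Sample Farmers Group", "stage": "farmer group", "lat": -0.5, "lon": 36.5},
    {"name": "Example Cooling Centre", "stage": "cooler", "lat": -0.7, "lon": 36.9},
    {"name": "Example Dairy", "stage": "processor", "lat": -1.0, "lon": 37.0},
    {"name": "Another Dairy", "stage": "processor", "lat": -1.2, "lon": 36.8},
]


def filter_by_stage(records, stage):
    """Return only the records that belong to the given value chain stage."""
    return [r for r in records if r["stage"] == stage]


def sankey_lines(flows):
    """Format (source, amount, target) tuples as SankeyMATIC input lines."""
    return [f"{source} [{amount}] {target}" for source, amount, target in flows]


processors = filter_by_stage(organisations, "processor")
print([r["name"] for r in processors])

# Illustrative flow amounts only, not figures from any of the reports.
flows = [
    ("Farmers", 60, "Coolers"),
    ("Farmers", 40, "Consumers"),
    ("Coolers", 60, "Processors"),
    ("Processors", 60, "Consumers"),
]
print("\n".join(sankey_lines(flows)))
```

On the interactive map, the equivalent of `filter_by_stage` would run whenever a reader selects a stage, so that only the matching points stay visible.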
Click on this link to see the interactive version
So What?
This experiment made clear that it’s time for a radical rethink of the way we create and use data in agricultural research. Hundreds of expensive reports are written every year, and yet the data in them is almost impossible to find and use.
By contrast we were able to create the most comprehensive, interactive map of the dairy value chain in the space of just a few hours, using data that’s already out there online.
So to all those thinking about commissioning reports this year – let’s start investing in the data we already have and presenting it in dynamic, usable ways. Only this way can we really move this sector forward.
By Georgia Barrie of Farm.ink and originally published as Why we need to start investing in data we already have.