They say that hindsight is always 20/20. It's true. I'd been given 5 minutes in front of some of the most influential people in Europe to make my case, and 30 minutes after my nerve-racking speech, I sat down and thought about everything I couldn't cram into the 5 minutes I had been given.
It's been a while since I shared the press release from the European Commission which highlighted the talk I gave and the work my team did as part of the EIT Young Leaders program. I owe you all an explanation of what I was working on, why I feel so passionate about it, what I said during the speech and what I wish I'd had the time to cover.
It was nerve-racking as I stood up to join the other speakers on the stage: Kenneth Cukier, Economist editor and author of the big-data-book.com; Alfred Spector, Vice-President of Research and Special Initiatives at Google; and finally Gavin Starks, CEO of the Open Data Institute. I was nervous. These are big names, and the audience was just as impressive, consisting of CEOs, CTOs, VPs, representatives from industry, EU policy makers and European Commissioners.
I had been given 5 minutes to talk about how Big Data can be used as a catalyst for social change. Months of work, research and internal team battles turned into just 5 minutes. It is an eternity when you’re standing in front of an audience, but in reality 5 minutes is just less than three paragraphs of text. That's not much to convey everything the 6 of us on our team wanted to say.
The Unrealised Opportunity
When you speak to advocates of big data they all preach about how the computing power of today is cheap, digital storage is cheap and we've lots and lots of information about everything we all do every day. All we need to do, they say, is to study the data to see the great advances we can make. The problem is, when you study Big Data, you see lots of advocates, but not as many success stories. Why? What's wrong with the Big Data industry? Why aren't these fantastic opportunities being realised and what can we do about it? These were the questions which led the team and me on a fantastic journey.
Investigating the Obstacles
Let’s depict a vision of the future; a vision we want to create. Picture a beach in the future, and on it a little boy. A little boy who has less chance of suffering from cancer than anyone today. A little boy who has less chance of suffering from diabetes than anyone today.
In the future we've taken 25 years of shopping habits for over 1 million people and combined it with health records to reveal lifestyle and dietary choices which increase the risks of diabetes and cancer. We've shared this information with this little boy and his parents and allowed them to change their lifestyle to avoid many of these risks.
This little boy also gets more time with his parents than anyone today. Because his parents spend less time in traffic than we do. They have a smartphone application which provides them with a forecast of road traffic before they leave the house.
Now that we have the vision, my team tried to create it. Let’s start with the traffic prediction application: how could we build that?
Building the Future
To build this application we need access to energy consumption data. The application combines electricity consumption data with traffic. As we all wake in the morning we turn on TVs, radios, kettles and showers. This causes a spike in the amount of energy that we use. This spike occurs between 30 minutes and 1 hour before we leave our house. It can therefore be used as an indication of when we will leave home and what future traffic will be like. This electricity data exists. We know this because the energy companies capture it to bill us, and the energy distributors capture it to ensure they supply the right amount of power to the right places at the right time. But it's not freely available - in fact it is locked away.
There are initiatives which try to unlock this data. Schemes like Green Button in the US or MIDATA in the UK allow individual users to download their data from energy suppliers. These initiatives are focused on providing the data back to the individual. Using these initiatives to power our application we would need to contact each household individually and ask for their energy consumption data. For a city like London this could mean asking 3 million households for their data.
As an entrepreneur I've a lot of ideas and very little cash, and contacting 3 million households is both expensive and risky. There is a good chance that not everyone will respond, which means I won't have all the data I need.
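To make the idea concrete, here is a minimal sketch of the spike-to-departure reasoning. It is not what my team built; the function names, the half-hourly sampling and the sample readings are all invented for illustration. It finds the steepest rise in aggregate morning consumption and projects the 30 to 60 minute lead time described above.

```python
from datetime import datetime, timedelta


def find_morning_spike(readings):
    """Return the timestamp at which consumption rose the most.

    `readings` is a time-ordered list of (timestamp, kilowatts) pairs.
    This is a hypothetical, deliberately naive detector.
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings to detect a rise")
    best_time, best_rise = None, float("-inf")
    for (_, prev_kw), (ts, kw) in zip(readings, readings[1:]):
        rise = kw - prev_kw
        if rise > best_rise:
            best_time, best_rise = ts, rise
    return best_time


def departure_window(spike_time,
                     min_lead=timedelta(minutes=30),
                     max_lead=timedelta(hours=1)):
    """Project when people are likely to leave home after the spike."""
    return spike_time + min_lead, spike_time + max_lead


# Invented aggregate readings for one neighbourhood (illustrative only).
readings = [
    (datetime(2013, 1, 7, 5, 30), 120.0),
    (datetime(2013, 1, 7, 6, 0), 130.0),
    (datetime(2013, 1, 7, 6, 30), 310.0),
    (datetime(2013, 1, 7, 7, 0), 330.0),
    (datetime(2013, 1, 7, 7, 30), 250.0),
]

spike = find_morning_spike(readings)
start, end = departure_window(spike)
print(f"Spike at {spike:%H:%M}, expect departures {start:%H:%M}-{end:%H:%M}")
```

A real forecast would need far more than this (weekday patterns, weather, calibration against observed traffic), but the sketch shows why only aggregate, near real-time consumption is needed.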
What if we could just ask the companies which already own this data? - The energy supply companies and the energy generators. It turns out that there are several issues making them reluctant to release this data:
Competitive advantage: what if we shared their data with a competitor? The competitor would then know who their biggest customers are.
Privacy issues: the energy companies' customers have provided the data only for billing and not for anything else. Data should only be used for the purposes disclosed when it was originally captured.
Currently Open Data Initiatives focus on releasing data for free in the hope that this will spur innovation and kick-start ecosystems. But we haven't seen a huge uptake in this area. There are a number of factors preventing this:
Data reliability: The data is supplied for free. Usually this is a best effort delivery. That sounds ok, until you realise that in order for this data to be used commercially it is important to have confidence in it. Essentially the open data community is asking developers to take a leap of faith, to trust their livelihoods, homes, and families will all be safe and secure based on the income generated by a best effort data release. That is a big ask.
Timely data: Open Data Initiatives often partner with data producers and manually scrub data of information which may identify individuals, or of other commercially sensitive information. This process takes time and as a result a number of the Open Data Initiatives provide "canned data" from historical data sets. This limits the applications for which this data can be used. We couldn't create a real-time traffic prediction application on this type of historical data alone.
What if we were to invert the question?
Privacy is also a concern. We neither want nor need to pry into an individual’s details; in fact, for our app, seeing an individual’s energy consumption data is next to useless. It is like looking at a grain of sand when what we want to see is the beach.
Big Data is by definition big. Getting a copy of this data is expensive and slow, and we don't need it. While computing is getting cheaper, and cloud computing is even more efficient, it is still not free and big data requires lots of it.
What if, rather than receiving a copy of the data, we could simply get an opportunity to run some statistical analysis on it? The data wouldn’t move; it would stay within the owning organisation. We could create a software infrastructure which ensured that access to the data was safe and privacy compliant. It would reduce the cost of access for a small start-up, address the privacy issues and mitigate the risk associated with disclosing data seen as a competitive advantage.
The data would be safe, but the statistical value of the data would be set free. We would liberate the value of the data.
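As a rough sketch of what "the data stays put, the statistics travel" could look like, consider the following. Everything here is an assumption made for illustration: the `DataOwner` class, the record fields and the minimum group size are hypothetical, and a real system would need much stronger privacy guarantees than a simple size threshold.

```python
from statistics import mean


class DataOwner:
    """Hypothetical wrapper run inside the owning organisation.

    Raw records never leave this object; callers only ever receive
    aggregates, and only when enough households contribute to them.
    """

    def __init__(self, records, min_group_size=50):
        self._records = records          # private: never returned to callers
        self._min_group_size = min_group_size

    def average_consumption(self, district, hour):
        values = [
            r["kwh"] for r in self._records
            if r["district"] == district and r["hour"] == hour
        ]
        # Refuse to answer when the group is small enough to single out
        # an individual household.
        if len(values) < self._min_group_size:
            return None
        return {
            "district": district,
            "hour": hour,
            "households": len(values),
            "mean_kwh": mean(values),
        }


# Invented records, with a tiny threshold so the example is easy to follow.
owner = DataOwner(
    [
        {"household": 1, "district": "A", "hour": 6, "kwh": 0.5},
        {"household": 2, "district": "A", "hour": 6, "kwh": 1.0},
        {"household": 3, "district": "A", "hour": 6, "kwh": 1.5},
        {"household": 4, "district": "B", "hour": 6, "kwh": 2.0},
    ],
    min_group_size=3,
)

allowed = owner.average_consumption("A", 6)   # enough households: answered
refused = owner.average_consumption("B", 6)   # a single household: refused
print(allowed, refused)
```

The start-up sees the beach (the district average) and never the grain of sand (any one household), while the owner keeps control of the raw data.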
Free as in Beer (which you have to pay for)
What I'm going to propose sounds, at first, a little nuts. I think free data isn't the right thing to base an ecosystem on. In fact I think it discourages the ecosystem from taking root. As mentioned above there are a number of problems with free data which prevent developers from taking dependencies on it. One of the best ways to address this is to pay for the data. Once money changes hands, SLAs (Service Level Agreements) can be put in place, and if there is an error in the data, or it is provided late, then there is a route of recourse and the developer can chase the data provider for recompense. This shared risk and charging model allows the ecosystem to grow. It encourages new data providers to enter the market and allows developers to more confidently base their family’s future on the data those providers supply.
This charging model also has the opportunity to disrupt existing market places as it provides additional revenue and a new way for businesses in existing ecosystems to generate revenue. We considered the energy market place in Europe. Drawing the value chain for this market place we see the following:
Power generation companies sell their energy on, via power transmission companies, to a Distribution System Operator (DSO). The DSO in turn supplies the end user via the retail companies we all subscribe to.
For the purposes of our traffic prediction application it is the DSO which possesses all of the real time information we need. The DSO needs to load balance its network to ensure that the right customers get the right amount of power at the right time. To do this the DSO has real-time, live information about power consumption. However the DSO never gets to communicate this information to the end user.
Getting the DSO to release this information, even via the statistical analysis method described above, would not be trivial. There are costs associated with it. To deal with this our team proposed the creation of a data broker. The broker could amortise the costs of the technology across a number of data sources. It would provide the marketplace for data services from a range of different industries and it would provide the data providers with additional revenue.
Adding this additional revenue stream into the originally presented value chain we get this: A situation where the DSO is generating income from two different sources. Of course similar additional revenue streams can be obtained by all of the energy companies in the original value chain. This could change the relationships between each of the companies and disrupt the current status quo.
As the broker expands into new industries it will create a market place of data producers which compete for data consumers. This competition should help ensure a low enough price point for the data. Keeping a low price point is important in order to encourage entrepreneurs to really become involved in the market place.
The key to obtaining this future, the future on the beach, is allowing companies or individuals to come up with ideas, test them, fail quickly, or succeed in a big way. In order to do this we need to create a working ecosystem in which they can experiment. This will provide society with insights and benefits beyond what I or anyone else can outline, and will really make the vision of the boy on the beach a possibility.
The real beauty of Twitter is the ability to discover cool people, with shared interests and likes. It's like finding that cool group of people in a crowded party. Meeting someone new and really getting to know an area they are interested in and sharing their passion.
Now a music application which lets you do that, that would be cool. It is what Twitter's new music application could have been.
Instead we've an application which provides a bland pastiche version of a generic radio station - one that plays the over-hyped, almost factory-farmed tunes. It's like being back at the party but not being able to talk to anyone because of the booming music played by the host.
Now if only there was a cool way to share great new music with those cool guys in the corner ... maybe SoundCloud has the answer?