Big data is just two words, and yet these two words are driving hysteria throughout the known world at a level not seen since Y2K. We supposedly create 2.5 quintillion bytes of data every day now, and at this rate we can conclude that 90% of the data in the world today has been created in the last two years alone. This does sound apocalyptic, as it suggests that we could drown in data long before global warming floods our coastal cities into oblivion. Are we being rational? I think not, and this calls for an explanation.
For the typical CTO, the term “big data” implies the need for a strategy to deal with large quantities of data. The term is also used to describe the new platform of tools on offer to tame the beasts of big data, such as Apache Hadoop and, well, let’s face it, just about every product that wants to be taken seriously by the market at large. To be fair, big data as a loosely defined term describing data sets so large and complex that they become awkward to work with using current technologies is a valid concern. Tasks such as capture, storage, search, sharing, analysis, and visualization become insurmountable. What I find disturbing is the lack of dialogue about the processes that generate this much data, and the absence of any effort to apply standards of value to determine what data is worth keeping and what can be let go.
According to a June 2011 Economist Intelligence Unit survey of 586 executives, one percent of respondents reported no increase in the amount of data they collected over the previous year. The typical assumption I am confronted with is that the 99% who do have data growth need to be enabled through technology. Perhaps we should look at the 1% who don’t have data growth and see if we can learn something from them. Just maybe the 99%, with their posters, placards and shanty towns, are missing something.
Crops vs. Weeds
As a country boy, I grew up around a lot of people who concerned themselves with weeds among their crops. The weeds would crowd out the crops, consume valuable water and in general reduce the productivity of any farm so it was no surprise that farmers spent a large amount of effort to remove weeds from their fields. I remember that they did not spend their time listening to suppliers talk to them about how they could cultivate their weeds and otherwise promote weeds as a “hidden resource” deserving the latest planting, harvesting and storage technology. Just think of all the agricultural technology opportunities that were missed by not exploring how to virtualize their tractors and thin provision their storage silos. Wow, what simpletons! They just deleted the weeds and enjoyed the increased profit from greater production of the cash crops. The operative term here is “cash crop”. They all had one.
Alas, I grew up, went to college, flew jets in the Navy and … well, it’s a long story, but here I am looking at the corporate farms from an enterprise data management perspective, and somewhere along the way we lost the ability to tell the weeds from the crops. How did it happen? Maybe we ran out of crops and weeds are all there is. Maybe we are making lemonade from lemons, or maybe we just got carried away. Two and a half quintillion bytes of data is a lot of corn. Back in the “old days”, this would have indicated that someone was making a lot of tortillas. This is where I have to hoist the proverbial BS flag. There aren’t that many tortillas on the counters of our economy, so I can only conclude one thing: we like eating weeds. Wow! I hadn’t realized how effective Madison Avenue was, and lest you think I’m kidding, I can only say that we must like weeds. Why else would we spend so much time cultivating them? Imagine the farmers of my boyhood harvesting the corn and the weeds and storing them in the silos just in case. The result would have been a massive loss of corn through mildew and other forms of cash crop degradation. Remember, they had a cash crop! What they did was buy technology that helped them harvest the corn, separate out the weeds and even the corn stalks along with the husks, and strip the kernels from the cobs so that only corn kernels were stored in the silos. These were high-value corn kernels that could be stored for long periods of time and used for various products (including tortillas).
All humor aside, if your business model involves large data sets, then you need to have ways to deal with them just like large farms needed mechanization to help harvest large quantities of corn, or cotton or wheat. The difference is that the farmers first considered what crop they were growing before they looked at equipment. Today, we just lump all our data problems into a great big vat of “BIG DATA” and every vendor throws their logo on it and says “me too, me too”.
Suck it up folks! If you have large data sets then let’s talk. There are lots of ways to deal with large data sets but “BIG DATA?” Leave that to the folks who make kids’ board games for entertainment. Real men and women look over their farms, the market and costs and decide on a productive crop and equip themselves appropriately. When you do that you just might find that your data is not as big as you think it is. Otherwise the cost of your data is going to exceed the revenue of your product.