McKinsey & Company released a report last year titled Big Data: The Next Frontier for Innovation, Competition, and Productivity. One of the conclusions in this often-quoted analysis is that government has the potential to see a larger benefit from the use of Big Data than any other economic sector except finance and insurance.
In part 1a of this blog, I’ll address, at a very high level, what makes Big Data different from the data of ten years ago and why Big Data solutions unlock new approaches to turning data into information and information into knowledge that can be acted upon.
In part 1b, I’ll offer practical examples of how government can leverage the Big Data techniques being aggressively exploited in the commercial world.
In Part Two, we’ll take a look at the technology stack for Big Data, frequently referred to as the SMAQ stack: Storage, MapReduce, and Query. I will also answer the question that you’ve wanted to ask: Why is Pig Latin the key to the SMAQ stack?
In Part Three, we’ll cover the array of Big Data products and services that DLT Solutions offers in partnership with Oracle, Red Hat, NetApp, Quest Software, Google, Informatica, Quantum, Solera Networks, and Amazon. Big Data GPS products from our vendor partner TomTom will be covered in Part 1.
What is Big Data and how is it different from my data?
Although there is no ‘industry standard’ definition for Big Data, the term generally refers to data with one or more of the following characteristics, which set it apart from the data you typically use:
Volume – multiple terabytes or petabytes.
Velocity – streaming input from ‘always on’ sensors, continuous video feeds, and social media sources like Twitter or Facebook.
Variety – multiple file types, unstructured text in a wide range of formats from a range of sources, structured data from databases and spreadsheets mixed with unstructured data.
With the exception of government agencies involved in intelligence collection or major science projects, few organizations have had to solve Big Data problems. However, Google and Yahoo have approached the problem from non-traditional perspectives and developed parallel processing techniques that can be employed on computing grids locally or in the Cloud. Open source software for these techniques has been formalized and brought to market by companies such as Cloudera and Hortonworks. Additionally, in the past 2-3 years, many major software brands have specialized in addressing one or more aspects of Big Data solutions. Selecting components for your Big Data solution, however, will involve several considerations that you would not normally face when selecting a specific relational database management system (RDBMS). More about that in part two.
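To make the parallel processing idea concrete, here is a minimal sketch of the MapReduce programming model in plain Python. The documents and the word-count task are invented for illustration; in a real deployment a framework such as Hadoop would distribute the map calls across a grid of machines rather than run them in one process.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: each document is processed independently,
    # emitting a (word, 1) pair for every word it contains.
    return [(word.lower(), 1) for word in document.split()]

def reduce_phase(pairs):
    # Shuffle/reduce: group the emitted pairs by key
    # and sum the counts for each word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = [
    "big data big ideas",
    "data drives decisions",
]

# Because each map_phase call touches only its own document,
# the calls can run in parallel before the single reduce step.
mapped = chain.from_iterable(map_phase(d) for d in documents)
print(reduce_phase(mapped))
```

The key design point is that the map step has no shared state, which is what lets these techniques scale across thousands of commodity machines.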
What is the value of Big Data?
Big Data solutions are all about identifying valuable information in large datasets of varying quality. Big Data techniques do well at detecting information in low signal-to-noise ratio environments. One analyst’s noise may be another analyst’s gold nugget. For example, if an agency retains only the (unstructured) news articles about itself for internal dissemination, it may miss learning of relevant, innovative solutions developed by a state agency or foreign government agency working on a closely related problem. Since the information sought by any one analyst today may be very different from that sought by another analyst in a month’s time, inexpensive Big Data storage solutions enable the enterprise to keep all data rather than only the subset initially perceived to have value.
And it’s not just about storing and searching large amounts of static data. Mining streaming social media for real-time information on disease outbreaks, for example, can provide critical public health data that may save lives.
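As a rough illustration of the streaming idea, the Python sketch below watches a feed of posts for outbreak-related keywords and raises an alert when mentions cross a threshold. The watch list, threshold, and sample posts are all hypothetical; a production system would consume a real social media stream and use far more sophisticated detection.

```python
from collections import Counter

# Hypothetical watch list of outbreak-related terms.
OUTBREAK_TERMS = {"flu", "fever", "outbreak"}

def scan_stream(posts, terms=OUTBREAK_TERMS, threshold=3):
    """Yield an alert the moment a watched term's cumulative
    mention count reaches the threshold."""
    hits = Counter()
    for post in posts:
        for word in post.lower().split():
            token = word.strip(".,!?#")  # drop simple punctuation/hashtags
            if token in terms:
                hits[token] += 1
                if hits[token] == threshold:
                    yield f"ALERT: '{token}' mentioned {threshold} times"

sample = [
    "Feeling awful, might be the flu",
    "Flu season again #flu",
    "Half the office is out with flu symptoms",
]
print(list(scan_stream(sample)))
```

Because the detector processes one post at a time and keeps only running counts, it can operate on an unbounded, ‘always on’ stream rather than a stored dataset.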
Now that I’ve summarized the basic Big Data concepts, part 1b will take a close look at some ways that Big Data might be exploited in the public sector.