Big Data – plenty of it, but still missing

Spring 2010. In a small Croatian town, an unusual meeting is under way in a factory we will call PPP. A meeting room on the first floor of the administrative building, right next to the gray production halls, hosts a group of about ten people. They are caught up in a fiery debate about the presentation projected on the wall. Among the people in white coats, most likely production and development engineers, the discussion is clearly driven by the two most active members, who often argue for conflicting views. A few people are dressed in suits; one would guess they are consultants and members of the company's management team. Some of them are quiet, just listening and nodding from time to time to show that they are following the debate. Two people from the group in suits ask a lot of questions. Other participants are very active as well: they draw on the board and answer questions with lively gestures. They are mostly members of the academic community taking part in one of the EU "cross-border cooperation" projects, which is actually the reason for this colorful meeting.

[Figure: Data Explosion]

Let’s add sensors
To improve the efficiency of the production process and the quality of its products, PPP started equipping certain phases of production with additional sensors, PLCs and SCADA elements. By increasing the number of sensors per production machine from 12 to 35, PPP joined the numerous initiatives around the world that contribute to the enormous global growth of machine-generated data, the kind we like to call Big Data. At one point, a temperamental professor with a French beard takes the stage. He passionately explains to the group the latest results from mining the new data sets produced by the additional sensors. However clear the colorful graphs are, and even though the insights go well beyond previous findings, it is hard not to notice the indifference on the faces of the other participants. Something is missing!

Data model or Swiss cheese?
The whole initiative should deliver insights that are, if not revolutionary, then at least usable. "We need to close the circle!" Suddenly, all eyes turn to the consultant who has been silent so far. "We need to close the information circle. You have all the parameters of the machine, but you really should start from the goals. You have to ensure traceability and link the quality of the products with the different stages of the production process and their parameters. Otherwise the new parameters won't tell you much." It is difficult to attach identification tags to the hot metal castings produced by the machines at PPP, so the data that was supposed to link the achieved quality and the level of waste to the 35 newly established parameters was simply missing.
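To make the consultant's point concrete, here is a minimal Python sketch of what "closing the information circle" could look like. It assumes, hypothetically, that every casting carries an identifier recorded both with its sensor readings and with its quality inspection result; the IDs, the parameter names `mould_temp` and `pressure`, and all values are made up for illustration and are not PPP's data. Once the two sources share that key, each parameter can be compared between good and scrapped castings. Without the key, the join is empty and the parameters say nothing about quality.

```python
from statistics import mean

# Hypothetical sensor data: parameter values recorded per casting,
# keyed by an assumed traceability ID (e.g. "C001").
sensor_readings = {
    "C001": {"mould_temp": 410.0, "pressure": 88.0},
    "C002": {"mould_temp": 430.0, "pressure": 87.0},
    "C003": {"mould_temp": 412.0, "pressure": 89.0},
    "C004": {"mould_temp": 428.0, "pressure": 88.0},
}

# Hypothetical quality inspection results for the same castings.
quality_results = {"C001": "ok", "C002": "scrap", "C003": "ok", "C004": "scrap"}


def link(readings, outcomes):
    """Join sensor readings with quality outcomes on the casting ID.

    Castings without a recorded outcome are dropped, because nothing
    can be learned about their quality.
    """
    return [
        (casting_id, params, outcomes[casting_id])
        for casting_id, params in readings.items()
        if casting_id in outcomes
    ]


def scrap_vs_ok_difference(linked, parameter):
    """Return mean(parameter | scrap) - mean(parameter | ok).

    Returns None if either group is empty, i.e. the circle is not closed.
    """
    ok = [params[parameter] for _, params, outcome in linked if outcome == "ok"]
    scrap = [params[parameter] for _, params, outcome in linked if outcome == "scrap"]
    if not ok or not scrap:
        return None
    return mean(scrap) - mean(ok)


if __name__ == "__main__":
    linked = link(sensor_readings, quality_results)
    for parameter in ("mould_temp", "pressure"):
        print(parameter, scrap_vs_ok_difference(linked, parameter))

    # Without traceability there are no outcomes to join against.
    print("no traceability:", scrap_vs_ok_difference(link(sensor_readings, {}), "mould_temp"))
```

The difference of means is deliberately crude: it points at parameters worth investigating but says nothing about cause, which would need a proper statistical analysis. The essential step is the shared key; every analysis built on top depends on it.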

Big Data: new methods, old constraints
At first glance, Big Data might look like a cure for everything: "now that we have so much information available, all we need is to develop mathematical algorithms and we will find all the answers." But the truth is exactly the opposite. Today we have plenty of mathematical algorithms, from those that recognize your face, the tone of your voice or your fingerprint to those that understand the context of human speech. What lags behind is the way we traditionally collect data (processes), which is not aligned with the technological capabilities for finding patterns in data and filtering it through massive parallel processing (technology). More specifically, Big Data technologies will surely find patterns in large amounts of data, but those patterns will not always answer your problem or give you new, relevant insights. In the same way, the data mining at PPP provided insight into machine behavior, such as stability patterns of certain parameters during production cycles, including some insightful deviations. But it offered no answers about how those deviations and patterns affected the only thing that really mattered: the quality of the product. The answers must be contained somewhere within the data set we explore. They have to be part of the meta model of the entity we analyze, or we must be able to deduce them from attributes of other entities that are similar enough to the one we study (in PPP's case, for example, data on product quality from hundreds of similar or identical machines worldwide).

You can read more in the May 2013 issue of Mreža magazine (in Croatian only) or, later this year, in an English translation at Alen's Thing Place.

This work is copyright of Alen Gojceta. You may not use this article, or any part of it, in commercial or academic work without citing the author and this link.
