So, back to the premise: there is more data than we can get through to figure out exactly what is causing El Niño. The video above gives a good overview of what "Big Data" is (or at least of its inherent problems). Basically, the typical problem with big data is that relevant information is usually destroyed before it becomes of any use. Because deriving meaning from the data is so difficult, it gets disposed of to make room for newer data, which will in turn be disposed of for lack of processing ability.
A few weeks ago, I created a list of data sets that might be attainable for researching El Niño's causes. Even if I were to obtain that information, there is a very real risk of the data just dropping through without being caught in whatever fish net (a.k.a. analysis) we're using to sort the information.
So, the temptation with Big Data is to store the analysis and dump the data. For example, let's say we are using a storage unit. Every time we fill the storage unit, we create an inventory list of everything in the unit. However, when we empty the storage unit, we destroy that inventory list. All we keep is a record that the storage unit was filled and did contain something. Then the storage unit is refilled and a new inventory list is created. We have no way of knowing whether what was in the storage unit the first time was in any way related to what was in it the second time, because we no longer have that first inventory list.
In a way this makes a lot of sense, because there seems to be a point beyond which more data stops being useful. For instance, I was taking a marketing class in which we used SPSS to analyze a set of numbers with a lot of different variables. There were so many variables that our measure of model fit (R^2) just kept improving as we added them, even though the model was NOT actually getting any better at explaining the variable we cared about. There was simply no use for most of the 20 data sets we had access to, because we could do better modeling with three of them.
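To make that R^2 behavior concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the data is randomly generated rather than the class data, only the first three of the 20 made-up predictors actually drive the outcome, and the helper names (`solve`, `r_squared`, `demo`) are mine. It also reports adjusted R^2, a common correction that penalizes extra predictors; that is my addition, not something we necessarily used in class.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def r_squared(X, y):
    """Least-squares fit of y on X (plus intercept); returns (R^2, adjusted R^2)."""
    n, k = len(y), len(X[0])
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = k + 1
    # Normal equations: (X'X) beta = X'y
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

def demo(seed=0, n=60, total=20):
    """Hypothetical data: only the first 3 of `total` predictors matter."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(total)] for _ in range(n)]
    y = [2 * r[0] - r[1] + 0.5 * r[2] + rng.gauss(0, 1) for r in X]
    return [(k, *r_squared([r[:k] for r in X], y)) for k in (3, 10, 20)]

for k, r2, adj in demo():
    print(f"{k:2d} predictors: R^2 = {r2:.3f}, adjusted R^2 = {adj:.3f}")
```

Because each larger model contains the smaller one, plain R^2 can never go down as predictors are added, even when the extra predictors are pure noise. Adjusted R^2 is always at or below plain R^2, and the gap widens as the number of predictors grows relative to the number of observations.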
However, this becomes similar to taking the derivative in calculus. Yes, we get new useful data, but we can no longer see the big picture. If we keep analyzing the analysis, we go from a complex series of curves to a straight line.
WyzAnt Tutoring graphic showing an example of the position, velocity and acceleration of a particle
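The derivative comparison can be sketched in a few lines of Python. The position polynomial below is made up for illustration (it is not taken from the graphic), and `derivative` is a hypothetical helper that works on a list of polynomial coefficients.

```python
def derivative(coeffs):
    """Differentiate c0 + c1*t + c2*t**2 + ... given as [c0, c1, c2, ...]."""
    return [i * c for i, c in enumerate(coeffs)][1:]

# Hypothetical position of a particle: x(t) = 5 + 2t - 4t^2 + t^3
position = [5, 2, -4, 1]
velocity = derivative(position)       # 2 - 8t + 3t^2 (a parabola)
acceleration = derivative(velocity)   # -8 + 6t (a straight line)
jerk = derivative(acceleration)       # 6 (a constant)

for name, c in [("position", position), ("velocity", velocity),
                ("acceleration", acceleration), ("jerk", jerk)]:
    print(f"{name:12s} coefficients: {c}")
```

Each step drops a coefficient for good: the velocity no longer knows the particle started at 5, and the acceleration no longer knows the initial speed was 2. That is the same trade as analyzing the analysis: every pass is simpler, but the earlier detail cannot be recovered from it.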