Big data at work
Thursday, 16th August 2012
It's hard to look at an information industry publication or blog these days without running across the term "Big Data." Sarah Hinton provides a practical look at what this often vague term means to researchers and information professionals. This is a short version of a longer article on the same topic available as part of the FreePint Subscription. The longer article provides specific examples of the impact of Big Data on research projects, suggestions for incorporating Big Data principles into many types of organisations and suggestions for staying informed about this topic. Subscribers can log in to view it now.
Big Data is a short phrase that describes a vast mass of stuff which, in many cases, hasn't yet become information: unedited, unsorted and raw. Big data intrigues me in a way that "information overload" never did. Perhaps this is because dealing with information overload was, and still is, central to my work, whereas big data hadn't seemed quite so relevant – until this year.
Data collection devices that enable personalised medicine and improved patient surveillance represent a huge market that is moving fast towards “pervasive adoption”, and these devices are beginning to gain regulatory approval too. In some cases they are no more than another smartphone app, but all have one thing in common: they are designed to collect huge amounts of data.
Along with the excitement at the potential of these apps and big data gadgets come many concerns, such as data security, privacy versus data sharing, effective use of the data and the need to be selective. I've gathered a few particularly interesting examples of how big data is being used today, as well as some of the areas of concern that have arisen and will need to be addressed:
- These concerns are very apparent in the energy sector with the growth of smart meters and connected appliances, which are supposed to lead to the "intelligent home" and, on an even bigger scale, the "smart city". There is much discussion around customer privacy. Utility companies will be collecting a huge amount of data on their customers, monitoring their energy usage in real time. This also raises the question of how to process, store and archive all this data.
- Anti-virus security firms are beginning to look at using big data analysis to determine whether traces of activity on a customer's system could have come from an intruder.
- In retailing, one of the early adopters of big data for customer analysis, there is debate about whether collecting even more customer data and then mining it to identify trends and behaviours really will provide better information on how we all shop or what we want to buy. Some customer analytics experts question whether more is always better.
- What about the corporate control of data? Scientists are raising the question of research data being made available. Scientific researchers are used to making the underlying data from their research openly available, but they fear that the vast amount of information being gathered by researchers at companies such as Microsoft and Facebook will be held back instead. The scientific community is concerned that this will prevent proper validation: How can you assess the quality and accuracy of someone's research if you cannot access their research data?
The complexity of big data projects is well illustrated by the ambitious plans to scan every book, a vision first driven by Google and now being taken up by Harvard University via its Digital Public Library of America (DPLA) project. Technology is not the main issue here. These projects are caught up in the “tangle of legal, commercial and political issues” that surrounds the publishing world and likely the larger universe of big data as well.