I read a fascinating blog post this morning that offers a glimpse at some progressive uses of the public data from social networks. The individuals and companies that seek to capture this information and analyze it for insights into behavior and offline social networks are a part of the Big Data movement, which is enabled by using high-powered computer farms to “crawl” public websites and index the connections, interactions, and inputs from consumers.
Unsurprisingly, storing and using data without the explicit permission of the people who created it has some people up in arms about privacy violations and the potential for misuse. The company profiled in the blog post, 80Legs, insists the data it sells is for research and informational purposes, but that hasn’t stopped Facebook and Twitter from blocking the company's crawlers, or LinkedIn from threatening to do the same. For $350 a month, 80Legs sells the information and browsing patterns of tens of millions of users, enabling high-level statistical analysis of trends and behaviors that can produce fascinating, and lucrative, insights about the way people use certain websites.
While access to a wealth of data like that provided by 80Legs sounds like it could lead to significant conclusions about the networks people form and the relative value they place on different forms of content, Harvard Fellow Danah Boyd cautions that insights derived from Big Data searches rest on assumptions at best. She says that even when relationships seem strong online or behaviors seem habitual, the missing context could be obscuring the true nature of the interactions.
As a researcher focused on social networks and how they affect communication, my initial reaction is excitement at the possibility of gaining greater understanding from access to Big Data projects. But as a concerned netizen, I also worry about the potential negative aspects. The Big Data movement could yield incredible business insights for companies if it is regulated and carried out in a transparent and safe manner. But data crawlers like 80Legs also expose the possibility, and near certainty, that nefarious individuals will use the same data indexing techniques to collect personal information in order to steal identities or worse.
The use of large-scale information from online interactions will remain an important topic as the time people spend online and the power of crawling technology continue to increase. The debate on the correct uses and gathering techniques for public social network data is far from over, but it is clear that Big Data will have a profound impact on the understanding of online behavior and the operations of researchers and businesspeople alike.




