
Creating a Big Impact with Big Data

Photo: Siddharth Kankaria / Research Matters


With the proliferation of technologies and their applications in our day-to-day lives, we have been able to overcome the constraints of time and space. Today, one can instantly talk to loved ones thousands of miles away or take the next flight to meet them in person, all in a jiffy! With so many people using the Internet and wireless communication, we can safely claim that humankind is more connected than ever, which has brought about significant sociocultural change.

Technology has also changed the way we understand other cultures, meet different people, and communicate and maintain relationships with others. To celebrate this digital revolution, which is transforming us into an advanced networked society, 17 May is observed every year as ‘World Telecommunication and Information Society Day’.

This day has an interesting history. On 17 May 1969, the United Nations (UN) commemorated the founding of the International Telecommunication Union (ITU), a specialized agency of the UN responsible for Information and Communication Technologies (ICT), as ‘World Telecommunication Day’. Of course, there was no Internet back then! In 2006, with the advent of so many new technologies, the UN General Assembly declared 17 May ‘World Information Society Day’ to raise awareness of the impact of ICT on society. Later that same year, realizing the overlap in the interests and visions of the two events, it was decided to combine them into a single grand event: the ‘World Telecommunication and Information Society Day’.

As with every special event, ‘World Telecommunication and Information Society Day’ has a theme each year, and this year, quite aptly, it is ‘Big Data for Big Impact’. Amid all the buzz around Big Data, this year’s theme focuses on harnessing its power for social good.

Big thanks to Big Data

Analysts and engineers believe Big Data has enormous potential to improve our society, much like electricity or antibiotics did! A simple definition of this buzzword helps show why. Big Data refers to extremely large data sets that can be analyzed by computers to reveal patterns, trends and associations, especially those relating to human behavior and interactions. It can give us a more accurate and timelier understanding of how our society works, so that we can base our decisions on facts rather than uncertain assumptions.

Here is a bizarre but relevant example. “Have you ever figured how information-rich your stool is?” asks Prof. Larry Smarr, a pioneering astrophysicist turned computer scientist, who is working on a project to explore his ‘quantified self’ by continuously collecting data about every function of his body. “There are about 100 billion bacteria per gram. Each bacterium has DNA whose length is typically one to 10 megabases—call it 1 million bytes of information. This means human stool has a data capacity of 100,000 terabytes of information stored per gram. That’s many orders of magnitude more information density than, say, in a chip in your smartphone or your personal computer. So your stool is far more interesting than a computer”, he adds. Smarr collects this data at a granularity of minutes and analyzes it with Big Data tools, in the hope of shaping the future of health care.
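Smarr’s arithmetic is easy to check. The short Python sketch below simply multiplies his round numbers; it adopts his own rounding of each bacterial genome to about 1 million bytes and assumes decimal terabytes (10^12 bytes). The variable names are illustrative only.

```python
# Back-of-the-envelope check of the stool data-density figure quoted above.
# The inputs are the quote's round numbers (estimates), not measurements.
BACTERIA_PER_GRAM = 100e9   # "about 100 billion bacteria per gram"
BYTES_PER_GENOME = 1e6      # 1-10 megabases, rounded in the quote to 1 million bytes
BYTES_PER_TERABYTE = 1e12   # assumption: decimal (SI) terabyte

bytes_per_gram = BACTERIA_PER_GRAM * BYTES_PER_GENOME
terabytes_per_gram = bytes_per_gram / BYTES_PER_TERABYTE

print(f"{bytes_per_gram:.0e} bytes per gram = {terabytes_per_gram:,.0f} TB per gram")
```

Multiplying 10^11 bacteria by 10^6 bytes each gives 10^17 bytes, which is indeed 100,000 terabytes per gram.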

Like Prof. Smarr, several computer scientists have started collaborating with medical researchers to harness the power of Big Data to save lives. For instance, Mount Sinai Medical Center in the USA is using Big Data technology to analyze the whole-genome sequences of bacteria to help develop antibiotics, work that could otherwise have taken far longer.

Big Data applications can be seen as the building blocks of ‘smart’ cities, helping to manage water resources, reduce traffic jams and improve public safety. The Miami-Dade County Parks department expects to save $1 million this year by reducing water waste, using the power of Big Data to identify leaky, corroded water pipes for repair. This is an example of an innovative trend: solving old problems with modern technologies, with the added bonus of financial savings for the community.

The World Wide Fund for Nature (WWF) is harnessing the power of Big Data to protect endangered wildlife, especially tigers. Satellite images used for aerial surveillance of protected forests are combined with data from animal-tracking collars and other sensors to provide real-time insights into the tigers’ health and to protect them from threats, including humans. In India alone, tiger numbers have increased by 30% over the last four years since these tactics were adopted.

Big Data and the big problems

“With great power comes great responsibility”, says Uncle Ben, a character from Spider-Man. This holds true for Big Data too, which comes with some possible negative impacts. However much we claim to have nothing to hide, we all have something we would rather keep private, whether it is an illness, sexual desires or personality traits. The constant fear of ‘someone’ watching all your web activity or keeping tabs on what you buy at the supermarket is unnerving for most of us: such data violates our sense of privacy and can be used as a tool for surveillance.

For instance, with 29 million streaming customers, Netflix is one of the largest providers of commercial media in the world. It undoubtedly holds a treasure trove of data valuable to advertisers: what type of content each user watches; when, where and on what device they watch; how often they rewind, fast-forward or pause a stream; and how likely they are to stop watching entirely. If this doesn’t scare you, imagine a Netflix representative sitting behind your couch, observing your behavior while you enjoy a romantic movie with your partner!

The same holds true for Google, Amazon, Facebook and YouTube, the world’s leading online services. The advertisements and recommendations these services provide are based on profiling people’s past activities to predict their future interests. In this way, Big Data may be eroding not only our privacy but also our ability to make our own decisions. Mass surveillance initiatives by intelligence agencies (e.g. the NSA and GCHQ) take this power to the next level, intruding on whatever personal space we have left in the physical world. Without Big Data, profiling at today’s scale would have been impossible.
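To make the idea of profiling concrete, here is a toy Python sketch of one simple recommendation technique, item co-occurrence counting: titles that often appear together in other people’s viewing histories are suggested to a user who has watched one of them. The histories and function names are hypothetical, and this is not a description of how any of the companies named above actually build their recommendations.

```python
from collections import Counter
from itertools import permutations

# Hypothetical viewing histories: user -> set of titles watched.
HISTORIES = {
    "asha": {"Drama A", "Thriller B", "Comedy C"},
    "ben": {"Drama A", "Thriller B"},
    "chen": {"Thriller B", "Documentary D"},
    "dev": {"Drama A", "Comedy C"},
}


def co_occurrence(histories):
    """Count how often each ordered pair of titles appears in the same user's history."""
    counts = Counter()
    for titles in histories.values():
        for a, b in permutations(titles, 2):
            counts[(a, b)] += 1
    return counts


def recommend(user, histories, top_n=2):
    """Score unseen titles by how often they co-occur with titles the user has watched."""
    seen = histories[user]
    scores = Counter()
    for (a, b), n in co_occurrence(histories).items():
        if a in seen and b not in seen:
            scores[b] += n
    return [title for title, _ in scores.most_common(top_n)]


print(recommend("ben", HISTORIES))
```

For ben, ‘Comedy C’ ranks first because other users watched it alongside both titles he has seen: the system predicts his interests purely from the past behavior of people who resemble him.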

Big data creates big biases

Humans are complex creatures whose behaviour, thoughts and actions are dynamic and sometimes random or irrational, which makes them hard to model. Big Data, however, is tackling this challenge head-on and is now being used to screen people by specific social traits. Hiring employees based on their social media activity, deciding insurance cover using fitness-tracker data, tightening airport security checks, and predicting future crimes from mobile phone call logs are a few examples.

Though many claim that these applications are for the social good, there are two fundamental problems with applying Big Data to the social sector. The first concerns the biases that creep into human decisions about which social traits to select for. When decisions made by millions of employers, police officers or judges over a long period are collected together, all of their biases are carried over and reproduced at a much larger scale. As machine learning algorithms learn from this data, biases related to gender and race can cloud their decisions, leading to a more divided society.
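How historical bias carries over into an automated system can be shown with a deliberately oversimplified Python sketch. The hiring records below are made up: equally qualified candidates from group ‘B’ were hired less often in the past than those from group ‘A’. A naive ‘model’ that simply learns past hire rates per group turns that disparity into its decision rule. Real machine learning systems are far more complex, but they can pick up the same pattern from such data; the threshold and names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical historical records: (group, qualified, hired).
# All candidates are qualified, yet group "B" was hired less often in the past.
HISTORY = (
    [("A", True, True)] * 80 + [("A", True, False)] * 20
    + [("B", True, True)] * 50 + [("B", True, False)] * 50
)


def learn_hire_rates(records):
    """'Train' a naive model: the past hire rate of qualified candidates in each group."""
    hired = defaultdict(int)
    total = defaultdict(int)
    for group, qualified, was_hired in records:
        if qualified:
            total[group] += 1
            hired[group] += int(was_hired)
    return {group: hired[group] / total[group] for group in total}


def predict(model, group, threshold=0.6):
    """Recommend a qualified candidate only if their group's past hire rate clears the threshold."""
    return model[group] >= threshold


model = learn_hire_rates(HISTORY)
print(model)
print(predict(model, "A"), predict(model, "B"))
```

With the assumed threshold of 0.6, the model recommends qualified candidates from group ‘A’ (past rate 0.8) but rejects equally qualified candidates from group ‘B’ (past rate 0.5): the old bias has become an automated rule.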

The second problem lies in the accuracy and error rates of algorithms that run on Big Data. Suppose that 1 in every 2 million travellers is in a database of pictures of potential terrorists, and that the facial recognition software used at airports to scan travellers fails to recognize 15% of the people who are in the database and wrongly flags 0.2% of those who are not. Going by these numbers, what do you think is the chance that a flagged person is actually in the terrorist database? The answer is only about 0.02%, or roughly 1 in 4,700! In other words, for every traveller correctly flagged, about 4,700 innocent travellers are falsely flagged as potential terrorists, though they have nothing to do with terrorism.
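The airport figures can be worked through with Bayes’ theorem. The Python sketch below reads the 15% error rate as the share of listed travellers the software misses and the 0.2% as the share of everyone else it wrongly flags; those readings are our assumption about the example, and the variable names are illustrative.

```python
# Base-rate calculation for the hypothetical airport example, using Bayes' theorem.
PREVALENCE = 1 / 2_000_000  # 1 in 2 million travellers is in the database
MISS_RATE = 0.15            # assumption: 15% of listed travellers are not recognized
FALSE_ALARM_RATE = 0.002    # assumption: 0.2% of unlisted travellers are wrongly flagged

# Joint probabilities of being flagged and being (or not being) in the database.
p_flagged_and_listed = PREVALENCE * (1 - MISS_RATE)
p_flagged_and_not_listed = (1 - PREVALENCE) * FALSE_ALARM_RATE

# Bayes' theorem: P(listed | flagged) = P(flagged and listed) / P(flagged).
p_listed_given_flagged = p_flagged_and_listed / (
    p_flagged_and_listed + p_flagged_and_not_listed
)
false_alarms_per_true_hit = p_flagged_and_not_listed / p_flagged_and_listed

print(f"P(in database | flagged) = {p_listed_given_flagged:.4%}")
print(f"about {false_alarms_per_true_hit:,.0f} false alarms per genuine match")
```

The result is dominated by the tiny base rate: because listed travellers are so rare, even a 0.2% false-alarm rate produces thousands of false matches for every genuine one.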

These are not unrealistic examples but the ground reality in today’s world, and many international airports face such problems every day. A famous online service company was recently criticised for tagging people with dark skin as gorillas, because the data behind its facial recognition software did not contain enough images of such people for it to recognize them correctly. While no model can be 100% accurate, deploying error-prone models could soon reverse the perceived benefits to society.

Big data -- to celebrate or to criticize?

Big Data is undoubtedly a powerful tool with the potential to shape every part of our society, from health care and education to urban planning and protecting the environment. But like every other scientific innovation, it has a dark side. Striking a fine balance between the two sides of the argument is what we need to strive for. Putting the right checks and balances in place, covering data quality, data sources and access control, and anonymizing data to protect individual privacy, holds the key to realizing the full potential of this disruptive technology and creating a big impact for society at large.