What you need to know about Big Data

People and machines generate unimaginable amounts of data – currently 16.3 zettabytes per year, a number that will grow tenfold by 2025. “Big data” is an extremely valuable resource that is changing the way we communicate: We divulge ever more information about ourselves. There are advantages and disadvantages to this trend.

During your lunch break, you pull out your smartphone and check various apps to see what you missed. You order something from an online shop on the spur of the moment, because you just received a push notification telling you that the price of the ski helmet you’re interested in has been sharply reduced. What looks like a coincidence is actually the product of advanced math, analytical skills, and targeted marketing, in other words, applied big data.

Around the globe, people and machines produce vast amounts of data around the clock – in Internet browsers and social networks, while driving, when paying by credit card, during online shopping, while making calls on their smartphones, and much more.

At the same time, cameras monitor our cities, smart metering systems measure power consumption, stock market transactions are carried out on computers, medical devices record health data, and connected cars are linked to the Internet.

What is big data?

Each of the approximately 8 billion people on earth would need 44 brains to store all of their own personal data. (Source: KPMG, PM-MAGAZIN)

The term ‘big data’ stands for this mass of data, also referred to appropriately as mass data. It describes volumes of data that are too large, too complex, too short-lived and also too unstructured to be processed with a regular computer.

Collecting the data is only the first step, however. It is then analyzed to gain insights, and that’s the actual objective: to derive benefit from the information. Big data is hence often synonymous with big data analytics, especially in marketing.

Big data in real life

Big data is defined by three main characteristics: volume, diversity, and speed. Growing digitization means a constant rise in the volume of data at an ever-increasing speed. We currently produce and record 16.3 zettabytes of data each year. A zettabyte is one billion terabytes. By 2025, this number will increase to 163 zettabytes, according to a study conducted by IDC and Seagate. Written out in full, that number looks like this: 163,000,000,000,000,000,000,000 bytes.
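The unit arithmetic in this paragraph can be checked in a few lines. The Python sketch below assumes decimal (SI) prefixes, where each step from terabyte to petabyte to exabyte to zettabyte is a factor of 1,000; storage figures are sometimes quoted in binary units instead, which would change the results slightly. The variable names are illustrative only.

```python
# Decimal (SI) prefixes, assumed here: each prefix step is a factor of 1,000.
TERABYTE = 10**12   # bytes
ZETTABYTE = 10**21  # bytes
YOTTABYTE = 10**24  # bytes

current_zb = 16.3   # zettabytes produced per year today (figure quoted above)
forecast_zb = 163   # zettabytes per year forecast for 2025

# One zettabyte expressed in terabytes: 10**21 / 10**12 = 10**9, i.e. one billion.
tb_per_zb = ZETTABYTE // TERABYTE

# The 2025 forecast written out in bytes (exact integer arithmetic).
forecast_bytes = forecast_zb * ZETTABYTE

# Growth factor between today's volume and the 2025 forecast.
growth_factor = forecast_zb / current_zb

# A yottabyte (24 zeros) is a thousand zettabytes.
zb_per_yb = YOTTABYTE // ZETTABYTE

print(f"1 ZB = {tb_per_zb:,} TB")
print(f"{forecast_zb} ZB = {forecast_bytes:,} bytes")
print(f"growth factor: {growth_factor:.1f}x")
print(f"1 YB = {zb_per_yb:,} ZB")
```

Using integers for the byte counts avoids floating-point rounding, so the written-out figure matches the one in the text digit for digit.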

Entire server farms are needed to reliably store this vast amount of data. Energy-efficient semiconductors from Infineon ensure that the costs to supply power to these server farms are significantly reduced.

In the future, the data volume will be measured in yottabytes, a number with 24 zeros. This rapid growth is also related to the increasing diversity of the data. In the beginning, digital data was mostly in the form of numbers and documents. With the invention of the Internet and digital cameras, photo, audio and web data were added. And ever since we’ve had mobile phones, smartphones, YouTube and Netflix, mobile data and information from social media and streaming services have also entered the picture. Now, devices that are part of the Internet of Things, like fitness bands, smart thermostats and connected cars, are additionally feeding this ocean of data.

Another thing that has changed from the early years of the digital age is the speed with which the data can be accessed. First it was pooled, then it was periodically compressed, and now it’s available in real time.

But initially, all this data is merely collected information. It becomes usable only when it’s processed quickly and analyzed correctly. This is where big data offers advantages and disadvantages. But for whom exactly?

The advantages and significance of big data

Big data plays an important role in more and more areas of our daily lives. Scientists use data to examine climate change or the occurrence of earthquakes and epidemics. Government agencies and intelligence services comb through huge data volumes to find anything conspicuous that might reveal terrorists. We collect content data, metadata, transaction data, behavioral data, health data, financial data, measurement results and monitoring data. This data is of interest for the stock market, nuclear physics, regional transport, telecommunications, market research, energy supply, insurance companies, retail chains, the automotive sector, criminology, anti-terrorism, and marketing.

Data from the present is used to make forecasts for the future. This is where big data analytics comes into play, also referred to as data mining.

How big data works and is used in business

At its development center in Berlin, online retailer Amazon is working on machine learning and predictive analytics. Special algorithms evaluate purchases to date as well as social media posts. They identify the customer’s individual style and take into account what will soon be in fashion. This knowledge is then used to send out personalized product recommendations.

Google and the flu epidemic

It’s always about finding correlations in large data volumes where the human mind would be unable to see any connections. Google uses specific search queries to identify the spread of influenza. The underlying idea is that people first start to look for information about the disease once they or their family are affected. An analysis of search queries and disease data actually showed a connection. Google was able to predict the course of the flu epidemic up to two weeks sooner than the health agencies. However, this approach hasn’t worked with the same level of precision every year.
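The core idea, that search activity can run ahead of officially reported cases, amounts to looking for the time lag at which two series correlate best. The Python sketch below illustrates this with a plain Pearson correlation on hypothetical weekly toy numbers, constructed so that queries lead cases by two weeks. It is not Google’s actual model, which the text does not describe, and the function names are made up for this example. A high correlation in past data also says nothing about whether the pattern will hold next season, which is the weakness the paragraph mentions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)


def lagged_correlation(queries, cases, lag):
    """Correlate search queries in week t with reported cases in week t + lag."""
    if lag == 0:
        return pearson(queries, cases)
    return pearson(queries[:-lag], cases[lag:])


def best_lead(queries, cases, max_lag):
    """Return the lag (in weeks) at which queries best anticipate cases."""
    return max(range(max_lag + 1),
               key=lambda lag: lagged_correlation(queries, cases, lag))


# Hypothetical toy data: weekly flu-related search volume and reported cases.
queries = [10, 12, 20, 35, 60, 80, 70, 50, 30, 20, 15, 12]
cases = [5, 5, 10, 12, 20, 35, 60, 80, 70, 50, 30, 20]

for lag in range(5):
    print(f"lag {lag} weeks: r = {lagged_correlation(queries, cases, lag):.2f}")
print("queries lead cases by", best_lead(queries, cases, 4), "weeks")
```

With these toy numbers the correlation peaks at a lag of two weeks, because the case series was built as the query series shifted by two weeks. Real data would be far noisier.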

Farmers optimize the use of fields and barns

Another example comes from agriculture, as farmers increasingly digitize their operations. Big data helps them get the most out of their fields and barns. To prevent excessive irrigation, sensors measure soil moisture so that each square foot can receive a different amount of water and fertilizer. Other systems collect data to determine the health and fertility of cows. This saves time and allows farmers to detect diseases at an early stage.

BMW and big data on the factory floor

Car manufacturer BMW also relies on big data in its production operations: IDs are assigned to body parts to allow fine adjustment of the presses. The presses can then respond to each sheet’s thickness, stiffness or surface characteristics and prevent tears.

Today's biggest data producers

The U.S., Western Europe, China and India are producing far more data than the rest of the world combined. (Source: KPMG, "Going beyond the Data")

Big data in practice: The 2016 U.S. elections

Many people believe that Donald Trump also won the election with the help of big data, since precise targeting enabled voters to be addressed directly and quickly on Facebook or Twitter. Analyzing users’ “Likes” or what they respond to makes it possible to provide them with customized information. It’s the alternative to conventional election advertising: Instead of printing the same message in a brochure for everyone, or showing the same ads over and over on television, campaigns target users individually. Allegedly, Trump’s campaign went even further with the help of Cambridge Analytica, using what are known as “dark ads”. This form of extremely personalized advertising on Facebook supposedly relies not only on big data, but also on the evaluation of personality profiles. That, for example, is allegedly how opinion on Hillary Clinton was manipulated: African-American women who sympathized with her received a video in which she described black men as predators. Whether Cambridge Analytica and “dark ads” really played a significant role in the election is still unclear.

From big data to smart data

Big data refers to huge volumes of data, often generated in real time, that existing methods are unable to analyze or process. Smart data goes a step further: It describes useful, verified and high-quality data extracted from big data sets. Big data is therefore the foundation, a kind of raw material that needs to be processed and refined into smart data before it can develop its full economic potential.

Intelligent algorithms are needed to bring order to this chaotic flood of data, because big data has to become smart data. At first, it is simply an immense quantity of data, and it is of no use to anyone until it has been analyzed. Only then does it become high-quality data. The aim of the algorithms is to detect patterns and relationships, which analysts then interpret and evaluate. Companies use this approach, for example, to identify and fix weak points in production, which may give them an edge over the competition.

Companies benefit from this data collection and analysis, as noted by the German Federal Ministry of Economic Affairs and the Fraunhofer Institute for Intelligent Analysis and Information Systems, for three reasons:

  • More real-time analysis leads to optimized workflows and hence more efficient management.
  • Personalized services can be offered when systems treat the anonymous masses as individuals with their own needs.
  • Products with built-in big data intelligence process data independently and pass it on.

Benefits for consumers

Big data also delivers benefits to consumers. Fitness trackers and apps are very popular these days. People use them to monitor their sporting activity, sleeping pattern, blood sugar levels, blood pressure, eating habits, and much more. This information is analyzed, and the tools provide tips for healthy behavior.

Consumers can also save money with big data, for example, when smart meters record their power consumption and identify energy wasters. Or when online shops store their habits and preferences in order to offer personalized vouchers or specials. It’s nerves, not money, that people save in smart cities in which traffic data is analyzed in real time: congestion is avoided, and alternative routes are proposed for passengers and drivers. Connected cars can even brake on their own in an emergency and report accidents automatically.

Big data and data protection

How much transparency do we want?

Big data always has two sides: One that produces the data and another that analyzes and uses it. Industry and science benefit from big data. But what price do users pay when their data flows into these analyses? There are two aspects to this: Firstly, consumers are often unaware of how carelessly they divulge information and what it can be used for. Secondly, they are often not asked for their consent to use the data.

Date of birth combined with place of residence, the device ID of their smartphone, browser cookies, IP addresses, chat and text messages, and their own posts and profiles on social networks: Smart tools add up all this information and turn it into a comprehensive picture of each individual. Even if data records are anonymized, they can often be traced back to the original individuals with little effort. What remains are highly “transparent” users. They are then classified into categories, for example, by insurance companies and banks before they sell a policy or issue a loan, or by employers looking for new hires.
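Why removing names alone often fails can be shown with a linkage sketch. In the hypothetical Python example below, names have been stripped from a data set, but date of birth and postcode remain. Any record whose combination of these two values is unique can be matched against a second, public data set that does contain names. All records, names, field names and function names are invented for illustration.

```python
from collections import Counter


def quasi_identifier(record):
    """Attributes left in the 'anonymized' data that still narrow down a person."""
    return (record["birth_date"], record["postcode"])


def reidentify(anonymized, public_profiles):
    """Link anonymized records to names wherever the quasi-identifier is unambiguous."""
    # How many anonymized records share each (birth date, postcode) combination.
    record_counts = Counter(quasi_identifier(r) for r in anonymized)
    # Which named people in the public data have each combination.
    names_by_key = {}
    for profile in public_profiles:
        names_by_key.setdefault(quasi_identifier(profile), []).append(profile["name"])

    matches = {}
    for record in anonymized:
        key = quasi_identifier(record)
        names = names_by_key.get(key, [])
        # A match is unambiguous only if the combination is unique on both sides.
        if record_counts[key] == 1 and len(names) == 1:
            matches[names[0]] = record
    return matches


# Hypothetical "anonymized" health records: names removed, other fields kept.
anonymized = [
    {"birth_date": "1985-03-14", "postcode": "81541", "diagnosis": "asthma"},
    {"birth_date": "1990-07-02", "postcode": "10115", "diagnosis": "diabetes"},
    {"birth_date": "1990-07-02", "postcode": "10115", "diagnosis": "migraine"},
    {"birth_date": "1978-11-23", "postcode": "20095", "diagnosis": "hypertension"},
]
# Hypothetical public profiles that do contain names.
public_profiles = [
    {"name": "Person A", "birth_date": "1985-03-14", "postcode": "81541"},
    {"name": "Person B", "birth_date": "1990-07-02", "postcode": "10115"},
]

for name, record in reidentify(anonymized, public_profiles).items():
    print(name, "->", record["diagnosis"])
```

Here only Person A is re-identified: Person B shares a birth date and postcode with another record, so the match is ambiguous. The more attributes are combined, the more records become unique, which is why aggregating many small pieces of information is so revealing.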

Objections to big data

Data protection campaigners like Germany’s “Digitalcourage” association see inherent risks in using big data analyses. They claim that data is being misused, personal rights violated, and people discriminated against as soon as they fall into certain algorithmic categories. After all, there is no guarantee that the results of the algorithm match reality. Data is always quantitative and relatively meaningless without context.

At the same time, the data can contain anything, including sensitive information. That gives power to those analyzing it. Facebook programs algorithms in such a way that users automatically see only those things that confirm their world view. Google’s search results are also biased by the search engine’s algorithms and the user’s previous queries. Experts refer to this as the “filter bubble”.

European data protection law does at least specify that data may only be used for the purpose and within the offering for which it was collected. This is the principle of purpose limitation. For example, the order data of a pizzeria may not be linked to that of a connected car in order to send offers to customers in the car as it drives by.

Many consumers still lack awareness of what it means to divulge information. Everyone should be aware that “only non-collected data is secure data,” says the Digitalcourage association.

Consumer protection organizations do, however, also warn against playing big data and data protection off against each other. The analyses could be hugely beneficial for consumers, for example, when connected cars automatically report accidents or avoid traffic backups. On the other hand, all those in possession of data could manipulate and control consumers. The Federation of German Consumer Organizations advocates that each individual should have the right to determine which data they divulge and how it may be used.

Development and future of big data

Whether with or without data protection guidelines, big data will become even bigger in the future. Data is the gold of the future. The efficiency of data analytics will continue to improve, and more and more companies will rely on data analysis to keep up. Specialists who analyze and interpret data correctly will be in even more demand.

Algorithms for facial reconstruction already exist, making it possible to compare profile pictures from social networks with medical MRI scans. We will also soon see the first text mining algorithms that can attribute posts in Internet forums to a specific person. The trend is toward combining big data with artificial intelligence. Robots and machines are being programmed to learn independently (“machine learning”), allowing them to process data rapidly on their own and respond to it. Connected cars and autonomous driving are examples of this.

Also on the rise is “contextual awareness” on mobile devices. Apps analyze specific information and then know what the user needs next. For example, when the user types in a customer’s name, the device instantly brings up all previous e-mails and appointments with that business partner, or offers to call them.

Big data provides users with a host of benefits and new possibilities. The challenge lies in seizing these opportunities, but without losing sight of the risks. This is where users themselves can help by handling their own data with care and caution.

Big data technologies you should know

To analyze web server logs, social media activity, itemized mobile phone bills, and sensor data, companies often use the following technologies: open-source software frameworks (Apache Hadoop, Apache Spark, MapReduce), NoSQL databases (Bigtable, Cassandra), graph databases, and distributed file systems.
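The MapReduce programming model mentioned in this list splits a job into a map step, which turns each input record into key-value pairs, a shuffle step, which groups the pairs by key, and a reduce step, which aggregates each group. The single-process Python sketch below applies this idea to counting HTTP status codes in web server logs. It does not use the Hadoop or Spark APIs; in those frameworks the map and reduce steps would run in parallel on many machines. The simplified log format (status code as the last field) and all function names are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain


def map_line(line):
    """Map step: emit (status_code, 1) for one log line.

    Assumes a simplified format in which the HTTP status code is the last field.
    """
    fields = line.split()
    return [(fields[-1], 1)] if fields else []


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce step: aggregate the values collected for one key."""
    return key, sum(values)


def map_reduce(lines):
    """Run map, shuffle and reduce in sequence within a single process."""
    mapped = chain.from_iterable(map_line(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in grouped.items())


# Hypothetical log lines (IP addresses from documentation ranges).
logs = [
    "203.0.113.5 GET /index.html 200",
    "198.51.100.7 GET /missing.png 404",
    "203.0.113.9 GET /shop 200",
    "198.51.100.2 POST /checkout 500",
]

print(map_reduce(logs))
```

Because each line is mapped independently and each key is reduced independently, both steps can be distributed across machines; only the shuffle has to move data between them.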
