How is Data Collected – Data Capture and Collection Systems

We’re giving away more data than ever before. Businesses and brands collect and handle more data than they could ever make use of, and there are increasingly more ways of using this data, be it for health advice, advertising or analysing emerging trends. As a result, data has become extremely valuable: it is often sold for large sums of money or used to make more money out of individuals. These new uses also carry great risks, both for us as individuals and for society as a whole. That is why data privacy, too, has become more important than ever before.

In this series, we’d like to introduce you to, and help you understand, the Data Privacy Basics: key concepts about our personal data that we should all know, the regulations in place to protect that data, and what Data Privacy means for us. In part two, we’ll look more closely at why and where data is being collected, and focus on some of the biggest data collectors out there.

Why is data being collected?

Data collection long predates the digital era. What used to be stored only in physical filing cabinets can nowadays be accessed through the cloud, transferred over the internet or downloaded onto hard drives.

For businesses, the main reasons for collecting data used to be as simple as wanting to keep consumers’ details on file, to make transactions easier when they return or to be able to get in touch about new products or outstanding payments.

With the emergence of the internet economy, the collection and use of data has taken on entirely new purposes. Businesses now collect data on our online activity to improve their own services, for example by recording flaws in website functionality and tracking how much time users spend on individual pages or looking at specific products.

Personal data now also serves the advertising business. By collecting data about our interests, hobbies, behaviour and other preferences, large data collectors can create profiles of us that make tailored, targeted advertising on websites possible. This ranges from a YouTube ad recommending a product we have previously googled on our phone, to more complex connections that link our gender, age, nationality and other preferences to a range of products or services we might not have considered ourselves.

This is also why some businesses do not just collect data about us for themselves, but bundle it and sell it to third parties. Even if this packaged data does not identify us personally, i.e. it is anonymised or pseudonymised, the insights generated from it may still be incredibly valuable to the buyer. If, for example, a dataset reveals that men of a certain age range in a particular country are predominantly interested in one major sport, a business could use this to target ads at consumers with that particular profile. In turn, data we have given away collectively as a group may very well be used in ways that affect us individually.
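To make the “group data affects individuals” point more concrete, here is a minimal Python sketch under invented assumptions: a seller replaces names and e-mail addresses with a keyed hash (a common pseudonymisation technique, not necessarily what any particular data broker does), coarsens ages into bands, and then counts interests per segment. All records, field names, the key and the helper functions are hypothetical. Note that keyed hashing is pseudonymisation rather than anonymisation: whoever holds the key can still link the tokens back to the original e-mail addresses.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical customer records; names, fields and values are invented for illustration.
RAW_RECORDS = [
    {"name": "Ben", "email": "ben@example.com", "gender": "m", "age": 35, "country": "UK", "interest": "football"},
    {"name": "Carl", "email": "carl@example.com", "gender": "m", "age": 38, "country": "UK", "interest": "football"},
    {"name": "Dan", "email": "dan@example.com", "gender": "m", "age": 31, "country": "UK", "interest": "cricket"},
    {"name": "Alice", "email": "alice@example.com", "gender": "f", "age": 34, "country": "UK", "interest": "tennis"},
]

# Assumption: a secret key held only by the seller, so buyers cannot simply recompute the tokens.
SECRET_KEY = b"seller-only-secret"


def pseudonymise(record):
    """Swap direct identifiers (name, e-mail) for a keyed hash and coarsen age into a band."""
    token = hmac.new(SECRET_KEY, record["email"].encode(), hashlib.sha256).hexdigest()[:12]
    return {
        "id": token,
        "gender": record["gender"],
        "age_band": f"{record['age'] // 10 * 10}s",
        "country": record["country"],
        "interest": record["interest"],
    }


def segment_interests(records):
    """Count interests per (gender, age band, country) segment."""
    segments = {}
    for r in records:
        key = (r["gender"], r["age_band"], r["country"])
        segments.setdefault(key, Counter())[r["interest"]] += 1
    return segments


def top_interest(segments, segment):
    """Return the most common interest within one segment."""
    return segments[segment].most_common(1)[0][0]


sold_dataset = [pseudonymise(r) for r in RAW_RECORDS]
insights = segment_interests(sold_dataset)
```

Here `top_interest(insights, ("m", "30s", "UK"))` returns `"football"`, so a buyer could show football ads to any man in his thirties in the UK, including men who were never in the dataset.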

Data collection has also become critical for businesses with advisory functions, such as credit scoring agencies and insurance providers. By analysing diverse sources of data, ranging from facts we immediately connect with the service to information that seemingly has nothing to do with it, these businesses can make decisions and recommendations about us over which we no longer have sufficient control. For example, details of the financial transactions on our bank accounts and any payment arrears are critical for determining our credit scores. Increasingly, however, businesses also look at our social media data to evaluate how trustworthy we are. This screening can include our network activity, hobbies and preferences, as well as our circle of friends and whether those friends have proven trustworthy or not.

Finally, data can also be collected for illegal purposes. These include scams and identity theft, as well as other crimes that often look very different when committed online rather than offline, but may be just as dangerous and severe.

Where is data collected?

Data is being collected wherever we go, especially online. Broadly speaking, however, we can distinguish between physical and virtual data collection. Data Privacy is concerned with every kind of data collection, even if many of us only think of it in terms of online data.

Physical data collection takes place in many different ways: when we fill out a ‘new customer form’ at the spa, sign up for a raffle at a fair or register with our GP’s practice in person. While some of this data will always remain in paper form, much of it is initially collected physically and digitised at a later stage.

Relatively speaking, physical data collection makes up a tiny part of the data being collected nowadays. This is mainly because all areas of our lives have become (more) digitalised, and more than 53% (4.1 billion) of the global population use the internet. In developed countries, the share of people using the internet is more than 86%. Hence, digital devices, which we usually use to connect to the internet, are where most of our data is collected. Depending on the kind of device we use and the online activities we spend the most time on, this data collection happens in different ways.

Mobile devices, such as smartphones or tablets, collect and store information about our activities for various reasons: they might run location services in the background to feed into Google search results, or listen for sound so that they can respond when we want to activate a voice assistant like Apple’s Siri. They may also collect data on how their hardware and software are functioning and report it back to the company for analytics. Mobile apps also sometimes ask for permission to access other data on the phone, e.g. your contacts and image library, or the phone’s camera (for facial recognition) and microphone (for voice recognition). They can use this access to gain even more insights into you and your preferences.

Data collection through computers works similarly. Here, most of it happens in browsers and during online activities. Just as when you browse on your mobile device, your personal data is collected at every step of your online activity. Even if you never enter any details about yourself on a website, websites can store small pieces of data called ‘cookies’ in your browser and use them to track, personalise and save information about your activity. If you set up an account with an e-mail provider, online shopping portal or social media network, you create even more data that a business or organisation can collect about you.
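How can a cookie recognise you without you ever typing in your name? The Python sketch below simulates a website server: on your first visit it makes up a random visitor ID and sends it as a cookie; your browser stores it and automatically sends it back on later requests, so the site can link your page views together. It uses Python’s standard `http.cookies` module; the cookie name `visitor_id`, the one-year lifetime, the `handle_request` function and the in-memory `visit_log` are all hypothetical choices for illustration, not how any particular site works.

```python
import uuid
from http.cookies import SimpleCookie

# Hypothetical server-side store: visitor ID -> list of pages that visitor has viewed.
visit_log = {}


def handle_request(path, cookie_header=""):
    """Simulate a page request; return (visitor_id, Set-Cookie value or None)."""
    incoming = SimpleCookie(cookie_header)
    if "visitor_id" in incoming:
        # Returning visitor: the browser sent the cookie back, so we know who this is.
        visitor_id = incoming["visitor_id"].value
        set_cookie = None
    else:
        # First visit: invent a random ID and ask the browser to keep it for a year.
        visitor_id = uuid.uuid4().hex
        outgoing = SimpleCookie()
        outgoing["visitor_id"] = visitor_id
        outgoing["visitor_id"]["max-age"] = 60 * 60 * 24 * 365
        outgoing["visitor_id"]["path"] = "/"
        set_cookie = outgoing["visitor_id"].OutputString()
    # Every request is logged against the same ID, building up a browsing history.
    visit_log.setdefault(visitor_id, []).append(path)
    return visitor_id, set_cookie


visitor, set_cookie = handle_request("/shoes")  # first visit: no cookie yet
# The browser stores the cookie and sends it back automatically on the next request.
same_visitor, _ = handle_request("/shoes/red-trainers", f"visitor_id={visitor}")
print(visit_log[visitor])  # ['/shoes', '/shoes/red-trainers']
```

Nothing in this exchange asks for your identity; the random ID alone is enough to tie your visits together, and once you log in or enter an e-mail address, that history can be attached to you personally.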

Both mobile devices and computers now serve an important purpose: connecting us to our networks, particularly on social media platforms. These platforms serve as virtual public spaces and make it possible to share information about our lives with our families, friends and the general public (depending on our settings). Besides sharing these things as content with other people, though, we also share them with the platforms themselves, and with pretty much any other business or organisation that has access to them. Contained in these posts, stories, tweets and other forms of shared content is a great deal of information about our interests and personal preferences, including things we may not reveal about ourselves but that an algorithmic system could infer from our information anyway. Simply put, any text, images and videos we share can be, and are, collected as data about us (and about anyone else featured in them).

Apart from the information we share deliberately, there is also data we give away by engaging with the platform, with advertisements and with content others have shared. Since many people spend most of their time online on social media platforms, not just posting their own information but consuming and engaging with that of others, these platforms are among the primary places of data collection today. For example, a social media platform may learn about your preferences by recording how long you spend reading a particular post, and whether you hit the ‘Like’ button or even ‘share’ or ‘re-tweet’ the content. It may then recommend similar posts to you in the future, continually recording and re-evaluating whether those topics engage you enough to spend time on them. Over time, combined with the information you share about yourself deliberately, the platform builds up a profile of your likes and dislikes, as well as other characteristics of your personality.
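The feedback loop described above can be sketched in a few lines of Python. This is a deliberately simplified, hypothetical model, not any platform’s real algorithm: each interaction adds points to a topic (one point per 30 seconds of reading, plus invented bonuses for a like or a share), and candidate posts are ranked by how highly their topic scores in your profile. The weights, field names and functions are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each action is taken as a sign of interest.
SIGNAL_WEIGHTS = {"like": 1.0, "share": 2.0}
SECONDS_PER_POINT = 30  # assumption: 30 seconds of reading counts as one point


def update_profile(profile, event):
    """Add interest points to the post's topic based on reading time and actions."""
    score = event.get("seconds_viewed", 0) / SECONDS_PER_POINT
    for action in event.get("actions", []):
        score += SIGNAL_WEIGHTS.get(action, 0.0)
    profile[event["topic"]] += score
    return profile


def recommend(profile, candidates, k=2):
    """Rank candidate posts by how much the user's profile favours their topic."""
    return sorted(candidates, key=lambda post: profile.get(post["topic"], 0.0), reverse=True)[:k]


profile = defaultdict(float)
EVENTS = [
    {"topic": "cooking", "seconds_viewed": 90, "actions": ["like"]},
    {"topic": "football", "seconds_viewed": 15},
    {"topic": "cooking", "seconds_viewed": 30, "actions": ["share"]},
    {"topic": "travel", "seconds_viewed": 60, "actions": ["like"]},
]
for event in EVENTS:
    update_profile(profile, event)

CANDIDATES = [
    {"id": 1, "topic": "football"},
    {"id": 2, "topic": "travel"},
    {"id": 3, "topic": "cooking"},
    {"id": 4, "topic": "gardening"},
]
print([post["id"] for post in recommend(profile, CANDIDATES)])  # [3, 2]
```

Every new interaction runs through `update_profile` again, so the ranking keeps shifting with your behaviour, even though you never told the platform outright that you like cooking.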

Other devices that can collect data are so-called IoT (Internet of Things) devices: physical objects that connect to the internet and use their sensors, software and other built-in technologies to collect data and transfer it to other devices. Examples include voice assistant speakers, GPS devices, and smart home devices such as home security systems and smart doorbells.

New technologies like IoT devices, together with the growing interconnectedness of our computers, phones, tablets and other devices, have also enabled even more intrusive forms of data collection, including cross-app and cross-device tracking. These forms of tracking allow data collectors to follow an individual across the apps on their phone or across their multiple devices, and to connect the resulting data so as to find out more about them. For example, if you google certain products on your phone, there is a good chance that targeted advertisements for those products will pop up while you browse on your computer. Similarly, if you ask your Alexa voice assistant to look something up while sitting at the dinner table, this, too, might subsequently feed into targeted advertising.
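One simple way cross-device tracking can work is through a shared login. In the Python sketch below, which is hypothetical throughout (device names, event fields and functions are invented), a tracker sees you log in with the same e-mail address on your phone and your laptop, turns that address into a common key by hashing it, and merges the activity from both devices. A search made on the phone then becomes ad material on the laptop. Real trackers may also link devices probabilistically, for example from shared networks or browser characteristics, which this sketch does not attempt.

```python
import hashlib
from collections import defaultdict


def account_key(email):
    """Hash a normalised e-mail address so logins on different devices map to one key."""
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()


def link_devices(events):
    """Map each device to an account key wherever a login reveals who is using it."""
    device_to_account = {}
    for e in events:
        if e["type"] == "login":
            device_to_account[e["device"]] = account_key(e["email"])
    return device_to_account


def searches_by_account(events, device_to_account):
    """Group search queries from all linked devices under the same account key."""
    merged = defaultdict(list)
    for e in events:
        if e["type"] == "search" and e["device"] in device_to_account:
            merged[device_to_account[e["device"]]].append((e["device"], e["query"]))
    return merged


def ads_for_device(device, device_to_account, merged):
    """Suggest ad topics for one device, based on searches made on the person's *other* devices."""
    account = device_to_account.get(device)
    if account is None:
        return []
    return [query for dev, query in merged[account] if dev != device]


EVENTS = [
    {"device": "phone-1", "type": "login", "email": "Sam@Example.com"},
    {"device": "phone-1", "type": "search", "query": "running shoes"},
    {"device": "laptop-7", "type": "login", "email": "sam@example.com "},
    {"device": "laptop-7", "type": "search", "query": "train tickets"},
]
device_to_account = link_devices(EVENTS)
merged = searches_by_account(EVENTS, device_to_account)
print(ads_for_device("laptop-7", device_to_account, merged))  # ['running shoes']
```

Because the e-mail address is normalised before hashing, small differences in how it was typed on each device do not stop the two devices from being linked.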

While these are just some of the ways in which businesses collect our data, much of our data, including very sensitive data, is also collected by the state, for example in electoral registers, household surveys or census data.

Who are the biggest data collectors?

From the above, you can tell that virtually all businesses and organisations act as data collectors in one way or another. Non-profit organisations, government institutions and healthcare providers collect data about us just as much as consumer brands, mass retailers and financial institutions do. However much or little these entities collect, though, the biggest and most powerful data collectors are Big Tech and social media companies, including Alphabet/Google, Meta/Facebook (including apps like Messenger, Instagram and WhatsApp), Apple, Amazon, Twitter and ByteDance/TikTok.

One reason is that their products and services are used by millions of people (in Facebook’s case, billions). Hence, the amount of data that can potentially be collected is vast.

Another reason is that some of these Big Tech companies, particularly Meta with its social networks Facebook and Instagram, have built most of their business models on monetising data and make most of their money through targeted advertising based on that data. A 2021 study into how much data brands really collect, for example, ranked brands by what share of the legally collectable data they actually collect, and found Facebook and Instagram leading by far, with Facebook collecting 79.49% and Instagram 69.23% of that data.

However, some tech companies, first and foremost Apple, are now responding to people’s growing concern about data privacy and the excessive monetisation of data. They do so by keeping their own data collection and sharing to a minimum and, in Apple’s case, even allowing users of its devices to opt out of the cross-app tracking that other businesses use to collect their data. For Apple, this move may have come more easily than for other Big Tech companies, as its business model has always relied more on selling high-end hardware than on monetising data. Still, it is refreshing and important to see a Big Tech company strengthen its data privacy practices, particularly as this can have a trickle-down effect on other brands. Hopefully, this and other examples will push the whole industry towards stronger and more ethical data privacy frameworks.

If you would like to see some neat stats on which and how much data is collected by the above-mentioned and many other companies, check out these two blogs on ‘Big brother brands report: which companies might access our personal data the most?’ and ‘The Data Big Tech Companies Have On You’.