Data Scraping: How to Use Publicly Available Personal Data Compliantly

Our Head of Consultancy, Louise Brooks, shares her thoughts on the recent joint statement on data scraping.

Joint statement on data scraping

On 24 August 2023, the ICO, together with the 11 other members of the Global Privacy Assembly’s IEWG (International Enforcement Cooperation Working Group), published a joint statement on the practice of data scraping. While the statement is aimed primarily at social media companies and website operators, it is a timely reminder to all organisations of the perils of using publicly available personal data. This blog post highlights the key legal principles referred to in the statement, sets out some of the pitfalls of using publicly available personal data and offers guidance on building a compliance framework to avoid them.

What is data scraping and why was the statement issued?

Data scraping is the automated extraction of data from the Internet, usually with tools built specifically for this purpose. The data collected is typically publicly accessible on websites and social media platforms.

The statement doesn’t explain why it has been issued. However, recent press coverage suggests it may be a response to the rise of AI technologies, which need enormous volumes of information to function at their best. Examples include Google’s July privacy policy update, which implies that it plans to acquire everything it can to improve its AI models, and X’s decision to impose “rate limits” restricting the number of posts certain accounts can read in a day “to address extreme levels of data scraping and system manipulation”.

The statement carries no legal or legislative weight, but it does invite the organisations to which the IEWG sent a copy to respond within one month.

If personal data is in the public domain, surely it’s fair game?

Wrong. It is an all-too-common misconception that organisations can freely use personal data in the public domain without giving data protection law a second’s thought.

Publicly available personal data is still covered by data protection law, so any organisation wishing to process it must comply with the data protection or privacy legislation that applies in its jurisdiction. In the UK, that means the UK GDPR, the DPA (Data Protection Act) 2018 and PECR (the Privacy and Electronic Communications (EC Directive) Regulations 2003). Any organisation wishing to collect and use personal data from the Internet or social media must consider these requirements and be able to demonstrate that it complies with them.

What does that mean in practice?

The statement refers to several of the legal principles we need to consider when seeking to collect and use publicly available information:

  1. Purpose limitation: this is about understanding what you want the personal data for, and defining it is often the first thing I advise clients to do. Getting to grips with the objective you want to achieve and how you might go about it informs many aspects of data protection compliance, so it is worth taking the time to think everything through. This includes future-proofing your purpose and processing, as changing your purpose later can be difficult. Practical steps to help define your purpose include mapping the proposed data flows, looking at the systems you might deploy or the third parties you might need to engage, and considering whether you might combine the publicly available personal data with other information you hold and, if so, whether doing so would make the processing unexpected for individuals. Planning the activity in detail at the outset will give you a solid foundation on which to build your compliance and prompt you to think about how you will demonstrate it. For example, will you need a DPIA (data protection impact assessment)?
  2. Lawfulness, fairness and transparency: there are a few things to unpack here.

    First, an organisation must identify a lawful basis for processing the personal data before collecting it. This means understanding at the outset what you want to do with the personal data and how you will achieve that objective. This goes hand in hand with the purpose limitation principle above.

    Second, you need to consider the fairness of your proposed objective. Here, it helps to put yourself in the data subject’s shoes and ask whether what you intend to do with their personal data is within their “reasonable expectations”. Put another way, would they be surprised that you are using their personal data, and what might they think about your intended use for it?

    Third, even though the personal data is in the public domain and individuals may well have put it there themselves, an organisation cannot collect and use it without telling individuals what it intends to do with it. This means providing privacy information to the people concerned unless the organisation can demonstrate that one of the limited exemptions in Article 14(5) of the UK GDPR applies.
  3. Data minimisation: this means collecting only the personal data you need for your purpose. A clearly defined purpose makes this principle much easier to meet, because it tells you which personal data is actually necessary to achieve the objective and which is not.
  4. Security principle: the statement discusses the privacy risks of using publicly available information. These should not be dismissed, particularly if your organisation is combining publicly available information with other data that could enhance a malicious actor’s understanding of a person. The statement includes several examples of security measures that might be considered; however, don’t assume that any of them will be appropriate for you. It is important to consider the context of your own processing rather than simply picking options from the list.
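For technical teams building a scraper, the data minimisation principle can be made concrete as a field allowlist agreed, and documented, before any collection starts. The Python below is a minimal, hypothetical sketch, not a compliance tool: the purpose name `market_research`, the field names and the `minimise` function are all illustrative assumptions rather than anything taken from the statement.

```python
from typing import Any

# Hypothetical mapping from a documented purpose to the only fields that
# purpose needs. In practice this would come out of the purpose-limitation
# work (data flow mapping, DPIA) done before any data is collected.
ALLOWED_FIELDS: dict[str, frozenset[str]] = {
    "market_research": frozenset({"post_text", "post_date"}),
}


def minimise(record: dict[str, Any], purpose: str) -> dict[str, Any]:
    """Return a copy of a scraped record containing only the fields the
    documented purpose needs; all other fields are dropped before storage."""
    allowed = ALLOWED_FIELDS.get(purpose)
    if allowed is None:
        # No documented purpose means nothing should be kept at all.
        raise ValueError(f"no documented purpose: {purpose!r}")
    return {key: value for key, value in record.items() if key in allowed}


# Illustrative scraped record (invented example data).
scraped = {
    "username": "example_user",
    "profile_url": "https://example.com/u/example_user",
    "post_text": "Loving the new cafe on the high street",
    "post_date": "2023-08-24",
}
print(minimise(scraped, "market_research"))
# → {'post_text': 'Loving the new cafe on the high street', 'post_date': '2023-08-24'}
```

Filtering after retrieval is a fallback: the principle is about what you collect, so where the source allows it, the same allowlist is better used to decide what the scraper requests in the first place. Either way, the allowlist is only as good as the purpose definition behind it.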

Whose responsibility is it anyway?

The statement rather unhelpfully includes a section on the steps individuals can take to minimise the risks associated with their personal data being scraped. This section includes advice such as reading privacy policies, thinking before posting information online and using privacy settings on devices to restrict personal data collection. But in a data-driven world, is this realistic?

Printing Google’s, Facebook’s and Microsoft’s privacy policies would require 316 pages of paper. That’s a lot of information to process! Some organisations are making strides in providing more accessible content in the privacy space (for example, the videos in Google’s policy) and there has been a shift away from the over-reliance on legalese. However, when the business model is fundamentally focused on collecting as much personal information as possible from as many people as possible, is it fair to expect the average user to commit hours to drilling down into how their information is used?

What might happen if you get it wrong?

You may remember the charity fundraising scandals between 2015 and 2017 that resulted in the ICO fining 13 not-for-profit organisations for failing to comply with data protection law. At the heart of the non-compliance was the use of publicly available personal data to build profiles of individuals’ wealth and propensity to donate, without telling people this was happening. I work with many charities and saw first-hand the devastating impact this had on their activities. Such was the effect of the investigations and the negative press that many charities have only recently begun to revisit this practice. Although these investigations were sector-specific, they serve as a warning to all organisations that use personal data: failing to comply with data protection law can affect your reputation, your bottom line and, ultimately, how you do business.

How can DQM GRC help you?

We have extensive experience in supporting customers with achieving business objectives while complying with data protection regulations. Get in touch with us to discuss your plans and find out how we can guide you.

Interim and seconded consultants: If you are considering a project and need data protection professionals to lead it, fill a skills gap or cover a leave of absence, we can help.

Bespoke solutions: We support a wide variety of activities through our bespoke consultancy solutions. From DPIAs relating to a data-scraping activity to third-party due diligence reviews, we can help.

Data seeding solutions: We can help monitor licensed data usage by your customers, employees and supply chain by seeding data to track it and audit its end use.

