“All Your Content Belongs to Us”: Google’s Privacy Policy Update Raises Concerns About Data Scraping and AI Models

Google's Privacy Policy Update and the Implications for Data Privacy

Introduction

In the ever-evolving landscape of technology, organizations are often wary of embracing AI tools due to various concerns. Legal uncertainties surrounding the rights of AI models to use data they are trained on, and the possibility of generating content that includes protected elements, are among the key reasons for hesitation. Recently, Google made waves with its privacy policy update, which seemingly grants the company the ability to scrape content from the entire internet for the development of its AI products.

Google’s Privacy Policy Update and the “Right to Scrape”

Google’s updated privacy policy makes a subtle yet highly significant change to language that has been in place for several years. Previously, Google stated that it used public internet sources to train “language models,” citing Google Translate as an example. That statement did not attract significant attention or controversy.

However, the revised policy now refers to “AI models” instead of “language models,” explicitly mentioning ongoing projects like Bard and Cloud AI. The updated wording implies that Google believes it can use any content it can access on the public internet to improve its own products, whereas one might have assumed that only content shared on Google’s free services, such as Blogger and Sites, would be used this way.

This assertion has reignited a long-standing debate surrounding the “right to scrape.” Clearview AI, for instance, created a vast biometric facial recognition database by scraping publicly posted images without individuals’ knowledge or consent. However, Clearview AI has faced substantial setbacks thus far, with social media platforms like Facebook blocking its access due to violations of terms of service and privacy policies. The company is even prohibited from operating in some countries, such as Canada, and some states, like Illinois. Additionally, Clearview AI has incurred significant fines in the European Union for privacy and data handling violations. The company’s survival largely depends on the absence of a federal-level data privacy law in the United States.

Although biometric information enjoys a higher level of legal protection compared to random blog posts or articles, the crux of these cases revolves around users’ reasonable expectations regarding the use of their content by private entities when it is shared on the internet. Absorbing such content into a private, for-profit database does not always align with those expectations. The situation becomes even more intricate when AI models come into play, as they may produce outputs derived from scraped information that can have legal consequences.

Anticipated Legal Challenges and Implications

As AI models and products continue to advance and roll out, it is widely expected that legal challenges pertaining to scraping and data usage will arise in the coming years. Some cases are already progressing through the legal system. OpenAI, for example, is facing a class-action lawsuit in Northern California over its extensive scraping activities. The outcome of this case is likely to bear directly on Google’s plans. Additionally, OpenAI is contending with a separate lawsuit filed by a group of authors who claim that the company specifically scraped their protected works. Microsoft is also heading towards litigation over the training of GitHub Copilot, with plaintiffs arguing that the software appropriated open-source code without adhering to the necessary licensing agreements.

Timothy Morris, Chief Security Advisor at Tanium, points out that legal challenges related to “deepfakes” and Google’s privacy policy may pose further hurdles for AI models. From a privacy standpoint, the ability of AI to create new works tests the definition of “public.” Taking publicly available images and information to generate new works raises legal concerns, potentially leading to a demand for better regulations. The issue of deepfakes that employ public content serves as an example of these challenges.

The Impact of AI Models on the Internet Landscape

While the resolution of court cases and the subsequent regulation of AI models are likely to take years, tools like ChatGPT are already making waves across the public internet. Scraping by AI competitors has prompted Twitter and Reddit to move to entirely pay-for-play APIs. Twitter took more drastic measures during the July 4th weekend, requiring users to sign in to see tweets and limiting each user to only a few posts per day.

While the narrative surrounding AI models scraping content is often sympathetically framed as a struggle between the “little guy” and AI models that threaten to render certain jobs obsolete, the recent actions by Twitter and Reddit suggest that legal battles over scraping may ultimately resemble a clash between tech giants, similar to “Godzilla vs. King Kong,” with each accusing the other of breaking its terms of service. Regulators may treat scraping similarly to violations involving poorly secured APIs, deeming it a privacy policy failure on the part of content hosts.

The new AI-assisted search tools from Google and Bing have also disrupted private industries built around search traffic. These tools often provide AI-generated summaries of content scraped from websites, diverting users from visiting the original sources. Google, in particular, faces legal and regulatory threats related to antitrust concerns due to its estimated 90% market share in search advertising. The beta version of Google’s “Search Generative Experience” tool indicates that AI-generated text will dominate the entire first page of search results, accompanied by advertising links. Actual website links will be located “below the fold,” requiring users to scroll and click a “show more” button to reach them.

Conclusion

Google’s recent privacy policy update, which suggests the company intends to scrape content from the internet to enhance its AI models, has raised significant concerns and ignited debates about data scraping and privacy. As legal battles unfold and regulations surrounding AI models become clearer, stakeholders will need to navigate this complex landscape carefully. Striking a balance between innovation and user expectations, while addressing privacy concerns and ensuring legal compliance, is paramount for the responsible and ethical development of AI technologies.
