This assertion has ignited a long-standing debate surrounding the “right to scrape.” Clearview AI, for instance, created a vast biometric facial recognition database by scraping publicly posted images without individuals’ knowledge or consent. However, Clearview AI has faced substantial setbacks thus far, with social media platforms like Facebook blocking its access due to violations of terms of service and privacy policies. The company is even prohibited from working in some countries, such as Canada, and some states, like Illinois. Additionally, Clearview AI has encountered significant fines in the European Union for privacy and data handling violations. The company’s survival largely depends on the absence of a federal-level data privacy law in the United States.
Although biometric information enjoys a higher level of legal protection compared to random blog posts or articles, the crux of these cases revolves around users’ reasonable expectations regarding the use of their content by private entities when it is shared on the internet. Absorbing such content into a private, for-profit database does not always align with those expectations. The situation becomes even more intricate when AI models come into play, as they may produce outputs derived from scraped information that can have legal consequences.
Anticipated Legal Challenges and Implications
As AI models and products continue to advance and roll out, legal challenges over scraping and data usage are widely expected in the coming years. Some cases are already moving through the courts. OpenAI, for example, is facing a class-action lawsuit in Northern California for its extensive scraping activities. The outcome of this case will directly impact Google’s plans. Additionally, OpenAI is contending with a separate lawsuit filed by a group of authors who claim that the company specifically scraped their protected works. Microsoft is also heading towards litigation over its training of GitHub Copilot, with plaintiffs arguing that the software appropriated open-source code without adhering to the necessary licensing agreements.
The Impact of AI Models on the Internet Landscape
While the resolution of court cases and the subsequent regulation of AI models are likely to take years, tools like ChatGPT are already making waves across the public internet. Scraping by AI competitors has prompted Twitter and Reddit to transition to entirely pay-for-play APIs. Twitter took more drastic measures during the July 4th weekend, requiring users to sign in to see tweets and limiting them to only a few posts per day.
The new AI-assisted search tools from Google and Bing have also disrupted private industries built around search traffic. These tools often provide AI-generated summaries scraped from websites, diverting users from visiting the original sources. Google, in particular, faces legal and regulatory threats related to antitrust concerns due to its estimated 90% market share in search advertising. The beta version of Google’s “Search Generative Experience” tool indicates that AI-generated text will dominate the entire first page of search results, accompanied by advertising links. Actual website links will be located “below the fold,” requiring users to scroll and click a “show more” button to access them.