
OpenAI Erased Data in NY Times Copyright Suit (Updated)

2024-11-23

Lawyers representing The New York Times and Daily News have taken legal action against OpenAI, alleging that the company scraped their works to train its AI models without permission. This has led to a complex legal battle with significant implications for the future of AI and copyright law.

"Unraveling the Data Deletion Conundrum in the AI War"

Section 1: The Alleged Data Scraping and Training

Lawyers for The New York Times and Daily News are in court, claiming that OpenAI's actions have violated their copyrights. OpenAI's alleged scraping of their works to train its AI models without permission has sparked a heated dispute. This incident has raised serious questions about the ethical and legal boundaries of AI development.

These newspapers have been at the forefront of the digital media landscape, and the potential loss of their intellectual property through OpenAI's actions is a significant concern. The lawsuit highlights the need for clear regulations and guidelines in the rapidly evolving field of AI.

Section 2: The Data Deletion Incident

Earlier this fall, OpenAI agreed to provide two virtual machines for the publishers to search for their copyrighted content in its AI training sets. However, on November 14, OpenAI engineers accidentally erased all the publishers' search data stored on one of the virtual machines. This was a major setback for the publishers, who had already spent over 150 hours since November 1 searching OpenAI's training data.

Although OpenAI tried to recover the data and was mostly successful, the loss of the folder structure and file names made the recovered data unusable. This has forced the publishers to recreate their work from scratch, using significant person-hours and computer processing time. The incident has underscored the importance of proper data management and security in the AI industry.

Section 3: OpenAI's Response and Denials

In response to the publishers' letter, OpenAI's attorneys unequivocally denied that OpenAI deleted any evidence. Instead, they suggested that the plaintiffs were to blame for a system misconfiguration that led to a technical issue. OpenAI's counsel argued that implementing the plaintiffs' requested change resulted in removing the folder structure and some file names on one hard drive, which was supposed to be used as a temporary cache.

However, the publishers' counsel remains skeptical and believes that OpenAI is in the best position to search its own datasets for potentially infringing content using its own tools. The ongoing dispute between the two parties highlights the challenges of navigating the complex world of AI and copyright law.

Section 4: Fair Use and Licensing Deals

In this case and others, OpenAI has maintained that training models using publicly available data is fair use. The company believes that it isn't required to license or otherwise pay for the examples used to create models like GPT-4o. However, OpenAI has also inked licensing deals with a growing number of news publishers, including the Associated Press, Business Insider owner Axel Springer, Financial Times, People parent company Dotdash Meredith, and News Corp.

Although OpenAI has declined to make the terms of these deals public, one content partner, Dotdash, is reportedly being paid at least $16 million per year. This raises questions about the fairness and transparency of OpenAI's licensing practices and the potential impact on the publishing industry.

Welcome to TechCrunch's regular AI newsletter! Every Wednesday, we bring you the latest in the world of artificial intelligence. This week, we're diving into the world of Thanksgiving and how chatbots can help create a unique and memorable feast. Whether you're the designated cook or just looking for some inspiration, read on to discover how AI can take your Thanksgiving to the next level.

Unlock the Potential of AI for Thanksgiving Delights

ChatGPT's Thanksgiving Menu

ChatGPT starts off with a fancy cocktail hour featuring whipped sweet potato and goat cheese crostini. For the appetizer, it suggests "pumpkin soup shooters with cinnamon crème fraîche," followed by a main course of miso-butter turkey with a ginger-soy glaze. On the side, it recommends a chili-lime corn bread and pistachio risotto. And for the big finish, it has you stick to staples like pie, cheesecake, and saffron-flavored ice cream.

According to ChatGPT, this menu takes familiar Thanksgiving flavors and elevates them through unexpected ingredients and combinations. Each dish tells a story and invites conversation, making the meal not just about food, but about shared experience and creativity.

Claude's Thanksgiving Menu

Claude goes for the moon with an appetizer of "butternut squash bisque with sage foam" that definitely checks the "unique" box. For the main course, it suggests "lavender and fennel dry-brined turkey with a honey-thyme glaze," an herbaceous departure from classic roast turkey. On the side, it recommends whipping out the fine liquor for a "wild mushroom and chestnut stuffing with aged sherry."

Claude writes that its creations take familiar Thanksgiving flavors and elevate them through unexpected ingredients and combinations. Each dish tells a story and invites conversation, making the meal not just about food, but about shared experience and creativity.

OpenAI's Sora Leaks

A group appears to have leaked access to OpenAI's video generator, Sora, in protest of what it's calling "art washing" on OpenAI's part. This has raised concerns about the security and ethics of AI technology.

OpenAI needs to address these issues and ensure that its technology is used in a responsible and ethical manner.

Amazon Backs Anthropic

Anthropic has raised an additional $4 billion from Amazon and has agreed to train its flagship generative AI models primarily on Amazon Web Services. This partnership could have a significant impact on the development and deployment of AI technology.

It remains to be seen how this partnership will shape the future of AI and what benefits it will bring to users.

AI App Connectors

In other Anthropic news, the company has proposed a new standard, the Model Context Protocol, for connecting AI assistants to the systems where data resides. This could make it easier for users to integrate AI into their existing workflows and applications.

The Model Context Protocol has the potential to revolutionize the way we use AI and make it more accessible and useful.

OpenAI Funds "AI Morality" Research

OpenAI is pouring $1 million into a Duke University research program to develop algorithms that can predict humans' moral judgments. This is an important step in ensuring that AI is developed with ethical considerations in mind.

By understanding human moral judgments, AI can be developed to make more ethical decisions and avoid causing harm.

YouTube Gets AI Backdrops

YouTube's Dream Screen feature for Shorts now lets users create AI-generated video backdrops. This could make it easier for content creators to add visual effects and enhance their videos.

With AI-generated backdrops, YouTube Shorts can become even more engaging and immersive for viewers.

Brave Adds AI Chat

Search engine Brave has introduced an AI chat mode for follow-up questions based on initial queries on Brave Search. This is an expansion of Brave's Answer with AI feature that provides AI-generated summaries of web searches.

Brave's AI chat mode makes it easier for users to get the information they need and have more engaging conversations with the search engine.

AI2 Open Sources Tülu 3

The Allen Institute for AI (Ai2) has released Tülu 3, a generative AI model that can be fine-tuned and customized for a range of applications. This could make it easier for developers to build their own AI applications and services.

Tülu 3 has the potential to revolutionize the way we develop and use AI and make it more accessible to a wider range of users.

Crusoe Raises Cash

Crusoe Energy, a startup building data centers reportedly to be leased to Oracle, Microsoft, and OpenAI, is in the process of raising $818 million. This could help the company expand its operations and bring more AI-powered services to market.

With the additional funding, Crusoe Energy could become a major player in the world of AI and data centers.

Threads Tests AI Summaries

Meta's Threads has begun testing AI-generated summaries of what people are discussing on the platform, taking a page from rival X. This could make it easier for users to stay up-to-date on the latest conversations and trends.

Threads' AI summaries could become an important tool for social media users and help them stay connected with the world around them.

DeepMind's AlphaQubit

DeepMind has developed a new AI system called AlphaQubit that can accurately identify errors inside quantum computers. This is an important step in improving the reliability of quantum computers and making them more useful for a wide range of applications.

AlphaQubit has the potential to revolutionize the way we use quantum computers and make them more accessible and reliable.

Runway's Frames Model

Runway has released a new image-generation model called Frames that offers better stylistic control than most. The model is slowly rolling out to users of Runway's Gen-3 Alpha video generator and can reliably create images that stay true to a particular aesthetic.

Runway's Frames model has the potential to revolutionize the way we create and edit images and make it easier for content creators to bring their visions to life.

Nvidia's Fugatto

Nvidia has unveiled a model dubbed Fugatto, billed as "the world's most flexible sound machine." The chip giant's model can create a mix of music, voices, and sounds from a text description and a collection of audio files. It can even generate things that don't exist in the real world.

Fugatto has the potential to revolutionize the way we create and experience sound and make it easier for musicians and content creators to bring their ideas to life.

On October 21, a significant event took place in the world of Nasdaq traders. A new ticker, NBIS, emerged, a truncation of Nebius. This fledgling player in the AI cloud infrastructure space had been quietly building its presence. Many casual observers might have wondered where this company came from, as there was little of the usual fanfare surrounding startups' IPO journeys. But Nebius is an unusual beast, a public company with the essence of a startup.

Yandex's Journey and the Birth of Nebius

Nebius actually became public 13 years ago in May 2011 as Yandex N.V., the Dutch holding company of the Russian internet giant Yandex, often dubbed the "Google of Russia." At the end of 2021, it reached a peak valuation of $31 billion. However, with Russia's invasion of Ukraine in early 2022, everything changed. Nasdaq halted trading in Yandex N.V. shares due to sanctions, and a year later, it announced the delisting. But Yandex successfully appealed, undergoing a restructuring process that took an additional 16 months to complete. This divestment led to the creation of Nebius AI, an AI cloud platform with its own Finnish data center.

The new business was spearheaded by Arkady Volozh, the Russian Yandex co-founder and former CEO. He was removed from a European sanctions list in March after publicly condemning Russia's assault on Ukraine.

The Core Nebius Business

The core Nebius business sells GPUs "as-a-service" to companies in need of "compute" - processing power and resources for tasks like running algorithms and executing machine learning models. Last month, the company debuted a holistic cloud computing platform for the "full machine learning lifecycle," spanning data processing, training, fine-tuning, and inference.

With the restructuring complete and Volozh free to run the show from the new headquarters in the Netherlands, Nasdaq green-lit Nebius to recommence trading last month. This situation was unprecedented - a public company with trading paused and now resuming under a new name and different business proposition.

In the first month of trading, Nebius had a somewhat tepid re-entry. It was significantly down on its $18 billion market cap before trading halted in February 2022, but it has since yo-yoed between $3.5 billion and $4.75 billion, showing signs of settling.

Volozh explained to TechCrunch that building infrastructure is capital intensive, and the public markets are the easiest and cheapest way to access capital in the current hot tech space. But there was uncertainty about how the public markets would respond to this new entity.

Nebius' Competitors and Expansion Plans

Nebius competes with the usual hyperscaler cloud behemoths. Its more direct rivals are other alternative cloud startups like CoreWeave, which has raised a lot of cash this year. While CoreWeave is expanding from the U.S. to Europe, Nebius is moving in the opposite direction, announcing plans to extend its presence to the U.S. with a new GPU cluster in Kansas City (on the Missouri side) scheduled to go live in early 2025. The company has also opened "customer hubs" in San Francisco and Dallas and plans a third in New York by the end of the year.

The Nebius Group's Additional Businesses

Under the Nebius Group umbrella, there is a triumvirate of additional businesses. Avride is an autonomous vehicle company based in Texas, descending from Yandex's self-driving unit. It was an early trailblazer in Russia but had its plans disrupted by the war. The team working on Yandex's autonomous vehicle project transitioned to Avride last year and is now based in Austin via Tel Aviv.

Last month, Avride announced a significant multiyear partnership with Uber, with its sidewalk food delivery robots landing on Uber Eats in Austin and its self-driving cars set to join the Uber platform later.

Toloka is a platform specializing in data labeling and quality control for large language models and related AI systems. It has clear synergies with Nebius's core infrastructure business but serves different customers. Nebius works with generative AI startups seeking compute, while Toloka works with bigger companies like Amazon and Hugging Face.

TripleTen, on the other hand, is a direct-to-consumer product offering online coding bootcamps for those looking to transition into the technology sector. It is somewhat of an outlier in the Nebius group and is currently breaking even. Volozh acknowledges it won't be a big revenue driver like the infrastructure business but has potential to provide meaningful income and will remain part of the group.

The Core Nebius AI Cloud Business

For the core Nebius AI cloud business, the company already has its fully owned data center facility in Finland and plans to triple its capacity to 75 megawatts. It is also building out additional sites at co-location facilities to increase capacity and reduce latency by bringing processing closer to customers. In addition to the Kansas City location announced this week, Nebius has already unveiled a new GPU cluster in Paris that goes live this month.

Further down the line, Nebius plans to build more of its own data centers in both Europe and the U.S. Given the time it takes to build, using co-location facilities is a quicker way to bridge the gap, which is why it is adopting a hybrid approach.

Volozh emphasized that building data centers takes a long time, about a year and a half to two years, and they can't wait. That's why they have these co-locations in Paris and Kansas City.