
Column: This AI chatbot was ‘trained’ using my books, but don’t blame me for its incredible stupidity

OpenAI’s ChatGPT chatbots are at the center of the controversy over the “training” of artificial intelligence systems with copyrighted works — without compensation for the authors.
(Michael Dwyer / Associated Press)

I’ve just discovered that I am part of the AI chat revolution. Please don’t hate me.

My role is as the author of three of the nearly 200,000 books being pumped into the electronic brain of LLaMa, the chatbot developed and distributed by Meta Platforms (formerly Facebook), in competition with the better-known ChatGPT bots marketed by OpenAI.

Alex Reisner of the Atlantic compiled a handy search tool for the database, which is known as Books3, giving authors the world over an opportunity to hunt for their names and decide how to think about the results.



I haven’t quite decided for myself — on the one hand, I’m a bit peeved that only three of my seven books have been putatively used to “train” LLaMa; on the other, I’m given to pondering what my contribution should be worth, and why I shouldn’t get paid for it.

The reactions of other authors, prominent and not so prominent, have been all over the map. Some have expressed convincing outrage. They include the novelists John Grisham, George R.R. Martin, Scott Turow and others who are members of the Authors Guild and among the plaintiffs in a copyright infringement lawsuit filed against OpenAI, and Sarah Silverman, a plaintiff in a similar lawsuit against Meta Platforms.

Some have turned to social media to express their irritation or outright fury, including Margaret Atwood and the novelist Lauren Groff.

Then there’s the camp that asks, what’s the big deal? For example, Ian Bogost, the author or co-author of 10 books, mostly about game-playing, wrote a recent article for the Atlantic titled “My Books Were Used to Train Meta’s Generative AI. Good — It can have my next one too.”

Finally, there’s Stephen King, whose reaction to a database listing 87 of his works appears to be something akin to resignation. “Would I forbid the teaching (if that is the word) of my stories to computers?” he writes. “Not even if I could. I might as well be King Canute, forbidding the tide to come in.”



Before delving further into the legal issues, let’s take a detour into what the database and its use mean in the context of “generative AI,” the technology category to which these chatbots belong.

As I’ve written before, for these products the term “artificial intelligence” is a misnomer. They’re not intelligent in anything like the sense that humans and animals are intelligent; they’re just designed to seem intelligent to an outsider unaware of the electronic processes going on inside.

Indeed, using the very term distorts our perception of what they’re doing. They’re not learning in any real sense, such as creating perceptions of the world around them based on the information they already have in their circuits.

They’re not creative in any remotely human sense: “Creativity can’t happen without sentience,” King observes, though he hedges his bet by answering his own question of whether the systems are creative with the words, “Not yet.”


Chatbot developers “train” their systems by infusing them with the trillions of words and phrases present on the internet or in specialized databases; when a chatbot answers your question, it’s summoning up a probabilistic string of those inputs to produce something bearing a resemblance — often a surprising resemblance — to what a human might produce. But it’s mostly a simulacrum of human thought, not the product of cogitation.
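
To make that concrete, here is a toy illustration of my own devising (a minimal Python sketch, nothing resembling how a production chatbot is actually built): a “bigram” model that “trains” by counting which word follows which in a text, then generates new text by sampling from those counts. The neural networks behind real chatbots are vastly more elaborate, but the underlying move, probabilistic next-word prediction, is the same.

    import random
    from collections import defaultdict, Counter

    # "Training": tally how often each word follows each other word
    # in a (here, absurdly tiny) corpus.
    corpus = "the cat sat on the mat the cat ate the rat".split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev][nxt] += 1

    # "Generation": starting from a word, repeatedly sample the next
    # word in proportion to how often it followed the current one.
    def generate(start, length=8):
        word, output = start, [start]
        for _ in range(length):
            options = follows.get(word)
            if not options:  # dead end: no observed successor
                break
            words, weights = zip(*options.items())
            word = random.choices(words, weights=weights)[0]
            output.append(word)
        return " ".join(output)

    print(generate("the"))  # e.g. "the cat sat on the mat the cat ate"

Run it a few times and you get different, superficially fluent strings, none of which reflects the slightest understanding of cats or mats — only the statistics of the words it was fed.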

What’s gratifying about the disclosure that Books3 has been used to “train” LLaMa is that it underscores how everything and anything spewed out by chatbots comes, at its core, from human sources.

Although OpenAI refuses to disclose what it uses to “train” ChatGPT, it’s almost certainly doing something similar. (Meta hasn’t formally acknowledged using Books3, but the database’s role was disclosed in a technical paper by LLaMa’s developers at the company.)


Another important point to keep in mind is that none of this training has yet enabled developers to solve the most important and persistent problem with the chatbots: They get things wrong, often spectacularly so.

When they can’t find factual material to answer a question, they tend to make it up or cite irrelevancies; the answers’ resemblance to human thought and speech misleads users into taking them at face value, leading to not a few embarrassing and costly consequences.

That’s endemic in the AI field generally. As recently as Sept. 20, the prestigious journal Nature retracted a paper by Google researchers that had reported that an AI system needed only a few hours to design computer chips that required months of work by human designers. The paper’s author reportedly concluded that the opposite was true.

In my case, the sad truth is that however rigorously LLaMa was “trained” with my books, it didn’t seem to have learned much. Indeed, its responses to my questions showed it to be as much of an idiot as its cousins in the generative AI family.


When I asked what it knew about me, its answer was a melange of a biobox published on latimes.com and mentions of three books, none of which are listed in the Books3 database: one that isn’t by me (though I’m cited in its endnotes) and two that, from what I can tell, don’t exist at all. It did, however, label me as “a highly respected and accomplished journalist who has made significant contributions to the field of journalism,” which suggests it isn’t entirely lacking in sagacity and sound judgment.

When I asked LLaMa to describe the three books that are in the Books3 database, its answers were assembled from boilerplate that could have come from the blurbs on the book covers, plus outright, even bizarre, errors.


That brings us back to the concerns raised in the literary world. If the reactions by established writers seem confused, it’s mostly because copyright law is confusing. That’s especially true when the topic is “fair use,” a carve-out from authorial rights that allows portions of copyrighted works to be used without permission.

Fair use is what allows snippets of published works to be quoted in reviews, summaries, news reports or research papers, or to be parodied or repurposed in a “transformative” way.

What’s “transformative”? As a digest from the Stanford libraries puts it, “millions of dollars in legal fees have been spent attempting to define what qualifies.... There are no hard-and-fast rules, only general guidelines and varied court decisions.”

That’s especially true when a new technology emerges, such as digital reproduction or, now, the training of chatbots.

The lawsuit filed against OpenAI by the novelists and the Authors Guild asserts that OpenAI copied their works “wholesale, without permission or consideration [that is, payment],” amounting to “systematic theft on a grand scale.”

The authors observe that the U.S. Patent and Trademark Office has found that AI “‘training’ ... almost by definition involve[s] the reproduction of entire works or substantial portions thereof.” They say that “training” is merely “a technical-sounding euphemism for ‘copying and ingesting.’”


The authors say that the OpenAI chatbots “endanger fiction writers’ ability to make a living,” because they “allow anyone to generate ... texts that they would otherwise pay writers to create.” The bots “can spit out derivative works: material that is based on, mimics, summarizes, or paraphrases Plaintiffs’ works, and harms the market for them.”

Those are crucial assertions, because interference with the marketability of a copyrighted work is a key factor weighing against a fair-use defense in court.


It’s worth mentioning that the encroachment of AI into the market for professional skills was a key factor in the recent strike of Hollywood writers, and remains so for the actors still on strike. Limitations on the use of AI are a major provision of the contract that settled the writers strike, and are sure to be part of any settlement with the actors.

The lawsuit brought by Silverman and her fellow plaintiffs against Meta tracks the Authors Guild case closely. It may not help Meta’s defense that Books3 is itself the alleged product of piracy; at least some of the works in it are drawn from illicit versions circulating on the web. Indeed, one host of the database took it offline following a complaint from a Danish anti-piracy organization.

Meta, in its response to the Silverman lawsuit, maintains that its use of Books3 is “transformative by nature and quintessential fair use.” (Its motion to dismiss the case is scheduled to be heard by a federal judge in San Francisco on Nov. 16.) The company says that the plaintiffs can’t point to “any example” of LLaMa’s output that reproduces any part of their work. That may be true, but it will be up to Judge Vincent Chhabria to decide whether it’s relevant.

Meta also implies that it’s doing the world a favor by building up LLaMa’s capabilities, which it says are among “the clearest cases of the substantial potential benefits AI can offer at scale to billions of people.” If this sounds a bit like Meta’s defenses against accusations that it has infringed on its users’ privacy for profit — that it’s only providing information to others who will make the world a better place — that’s probably not an accident.


Bogost argued in the Atlantic that training bots with published and copyrighted material shouldn’t require the originators’ permission — that it isn’t fundamentally different from what happens when a reader recommends a book to a friend or relative. “One of the facts (and pleasures) of authorship is that one’s work will be used in unpredictable ways,” he writes.

But in this context, that’s absurd. Recommending a book doesn’t involve copying it. Even lending or gifting a book to another is perfectly lawful, since at some point in the process the book was purchased, and some portion of the purchase price ended up in the author’s pocket.

That’s not the case here. OpenAI and Meta are commercial enterprises that expect to make a mint from their chatbots. To the extent they’re using copyrighted material to build their functionality, they owe something to the creators.

Maybe now I know what to think about the use of my books to “train” these machines, especially if no one in the Books3/Meta or OpenAI chain paid for them. It may be hard to discover what role they played in the “training,” but whatever it was, it shouldn’t come for free.
