Preet Bharara:
From Cafe and the Vox Media Podcast Network, this is Stay Tuned In Brief. I’m Preet Bharara. In recent weeks, there’s been a lot of debate over the risks that artificial intelligence poses to many things, including intellectual property rights. A number of content owners have filed lawsuits against AI developers, accusing them of infringing copyrights and trademarks. Meanwhile, YouTube is partnering with record labels to set rules for how AI-generated music is treated on its website. News organizations like The New York Times have begun to block AI web crawlers from accessing information on their sites. Joining me to discuss these issues is Nilay Patel. He’s editor-in-chief of The Verge and host of the podcast Decoder. Nilay, welcome to the show.
Nilay Patel:
Thanks for having me.
Preet Bharara:
We’re going to have a brief conversation about AI, but we should note for folks that you and I will be at the Code Conference in California, where a lot of the discussion will be about AI, won’t it?
Nilay Patel:
Yeah. We’re going to have the CTO of Microsoft, Kevin Scott. They’re deeply invested in AI, particularly OpenAI, which makes ChatGPT, all the way to the CEO of Getty Images, Craig Peters, whose company is suing a company called Stability AI for using copyrighted Getty photos to train its AI systems. I think we’re going to have a pretty rich conversation at Code.
Preet Bharara:
I think it’ll be a lot of fun and very instructive. I want to get to the intellectual property and copyright issues in a moment. Given what went on in the last few days, we saw something fairly extraordinary. Senator Schumer, my old boss, convened a forum, the AI Insight Forum, in Washington, D.C., that more than two-thirds of all United States Senators attended, along with basically a who’s who of technology and AI people. Elon Musk, who gets talked about from time to time, called it an historic gathering.
He also, in talking about AI, said, “The question is really one of civilizational risk.” He said, “There is some chance that is above zero that AI will kill us all. I think it’s low, but if there’s some chance, I think we should also consider the fragility of human civilization.” What’s your reaction to the forum? Was it as big a deal as people are saying, or not?
Nilay Patel:
I think it’s a big deal when the tech industry acknowledges that the government has a role to play in regulating anything. Until now, the common view in Silicon Valley has been that the government is too slow, doesn’t understand the workings of technology or technology markets, and by the time it solves any one problem, the tech industry will have moved on to the next one. “That stuff is old news.” This is one of the first times I can think of that is really forward-looking. You have a bunch of people …
Preet Bharara:
You said one of the first times. Can you think of any other time where there has been a consensus among, I guess you can call them, technologists or tech entrepreneurs, businesspeople, that government must get involved?
Nilay Patel:
Outside of infrastructure buildouts … capital-intensive telecom buildouts, “We need more fiber” … it’s really hard to see a moment where the government’s interests and political will aligned with the industry’s. The example I would give you: we lived through years of talking about privacy regulation and social media regulation in the Trump era, and, one, I think everyone just forgot that the First Amendment existed, right? Facebook is like, “Please regulate us.” There’s a huge barrier to doing that. The government cannot pass these speech regulations, and the political will in the government to do it was just not there, even though both sides said they wanted to.
Preet Bharara:
Did anything in particular get accomplished at this forum?
Nilay Patel:
I don’t think so. I think we’re all doing a bunch of posturing and nodding at each other. There’s certainly no legislation that is anywhere close to passing. I do think that it is very important for the government to know these characters, to convene these summits …
Preet Bharara:
Characters is a good phrase.
Nilay Patel:
… they’re all full of characters … and to make them people instead of cartoons. I think one of the things that happened to, say, Mark Zuckerberg is that he became a cartoon instead of a person with a very big business to run. You could assign the cartoon any villainy you wanted, without ever understanding what his motivations are. I do think there’s something here that’s important, that Senator Schumer knows a lot of these people now, and feels like they’re actual people who are making decisions about very important technology. There is a long road between what we see now with AI systems and it killing us all. We’re nowhere down that road, and just the fact that the industry is like, “Maybe that road should have some speed limits on it,” is a pretty notable shift.
Preet Bharara:
Good. Since we’re not going to die immediately, let’s talk about some of these, to me, very interesting copyright and intellectual property issues. Let’s start with how it is that a chatbot or a large language model such as ChatGPT knows what it knows, how it gets trained, because I think some of this emanates just from that. Can you explain to folks how a chatbot like ChatGPT, or another example, learns?
Nilay Patel:
Yeah, and these are really interesting words to use in the context of these systems. ChatGPT doesn’t know anything. It’s not actually reasoning. It’s doing something that can often look like reasoning or be substituted for reasoning. Fundamentally, what a transformer model does … it’s the T in GPT … is very fast predictive statistics. These are auto-complete systems, at just a massive scale.
Preet Bharara:
It’s about language, not reasoning or thinking?
Nilay Patel:
Yeah. There’s an argument now, as these systems get more advanced, that something like reasoning could be said to be happening, but it’s so esoteric as to be irrelevant in the current moment. Fundamentally, the model is basically saying, “Okay, you’ve asked me for some output. I’ve ingested an enormous amount of data, and I know the statistical relationships between words.”
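To make “auto-complete at a massive scale” concrete, here is a minimal sketch of next-word prediction using a toy bigram counter. Everything in it is illustrative and invented for this example; a real transformer like the one behind ChatGPT learns its statistics over trillions of tokens with billions of parameters, not a lookup table, but the underlying task of predicting the next token from what came before is the same.

```python
import random
from collections import Counter, defaultdict

# Toy "language model": count which word follows which in a tiny corpus.
# A real transformer learns these statistics across trillions of tokens
# with billions of parameters, but the core task is the same:
# predict the next token from the tokens that came before.
corpus = "the cat sat on the mat and the cat ate the fish".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(prev):
    """Sample the next word in proportion to how often it followed `prev`."""
    options = follows[prev]
    if not options:  # `prev` never appeared mid-corpus: dead end
        return None
    words, counts = zip(*options.items())
    return random.choices(words, weights=counts)[0]

# "Generate" text by repeatedly predicting the next word.
word = "the"
output = [word]
for _ in range(6):
    word = next_word(word)
    if word is None:
        break
    output.append(word)
print(" ".join(output))  # e.g. "the cat sat on the mat"
```

The model stores no meaning at all, only which tokens tend to follow which, which is the sense in which Patel says these systems don’t actually know anything.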
Preet Bharara:
That enormous amount of data, talk about that. What is the data that has been ingested?
Nilay Patel:
Depending on the system that you’re talking about, depending on the model you’re talking about, it could be the entire World Wide Web, right? It stands to reason that Google and OpenAI have ingested most of the web. These companies are being sued. OpenAI is being sued for having ingested pirated collections of books. Sarah Silverman and others say, “Hey, you ingested our books into the ChatGPT training dataset from a pirated source. That’s copyright infringement.”
Preet Bharara:
Put aside the pirated source. I guess the question is, if the materials ingested by ChatGPT or some other large language model include copyrighted works, whether articles or books, pirated or otherwise, is the argument that the ingested copyrighted works render illegal the output generated from that learning and training?
Nilay Patel:
Yes, but I think it’s even dumber than that. Copyright law is very dumb.
Preet Bharara:
You have a view?
Nilay Patel:
I have a view. I’m a failed copyright lawyer. I was no good at it. I hated doing other people’s paperwork, but I went to become a copyright lawyer because I was so interested in technology, and almost every new technology is gated by copyright law. The dynamic of the industry is that all computers do is make copies. At every turn, at every step change of technology, copyright comes in and says, “Well, don’t make these copies.” That’s the cassette tape and that’s the VHS and that’s Napster, and on and on and on.
Fundamentally, if you are not authorized to make a copy … a copy from the page of a book to the camera roll on your iPhone, a copy from memory to your display … these are all things that have been litigated, and the courts have had to come in and say, “Okay, these copies are okay and these copies aren’t.” In this case, some of the early lawsuits are saying, “Look, even just copying this material into your computer so that you might do some training is unauthorized use.” Then there’s the next turn, which is: now you’ve copied them all. Even if we concede that that is fair use, which no one wants to do, the output is an illegal derivative work, and we need to be compensated for that.
There are just no norms here. There are no boundaries in this industry. There’s no set of previous contracts people can look at. I don’t think there’s even a lot of good faith between all these parties to have these conversations. I think we’re just full-on into litigation, wanting courts to determine what is and isn’t fair use here. That first step, “You copied my book into your computer,” is the first thing that’s being litigated.
Preet Bharara:
You reminded me of something that I’d forgotten about, and I should mention incidentally, since you made a concession along these lines, I think Copyright was my worst class in law school. I did take it, and it was not a great moment for me.
Nilay Patel:
Yeah, copyright law is inherently counterintuitive. The folk copyright law that people believe they live under has almost nothing to do with the actual law.
Preet Bharara:
I now vaguely remember, in the news when I was young, hearing about the issue of VHS tapes … young people may not know what those are … but that was litigated favorably towards the making of those tapes.
Nilay Patel:
Right, so that’s Sony versus the movie studios. There was a standard that emerged from the Supreme Court in that case. It was actually Betamax being litigated. Doesn’t matter, but video recording broadly, home taping of television. The standard was that this product, the home video recorder, can exist because it has substantial non-infringing uses. Then years later, the Napster case: Napster’s in court arguing that it has substantial non-infringing uses, and eventually the Supreme Court comes up with yet another standard in the Grokster case.
All of that … the reason people are bad at this, and everyone’s bad at it, is that it’s inherently counterintuitive to our actual experiences. You just have to go on YouTube. YouTube is rife with copyright infringement. It’s rife with people dodging the automated copyright enforcement mechanisms, and they’re all writing in the descriptions of these videos, “No copyright intended.”
Preet Bharara:
Maybe this is analogizable to the AI issues before us today. I can make a copy of a TV show on my VHS cassette or my Betamax cassette, but if I then copy that cassette a thousand times and try to sell them for 10 bucks a pop, that would violate a law, correct?
Nilay Patel:
Yeah. First of all, just the fact of that first copy … because it’s on a Betamax, with this kind of product … there is a court case that says that’s okay. But now you’re distributing; you’re making even more copies. All of those copies … each individual copy you’re making … are unauthorized, and that’s illegal. You take that, by the way, and you say, “Now I have a TiVo. My TiVo, instead of having a tape in it, has a hard drive in it,” so we’ve got to go litigate that again, and that all got litigated yet again. That’s what I mean. It’s unintuitive, because the activities are the same but the technologies are different.
Preet Bharara:
Right, but that’s not what AI is doing. AI, it seems to me, is ingesting all this stuff … and that’ll be litigated again … and then its output is not 100 videotapes of some copyrighted material. Its output, depending on what the query is, is something that is informed by all the ingestion that it has done, in the same way that you or I or any other content maker spews out, in part, stuff that we’ve digested over time, the books we’ve read, the TV shows we’ve watched, the movies we’ve attended. Is that the argument? It’s not a legal argument. Is that the plain-language argument against these lawsuits?
Nilay Patel:
To some extent, yeah, right? Look, in order to learn how to play music, you have to listen to and practice a lot of music. The difference is … this is where the tech industry often fails to remember they’re not regular people … you are not a computer. Famously … you were a prosecutor … people have horrible memories. They’re not reliable. You can ingest. You can watch Harry Potter a million times. You cannot spit Harry Potter back out.
Preet Bharara:
Right. There’s a greater dilution given the feebleness of our minds, in a way.
Nilay Patel:
Right, and that is the power of our minds as well. This is what generative AI has done, is it’s made computers appear to be creative in the way that people can be creative. The mechanism is different, though. You can wander the Earth taking in all of the art that has ever existed, and you can have it in your head. Then when you go to make art, you will change it, not because your mind is feeble but because your mind is powerful. Maybe some of it is you’re filling in gaps, but …
Preet Bharara:
Well, your mind is powerful.
Nilay Patel:
… your ability to fill in gaps is the essence of creativity. Your ability to say, “I will push this farther,” is the essence of creativity. Generative AI is just a statistical representation of the past. The argument that this is how people learn and so we should allow computers to learn in the same way falls down when you actually look at what is happening, because a computer is just copying all of that data and then saying, “Okay, in response to some input, I will statistically recreate what I have in my memory.”
Preet Bharara:
Let’s talk about some actual litigations and how you think they’ll unfold. I’ll pick the one because it has a famous plaintiff, Sarah Silverman, the comedian, who I’m a fan of. Tell us about her suit and similar suits and how they’re going.
Nilay Patel:
Sarah Silverman and a group of authors have sued OpenAI, and potentially others. We’re at the stage of a new technology’s lifecycle where law firms devoted to specific kinds of litigation are beginning to form, and the authors have one of those firms. Their complaint is pretty dead ahead. It says, “Look, we know our material is in your dataset, because when we ask about it, you can tell us about it. ChatGPT can tell us about it. That means you have a copy of it. At no point did we authorize that copy.
“Then, based on information and belief, we’re looking at the data sources that may or may not be here. It appears that you didn’t even buy the book. You just used a dataset online full of pirated books,” so there’s some potential other claim there. I don’t know how this is going to go. The thing that I want to emphasize is that anyone who tells you what a fair use litigation outcome will be in the current moment is absolutely getting paid by one side or the other, because fair use right now …
Preet Bharara:
Do you think it’s up in the air?
Nilay Patel:
I think fair use right now is more of a coin flip than it has ever been. You need only look at the music industry, which is full of sampling, interpolation, and quotes, and at the norms that have developed there because fair use litigation is so unreliable. Famously, Pharrell and Robin Thicke wrote Blurred Lines. The estate of Marvin Gaye sued them, not for any actual copyright infringement … none was ever alleged … but just because the vibe of the song was like a Marvin Gaye song, and they lost. Robin Thicke and Pharrell lost that case. That’s crazy. That is a crazy outcome. Then you look at Ed Sheeran, who just got sued, saying, “Look, if I lose this case, there’s no point in making music.” He won. I couldn’t tell you the difference between those two cases. I couldn’t distinguish them.
Here’s my favorite example. I’ll just give this to you because it’s good and it will stick in your head: The Thong Song, by Sisqo. At one point in the song, he sings just the words, “She was livin’ la vida loca,” and Ricky Martin’s songwriter owns most of the publishing rights to The Thong Song because of those few words, because they didn’t want to litigate.
Preet Bharara:
What about this example that became viral and famous recently of an AI-generated song by Drake?
Nilay Patel:
Yeah, so this is what’s really interesting here. The AI-generated Drake song represents an enormous problem for the music industry, for distributors like YouTube, for Spotify, for all these companies. Because if you are a tech company and you believe that ingesting vast amounts of material to train your AI is fair use, then there’s nothing wrong with the fake Drake song: “We’ve ingested all of Drake’s music, and we’ve had a computer generate a Drake song.” Well, that is exactly what Google is doing with the web. Google owns YouTube. They’ve got to look at this and say, “Okay, how do we split the difference here? How do we make it so it’s okay when we do it, but our existentially important partners, the labels, feel okay about this?”
Preet Bharara:
What’s the law going to say about that?
Nilay Patel:
Right now, the law says nothing about it. It’s just a blank canvas of pending fair use litigation, because fair use is supposed to be determined on a case-by-case basis. Like I said, it’s such a coin flip that the music industry in particular has developed its own standards and practices, its own private fair use copyright standards, to dole out dollars in publishing and songwriting credits, because it cannot rely on the courts. I think what YouTube is doing here is they’ve partnered with Universal Music and they’re going to develop their own system, where they say, “Okay, we can detect something that sounds like Drake’s voice” … Drake is a Universal artist … “and we will send some pennies to Universal Music whenever we detect it.”
I think it is, by the way, extraordinarily dangerous for platforms to have private copyright law, because it means if you are a kid who can just impersonate Drake, now you might be paying some of your revenue to Universal Music for no reason at all, because the automated system is going to have some error rate. We don’t know how it might distinguish between a good impersonation of Drake and an AI impersonation of Drake.
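Nothing about how such a detection system would actually work has been published, so here is a purely hypothetical sketch of the error-rate problem Patel describes: compare a voice embedding from an upload against a reference embedding of the artist, and divert royalties when similarity crosses a threshold. The vectors and the threshold below are invented for illustration; the point is that wherever the threshold sits, it trades false positives (flagging a human impersonator) against false negatives (missing an AI clone).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical voice embeddings. In a real system these would come from a
# learned audio model; these numbers are made up for illustration.
reference_drake = [0.90, 0.10, 0.40]
ai_clone = [0.88, 0.12, 0.41]        # AI-generated imitation
human_imitator = [0.80, 0.20, 0.50]  # a kid doing a good impression

THRESHOLD = 0.99  # arbitrary cutoff; any choice has an error rate

for name, upload in [("ai_clone", ai_clone), ("human_imitator", human_imitator)]:
    score = cosine_similarity(reference_drake, upload)
    print(f"{name}: similarity={score:.4f}, royalties diverted={score >= THRESHOLD}")

# At 0.99 only the AI clone is flagged; lower the threshold to 0.98 and the
# human impersonator's revenue gets diverted to Universal Music too.
```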
Preet Bharara:
What’s the Getty Images case about?
Nilay Patel:
The Getty Images case is actually maybe the easiest one for them to win. Again, I think all of this is a coin flip, but the reason I say it’s probably the easiest is that Stability AI trained its AI on a bunch of Getty images, and when you ask it to spit out an image, because it’s just statistically generating the pixels, it sometimes includes a Getty watermark. I think if you put that in front of a jury and say, “Is this fair, that it’s making fake Getty images because they illegally copied all of ours?” a jury’s going to say of course not. That one is so intuitively understandable to, I think, the average person who might be on a jury that it feels like the one Getty is most likely to win, whereas all the rest of them are coin flips.
Preet Bharara:
Now, on the other side, there’s the question of whether AI-generated content is itself copyrightable, right? Is it?
Nilay Patel:
The Copyright Office right now is leaning towards no, and the parallel it draws is to a very famous case that we at The Verge love talking about: the monkey selfie case. A monkey takes a selfie, and then there’s a long string of litigation about whether that image can be copyrighted.
Preet Bharara:
By the monkey?
Nilay Patel:
By the monkey. It’s a great case.
Preet Bharara:
That’s a smart monkey.
Nilay Patel:
Incredible case. I encourage everyone to go read the history of it. It’s deeply funny. At every turn, right, the United States justice system is contending with the monkey selfie, very seriously. It’s deeply hilarious. Eventually the courts come to the conclusion, “Look, the monkey’s not a person. This was not an act of creation. He just hit the button; we don’t know what he intended. There is no connection between the action and the creation of art.” “We’re just not doing this,” basically, is what they say. That is what they’re saying about AI right now: “Look, you can tell AI to make 5,000 photographs, but you were not involved in the creation of them. The AI is just doing it by rote. There are all these underlying questions. We are not allowing this to happen right now.”
Preet Bharara:
Wait, so that would mean that all of AI-generated content is in the public domain.
Nilay Patel:
To some extent. I think there’s a pretty big question about the boundary between how much you guide the AI and what then is copyrightable to you. In these early cases, the artists have been saying, “Look, I did nothing,” right? These are test cases, and the Copyright Office is saying, “Look, you’re telling us you didn’t do anything, so of course not.” Now, there’s some other boundary here, where I ask ChatGPT to write the first draft of something and I go and change it a bunch, and now it’s some hybrid creation. I don’t think we have an answer to that question yet.
Preet Bharara:
Is there some other legal question that we haven’t yet discussed that you think is either very, very important or very, very interesting in this area?
Nilay Patel:
Yeah. I think fundamentally, if we’re starting from a position of AI might kill us all, then the question of who gets to use it becomes very important, and the question of how we might enforce regulations on who gets to use it is almost impossible to contend with. Fake Drake, for example. There’s some software in the world that can just generate a Drake song. Okay. Maybe you want that software to be illegal. How? Are you going to tell some kid that they can’t run arbitrary code on their MacBook? How? Are you going to tell Apple that it has to regulate the use of arbitrary code on the MacBook? That’s a huge step. That’s a step that probably implicates the First Amendment in some way. How do you stop it? What is the mechanism of saying, “We will not allow this code to be run on computers,” even if we all agree it could kill us all?
The actual step between the policy and the enforcement of the policy, I think, is completely being skimmed over in all these conversations. Even if you start with copyright law … we don’t want the kids to make fake Drake songs … well, YouTube can stop it. They can scan the stuff and not allow it to be distributed, but that doesn’t mean it’s going to stop being made. That, I think, is the hardest legal question of all: how do you enforce a prohibition on the execution of arbitrary computer code? Because basically, we have not been able to do it in all of history.
Preet Bharara:
We are out of time, and I want to confess error on something. That is, it was ludicrous of me to think that 20 minutes was enough to discuss these issues to any level of satisfaction. We will have to have more of this conversation again. Nilay Patel, thanks for being on the show, and I will see you and we’ll discuss a lot more of this at the Code Conference in a couple of weeks.
Nilay Patel:
It was a pleasure. I’ll see you soon.
Preet Bharara:
For more analysis of legal and political issues making the headlines, become a member of the Cafe Insider. Members get access to exclusive content, including the weekly podcast I host with former U.S. Attorney Joyce Vance. Head to cafe.com/insider to sign up for a trial. That’s cafe.com/insider.
If you like what we do, rate and review the show on Apple Podcasts or wherever you listen. Every positive review helps new listeners find the show. Send me your questions about news, politics, and justice. Tweet them to me @PreetBharara with the hashtag #AskPreet. You can also now reach me on Threads, or you can call and leave me a message at 669-247-7338. That’s 669-24-PREET, or you can send an email to letters@cafe.com.
Stay Tuned is presented by Cafe and the Vox Media Podcast Network. The executive producer is Tamara Sepper. The technical director is David Tatasciore. The senior producer is Adam Waller. The editorial producer is Noa Azulai, and the Cafe team is Matthew Billy, David Kurlander, Jake Kaplan, Nat Weiner, Namita Shah and Claudia Hernández. Our music is by Andrew Dost. I’m your host, Preet Bharara. Stay tuned.