• 1 Post
  • 8 Comments
Joined 5 months ago
Cake day: September 27th, 2025

  • Obviously an AI can’t work without being trained. Neither can a human.

    This is a false equivalency that equates natural learning and human agency with “machine learning”, when they are not remotely the same. This is a common and extremely flawed personification of a mathematical system that simply does not “learn” in the same way that a human being does.

    Contrary to what seems to be a popular belief today, the creative insight of a human artist is not simply a combination of all of the other works of art that they have seen (akin to training data superimposed into a model). A human artist has the x-factors of personal agency, taste, and the constant sensory barrage of simply living as a huge part of their creative development. For every painting that a human artist sees, they see an unknowable number of other things that influence their perception of the world and art.

    This is very much not a legal point that you’re arguing here, by the way, it’s a technical and practical one.

    I should note that it’s a very long-standing and well established principle that style cannot be copyrighted.

    “Style” is not what’s in question. It never was, and it wasn’t a word that I used in my example.

    ML models are not trained on “style”. They are trained on actual works.

    And in many cases (including OpenAI’s) trained on an unimaginable number of full copyrighted works, in their entirety, without license or consent from the copyright holders, oftentimes pirated with DRM circumvented.

    It’s a simple fact of the technology that OpenAI’s Ghibli filter could not have been made without training off of a large amount (probably every frame of every film, if I had to make an educated guess) of their actual artistic work. OpenAI have admitted that much themselves in court.

    Okay, you think that. What do the judges think? That’s what it ultimately comes down to.

    You seem to have forgotten that this is a social media website comments section discussion, not a court of law.

    I’m sharing my personal opinion, with a background in art, music, and programming, not law.

    I’m entitled to do so, and I won’t stop because it should go without saying that the copyright system matters a great deal to people who actually make things.

    If you think you’re above that then I’m not sure why you’re even here, frankly. Are you here to argue that any of this is fair use? I don’t see you making that case… (Maybe slightly timidly making that case, but not really going for it.)

    In the end this topic is central to human culture and society; it’s not some kind of intellectual exercise for only people in blue suits to muse about.

    Welcome to “the court of public opinion”, where Texan judges and Roman popes alike can be wrong.


  • Fair enough, I see what you’re saying.

    I’ll go ahead and share the quote from the court’s decision for context:

    We affirm the denial of Dr. Thaler’s copyright application. The Creativity Machine cannot be the recognized author of a copyrighted work because the Copyright Act of 1976 requires all eligible work to be authored in the first instance by a human being. Given that holding, we need not address the Copyright Office’s argument that the Constitution itself requires human authorship of all copyrighted material. Nor do we reach Dr. Thaler’s argument that he is the work’s author by virtue of making and using the Creativity Machine because that argument was waived before the agency.

    I’m a little bit uncertain based on this summary of the judgement by the Stanford library on copyright and fair use:

    Dr. Thaler sought review of the Copyright Office’s decision in the United States District Court for the District of Columbia. The district court affirmed the Copyright Office’s denial, holding that human authorship is a fundamental requirement under the Copyright Act of 1976. The court also rejected Dr. Thaler’s argument that he should own the copyright under the work-made-for-hire doctrine, as the work was never eligible for copyright protection in the first place.

    Why are they saying that “the work was never eligible for copyright in the first place”? Because Thaler claimed that the AI itself made the work? This all feels a bit like Schroedinger’s Copyrighted Work to me… the work exists, so who made it?

    Generative AI fans would have you believe that they are the author and copyright holder, because they wrote a prompt.

    AI companies might want to argue, like Thaler, that they made the AI, so they are the author and copyright holder.

    My personal opinion is that the prompt and code are both relatively insignificant in comparison to the training data from which the probabilistic machine learning model is derived. The prompt would do nothing without the model, and OpenAI themselves said the quiet part out loud when they argued in court that the creation of a model such as theirs would be “impossible” to achieve without training off of vast amounts of copyrighted works.

    “It would be impossible to train today’s leading AI models without using copyrighted materials … Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens,”

    Clearly the training data itself is the most important piece of the system, which makes a lot of sense to those of us who understand how machine learning and “AI” training actually works on a technical level. They’ve admitted in plain English that their entire product and for-profit business model relies on the use of other people’s work as training data. Sounds to me like they have derived considerable value from other people’s work without any sort of license or compensation…
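
    To make that concrete, here’s a deliberately tiny sketch of my own (a toy character-bigram model, nothing to do with OpenAI’s actual architecture) of the point that the training data effectively is the model: every “parameter” here is literally a count over the training corpus, and with an empty corpus it can generate nothing new.

```python
# Toy illustration: a character-bigram "model" whose every parameter
# is a count derived directly from the training corpus.
from collections import defaultdict

def train(corpus: str) -> dict:
    """Count every adjacent character pair in the corpus.
    The resulting table IS the model; no corpus, no model."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(corpus, corpus[1:]):
        counts[a][b] += 1
    return {a: dict(bs) for a, bs in counts.items()}

def generate(model: dict, start: str, length: int) -> str:
    """Greedily extend `start` using the most frequent next character.
    With an empty model, it can only echo the prompt back."""
    out = start
    for _ in range(length):
        nxt = model.get(out[-1])
        if not nxt:
            break  # the "prompt" does nothing without trained counts
        out += max(nxt, key=nxt.get)
    return out

model = train("the theory there")
print(generate(model, "t", 5))
```

    The prompt (`start`) contributes one character; everything else in the output is a direct function of the corpus, which is the comment’s point scaled down to a few lines.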

    By that logic alone, I would argue that the real copyright holders of generative AI works ought to be, at least in part, the people who provided (wittingly or unwittingly) the training data. They are the ones who made this whole social experiment possible, after all. Data is the new code, so I’m not sure why people expect to be able to use it for free in an unrestricted way.


  • Yeah. But there’s always the risk of being undercut by someone or something cheaper if you’re operating in a workplace with zero standards. After all, you could write a lot of articles if you didn’t give a rat’s ass about the veracity or quality of the information within.

    Good newsrooms are supposed to have standards–that’s what makes them good.

    If the people at Ars had done their jobs to a high standard, the article in question wouldn’t have been written like that in the first place, let alone edited and published as is. They want to fire the writer in question, and the writer wants to blame being sick, but the fact remains that the publishing of that article reveals a systemic problem with how Ars are operating, and a total lack of editorial standards.


  • I’m not a lawyer, maybe you are. I can’t fully speak to the legalities at play.

    But I am a programmer, and speaking technically, AI simply cannot produce an output without consuming other works to be used as training data. In many cases, the training data includes full copyrighted works (images, books, music, etc.) in their entirety.

    I’m also an artist and musician, and someone who takes the matter of copyright seriously as any person who creates things should.

    There are cases where it’s been ruled fair use.

    I’m not sure what the relevance of that is. From what I understand, the scope of those judgments is limited to the specific context of those uses, as well as the jurisdiction in which they were made, right?

    One use might be deemed fair based on the specifics of that particular case, but that doesn’t mean that all uses of AI are fair, nor does it prevent a different/higher court from coming to a different conclusion. After all, the opinions of a court are just that: opinions.

    Reasonable people can disagree with the conclusions of a court, and until this reaches the height of the SCOTUS I don’t think we can pretend like it’s settled law. (And even then, they don’t seem particularly bound to any precedent…)

    It’s worth noting, for the sake of a more complete discussion, that a May 2025 draft report from the United States Copyright Office suggests that many applications of generative AI are unlikely to be considered fair use when all of the various factors are reasonably weighed:

    We observe, however, that the first and fourth factors can be expected to assume considerable weight in the analysis. Different uses of copyrighted works in AI training will be more transformative than others. And given the volume, speed and sophistication with which AI systems can generate outputs, and the vast number of works that may be used in training, the impact on the markets for copyrighted works could be of unprecedented scale.

    As generative AI involves a spectrum of uses and impacts, it is not possible to prejudge litigation outcomes. The Office expects that some uses of copyrighted works for generative AI training will qualify as fair use, and some will not. On one end of the spectrum, uses for purposes of noncommercial research or analysis that do not enable portions of the works to be reproduced in the outputs are likely to be fair. On the other end, the copying of expressive works from pirate sources in order to generate unrestricted content that competes in the marketplace, when licensing is reasonably available, is unlikely to qualify as fair use. Many uses, however, will fall somewhere in between. [Emphasis mine.]

    Going off of basic logic alone…

    I think if you look at something as blatant as the OpenAI Studio Ghibli filter, it’s very clear that the works used in training could have been, and almost certainly should have been, licensed from Studio Ghibli for the creation of such a feature. That goes double considering the output images from those for-profit tools can feasibly be used without restriction, without even the most basic consent from Studio Ghibli as a whole (or from the individual artists, who in Japan may have some claim of copyright over their individual contributions, iirc).

    How can anyone reasonably argue that this is a “fair” way to use Studio Ghibli’s works?

    I guess the courts will decide, potentially swayed by the political and corporate interests of our time. But speaking personally, it doesn’t pass the smell test to me…


  • To be fair to Ars Technica, that doesn’t sound like the case to me.

    The “journalist” in question seems to be suggesting that this was their own bad judgment to use AI to “find relevant quotes” from the source material.

    Having said that, there’s also a senior editor on the byline who hasn’t been held accountable for clearly failing to do their job, which, as I understand it, is to read, edit, and verify the contents of the article. So in a way Ars seems to have a problem with quality whether or not the use of AI was mandated.