@BURN

BURN@lemmy.world · 2 years ago

Too bad

If you can’t afford to pay the authors of the data your project needs in order to work, then that sucks for you, but it doesn’t give you the right to take anything you want and violate copyright.

Making a data-agnostic model and releasing the source is fine, but a released, trained model owes royalties to the authors of its training data.

BURN@lemmy.world · 3 years ago

Creating an AI model is a commercial work. They’re made to make money. Now these models are dependent on other artists’ data to train on. The models would be useless if they weren’t able to train on anything.

I hold the stance that using copyrighted data as part of a training set is a violation of copyright. That still hasn’t been fully challenged in court, so there’s no specific legal definition yet.

Because copyrighted materials are required to make the model function, I feel that they are using copyrighted works in order to build a commercial product.

Also, AI doesn’t learn. LLMs build statistical models based on the sentence structure of what they’ve seen before. There’s no level of understanding or inherent knowledge, and there’s nothing new being added.

BURN@lemmy.world · 3 years ago

It may be freely available for non-commercial works, e.g., photos on Photobucket, the Internet Archive’s free book archives, etc.

Most everything is on the internet these days, copyrighted or not. I’m sure if I googled enough I could find the entire text of Harry Potter for free. I still haven’t purchased it, and technically it’s not legally freely available. But in training these models I guarantee they didn’t care where the data came from, just that it was data.

I’m against piracy as well, for the record, but pretty much everything is available through torrenting and pirate sites at this point, copyright be damned.

BURN@lemmy.world · 3 years ago

LLMs don’t create anything new. They have limited access to what they can be based on, and all the assumptions they make are based on that data. They do not learn new things or present new ideas, only ideas that have already been done and are present in their training data.

BURN@lemmy.world · 3 years ago

But they don’t purchase the data. That’s the whole problem.

And copyright is absolutely violated by training off it. It’s being used to make money and no longer falls under even the widest interpretation of fair use.

BURN@lemmy.world · 3 years ago

They’re stealing a ridiculous number of copyrighted works to train their model without the consent of the copyright holders.

This includes the single-person operations creating art that’s being used to feed the models that will take their jobs.

OpenAI should not be allowed to train on copyrighted material without paying a licensing fee at minimum.