

Creating an AI model is commercial work. These models are made to make money, and they depend on other artists’ data to train on. The models would be useless if they had nothing to train on.
I hold the stance that using copyrighted data as part of a training set is a violation of copyright. That still hasn’t been fully challenged in court, so there’s no specific legal definition yet.
Because these models require copyrighted material to function, I feel that their makers are using copyrighted works to build a commercial product.
Also, AI doesn’t learn. LLMs build statistical models based on the sentence structure of what they’ve seen before. There’s no real understanding or inherent knowledge, and nothing new is being added.
Too bad.
If you can’t afford to pay the authors of the data your project needs to work, then that sucks for you, but it doesn’t give you the right to take whatever you want and violate copyright.
Making a data-agnostic model and releasing the source is fine, but a released, trained model owes royalties to the authors of its training data.