Riskable

Riskable@programming.dev · 2 days ago

If you hired someone to copy Ghibli’s style, then fed that into an AI as training data, it would completely negate your entire argument.

It is not illegal for an artist to copy someone else’s style. They can’t copy another artist’s work—that’s a derivative—but copying their style is perfectly legal. You can’t copyright a style.

All of that is irrelevant, however. The argument is that training an AI with anything is somehow a violation of copyright. It is not. It is absolutely 100% not a violation of copyright to do that!

Copyright is all about distribution rights. Anyone can download whatever TF they want and they’re not violating anyone’s copyright. It’s the entity that sent the person the copyrighted work that violated the law. Therefore, Meta, OpenAI, et al. can host enormous libraries of copyrighted data in their data centers and use that to train their AI. It’s not illegal at all.

When some AI model produces a work that’s so similar to an original work that anyone would recognize it (“yeah, that’s from Spirited Away”), then yes: they violated Ghibli’s copyright.

If the model produces an image of some random person in the style of Studio Ghibli, that is not violating anyone’s copyright. It is not illegal nor is it immoral. No one is deprived of anything in such a transaction.

Riskable@programming.dev · 2 days ago

I think your understanding of generative AI is incorrect. It’s not just “logic and RNG”…

If it runs on a computer, it’s literally “just logic and RNG”. It’s all transistors, memory, and an RNG.

The data used to train an AI model is copyrighted. It’s impossible for something to exist without copyright (in the past 100 years). Even public domain works had copyright at some point.

if any of the training data is copyrighted, then attribution must be given, or at the very least permission to use this data must be given by the current copyright holder.

This is not correct. Every artist ever has been trained with copyrighted works, yet they don’t have to recite every single picture they’ve seen or book they’ve ever read whenever they produce something.

Riskable@programming.dev · 6 days ago

I’m still not getting it. What does generative AI have to do with attribution? Like, at all.

I can train a model on a billion pictures from open, free sources that were specifically donated for that purpose and it’ll be able to generate realistic pictures of those things with infinite variation. Every time it generates an image it’s just using logic and RNG to come up with options.

Do we attribute the images to the RNG god or something? It doesn’t make sense that attribution comes into play here.

Riskable@programming.dev · 25 days ago

They created golems powered by compressed air instead of magic.

Definitely marking this down in my mental, “in case of Isekai” notes.

Riskable@programming.dev · 27 days ago

If you studied loads of classic art then started making your own would that be a derivative work? Because that’s how AI works.

The presence of watermarks in output images is just a side effect of the prompt and its similarity to training data. If you ask for a picture of an Olympic swimmer wearing a purple bathing suit and it turns out that only a hundred or so images in the training data match that sort of image–and most of them included a watermark–you can end up with a kinda-sorta similar watermark in the output.

It is absolutely 100% evidence that they used watermarked images in their training. Is that a problem, though? I wouldn’t think so since they’re not distributing those exact images. Just images that are “kinda sorta” similar.

If you try to get an AI to output an image that matches someone else’s image nearly exactly… is that the fault of the AI or the end user, specifically asking for something that would violate another’s copyright (with a derivative work)?

Riskable@programming.dev · 27 days ago

…in the same way that someone who’s read a lot of books can make money by writing their own.

Riskable@programming.dev · edit-2 27 days ago

I wasn’t being pedantic. It’s a very fucking important distinction.

If you want to say “unethical” you say that. Law is an orthogonal concept to ethics. As anyone who’s studied the history of racism and sexism would understand.

Furthermore, it’s not clear that what Meta did actually was unethical. Ethics is all about how human behavior impacts other humans (or other animals). If a behavior has a direct negative impact that’s considered unethical. If it has no impact or positive impact that’s an ethical behavior.

What impact did OpenAI, Meta, et al have when they downloaded these copyrighted works? They were not read by humans–they were read by machines.

From an ethics standpoint that behavior is moot. It’s the ethical equivalent of trying to measure the environmental impact of a bit traveling across a wire. You can go deep down the rabbit hole and calculate the damage caused by mining copper and laying cables but that’s largely a waste of time because it completely loses the narrative that copying a billion books/images/whatever into a machine somehow negatively impacts humans.

It is not the copying of this information that matters. It’s the impact of the technologies they’re creating with it!

That’s why I think it’s very important to point out that copyright violation isn’t the problem in these threads. It’s a path that leads nowhere.

Riskable@programming.dev · 27 days ago

Would you say your research is evidence that the o1 model was built using data/algorithms taken from OpenAI via industrial espionage (like Sam Altman is purporting without evidence)? Or is it just likely that they came upon the same logical solution?

Not that it matters, of course! Just curious.

Riskable@programming.dev · 27 days ago

This completely ignores all the endless (open) academic work going on in the AI space. Loads of universities have AI data centers now and are doing great research that is being published out in the open for anyone to use and duplicate.

I’ve downloaded several academic models, and all commercial models and AI tools are based on that public research.

I run AI models locally on my PC and you can too.

Riskable@programming.dev · edit-2 27 days ago

They’re not illegally harvesting anything. Copyright law is all about distribution. As much as everyone loves to think that when you copy something without permission you’re breaking the law, the truth is that you’re not. It’s only when you distribute said copy that you’re breaking the law (aka violating copyright).

All those old school notices (e.g. “FBI Warning”) are 100% bullshit. Same for the warning the NFL spits out before games. You absolutely can record it! You just can’t share it (or show it to more than a handful of people but that’s a different set of laws regarding broadcasting).

I download AI (image generation) models all the time. They range in size from 2GB to 12GB. You cannot fit the petabytes of data they used to train the model into that space. No compression algorithm is that good.

The same is true for LLMs, RVC (audio models) and similar models/checkpoints. I mean, think about it: If AI is illegally distributing millions of copyrighted works to end users they’d have to be including it all in those files somehow.
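To put rough numbers on that, here's a back-of-the-envelope sketch. The 1 PB figure is an assumption for illustration (the low end of "petabytes"), and `required_ratio` is just a hypothetical helper name. It computes how good a compression scheme would have to be to pack that much training data into a model file of the sizes mentioned above.

```python
# Assumption: 1 PB of training data, the low end of "petabytes".
TRAINING_BYTES = 1 * 10**15


def required_ratio(model_gb: float, training_bytes: int = TRAINING_BYTES) -> float:
    """Compression ratio needed to fit training_bytes into a model of model_gb gigabytes."""
    return training_bytes / (model_gb * 10**9)


if __name__ == "__main__":
    # Model sizes taken from the 2GB to 12GB range mentioned above.
    for size_gb in (2, 12):
        print(f"{size_gb} GB model: {required_ratio(size_gb):,.0f}:1")
```

Under that assumption, a 2GB model would need a ratio of 500,000:1 and a 12GB model roughly 83,333:1, and that's before counting any training set bigger than a single petabyte.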

Instead of thinking of an AI model like a collection of copyrighted works think of it more like a rough sketch of a mashup of copyrighted works. Like if you asked a person to make a Godzilla-themed My Little Pony and what you got was that person’s interpretation of what Godzilla combined with MLP would look like. Every artist would draw it differently. Every author would describe it differently. Every voice actor would voice it differently.

Those differences are the equivalent of the random seed provided to AI models. If you throw something at a random number generator enough times you could–in theory–get the works of Shakespeare. Especially if you ask it to write something just like Shakespeare. However, that doesn’t mean the AI model literally copied his works. It’s just making its best guess (it’s literally guessing! That’s how they work!).
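Here's a toy sketch of what the seed does. This is not how any real model is implemented: `VOCAB`, `WEIGHTS`, and `generate` are made-up names, and the weights are hypothetical stand-ins for the probabilities a trained model would have learned. The point is only that the output is a weighted guess driven by an RNG, and the seed decides which guess you get.

```python
import random

# Hypothetical vocabulary and weights standing in for learned probabilities.
VOCAB = ["to", "be", "or", "not", "that", "is", "the", "question"]
WEIGHTS = [5, 4, 3, 3, 2, 2, 1, 1]


def generate(seed: int, length: int = 6) -> list[str]:
    """Pick `length` words by weighted random choice, driven entirely by the seed."""
    rng = random.Random(seed)  # private RNG so the result depends only on the seed
    return rng.choices(VOCAB, weights=WEIGHTS, k=length)


if __name__ == "__main__":
    print(generate(seed=1))
    print(generate(seed=1))  # same seed, same "guess"
    print(generate(seed=2))  # different seed, usually a different "guess"
```

Same seed, same output every time. Change the seed and you usually get a different sequence, even though nothing about the "model" changed.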

Riskable@programming.dev · 1 month ago

Nope. In fact, if you generate a lot of images with AI you’ll sometimes notice something resembling a watermark in the output. Demonstrating that the images used to train the model did indeed have watermarks.

Removing such imaginary watermarks is trivial in image2image tools though (it’s just a quick extra step after generation).

Riskable@programming.dev · 1 month ago

To be fair, when it comes to stock photos the creatives already got paid. You’re just violating the copyright of a big corporation at that point (if you distribute the images… If you never distribute the images then you’ve committed no crime).

Riskable@programming.dev · 1 month ago

Why stop at “AI-generated”? Why not have the individual post their entire workflow, showing which model they used, the prompt, and any follow-up editing or post-processing they did to the image?

In the 90s we went through this same shit with legislators trying to ban photoshopped images (hah: They still try this from time to time). Then there were attempts at legislating mandatory watermarks and similar concepts. It’s all the same concept: New technology scary, regulate and restrict it.

In a few years AI-generated content will be as common as photoshopped images and no one will bat an eye because it’ll “just be normal”. A photographer might take a picture of a model (or a number of them) for a cover or something then they’ll use AI to change the image after. Or they’ll use AI to generate an image from scratch and then have models try to copy it. Or they’ll just use AI to change small details in the image such as improving lighting conditions or changing eye color.

AI is very rapidly becoming just another tool in photo/video editing and soon it will be just another tool in document writing and audio recording/music creation.

Riskable@programming.dev · edit-2 1 month ago

Not a bad law if applied to companies and public figures. Complete wishful thinking if applied to individuals.

For companies it’s actually enforceable, but for individuals it’s basically impossible, and even if you do catch someone uploading AI-generated stuff: who cares? It’s the intent that matters when it comes to individuals.

Were they trying to besmirch someone’s reputation by uploading false images of that person in compromising situations? That’s clear bad intent.

Were they trying to incite a riot or intentionally spreading disinformation? Again, clear bad intent.

Were they showing off something cool they made with AI generation? It is of no consequence and should be treated as such.

Riskable@programming.dev · 2 months ago

I dunno. What kind of service can you get with LowG™?

Riskable@programming.dev · 9 months ago

Market shows that hype is a cycle and the AI hype is nearing its end.