• 2 Posts
  • 7 Comments
Joined 1 year ago
Cake day: August 27th, 2023



  • All LLMs and gen AI use data they don’t own. The Pile, which served as a starting point for most LLMs, is all scraped or pirated info. Image gen is all scraped from the web. Speech-to-text and video gen mainly use YouTube data.

    So either you put a price tag on that data, which means only a handful of companies (including Meta) can afford to build these tools, or you accept that piracy is the only way for most to acquire this data. And since the use is highly transformative, it isn’t breaching copyright or directly stealing from the owners the way piracy “normally” does.

    I’m being pragmatic.




  • Meta has open sourced every single one of their LLMs. They essentially gave birth to the whole open LLM scene.

    If they start losing all these lawsuits, the whole scene dies and all those nifty models and their fine-tunes get removed from Hugging Face, to be repackaged and sold to us with a subscription fee. All the other domestic open source players will close down.

    The copyright crew aren’t the good guys here, even if they’re spearheaded by Sarah Silverman and Meta has traditionally played the part of the villain.