Dropsitenews published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.
Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.
Full article here.
Link to the full leaked list download: Meta leaked list pdf
Ahh, really?! Thanks for letting me know. I will see if there is something I can do to throttle that after the holidays. Curious to see what solutions others come up with.
I think Science Memes may make it hallucinate more, tbf.
PS: https://anubis.techaro.lol/
That’s interesting. I still don’t fully understand the implications from a user-experience perspective. It looks as if the proof-of-work would go unnoticed when using a user client but would present a more significant challenge for an automated scraping bot. So, it does look promising. I still don’t understand what it would do to a bot such as a ‘PlantID bot’ and other good bots. Do they have a heavy soul? I’ll look into it.
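For anyone else trying to picture the proof-of-work idea, here is a minimal sketch of the general technique in Python. It is not Anubis's actual code: the challenge string, the difficulty value, and the function names `solve` and `verify` are all made up for illustration. The point is just that the client has to burn many hash attempts to find a valid nonce, while the server checks the answer with a single hash.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force the first nonce whose SHA-256 digest
    of challenge + nonce starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check the client's answer."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    # Hypothetical values; a real deployment would issue its own challenge.
    challenge = "example-challenge"
    difficulty = 4
    nonce = solve(challenge, difficulty)
    print(nonce, verify(challenge, nonce, difficulty))
```

Each extra zero digit multiplies the expected work by 16, so a single page load stays cheap for one person but the cost adds up quickly for a scraper hitting thousands of pages. A bot that does not run the challenge at all would simply never get past it, which is why the question about good bots matters.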
For now, I have modified https://mander.xyz/robots.txt, copying the file that Dave from lemmy.nz found to work to prevent at least some scraping and bot load.
¯\_(ツ)_/¯
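If you want to sanity-check what a robots.txt would tell a well-behaved crawler, Python's standard library can evaluate the rules offline. This is only a sketch: the sample rules below are invented for the example and are not the contents of the mander.xyz file, and `PlantIDBot` is a hypothetical bot name.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration; not the real mander.xyz robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: meta-externalagent
Disallow: /

User-agent: *
Disallow: /create_post
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://mander.xyz/c/science_memes"
    print(is_allowed(SAMPLE_ROBOTS_TXT, "meta-externalagent", url))
    print(is_allowed(SAMPLE_ROBOTS_TXT, "PlantIDBot", url))
```

Keep in mind that robots.txt is advisory. It only stops crawlers that choose to respect it, which fits with it preventing "at least some" of the scraping rather than all of it.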
I also don’t know what it would do to HTTP requests from federated instances.