This post is closed.

exclude Common Crawl via robots.txt? #418

Hey folks,

Common Crawl is a public web crawl whose data is mined for image links, and those images are then used to train AI image generators such as DALL-E and Stable Diffusion. These generators can be asked to copy the style of human artists, which cheapens their work.

I suspect that very few people who opt to have public profiles on Glass want their photos to be used this way. Exclusion is as simple as adding this to /robots.txt:

User-agent: CCBot
Disallow: /
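
For anyone who wants to sanity-check the rule before it ships, here is a minimal sketch using Python's standard-library urllib.robotparser. The helper name is_allowed and the example.com URL are made up for illustration, and this only shows how a compliant parser reads the rule; robots.txt is advisory, so it only works for crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# The two lines proposed above, exactly as they would appear in /robots.txt.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant parser would let user_agent fetch url.

    Hypothetical helper for illustration; it parses the text in memory
    instead of fetching a live robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL, not a real profile page.
    url = "https://example.com/photo/1"
    print("CCBot allowed:", is_allowed(ROBOTS_TXT, "CCBot", url))
    print("Other crawler allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

Because the rule names only CCBot, other crawlers (search engines included) are unaffected.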

a year ago
Changed the status to Completed
a year ago

Great idea! Done!

a year ago