Understand the Scraping Landscape
First, it's crucial to understand what you're up against. AI companies train their image-generating models (like Midjourney or DALL-E) on massive datasets containing billions of images scraped from the internet. This includes public social media posts, art portfolios, stock photo sites, and personal blogs. The goal is to teach the AI to recognise patterns, styles, and objects. For artists, this can lead to AI mimicking their unique style. For everyday users, it raises significant privacy concerns about where personal photos end up and how they are used. The fight isn't just about preventing a direct copy; it's about stopping your data from being absorbed into these powerful systems without your permission.
Disrupt Style Mimicry with 'Glaze'
One of the most effective tools for artists is Glaze, a free application developed by researchers at the University of Chicago. Glaze adds a subtle layer of 'noise', or 'perturbations', to your images before you upload them. To the human eye, the image looks almost unchanged, although faint artefacts can be visible on some artwork. But to an AI model, these changes are highly disruptive. When a model tries to learn your artistic style from 'glazed' images, it picks up a distorted style instead of yours, so prompts asking it to draw like you produce something noticeably different. This technique, known as 'style cloaking', acts as a defensive shield, making your work a poor training source for models attempting to replicate your specific aesthetic. It's a proactive step that protects the essence of an artist's signature.
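To make the idea of a 'subtle' change concrete, here is a minimal sketch of a bounded pixel perturbation, where no pixel value moves by more than a small budget. It is not Glaze's algorithm: Glaze computes its changes through an optimisation process, whereas this sketch uses plain random noise, which offers no protection at all. The function name `perturb` and the `epsilon` budget are illustrative assumptions.

```python
import random

def perturb(pixels, epsilon, seed=0):
    """Shift each 0-255 pixel value by at most `epsilon` (hypothetical budget).

    Illustration only: random noise shows what a small, bounded change
    means, but it does NOT cloak an artist's style the way Glaze does.
    """
    rng = random.Random(seed)
    out = []
    for value in pixels:
        shifted = value + rng.randint(-epsilon, epsilon)
        # Keep the result inside the valid 8-bit range.
        out.append(min(255, max(0, shifted)))
    return out

original = list(range(0, 256, 5))   # a toy strip of greyscale pixel values
cloaked = perturb(original, epsilon=4)
largest_change = max(abs(a - b) for a, b in zip(original, cloaked))
print("largest per-pixel change:", largest_change)
```

A change of a few levels out of 255 per pixel is hard to spot by eye, which is why a carefully chosen perturbation of similar size can alter what a model sees without visibly ruining the artwork.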
Fight Back with 'Data Poisoning'
From the same team that created Glaze comes a more offensive tool: Nightshade. While Glaze protects your individual style, Nightshade 'poisons' the training data that a model learns from. It alters your image's pixels in a way that is meant to be hard for humans to notice but that misleads the AI during training. For example, if a model is fed a 'poisoned' image of a dog, Nightshade might trick it into learning that the image shows a cat. If enough poisoned images enter a dataset, the model's performance on that concept degrades: a model trained on poisoned dogs might respond to a request for a dog with cats, or with cats that have dog-like features. It's a form of digital civil disobedience, designed to raise the cost of indiscriminate scraping for AI companies. However, it should be used with care, as its goal is to corrupt the model, not just protect your work.
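The dog-and-cat example can be sketched with a toy 'model' that simply averages the features it sees under each label. This is an illustration of the poisoning idea only, not Nightshade's method or a real image model. The two-number feature vectors, the sample counts, and the function names are all hypothetical.

```python
from statistics import mean

DOG_FEATURES = (0.0, 0.0)    # hypothetical "looks like a dog" feature vector
CAT_FEATURES = (10.0, 10.0)  # hypothetical "looks like a cat" feature vector

def learn_concepts(samples):
    """Toy training: average the feature vectors seen under each label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(axis) for axis in zip(*vectors))
        for label, vectors in by_label.items()
    }

def build_dataset(poisoned_count):
    """100 clean dogs, 100 clean cats, plus `poisoned_count` poisoned samples."""
    clean = [(DOG_FEATURES, "dog")] * 100 + [(CAT_FEATURES, "cat")] * 100
    # Poisoned samples are captioned "dog" but their features read as "cat".
    poison = [(CAT_FEATURES, "dog")] * poisoned_count
    return clean + poison

print(learn_concepts(build_dataset(0))["dog"])
print(learn_concepts(build_dataset(300))["dog"])
```

With no poison, the learned 'dog' concept sits exactly on the dog features. With 300 poisoned samples it sits much closer to the cat features, so this toy model's idea of a dog now looks more like a cat. The proportions here are chosen to make the effect obvious and say nothing about how many images a real attack would need.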
Manage Your Metadata and Opt-Outs
Most digital photos contain EXIF data: hidden information about the camera used, the date, often the location, and sometimes even your name. Stripping this data before uploading won't stop a visual scrape, but it's good digital hygiene that keeps personal information from being collected alongside your image. Separately, site owners have two ways to signal that their images are off-limits for AI training. The first is robots.txt, where rules can disallow specific AI crawlers by user agent, such as OpenAI's GPTBot or Common Crawl's CCBot. The second is the 'noai' and 'noimageai' directives, which go in a page's robots meta tag or X-Robots-Tag HTTP header rather than in robots.txt. Both are requests, not enforcement: well-behaved crawlers honour them, while unethical scrapers may simply ignore them. Even so, it's a foundational step. Also check the privacy and content settings on platforms where you host images, like DeviantArt or Flickr; some platforms, DeviantArt among them, have introduced specific opt-out settings for AI training.
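Both steps can be sketched with Python's standard library. The first function drops the EXIF segment from a JPEG's raw bytes. It is a simplified parser that assumes a well-formed file, so a maintained imaging tool is the safer choice for real photos. The second checks whether a given robots.txt would turn away a named crawler. The function names, the sample rules, and the example.com URL are illustrative assumptions.

```python
import struct
from urllib.robotparser import RobotFileParser

def strip_exif(jpeg: bytes) -> bytes:
    """Return the JPEG bytes without any APP1 segment that carries EXIF data."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("malformed segment marker")
        marker = jpeg[pos + 1]
        if marker == 0xDA:
            # Start of scan: everything after this is image data, copy as-is.
            out += jpeg[pos:]
            return bytes(out)
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segment = jpeg[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos += 2 + length
    out += jpeg[pos:]
    return bytes(out)

# Example rules (an assumption, not a recommended policy): block two AI
# crawlers by user agent and allow everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(crawler_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/art/piece.png"))
print(crawler_allowed(ROBOTS_TXT, "Mozilla", "https://example.com/art/piece.png"))
```

The robots.txt check only shows what a compliant crawler would conclude from the rules. It does nothing to stop a scraper that never reads the file.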
Revisit Watermarks and Privacy Settings
The classic watermark, your name or logo overlaid on an image, is less effective than it used to be, because modern AI tools are increasingly capable of 'inpainting' over watermarks to remove them. However, a prominent, well-placed watermark can still act as a deterrent, especially against casual theft, since it makes your image a less appealing, lower-quality target. For non-public photos, the best defence remains the simplest: robust privacy settings. On platforms like Instagram and Facebook, setting your profile to private keeps your images from being indexed by search engines and from being scraped by many automated tools. This is the most effective barrier for personal photos not intended for public consumption, helping them stay within your trusted circle.
















