The New Data Gold Rush
You may have heard of web scraping, but the rise of generative AI has turned it into a global, industrial-scale operation. Companies building large language models (LLMs) and other generative AI tools, like those behind ChatGPT, need enormous amounts of text and image data to 'train' their systems. The easiest, cheapest source of this human-generated content is the public internet, and social media platforms in particular, where millions of people share their thoughts, photos, and creative work every day. AI scrapers are automated bots that systematically visit public profiles and save everything they can find: your posts, pictures, comments, and even the metadata attached to them. They aren't targeting you personally; you are simply part of a vast data-harvesting operation fueling the next generation of commercial AI products.
Why Your Public Profile Is a Prime Target
If your profile on platforms like Instagram, Facebook, or X (formerly Twitter) is set to 'public', you've essentially laid out a welcome mat for these bots. Every post is a data point: a photo of your vacation helps an AI learn what a beach looks like, a rant about a movie helps it understand sentiment, and a comment on a news article helps it learn conversational nuance. Some AI companies say their crawlers respect web standards such as 'robots.txt', a file that websites publish to tell crawlers which pages not to visit. For social media profiles, though, this is a grey area: the platform, not you, decides what its robots.txt allows, and nothing forces a crawler to obey it. The most effective barrier isn't a technical request; it's a digital wall. By making your information non-public, you take it out of the easy pickings these automated systems are designed to collect.
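To show why robots.txt is a request rather than a wall, here is a minimal sketch of what 'respecting robots.txt' looks like from a crawler's side, using Python's standard-library `urllib.robotparser`. The rules, the `social.example` profile URL, and the helper names `build_parser` and `crawler_may_fetch` are hypothetical; `GPTBot` is the user agent OpenAI documents for its crawler, and `ExampleBot` is made up. The parser only returns yes or no; honouring that answer is left entirely to the crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules a site operator might publish at /robots.txt.
# GPTBot is the user agent OpenAI documents for its crawler;
# "ExampleBot" is a made-up name standing in for any unlisted crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


def crawler_may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    # This only answers the question. A crawler that skips this check,
    # or ignores the answer, can still request the page.
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    profile = "https://social.example/users/alice"  # hypothetical profile URL
    print(crawler_may_fetch(parser, "GPTBot", profile))      # False
    print(crawler_may_fetch(parser, "ExampleBot", profile))  # True
```

A crawler that never calls `can_fetch()`, or calls it and carries on regardless, receives the same page either way. Privacy settings work differently: they stop the platform from serving your content to strangers at all.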
Your Action Plan: Securing Meta Profiles
Facebook and Instagram, both owned by Meta, hold a treasure trove of personal data. Start here.
On Facebook: Navigate to 'Settings & Privacy' > 'Settings' > 'Audience and Visibility'. The most critical step is to change 'Who can see your future posts?' to 'Friends'. Because that setting only affects new posts, also use 'Limit who can see past posts' to restrict your older public posts to Friends. Next, run the 'Privacy Checkup' tool, which walks you through who can see what you share, how people can find you, and your data settings. Finally, check 'Off-Facebook Activity' (now labelled 'Your activity off Meta technologies' in Accounts Center) to see which apps and websites share data with Meta, and disconnect any you don't recognise or use.
On Instagram: Go to your profile, tap the three-line menu, then 'Settings and privacy'. The single most effective step is to turn on 'Private account' (under 'Account privacy'), so that only followers you approve can see your posts and stories. If your account needs to stay public, you can still shrink your public data footprint: under 'How others can interact with you', you can control message requests, comments, and tagging.
Locking Down X (Formerly Twitter)
X has a simple but powerful privacy tool. To turn it on, go to 'Settings and privacy' > 'Privacy and safety' > 'Audience and tagging' and enable 'Protect your posts'. Once it's on, only your followers can see your posts, and anyone new who wants to follow you must first be approved by you. This effectively cuts off access for any scraping bot that isn't already following you, so it's worth reviewing your existing followers too. It's a trade-off, as it dramatically limits your public reach, but for personal accounts it's the strongest privacy setting available on the platform.
Managing Your Professional Footprint on LinkedIn
LinkedIn is designed to be public, but you can still control what is shared. Go to 'Me' > 'Settings & Privacy' > 'Visibility'. Here, you can edit your public profile to control what search engines and people who aren't signed in to LinkedIn can see. You can choose to show your full profile, show only the basics such as your name and headline, or turn off public visibility so your profile is hidden from external search results. In the same section, you can also manage who can see your email address and last name, and restrict your connections list so that only you can see it. While a public profile is key for job hunting, these settings help you curate exactly what information you broadcast to the wider internet and its data-hungry bots.
This Isn't a Perfect Shield, But It's a Start
It's important to be realistic. These steps create significant barriers to automated, large-scale scraping, making your data harder and more costly to obtain. However, they won't stop a determined human or a highly sophisticated, targeted attack. Furthermore, any data that was public before you made your profile private may already be archived in a training dataset. Think of these privacy settings not as an impenetrable fortress but as a locked front door: it won't stop a battering ram, but it will deter the vast majority of opportunistic intruders. Taking these steps is a fundamental act of digital hygiene in the age of AI.