A new study from researchers at the University of California San Diego and the University of Chicago highlights the challenges visual artists face in protecting their work from being used by generative AI tools without consent. The findings will be presented at the 2025 Internet Measurement Conference.
The research involved a survey of over 200 visual artists about their efforts to block AI crawlers, an analysis of more than 1,100 professional artist websites for technical controls, and an evaluation of different blocking methods. According to the study, most artists lack both access to and knowledge about tools that can help prevent AI crawlers from harvesting their work.
“At the core of the conflict in this paper is the notion that content creators now wish to control how their content is used, not simply if it is accessible. While such rights are typically explicit in copyright law, they are not readily expressible, let alone enforceable in today’s Internet. Instead, a series of ad hoc controls have emerged based on repurposing existing web norms and firewall capabilities, none of which match the specificity, usability, or level of enforcement that is, in fact, desired by content creators,” wrote the researchers.
Nearly 80% of surveyed artists reported trying to stop their art from being included in training data for AI tools. About two-thirds said they use Glaze—a tool developed by co-authors at the University of Chicago—to mask original artwork from AI crawlers. Many artists also limit what they share online; 60% have reduced online sharing and 51% post only low-resolution images.
Despite strong demand (96% want access to tools that deter AI crawlers), more than 60% were unfamiliar with robots.txt files. A robots.txt file tells web crawlers which pages or sites they should not access, but compliance is voluntary: nothing forces a bot to obey it.
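To make the mechanism concrete, here is a minimal sketch of how a robots.txt file that disallows specific AI crawlers is interpreted by a well-behaved bot. It uses Python's standard `urllib.robotparser`. The crawler names `GPTBot` and `CCBot`, the sample file, and the helper `is_allowed` are illustrative assumptions, not taken from the study.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: two AI crawlers are disallowed site-wide,
# every other crawler is allowed. The bot names are examples only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is the check a *compliant* crawler performs on itself;
    the file does nothing to stop a crawler that skips the check.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/gallery/painting.jpg"
    for bot in ("GPTBot", "CCBot", "Googlebot"):
        print(bot, is_allowed(ROBOTS_TXT, bot, page))
```

The check runs inside the crawler, not on the artist's server, which is why the file only works against bots that choose to honor it.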
Squarespace offers a user-friendly way for site owners to manage robots.txt settings for AI-related crawlers. However, only about 17% of artists who use Squarespace enable this feature, likely due to lack of awareness.
Researchers found that more than three-quarters of artist websites are hosted on third-party platforms where modifying robots.txt is often not possible. Most content management systems provide little information on crawler-blocking options.
A broader survey showed that over 10% of top websites explicitly disallow AI crawlers via robots.txt files. Some publishers have reversed these restrictions after making licensing deals with AI companies; others may allow crawling intentionally for various reasons.
The effectiveness of robots.txt depends on crawler compliance. Large corporate bots generally respect these directives, with one exception: Bytespider (from ByteDance) does not appear to comply reliably. Many other bots claim to comply, but those claims cannot always be verified.
“The majority of AI crawlers operated by big companies do respect robots.txt, while the majority of AI assistant crawlers do not,” according to the researchers.
Cloudflare has introduced a “block AI bots” option for its customers; currently just under 6% have enabled it. Elisa Luo—a co-author and Ph.D. student at UC San Diego—noted: “While it is an ‘encouraging new option’, we hope that providers become more transparent with the operation and coverage of their tools (for example by providing the list of AI bots that are blocked).”
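Unlike robots.txt, active blocking is enforced by the server rather than left to the crawler. The sketch below shows one simple form of it: refusing requests whose User-Agent header contains a known AI bot token. It is not Cloudflare's implementation, whose bot list the researchers note is not public. The token list and the function names `should_block` and `respond` are hypothetical.

```python
# Illustrative blocklist; a real deployment would need a maintained list.
AI_BOT_TOKENS = ("GPTBot", "CCBot", "Bytespider")


def should_block(user_agent_header: str, blocked_tokens=AI_BOT_TOKENS) -> bool:
    """Return True if the User-Agent header contains a blocked bot token."""
    ua = user_agent_header.lower()
    return any(token.lower() in ua for token in blocked_tokens)


def respond(user_agent_header: str) -> int:
    """Return the HTTP status a server using this filter would send."""
    return 403 if should_block(user_agent_header) else 200


if __name__ == "__main__":
    print(respond("Mozilla/5.0 (compatible; GPTBot/1.0)"))               # blocked bot
    print(respond("Mozilla/5.0 (Windows NT 10.0; rv:130.0) Firefox/130.0"))  # browser
```

A filter like this only stops bots that identify themselves honestly, since the User-Agent header is set by the client and can be spoofed.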
Legal uncertainty adds another layer: U.S.-based lawsuits question whether copyright covers works scraped for model training and what obligations exist toward creators; meanwhile, Europe’s new AI Act requires explicit permission from rightsholders before using data for model training purposes.
“There is reason to believe that confusion around the availability of legal remedies will only further focus attention on technical access controls,” stated the research team. “To the extent that any U.S. court finds an affirmative ‘fair use’ defense for AI model builders, this weakening of remedies on use will inevitably create an even stronger demand to enforce controls on access.”
The study was partly funded by NSF grant SaTC-2241303 and Office of Naval Research project #N00014-24-1-2669.


