AI companies are violating a basic social contract of the web and ignoring robots.txt
(www.theverge.com)
Put something in robots.txt that isn't supposed to be hit and is hard to hit by non-robots. Log and ban all IPs that hit it.
Imperfect, but can't think of a better solution.
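A minimal sketch of the trap idea above, using only Python's standard library. The trap path `/do-not-crawl/`, the in-memory `banned_ips` set, and the helper names are hypothetical. A real deployment would more likely persist bans or push them to a firewall. One caveat: a human who reads robots.txt by hand can still trip the trap.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical trap path: listed as Disallow in robots.txt but never linked
# from any page, so well-behaved crawlers and normal visitors shouldn't reach it.
TRAP_PATH = "/do-not-crawl/"
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"

# In-memory ban list (lost on restart; illustration only).
banned_ips: set[str] = set()


def handle_path(client_ip: str, path: str) -> tuple[int, str]:
    """Return (status, body) for a GET request, banning IPs that hit the trap."""
    if client_ip in banned_ips:
        return 403, "Forbidden\n"
    if path == "/robots.txt":
        return 200, ROBOTS_TXT
    if path.startswith(TRAP_PATH):
        # Only a client that ignores (or deliberately mines) robots.txt gets here.
        banned_ips.add(client_ip)
        print(f"banned {client_ip} for requesting {path}")
        return 403, "Forbidden\n"
    return 200, "Hello\n"


class HoneypotHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = handle_path(self.client_address[0], self.path)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000) -> None:
    """Start the honeypot server on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), HoneypotHandler).serve_forever()
```

To try it, call `serve()` and request `/do-not-crawl/` once. Every later request from that IP, even for `/`, gets a 403. Keeping the decision logic in `handle_path` separate from the HTTP handler makes it easy to reuse behind a different server.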
robots.txt is purely textual; you can't run JavaScript or log anything. Plus, one who doesn't intend to follow robots.txt wouldn't query it.
Your second point is a good one, but you absolutely can log the IP that requested robots.txt. That's a standard feature of pretty much every HTTP server; no JavaScript needed.
You'd probably have to go out of your way to avoid logging this. I've always seen such logs enabled by default when setting up web servers.
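To illustrate the logging point: a sketch that extracts the IPs that fetched robots.txt from an access log, assuming the Common/Combined Log Format that nginx and Apache httpd use in their stock configurations. The function name and the sample lines are made up; the IPs come from the reserved documentation ranges.

```python
import re

# Common/Combined Log Format prefix:
#   host ident authuser [date] "METHOD PATH PROTOCOL" status ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+) [^"]*" (\d{3})')


def robots_txt_requesters(lines):
    """Return the set of client IPs that requested /robots.txt."""
    ips = set()
    for line in lines:
        m = LOG_LINE.match(line)
        # Ignore any query string when comparing the path.
        if m and m.group(3).split("?", 1)[0] == "/robots.txt":
            ips.add(m.group(1))
    return ips


# Hypothetical sample log lines (not real traffic).
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 45',
    '198.51.100.4 - - [10/Oct/2023:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 2326',
]
```

With these two sample lines, `robots_txt_requesters(SAMPLE_LOG)` returns `{"203.0.113.7"}`. Lines that don't match the assumed format are skipped rather than raising an error.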