this post was submitted on 22 Dec 2024
198 points (95.8% liked)
Technology
Probably start by not offering a public API, and have the site's own requests require something like a CSRF token.
Then borrow the patterns that companies fighting ad blockers use, and make the page source such spaghetti that anything like BeautifulSoup couldn't make sense of it.
At that point you would have to track fluctuations in user actions, or rather the lack of them. If a user takes 0.5 seconds to click an action after a page loads, 20 times in an hour, flag the account for further surveillance that watches over n days for higher-impact signals like posting or activity times. Correlate with existing users in the same profile range, and if they match known bot activity, ban the account.
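A rough sketch of that first flagging step, just to show the shape of it. The 0.5 second and 20-per-hour numbers come from the paragraph above; the class name, method name, and the choice of a sliding one-hour window are assumptions for illustration, not how any real site does it.

```python
from collections import defaultdict, deque

FAST_CLICK_SECONDS = 0.5    # reaction time treated as "too fast" (from the comment)
FAST_CLICKS_PER_HOUR = 20   # how many fast clicks trigger a flag (from the comment)
WINDOW_SECONDS = 3600       # assumed sliding window of one hour


class FastClickTracker:
    """Hypothetical helper that flags accounts with many near-instant clicks."""

    def __init__(self):
        # account id -> timestamps (in seconds) of recent fast clicks
        self._fast_clicks = defaultdict(deque)

    def record(self, account, page_loaded_at, clicked_at):
        """Record one click and return True if the account should be flagged."""
        clicks = self._fast_clicks[account]
        if clicked_at - page_loaded_at <= FAST_CLICK_SECONDS:
            clicks.append(clicked_at)
        # Forget fast clicks that are older than the window.
        while clicks and clicked_at - clicks[0] > WINDOW_SECONDS:
            clicks.popleft()
        return len(clicks) >= FAST_CLICKS_PER_HOUR


tracker = FastClickTracker()
# One click per minute, each 0.25 s after the page load.
flags = [tracker.record("bot", t * 60.0, t * 60.0 + 0.25) for t in range(20)]
```

A flag here would only mean "watch this account more closely", not "ban it". The longer surveillance and the correlation against known bot profiles would sit on top of this.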
On the flip side, bots may try to randomize interactions to get around this, so another filter could look at long-term patterns in repetitive actions. The fact that an account adds comments may not be a useful signal, but the way they're entered may be.
How long does it take them to enter the comment into the input form? How many words do they use in each comment? Does the length of the response fit what they are responding to?
For example, if the post was "what's your favorite cheese?", someone may respond with "Gouda" or "I love Swiss on toasted rye. It reminds me of a lake retreat I had where ......", but it would certainly be less than 1000 characters.
Compare that with a post asking for a nuanced political opinion, which takes real thought to answer, not just "you suck!"
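Those questions could be turned into a couple of crude checks, something like the sketch below. Everything in it is an assumption for illustration: the function names, the 15 characters per second ceiling, and the idea that each post carries an expected maximum reply length (1000 characters for the cheese example).

```python
def typing_speed_cps(comment, seconds_in_form):
    """Characters per second between focusing the form and submitting."""
    if seconds_in_form <= 0:
        return float("inf")
    return len(comment) / seconds_in_form


def suspicion_reasons(comment, seconds_in_form, expected_max_chars, max_cps=15.0):
    """Return a list of reasons this comment looks automated (empty if none).

    expected_max_chars is a hypothetical per-post estimate of how long a
    reasonable reply could be; max_cps is an assumed ceiling on typing speed.
    """
    reasons = []
    if len(comment) > expected_max_chars:
        reasons.append("longer than the post warrants")
    if typing_speed_cps(comment, seconds_in_form) > max_cps:
        reasons.append("entered faster than plausible typing")
    return reasons


cheese_ok = suspicion_reasons("Gouda", 4.0, expected_max_chars=1000)
cheese_odd = suspicion_reasons("x" * 1500, 2.0, expected_max_chars=1000)
```

The speed check would also trip on anyone who pastes a comment they wrote elsewhere, so on its own it should feed into surveillance rather than a ban.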
Further interactions like upvotes and downvotes can trigger surveillance too. I know some users will downvote-bomb another user for whatever reason. That could be reason enough to flag them as a malicious entity interfering with the system.
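A minimal sketch of spotting a downvote bomb, assuming votes are available as (voter, author, value) tuples for some recent period. The tuple layout, the function name, and the threshold of 10 are all made up for illustration.

```python
from collections import Counter


def find_downvote_bombers(votes, min_downvotes=10):
    """Return voters who downvoted one single author at least min_downvotes times.

    votes is an iterable of (voter, author, value) tuples, where a negative
    value is a downvote. Both the layout and the threshold are assumptions.
    """
    per_pair = Counter(
        (voter, author) for voter, author, value in votes if value < 0
    )
    return sorted({voter for (voter, _author), n in per_pair.items() if n >= min_downvotes})


votes = (
    [("alice", "bob", -1)] * 12    # concentrated downvotes on one author
    + [("carol", "bob", -1)] * 3   # a few downvotes, below the threshold
    + [("dave", "bob", 1)] * 20    # upvotes are ignored
)
bombers = find_downvote_bombers(votes)
```

Counting per voter and author pair is what separates "really dislikes this one person" from someone who simply downvotes a lot across the board.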
So... Having no public API means people just develop libraries to interact with your private API.
Furthermore, BeautifulSoup can work on any page... It's just a matter of how easily.
CSRF doesn't do what I think you think it does. It only works with a cooperating client (i.e. it's to protect a user in their own web browser). If it's a bot you'd just scrape the token and move on.
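To make that concrete, here is roughly what "scrape the token" amounts to, using only the Python standard library. The page snippet and the field name `csrf_token` are made up for the example; real sites name and place the token differently.

```python
from html.parser import HTMLParser


class CsrfTokenFinder(HTMLParser):
    """Pulls the value of a hidden input with a given name out of a page."""

    def __init__(self, field_name="csrf_token"):  # field name is an assumption
        super().__init__()
        self.field_name = field_name
        self.token = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name") == self.field_name:
            self.token = attributes.get("value")


def extract_csrf_token(html, field_name="csrf_token"):
    """Return the token embedded in the page, or None if there isn't one."""
    finder = CsrfTokenFinder(field_name)
    finder.feed(html)
    return finder.token


# Hypothetical page: the token is sent to every client that loads the form.
PAGE = """<form method="post" action="/comment">
  <input type="hidden" name="csrf_token" value="abc123">
  <textarea name="body"></textarea>
</form>"""

token = extract_csrf_token(PAGE)
```

The server has to hand the token to whoever loads the page, so a script that loads the page gets it just like a browser does. The token stops a third-party site from forging requests on a user's behalf; it does nothing to tell a script from a person.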
Fluctuations in user actions can also be simulated (a bot can delay its work so the timing looks like what a normal user might do/say/post) ... and rate limiting can be overcome by just using more accounts, stolen IP addresses, etc.
You can do a lot, but it's always going to be a bit of a war. Things you're suggesting definitely help (a lot of them echo strategies used by RuneScape to prevent/reduce bots), but ... I think saying it's an architecture problem is a bit disingenuous; some of those suggestions also hurt users.