Topic: A spambot thread  (Read 6756 times)


Erkka

« on: January 12, 2021, 09:15:28 PM »
For a while we have had two layers of automated anti-spam measures. The first layer is invisible: it tracks site traffic and spots known bots using a vast database. The other layer is visible to the user, often an image captcha that is beyond bots' ability to solve. Also, users with fewer than 3 posts need to pass the captcha when making a new post, which should stop old undetected spambot accounts if they try to activate.
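To make the two-layer setup and the "fewer than 3 posts" rule concrete, here is a minimal sketch of the gating decision. It is not the forum's actual code. The names `Visitor`, `posting_gate` and `flagged_as_known_bot` are hypothetical, and it assumes that a first-layer hit simply blocks the request.

```python
from dataclasses import dataclass

# Threshold from the post above: users with fewer than 3 posts see the captcha.
CAPTCHA_POST_THRESHOLD = 3


@dataclass
class Visitor:
    post_count: int
    # Hypothetical result of the invisible first layer (known-bot database lookup).
    flagged_as_known_bot: bool


def posting_gate(visitor: Visitor) -> str:
    """Return 'block', 'captcha', or 'allow' for a new post attempt (sketch only)."""
    # Layer 1: invisible traffic check. Assumption: a known bot is simply blocked.
    if visitor.flagged_as_known_bot:
        return "block"
    # Layer 2: visible captcha for low-post-count accounts, so that dormant
    # spambot accounts cannot start posting freely once they activate.
    if visitor.post_count < CAPTCHA_POST_THRESHOLD:
        return "captcha"
    return "allow"
```

Note that a human solving the captcha passes layer 2 just like a legitimate user. That is why the rest of the thread treats moderators as the last line of defense.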

Yet we still see new spam posted on the forums every now and then, much of it copy-pasted from elsewhere.

I'm afraid that we can't make our anti-spam measures significantly tighter. At one point I tried adding a third layer, and someone immediately posted on the Steam forums that registration on these forums was broken.

I wouldn't be surprised if somewhere out there is a sweatshop where poor people are paid paltry sums to pass captchas and post spam all over the internet. That seems a more plausible explanation than a spambot AI that can solve those captchas far better than the state-of-the-art AI systems of big companies can. And if real people are posting the spam, no captcha is going to stop them.

For a long time things were fine with custom security questions, which were easy for anyone familiar with the game but made no sense to a random visitor. Eventually all of those questions became useless (maybe the spam networks keep building their own database of 'known captcha questions and their correct answers'). Already at that point I suspected that they pay people to answer those questions and store the answers for bot use. But I really don't know how sophisticated or how ugly the hidden machinery of spam networks is. Just guessing.

All in all, I felt like posting these general thoughts here, since we are all affected by the situation. We have four people on the moderation team, which lets us manually weed out spam posts and accounts on a daily basis, and I think that is the last line of defense we need to have anyway.

So, everyone:

1. If you suspect spam but are not sure, one good check is to do a Google search on the post content. If the post turns out to be copy-pasted from an old post in some other corner of the internet, please use the "report to moderator" button.

2. If you suspect spam for any other reason (the obvious one being malicious links embedded in the text), hit the "report to moderator" button.

3. Thank you for your co-operation, and we apologize for all the inconvenience!
UnReal World co-designer, also working on a small side project called Ancient Savo

Mati256

« Reply #1 on: January 13, 2021, 12:47:34 AM »
I used to be very active on a history forum, and it had software that detected whether content was copy-pasted from somewhere else; the idea was that you had to write your own material rather than plagiarize. I don't know the name of this software, but it's out there; maybe it could be installed on this forum.
It still let you post, but it sent a warning message to the admins or mods, so I don't know whether it would be useful here.
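The kind of check described above can be sketched with word shingles and Jaccard similarity. This is one common technique for near-duplicate detection, not necessarily what that history forum used. `known_texts` is a hypothetical local corpus standing in for "text found elsewhere on the internet," and the 0.5 threshold is an arbitrary assumption. As in the description, the post is always allowed, and the only outcome is whether moderators get notified.

```python
import re


def shingles(text: str, k: int = 5) -> set:
    """Lowercase the text, split it into words, and return the set of k-word shingles."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles two texts have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def check_new_post(post: str, known_texts: list, threshold: float = 0.5) -> dict:
    """Never blocks the post; only decides whether to send a warning to the mods."""
    post_sh = shingles(post)
    best = max((jaccard(post_sh, shingles(t)) for t in known_texts), default=0.0)
    return {"allow_post": True, "report_to_mods": best >= threshold, "similarity": best}
```

A high score only means "this text exists elsewhere." It cannot tell a user re-posting their own Steam or Reddit question from a spammer, so a human still has to look at the report.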

JP_Finn

« Reply #2 on: January 13, 2021, 05:06:03 AM »
It depends a lot on the forum software.

Anyway, there’s only so much even that can do if what Erkka suspects is true, and what I saw in one of today’s first replies suggests it is. Most likely sweatshop-style spambot people...
At some point, a perfectly legitimate single reply will likely get edited to include spam links. It’s sad, but there’s nothing to do except carry on and block/edit/mute the bots.

Erkka

« Reply #3 on: January 13, 2021, 07:06:10 AM »
And there are also legitimate cases of copy-pasted threads. For example, a person might ask a question on Steam or Reddit and then decide to register here to ask the same question on the official forums. It could be complicated to code an algorithm that tells "the same person re-posting their own text" apart from "a random user copy-pasting some stuff from the internet".

So no such automation could be granted the right to delete posts or user accounts outright; an algorithm could only issue a report for moderators to review. And I think all of us together can already handle that, so I don't feel like spending precious coding time on making (or finding and installing) such an algorithm.

Theoretically speaking, what could be useful is an algorithm that detects when a forum user returns to an older post to insert suspicious links. But again, there are legitimate cases here: a person posts something and later edits the post to add a link to a save file, or to an external image hosting site for screenshots, etc. So, again, such an algorithm could only be allowed to issue a report.
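A minimal sketch of that edit check, assuming the forum can hand over a post's text before and after an edit. `review_edit` and `URL_RE` are hypothetical names. The URL pattern is a rough approximation; a real forum would more likely inspect its own BBCode or HTML link markup.

```python
import re

# Rough URL pattern (assumption): stops at whitespace, quotes, angle and square brackets.
URL_RE = re.compile(r"https?://[^\s<>\"'\[\]]+", re.IGNORECASE)


def links_in(text: str) -> set:
    """Return the set of URLs that appear in a post's text."""
    return set(URL_RE.findall(text))


def review_edit(old_text: str, new_text: str) -> dict:
    """Compare a post before and after an edit; it only ever suggests a report."""
    added = sorted(links_in(new_text) - links_in(old_text))
    return {"report_to_mods": bool(added), "added_links": added}
```

Every newly added link triggers a report, including save files and screenshots. Deciding which ones are spam is left to the moderators, in line with the point above.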

So I prefer doing some manual work rather than having an algorithm produce false positives and delete real user accounts because of an AI failure.

PALU

« Reply #4 on: January 18, 2021, 01:46:02 PM »
I definitely agree banning and deletion should be manual processes for as long as possible (there's always the risk of spammers managing to overwhelm the capacity of the human defense to deal with them).

It seems odd to have a sweat shop for breaking through defenses on a case by case basis, as the whole rationale for spam is that it costs almost nothing to spread, so it can have an abysmal success rate and still be profitable. Involving a human raises the cost from a small fraction of a cent to much more than that, unless the "employees" are kept for other reasons and the sweatshop work is just a way to recoup some of the cost of keeping them locked up.

Erkka

« Reply #5 on: January 18, 2021, 03:49:02 PM »
Quote
It seems odd to have a sweat shop for breaking through defenses on a case by case basis 

Yes. I could imagine there being a huge database of text-based captcha questions, with someone paying other people to answer them. Each correct answer could then be reused any number of times.

But getting through a complicated image-based captcha? That makes me wonder. Although brute force could eventually get through any captcha =)

Oh well. This is all just guesswork, trying to figure out how spammers get through the captcha.