Please Don’t Share Our Links on Mastodon: Here’s Why!

[–] [email protected] 56 points 6 months ago* (last edited 6 months ago) (3 children)

Just put the site behind a cache, like Cloudflare, and set your cache control headers properly?

They mention that they are already using Cloudflare. I'm confused about what is actually causing the load. They don't mention any technical details, but it does kinda sound like their cache control headers are not set properly. I'm too lazy to check for myself though...
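A minimal sketch of what "setting cache control headers properly" could look like at the origin, in Python. The cookie name `session`, the `/assets/` prefix, and the specific lifetimes are illustrative assumptions, not itsfoss.com's actual setup. Headers alone may not be enough: Cloudflare, for example, does not cache HTML by default unless a cache rule tells it to.

```python
def cache_control_for(path: str, cookies: dict[str, str]) -> str:
    """Pick a Cache-Control value for a blog response (hypothetical helper)."""
    if "session" in cookies:
        # Assumed logged-in user: personalised page, keep it out of shared caches.
        return "private, no-store"
    if path.startswith("/assets/"):
        # Assumed fingerprinted static files that never change under the same URL.
        return "public, max-age=31536000, immutable"
    # Article HTML for anonymous readers: browsers revalidate after 60 s,
    # shared caches (the CDN) may keep it for 10 minutes and, if they support
    # stale-while-revalidate, serve a stale copy briefly while refreshing.
    return "public, max-age=60, s-maxage=600, stale-while-revalidate=60"
```

The key split is anonymous versus logged-in: every anonymous reader (and every Fediverse preview fetcher) gets the same bytes, so those responses can be shared.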

[–] [email protected] 21 points 6 months ago* (last edited 6 months ago)

I've found that if left on default settings, Cloudflare is not that great at caching. It requires a bit of configuration to really make it sing. itsfoss.com thought they were "using Cloudflare" but probably weren't using it to its fullest potential.

[–] [email protected] 21 points 6 months ago

Even without Cloudflare, simple NGINX microcaching would help a ton there.

It's a blog, it doesn't need to regenerate a new page every single time for anonymous users. There's no reason it shouldn't be able to sustain 20k requests per second on a single server. Even a one second cache on the backend for anonymous users would help a ton there.
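In NGINX this is usually done with `proxy_cache` plus a very short `proxy_cache_valid`. The Python sketch below shows the same microcaching idea in isolation: anonymous requests for a path trigger at most one render per `ttl` seconds, while logged-in users bypass the cache. `MicroCache` and its parameters are hypothetical names, and the class is single-threaded for brevity.

```python
import time
from typing import Callable


class MicroCache:
    """Tiny TTL cache: each path is rendered at most once per `ttl` seconds."""

    def __init__(self, render: Callable[[str], str], ttl: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.render = render      # the expensive page generator (stand-in for the CMS)
        self.ttl = ttl            # cache lifetime in seconds
        self.clock = clock        # injectable clock, handy for testing
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, path: str, anonymous: bool = True) -> str:
        if not anonymous:
            # Logged-in users always get a fresh render.
            return self.render(path)
        now = self.clock()
        hit = self._store.get(path)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]         # served from cache, no render
        body = self.render(path)
        self._store[path] = (now, body)
        return body
```

Even at 20k requests per second, a one-second TTL means the backend renders each hot page roughly once per second.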

They have Cloudflare in front; the site should stay up even with the server turned off entirely.

[–] [email protected] 10 points 6 months ago* (last edited 6 months ago) (2 children)

I'm confused about what is actually causing the load.

Thousands of instances simultaneously fetching link previews from a VPS w/2GB RAM.

https://mastodon.social/@itsfoss/112369476450719989

[–] [email protected] 14 points 6 months ago

If caching is properly configured, the cache (Cloudflare) will see thousands of requests, but the VPS should only see one request.
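The "thousands at the edge, one at the origin" behaviour depends on the cache collapsing concurrent misses into a single upstream fetch, often called request coalescing (NGINX exposes it as `proxy_cache_lock`). A minimal thread-based sketch, assuming a `fetch` callable that stands in for the origin; `SingleFlight` is a hypothetical name, and the cache never expires to keep it short.

```python
import threading
from typing import Callable


class SingleFlight:
    """Collapse concurrent cache misses for the same URL into one origin fetch."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self.fetch = fetch
        self._lock = threading.Lock()
        self._cache: dict[str, bytes] = {}
        self._inflight: dict[str, threading.Event] = {}

    def get(self, url: str) -> bytes:
        while True:
            with self._lock:
                if url in self._cache:
                    return self._cache[url]
                event = self._inflight.get(url)
                leader = event is None
                if leader:
                    # First requester for this URL does the origin fetch.
                    event = threading.Event()
                    self._inflight[url] = event
            if leader:
                try:
                    body = self.fetch(url)
                    with self._lock:
                        self._cache[url] = body
                finally:
                    # Wake waiters even if the fetch failed; they will retry.
                    with self._lock:
                        del self._inflight[url]
                    event.set()
                return body
            # Followers wait for the leader, then loop back to read the cache.
            event.wait()
```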

[–] [email protected] 0 points 6 months ago (1 children)

This should be front and center, caching won’t be able to make up for that…

[–] [email protected] 25 points 6 months ago* (last edited 6 months ago)

Of course it will. Cloudflare is in front of it, and they can definitely handle this traffic as long as itsfoss bothers to set correct caching headers for Cloudflare to use. That's the entire point of Cloudflare...

[–] [email protected] 39 points 6 months ago

Real talk, the mastodon traffic stampede isn't that bad for a properly configured website.

[–] [email protected] 38 points 6 months ago (1 children)

There's another reason I don't share "It's FOSS" links anywhere: this should have been a GitHub issue but it's turned into a clickbaity headline. Every other article coming out of "It's FOSS" is either low effort, sensationalist, or both.

[–] [email protected] 2 points 6 months ago

The article mentions there are already a few issues, some quite old. The article is useful for raising awareness and hopefully getting the fix prioritized higher.

[–] [email protected] 30 points 6 months ago (1 children)

Their website isn't properly caching pages which is the real reason they're having problems.

[–] [email protected] 14 points 6 months ago

I think they just advertised how trivial it would be to take their website down...

[–] [email protected] 29 points 6 months ago (1 children)

I always downvote posts with titles like this. Here's Why -

[–] [email protected] 2 points 6 months ago

same. read more to find out!

[–] [email protected] 22 points 6 months ago* (last edited 6 months ago) (1 children)

The only bit of data I could find:

However, I got a bit of a nasty surprise when I looked into how much traffic this had consumed - a single roughly ~3KB POST to Mastodon caused servers to pull a bit of HTML and… fuck, an image. In total, 114.7 MB of data was requested from my site in just under five minutes - making for a traffic amplification of 36704:1.

That's an average of about 3 Mbps sustained over five minutes. If the server has a gigabit connection, this should take about a second of data transmission at full speed. Of course, there's TCP slow start to deal with, and I doubt many Fediverse clients make requests over HTTP/3 by default, but this doesn't seem all that high? I don't know what the normal "background" traffic of random clients visiting looks like, but mathematically this seems like it shouldn't take more than a second or two with a RAM cache.
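A quick check of the bandwidth arithmetic, assuming decimal megabytes (the quote doesn't say which unit it uses) and treating the five minutes as a flat average, since no peak figure is given:

```python
# Figures from the quote above: 114.7 MB served in about five minutes.
MB = 1_000_000                 # decimal megabyte (assumption)
total_bytes = 114.7 * MB
window_s = 5 * 60

# Average rate over the window, in megabits per second.
avg_mbps = total_bytes * 8 / window_s / 1_000_000
# Time to push the same bytes through a 1 Gbit/s link at full speed.
gigabit_seconds = total_bytes * 8 / 1_000_000_000

print(f"average rate: {avg_mbps:.2f} Mbit/s")
print(f"time at 1 Gbit/s: {gigabit_seconds:.2f} s")
```

This prints an average of about 3.06 Mbit/s and about 0.92 s at gigabit speed, ignoring TCP slow start and per-request overhead.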

If this were some random independent website that avoids services like Cloudflare because of their status as the gatekeeper of the internet, I would sympathise, but they already use Cloudflare. Their website, like many on the internet, just isn't ready for bursts of visitors, it seems.

This could also be a bug in Ghost CMS, of course.

In theory, content like this could be federated directly; a Fediverse Article could be offered to the wider Fediverse and servers would distribute the content rather than a link with preview. However, this would also prevent ads from showing up, trackers from collecting visitor information, and Mastodon has chosen not to implement more than microblogging objects either. I also don't think Lemmy supports that kind of post, but it'd be a solution in theory.

[–] [email protected] 24 points 6 months ago (1 children)

Thanks for saying this! I really don't want to victim-blame itsfoss for getting traffic spikes, but if you can't handle ~20 MB in one minute (~330 kB/s, under 3 Mbps) of traffic, you're doing something really, really wrong and you should look into it, especially if you want to distribute content. Crying "don't share our links on Mastodon" also sounds like tilting at windmills: block the Mastodon UA and be done with it, or stop putting images in your link previews for Mastodon, or drop link previews completely. A "100 MB DDoS" is laughable at best; nice amplification calculation, but that's still 100 megs.

[–] [email protected] 4 points 6 months ago (1 children)

I doubt they actually want people to stop sharing their content on Mastodon, as they share the content on Mastodon themselves. I think they want to get more attention for this issue.

Nobody seems to have done so, but it'd be trivial to use ActivityPub as an amplification factor for attacking small publications. Just register free accounts with a couple hundred servers, post links to articles (with unique garbage added to the end of the URL to bust basic server side caching), and tag a couple dozen random users from other servers. Every server, as well as every server whose user was tagged, will fetch the page, and if present, a header image. You can easily send out dozens of links per second to thousands of servers, enough to overwhelm any site that doesn't have their content gatekept by internet giants like Cloudflare.
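One cheap mitigation for the cache-busting trick is to build the cache key without the query string, so that `?garbage=123` variants of an article all hit the same cached copy. A sketch, assuming the site's article pages never vary by query string (true for many blogs, but worth checking); `preview_cache_key` is a hypothetical helper, and garbage appended to the path itself would still need separate handling, for example by caching 404 responses.

```python
from urllib.parse import urlsplit, urlunsplit


def preview_cache_key(url: str) -> str:
    """Normalise a URL into a cache key, dropping query string and fragment.

    Assumption: article pages render identically regardless of query string.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    # Lowercase scheme and host; keep the path as-is (paths can be case-sensitive).
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))
```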

If the website is hosted on a server with expensive egress fees ("serverless", Amazon, GCloud, Azure, hosters that don't disconnect your server when you hit your bandwidth limit) you can run up a bill of tens of thousands. If the hoster does apply an egress cap, you can shut down a website for a couple of days at the very least.

I don't have a workable solution to this problem, but the way the Fediverse seems to be built with the rather naïve idea that every request that passes the signature requirement is done in good faith has major implications on the wider internet. If we don't find a solution to this problem, I expect websites to start blocking Fediverse user agents when the first DDoS waves start.

[–] [email protected] 5 points 6 months ago (1 children)

AWS charges $0.09/GB. Even assuming zero caching and always dynamically requested content, you’d need 100x this “attack” to rack up $1 in bandwidth fees. There are way faster ways to rack up bandwidth fees. I remember the days when I paid $1/GB of egress on overage, and even then, this 100 MB would’ve only set me back $0.15 at worst.
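Taking the $0.09/GB and $1/GB figures above at face value (real AWS egress pricing is tiered and varies by region) and the ~114.7 MB burst quoted earlier in the thread, the arithmetic works out as follows:

```python
GB = 1_000_000_000
burst_bytes = 114.7 * 1_000_000   # the ~114.7 MB burst quoted earlier (decimal MB assumed)
aws_per_gb = 0.09                 # AWS egress price quoted above, USD
old_overage_per_gb = 1.00         # the $1/GB overage price mentioned above, USD

one_burst = burst_bytes / GB * aws_per_gb
print(f"one burst on AWS: ${one_burst:.4f}")
print(f"100 bursts on AWS: ${100 * one_burst:.2f}")
print(f"one burst at $1/GB: ${burst_bytes / GB * old_overage_per_gb:.2f}")
```

That gives about $0.0103 per burst and $1.03 for 100 bursts on AWS, and about $0.11 per burst at $1/GB, consistent with the comment.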

Also worth noting: anyone hosting on AWS isn’t going to blink at $1 in bandwidth fees, and anyone who would blink is hosting elsewhere with cheaper egress (i.e., billed by the megabit or with some generous fixed allocation). The saner ones would be serving from behind CDNs that are even cheaper.

This is a non-issue written by someone who clearly doesn’t know what they’re talking about, likely intended to drum up traffic to their site.

[–] [email protected] 20 points 6 months ago (3 children)

Direct link to article:

https://news.itsfoss.com/mastodon-link-problem/

TL;DR:

When you share a link on Mastodon, a link preview is generated for it, right?

With Mastodon being a federated platform (a part of the Fediverse), the request to generate a link preview is not generated by just one Mastodon instance. There are many instances connected to it who also initiate requests for the content almost immediately.

And, this "fediverse effect" increases the load on the website's server in a big way.

Does Lemmy not cause this issue? Other federated software was not mentioned in the article at all.

[–] [email protected] 7 points 6 months ago (2 children)

So the preview should be federated as well?

How many requests are we actually talking about here, though? Is that better or worse than everyone clicking the link?

[–] [email protected] 9 points 6 months ago (2 children)

There's a problem with federated previews: tricking one instance into generating the wrong preview would spread it to every instance. This has been exploited for malware and scam campaigns in messaging apps.

[–] [email protected] 3 points 6 months ago

Here's a related, interesting example for BlueSky, on generating disguised links and preview cards (with content the url doesn't actually contain) for anyone curious: https://github.com/qwell/bsky-exploits

[–] [email protected] 2 points 6 months ago (1 children)

What is the threat model here?

[–] [email protected] 4 points 6 months ago* (last edited 6 months ago) (1 children)

Disguising a link as a different, normal-looking one, usually for phishing, malware, or clones loaded with ads.

Like, let's say I post something like

https://www.google.com

And also have my instance intercept it to provide Google's embed preview image, and it federates that with other instances.

Now, for everyone it would look like a Google link, but you get Microsoft Google instead.

I could also post a genuine Google link but make the preview point somewhere else entirely, so people see that the link goes where they expect even when hovering over it, but then they end up clicking the preview for whatever reason. Bam, wrong site. It could also be a YouTube link and embed where the embed shows a completely different preview image; you click on it and get some gore or porn instead. Fake headlines, the Cyrillic alphabet, whatever way you can think of to abuse this.

People trust those previews in a way, so if you post a shortened link but it previews like a news article you want to go to, you might click the image or headline but end up on a phony clone of the site loaded with malware. Currently, if you trust your instance you can actually trust the embed because it's generated by your instance.

On iMessage, the sender used to supply the embed metadata, so it was used for a zero-click exploit: sending an embed of a real site, but with an attachment that exploited the codec it would be rendered with.

[–] [email protected] 1 points 6 months ago

Couldn't a malicious ActivityPub server do similar things now?

[–] [email protected] 6 points 6 months ago

2 requests per instance - one for the HTML of the page and another for a preview image.
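A back-of-the-envelope model of that fan-out, assuming exactly two fetches per instance as described above; any instance count you plug in is illustrative, not a measured figure. With a correctly configured shared cache in front of the site, only the first fetch of each resource should reach the origin.

```python
def origin_requests(instances: int, requests_per_instance: int = 2,
                    shared_cache: bool = False) -> int:
    """Requests reaching the origin when `instances` servers each fetch a preview.

    Assumes one fetch for the page HTML and one for the preview image.
    """
    if shared_cache:
        # Only the first fetch of each resource misses the shared cache.
        return min(instances, 1) * requests_per_instance
    # No shared cache: every instance's fetches go straight to the origin.
    return instances * requests_per_instance
```

For a hypothetical 3,000 instances, `origin_requests(3000)` gives 6000, while `origin_requests(3000, shared_cache=True)` gives 2.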

[–] [email protected] 6 points 6 months ago

Lemmy (and Kbin for that matter) very much do the same thing for posts. I don't think they fetch URL previews for links in comments, but that doesn't matter: posts and comments are both fairly likely to end up spreading to Mastodon/etc anyway, so even comments will trigger this cascade.

Direct example: if you go to mastodon.social, put @[email protected] in the search box at the top left and click through to the profile, you can browse a large Mastodon server's view of this community, and this very link has a preview. (Unfortunately, links to federated communities just result in a redirect, so you have to navigate through Mastodon's UI.)

[–] [email protected] 2 points 6 months ago* (last edited 6 months ago)

They say it's fediversal in the comments on Mastodon.

[–] [email protected] 4 points 6 months ago

Gotta respect boycotting Cloudflare on principle.. but also, why?

[–] [email protected] 4 points 6 months ago* (last edited 6 months ago) (2 children)

That sounds a lot like a weird spin on the Slashdot effect, caused by content mirroring. It seems that it could be handled by tweaking the ActivityPub protocol to have one instance requesting to generate a link preview, and the other instances copying the link preview instead of sending their own requests.
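A rough sketch of that proposal, using hypothetical data shapes (`Preview`, `FederatedPost` are not real ActivityPub fields): the originating instance builds the preview once and attaches it, and receiving instances only hit the linked site when nothing was attached. The URL check only stops a preview being attached to a different link; it cannot detect a sending instance that lies about the page contents, which is the spoofing risk raised earlier in the thread.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Preview:
    url: str
    title: str
    image_url: Optional[str] = None


@dataclass
class FederatedPost:
    # Hypothetical shape: the originating instance attaches the preview it built.
    content: str
    link: str
    preview: Optional[Preview] = None


def preview_for(post: FederatedPost,
                fetch_preview: Callable[[str], Preview]) -> Preview:
    """Receiving-instance logic: reuse the federated preview when it matches
    the post's link, otherwise fall back to fetching the page ourselves."""
    if post.preview is not None and post.preview.url == post.link:
        return post.preview
    return fetch_preview(post.link)
```

With 1,000 receiving instances reusing the attached preview, the linked site sees only the originating instance's fetch.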

But frankly? I think that the current way that ActivityPub works is outright silly. Here's what it does currently:

User is registered to instance A
Since A federates with B, A mirrors content from B into A
The backend is either specific to instance A (the site) or configured to use instance A (for a phone program)
When the user interacts with content from B, actually it's the mirrored version of content from B that is hosted in A

In my opinion a better approach would be:

User is registered to instance A
Since A federates with B, B accepts login credentials from A
The backend is instance-agnostic, so it's able to pull/send content from/to multiple instances at the same time
When the user interacts with content from B, the backend retrieves content from B, and uses the user's A credentials to send content to B

Note that the second way would not create this "automated Slashdot effect" - only A would be pulling info from the site, and then users (regardless of their instance) would pull it from A.

Now, here's my question: why does ActivityPub work the first way, instead of the second one?

[–] [email protected] 4 points 6 months ago (1 children)

If server A makes one request, it keeps server B from being overloaded by thousands of requests from A's users.

[–] [email protected] 2 points 6 months ago (1 children)

"A" Users would need to send requests to some server anyway, either A or B; that's only diverting the load from B to A, but it isn't alleviating or even sharing it.

Another issue with the current way that ActivityPub works is foul content, that needs to be removed. Remember when some muppet posted CP in LW?

[–] [email protected] 3 points 6 months ago (1 children)

Yes, but this way demand on instances scales with user count and allows smaller instances to exist. Otherwise an errant toot on a small instance that suddenly gets popular will instantly drag that smaller instance down.

[–] [email protected] 2 points 6 months ago (1 children)

Got it - and that's a fair point. I wonder, however, if this problem couldn't be solved another way, especially because mirroring is itself a burden for the smaller instances.

[–] [email protected] 3 points 6 months ago (1 children)

Consider that caching happens at thousands of levels on the internet. Every centralized site has its content replicated many, many times in geo-local caches, proxies, and even local browsers. Caching is a very core concept of the internet. Others often bash AP because it replicates a lot, but that's kind of like explicit caching: if the whole Fediverse fetched a post from its source, millions of requests would beat small servers down constantly. Big servers cache the content they intend to distribute and handle the traffic spike instead of the small instance. Small instances, on the other hand, don't need to replicate as much and can rely more on bigger instances, maybe cleaning their cached content often and refetching when necessary. Replication is a feature, not a design flaw!

[–] [email protected] 2 points 6 months ago

replication is a feature, not a design flaw!

In this case I'd argue that it's both. (A problematic feature? A useful bug? They're the same picture anyway.)

Because of your comment I can see the pros of the mirroring strategy, even if the cons are still there. I wonder if those pros couldn't be "snipped" and implemented into a Nostr-like network, or if the cons can't be ironed out from a Fediverse-like one.

[–] [email protected] 3 points 6 months ago (1 children)

Check out Nostr, an ActivityPub alternative that does authentication separately from content; it works more like that.

[–] [email protected] 3 points 6 months ago* (last edited 6 months ago)

I'm aware of Nostr. In my opinion it splits back-end and front-end tasks better than AP does, even if the latter does some things better (such as balancing safety against censorship resistance). It's still an interesting counterpoint to ActivityPub.

[–] [email protected] 4 points 6 months ago (2 children)

It's an interesting and frustrating problem. I think there are three potential ways forward, but they're all flawed:

1) Quasi-Centralization: a project like Mastodon or a vetted non-profit entity operates a high-concurrency server whose sole purpose is to cache link metadata and images. Servers initially pull preview data from that, instead of the direct page.
2) We find a way to do this in some zero-trust peer-to-peer way, where multiple servers compare their copies of the same data. Whatever doesn't match ends up not being used.
3) Servers cache link metadata and previews locally with a minimal number of requests; any boost or reshare only reflects a proxied local preview of that link. Instead of doing this on a per-view or per-user basis, it's simply per-instance.

I honestly think the third option might be the least destructive, even if it's not as efficient as it could be.
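Option 2 could look something like the sketch below: several servers fetch the same preview independently, and a copy is only used if enough of them returned byte-identical data. The `quorum` parameter and the majority-vote rule are assumptions, and `agreed_preview` is a hypothetical name. In practice pages often differ per request (ads, timestamps), so a real scheme would more likely compare extracted fields such as title and image URL than raw bytes.

```python
import hashlib
from collections import Counter
from typing import Optional


def agreed_preview(copies: dict[str, bytes], quorum: int) -> Optional[bytes]:
    """Accept a preview only if at least `quorum` servers returned identical bytes.

    `copies` maps a server name to the preview data that server fetched.
    Returns None (show no preview) when there is no sufficiently agreed copy.
    """
    digests = {server: hashlib.sha256(data).hexdigest()
               for server, data in copies.items()}
    counts = Counter(digests.values())
    if not counts:
        return None
    best_digest, votes = counts.most_common(1)[0]
    if votes < quorum:
        return None
    # Return the bytes from any server that holds the winning digest.
    for server, digest in digests.items():
        if digest == best_digest:
            return copies[server]
    return None
```

Note that a vote like this only helps if the participating servers are actually independent; an attacker running several servers could still outvote honest ones.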

[–] [email protected] 6 points 6 months ago

As I understand it, 3) already happens. What causes the load is that each connected instance is also loading and caching the preview.

[–] [email protected] 4 points 6 months ago

Or 4) Ignore the noise and do nothing; this is a case of a user talking about things they don’t understand at best, or a blog intentionally misleading others to drum up traffic for itself at worst. This is literally not a problem. Serving that kind of traffic can be done on a single server without any CDN, and they’ve got a CDN already.

[–] [email protected] 3 points 6 months ago

I mean, it's solid training, but they do realise it's not limited to Mastodon, right?

The Slashdot effect has been around for years.

[–] [email protected] 1 points 6 months ago

So why doesn't a random follower posting a link on Mastodon cause server load issues, but a popular follower does?
