this post was submitted on 05 Jul 2023
3372 points (99.4% liked)

Lemmy.World Announcements


Another day, another update.

More troubleshooting was done today. Here's what we did:

  • Yesterday evening @phiresky did some SQL troubleshooting with some of the lemmy.world admins. After that, phiresky submitted some PRs to GitHub.
  • @[email protected] created a Docker image containing 3 PRs: Disable retry queue, Get follower Inbox Fix, Admin Index Fix.
  • We started using this image, and saw a big drop in CPU usage and disk load.
  • We saw thousands of errors per minute in the nginx log from old clients trying to access the WebSocket API (which was removed in 0.18), so we added a return 404 for /api/v3/ws in the nginx config.
  • We updated lemmy-ui from RC7 to RC10, which fixed a lot, including the issue with replying to DMs.
  • We found that the many 502 errors were caused by an issue in Lemmy/markdown-it.actix or whatever, causing nginx to temporarily mark an upstream as dead. As a workaround we can either 1) only use one container, or 2) set ~~proxy_next_upstream timeout;~~ max_fails=5 in nginx.
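For anyone running their own instance, the WebSocket block can be sketched in nginx as below. This is an illustration, not lemmy.world's actual config: the exact-match form (location =) is a choice made for this sketch, because nginx checks exact matches before any regex locations, so the rule still applies if the backend is proxied through a regex location.

```nginx
# Sketch only: add inside the instance's existing server {} block.
# Lemmy 0.18 removed the WebSocket API, but older clients keep retrying it.
# "=" is an exact match, which nginx checks before any regex locations,
# so these requests are answered here and never reach the Lemmy backend.
location = /api/v3/ws {
    return 404;
}
```

Changing return 404 to return 410 would tell clients the endpoint is gone for good rather than just not found.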

Currently we're running with one Lemmy container, so the 502 errors are completely gone so far, and thanks to the fixes in the Lemmy code everything seems to be running smoothly. If needed, we could spin up a second Lemmy container using the ~~proxy_next_upstream timeout;~~ max_fails=5 workaround, but for now it seems to hold with one.

Thanks to @[email protected], @[email protected], @[email protected], @[email protected], @[email protected], and @[email protected] for their help!

And not to forget, thanks to @[email protected] and @[email protected] for their continuing hard work on Lemmy!

And thank you all for your patience, we'll keep working on it!

Oh, and as a bonus, an image (thanks Phiresky!) of the change in bandwidth after deploying the new Lemmy Docker image with the PRs.

Edit: As soon as the US folks woke up (hi!), it turned out we needed the second Lemmy container for performance. So that's now started. I noticed the proxy_next_upstream timeout setting didn't work (or I didn't set it properly), so I used max_fails=5 for each upstream, which does actually work.
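A minimal nginx sketch of a two-container upstream with max_fails=5 on each server. The upstream name lemmy_backend, the hostnames lemmy-1 and lemmy-2, and port 8536 (Lemmy's default backend port) are placeholders, not lemmy.world's real values. By default nginx spreads requests round-robin and uses max_fails=1 with fail_timeout=10s, so a single failed attempt takes a server out of rotation for 10 seconds; with only two servers, a burst of errors could leave no server available, and nginx then answers with a 502. The nginx documentation also states that max_fails is ignored when an upstream has only one server, which may explain why running a single container made the 502s disappear.

```nginx
# Sketch only: upstream name, hostnames and port are placeholders.
# This block belongs in the http {} context.
upstream lemmy_backend {
    # Requests are spread round-robin (the nginx default) across both containers.
    # max_fails=5: mark a server unavailable only after 5 failed attempts
    # within fail_timeout (default 10s), instead of after a single failure.
    server lemmy-1:8536 max_fails=5;
    server lemmy-2:8536 max_fails=5;
}

# The backend location(s) then point at the group, e.g.:
#     proxy_pass http://lemmy_backend;
```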

(page 2) 50 comments
[email protected] 24 points 1 year ago

I'm very curious: can a single Lemmy instance scale horizontally across multiple machines? You can only get so big a machine. You did mention a second container, which suggests the Lemmy software can do so, but I'm curious whether I'm reading that right.

[email protected] 20 points 1 year ago

A single instance, no. You run multiple instances on multiple machines, then put a load balancer (nginx in this case) in front of them to distribute the traffic.

[email protected] 23 points 1 year ago

Shouldn't the correct HTTP status code for a removed API be 410? 404 means the requested resource wasn't found, while 410 means it has been intentionally and permanently removed.

[email protected] 14 points 1 year ago

Or 418 for the wrong API being used :^)

[email protected] 22 points 1 year ago

How great is it to be a part of history in the making!

This is Web 3 taking shape.

Headlines in ~5 years:

The ending of Web 2 was unceremonious and just ugly. u/spez and moron@musk watched as their social media networks signaled the end of Web 2 and slowly dissolved. Blue bird's value disintegrated and Reddit's hopes for an IPO did likewise. Twitter and Reddit dissolved into odorous flatulence as centralization fell apart to the world's benefit. Decentralized/federated social media such as Mastodon and Lemmy made their convoluted progress and led Web 3's development and growth…

This is how history is made: it's ugly and convoluted, but it comes out sweeet…

[email protected] 22 points 1 year ago (last edited 1 year ago)

Awesome work - things seem to be running much more smoothly today.

Do you have anything behind a CDN by chance? Looking at the lemmy.world IPs, the server appears to be hosted in Europe, and web traffic seems to go directly there? IPv4 seems to resolve to a Finland-based address, and IPv6 to a Germany-based address.

If you put the site behind a CDN, it should significantly reduce your bandwidth requirements and greatly drop the number of requests that need to hit the origin server. CDNs would also make content load faster for people in other parts of the world. I'm in New Zealand, for example, and I'm seeing 300-350 ms latency to lemmy.world currently. If static content such as images could be served via CDN, that would make for a much snappier browsing experience.
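On the origin side, CDN caching comes down to response headers that say what caches may keep. A minimal nginx sketch for uploaded images, under several assumptions: images are served under /pictrs/ (as Lemmy image URLs typically are), the backend upstream is named lemmy_backend (a placeholder), uploaded files never change, and 30 days is an arbitrary example lifetime. The ^~ modifier makes this prefix location win over any regex location that would also match.

```nginx
# Sketch only: path, upstream name and cache lifetime are assumptions.
# "^~" stops nginx from checking regex locations once this prefix matches.
location ^~ /pictrs/ {
    proxy_pass http://lemmy_backend;
    # Reuse the proxy_set_header lines from the existing backend location here.

    # Drop the backend's own caching header and send one that lets browsers
    # and a CDN keep each image for 30 days (30 * 86400 = 2592000 seconds).
    # Without "always", add_header only applies to success/redirect responses,
    # so error pages are not marked cacheable by this header.
    proxy_hide_header Cache-Control;
    add_header Cache-Control "public, max-age=2592000";
}
```

One nginx quirk to watch: add_header in a location replaces any add_header directives inherited from the server block, so headers set there (security headers, for example) would need to be repeated inside this location.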

[email protected] 13 points 1 year ago

Yes, that's one of the things on our to-do list.

[email protected] 21 points 1 year ago

Whilst I'm aware that too many users on one instance can be a bad thing for the wider Fediverse, I think it is a great thing at the moment in terms of how well people are banding together to fix the issues being encountered from such a surge in users.

The issues being found on lemmy.world result in better Lemmy instances for everyone and improve the whole Fediverse of Lemmy instances.

I'm very impressed with how well things are being debugged under pressure, well done to all those involved 👏

[email protected] 21 points 1 year ago (last edited 1 year ago)

Is it safe to use 2FA yet?

[email protected] 18 points 1 year ago

That's so awesome! Look at that GRAPH!

I'd volunteer as a technical troubleshooter (very familiar with Docker/JavaScript/SQL, not super familiar with Rust), but I'm sure y'all also have an abundance of nerds to lend a hand.

[email protected] 18 points 1 year ago

You guys are absolute legends, thanks for the update!

[email protected] 18 points 1 year ago

Hey I can upvote now!

[email protected] 17 points 1 year ago

I just love the transparency you guys are coming forward with. It's absolutely awesome! Thank you for that and for all the work you put in. It means a lot to me that you folks are taking the time to keep us updated. Much love!

[email protected] 16 points 1 year ago

You guys are absolutely amazing. So many thanks to you @Ruud and the entire admin/troubleshooting team! Thank you.

[email protected] 16 points 1 year ago (last edited 1 year ago)

It blows my mind that, with the amount of traffic you guys must be getting, you're only running one container and not a k8s cluster with multiple pods (or a similar container orchestration system).

Edit: I misread; a second one is coming up. Still crazy that this doesn't take a multi-node cluster with multiple pods. Fucking awesome.

[email protected] 15 points 1 year ago

smoooooooooth! Keep up the good work!

[email protected] 13 points 1 year ago

Is it weird that I’m always excited to read the update posts?

[email protected] 13 points 1 year ago

Installed Jerboa again and it feels smoother than Reddit itself, great job!

[email protected] 13 points 1 year ago

Wow, it is smooth as butter now. Great job, Ruud and team!

[email protected] 12 points 1 year ago

Things have been super smooth lately, thanks for all the work!

[email protected] 11 points 1 year ago

Awesome work. Any way other devs can contribute?

[email protected] 10 points 1 year ago

Go check out the GitHub issues for Lemmy.

[email protected] 11 points 1 year ago (last edited 1 year ago)

I took an SM break for a few days, and it's running noticeably better today... I think. (:

Thanks a bunch for floating us degenerates.

[email protected] 11 points 1 year ago

You know, there's something about dealing with the lagginess of the past few days that makes me appreciate how fast and responsive things are after the update. It's nice to see the community grow, and it makes the experience on Lemmy feel authentic.

[email protected] 11 points 1 year ago

Really great job, guys! I know from my experience in SRE that this kind of debugging, monitoring, and fixing can be a lot of pain, so you have all my appreciation. I'm even determined to donate on Patreon if it's available.

[email protected] 11 points 1 year ago

It felt like I'd jinx us all if I commented, but THANK YOU! This has been a wonderful experience today. Absolutely loving it, and I knew you just needed some time to work out the kinks that come with fast growth.
