Lemmy PostgreSQL

33 readers
2 users here now

founded 1 year ago
MODERATORS
1
 
 

just hitting the first page, limit 20, using lemmy-ui
http://localhost:1234/c/zy_photography
default sort of "Active"
taking over 26 full seconds, each refresh.
463K posts in community, 6.18M posts in database
12K communities
block lists should be entirely empty
this is while logged-in

2023-08-16 12:46:44.049 MST [1520415] lemmy@lemmy_alpha LOG: duration: 26064.185 ms execute s22174:

SELECT "post"."id", "post"."name", "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."thumbnail_url", "post"."ap_id", "post"."local", "post"."embed_video_url", "post"."language_id", "post"."featured_community", "post"."featured_local", "person"."id", "person"."name", "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated", "person"."actor_id", "person"."bio", "person"."local", "person"."private_key", "person"."public_key", "person"."last_refreshed_at", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin", "person"."bot_account", "person"."ban_expires", "person"."instance_id", "community"."id", "community"."name", "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted", "community"."nsfw", "community"."actor_id", "community"."local", "community"."private_key", "community"."public_key", "community"."last_refreshed_at", "community"."icon", "community"."banner", "community"."followers_url", "community"."inbox_url", "community"."shared_inbox_url", "community"."hidden", "community"."posting_restricted_to_mods", "community"."instance_id", "community"."moderators_url", "community"."featured_url", ("community_person_ban"."id" IS NOT NULL), "post_aggregates"."id", "post_aggregates"."post_id", "post_aggregates"."comments", "post_aggregates"."score", "post_aggregates"."upvotes", "post_aggregates"."downvotes", "post_aggregates"."published", "post_aggregates"."newest_comment_time_necro", "post_aggregates"."newest_comment_time", "post_aggregates"."featured_community", "post_aggregates"."featured_local", "post_aggregates"."hot_rank", "post_aggregates"."hot_rank_active", "post_aggregates"."community_id", "post_aggregates"."creator_id", "post_aggregates"."controversy_rank", "community_follower"."pending", ("post_saved"."id" IS NOT NULL), ("post_read"."id" IS NOT NULL), ("person_block"."id" IS NOT NULL), "post_like"."score", coalesce(("post_aggregates"."comments" - "person_post_aggregates"."read_comments"), "post_aggregates"."comments")

FROM (((((((((((((
"post_aggregates" INNER JOIN "person" ON ("post_aggregates"."creator_id" = "person"."id")) INNER JOIN "community" ON ("post_aggregates"."community_id" = "community"."id")) LEFT OUTER JOIN "community_person_ban" ON (("post_aggregates"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post_aggregates"."creator_id")))
INNER JOIN "post" ON ("post_aggregates"."post_id" = "post"."id")) LEFT OUTER JOIN "community_follower" ON (("post_aggregates"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = $1)))
LEFT OUTER JOIN "community_moderator" ON (("post"."community_id" = "community_moderator"."community_id") AND ("community_moderator"."person_id" = $2)))
LEFT OUTER JOIN "post_saved" ON (("post_aggregates"."post_id" = "post_saved"."post_id") AND ("post_saved"."person_id" = $3)))
LEFT OUTER JOIN "post_read" ON (("post_aggregates"."post_id" = "post_read"."post_id") AND ("post_read"."person_id" = $4)))
LEFT OUTER JOIN "person_block" ON (("post_aggregates"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = $5)))
LEFT OUTER JOIN "post_like" ON (("post_aggregates"."post_id" = "post_like"."post_id") AND ("post_like"."person_id" = $6)))
LEFT OUTER JOIN "person_post_aggregates" ON (("post_aggregates"."post_id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = $7)))
LEFT OUTER JOIN "community_block" ON (("post_aggregates"."community_id" = "community_block"."community_id") AND ("community_block"."person_id" = $8)))
LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = $9)))

WHERE ((((((((("community"."removed" = $10) AND ("post"."removed" = $11)) AND ("post_aggregates"."community_id" = $12)) AND (("community"."hidden" = $13) OR ("community_follower"."person_id" = $14))) AND ("post"."nsfw" = $15)) AND ("community"."nsfw" = $16)) AND ("local_user_language"."language_id" IS NOT NULL)) AND ("community_block"."person_id" IS NULL)) AND ("person_block"."person_id" IS NULL))

ORDER BY "post_aggregates"."featured_community" DESC , "post_aggregates"."hot_rank_active" DESC , "post_aggregates"."published" DESC LIMIT $17 OFFSET $18

2
 
 

Backstory: I had a debian 11 VPS. I installed the postgresql program from the debian 11 apt repo, back a few months ago. Back then, it was postgres 13 on the debian stable repos.

Fast forward couple months: debian 12 comes out. I do a "apt dist-upgrade" and in-place upgrade my VPS from debian 11 to debian 12. Along with this upgrade, comes postgresql 15 installed.

Now, fast forward couple more months: lemmy 0.18.3 comes out. I do not upgrade (I am on lemmy 0.18.2---afaik).

Fast forward some time, too: lemmy 0.18.4 comes out. I decide to upgrade to 0.18.4 from my existing 0.18.2.

I pull the git repo. Compile it locally. It goes well, no errors in the compilation process. I stop the lemmy systemd service, then I "mv" the compiled "lemmy_server" to /usr/bin dir.

I try to restart the now-upgraded lemmy systemd service. However, the systemd service fails.

I check the sudo journalctl -fu lemmy and I see the following error message:

lemmy_server[17631]: thread 'main' panicked at 'Couldn't run DB Migrations: Failed to run 2023-07-08-101154_fix_soft_delete_aggregates with: syntax error at or near "trigger"', crates/db_schema/src/utils.rs:221:25 

I report this issue here: https://github.com/LemmyNet/lemmy/issues/3756#issuecomment-1686439103

However, after a few back and forths and internet search, I conclude that somewhere between lemmy 0.18.3 and 0.18.4, lemmy stops supporting psql <15. So, my existing DB is not compatible.

Upon my investigation on my VPS setup, I concluded that psql 15 is running, however, lemmy is using the psql 13 tables (I do not know if this is the correct term).

Now my question: is there a way to import the lemmy data I had in the psql 13 tables to a new psql 15 table (or database, I don't know the term).

To make things hairier: I also run a dendrite server on the same VPS, and the dendrite server is using the psql 15 with psql 13 tables on the same database as the lemmy one.

The dendrite database is controlled by a psql user named "dendrite" and the lemmy database is controlled by a psql user named "lemmy" . I hope this makes differentiation between two databases possible. And so I do not harm my existing dendrite database.

Any recommendations about my options here?

3
4
 
 

Instead of tied to a person, a block or subscribe list would be a sharable entity.

Special virtual lists:

  1. All local
  2. All remote
  3. All local + remote
  4. None
  5. NSFW
  6. Other than NSFW
  7. Private community
  8. Other than Private community
5
1
Community Retention Policy (bulletintree.com)
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]
 
 

Reddit traditionally did not purge comments and posts when a user deleted their account. Prior to the API protest in May/June 2023, it was common to encounter posts and comments by [deleted] that were still fully readable.

Lemmy has encouraged total deletion of content upon account removal.

I think a community should have retention policy:

  1. How much old data to retain for performance reasons. communities like memes seem to churn and repeat content in variations - do people really read memes from 60 days ago?

  2. Allow retention in technical and specialized topics for search engine / historic stability. A policy that 'posting in this community will not be deleted upon account deletion'. Maybe the user has to consent to this with a prompt on their first post or comment?

  3. Remote instances may not want to retain years of content and purge it to only keep the most recent 30 or 60 days for storage and liability reasons, etc. So settings per-community on retention / automatic removal....

6
 
 

Some learning about JOIN filled queries like the SELECT that is used to list post_aggregates table when browsing postings on Lemmy.

https://dba.stackexchange.com/questions/155972/postgres-join-conditions-vs-where-conditions

"Logically, it makes no difference at all whether you place conditions in the join clause of an INNER JOIN or the WHERE clause of the same SELECT. The effect is the same. (Not the case for OUTER JOIN!)"

Learning about join_collapse_limit

7
1
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]
 
 

Lemmy's PostgreSQL was developed with this philosophy, intentional or otherwise:

  1. Real-time client connection and notification via websocket that gets every single action that passes through PostgreSQL. One upvote, instantly sent to client. One new comment, instantly appeared on Lemmy-ui with version 0.17.4

  2. INSERT overhead for PostgreSQL. As soon as a Lemmy post or comment is done, a parallel post_aggregate and comment_aggregate row is created.

  3. INSERT counting overhead. Extreme effort is made by Lemmy to count things, all the time. Every new INSERT of a comment or post does a real-time update of the total server count. This is done via a SQL UPDATE and not by just issuing a COUNT(*) on the rows when the data is requested.

  4. No archiving or secondary storage concept. PostgreSQL has it in the main tables or nothing.

  5. Raw numbers, local and unique for each instance, for comment and post. But community name and username tend to be more known than raw numbers.

  6. Sorting choices presented on many things: communities, posts, comments. And new methods of sorting and slicing the data keep being added in 2023.

  7. No caching of data. The developers of lemmy have gone to extremes to avoid caching on either lemmy-ui or within the Rust code of lemmy_server. Lemmy philosophy favors putting constant connection to a single PostgreSQL.

  8. User preferences and customization are offloaded to PostgreSQL do do the heavy lifting. PostgreSQL has to look at the activity of each user to know if they have saved a post, previously read a post, upvoted that post, or even blocked the person who created the post.

  9. Language choice is built into the system early, but I see no evidence it has proven to be useful. But it carries a high overhead in how many PostgreSQL database rows each site carries - and is used in filtering More often than not, I've found end-users confused why they can't find content when they accidentally turned off choices in lemmy-ui

  10. All fields on SELECT. Throughout the Rust Diesel ORM code, it's every field in every table being touched.

  11. SELECT statements are almost always ORM machine generated. TRIGGER FUNCTION logic is hand-written.

8
9
10
 
 
# "this is the most flexible output format in that it allows manual selection and reordering of archived items during restore. This format is also compressed by default."
sudo -iu postgres pg_dump --format=custom --compress=5 \
   --exclude-table-data "*.received_activity" \
   --exclude-table-data "*.sent_activity" \
   lemmy > 2023-08-10_backup_compress_1.dump

11
 
 

comment_aggregates gets a lot of hammering to set child_count on new replies, it is also probably the 2nd or 3rd largest table in Lemmy's database, votes for post and comments being the largest