Remove id column and use different primary key on some tables #4093
Conversation
```sql
ALTER TABLE post_saved
DROP COLUMN id,
ADD PRIMARY KEY (post_id, person_id);
```
Is it possible that old databases could have duplicate entries?
If there's already a unique constraint on the table, it's fine. But if a given table doesn't have one, we need to find and delete duplicates before adding the new primary key.
edit: also, I'm not sure, but you might need to remove the previous unique constraint manually, probably after adding the new PK.
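The dedup step described above could look roughly like this. This is a hedged sketch, not the PR's actual migration: it assumes `post_saved` still has its `id` column at that point and that keeping the lowest `id` per pair is acceptable.

```sql
-- Assumed approach: delete duplicate (post_id, person_id) rows,
-- keeping the row with the lowest id. Run before dropping id
-- and adding the composite primary key.
DELETE FROM post_saved a
USING post_saved b
WHERE a.post_id = b.post_id
  AND a.person_id = b.person_id
  AND a.id > b.id;
```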
The table has `UNIQUE` and `NOT NULL` constraints. I changed the migration to remove them after adding the primary key.
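A sketch of that ordering, under the assumption that the old unique constraint has PostgreSQL's default generated name (check `\d post_saved` for the real one):

```sql
-- Add the composite primary key first...
ALTER TABLE post_saved
DROP COLUMN id,
ADD PRIMARY KEY (post_id, person_id);

-- ...then drop the now-redundant unique constraint.
-- Constraint name is an assumption (PostgreSQL's default naming).
ALTER TABLE post_saved
DROP CONSTRAINT post_saved_post_id_person_id_key;
```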
`NOT NULL` is no longer removed, because removing it shows no effect on the schema when inspecting it with pg_dump.
```sql
ALTER TABLE post_saved
DROP COLUMN id,
ADD PRIMARY KEY (post_id, person_id),
```
I think this index would be more efficient reversed: with person_id as the first index column, it's quick to load all of your saved posts. But maybe there's already an index on (person_id, published)?
There's already an index on just person_id.
I changed the primary key to (person_id, post_id) and removed the old person_id index, so now there's only one index.
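A sketch of the resulting migration. The old index name is an assumption, not taken from the PR:

```sql
ALTER TABLE post_saved
DROP COLUMN id,
ADD PRIMARY KEY (person_id, post_id);

-- The composite PK's index already covers lookups on person_id
-- alone (leftmost prefix), so the single-column index is redundant.
-- Index name is an assumption.
DROP INDEX idx_post_saved_person_id;
```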
Thanks, this is definitely a lot cleaner, especially for bridge tables. Before we merge I want to test that the migrations work with production data though, so give me a few days on that.
@dullbananas Check this one out: dullbananas#17 I merged from main, fixed a few lints, and the scheduled jobs.
* Also order reports by oldest first (ref LemmyNet#4123) (LemmyNet#4129)
* Support signed fetch for federation (fixes LemmyNet#868) (LemmyNet#4125)
* Support signed fetch for federation (fixes LemmyNet#868)
* taplo
* add federation queue state to get_federated_instances api (LemmyNet#4104)
* add federation queue state to get_federated_instances api
* feature gate
* move retry sleep function
* move stuff around
* Add UI setting for collapsing bot comments. Fixes LemmyNet#3838 (LemmyNet#4098)
* Add UI setting for collapsing bot comments. Fixes LemmyNet#3838
* Fixing clippy check.
* Only keep sent and received activities for 7 days (fixes LemmyNet#4113, fixes LemmyNet#4110) (LemmyNet#4131)
* Only check auth secure on release mode. (LemmyNet#4127)
* Only check auth secure on release mode.
* Fixing wrong js-client.
* Adding is_debug_mode var.
* Fixing the desktop image on the README. (LemmyNet#4135)
* Delete dupes and add possibly missing unique constraint on person_aggregates.
* Fixing clippy lints.

Co-authored-by: Nutomic <me@nutomic.com>
Co-authored-by: phiresky <phireskyde+git@gmail.com>
Looks good to me. Had to silence all the database administrators in my head saying "always use autoincrement integer primary keys", but I don't think that's necessary, especially for aggregate tables.
Let's have @phiresky take a look also, since this is a bigger DB change.
This is definitely a good idea for the aggregate tables.
For the other things, I'd be a bit more cautious:
Having an incrementing integer primary key has the advantage of ensuring locality, improving the access pattern for large indexes. Using something random like a UUID makes this worse, which hurts as soon as parts of an index are evicted from RAM.
For example, for posts there is always a hot set that around 90% of accesses go to. When the rows are keyed by an incrementing int, those rows are all close together, so only a small part of the index is needed to find them. With a random key like a UUID or the ap_id they are scattered everywhere. Here's a random article about that. In addition, it makes future partitioning and sharding changes harder (e.g. we might want to partition the posts table by post_id/1e6 for index size and performance, with older partitions living on cheaper / compressed storage and falling out of RAM cache).
That said, this is really only relevant for the largest tables, those with at least ~10 million rows that are expected to grow linearly; in Lemmy that would be posts, comments, votes. E.g. for captcha_answer, it shouldn't matter much. It's also only relevant for the primary lookup method: whatever is used to primarily look up a row should have good locality. (This is something we should consider if we switch to more readable URLs in the future.)
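The partitioning idea mentioned above could look roughly like this. This is a hypothetical sketch, not an existing Lemmy schema: the table name, columns, and partition boundaries are all assumptions.

```sql
-- Hypothetical: range-partition posts by id so that hot (recent)
-- rows share partitions, and older partitions can be moved to
-- cheaper or compressed storage.
CREATE TABLE post_partitioned (
    id    bigint NOT NULL,
    ap_id text   NOT NULL,
    -- ...remaining post columns would go here...
    PRIMARY KEY (id)
) PARTITION BY RANGE (id);

-- One partition per 1e6 ids, matching the post_id/1e6 idea above.
CREATE TABLE post_p0 PARTITION OF post_partitioned
    FOR VALUES FROM (0) TO (1000000);
CREATE TABLE post_p1 PARTITION OF post_partitioned
    FOR VALUES FROM (1000000) TO (2000000);
```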
Compound indexes also increase the index size but this should be irrelevant since all the primary keys added should have a corresponding unique index that's being dropped.
For the 1:1 tables (mostly aggregates), using the same id as the main table is great all around. For most of the other changed tables, it looks good as well.
These are the tables where it's not 100% obvious whether it's good:
- comment_like, post_like: the id is replaced with a compound key (person_id, comment_id). Since it looks like comment_likes were never actually looked up by their id, this is fine.
- image_upload: these seem to be mostly queried by alias rather than id anyway, so it's fine. I'm guessing the primary lookup (when an image is viewed) happens inside pictrs, not Lemmy?
For future changes, some caution is warranted: what I'm saying is mainly relevant if you change which keys rows are looked up by. Removing id columns that aren't really used anyway is fine from a perf perspective.
I didn't check the down migration and I didn't exactly verify whether the triggers/tasks do what they are supposed to. But LGTM
Yes, image requests are passed to pictrs; Lemmy doesn't perform any db queries for them. https://github.com/LemmyNet/lemmy/blob/main/crates/routes/src/images.rs
Fixes #3933
This will also allow more concise filters using the diesel `find` method. For db views, I will do that in other pull requests. Some `is_not_null` checks that previously used `id` are now a little more confusing, but that will be fixed when I refactor all db views.
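With a composite primary key, looking up a row by its full key is a single point query. In SQL terms (a sketch; the placeholder parameters stand in for what a diesel `find((person_id, post_id))` call would bind):

```sql
-- Point lookup by the composite primary key (person_id, post_id).
SELECT *
FROM post_saved
WHERE person_id = $1
  AND post_id = $2;
```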