
feat(trin-storage): add filter to store #1286

Open · njgheorghita wants to merge 5 commits into master from filter-store

Conversation

@njgheorghita (Collaborator) commented May 3, 2024

What was wrong?

Soon, we'll need the ability to remove certain content from our storage according to a specified "validation function". This PR implements the ability to filter such data out of the storage.

How was it fixed?

Added a filter feature to storage.


@njgheorghita njgheorghita marked this pull request as ready for review May 3, 2024 15:25
@njgheorghita njgheorghita requested review from morph-dev, ogenev and KolbyML and removed request for morph-dev and ogenev May 3, 2024 15:34
@morph-dev (Collaborator) commented:
Regardless of whether we decide to postpone this, I would like to highlight an issue with this approach.

Pagination doesn't really guarantee that you will see all entries. Consider the following scenarios:

Scenario 1: Entries changing pages

  1. Query the first page (entries 1-100) and delete 10 of them because they didn't pass the filter
  2. Query the second page (entries 101-200)

Result: entries that were originally at index 101-110 are never queried (because at the moment of the second query they have moved to 91-100).

This can be avoided by either:

  • tracking the number of deletes and updating the offset accordingly
  • saving all entry keys to delete, and deleting them once we've finished iterating over the entire db (see the sketch below)
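
For illustration, a minimal sketch of the collect-then-delete option. The `Store` trait, `paginate`, and `delete` here are hypothetical stand-ins for the real trin-storage API, not actual trin code:

```rust
/// Sketch only: a hypothetical stand-in for the real trin-storage API.
trait Store {
    type Entry;
    fn paginate(&self, offset: u64, limit: u64) -> Vec<Self::Entry>;
    fn delete(&mut self, entry: &Self::Entry);
}

fn purge_invalid<S: Store>(store: &mut S, is_valid: impl Fn(&S::Entry) -> bool) {
    const PAGE_SIZE: u64 = 100;
    let mut to_delete = Vec::new();
    let mut offset = 0;
    loop {
        let page = store.paginate(offset, PAGE_SIZE);
        if page.is_empty() {
            break;
        }
        // Only collect while iterating: deleting here would shift later
        // entries into pages we've already passed (Scenario 1).
        to_delete.extend(page.into_iter().filter(|entry| !is_valid(entry)));
        offset += PAGE_SIZE;
    }
    // Delete only after the full scan, so pagination offsets stay stable.
    for entry in &to_delete {
        store.delete(entry);
    }
}
```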

Scenario 2: Running as a separate task while trin's normal operations are also running

If trin is running normally, we can't avoid the issue from Scenario 1, because trin can delete entries itself (e.g. via pruning) and affect pagination in a way that we can't control.

Note: this can be avoided by only running filtering on startup, but that should be documented (if not enforced).

@njgheorghita (Collaborator, Author) commented:

@morph-dev @KolbyML
The goal right now is to remove all epoch accumulators from the history store on our network nodes. There are a few ways we can go about this; I've included 2 in this PR.

  1. With a simple script that will be run in a devops step before trin is initialized. This will probably be a one-off, though the script can easily be updated as needed in the future. (Note: the script will need to be updated to actually remove epoch accs, not just invalid content, but I'll handle that before deploying.)
  2. Implementing a filter method on Storage that can handle any validation function and can be used from within trin's core code.

Personally, I'm leaning towards # 1, but I'm curious what thoughts you guys have on this.

@KolbyML (Member) commented Sep 17, 2024

Is purge_invalid_history_content.rs just an example? Wouldn't EpochAccumulators be "valid data" in the example in this PR? If this PR just purges invalid data, couldn't I merge #1436 (review) first, since then the EpochAccumulator data would be invalid?

I am just a little confused, sorry if it is a silly question. "to actually remove epoch accs not just invalid content, but I'll handle that before deploying" makes it sound like the code to "remove epoch accs" won't be in this PR.

@njgheorghita (Collaborator, Author) commented:

Sorry, I should have probably said "but I'll handle that before asking for a final review and merging" - it's a fairly trivial change whether we're explicitly evicting all EpochAccumulator types or just "invalid" content.

Basically, right now I'm just curious what your thoughts are on the two approaches in this PR.

I still don't think we should merge #1436 until we have a game plan settled for removing epoch accs. But if we're OK with the "script" approach, then it'll probably make sense to release the two PRs in the same deployment.

@KolbyML (Member) commented Sep 17, 2024

> Sorry, I should have probably said "but I'll handle that before asking for a final review and merging" - it's a fairly trivial change whether we're explicitly evicting all EpochAccumulator types or just "invalid" content.
>
> Basically, right now I'm just curious what your thoughts are on the two approaches in this PR.
>
> I still don't think we should merge #1436 until we have a game plan settled for removing epoch accs. But if we're OK with the "script" approach, then it'll probably make sense to release the two PRs in the same deployment.

I am a fan of the script approach and deploying both in the same release.

The script should still work: it won't be able to decode EpochAccumulators because that's no longer a valid type, hence they will be evicted. (As we just discussed; just a recap of what I think the plan is, correct me if I am wrong.)

I am good with this plan

@morph-dev (Collaborator) left a comment:

I will distinguish between these two approaches as script vs store.
Based on the comment in the code, I assume store would run only at startup.

TL;DR: If you want this only for purging EpochAccumulators, then I'm in favor of script, but we can do it faster using SQL. If we want a "purge" solution that can be reused in the future, I'm in favor of store, with a few suggestions.

If always enabled, both would significantly delay startup time (especially for prod nodes). If we want to check at every startup that all data in the db is still valid (which I believe we don't), we can't avoid this delay. But in that case I would prefer the store approach (and there are a couple of improvements I can suggest, e.g. instead of having it as a function, doing it as part of trin_storage::versioned::create_store with a flag that can enable/disable it).

If this is done as a one-off thing to purge EpochAccumulators, then I'm in favor of the script approach. But in that case, we can make it much faster by running a SQL query like this:

DELETE FROM ii1_history WHERE hex(content_key) LIKE "03%"

(the number of deleted rows can be obtained with the changes function)
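
For what it's worth, a minimal sketch of that one-off with rusqlite (assuming the same ii1_history table; rusqlite's execute returns the number of affected rows, so the count comes for free):

```rust
use rusqlite::Connection;

/// Sketch of the one-off purge. In the history content-key encoding,
/// the 0x03 selector prefix identifies EpochAccumulator keys.
fn purge_epoch_accumulators(conn: &Connection) -> rusqlite::Result<usize> {
    // `execute` returns the number of deleted rows directly, so no
    // separate `changes()` call is needed.
    conn.execute(
        "DELETE FROM ii1_history WHERE hex(content_key) LIKE '03%'",
        [],
    )
}
```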

If you want a solution that is easily reusable in the future (e.g. with a simple modification of the is_content_valid function), then I'm again in favor of the store approach, but with the improvements I already mentioned above.

}
}
}
for content_id in &content_ids_to_remove {
@morph-dev (Collaborator):

We should move this outside the bigger loop (line 54); otherwise it doesn't work.

@njgheorghita (Collaborator, Author) commented:
@morph-dev Sounds good. I'll move ahead with the script strategy here and update this PR to reflect that.

@njgheorghita njgheorghita force-pushed the filter-store branch 3 times, most recently from 500dd4b to 1ab70f4 Compare September 18, 2024 16:47
@njgheorghita (Collaborator, Author) commented:
Ready for another review here. I also added a flag to the script that allows for dry runs before actually evicting the content (see the sketch below).
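
Roughly how such a flag could look with clap. This is a sketch: the evict field name matches the script_config.evict check in the diff below, but the rest is illustrative:

```rust
use clap::Parser;

/// Sketch of the script's CLI; only `evict` is taken from the diff,
/// the rest is illustrative.
#[derive(Parser)]
struct ScriptConfig {
    /// Actually delete matching content. Without this flag the script
    /// only reports what it would evict (a dry run).
    #[arg(long)]
    evict: bool,
}

fn main() {
    let script_config = ScriptConfig::parse();
    println!("evict mode: {}", script_config.evict);
}
```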

@morph-dev It's definitely not as fast as directly running a SQL query, but as far as I can tell, doing so requires access to the sql connection pool on the Store, which is hidden behind the private field config. I figured if this script is meant to be a one-off, it's better to just take the slow approach rather than modify the store.

@KolbyML (Member) commented Sep 18, 2024

> DELETE FROM ii1_history WHERE hex(content_key) LIKE "03%"

@njgheorghita I think we should go with @morph-dev's suggestion, as currently paginate() will throw an error: the EpochAccumulator keys will be invalid and fail to decode to the HistoryContentKey type. The error it will throw is FromSqlConversionFailure.

Or make the script properly handle invalid HistoryContentKeys, as currently it only properly handles invalid content_values, no?
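
A rough sketch of that second option: read the raw key bytes so the query itself can't fail, and let a decode failure mark the row for eviction. The decodes predicate here is a hypothetical stand-in for HistoryContentKey decoding:

```rust
use rusqlite::Connection;

/// Sketch only: `decodes` stands in for a HistoryContentKey decode check.
fn find_undecodable_keys(
    conn: &Connection,
    decodes: impl Fn(&[u8]) -> bool,
) -> rusqlite::Result<Vec<Vec<u8>>> {
    // Select the raw bytes: decoding happens in Rust, so a bad key can't
    // surface as a FromSqlConversionFailure during pagination.
    let mut stmt = conn.prepare("SELECT content_key FROM ii1_history")?;
    let raw_keys = stmt
        .query_map([], |row| row.get::<_, Vec<u8>>(0))?
        .collect::<rusqlite::Result<Vec<Vec<u8>>>>()?;
    Ok(raw_keys.into_iter().filter(|raw| !decodes(raw)).collect())
}
```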

@morph-dev (Collaborator) left a comment:

I didn't say it's super fast (we don't have an index on content_key), but it must be faster than fetching every single content key/value and decoding them :).

create_store(ContentType::History, config, sql_connection_pool).unwrap();
let total_entry_count = store.usage_stats().entry_count;
info!("total entry count: {total_entry_count}");
let sql_connection_pool = store.config.sql_connection_pool.clone();
@morph-dev (Collaborator):

You already have sql_connection_pool, at line 42. Therefore, you don't need config in store to be public (changes in the other file).

Also, I would argue that you don't need store at all. All you are doing with it is getting the number of entries, which you can do by executing:

SELECT COUNT(*) as count FROM ii1_history

Up to you if you want to create an instance of store and get the entry count from there, or execute this manually from here.
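
For the manual route, this is about all it takes with rusqlite (a sketch, using the same table name as the query above):

```rust
use rusqlite::Connection;

// Entry count straight from SQLite, with no store instance needed.
fn entry_count(conn: &Connection) -> rusqlite::Result<i64> {
    conn.query_row("SELECT COUNT(*) as count FROM ii1_history", [], |row| {
        row.get(0)
    })
}
```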

@morph-dev (Collaborator) left a comment:

Somehow this comment wasn't attached to the previous review.

.expect("Failed to fetch history content");
info!("found {} epoch accumulators", lookup_result);
if script_config.evict {
let _: Vec<String> = sql_connection_pool
@morph-dev (Collaborator):

Can't we use execute here?

It would be something like:

let deleted = sql_connection_pool.get().unwrap().execute(&delete_query(), []).unwrap();

I think with this API we don't even need the extra changes() call that I mentioned in the previous review.

@KolbyML (Member) left a comment:

(I assume the PR is ready for review now, tell me if I am wrong)

:shipit: Looks good. I haven't tested the changes locally, but I don't see any major red flags, so we should be good to go.
