Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync: Send a hash of all database rules at beginning and end of sync #1409

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

russellhancox
Copy link
Collaborator

No description provided.

@russellhancox russellhancox added the sync service Issues related to the sync service / protocol label Aug 1, 2024
@russellhancox russellhancox force-pushed the rule-database-hash branch 3 times, most recently from a5db48c to a118f39 Compare August 1, 2024 17:08
CC_SHA256_Init(&sha);

[self inDatabase:^(FMDatabase *db) {
FMResultSet *rs = [db executeQuery:@"SELECT * FROM rules"];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might need to exclude transitive rules here as it'll change your hash of hashes and the server will have no idea that these exist.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also any rules added manually while a sync server didn't exist (though I don't think we track such thing today). However this is likely niche...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thought - there's been requests and talks in the past about per-user rules. It would be good to have a bit of an idea about how something like those might be represented here if they're ever implemented.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmarkowsky Good call, I've excluded transitive rules

@mlw If a sync server is added the first sync afterward is a clean sync so the hashes shouldn't matter. For per-user rules those would be implemented on the server side anyway, so that shouldn't matter here. The purpose of this is not necessarily to have the server calculate the hash and compare them but to store the hash sent in a postflight and then validate that the one sent in a preflight matches the stored value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mlw I'm assuming you're talking about the rumblings in the past that folks would like to block execution by the user e.g. root can run X but not user peterm.

That'd have to be implemented in the client but is an extension to rules similar to #1200.

Re: Hash of Hashes. Do you need to sort by identifier and policy to create a stable hash? Like if I get a rule out of order for some reason i'll end up with a different hash even though the set of policies is the same.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to sort by identifier and policy to create a stable hash? Like if I get a rule out of order for some reason i'll end up with a different hash even though the set of policies is the same.

Right now this hashes based on insertion order, which is stable on the client at least. The server could ensure this is stable by always ordering by creation timestamp but I'm not really expecting servers to calculate this in the same way as the client but instead just check the hash doesn't change during preflight.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now this hashes based on insertion order, which is stable on the client at least. The server could ensure this is stable by always ordering by creation timestamp but I'm not really expecting servers to calculate this in the same way as the client but instead just check the hash doesn't change during preflight.

That's good to know.

Not saying you should do this as the hash is probably the right tradeoff for now, but previously, I looked into using an xorfilter/bloomfilter rather than a hash to avoid worrying about ordering e.g. this single header one https://github.com/FastFilter/xor_singleheader.

The advantage to this is that you're creating a set membership data structure so order becomes irrelevant.

The process would just be:

  1. Hash rules (policy, state, identifier) to a 64-bit integer (e.g. use murmur3)
  2. Insert into the xorfilter
  3. Serialize and ship the xorfilter to the server in postflight
  4. Rehydrate the xorfilter on the server and then for all rules per user check if they're in the supplied xorfilter

Alternatively you could also receive a xorfilter from the server of all the expected rules and as you're scanning the Santa rule db check if those rules are supposed to be in the client DB. This would let you know exactly which rules were spurious.

Copy link
Collaborator Author

@russellhancox russellhancox Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see 3 problems with that approach

  1. Serializing and de-serializing the filter may require reimplementing the serialization - the Go implementation in particular doesn't support this
  2. This sounds far more expensive on the server-side; what I've proposed requires saving a single extra field in postflight and doing a simple 64-byte string comparison during preflight. The fuse filter method would require looping over all rules in postflight to check that they're in the filter.
  3. This will not detect extra rules that have been added. Reversing the process would find extra rules (which is better, imo) but then not detect missing rules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Late reply, but I was referring to some unimplemented per-user rule feature. (I think this has been requested multiple times, but the only reference I found quickly was: #1280 )

This may not matter much, and we should certainly not overfit to some unimplemented feature. I could see the hash-of-hashes being user-agnostic and that would likely be just fine. I was just hoping to consider it early in case there were any obvious/easy tweaks to make to this draft that would better adapt to some new, more complex rule set (per user, CEL, etc.).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mlw Gotcha. I don't think that should be an issue, whether we implement that as an extra field in a rule or via CEL, we would add the new field(s) to those included in each rules' hash.

@russellhancox russellhancox force-pushed the rule-database-hash branch 7 times, most recently from 76ed896 to 26df816 Compare August 1, 2024 19:06
Source/common/SNTFileInfo.mm Outdated Show resolved Hide resolved
Source/common/SNTRule.mm Outdated Show resolved Hide resolved
Source/santad/DataLayer/SNTRuleTable.mm Outdated Show resolved Hide resolved
@russellhancox russellhancox force-pushed the rule-database-hash branch 4 times, most recently from 11db5ea to c503604 Compare August 9, 2024 16:33
@russellhancox russellhancox marked this pull request as ready for review August 13, 2024 14:38
@russellhancox russellhancox requested a review from a team as a code owner August 13, 2024 14:38
@russellhancox russellhancox force-pushed the rule-database-hash branch 2 times, most recently from 0400375 to 5f2cce0 Compare August 13, 2024 19:24
while ([rs next]) {
char digest[128];
uint32_t len = snprintf(
digest, 128, "%s:%d:%d:%u", [rs stringForColumn:@"identifier"].UTF8String,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/128/sizeof(digest)/

@mlw
Copy link
Member

mlw commented Aug 14, 2024

What do you think about adding the hash of hashes value to santactl status output? feels like it might be useful for debugging / sanity check purposes.

@russellhancox russellhancox marked this pull request as draft August 15, 2024 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sync service Issues related to the sync service / protocol
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants