Added support for voidtools everything DB #515
I've now added support for all filesystem types supported by Everything stable (currently NTFS/ReFS/EFU/Folder), along with tests for each. When I have some more time, I'll add support for more versions (Everything 1.5.0 alpha currently uses version 1.7.49 and also supports FAT, network drives, and network indexes).
Really cool PR! Since this is another big one, please give us some time to do the review :). Stay tuned!
Why is this file not stored in Git LFS?
Also, if possible, it would be a good idea to compress the bigger files.
class EverythingDBParser:
    def __init__(self, file_handle: IO[bytes]):
        self.fh = file_handle
        magic = self.__parse_magic(self.fh)
I see you are currently doing a lot of manual reading in this parser. Can't you convert it to dissect.cstruct definitions? That would make it more readable and more consistent with the rest of dissect.target.
elif isinstance(item, EverythingFile):
    typ = EverythingFileRecord
else:
    raise NotImplementedError(f"type {type(item)} is not Recordable")
If you do it like this in the plugin, raising here will end the plugin run completely, even if there is still valid data left in everything_file and even if there are other paths in self.config that still need to be processed.
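One possible way to address this, sketched under assumptions: catch the error per path, log it, and move on to the next path. The `iter_records` function and its `parse` callback are hypothetical names used for illustration and are not part of the PR or of dissect.target.

```python
import logging
from typing import Callable, Iterable, Iterator

log = logging.getLogger(__name__)


def iter_records(paths: Iterable[str], parse: Callable[[str], Iterator[object]]) -> Iterator[object]:
    """Yield records from every path; a failure in one path does not stop the others."""
    for path in paths:
        try:
            # Records yielded before an error are kept, since they were
            # already handed to the caller.
            yield from parse(path)
        except NotImplementedError as e:
            log.warning("Skipping remainder of %s: %s", path, e)
            continue
```

Whether to skip a single unsupported item or the rest of the file is a design choice; this sketch skips the rest of the file but still processes the remaining paths.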
if isinstance(item, EverythingDirectory):
    typ = EverythingDirectoryRecord
elif isinstance(item, EverythingFile):
    typ = EverythingFileRecord
Can't you just make a single EverythingRecord, which adds a file_type field? The same goes for the distinction between EverythingFile and EverythingDirectory. Wouldn't an attribute in the previous class suffice?
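A rough sketch of what that could look like, using a plain dataclass instead of the project's record descriptors. The `EverythingEntry` class, the `FileType` enum, and the field set are hypothetical and only meant to show one type with a discriminating attribute.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FileType(str, Enum):
    FILE = "file"
    DIRECTORY = "directory"


@dataclass(frozen=True)
class EverythingEntry:
    """One type for both files and directories, distinguished by file_type."""

    path: str
    file_type: FileType
    size: Optional[int] = None  # hypothetical field; may not apply to directories


# Usage: no isinstance() dispatch is needed to pick a record type.
entries = [
    EverythingEntry("C:\\Users", FileType.DIRECTORY),
    EverythingEntry("C:\\Users\\notes.txt", FileType.FILE, size=42),
]
kinds = [entry.file_type.value for entry in entries]
```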
Hey, thanks for the review. The only request I haven't worked on yet is the one about using dissect.cstruct. I'll have to think a bit about how to implement it, because the structs differ between versions. I'd be happy to hear thoughts about how I handled different versions in the code (I'm not quite happy with the version handling).
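As one possible direction for the version question, here is a sketch of a lookup table that maps each format version to its own declared layout. Everything in it is an assumption for illustration: the `LAYOUTS` table, the `layout_for` helper, the field layouts, and the (1, 7, 8) version are made up; only 1.7.49 is mentioned in this thread, and its real layout is not.

```python
import struct
from typing import Dict, Tuple

Version = Tuple[int, int, int]

# Hypothetical per-version entry layouts; the real structures are not
# documented in this thread.
LAYOUTS: Dict[Version, struct.Struct] = {
    (1, 7, 8): struct.Struct("<QI"),    # made-up version and fields
    (1, 7, 49): struct.Struct("<QIQ"),  # version named in the thread, made-up fields
}


def layout_for(version: Version) -> struct.Struct:
    """Return the layout for a version, or fail clearly if it is unsupported."""
    try:
        return LAYOUTS[version]
    except KeyError:
        supported = ", ".join(".".join(map(str, v)) for v in sorted(LAYOUTS))
        raise NotImplementedError(
            f"unsupported DB version {'.'.join(map(str, version))} (supported: {supported})"
        ) from None
```

Keeping the differences in data rather than in branching code means that supporting a new version is mostly a matter of adding a table entry.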
Inspired by #505, I remembered I had some code lying around to parse the database of Voidtools Everything, very similar to mlocate/plocate, but for Windows.
I updated the code and added it to the codebase.
Because Everything is closed source, this is completely based off of reverse-engineering the code, and I haven't found any reference implementation on the internet to help (AFAIK this is the only parser), so this is all based off of my (not too great) reversing skills.
I've tested this on ~10 random database files I had lying around, from multiple computers, and all of them gave exactly the same results as Everything itself (checked by exporting to CSV and comparing md5sums).
It should support any DB created since 2017, and if someone provides a file that fails to parse, I'm willing to add support for earlier versions as well.
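The comparison described above can be sketched roughly as follows. The `md5sum` and `same_export` helpers are hypothetical names, and this assumes both exports are written byte-for-byte identically (same column order, line endings, and encoding), which is what an md5 comparison requires.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_export(parser_csv: str, everything_csv: str) -> bool:
    """True if the parser's CSV export is byte-identical to Everything's own export."""
    return md5sum(parser_csv) == md5sum(everything_csv)
```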
All comments are mine, written while reversing the code.
This is relatively slow code (it takes 4.5 seconds for a DB with 126828 files).
I have a version written in Rust which is 22 times faster, and if that's something you are interested in, I'm happy to try creating bindings with PyO3.