Improve and rename persistentID to trackID #784
Comments
Just let me know if you change this name, as it is used in the SARD project.
Just saw that we are not using the datasource UUID in the computation. If we include it, this ID would really work like a UUID and be unique across cases, which would be great. Renaming it to globalID would then make total sense.
Just an explanation of why this is used. We can have 2 different files with the same path (one allocated and the other deleted): The two d.txt files have no idInDatasource (they are subitems from a zip, they aren't allocated) and their subitemId could be equal if they were extracted from 2 different b.zip files (one allocated and the other deleted). So the b.zip globalID (which uses the b.zip idInDatasource) must be used in the computation of the d.txt globalID.
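The explanation above can be illustrated with a minimal sketch. It is not the actual iped implementation: the class name, the `computeId` helper, the `|` separator, the SHA-256 hash and the example path are all assumptions made for illustration. It only shows that two subitems with the same path and subitemId get different IDs once the parent container ID is part of the hashed input.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class TrackIdSketch {

    // Hypothetical helper: hashes the concatenated components into a hex string.
    // Null components (e.g. no idInDataSource for zip subitems) become empty strings.
    static String computeId(String path, String idInDataSource, String subitemId, String parentContainerId) {
        StringBuilder sb = new StringBuilder();
        for (String part : new String[] { path, idInDataSource, subitemId, parentContainerId }) {
            sb.append(part == null ? "" : part).append('|');
        }
        try {
            // SHA-256 is an assumption; the real hash function may differ.
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(sb.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two b.zip files with the same path: one allocated, one deleted (made-up IDs).
        String allocatedZip = computeId("/a/b.zip", "10", null, null);
        String deletedZip = computeId("/a/b.zip", "11", null, null);

        // Their d.txt subitems share path and subitemId; only the parent ID differs.
        String d1 = computeId("/a/b.zip/d.txt", null, "0", allocatedZip);
        String d2 = computeId("/a/b.zip/d.txt", null, "0", deletedZip);
        System.out.println("IDs differ: " + !d1.equals(d2));
    }
}
```

Without the last argument, both d.txt items would hash the same input and collide.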
Thinking about this further, do we need a "UUID" for items in iped? Would it be useful in multicases? This change would make #918 very difficult; I doubt users will know or remember that they need to specify the evidence UUID when re-processing cases to import old bookmarks later. Maybe this change to include the evidence UUID in the globalID computation can be made just in ElasticSearchTask, what do you think @hauck-jvsh?
I think it could be used only in the elastic ID. You could also keep the persistentID in elastic, just not as the _id field, which must be unique.
This was done to make the --continue option work when resuming processing to ElasticSearch, instead of having to delete a remote index and start the processing from the beginning again. I think including the evidence UUID in the persistentID/globalID computation should be enough to avoid _id conflicts between elastic cases.
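One possible way to do this only on the Elastic side, as a rough sketch: derive the document `_id` from the evidence UUID plus the persistentID, and keep the persistentID itself as a regular field. The class and method names here are hypothetical, and using `java.util.UUID.nameUUIDFromBytes` (a name-based UUID) is an assumption chosen for illustration, not what ElasticSearchTask actually does.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class ElasticIdSketch {

    // Hypothetical helper: deterministic _id from evidence UUID + persistentID.
    // The same inputs always yield the same _id, so a resumed run would map
    // items to the same documents instead of creating new ones.
    static String elasticDocId(String evidenceUUID, String persistentId) {
        String name = evidenceUUID + "|" + persistentId;
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8)).toString();
    }

    public static void main(String[] args) {
        // Made-up values: the same persistentID under two different evidences.
        String persistentId = "abc123";
        System.out.println(elasticDocId("evidence-1", persistentId));
        System.out.println(elasticDocId("evidence-2", persistentId));
    }
}
```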
@hauck-jvsh what field are you using to store bookmarks in elastic, _id? edit: I mean, to correlate bookmarks to items?
Currently I'm using the _id just to find the item and then set a new metadata with the bookmark in the item.
@hauck-jvsh, I changed the attribute names persistentId->globalId, parentPersistentId->parentGlobalId, parentContainerPersistentId->containerGlobalId, and also ElasticSearchTask contentPersistentId -> contentGlobalId to follow the new naming convention.
After that commit an error is occurring when processing cases, see the log file attached.
Thanks @hauck-jvsh, I'll take a look. Actually I'm still not convinced about the new globalID attribute name, since it could repeat across cases without including the evidenceUUID in the computation. As you said, we can create a real UUID for items for possible future use in a new attribute (maybe using the globalID name), I like this idea. But about renaming persistentID, I thought about more options: fixedID, constantID, constID. What do you think? @tc-wleite do you have any suggestions?
Processing an E01 image worked, but when I tried to process a folder, I got a similar exception here.
I was following the discussions around this issue, but I am not sure what would be the best option.
I also use it to allow filtering using the file tree in the web interface.
You have tested an implementation with just parentId, right? Did it have a noticeable performance impact?
I couldn't do the searches, because I have to filter items that have the ids of the selected items in their parentIds.
I see, this would need some recursive search. Elastic possibly doesn't have support for that, but we could try to implement this inside iped...
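As a rough idea of what such a recursive search could look like when only a direct parentId is stored, here is a minimal in-memory sketch. It assumes the item-to-parent relation is available as a plain map; the class name, method name and integer ids are hypothetical and this is not iped or Elastic code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DescendantSearchSketch {

    // parentOf maps item id -> direct parent id (null for root items).
    // Returns every item below the selected ones, at any depth.
    static Set<Integer> descendantsOf(Set<Integer> selected, Map<Integer, Integer> parentOf) {
        // Invert the relation once: parent id -> list of direct children.
        Map<Integer, List<Integer>> children = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : parentOf.entrySet()) {
            if (e.getValue() != null) {
                children.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Breadth-first walk down the tree starting from the selected items.
        Set<Integer> result = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>(selected);
        while (!queue.isEmpty()) {
            Integer id = queue.poll();
            for (Integer child : children.getOrDefault(id, Collections.emptyList())) {
                if (result.add(child)) {
                    queue.add(child);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> parentOf = new HashMap<>();
        parentOf.put(1, null); // root folder
        parentOf.put(2, 1);    // child of 1
        parentOf.put(3, 2);    // grandchild of 1
        parentOf.put(4, null); // unrelated root
        System.out.println(descendantsOf(Set.of(1), parentOf));
    }
}
```

The trade-off is that the walk happens on the client side, one tree level at a time, instead of being a single filter on a list of all ancestor ids.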
- this property is needed when resuming processing to get a previous parent id referenced by subitems whose parents were not committed; then, when reprocessing parents, their id can be updated to the previous value, so parent-child relationships will be preserved.
- fix embedded disks subitems references to parentGlobalID
This is an ID that doesn't change between different runs, used when resuming processing to skip already processed items. It is built by hashing several concatenated IDs: path, idInDataSource (e.g. sleuthID, ad1ID, ufdrID), subitemId, parentContainerPersistentID.
The current name isn't intuitive. Although it is not unique across different cases (it is not a UUID), changing it to globalID seems more user friendly. Any other suggestions?
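To illustrate why an ID that is stable between runs matters for resuming, here is a minimal sketch of the skip logic. It assumes the IDs committed by the previous run can be loaded into a set; the class name, the `pending` method and the sample IDs are hypothetical and do not reflect the real iped code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ResumeSketch {

    // Returns the IDs that still need processing, in their original order,
    // skipping every ID already committed by a previous (interrupted) run.
    static List<String> pending(List<String> allIds, Set<String> committed) {
        List<String> todo = new ArrayList<>();
        for (String id : allIds) {
            if (!committed.contains(id)) {
                todo.add(id);
            }
        }
        return todo;
    }

    public static void main(String[] args) {
        // Made-up IDs: the first run committed two items before stopping.
        Set<String> committed = new HashSet<>(List.of("id-a", "id-b"));
        List<String> secondRun = List.of("id-a", "id-b", "id-c", "id-d");
        System.out.println(pending(secondRun, committed));
    }
}
```

This only works if the second run computes exactly the same ID for each item as the first run did, which is why the ID is derived from properties of the item rather than from a processing-order counter.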