Unable to delete data before a given day #286
Comments
However, the data is still there even after repeating the deletion.
I tried to reproduce the problem and it works as expected:
But once I try copypasting your query, I get data:
On closer inspection, the data has been removed up to 21 December, but the later queries extend to 25 December, so they correctly show data between the 21st and the 25th. It seems that arkimet is working as expected, but it's really hard to see the difference between 21 and 25 among all those numbers (it took me quite a while to see it, too).
@spanezz if you look at the results, you can see that the reftime of the first record is 201112150000, which is 15 Dec 2011. I suspect the problem is not reproducible on a small dataset. I have removed decades of data. The arki-check command took a long time to finish. Maybe I can try removing the data in batches of smaller subsets, for example, one variable at a time.
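The reftime check above can be done mechanically rather than by eye. This is only a minimal sketch, assuming the reftimes are available as `YYYYMMDDhhmm` strings like the `201112150000` quoted above. The function name and the sample list are hypothetical and not part of arkimet.

```python
from datetime import datetime

def reftimes_before(reftimes, cutoff):
    """Return the reftimes (YYYYMMDDhhmm strings) that fall before cutoff."""
    return [r for r in reftimes
            if datetime.strptime(r, "%Y%m%d%H%M") < cutoff]

# Hypothetical sample: only the first value is the one quoted in this thread.
sample = ["201112150000", "201112210000", "201112250000"]
cutoff = datetime(2011, 12, 21)

# Any value listed here is data that should have been deleted but was not.
leftover = reftimes_before(sample, cutoff)
print(leftover)
```

An empty list would mean that nothing older than the cutoff is left in the query results.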
It may be time to optimize deletion by directly passing a query to If we both worked on the same dataset, it looks like the data had not been deleted when I took a copy of it. I now suspect that something went wrong in the deletion when you ran it, and that it worked when I ran it on a subset of the dataset. I'm also considering making
Unfortunately, I did not notice the error immediately because I checked only the summary with The arki-check command did not print any error message, and I forgot to check the returned status with "exit $?". So I cannot be sure that it ended without an error. However, the summaries were updated to reflect the deletion requested. Is this done only at the end of the transaction? Maybe we can check that the expected result is consistent with the final result. For example, we can check that the number of records before and after is correct.
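Both checks mentioned here (the exit status and the record counts) could be wrapped in a small script. The following is a rough sketch under assumptions: `run_checked` and `counts_consistent` are hypothetical helpers, the command shown is a stand-in rather than a real arkimet invocation, and obtaining the counts is left to whatever query you already use.

```python
import subprocess
import sys

def run_checked(cmd):
    """Run cmd, returning its stdout; raise if the exit status is non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with status {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout

def counts_consistent(before, deleted, after):
    """True if the record count after deletion matches the expectation."""
    return before - deleted == after

# Stand-in command (assumption): replace it with the real deletion command line.
out = run_checked([sys.executable, "-c", "print('done')"])
print(out.strip())
```

With this in place, a silent failure of the deletion command would surface as an exception instead of going unnoticed.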
An advantage of the current system is that it forces you to back up the deleted data.
That is a very good point, I never considered it that way. It's a tricky backup, since the results of the query do not contain the data. But until the dataset is repacked, the results of the query should contain valid references to the deleted data still in the dataset.
I did not know this! I believed that it contained all the data because I can extract all the records with "arki-scan --data ./file.arkimet".
In theory the summaries are deleted while the data is deleted and regenerated at the end. The actual deletion is performed in the index files, which are the main files you should expect to see modified by the deletion.
Sorry @spanezz but maybe I don't understand which query you're referring to:
Ah interesting, then it works for VM2 data only, because the metadata contain enough information to reconstruct the data. For other formats, I would expect this not to work.
Nice! Is it possible that this feature is format-agnostic and it's enabled by the
No, smallfiles are only supported for VM2, since VM2 data can be reconstructed from its arkimet metadata plus a small string. For all other formats it does not make any sense, since to preserve the data one has to copy all of it after the metadata, and that's what It is not however possible to delete data from the output of
I've reworked deletion for iseg datasets (which are now the only datasets that support deletion) to group data to delete by segment, and do one transaction per segment. The result should be much faster. I've also added, with
Redoing the deletion with current master should be much faster and have a far less boring output. Hopefully data should also stay deleted, but I still have no idea how come it didn't get deleted when you first tried :(
@mnuccioarpae arkimet 1.41-1 is available in the Copr repository.
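To illustrate why grouping by segment helps, here is a toy sketch of the idea. It is not arkimet's actual implementation: the record layout (segment name plus offset), the segment names, and the `delete_in_transaction` callback are all assumptions made for the example.

```python
from collections import defaultdict

def group_by_segment(records):
    """Map each segment name to the list of offsets to delete from it."""
    groups = defaultdict(list)
    for segment, offset in records:
        groups[segment].append(offset)
    return dict(groups)

def delete_grouped(records, delete_in_transaction):
    """Call delete_in_transaction once per segment, not once per record."""
    groups = group_by_segment(records)
    for segment, offsets in groups.items():
        delete_in_transaction(segment, offsets)
    return len(groups)

# Hypothetical records: (segment file, offset within the segment).
records = [
    ("2011/12-15.vm2", 0),
    ("2011/12-16.vm2", 0),
    ("2011/12-15.vm2", 42),
]
calls = []
n = delete_grouped(records, lambda seg, offs: calls.append((seg, offs)))
print(n, calls)
```

Three records spread over two segments result in two callback invocations, so the number of transactions grows with the number of segments rather than the number of records.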
I need to delete the data before 2011-12-21 for station 6257. So I ran the following commands:
As a check, I ran the following command:
But if I restrict the query to an interval of reftimes starting before 2011-12-21, I see the old data. For example, limiting to variable 78, I get:
What am I doing wrong?
Thanks