arki-check --unarchive documentation #297

Closed
brancomat opened this issue Jan 25, 2023 · 4 comments

@brancomat
Member

There's no mention of the --unarchive option in https://arpa-simc.github.io/arkimet/datasets/archive.html (or in any other part of the doc).

The only mention I found is in the help output and in the man page of arki-check:

   --unarchive pathname  Given a pathname relative to .archive/last, move it
                         out of the archive and back to the main dataset

This is a bit misleading since by trial and error it seems that it accepts only specific filenames (no paths, no wildcards).
Is this correct?

@spanezz
Contributor

spanezz commented Jan 26, 2023

Let's work out how it works first, then where to document it.

In theory, if you have something like datasets/lami123/.archive/last/2022/2022-12.grib, you can do this:

arki-check --unarchive 2022/2022-12.grib datasets/lami123

And this should move that segment into datasets/lami123/2022/2022-12.grib, and index it as part of the online dataset.

Does this match the behaviour you observe?
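
The move described above can be sketched as a pure path computation. This is only an illustration under stated assumptions: unarchive_paths is a hypothetical helper, not part of arkimet, and it touches no files and does no indexing.

```python
from pathlib import Path


def unarchive_paths(dataset: str, relpath: str) -> tuple[Path, Path]:
    """Return (source, destination) for one archived segment.

    Hypothetical helper for illustration only: it mirrors the description
    in this thread, moves nothing, and is not arkimet code.
    """
    root = Path(dataset)
    # The pathname is interpreted relative to <dataset>/.archive/last
    source = root / ".archive" / "last" / relpath
    # The segment goes back to the same relative path in the online dataset
    destination = root / relpath
    return source, destination


src, dst = unarchive_paths("datasets/lami123", "2022/2022-12.grib")
print(src.as_posix())  # datasets/lami123/.archive/last/2022/2022-12.grib
print(dst.as_posix())  # datasets/lami123/2022/2022-12.grib
```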

@brancomat
Member Author

> Does this match the behaviour you observe?

yes.
My question is whether the current implementation allows specifying more than one file (directories or wildcards).

Side note: I tried a couple of things (admittedly, not very clever) that had an unexpected impact on lock file creation in the $dataset/$year directory (in this example: cosmo/2022). I don't know whether it could be considered a bug:

$ ls cosmo/2022/ 
$ arki-check --unarchive 2022/\*.grib cosmo/
Traceback (most recent call last):
  File "/usr/bin/arki-check", line 11, in <module>
    main()
  File "/usr/bin/arki-check", line 7, in main
    sys.exit(Check.main())
  File "/usr/lib/python3.10/site-packages/arkimet/cmdline/base.py", line 83, in main
    return cmd.run()
  File "/usr/lib/python3.10/site-packages/arkimet/cmdline/check.py", line 133, in run
    arki_check.unarchive(pathname=self.args.unarchive)
RuntimeError: cannot rename /home/dbranchini@ARPA.EMR.NET/Scaricati/arkitest/cosmo/.archive/last/2022/*.grib to /home/dbranchini@ARPA.EMR.NET/Scaricati/arkitest/cosmo/2022/*.grib: No such file or directory
$ ls cosmo/2022/
'*.grib.lock'
$ arki-check --unarchive 2022/* cosmo/
Traceback (most recent call last):
  File "/usr/bin/arki-check", line 11, in <module>
    main()
  File "/usr/bin/arki-check", line 7, in main
    sys.exit(Check.main())
  File "/usr/lib/python3.10/site-packages/arkimet/cmdline/base.py", line 83, in main
    return cmd.run()
  File "/usr/lib/python3.10/site-packages/arkimet/cmdline/check.py", line 133, in run
    arki_check.unarchive(pathname=self.args.unarchive)
RuntimeError: cannot auto-detect format from file name 2022/*: file extension not recognised
$ ls cosmo/2022/
'*.grib.lock'  '*.lock'
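
Since the pathname appears to be taken literally, one possible workaround is to expand the wildcard outside arki-check and pass one segment at a time, using the single-segment invocation shown earlier in this thread. The sketch below is an assumption-laden illustration: unarchive_commands is a hypothetical helper, and it only builds and prints the command lines without running anything.

```python
from pathlib import Path


def unarchive_commands(dataset: str, pattern: str) -> list[list[str]]:
    """Expand `pattern` under <dataset>/.archive/last, one command per match.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    archive = Path(dataset) / ".archive" / "last"
    commands = []
    for segment in sorted(archive.glob(pattern)):
        # arki-check expects the pathname relative to .archive/last
        relpath = segment.relative_to(archive).as_posix()
        commands.append(["arki-check", "--unarchive", relpath, dataset])
    return commands


# Print the commands for review instead of executing them
for argv in unarchive_commands("cosmo", "2022/*.grib"):
    print(" ".join(argv))
```

Printing first makes it possible to check the list of segments before anything is moved; each printed line could then be run by hand or passed to subprocess.run.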

@spanezz
Contributor

spanezz commented Jan 26, 2023

Right, yes, I see I have work to do to make it not just work, but also be usable.

It makes sense to make it take segment names, and infer datasets from them.

I'll work on this

spanezz added a commit that referenced this issue Jan 27, 2023
…ther python tools, and make space for subcommands. refs: #297
spanezz added a commit that referenced this issue Jan 27, 2023
@spanezz
Contributor

spanezz commented Jan 27, 2023

In the issue297 branch there's a version of arkimet that adds the arki-maint command. arki-maint allows subcommands, and it currently only has the unarchive subcommand, which works like this:

arki-maint unarchive dataset/.archive/last/2022-*.grib

It will look for .archive/last in each of its arguments, infer the dataset directories and the segment names from that, and do the equivalent of running arki-check on each dataset and on each segment.
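
The inference described above can be sketched roughly as follows, assuming the shell has already expanded any wildcard into concrete paths. split_archived_segment is a hypothetical name used for illustration; this is not the code from the issue297 branch.

```python
from pathlib import PurePosixPath


def split_archived_segment(path: str) -> tuple[str, str]:
    """Split '<dataset>/.archive/last/<segment>' into (dataset, segment).

    Hypothetical helper illustrating the inference described above; it is
    not the code from the issue297 branch.
    """
    parts = PurePosixPath(path).parts
    # Stop early enough that at least one component follows ".archive/last"
    for i in range(len(parts) - 2):
        if parts[i:i + 2] == (".archive", "last"):
            dataset = PurePosixPath(*parts[:i])
            segment = PurePosixPath(*parts[i + 2:])
            return str(dataset), str(segment)
    raise ValueError(f"{path!r} is not inside a .archive/last directory")


print(split_archived_segment("dataset/.archive/last/2022-12.grib"))
# ('dataset', '2022-12.grib')
```

Each resulting (dataset, segment) pair corresponds to one arki-check --unarchive segment dataset invocation of the kind shown earlier in the thread.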

I did quite a bit of refactoring in the command line parsing code so that it can be shared between normal commands and commands with subcommands. That's why I'm pushing to a separate branch and not to master.

spanezz added a commit that referenced this issue Apr 28, 2023
…ther python tools, and make space for subcommands. refs: #297
spanezz added a commit that referenced this issue Apr 28, 2023