Improve name of output file #117
Comments
👍 I think it's a good idea. For a little additional historical context: there was some discussion around output file naming conventions for Harmony services back in 2020 (https://wiki.earthdata.nasa.gov/display/HARMONY/Output+File+Naming+Convention). At the time, the general consensus was that the original filename should be preserved as much as possible. With CONCISE that is not feasible, because the output is a combination of many input files (this is noted on the https://wiki.earthdata.nasa.gov/display/HARMONY/Transformation+service+availability+and+compliance#Transformationserviceavailabilityandcompliance-Servicecompliance page). As such, as part of this ticket, that wiki page should be updated to specify what the output filename format is.
With respect to start and end timestamps: rather than introduce a dependency on CMR, I wonder if it would be better to inspect the data in the final output file and use the min/max timestamps found in the data itself for the filename.
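As a rough illustration of the "inspect the data" idea, here is a minimal sketch in plain Python. It assumes the time values have already been read from the merged file into a flat sequence of offsets in seconds from a known reference epoch, with fill values represented as None; reading the variable from the NetCDF file is not shown, and the function name, the epoch, and the sample values are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def time_bounds(offsets_seconds, epoch):
    """Return (earliest, latest) datetimes for offsets given in seconds since `epoch`.

    None entries stand in for fill/missing values and are skipped.
    """
    valid = [s for s in offsets_seconds if s is not None]
    if not valid:
        raise ValueError("no valid time values found")
    return (
        epoch + timedelta(seconds=min(valid)),
        epoch + timedelta(seconds=max(valid)),
    )


# Hypothetical epoch and sample offsets, for illustration only.
epoch = datetime(1993, 1, 1, tzinfo=timezone.utc)
start, end = time_bounds([120.0, None, 30.5, 86400.0], epoch)
```

Because the bounds come from the merged output itself, this approach would not need a CMR lookup, at the cost of scanning the time variable once after the merge.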
Looking at the relevant code for this a bit more, it's seeming to me now like the short name and version number aren't directly accessible in the
The only alternatives I can currently think of (i.e., to include useful information beyond just the ConceptID and timestamps) are to put the full granule name of the first granule along with the number of granules, or to put the names of the first and last granules. Having the full granule names would actually be more analogous to the output filenames from other services, such as l2ss, harmony-netcdf-to-zarr, or net2cog.
So, instead of the current naming, which is:
filename = f'{collection}_merged.nc4'
the new naming could look something like (with min/max timestamps):
filename = f'{collection}_{datetimes[0].isoformat()}-{datetimes[1].isoformat()}_merged.nc4'
or (with # granules and first name; note, this is what
filename = f"{collection}-concatenated_{number_of_granules}_starting_from_{first_url_name}.nc4"
or (with first and last names):
filename = f"{collection}_concatenated_granules_from_{first_url_name}_to_{last_url_name}.nc4"
Do any of these look like good approaches? Or are there other ideas?
Also tagging @ank1m, @chris-durbin, and @owenlittlejohns, since this is likely relevant for other and/or future "many-to-one" output services.
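To make the three candidates easier to compare side by side, here is a small self-contained sketch that builds each one. The f-string patterns are copied from the comment above; everything else is an assumption for illustration: the helper names, the way a granule name is derived from its URL (final path component without extension), and the collection ID, URLs, and timestamps in the example.

```python
from datetime import datetime
from pathlib import PurePosixPath
from urllib.parse import urlparse


def url_name(url):
    """Granule file name from a URL, without directories or extension (assumed convention)."""
    return PurePosixPath(urlparse(url).path).stem


def name_with_times(collection, datetimes):
    """Candidate 1: min/max timestamps."""
    return f"{collection}_{datetimes[0].isoformat()}-{datetimes[1].isoformat()}_merged.nc4"


def name_with_count(collection, urls):
    """Candidate 2: number of granules plus the first granule name."""
    return f"{collection}-concatenated_{len(urls)}_starting_from_{url_name(urls[0])}.nc4"


def name_with_first_last(collection, urls):
    """Candidate 3: first and last granule names."""
    return f"{collection}_concatenated_granules_from_{url_name(urls[0])}_to_{url_name(urls[-1])}.nc4"


# Hypothetical inputs, for illustration only.
collection = "C1234567890-PROV"
urls = [
    "https://example.com/data/granule_001.nc",
    "https://example.com/data/granule_002.nc",
    "https://example.com/data/granule_003.nc",
]
datetimes = (datetime(2020, 1, 1), datetime(2020, 1, 2))

by_times = name_with_times(collection, datetimes)
by_count = name_with_count(collection, urls)
by_first_last = name_with_first_last(collection, urls)
```

One thing the sketch makes visible: `isoformat()` output contains colons, which some filesystems (Windows, for example) do not accept in file names, so the timestamp variant may need a different time format.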
I like this naming
Maybe we can start with
From discussions with @alexrad71...
Issue
Currently, CONCISE writes the output to a file named with the Collection ID and "_merged.nc", as defined here. This name is not very useful to many users.
Proposed Change
Add both the Collection's short name and the version number to the output filename.
What would be even better :)
Include information about the start and end granules in the output filename. For instance, the start and end timestamps could be retrieved from CMR for the start and end granules, converted to str, and then added to the output filename.
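A minimal sketch of what the proposed name could look like, assuming the short name, version, and start/end timestamps are already available as values (how they would be obtained, from CMR or otherwise, is not shown). The function names, the underscore-separated layout, the compact UTC timestamp format, and the example values are all hypothetical, not an agreed convention.

```python
from datetime import datetime, timezone


def compact(ts):
    """Format a timezone-aware datetime as a compact UTC string with no colons."""
    return ts.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def output_name(short_name, version, start, end):
    """Build an output filename from collection short name, version, and time range."""
    return f"{short_name}_{version}_{compact(start)}_{compact(end)}_merged.nc4"


# Hypothetical values, for illustration only.
example = output_name(
    "EXAMPLE_L2",
    "1.0",
    datetime(2020, 1, 1, tzinfo=timezone.utc),
    datetime(2020, 1, 2, 12, 30, tzinfo=timezone.utc),
)
```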