
Cleanup input and output file headers #204

Closed
jhamman opened this issue Jan 22, 2015 · 10 comments


@jhamman
Member

jhamman commented Jan 22, 2015

I propose we define a simple standard header for all input and output files. Currently, when headers are included, the output file header looks like this:

# NRECS: 26280
# DT: 3600.000000
# STARTDATE: 1949-01-01-00000
# ALMA_OUTPUT: 0
# NVARS: 12
# YEAR  MONTH   DAY SECOND  OUT_PREC     OUT_AIR_TEMP    OUT_SHORTWAVE   OUT_LONGWAVE    OUT_DENSITY     OUT_PRESSURE    OUT_VP  OUT_WIND
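For reference, here is a minimal Python sketch (standard library only) of what a reader of this current header has to do: separate the `# KEY: value` lines from the commented column-name line, then rebuild the time axis from STARTDATE, DT and NRECS. The function names are hypothetical, this is not VIC's own reader, and it assumes the last part of STARTDATE is seconds since midnight (matching the SECOND column).

```python
from datetime import datetime, timedelta

def parse_vic4_header(lines):
    """Split '# KEY: value' lines from the '# YEAR MONTH ...' column line."""
    meta, columns = {}, None
    for line in lines:
        if not line.startswith("#"):
            break  # first data row: the header is over
        body = line[1:].strip()
        key, sep, value = body.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
        else:
            columns = body.split()  # the commented column-name line
    return meta, columns

def record_times(meta):
    """Expand STARTDATE, DT and NRECS into one timestamp per record."""
    # Assumption: STARTDATE is YYYY-MM-DD-SSSSS with seconds since midnight.
    year, month, day, seconds = (int(p) for p in meta["STARTDATE"].split("-"))
    start = datetime(year, month, day) + timedelta(seconds=seconds)
    step = timedelta(seconds=float(meta["DT"]))
    return [start + i * step for i in range(int(meta["NRECS"]))]

header = [
    "# NRECS: 26280",
    "# DT: 3600.000000",
    "# STARTDATE: 1949-01-01-00000",
    "# ALMA_OUTPUT: 0",
    "# NVARS: 12",
    "# YEAR  MONTH   DAY SECOND  OUT_PREC     OUT_AIR_TEMP    OUT_SHORTWAVE   OUT_LONGWAVE    OUT_DENSITY     OUT_PRESSURE    OUT_VP  OUT_WIND",
]
meta, columns = parse_vic4_header(header)
times = record_times(meta)
print(columns[:4], times[0], times[-1])
```

Three pieces of metadata are needed just to know when each row occurs, which is part of what the proposal below tries to simplify.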

My preference would be that all input and output files (including those written when OUTPUT_FORCE=TRUE) be required to use the same header format. All files would include a single-row header, without a leading #, that includes the date/time fields (see related #18):

YEAR    MONTH   DAY SECOND  OUT_PREC     OUT_AIR_TEMP    OUT_SHORTWAVE   OUT_LONGWAVE    OUT_DENSITY     OUT_PRESSURE    OUT_VP  OUT_WIND
@jhamman jhamman self-assigned this Jan 22, 2015
@jhamman jhamman added this to the 5.0 milestone Jan 22, 2015
@tbohn
Contributor

tbohn commented Jan 22, 2015

I advocate for keeping the initial #, since that is a common way of denoting a comment line. But yes, if the date fields were always present, then the STARTDATE and DT records would not be necessary. The ALMA_OUTPUT flag tells us what the unit convention is, but it would be better to put the units in the column headers themselves, since neither the ALMA convention nor the non-ALMA convention is widely known. Essentially, I agree that all of the header information other than the column names can be eliminated, if the column names contain sufficient information.
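To illustrate the point about STARTDATE and DT: if every record carries YEAR, MONTH, DAY and SECOND, the same information can be recovered from the data rows. A minimal sketch, assuming a constant time step; the function names are hypothetical and the rows are made-up example values.

```python
from datetime import datetime, timedelta

def row_time(row):
    """Turn the YEAR, MONTH, DAY, SECOND fields of one record into a datetime."""
    year, month, day, second = (int(v) for v in row[:4])
    return datetime(year, month, day) + timedelta(seconds=second)

def infer_time_metadata(rows):
    """Recover what STARTDATE, DT and NRECS used to state in the header.

    Assumes a constant time step, so the first two records give DT.
    """
    start = row_time(rows[0])
    dt = (row_time(rows[1]) - start).total_seconds() if len(rows) > 1 else None
    return {"STARTDATE": start, "DT": dt, "NRECS": len(rows)}

rows = [
    ["1949", "1", "1", "0", "0.0"],
    ["1949", "1", "1", "3600", "0.2"],
    ["1949", "1", "1", "7200", "0.0"],
]
print(infer_time_metadata(rows))
```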


@bartnijssen
Member

I suggest we keep the free-form comment lines at the top. While we cannot enforce metadata in these files (at least not without a lot of extra work), I don't want to prevent people from including or adding their own commentary to the file. Stripping all lines that start with a # is easy enough to implement and maintain.

We may also want to consider including a model version in one of those comment lines, but that is a slightly separate issue (i.e., what the content should be).

@bartnijssen
Member

Wait - I think I just misread the proposal.

If the proposal is:

  • zero or more free-form header lines starting with # (with some of the content as specified)
  • one header line with the field names

then I would agree. Sorry for the confusion; I read this one a bit too quickly.

@jhamman
Member Author

jhamman commented Jan 23, 2015

I originally proposed the extreme of removing everything except the variable names, figuring we would have a discussion on what makes sense. I'd support @bartnijssen's summary. A possible output format may look like this:

# SIMULATION: Simulation ID or original filename
# MODEL_VERSION: VIC.5.0.beta
# ALMA_UNITS: True
YEAR  MONTH   DAY SECOND  OUT_PREC     OUT_AIR_TEMP    OUT_SHORTWAVE   OUT_LONGWAVE    OUT_DENSITY     OUT_PRESSURE    OUT_VP  OUT_WIND
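A minimal sketch of a writer for this proposed layout, assuming tab-separated columns and `# KEY: value` comment lines. `format_vic_ascii`, the simulation name and the data values are hypothetical, not part of VIC.

```python
def format_vic_ascii(meta, names, rows, sep="\t"):
    """Render '# KEY: value' comment lines, one field-name line, then data."""
    lines = [f"# {key}: {value}" for key, value in meta.items()]
    lines.append(sep.join(names))  # not a comment, so comment-stripping readers keep it
    lines.extend(sep.join(str(v) for v in row) for row in rows)
    return "\n".join(lines) + "\n"

text = format_vic_ascii(
    {"SIMULATION": "example_run", "MODEL_VERSION": "VIC.5.0.beta", "ALMA_UNITS": True},
    ["YEAR", "MONTH", "DAY", "SECOND", "OUT_PREC"],
    [[1949, 1, 1, 0, 0.0], [1949, 1, 1, 3600, 0.2]],
)
print(text, end="")
```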

Forcing files may also include free-form header lines, but would be required to include at least one line, not starting with #, that contains the field names.

# VIC FORCING FILE
# SOURCE:  Sheffield, 2006
# ALMA_UNITS: False
YEAR  MONTH   DAY SECOND  OUT_PREC     OUT_AIR_TEMP    OUT_SHORTWAVE   OUT_LONGWAVE    OUT_DENSITY     OUT_PRESSURE    OUT_VP  OUT_WIND
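A matching reader could look like the sketch below (hypothetical `read_vic_ascii`, standard library only). It treats every line starting with # as a comment, keeps comments of the form `KEY: value` as metadata, and takes the first non-comment line as the field names. The forcing example is abbreviated to six columns and the data row is made up.

```python
def read_vic_ascii(lines):
    """Parse '#' comment lines, one field-name line, then whitespace-separated data."""
    meta, names, rows = {}, None, []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        if line.startswith("#"):
            # Free-form comment; keep it as metadata only if it looks like KEY: value.
            key, sep, value = line[1:].partition(":")
            if sep:
                meta[key.strip()] = value.strip()
            continue
        fields = line.split()
        if names is None:
            names = fields  # first non-comment line holds the field names
        else:
            rows.append(dict(zip(names, fields)))
    if names is None:
        raise ValueError("no field-name line found")
    return meta, names, rows

forcing = """# VIC FORCING FILE
# SOURCE:  Sheffield, 2006
# ALMA_UNITS: False
YEAR  MONTH   DAY SECOND  OUT_PREC     OUT_AIR_TEMP
1949  1       1   0       0.0          -5.2
""".splitlines()
meta, names, rows = read_vic_ascii(forcing)
print(meta, names, rows[0]["OUT_AIR_TEMP"])
```

One consequence of this design: a free-form comment that happens to contain a colon (e.g. `# Note: ...`) also ends up in `meta`, so a stricter reader might accept only known keys.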

@tbohn
Contributor

tbohn commented Jan 23, 2015

Why would the field-names line not start with a "#", if the rest of the header does?

Is it simply that the "#" creates an extra field? We could get around that by not putting a space between the # and the first field name...
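The "extra field" concern and the suggested workaround are easy to see with a plain whitespace split. A small sketch with hypothetical helper names, contrasting the two spellings of a commented field-name line:

```python
# The two spellings of a commented field-name line, split on whitespace.
spaced = "# YEAR MONTH DAY SECOND OUT_PREC".split()
unspaced = "#YEAR MONTH DAY SECOND OUT_PREC".split()
print(spaced[:2])    # ['#', 'YEAR']: the '#' becomes an extra field
print(unspaced[:2])  # ['#YEAR', 'MONTH']: no extra field, but the first name is '#YEAR'

def names_from_comment(line):
    """Explicit workaround a reader would need for either spelling."""
    return line.lstrip("#").split()

def drop_comments(lines):
    """What a generic comment-aware reader does: the names go with the comments."""
    return [line for line in lines if not line.lstrip().startswith("#")]
```

Either way, a reader has to special-case the commented field-name line rather than simply discard comments.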


@bartnijssen
Member

Because the field-name header would not be free-form and is not a comment. For example, many scripting languages (R, Python) have analysis packages (pandas, etc.) that can read data files with comments and a header. They strip the comments and use the header names to parse the file. These names can then be used directly to address the relevant columns (data frames in both R and pandas).

So in short: The header would not start with a ‘#’ because it is not a comment and we don’t want it stripped by something that strips comments.

If you don’t want to use the field names, you can simply skip the first line after stripping comments.
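A stdlib-only sketch of that workflow, with hypothetical helper names: drop comment lines, then either use the field-name line to address columns by name or skip it. In pandas, something like `pd.read_csv(path, comment="#", sep=r"\s+")` should behave similarly, taking the column names from the first non-comment line.

```python
def data_lines(lines):
    """Drop blank and '#' comment lines, then split on whitespace."""
    return [line.split() for line in lines
            if line.strip() and not line.lstrip().startswith("#")]

def column(lines, name):
    """Use the field-name line to address a column by name."""
    header, *rows = data_lines(lines)
    idx = header.index(name)
    return [float(row[idx]) for row in rows]

def values_only(lines):
    """Ignore the names: skip the first line that survives comment stripping."""
    return [[float(v) for v in row] for row in data_lines(lines)[1:]]

sample = [
    "# SIMULATION: example_run",
    "# ALMA_UNITS: True",
    "YEAR MONTH DAY SECOND OUT_PREC",
    "1949 1 1 0 0.5",
    "1949 1 1 3600 1.25",
]
print(column(sample, "OUT_PREC"))   # [0.5, 1.25]
print(values_only(sample)[1])       # [1949.0, 1.0, 1.0, 3600.0, 1.25]
```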


@bartnijssen
Member

Since a column header that is not a comment breaks backward compatibility with VIC.4, ideally this would still be part of VIC.5.0. However, @jhamman, please provide an estimate of the work involved. If it is more than a few lines of code, then I suggest we bump it to 5.1.

@jhamman
Copy link
Member Author

jhamman commented Aug 17, 2016

@bartnijssen

The remaining work is for the forcing files (ASCII output files were done in #227). I think this is a lot of work and should be paired with #18. Realistically, I don’t think we should put these into VIC.5 unless someone is really asking for the feature. I certainly am not (anymore).

@bartnijssen
Member

bartnijssen commented Aug 17, 2016

The difference with #18 is that an extra column (for dates) can already be accommodated (although the information in the extra column will not be used). I'll close both issues and reference them in a new issue to support better metadata in the ASCII files. We won't implement it ourselves, but someone else may pick it up in the future, so it'll go under the "someday" milestone.

@bartnijssen
Copy link
Member

Continued in #579

@bartnijssen bartnijssen modified the milestones: someday, 5.0 Aug 17, 2016