Cleanup input and output file headers #204

jhamman · 2015-01-22T21:54:03Z

I propose we define a simple standard header for all input and output files. When a header is included, the output file header currently looks like this:

# NRECS: 26280
# DT: 3600.000000
# STARTDATE: 1949-01-01-00000
# ALMA_OUTPUT: 0
# NVARS: 12
# YEAR  MONTH   DAY SECOND  OUT_PREC     OUT_AIR_TEMP    OUT_SHORTWAVE   OUT_LONGWAVE    OUT_DENSITY     OUT_PRESSURE    OUT_VP  OUT_WIND

My preference would be that all input and output (including cases when OUTPUT_FORCE=TRUE) files be required to use the same header format. All files would include a single row header without #, including the date/time values (see related #18):

YEAR    MONTH   DAY SECOND  OUT_PREC     OUT_AIR_TEMP    OUT_SHORTWAVE   OUT_LONGWAVE    OUT_DENSITY     OUT_PRESSURE    OUT_VP  OUT_WIND

tbohn · 2015-01-22T22:04:36Z

I advocate for keeping the initial #, since that is a common method of
denoting a comment line... But yes, if the data fields would always be
present, then the startdate and dt records would not be necessary. The
ALMA_OUTPUT flag tells us what the unit convention is, but it would be
better to put the units in the column headers themselves (since the ALMA
convention is not widely known, and the non-ALMA convention is also not
widely known). Essentially, I agree that all of the header information
other than column names can be eliminated, if the column names have
sufficient information in them.

bartnijssen · 2015-01-22T23:05:26Z

I suggest we keep the free-form comment lines at the top. While we cannot enforce metadata in these files (at least not without a lot of extra work), I don't want to prevent people from including or adding their own commentary to the file. Stripping all content that starts with a # is easy enough to implement and maintain.

We may also want to consider including a model version in one of those comment lines, but that is a slightly separate issue (i.e. what is the content).

bartnijssen · 2015-01-22T23:07:22Z

Wait - I think I just misread the proposal.

If the proposal is:

zero or more free-form header lines starting with # (with some of the content as specified)
one header line with the field names

then I would say I agree. Sorry for the confusion; I read this one a bit too quickly.

jhamman · 2015-01-23T01:59:10Z

I originally proposed the extreme of removing everything except the variable names, figuring we would have a discussion on what makes sense. I'd support @bartnijssen's summary. A possible output format may look like this:

# SIMULATION: Simulation ID or original filename
# MODEL_VERSION: VIC.5.0.beta
# ALMA_UNITS: True
YEAR  MONTH   DAY SECOND  OUT_PREC     OUT_AIR_TEMP    OUT_SHORTWAVE   OUT_LONGWAVE    OUT_DENSITY     OUT_PRESSURE    OUT_VP  OUT_WIND

Forcing files may also include free-form header lines, but would be required to include at least one line without a leading # that contains the field names.

# VIC FORCING FILE
# SOURCE:  Sheffield, 2006
# ALMA_UNITS: False
YEAR  MONTH   DAY SECOND  OUT_PREC     OUT_AIR_TEMP    OUT_SHORTWAVE   OUT_LONGWAVE    OUT_DENSITY     OUT_PRESSURE    OUT_VP  OUT_WIND
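A minimal sketch of how a file in this layout could be written, for illustration only. The function name `write_with_header`, the tab delimiter, and the sample values are assumptions, not part of VIC; only the layout (comment lines starting with #, then one field-name line without #) comes from the proposal above.

```python
import io


def write_with_header(stream, comments, field_names, rows):
    """Write '#'-prefixed comment lines, one field-name line, then data rows.

    Hypothetical helper: the name and the tab delimiter are assumptions.
    """
    for key, value in comments.items():
        stream.write(f"# {key}: {value}\n")
    # The field-name line deliberately has no leading '#', so that tools
    # which strip comment lines still see it as the column header.
    stream.write("\t".join(field_names) + "\n")
    for row in rows:
        stream.write("\t".join(str(value) for value in row) + "\n")


buf = io.StringIO()
write_with_header(
    buf,
    {"MODEL_VERSION": "VIC.5.0.beta", "ALMA_UNITS": "True"},
    ["YEAR", "MONTH", "DAY", "SECOND", "OUT_PREC"],
    [(1949, 1, 1, 0, 0.0), (1949, 1, 1, 3600, 0.25)],  # made-up sample rows
)
text = buf.getvalue()
print(text)
```

Writing to a stream rather than a path keeps the sketch easy to try out without touching the file system.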

tbohn · 2015-01-23T02:23:03Z

Why would the field names line not start with a "#", if the header does
start with "#"?

Is it simply that the "#" creates an extra field? We could get around that
by not putting a space between the # and the first field name...

bartnijssen · 2015-01-23T02:27:58Z

Because the field name header would not be free form and is not a comment. For example, many scripting languages (R, Python) have analysis packages (pandas, etc.) that can read data files with comments and a header. They strip the comments and use the header names to parse the file. These header names can then be used directly to address the relevant columns (data frames in both R and pandas).

So in short: The header would not start with a ‘#’ because it is not a comment and we don’t want it stripped by something that strips comments.

In case you don’t want to read it, you can simply skip the first line after stripping comments.
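As a rough illustration of the point above, here is a standard-library sketch of a reader that strips # comment lines and treats the first remaining line as the field names. The function name `read_with_header` and the sample data are assumptions made for this example; in pandas, `read_csv` with `comment="#"` and a whitespace separator should give comparable behavior.

```python
import io


def read_with_header(stream):
    """Split a file into comment lines, field names, and data records.

    Hypothetical helper: assumes whitespace-delimited, all-numeric data.
    """
    comments, field_names, records = [], None, []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Free-form comment: keep the text, drop the marker.
            comments.append(line.lstrip("#").strip())
        elif field_names is None:
            # First non-comment line is the header with the field names.
            field_names = line.split()
        else:
            values = [float(item) for item in line.split()]
            records.append(dict(zip(field_names, values)))
    return comments, field_names, records


sample = io.StringIO(
    "# VIC FORCING FILE\n"
    "# ALMA_UNITS: False\n"
    "YEAR MONTH DAY SECOND OUT_PREC\n"
    "1949 1 1 0 0.0\n"
    "1949 1 1 3600 0.25\n"
)
comments, field_names, records = read_with_header(sample)
print(field_names)
print(records[1]["OUT_PREC"])
```

If the field-name line started with #, this reader (like any comment-stripping tool) would discard it and misread the first data row as the header, which is the reason given above for leaving the # off.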

bartnijssen · 2016-08-17T22:00:00Z

Since a column header that is not a comment breaks backwards compatibility with VIC.4, ideally this would still be part of VIC.5.0. However, @jhamman, please provide an estimate of the work involved. If it is more than a few lines of code, then I suggest we bump it to 5.1.

jhamman · 2016-08-17T22:17:22Z

@bartnijssen

The remaining work is for the forcing files (ascii output files were done in #227). I think this is a lot of work and should be paired with #18. Realistically, I don’t think we should put these into VIC.5 unless someone is really asking for the feature. I certainly am not (anymore).

bartnijssen · 2016-08-17T22:19:52Z

The difference with #18 is that an extra column (for dates) can already be accommodated (although the information in the extra column will not be used). I'll close both issues and reference them in a new issue to support better metadata in the ASCII files. We won't implement it, but someone else may pick it up in the future, so it'll go under the "someday" milestone.

bartnijssen · 2016-08-17T22:23:58Z

Continued in #579

jhamman added enhancement feature cleanup labels Jan 22, 2015

jhamman self-assigned this Jan 22, 2015

jhamman added this to the 5.0 milestone Jan 22, 2015

jhamman mentioned this issue Jul 24, 2015

Clean-up ascii header #227

Merged

jhamman modified the milestones: 5.0 release candidate 2, 5.0 Jun 28, 2016

jhamman added the classic driver label Jul 7, 2016

jhamman modified the milestones: 5.0 release candidate 2, 5.0 Aug 1, 2016

bartnijssen closed this as completed Aug 17, 2016

bartnijssen mentioned this issue Aug 17, 2016

ENH: meta-data information in ASCII forcing files for VIC.5.classic #579

Open

bartnijssen modified the milestones: someday, 5.0 Aug 17, 2016
