Agree on an approach for handling units in .csv files #60

Open
irm-codebase opened this issue Sep 25, 2024 · 3 comments
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)

Comments

@irm-codebase
Collaborator

irm-codebase commented Sep 25, 2024

One of the last steps in establishing the interface between a module and the 'outside' is unit metadata for .csv files.

This is the only file type we handle that has this issue, since netCDF files have units as standardized metadata.

Based on several tests, I believe the best approach is to add a second 'header' that states the unit of each column, using No Unit to specify columns with text values, ratios, etc.
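For illustration, a file with such a second header could be produced roughly as follows. This is only a sketch: the column names mirror the example further down, while the values and the `unit_by_column` dictionary are made up for the demonstration.

```python
import pandas as pd

# Made-up values; only the layout matters here.
values = pd.DataFrame(
    {
        "country": ["DEU", "DEU"],
        "year": [2000, 2001],
        "TotalEnergyConsumption": [476.07, 480.12],
    }
)

# Hypothetical mapping from column name to unit ("No Unit" for text columns).
unit_by_column = {
    "country": "No Unit",
    "year": "years",
    "TotalEnergyConsumption": "ktoe",
}

# Two-level columns: the first level is the name, the second is the unit.
values.columns = pd.MultiIndex.from_tuples(
    [(name, unit_by_column[name]) for name in values.columns],
    names=["attribute", "units"],
)
values.index.name = "index"

# to_csv writes one header row per column level.
csv_text = values.to_csv()
print(csv_text)
```

Writing the units as a column level means pandas emits the extra header row itself, so no manual string handling is needed.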

Reasoning:

  • Removing a second header row is easy in pandas
  • Fits well with the 'tidy' dataframe approach.
  • It produces far less data overhead than adding an extra units column, especially for timeseries.
  • calliope v0.7 is able to skip rows easily, so it avoids extra processing on the modelling side.
  • Allows module devs to easily use pint for unit conversion, if they choose to.
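As a rough sketch of the "skip rows" point, this is how a reader that does not care about units could ignore the second header in pandas. The miniature file below is hypothetical and has no index column, and this shows pandas only, not how calliope itself reads files.

```python
import io

import pandas as pd

# Hypothetical miniature file: line 0 holds column names, line 1 holds units.
CSV_TEXT = """country,year,TotalEnergyConsumption
No Unit,years,ktoe
DEU,2000,476.07
DEU,2001,480.12
"""

# skiprows=[1] drops only the units line (0-based), so the names stay intact
# and the numeric columns are parsed as numbers rather than strings.
data = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[1])
print(data.dtypes)
```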

Examples

Picture this table:

attribute,country,vehicle_type,vehicle_subtype,carrier,year,TotalEnergyConsumption
units,No Unit,No Unit,No Unit,No Unit, years, ktoe
index,,,,,,
0,DEU,Powered two-wheelers,Gasoline engine,Gasoline,2000,476.0664153213859
1,DEU,Powered two-wheelers,Gasoline engine,BioGasoline,2000,0.0
2,DEU,Passenger cars,Gasoline engine,Gasoline,2000,29431.27094818872
3,DEU,Passenger cars,Gasoline engine,BioGasoline,2000,0.0
4,DEU,Passenger cars,Diesel oil engine,Diesel,2000,8255.653519490721

Loading and removing the second header:

If you do not want to use any fancy libraries to handle units, cleaning the data is trivial:

import pandas as pd

data = pd.read_csv("tmp/test2.csv", header=[0, 1], index_col=0)
data.columns = data.columns.droplevel("units")  # keep only the column names
data.head()
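If the unit information should survive the cleaning step, one option is to copy it into a plain dictionary before dropping the level. The following is a self-contained sketch on a shortened in-memory copy of the example table; the `units` dictionary is my own addition, not part of the proposal.

```python
import io

import pandas as pd

# Shortened copy of the example table, kept in memory for the demonstration.
CSV_TEXT = """attribute,country,carrier,year,TotalEnergyConsumption
units,No Unit,No Unit,years,ktoe
index,,,,
0,DEU,Gasoline,2000,476.0664153213859
1,DEU,BioGasoline,2000,0.0
"""

data = pd.read_csv(io.StringIO(CSV_TEXT), header=[0, 1], index_col=0)

# Each column label is a (name, unit) tuple, so the list converts to a dict.
units = dict(data.columns.to_list())

# Drop the units level, leaving ordinary single-level column names.
data.columns = data.columns.droplevel("units")
print(units)
```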


Automatic unit conversion with pint

If you want to be fancy (and lazy), you can just as easily use pint to do all the unit heavy lifting for you.

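import pandas as pd
import pint_pandas  # registers the .pint accessor

data = pd.read_csv("tmp/test2.csv", header=[0, 1], index_col=0)
data = data.pint.quantify(level=-1)  # units come from the last header level
data.head()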


data.dtypes


data['TotalEnergyConsumption'].pint.to_base_units()


irm-codebase added the documentation and enhancement labels Sep 25, 2024
@jnnr
Contributor

jnnr commented Oct 1, 2024

I like this approach. Just a tiny request - why not choose a space-free signifier for unitless quantities? E.g.: unitless or no_unit?

Apart from this, I agree with what you propose.

@irm-codebase
Collaborator Author

@jnnr I like the idea, however...

The problem is that pint libraries use No Unit (in pandas) and no unit (in xarray) as standard for unitless cases. I have not tested if capitalisation matters (hopefully not).

You can re-specify it, but using our own standard would add boilerplate, and introduce bugs if said boilerplate is not there.
So even if I dislike it, sticking to the pint naming would save us trouble, I think.
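To make the boilerplate concern concrete, here is a sketch of what every reader would need to run if files used a space-free marker. `CUSTOM_NO_UNIT` and its value `no_unit` are hypothetical, and the miniature file is made up; only pandas is used, so the later pint step is not shown.

```python
import io

import pandas as pd

PINT_NO_UNIT = "No Unit"    # marker described above as the pint default in pandas
CUSTOM_NO_UNIT = "no_unit"  # hypothetical project-specific, space-free marker

# Made-up file that uses the custom marker in its units row.
CSV_TEXT = """attribute,country,year,TotalEnergyConsumption
units,no_unit,years,ktoe
index,,,
0,DEU,2000,476.0664153213859
"""

data = pd.read_csv(io.StringIO(CSV_TEXT), header=[0, 1], index_col=0)

# The extra step: translate the custom marker back before handing over to pint.
data = data.rename(columns={CUSTOM_NO_UNIT: PINT_NO_UNIT}, level="units")
print(list(data.columns.get_level_values("units")))
```

Forgetting that one `rename` call would leave `no_unit` in the units level, which is the kind of silent bug mentioned above.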

@irm-codebase
Collaborator Author

Pinging @brynpickering, as he might find this discussion interesting.
