
MultipleSeries to dataframe #13

Closed
CommonClimate opened this issue Jan 19, 2023 · 5 comments
Labels
enhancement New feature or request priority_medium medium priority issue

Comments

@CommonClimate
Contributor

Generalizing the creation of Series objects from a DataFrame would be very helpful. See Dry_Tortugas.ipynb for an example: currently we create the Series manually (one per DataFrame column), but it would be much more convenient to have a pyleo.MultipleSeries method called from_pandas() that takes the following arguments:

  • df: a Pandas DataFrame
  • time_col: name or index of the column holding the time values, used to create the time variable (and hence the datetime index)
  • value_cols: indices of the columns to be exported into a MultipleSeries object (typically, users will only want certain columns from a spreadsheet)
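The column-splitting part of the proposed from_pandas() can be sketched in plain pandas. This is a minimal sketch under stated assumptions: split_dataframe is a hypothetical helper (not part of pyleoclim), it returns one pd.Series per requested column rather than pyleo.Series objects, and it treats integers as column positions, so it assumes the DataFrame has no integer column labels. In pyleoclim, each returned Series could then be wrapped in a pyleo.Series and collected into a MultipleSeries.

```python
import pandas as pd


def split_dataframe(df, time_col, value_cols):
    """Hypothetical sketch of the column split behind a from_pandas() method.

    Returns one pd.Series per entry in `value_cols`, each indexed by the
    values in `time_col`. Integers are treated as column positions and
    anything else as a column label (assumes no integer column labels).
    """
    def resolve(col):
        # Accept either a positional index or a column label.
        return df.columns[col] if isinstance(col, int) else col

    time_label = resolve(time_col)
    time_index = pd.Index(df[time_label].to_numpy(), name=time_label)
    return [
        pd.Series(df[resolve(col)].to_numpy(), index=time_index, name=resolve(col))
        for col in value_cols
    ]


# Toy spreadsheet; the values are made up for illustration.
df = pd.DataFrame({
    "Age (yr BP)": [0, 10, 20],
    "Sr/Ca (mmol/mol)": [9.1, 9.0, 8.9],
    "d18O (permil)": [-4.5, -4.6, -4.4],
})
parts = split_dataframe(df, time_col=0, value_cols=[1, "d18O (permil)"])
```

Mixing a position and a label in value_cols is allowed here only to show that both forms resolve to the same kind of column selection.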

Metadata can be inferred from the spreadsheet column headers (e.g. a string in parentheses is likely to be the value_unit of the corresponding Series, like "mmol/mol" in this example; the string preceding the parenthesis ("Sr/Ca" in this example) would be the value_name of the corresponding Series, or the name of the pd.Series created from that column of the DataFrame). Similarly, metadata for the time variable (particularly the all-important time_unit) can be inferred from the header of df.iloc[:, time_col].
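The header heuristic described above (name before the parentheses, unit inside them) can be sketched with a regular expression. This is an illustrative assumption, not pyleoclim code: parse_header is a hypothetical helper, and headers without a trailing parenthesised string simply get no unit.

```python
import re

# "name (unit)": group 1 is the name, group 2 the unit inside the last parentheses.
HEADER_RE = re.compile(r"^\s*(.*?)\s*\(([^()]*)\)\s*$")


def parse_header(header):
    """Split a header like 'Sr/Ca (mmol/mol)' into (name, unit).

    Heuristic sketch: if the header ends with a parenthesised string, that
    string is taken as the unit and the preceding text as the name;
    otherwise the whole header is the name and the unit is None.
    """
    match = HEADER_RE.match(header)
    if match:
        return match.group(1), match.group(2)
    return header.strip(), None


# Toy headers; one has no unit at all.
headers = ["Age (yr BP)", "Sr/Ca (mmol/mol)", "Year"]
metadata = {h: parse_header(h) for h in headers}
```

Applied to the time column header, the same function would yield a candidate time_unit; as noted later in the thread, real spreadsheets vary enough that such guesses need checking.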

The method should return a MultipleSeries object (i.e. a list of pyleo.Series objects).

@CommonClimate CommonClimate added the enhancement New feature or request label Jan 19, 2023
@khider khider moved this to Todo in Pandas integration Jan 19, 2023
@CommonClimate CommonClimate added the priority_low low priority issue label Jan 21, 2023
@CommonClimate
Contributor Author

I take Deborah's point that this will be impossible to do in the general case, because each paleo spreadsheet is a bit of a snowflake. However, the reverse is possible: pyleo.MultipleSeries objects should have a to_pandas() method, just like pyleo.Series objects do. Instead of exporting to a pd.Series, it would export to a pd.DataFrame. This would likely require some kind of alignment to take place, so issue #28 takes precedence.
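The reverse direction can be sketched as follows, assuming each series carries time/value arrays plus name and unit metadata. SeriesRecord is a hypothetical stand-in for pyleo.Series (its field names mirror the time_unit, value_name and value_unit attributes mentioned in this issue), and records_to_dataframe is not pyleoclim's to_pandas(). The alignment here is a plain outer join on the union of time points, and units are folded back into "name (unit)" column headers.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class SeriesRecord:
    """Minimal stand-in for pyleo.Series (hypothetical, illustration only)."""
    time: list
    value: list
    time_name: str = "time"
    time_unit: str = ""
    value_name: str = "value"
    value_unit: str = ""


def label(name, unit):
    # Reverse of the "name (unit)" header convention.
    return f"{name} ({unit})" if unit else name


def records_to_dataframe(records):
    """Export several records to one DataFrame aligned on the union of times.

    Assumes all records share the same time_name/time_unit; the first record
    labels the index. Missing time points become NaN (no interpolation).
    """
    columns = [
        pd.Series(r.value, index=r.time, name=label(r.value_name, r.value_unit))
        for r in records
    ]
    df = pd.concat(columns, axis=1, join="outer").sort_index()
    first = records[0]
    df.index.name = label(first.time_name, first.time_unit)
    return df


# Toy records with different time axes; values are made up.
sr = SeriesRecord([0, 10, 20], [9.1, 9.0, 8.9], "Age", "yr BP", "Sr/Ca", "mmol/mol")
d18o = SeriesRecord([5, 10], [-4.5, -4.6], "Age", "yr BP", "d18O", "permil")
df = records_to_dataframe([sr, d18o])
```

An outer join never invents values, which is one way to perform the alignment without first settling on a common time axis.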

@CommonClimate CommonClimate added priority_medium medium priority issue and removed priority_low low priority issue labels Feb 20, 2023
@CommonClimate CommonClimate moved this from Todo to High Priority TODO in Pandas integration Feb 20, 2023
@MarcoGorelli

So MultipleSeries.from_pandas is no longer a requirement?

As for MultipleSeries.to_pandas, it doesn't look like this requires #28 to be done first, given that you already have a way of aligning Series.

@CommonClimate CommonClimate changed the title MultipleSeries from dataframe MultipleSeries to dataframe Feb 23, 2023
@CommonClimate
Contributor Author

Addressed by LinkedEarth/Pyleoclim_util#335

@CommonClimate
Contributor Author

CommonClimate commented Feb 24, 2023

Question re: this. As shown in Paleopandas Playground.ipynb, I get a result like this in my test case:
[Screenshot, 2023-02-23: resulting DataFrame]

This is too good to be true: all three Series in this object have completely different time axes, so at first I could not understand how the DataFrame could have one common index for all of them and no missing values (NaNs). Looking into the function, I saw that it applies common_time() under the hood, but I think this should be an optional parameter. The default behavior should be to align the time series on the same index but keep NaNs in the DataFrame, unless you see a major objection to that. At the very least, users need to be warned that common_time() has been applied with default parameters. I can actually do that by extending the concept of log (present in Series) to this class as well.
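The behavior proposed here (align on a shared index but keep NaNs by default, with interpolation optional and announced) can be sketched in plain pandas/NumPy. Everything below is an assumption for illustration: the function name to_dataframe, the interpolate and step parameters, and the use of a Python UserWarning in place of a Series-style log. The interpolation is plain numpy.interp over the overlapping interval, a simple stand-in rather than pyleoclim's common_time() algorithm.

```python
import warnings

import numpy as np
import pandas as pd


def to_dataframe(series_list, interpolate=False, step=None):
    """Hypothetical export illustrating the proposed default.

    interpolate=False (default): outer-join on the union of time points,
    keeping NaNs wherever a series has no sample.
    interpolate=True: linearly interpolate every series onto an evenly spaced
    grid over the overlapping interval and warn the user. This is a simple
    stand-in for pyleoclim's common_time(), not its actual algorithm.
    """
    if not interpolate:
        return pd.concat(series_list, axis=1, join="outer").sort_index()

    series_list = [s.sort_index() for s in series_list]  # np.interp needs increasing x
    start = max(s.index.min() for s in series_list)
    stop = min(s.index.max() for s in series_list)
    if start >= stop:
        raise ValueError("the series do not overlap in time")
    if step is None:
        # Default spacing: the coarsest median time step among the inputs.
        step = max(np.median(np.diff(s.index.to_numpy(dtype=float))) for s in series_list)
    n_points = int(np.floor((stop - start) / step)) + 1
    grid = start + step * np.arange(n_points)
    data = {
        s.name: np.interp(grid, s.index.to_numpy(dtype=float), s.to_numpy(dtype=float))
        for s in series_list
    }
    warnings.warn(
        f"Series were linearly interpolated onto a common grid (step={step}); "
        "pass interpolate=False to keep the original time points and NaNs.",
        UserWarning,
    )
    return pd.DataFrame(data, index=pd.Index(grid, name="time"))


# Toy series with different time axes; values are made up.
a = pd.Series([1.0, 2.0, 3.0], index=[0.0, 1.0, 2.0], name="a")
b = pd.Series([10.0, 30.0], index=[0.5, 1.5], name="b")
aligned = to_dataframe([a, b])                   # NaNs kept where a series has no sample
common = to_dataframe([a, b], interpolate=True)  # emits a UserWarning
```

Making interpolation opt-in keeps the default export lossless, and the warning makes it explicit when values on the shared index were computed rather than observed.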

@CommonClimate
Contributor Author

Addressed by #350

@CommonClimate CommonClimate moved this from High Priority TODO to Done in Pandas integration Mar 3, 2023

2 participants