This repository has been archived by the owner on Feb 14, 2024. It is now read-only.

Fix issue 3 #4

Open · wants to merge 11 commits into master
Conversation

SarahAlidoost

closes #3

@SarahAlidoost (Author)

Loading data from pyPhenology as below:

observations, predictors = utils.load_test_data(
    name="vaccinium", phenophase="budburst"
)
print(observations.head(5))
                species  site_id  year  doy  phenophase
0  vaccinium corymbosum        1  1991  100         371
1  vaccinium corymbosum        1  1991  100         371
2  vaccinium corymbosum        1  1991  104         371
3  vaccinium corymbosum        1  1998  106         371
4  vaccinium corymbosum        1  1998  106         371

print(observations.shape)
(48, 5)

print(predictors.head(5))
   site_id  temperature  year  doy  latitude  longitude  daylength
0        1        13.10  1990  -65   42.5429   -72.2011      10.24
1        1        13.26  1990  -64   42.5429   -72.2011      10.20
2        1        12.30  1990  -63   42.5429   -72.2011      10.16
3        1        12.15  1990  -62   42.5429   -72.2011      10.11
4        1        13.00  1990  -61   42.5429   -72.2011      10.07

print(predictors.shape)
(4356, 7)

The data is preprocessed internally as:

observed_doy, temperature_array, doy_series = mu.misc.temperature_only_data_prep(
    observations, predictors, for_prediction=False
)
print(observed_doy.shape)
(48,)
print(temperature_array.shape)
(363, 48)
print(doy_series.shape)
(363,)
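A minimal sketch of this layout with random stand-in arrays (only the shapes mirror the printouts above; the actual values come from the vaccinium dataset):

```python
import numpy as np

# Random stand-ins that only mirror the shapes printed above:
# temperature_array is laid out as (n_doys, n_samples), i.e. one row per
# day of year and one column per observation.
n_samples, n_doys = 48, 363
observed_doy = np.random.randint(90, 110, size=n_samples)  # (48,)
temperature_array = np.random.randn(n_doys, n_samples)     # (363, 48)
doy_series = np.arange(n_doys)                             # (363,); real values can start below zero

assert temperature_array.shape == (len(doy_series), len(observed_doy))
```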

@SarahAlidoost (Author) commented Oct 17, 2023

@Peter9192 I found that applying X.flatten() was not correct, thanks for the hints 👍 I fixed it in this commit. I also checked the shapes of the variables; see my comment above.

Now there are still two issues:
1. Creating 'doy_series' as np.arange(X.shape[1]) assumes there are no gaps in DOYs and that each column in X corresponds to a DOY.
2. Distinguishing between sklearn and pyphenology input is implemented with the if statement isinstance(predictors, np.ndarray) and isinstance(observations, np.ndarray). In pyphenology, the processed temperature_array has a shape of (features, samples), whereas sklearn uses (samples, features). This is the reason for the X.T in the code.
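For clarity, a minimal sketch of this logic (a simplification, not the actual code; prepare_input is a made-up name):

```python
import numpy as np

def prepare_input(observations, predictors):
    # Made-up name; simplified sketch of the dispatch described above.
    if isinstance(predictors, np.ndarray) and isinstance(observations, np.ndarray):
        X, y = predictors, observations
        # sklearn convention: X is (n_samples, n_features) and every column
        # is one DOY, so this assumes consecutive DOYs without gaps.
        doy_series = np.arange(X.shape[1])
        # pyphenology expects (features, samples), hence the transpose.
        return y, X.T, doy_series
    raise TypeError("expected numpy arrays for the sklearn-style path")
```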

@Peter9192

creating 'doy_series': np.arange(X.shape[1]) means no gap in doys and each column in X corresponds to a doy

That's okay, I think. If we document it properly.

distinguishing between sklearn and pyphenology is implemented with the if statement isinstance(predictors, np.ndarray) and isinstance(observations, np.ndarray)

Perhaps instead of checking the types, we can check the shapes? For pyphenology, y (or observations) is 2d, whereas in the case of sklearn it should be 1d (only the doy column). Additionally/alternatively, for the predictors (X), we could see if there are column names like "temperature" and "latitude". Then you would know it should be pyphenology-format, because with sklearn, as you said, all columns correspond to DOY.
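Something like this (just a sketch; looks_like_pyphenology is a made-up name, not an existing function):

```python
import numpy as np

def looks_like_pyphenology(observations, predictors):
    """Made-up sketch of a shape/column-based check instead of a type check.

    pyphenology-style observations are 2d tables; sklearn-style y is a 1d
    array of DOYs. Named columns such as "temperature" or "latitude" also
    point at pyphenology-format predictors.
    """
    if np.asarray(observations).ndim == 2:
        return True
    columns = getattr(predictors, "columns", None)
    if columns is not None and {"temperature", "latitude"} & set(columns):
        return True
    return False
```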

In pyphenology, the processed temperature_array has a shape of (features, samples), whereas sklearn uses (samples, features). This is the reason for X.T in the code

Okay, I suppose that should be fine. Well spotted. Maybe copy this note to an inline comment?

@Peter9192

Nice progress! I'm wondering if it also works for some of the other pyphenology models now?

@SarahAlidoost (Author)

Perhaps instead of checking the types, we can check the shapes?

Looking at the existing checks in the base function, I think checking types works better than the other options.

For pyphenology, y (or observations) is 2d, whereas in the case of sklearn it should be 1d (only the doy column).

In the predict function, the only input argument is X, so a check on the shape of y cannot be implemented there.

Additionally/alternatively, for the predictors (X), we could see if there are column names like "temperature" and "latitude". Then you know it should be pyphenology-format, because with sklearn, as you said, all columns correspond to DOY.

"latitude" is not in self._required_data except for the Naive model. In addition, a solution like checking whether "temperature" is in predictors is not a sufficient check. In the pyphenology case, there is already a check for the names of the required data in predictors that raises an error; see validation.validate_predictors.

@SarahAlidoost (Author) commented Oct 18, 2023

Nice progress! I'm wondering if it also works for some of the other pyphenology models now?

Not for M1 and Naive. Do you want to include those as well?
Implementation is straightforward for Naive, assuming that X is latitude instead of temperature. But M1 needs temperature, doy, and daylength as predictors. How would we introduce the last one into X for sklearn?
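One idea (just a sketch, not implemented in this PR): stack the predictor blocks column-wise into a single X and remember the split point, so fit() can unpack them again:

```python
import numpy as np

# Made-up sketch: pack two (n_samples, n_doys) predictor blocks, e.g.
# temperature and daylength, side by side into one sklearn-style X and
# keep the split point to unpack them again inside fit().
n_samples, n_doys = 10, 365
temperature = np.random.randn(n_samples, n_doys)
daylength = np.random.rand(n_samples, n_doys) * 24.0

X = np.hstack([temperature, daylength])            # shape (10, 730)
temp_block, daylen_block = X[:, :n_doys], X[:, n_doys:]

assert X.shape == (n_samples, 2 * n_doys)
assert np.array_equal(temp_block, temperature)
assert np.array_equal(daylen_block, daylength)
```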

@Peter9192

How to introduce the last one in X to sklearn?

Not sure about that. I'm already very happy that it works for all the others! Let's wrap it up without M1 for now.

@SarahAlidoost marked this pull request as ready for review on November 2, 2023, 14:25
@Peter9192

Tried using this with the following:

import pandas as pd
import numpy as np
import geopandas as gpd
from pyPhenology import models

# Generate test data
# random observations for 10 points:
obs = gpd.GeoDataFrame(
    data = {
        'year': np.arange(2000, 2010), 
        'DOY_firstbloom': np.random.randint(120, 180, size=10),
        'geometry': gpd.GeoSeries.from_xy(*np.random.randn(2, 10))
        },
)

# dummy temperature data for each of these years/locations, for each DOY
get_temperature = lambda year, geometry: pd.Series(np.random.randn(365), index=np.arange(1, 366), name='temperature')
weather = obs.apply(lambda row: get_temperature(row.year, row.geometry), axis=1)


# This works
model = models.ThermalTime()
model.fit(observations=obs.DOY_firstbloom.values, predictors=weather.values)
model.get_params()

# This doesn't
model = models.ThermalTime()
model.fit(observations=obs.DOY_firstbloom, predictors=weather)
model.get_params()

That's why I still think checking based on something different than types is useful.
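A possible workaround (just a sketch, not code from this PR): coerce pandas objects to plain arrays before the type check, so obs.DOY_firstbloom and weather take the same path as obs.DOY_firstbloom.values and weather.values:

```python
import numpy as np

def coerce(observations, predictors):
    # Made-up sketch: np.asarray converts pandas Series/DataFrame to plain
    # ndarrays (and leaves ndarrays untouched), so the subsequent
    # isinstance(..., np.ndarray) dispatch would accept both input styles.
    y = np.asarray(observations)
    X = np.asarray(predictors)
    return y, X
```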

@Peter9192

Testing for other pyphenology models:

model_list = [ 
    models.ThermalTime(),
    models.Alternating(),
    models.Uniforc(),
    models.Unichill(),
    models.Linear(),
    models.MSB(),
    models.Sequential(),
    # models.M1(),  # Fails
    models.FallCooling(),
    models.Naive(),
]
for model in model_list:
    model.fit(observations=obs.DOY_firstbloom.values, predictors=weather.values)
    print(model.__class__.__name__, model.get_params())

Interestingly, it also works for Naive. But the output might not make sense... We might need to disable that?

@SarahAlidoost (Author)

Interestingly, it also works for Naive. But the output might not make sense... We might need to disable that?

I have implemented it for Naive as well, see my commit.

@SarahAlidoost (Author)

@Peter9192 I had another look at the implementations and your examples here. Now SklearnThermalTime() works with both array data and dataframes if they are sklearn compatible, see validation. For now, I added a docstring to SklearnThermalTime() that explains the data structure for the thermal_time model; I couldn't find a good way to add more checks for the X and y values. If the data structure is pyphenology's, the class ThermalTime() should be used as intended.
For the other models, the same can be implemented when addressing issue #2.

Successfully merging this pull request may close these issues: Avoid duplicate data preparation