[Utils] _calculate_sigma has warning if series has 1 data point #698

Closed
cparmet opened this issue Nov 14, 2023 · 0 comments · Fixed by #699

cparmet commented Nov 14, 2023

Hi! Issue Severity = None because now that I understand the warning, it's no problem! Please let me know any questions. And thanks all!

What happened + What you expected to happen

What happened

When I fit Naive() models to a dataframe consisting of series with 1 point each, I get intermittent warnings:
RuntimeWarning: invalid value encountered in double_scalars. sigma = sigma / n

The warning occurs only once per session. If I fit a fresh model to the same data 50 times in a loop, the warning appears once. Running the loop again produces zero warnings, but after restarting the kernel, the loop throws one warning again.

What I expected to happen

Forecasts looked as expected given the algorithm, and a Naive forecast on a series with one data point seems acceptable. But the warning made me look for a bug in my code or invalid data. The warning occurs when Naive().fit() calls _calculate_sigma(). I could see how we might get a divide by 0 or NaN issue in a standard deviation, but the intermittent nature of the warning was confusing, since this is a deterministic model fit to the same dataset.

From discussion with José on Nixtla Slack:

  • The issue is indeed from fitting Naive() to a series with only 1 data point. After the chat I understood that the n passed to _calculate_sigma() is the series length - 1, so with my dataset n=0, and sigma / n divides by zero, throwing the warning.
  • The intermittency appears to be related to the warnings module, but the divide by zero occurs every time.
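The once-per-session behavior can be sketched with the standard library alone. This is a minimal illustration, not statsforecast code: `noisy_divide` and `count_warnings` are hypothetical names, and the explicit `warnings.warn` call is a stand-in for the RuntimeWarning that NumPy raises at the `sigma = sigma / n` line. Under the "default" filter action, Python shows a given warning once per code location, which would explain seeing it only on the first fit after a kernel restart.

```python
import warnings


def noisy_divide():
    # Stand-in for `sigma = sigma / n`: same message, same call site on every call.
    warnings.warn("invalid value encountered in double_scalars", RuntimeWarning)
    return float("nan")


def count_warnings(action, calls=50):
    """Call noisy_divide() `calls` times and count the warnings that get through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        for _ in range(calls):
            noisy_divide()
    return len(caught)


# "default" reports a warning once per location; "always" reports every occurrence.
once_per_location = count_warnings("default")
every_time = count_warnings("always")
print(once_per_location, every_time)
```

Setting `warnings.simplefilter("always")` before fitting should therefore surface the warning on every fit, which can help confirm that the underlying division happens each time.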

José's suggestion:

We should probably change that so that if n is zero sigma is 0 or similar. Can you open an issue for that?
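A guard along the lines of José's suggestion might look like the sketch below. This is an assumption-laden illustration, not the change made in #699: `calculate_sigma_guarded` is a hypothetical name, it uses plain Python instead of NumPy, and it assumes that NaN residuals (such as the undefined first fitted value of a naive forecast) should be skipped when summing.

```python
import math


def calculate_sigma_guarded(residuals, n):
    """Root mean squared residual; returns 0.0 when n is 0 (hypothetical guard)."""
    if n <= 0:
        # A single-point series leaves no residuals to average, so report 0.0
        # instead of evaluating 0 / 0.
        return 0.0
    # Assumption: NaN residuals are ignored when accumulating the squared sum.
    squared_sum = sum(r * r for r in residuals if not math.isnan(r))
    return math.sqrt(squared_sum / n)


# One data point: the only residual is NaN and n = series length - 1 = 0.
single_point = calculate_sigma_guarded([float("nan")], n=0)

# Three data points: first residual is NaN, n = 2.
three_points = calculate_sigma_guarded([float("nan"), 3.0, 4.0], n=2)
print(single_point, three_points)
```

Returning 0.0 is one option; returning NaN without the warning would be another, depending on how downstream prediction intervals should treat a series with no usable residuals.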

Versions / Dependencies

statsforecast 1.5.0 and 1.6.0
A Python 3.8 image on SageMaker

Also reproduced in Colab with Python 3.10.12, statsforecast 1.6.0

Reproduction script

# !pip install pandas statsforecast==1.6.0 # if running in notebook
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import Naive

X = pd.DataFrame({'unique_id':[0,1,2,3,4], "ds":[1,1,1,1,1], "y":[10,20,30,40,50]})

naive_model = StatsForecast(models = [Naive()],
                            freq = 'MS',
                            n_jobs = 1)
naive_model.fit(X)

# Output
# /usr/local/lib/python3.10/dist-packages/statsforecast/utils.py:349: RuntimeWarning: invalid value encountered in double_scalars
#   sigma = sigma / n
# StatsForecast(models=[Naive])

Issue Severity

None

@cparmet cparmet added the bug label Nov 14, 2023
@cparmet cparmet changed the title Utils: _calculate_sigma handles series with 1 data point [Utils] _calculate_sigma handles series with 1 data point Nov 14, 2023
@cparmet cparmet changed the title [Utils] _calculate_sigma handles series with 1 data point [Utils] _calculate_sigma has warning if series has 1 data point Nov 14, 2023