# faas
Scale modeling performs an exhaustive search for the best models for time series data, providing information about the fit of the best models, their cross-validation accuracy measures, and many other outputs that are usually of interest. Sending requests through the API allows multiple requests at once; however, all datasets must contain data with the same frequency.
function validate_models(data_list, date_variable, date_format, model_spec, project_name)
Sends a request to 4intelligence's Forecast as a Service (FaaS) validation API.
Parameters:
-
data_list: Dict[str, pd.DataFrame]
Dictionary of pandas DataFrames and their respective keys to be sent to the API
-
date_variable: str
Name of the variable to be considered as the timesteps
-
date_format: str
Format of date_variable following datetime notation (See https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)
-
model_spec: dict
Dictionary containing arguments required for modeling. The model specifications will be the same for all datasets in the same project. The model_spec expects the following specifications:
-
n_steps: forecast horizon that will be used in the cross-validation (if 3, 3 periods ahead; if 12, 12 periods ahead, etc.). It should be an integer greater than or equal to 1. Typically, 'n_steps + n_windows - 1' should not exceed 30% of the length of your data.
-
n_windows: number of windows of size 'n_steps' that will be evaluated during cross-validation (CV). It should be an integer greater than or equal to 1. Typically, 'n_steps + n_windows - 1' should not exceed 30% of the length of your data.
-
log (Optional): if True, applies a log transformation to the data (only variables with all values greater than 0 will be log transformed); A logical parameter: True or False (Default: True).
-
seas.d (Optional): if True, it includes seasonal dummies in every estimation; A logical parameter: True or False (Default: True).
-
n_best (Optional): number of best models to be chosen for each feature selection method; Default is 20.
-
accuracy_crit (Optional): which criterion should be used to measure the accuracy of the forecast during the CV; Options: "MPE","MAPE", "WMAPE" or "RMSE" (Default: "MAPE").
-
exclusions (Optional): restrictions on features in the same model (which variables should not be included together in the same model); Default is 'exclusions = []', otherwise it should receive a list of lists, each inner list containing variables that must not appear in the same model.
-
golden_variables (Optional): features that must be included in, at least, one model (separate or together); Default is 'golden_variables = []', otherwise it should be a list with the golden variables.
-
fill_forecast (Optional): if True, it enables forecasting explanatory variables in order to avoid NAs in future values; A logical parameter: True or False (Default is False).
-
cv_summary (Optional): determines whether the 'mean' or the 'median' will be used to calculate the summary statistic of the accuracy measure over the CV windows; Options: "mean" or "median" (Default is "mean").
-
selection_methods (Optional): specifies which selection methods should be used for feature selection and whether explanatory variables should be chosen in order to avoid collinearity;
- lasso: True if our method of feature selection using Lasso should be applied,
- rf: True if our method of feature selection using Random Forest should be applied,
- corr: True if our method of feature selection using Pearson correlation filter should be applied,
- apply.collinear: True if you wish the feature selection to avoid collinearity among the explanatory variables in the models - this is equivalent to setting ["corr","rf","lasso","no_reduction"]. False or "" otherwise.
-
lags (Optional): defines a dictionary of lags of explanatory variables to be tested in the dataset. For example, if you wish to apply lags 1, 2 and 3 to the explanatory variables 'x1' and 'x2' from your dataset, this parameter should be specified as lags = {"x1": [1,2,3], "x2": [1,2,3]}. However, if you wish to test lags 1, 2 and 3 for all explanatory variables in the dataset(s), you can define lags = {"all": [1,2,3]}. If, for example, the user defines lags = {"all": [1,2,3], "x1": [1,2,3,4,5,6]}, lags 1, 2 and 3 will be applied to all explanatory variables, except for 'x1', for which lags 1 through 6 will be tested. The default is lags = {}.
-
allowdrift (Optional): if True, drift terms are considered in ARIMA models; A logical parameter: True or False (Default: True).
-
user_model (Optional): defines one or more models that should be included among the available models. In addition to the variables specified, any variable that is added during regular modeling will also be present in the models created from user_model. It is also possible to include a lagged variable (if defined in lags) among the variables in user_model.
-
project_name: str
Name of the project defined by the user; it should be at most 50 characters long
Returns: API return code, and errors and/or warnings if any were found.
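As a sketch, preparing the inputs for validate_models might look like the following. The import path (pyfaas4i.faas) is an assumption based on the function names in this document, and the request itself requires 4intelligence credentials, so the actual call is left commented out:

```python
import pandas as pd

# Toy monthly dataset: a dependent variable 'y' plus two explanatory
# variables, with dates written as "%Y-%m-%d" strings.
dates = pd.date_range("2015-01-01", periods=60, freq="MS")
df = pd.DataFrame({
    "date": dates.strftime("%Y-%m-%d"),
    "y": range(60),
    "x1": range(60),
    "x2": range(60, 120),
})

# Keys identify each dataset sent to the API; all datasets must share
# the same frequency.
data_list = {"dataset_1": df}

model_spec = {
    "n_steps": 3,     # forecast horizon used in each CV window
    "n_windows": 6,   # n_steps + n_windows - 1 = 8, well under 30% of 60 rows
    "log": True,
    "seas.d": True,
    "accuracy_crit": "MAPE",
    "exclusions": [["x1", "x2"]],  # x1 and x2 never in the same model
    "golden_variables": ["x1"],    # x1 must appear in at least one model
    "lags": {"all": [1, 2]},       # test lags 1 and 2 for all explanatory variables
}

# Assumed import path -- adjust to the actual package layout:
# from pyfaas4i.faas import validate_models
# validate_models(data_list, "date", "%Y-%m-%d", model_spec, "my_project")
```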
function run_models(data_list, date_variable, date_format, model_spec, project_name, skip_validation=False)
Sends a request to 4intelligence's Forecast as a Service (FaaS) for modeling.
Parameters
-
data_list: Dict[str, pd.DataFrame]
Dictionary of pandas DataFrames and their respective keys to be sent to the API
-
date_variable: str
Name of the variable to be considered as the timesteps
-
date_format: str
Format of date_variable following datetime notation (See https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)
-
model_spec: dict
Dictionary containing arguments required for modeling. The model specifications will be the same for all datasets in the same project; model_spec accepts the same specifications described for validate_models above.
-
project_name: str
Name of the project defined by the user; it should be at most 50 characters long
-
skip_validation: bool
Whether the validation step should be bypassed (Default: False)
Returns: API return code, and errors and/or warnings if any were found.
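A sketch of the corresponding call, assuming data_list and model_spec were built exactly as for validate_models. The import path is again an assumption, and the request needs 4intelligence credentials, so the call itself is commented out:

```python
# Assumed import path -- adjust to the actual package layout:
# from pyfaas4i.faas import run_models

# Keyword arguments for the request.
request_kwargs = dict(
    date_variable="date",
    date_format="%Y-%m-%d",     # must match how dates are written in the data
    project_name="my_project",  # at most 50 characters
    skip_validation=False,      # bypass the validation step only if necessary
)

# run_models(data_list, model_spec=model_spec, **request_kwargs)
```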
The following table provides the meaning of each error code returned when calling 4intelligence's validation API (through the functions validate_models or run_models with the recommended settings):
status_code | error_message | valid_options |
---|---|---|
001 | You have inserted a non-supported date format | year / month / day: "%Y/%m/%d", "%y/%m/%d", year / day / month: "%Y/%d/%m", "%y/%d/%m", day / month / year: "%d/%m/%Y", "%d/%m/%y", month / day / year: "%m/%d/%Y", "%m/%d/%y", year - month - day: "%Y-%m-%d", "%y-%m-%d", year - day - month: "%Y-%d-%m", "%y-%d-%m", day - month - year: "%d-%m-%Y", "%d-%m-%y", month - day - year: "%m-%d-%Y", "%m-%d-%y". |
002 | You have inserted a non-character object | A character object defining the variable/parameter of interest |
003 | Your dependent variable does not exist in dataset | A dependent variable name that exists in your dataset |
004 | You have inserted a variable name that is not in the dataset | The unique name of the date variable in your dataset(s) |
005 | You have inserted a variable that cannot be converted to date, maybe it contains footnotes? | The unique name of the date variable in your dataset(s) |
006 | Conversion of date_variable to 'data_tidy' failed | data_tidy |
007 | data_tidy was not converted to Date type in ALL datasets | Date object |
008 | data_tidy was not converted to Date type in SOME datasets | Date object |
009 | date_variable was not converted to %Y-%m-%d | Check https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior. E.g.: "%m/%d/%Y" |
010 | You have inserted a non-logical variable | True or False |
011 | You have inserted a non-integer variable | Any integer number greater than zero |
012 | You have inserted a number smaller/equal to zero and/or non-integer | Any integer number greater than zero |
013 | You have inserted an invalid option | MAPE, MPE, RMSE, WMAPE, MASE |
014 | You have inserted an invalid option | AIC, BIC |
015 | You have inserted an invalid option | mean, median |
016 | You have inserted a non-list object | A list object |
017 | Some/all invalid variable(s) in exclusions, lags or user_model | Variables that exist in your dataset, or an empty list [] |
018 | Variables inside exclusions must be unique | Unique names of variables that exist in your dataset |
019 | Some/all invalid variable(s) in golden_variables | Variables that exist in your dataset, or an empty list [] |
020 | You have chosen one or more invalid method(s) | ["", "corr", "rf", "lasso", "no_reduction"] or simply True/False |
021 | You have inserted a dummy or categorical variable as dependent variable | Numeric non-dummy dependent variable |
022 | NA | Please report this problem to support@4intelligence.com.br |
023 | Please add more observations to your dataset | Number of observations should be greater than (according to frequency): "daily" -> 180, "weekly" -> 52, "fortnightly" -> 24, "monthly" -> 36, "bimonthly" -> 24, "quarterly" -> 24, "half-year" -> 24, "annual" -> 12 |
024 | There is more than one observation per frequency period, make sure that you do not have more than one | One observation per frequency period |
025 | There are too many missing values in every row | Data frames with fewer missing values per row |
026 | n_steps and n_windows cover more than 50% of the size of your data | [(n_steps + n_windows - 1) / nrows_training] < 0.5 |
027 | Select at least one method for feature selection (set it as True) | corr = True; lasso = True; rf = True |
028 | Lags defined in 'lags' must be numeric, greater than 0 and integers | Numeric values such as 1, 2, 3, ... |
029 | Invalid variable name | Variable name conflicts with lag variable (starts with 'l' and lag number) chosen by user. |
030 | Multiple data frequency | Datasets in data_list contain more than 1 frequency |
031 | Exclusion with single element | At least one exclusion group contains only one element |
032 | Invalid prefix for variable name ('d4i_' or 'do_') | At least one variable name in the datasets of data_list starts with 'd4i_' or 'do_' |
The following table provides the meaning of each warning code returned when calling 4intelligence's validation API (through the functions validate_models or run_models with the recommended settings):
status_code | warning_message | valid_options |
---|---|---|
001 | One or more variables are dummies or categorical variables and will be disregarded in the exclusions set | A list without dummy or categorical variables |
002 | One or more variables are dummies or categorical variables and will be disregarded as golden variables | A list without dummy or categorical variables |
003 | One or more variables are dummies or categorical variables and will be disregarded as variables to apply lag | A list without dummy or categorical variables |
004 | One or more lag variables may not be included due to minimum data points requirement, linear dependency or being removed during pre-processing | Lag list with fewer lags or dataset with more observations |
005 | No forecast period provided | Additional dates in dataset to perform forecast |
006 | Missing values in forecast period lead to shorter or no projections | Explanatory variables with projections |
function download_zip(project_id, path, filename, verbose)
Makes a request and downloads all files from a project created in FaaS Modelling or Model Update.
Parameters
-
project_id: str
ID of the project to be downloaded; the project must have been concluded
-
path: str
Folder to which the files will be downloaded
-
filename: str
Name of the zipped file (without the .zip extension)
-
verbose: bool
If True, prints a message indicating the path of the downloaded file
Returns: The API response
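Hypothetical usage sketch. The import path and the exact output location are assumptions; project_id must belong to a concluded project, so a placeholder is used and the call is commented out:

```python
import os

# Assumed import path:
# from pyfaas4i.faas import download_zip

project_id = "abc123"            # placeholder id of a concluded project
path = "."                       # folder to which the files will be downloaded
filename = "my_project_outputs"  # without the .zip extension

# download_zip(project_id, path, filename, verbose=True)
# Presumably writes the archive to:
expected = os.path.join(path, filename + ".zip")
```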
function list_projects(return_dict)
Retrieves a list of projects previously sent to be modelled or updated in FaaS from the user.
Parameters
-
return_dict: bool
If True, a dictionary is returned instead of a DataFrame
Returns: A DataFrame or dictionary containing information about the user's projects
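A final sketch showing the two return shapes. The import path is an assumption and the call requires credentials, so it is commented out; the column names below are purely illustrative and not confirmed by this document:

```python
import pandas as pd

# Assumed import path:
# from pyfaas4i.faas import list_projects
# projects = list_projects(return_dict=False)  # pandas DataFrame
# projects = list_projects(return_dict=True)   # plain dictionary

# The two shapes can be converted into each other with pandas; hypothetical
# columns used only to illustrate the conversion:
sample = pd.DataFrame({"project_name": ["my_project"], "status": ["created"]})
as_records = sample.to_dict(orient="records")
```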