natalia-araujo edited this page Mar 14, 2024 · 7 revisions

Modelling Calls

Scale modeling performs an exhaustive search for the best models in time series data, providing information about the fit of the best models, their cross-validation accuracy measures, and many other outputs that are usually of interest. Using the API to send requests allows for multiple requests at once; however, all datasets must contain data with the same frequency.

faas.validate_models()

function validate_models(data_list, date_variable, date_format, model_spec, project_name)

Sends a request to 4intelligence's Forecast as a Service (FaaS) validation API.

Parameters:

  • data_list: Dict[str, pd.DataFrame]

    Dictionary of pandas dataframes and their respective keys to be sent to the API

  • date_variable: str

    Name of the variable to be considered as the timesteps

  • date_format: str

    Format of date_variable following datetime notation (See https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)

  • model_spec: dict

    Dictionary containing arguments required for modeling. The model specifications will be the same for all datasets in the same project. The model_spec expects the following specifications:

    • n_steps: forecast horizon that will be used in the cross-validation (if 3, 3 months ahead; if 12, 12 months ahead, etc.); It should be an integer greater than or equal to 1. Typically, 'n_steps+n_windows-1' should not exceed 30% of the length of your data.

    • n_windows: how many windows the size of ‘Forecast Horizon’ will be evaluated during cross-validation (CV); It should be an integer greater than or equal to 1. Typically, 'n_steps+n_windows-1' should not exceed 30% of the length of your data.

    • log (Optional): if True, applies a log transformation to the data (only variables with all values greater than 0 will be log-transformed); A logical parameter: True or False (Default: True).

    • seas.d (Optional): if True, it includes seasonal dummies in every estimation; A logical parameter: True or False (Default: True).

    • n_best (Optional): number of best models to be chosen for each feature selection method; Default is 20.

    • accuracy_crit (Optional): which criterion should be used to measure the accuracy of the forecast during the CV; Options: "MPE","MAPE", "WMAPE" or "RMSE" (Default: "MAPE").

    • exclusions (Optional): restrictions on features in the same model (which variables should not be included in the same model); Default is 'exclusions = []', otherwise it should receive a list of lists, each inner list containing the variables that must not appear together in a model.

    • golden_variables (Optional): features that must be included in, at least, one model (separate or together); Default is 'golden_variables = []', otherwise it should be a list with the golden variables.

    • fill_forecast (Optional): if True, it enables forecasting explanatory variables in order to avoid NAs in future values; A logical parameter: True or False (Default is False).

    • cv_summary (Optional): determines whether 'mean' or 'median' will be used to calculate the summary statistic of the accuracy measure over the CV windows; Options: "mean" or "median" (Default is "mean").

    • selection_methods (Optional): specifies which selection methods should be used for feature selection and whether explanatory variables should be chosen in order to avoid collinearity;

      • lasso: True if our method of feature selection using Lasso should be applied,
      • rf: True if our method of feature selection using Random Forest should be applied,
      • corr: True if our method of feature selection using Pearson correlation filter should be applied,
      • apply.collinear: True if you wish our feature selection to avoid collinearity among the explanatory variables in the models - this is equivalent to setting ["corr","rf","lasso","no_reduction"]. False or "" otherwise.
    • lags (Optional): defines a dictionary of lags of explanatory variables to be tested in the dataset. For example, if you wish to apply lags 1, 2 and 3 to the explanatory variables 'x1' and 'x2' from your dataset, this parameter should be specified as lags = {"x1": [1,2,3], "x2": [1,2,3]}. However, if you wish to test lags 1, 2 and 3 for all explanatory variables in the dataset(s), you can define lags = {"all": [1,2,3]}. If, for example, the user defines lags = {"all": [1,2,3], "x1": [1,2,3,4,5,6]}, lags 1, 2 and 3 will be applied to all explanatory variables except 'x1', for which lags 1 through 6 will be tested. The default is lags = {}.

    • allowdrift (Optional): if True, drift terms are considered in ARIMA models; A logical parameter: True or False (Default: True).

    • user_model (Optional): defines one or more models that should be included among the available models. In addition to the variables specified, any variable added during regular modeling will also appear in the models created from user_model. A lagged variable (if defined in lags) can also be included among the user_model variables.

  • project_name: str

    Name of the project defined by the user; it must be at most 50 characters long

Returns: API return code, and errors and/or warnings if any were found.
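As a sketch of how these parameters fit together, the example below builds a monthly dataset and a model_spec dictionary and prepares a validation request. The dataset key, variable names, and values are illustrative assumptions; the faas import and call are commented out because they require an installed and authenticated 4intelligence client.

```python
import pandas as pd
# import faas  # 4intelligence FaaS client; requires installation and authentication

# Monthly dataset with a date column, a dependent variable, and explanatory
# variables; the names "fiscal", "x1", "x2" are illustrative assumptions.
df = pd.DataFrame({
    "date": pd.date_range("2015-01-01", periods=60, freq="MS").strftime("%Y-%m-%d"),
    "fiscal": range(60),
    "x1": range(60),
    "x2": range(60),
})

data_list = {"dataset_1": df}  # dataset key is an illustrative choice

model_spec = {
    "n_steps": 3,                  # forecast horizon used in cross-validation
    "n_windows": 6,                # number of CV windows
    "log": True,
    "seas.d": True,
    "n_best": 20,
    "accuracy_crit": "MAPE",
    "exclusions": [["x1", "x2"]],  # never combine x1 and x2 in one model
    "golden_variables": [],
    "fill_forecast": False,
    "cv_summary": "mean",
    "selection_methods": {
        "lasso": True,
        "rf": True,
        "corr": True,
        "apply.collinear": True,
    },
    "lags": {"all": [1, 2, 3]},    # test lags 1-3 for all explanatory variables
}

# Here n_steps + n_windows - 1 = 8, well under 30% of the 60 observations.
# faas.validate_models(
#     data_list=data_list,
#     date_variable="date",
#     date_format="%Y-%m-%d",
#     model_spec=model_spec,
#     project_name="example_project",
# )
```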

faas.run_models()

function run_models(data_list, date_variable, date_format, model_spec, project_name, skip_validation=False)

Sends a request to 4intelligence's Forecast as a Service (FaaS) for modeling.

Parameters

  • data_list: Dict[str, pd.DataFrame]

    Dictionary of pandas dataframes and their respective keys to be sent to the API

  • date_variable: str

    Name of the variable to be considered as the timesteps

  • date_format: str

    Format of date_variable following datetime notation (See https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)

  • model_spec: dict

    Dictionary containing arguments required for modeling. The model specifications will be the same for all datasets in the same project. The model_spec expects the following specifications:

    • n_steps: forecast horizon that will be used in the cross-validation (if 3, 3 months ahead; if 12, 12 months ahead, etc.); It should be an integer greater than or equal to 1. Typically, 'n_steps+n_windows-1' should not exceed 30% of the length of your data.

    • n_windows: how many windows the size of ‘Forecast Horizon’ will be evaluated during cross-validation (CV); It should be an integer greater than or equal to 1. Typically, 'n_steps+n_windows-1' should not exceed 30% of the length of your data.

    • log (Optional): if True, applies a log transformation to the data (only variables with all values greater than 0 will be log-transformed); A logical parameter: True or False (Default: True).

    • seas.d (Optional): if True, it includes seasonal dummies in every estimation; A logical parameter: True or False (Default: True).

    • n_best (Optional): number of best models to be chosen for each feature selection method; Default is 20.

    • accuracy_crit (Optional): which criterion should be used to measure the accuracy of the forecast during the CV; Options: "MPE","MAPE", "WMAPE" or "RMSE" (Default: "MAPE").

    • exclusions (Optional): restrictions on features in the same model (which variables should not be included in the same model); Default is 'exclusions = []', otherwise it should receive a list of lists, each inner list containing the variables that must not appear together in a model.

    • golden_variables (Optional): features that must be included in, at least, one model (separate or together); Default is 'golden_variables = []', otherwise it should be a list with the golden variables.

    • fill_forecast (Optional): if True, it enables forecasting explanatory variables in order to avoid NAs in future values; A logical parameter: True or False (Default is False).

    • cv_summary (Optional): determines whether 'mean' or 'median' will be used to calculate the summary statistic of the accuracy measure over the CV windows; Options: "mean" or "median" (Default is "mean").

    • selection_methods (Optional): specifies which selection methods should be used for feature selection and whether explanatory variables should be chosen in order to avoid collinearity;

      • lasso: True if our method of feature selection using Lasso should be applied,
      • rf: True if our method of feature selection using Random Forest should be applied,
      • corr: True if our method of feature selection using Pearson correlation filter should be applied,
      • apply.collinear: True if you wish our feature selection to avoid collinearity among the explanatory variables in the models - this is equivalent to setting ["corr","rf","lasso","no_reduction"]. False or "" otherwise.
    • lags (Optional): defines a dictionary of lags of explanatory variables to be tested in the dataset. For example, if you wish to apply lags 1, 2 and 3 to the explanatory variables 'x1' and 'x2' from your dataset, this parameter should be specified as lags = {"x1": [1,2,3], "x2": [1,2,3]}. However, if you wish to test lags 1, 2 and 3 for all explanatory variables in the dataset(s), you can define lags = {"all": [1,2,3]}. If, for example, the user defines lags = {"all": [1,2,3], "x1": [1,2,3,4,5,6]}, lags 1, 2 and 3 will be applied to all explanatory variables except 'x1', for which lags 1 through 6 will be tested. The default is lags = {}.

    • allowdrift (Optional): if True, drift terms are considered in ARIMA models; A logical parameter: True or False (Default: True).

  • project_name: str

    Name of the project defined by the user; it must be at most 50 characters long

  • skip_validation: bool

    Whether the validation step should be bypassed (Default: False)

Returns: API return code, and errors and/or warnings if any were found.
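A minimal sketch of a run_models call, assuming only the two required model_spec keys (the optional ones fall back to their defaults). The dataset key and variable names are illustrative, and the faas call is commented out since it requires an authenticated client.

```python
import pandas as pd
# import faas  # 4intelligence FaaS client; requires installation and authentication

# Monthly dataset; "gdp" and "x1" are illustrative variable names.
df = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=48, freq="MS").strftime("%Y-%m-%d"),
    "gdp": range(48),
    "x1": range(48),
})

# Only the required keys; n_steps + n_windows - 1 = 14, within 30% of 48 rows.
model_spec = {"n_steps": 6, "n_windows": 9}

# Validation runs first by default; set skip_validation=True only after a
# successful faas.validate_models call with the same inputs.
# faas.run_models(
#     data_list={"dataset_gdp": df},
#     date_variable="date",
#     date_format="%Y-%m-%d",
#     model_spec=model_spec,
#     project_name="gdp_project",
#     skip_validation=False,
# )
```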

Validation Error Table

The following table provides the meaning of each error code returned when calling 4intelligence's validation API (through the functions validate_models or run_models with the recommended settings).

| status_code | error_message | valid_options |
| --- | --- | --- |
| 001 | You have inserted a non-supported date format | year/month/day: "%Y/%m/%d", "%y/%m/%d"; year/day/month: "%Y/%d/%m", "%y/%d/%m"; day/month/year: "%d/%m/%Y", "%d/%m/%y"; month/day/year: "%m/%d/%Y", "%m/%d/%y"; year-month-day: "%Y-%m-%d", "%y-%m-%d"; year-day-month: "%Y-%d-%m", "%y-%d-%m"; day-month-year: "%d-%m-%Y", "%d-%m-%y"; month-day-year: "%m-%d-%Y", "%m-%d-%y" |
| 002 | You have inserted a non-character object | A character object defining the variable/parameter of interest |
| 003 | Your dependent variable does not exist in dataset | A dependent variable name that exists in your dataset |
| 004 | You have inserted a variable name that is not in the dataset | The unique name of the date variable in your dataset(s) |
| 005 | You have inserted a variable that cannot be converted to date, maybe it contains footnotes? | The unique name of the date variable in your dataset(s) |
| 006 | Conversion of date_variable to 'data_tidy' failed | data_tidy |
| 007 | data_tidy was not converted to Date type in ALL datasets | Date object |
| 008 | data_tidy was not converted to Date type in SOME datasets | Date object |
| 009 | date_variable was not converted to %Y-%m-%d | Check https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior. E.g.: "%m/%d/%Y" |
| 010 | You have inserted a non-logical variable | TRUE or FALSE |
| 011 | You have inserted a non-integer variable | Any integer number greater than zero |
| 012 | You have inserted a number smaller/equal to zero and/or non-integer | Any integer number greater than zero |
| 013 | You have inserted an invalid option | MAPE, MPE, RMSE, WMAPE, MASE |
| 014 | You have inserted an invalid option | AIC, BIC |
| 015 | You have inserted an invalid option | mean, median |
| 016 | You have inserted a non-list object | A list object |
| 017 | Some/all invalid variable(s) in exclusions, lags or user_model | Variables that exist in your dataset or list() |
| 018 | Variables inside exclusions must be unique | Unique names of variables that exist in your dataset |
| 019 | Some/all invalid variable(s) in golden_variables | Variables that exist in your dataset or c() |
| 020 | You have chosen an(some) invalid method(s) | c("","corr","rf","lasso","no_reduction") or simply TRUE/FALSE |
| 021 | You have inserted a dummy or categorical variable as dependent variable | Numeric non-dummy dependent variable |
| 022 | NA | Please report this problem to support@4intelligence.com.br |
| 023 | Please add more observations to your dataset | Number of observations should be greater than (according to frequency): "daily" -> 180, "weekly" -> 52, "fortnightly" -> 24, "monthly" -> 36, "bimonthly" -> 24, "quarterly" -> 24, "half-year" -> 24, "annual" -> 12 |
| 024 | There is more than one observation per frequency period, make sure that you do not have more than one | One observation per frequency period |
| 025 | There are too many missing values in every row | Data frames with fewer missing values per row |
| 026 | n_steps and n_windows cover more than 50% of the size of your data | [(n_steps + n_windows - 1) / nrows_training] < 0.5 |
| 027 | Select at least one method for feature selection (set it as TRUE) | corr = TRUE; lasso = TRUE; rf = TRUE |
| 028 | Lags defined in 'lags' must be numeric, greater than 0 and integers | Numeric values such as 1, 2, 3, ... |
| 029 | Invalid variable name | Variable name conflicts with lag variable (starts with 'l' and lag number) chosen by user |
| 030 | Multiple data frequency | Datasets in data_list contain more than 1 frequency |
| 031 | Exclusion with single element | At least one group of exclusion contains only one element |
| 032 | Invalid prefix for variable name ('d4i_' or 'do_') | At least one variable name in datasets of data_list starts with 'd4i_' or 'do_' |

Validation Warning Table

The following table provides the meaning of each warning code returned when calling 4intelligence's validation API (through the functions validate_models or run_models with the recommended settings).

| status_code | warning_message | valid_options |
| --- | --- | --- |
| 001 | One or more variables are dummies or categorical variables and will be disregarded in exclusions set | A list without dummy or categorical variables |
| 002 | One or more variables are dummies or categorical variables and will be disregarded as golden variables | A vector without dummy or categorical variables |
| 003 | One or more variables are dummies or categorical variables and will be disregarded as variables to apply lag | A list without dummy or categorical variables |
| 004 | One or more lag variables may not be included due to minimum data points requirement, linear dependency or being removed during pre-processing | Lag list with fewer lags or dataset with more observations |
| 005 | No forecast period provided | Additional dates in dataset to perform forecast |
| 006 | Missing values in forecast period lead to shorter or no projections | Explanatory variables with projections |

Utility Functions

faas.download_zip()

function download_zip(project_id, path, filename, verbose)

Makes a request and downloads all files from a project created in FaaS Modelling or Model Update.

Parameters

  • project_id: str

    ID of the project to be downloaded; the project must have been concluded

  • path: str

    Folder to which the files will be downloaded

  • filename: str

    Name of the zipped file (without the .zip extension)

  • verbose: bool

    If the message indicating the path of the downloaded file should be printed

Returns: The API response
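An illustrative sketch of a download_zip call. The project id is hypothetical, and the faas call is commented out because it requires an authenticated client and a concluded project; the doc states the .zip extension is omitted from filename, so it is presumably appended on save.

```python
import os
# import faas  # 4intelligence FaaS client; requires installation and authentication

path = "./forecasts"      # folder to which the files will be downloaded
filename = "faas_output"  # saved as faas_output.zip (extension presumably appended)

# "a1b2c3" is a hypothetical id of a concluded project.
# faas.download_zip(project_id="a1b2c3", path=path, filename=filename, verbose=True)

# With verbose=True, the printed message would point at a path like:
expected = os.path.join(path, filename + ".zip")
```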

faas.list_projects()

function list_projects(return_dict)

Retrieves a list of the user's projects previously sent to FaaS for modelling or updating.

Parameters

  • return_dict: bool

    If a dictionary should be returned instead of a dataframe

Returns: A dataframe or dictionary containing information about the user's projects
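An illustrative sketch of working with list_projects output. The call itself is commented out because it requires authentication, and the columns shown below ("id", "name", "status") are assumptions for illustration, not the API's documented schema.

```python
import pandas as pd
# import faas  # 4intelligence FaaS client; requires installation and authentication

# projects = faas.list_projects(return_dict=False)      # pandas dataframe
# projects_dict = faas.list_projects(return_dict=True)  # dictionary instead

# Stand-in dataframe with hypothetical columns, to show typical filtering:
projects = pd.DataFrame({
    "id": ["a1b2c3", "d4e5f6"],
    "name": ["example_project", "another_project"],
    "status": ["success", "running"],
})
concluded = projects[projects["status"] == "success"]
```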