gtime.feature_generation.Calendar

class gtime.feature_generation.Calendar(region: str = 'america', country: str = 'Brazil', start_date: str = '01/01/2018', end_date: str = '01/01/2020', kernel: Union[List, numpy.ndarray] = None, reindex_method: str = 'pad')

Create a feature based on the national holidays of a specific country.

The interface follows that of ‘workalendar’. To see which regions and countries are available, check the ‘workalendar’ documentation.

Parameters
region : str, optional, default: 'america'

The region in which the country is located.

country : str, optional, default: 'Brazil'

The name of the country from which to retrieve the holidays. The country must be located in the given region. For certain countries, workalendar also provides ‘subregions’. To use a subregion instead of the whole country, pass the name of the subregion in place of the country name (e.g. ‘Vaud’ instead of ‘Switzerland’ for the canton of Vaud, which is part of Switzerland).

start_date : str, optional, default: '01/01/2018'

The date starting from which to retrieve the holidays.

end_date : str, optional, default: '01/01/2020'

The date until which to retrieve the holidays.

kernel : array-like, optional, default: None

The kernel to use when creating the feature. The feature is computed over a rolling window of the same size as the kernel: the dot product between the kernel and the holiday indicator column (1 if the corresponding day is a holiday, 0 otherwise) is divided by the number of holidays in the window, and the result is the value of the holiday feature.

reindex_method : str, optional, default: 'pad'

Used only if a DataFrame is passed to the transform method. The holiday events are then reindexed to the index of that DataFrame using this method, which must be one of the reindex methods provided by pandas. Please refer to the pandas documentation for further details.
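As a rough illustration of the computation described under kernel, the sketch below applies a rolling dot product with pandas. It is not the library's implementation: the holiday indicator values, the helper name weighted_holiday_value, and the trailing alignment of the window are all assumptions made for this example, so the numbers need not line up with the output of Calendar itself.

```python
import numpy as np
import pandas as pd

# Hypothetical holiday indicator: 1.0 on a holiday, 0.0 otherwise.
# Holidays are assumed on 2019-04-22 and 2019-04-25 for illustration only.
index = pd.date_range(start="2019-04-18", end="2019-04-27", freq="D")
is_holiday = pd.Series(
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0], index=index
)

kernel = np.array([2.0, 1.0])


def weighted_holiday_value(window: np.ndarray) -> float:
    """Dot product of window and kernel, divided by the holidays in the window."""
    n_holidays = window.sum()
    if n_holidays == 0:
        return 0.0
    return float(np.dot(window, kernel) / n_holidays)


# Assumption: a trailing window, so kernel[0] weights the older day
# and kernel[1] the newer day of each two-day window.
feature = is_holiday.rolling(len(kernel)).apply(weighted_holiday_value, raw=True)
print(feature)
```

Under these assumptions, a window that contains two holidays would evaluate to (2 + 1) / 2 = 1.5, which shows the effect of dividing by the number of holidays in the window. The first entry is NaN because the trailing window is not yet full.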

Examples

>>> import pandas as pd
>>> from gtime.feature_generation import Calendar
>>> X = pd.DataFrame(range(0, 10), index=pd.period_range(start='2019-04-18',
...                  end='2019-04-27', freq='d'))
>>> cal_feature = Calendar(region="europe", country="Italy", kernel=[2, 1])
>>> cal_feature.fit_transform(X)
            status__Calendar
2019-04-18               0.0
2019-04-19               0.0
2019-04-20               0.0
2019-04-21               1.0
2019-04-22               2.0
2019-04-23               0.0
2019-04-24               1.0
2019-04-25               2.0
2019-04-26               0.0
2019-04-27               0.0

Methods

fit(self, X[, y])

Fit the estimator.

fit_transform(self, X[, y])

Fit to data, then transform it.

get_feature_names(self)

Return feature names for output features.

get_params(self[, deep])

Get parameters for this estimator.

set_params(self, **params)

Set the parameters of this estimator.

transform(self[, time_series])

Generate a DataFrame containing the events associated with the holidays of the selected country.

__init__(self, region:str='america', country:str='Brazil', start_date:str='01/01/2018', end_date:str='01/01/2020', kernel:Union[List, numpy.ndarray]=None, reindex_method:str='pad')

Initialize self. See help(type(self)) for accurate signature.

fit(self, X:pandas.core.frame.DataFrame, y=None)

Fit the estimator. This method is present only for compatibility with the sklearn API.

Parameters
X : pd.DataFrame, shape (n_samples, n_features)

Input data.

y : None

A transformer does not need a target, but the pipeline API requires this parameter.

Returns
self : object

Returns self.

fit_transform(self, X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

**fit_params : dict

Additional fit parameters.

Returns
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_feature_names(self)

Return feature names for output features.

Returns
output_feature_names : ndarray, shape (n_output_features,)

Array of feature names.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : mapping of string to any

Parameter names mapped to their values.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params : dict

Estimator parameters.

Returns
self : object

Estimator instance.
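The <component>__<parameter> naming used by get_params and set_params can be seen with a small scikit-learn pipeline. This is a sketch only: StandardScaler and the step name "scale" are stand-ins chosen for the example, on the assumption that Calendar inherits these two methods from scikit-learn and would behave the same way as a pipeline step.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A one-step pipeline; "scale" is a hypothetical step name.
pipe = Pipeline([("scale", StandardScaler())])

# Update a parameter of the nested step via <component>__<parameter>.
pipe.set_params(scale__with_mean=False)

# With deep=True, nested parameters are reported under the same naming scheme.
params = pipe.get_params(deep=True)
print(params["scale__with_mean"])
```

With deep=False, get_params would report only the pipeline's own parameters, without the scale__ entries.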

transform(self, time_series:Union[pandas.core.frame.DataFrame, NoneType]=None) → pandas.core.frame.DataFrame

Generate a DataFrame containing the events associated with the holidays of the selected country.

Parameters
time_series : pd.DataFrame, shape (n_samples, 1), optional, default: None

If provided, both start_date and end_date are overwritten with the start and end dates of the index of time_series. The output DataFrame is also re-indexed with the index of time_series, using the chosen reindex_method.

Returns
events : pd.DataFrame, shape (length, 1)

A DataFrame containing the events.
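The re-indexing step described for time_series can be pictured with plain pandas. The sketch below is an illustration under assumed data, not the code that Calendar runs: the events frame, its "status" column, and the 12-hour target index are hypothetical. It shows what reindex does with method='pad', where each new timestamp takes the most recent earlier value.

```python
import pandas as pd

# Hypothetical daily holiday events (1 = holiday, 0 = not a holiday).
events = pd.DataFrame(
    {"status": [0, 1, 0, 0]},
    index=pd.date_range(start="2019-04-21", periods=4, freq="D"),
)

# Hypothetical index of the input time series, sampled every 12 hours.
target_index = pd.date_range(
    start="2019-04-21", periods=8, freq=pd.Timedelta(hours=12)
)

# 'pad' propagates the last known daily value forward to each new timestamp.
reindexed = events.reindex(target_index, method="pad")
print(reindexed)
```

Other pandas reindex methods, such as 'backfill' or 'nearest', would fill the intermediate timestamps differently, which is why reindex_method is exposed as a parameter.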