gtime.feature_generation.Calendar
class gtime.feature_generation.Calendar(region: str = 'america', country: str = 'Brazil', start_date: str = '01/01/2018', end_date: str = '01/01/2020', kernel: Union[List, numpy.ndarray] = None, reindex_method: str = 'pad')
Create a feature based on the national holidays of a specific country.
The interface is modeled on that of ‘workalendar’. To see which regions and countries are available, check the ‘workalendar’ documentation.
- Parameters
- region : str, optional, default: 'america'
The region in which the country is located.
- country : str, optional, default: 'Brazil'
The name of the country from which to retrieve the holidays. The country must be located in the given region. For certain countries workalendar provides additional ‘subregions’. To use one of them instead of the whole country, pass the name of the subregion in place of the country name (e.g. ‘Vaud’ instead of ‘Switzerland’ for the canton of Vaud, which is part of Switzerland).
- start_date : str, optional, default: '01/01/2018'
The date starting from which to retrieve the holidays.
- end_date : str, optional, default: '01/01/2020'
The date until which to retrieve the holidays.
- kernel : array-like, optional, default: None
The kernel to use when creating the feature. The holiday feature is computed over a rolling window of the same size as the kernel: within each window, the dot product is taken between the kernel and the holiday indicator column (1 if the corresponding day is a holiday, 0 otherwise), and the result is divided by the number of holidays in the window.
- reindex_method : str, optional, default: 'pad'
Used only if time_series is passed to the transform method. It is the method used to reindex the holiday events onto the index of time_series, and it should be compatible with the reindex methods provided by pandas. Refer to the pandas documentation for further details.
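The effect of reindex_method can be illustrated with pandas alone. The following is a minimal sketch, not the library's internal code: the events frame, its column name and the target index are hypothetical stand-ins, and only DataFrame.reindex with method="pad" is taken from pandas itself.

```python
import pandas as pd

# Hypothetical daily holiday events (1.0 marks a holiday); a stand-in for
# the events that Calendar would retrieve, not its actual internal frame.
events = pd.DataFrame(
    {"status": [0.0, 1.0, 0.0]},
    index=pd.to_datetime(["2019-04-24", "2019-04-25", "2019-04-26"]),
)

# Hypothetical target index with two observations per day.
target_index = pd.to_datetime(
    ["2019-04-24 00:00", "2019-04-24 12:00",
     "2019-04-25 00:00", "2019-04-25 12:00"]
)

# 'pad' propagates the last known daily value forward onto the new index.
aligned = events.reindex(target_index, method="pad")
print(aligned)
```

With 'pad', every timestamp of the target index takes the value of the most recent daily event at or before it, so both observations on 2019-04-25 are marked as holidays.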
Examples
>>> import pandas as pd
>>> from gtime.feature_generation import Calendar
>>> X = pd.DataFrame(range(0, 10), index=pd.period_range(start='2019-04-18',
...                                                      end='2019-04-27', freq='d'))
>>> cal_feature = Calendar(region="europe", country="Italy", kernel=[2, 1])
>>> cal_feature.fit_transform(X)
            status__Calendar
2019-04-18               0.0
2019-04-19               0.0
2019-04-20               0.0
2019-04-21               1.0
2019-04-22               2.0
2019-04-23               0.0
2019-04-24               1.0
2019-04-25               2.0
2019-04-26               0.0
2019-04-27               0.0
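The kernel computation described under Parameters can be sketched in a few lines of NumPy. This is an illustration of the description only, under two assumptions that the documentation does not state: the window is trailing (it ends on the current day), and a window without holidays yields 0. The function name holiday_feature is hypothetical, and because the library may align its window differently, the values need not match the output above.

```python
import numpy as np

def holiday_feature(is_holiday, kernel):
    """Rolling dot product of a 0/1 holiday column with a kernel,
    divided by the number of holidays in each window."""
    is_holiday = np.asarray(is_holiday, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)
    # Positions without a full window stay NaN.
    out = np.full(len(is_holiday), np.nan)
    for end in range(k - 1, len(is_holiday)):
        # Assumption: trailing window that ends on the current day.
        window = is_holiday[end - k + 1 : end + 1]
        n_holidays = window.sum()
        # Assumption: a window with no holidays gives 0 instead of 0/0.
        out[end] = (window @ kernel) / n_holidays if n_holidays else 0.0
    return out

flags = [0, 0, 1, 1, 0, 0]  # two consecutive holidays
feature = holiday_feature(flags, kernel=[2, 1])
print(feature)
```

For the window covering both holidays, the dot product is 2 + 1 = 3 and the window contains two holidays, so the feature value is 1.5.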
Methods
- fit(self, X[, y]): Fit the estimator.
- fit_transform(self, X[, y]): Fit to data, then transform it.
- get_feature_names(self): Return feature names for output features.
- get_params(self[, deep]): Get parameters for this estimator.
- set_params(self, **params): Set the parameters of this estimator.
- transform(self[, time_series]): Generate a DataFrame containing the events associated with the holidays of the selected country.
__init__(self, region: str = 'america', country: str = 'Brazil', start_date: str = '01/01/2018', end_date: str = '01/01/2020', kernel: Union[List, numpy.ndarray] = None, reindex_method: str = 'pad')
Initialize self. See help(type(self)) for accurate signature.
fit(self, X: pandas.core.frame.DataFrame, y=None)
Fit the estimator. Present only for compatibility with the sklearn API.
- Parameters
- X : pd.DataFrame, shape (n_samples, n_features)
Input data.
- y : None
Not used. A transformer needs no target, but the pipeline API requires this parameter.
- Returns
- self : object
Returns self.
fit_transform(self, X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
- X : numpy array of shape [n_samples, n_features]
Training set.
- y : numpy array of shape [n_samples]
Target values.
- **fit_params : dict
Additional fit parameters.
- Returns
- X_new : numpy array of shape [n_samples, n_features_new]
Transformed array.
get_feature_names(self)
Return feature names for output features.
- Returns
- output_feature_names : ndarray, shape (n_output_features,)
Array of feature names.
get_params(self, deep=True)
Get parameters for this estimator.
- Parameters
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params : mapping of string to any
Parameter names mapped to their values.
set_params(self, **params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
- **params : dict
Estimator parameters.
- Returns
- self : object
Estimator instance.
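The <component>__<parameter> form can be illustrated with a scikit-learn pipeline. The sketch below deliberately uses StandardScaler rather than Calendar so that it runs without gtime installed; the step name "scale" is arbitrary and chosen only for this example.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A one-step pipeline; "scale" is the component name used as the prefix.
pipe = Pipeline([("scale", StandardScaler())])

# Update a parameter of the nested step through the pipeline itself.
pipe.set_params(scale__with_mean=False)

# get_params exposes nested parameters under the same naming scheme.
with_mean = pipe.get_params()["scale__with_mean"]
print(with_mean)
```

Assuming Calendar follows the same estimator conventions, a Calendar placed in a pipeline under a step name such as "cal" would be addressed the same way, for example with a key like cal__country.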
transform(self, time_series: Union[pandas.core.frame.DataFrame, NoneType] = None) → pandas.core.frame.DataFrame
Generate a DataFrame containing the events associated with the holidays of the selected country.
- Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), optional, default: None
If provided, start_date and end_date are overwritten with the first and last dates of the index of time_series, and the output DataFrame is re-indexed with that index using the chosen reindex_method.
- Returns
- events : pd.DataFrame, shape (length, 1)
A DataFrame containing the events.