Sunday, February 20, 2022

Encoding Time for Machine Learning

Time encoding from NVIDIA. Time is often one of the most valuable inputs to a model, even if the only time information you have is what was recorded when the data was gathered.

Three Approaches to Encoding Time Information as Features for ML Models

By Eryk Lewinson, from NVIDIA

Imagine you have just started a new data science project. The goal is to build a model predicting Y, the target variable. You have already received some data from the stakeholders/data engineers, done a thorough EDA, and selected some variables you believe are relevant for the problem at hand. Then you finally built your first model. The score is acceptable, but you believe you can do much better. What do you do?

There are many ways in which you could follow up. One possibility would be to increase the complexity of the machine-learning model you have used. Alternatively, you can try to come up with some more meaningful features and continue to use the current model (at least for the time being).

For many projects, both enterprise data scientists and participants of data science competitions like Kaggle agree that it is the latter – identifying more meaningful features from the data – that can often make the most improvement to model accuracy for the least amount of effort.

You are effectively shifting the complexity from the model to the features. The features do not have to be very complex. But, ideally, we find features that have a strong yet simple relationship with the target variable.

Many data science projects contain some information about the passage of time. And this is not restricted to time series forecasting problems. For example, you can often find such features in traditional regression or classification tasks. This article investigates how to create meaningful features using date-related information. We present three approaches, but we need some preparation first.
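To make the idea of date-derived features concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not the article's own code: the function name `encode_date` and the choice of features (one-hot month flags plus a sine/cosine pair for the month) are assumptions made for this example, and they may or may not match the three approaches the article goes on to present.

```python
import math
from datetime import date


def encode_date(d: date) -> dict:
    """Turn a calendar date into a few simple numeric features (illustrative only)."""
    # One-hot ("dummy") flags: exactly one of month_1..month_12 is set to 1.
    features = {f"month_{m}": int(d.month == m) for m in range(1, 13)}

    # Cyclical encoding: map the month onto a circle so that December
    # ends up next to January instead of 11 steps away from it.
    angle = 2 * math.pi * (d.month - 1) / 12
    features["month_sin"] = math.sin(angle)
    features["month_cos"] = math.cos(angle)

    # Plain ordinal feature: day of the year (1..365, or 366 in leap years).
    features["day_of_year"] = d.timetuple().tm_yday
    return features


example = encode_date(date(2022, 2, 20))
```

The resulting dictionary can be fed to any tabular model as extra columns. The sine/cosine pair is one common way to tell a model that the calendar wraps around; the dummy flags make no such assumption and let the model learn a separate effect per month.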

Setup and data

For this article, we mostly use well-known Python packages, plus a relatively unknown one: scikit-lego, a library with numerous useful functionalities that extend scikit-learn’s capabilities. We import the required libraries as follows: (useful intro above, more at the link) ...
