Tabular Regression
Tabular regression is the task of predicting a numerical value given a set of attributes.
Car Name | Horsepower | Weight |
---|---|---|
ford torino | 140 | 3,449 |
amc hornet | 97 | 2,774 |
toyota corolla | 65 | 1,773 |
MPG (miles per gallon) |
---|
17 |
18 |
31 |
About Tabular Regression
About the Task
Tabular regression is the task of predicting a numerical value given a set of attributes/features. "Tabular" means that the data is stored in a table (like an Excel sheet), with each sample in its own row. The features used to predict the target can be numerical or categorical. However, categorical features often require additional preprocessing or feature engineering (a few models, such as CatBoost, accept categorical features directly). An example of tabular regression is predicting the weight of a fish given its species and length.
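As a rough illustration of the preprocessing step mentioned above, the sketch below one-hot encodes a categorical species column before fitting a linear model. The column names (species, length_cm, weight_g) and all values are invented for the example, and one-hot encoding is only one of several possible encodings.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: names and values are invented for illustration only.
fish = pd.DataFrame(
    {
        "species": ["bream", "bream", "perch", "perch", "pike", "pike"],
        "length_cm": [25.0, 30.0, 20.0, 26.0, 40.0, 52.0],
        "weight_g": [290.0, 450.0, 110.0, 250.0, 430.0, 950.0],
    }
)

X = fish[["species", "length_cm"]]
y = fish["weight_g"]

# One-hot encode the categorical column; pass the numerical column through unchanged.
preprocess = ColumnTransformer(
    transformers=[("species", OneHotEncoder(handle_unknown="ignore"), ["species"])],
    remainder="passthrough",
)

# Chain preprocessing and the regressor so both are applied at predict time.
model = make_pipeline(preprocess, LinearRegression())
model.fit(X, y)

new_fish = pd.DataFrame({"species": ["perch"], "length_cm": [24.0]})
predicted_weight = model.predict(new_fish)
```

Keeping the encoder inside the pipeline means new rows are encoded the same way as the training rows, which avoids a common source of mismatched columns.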
Use Cases
Sales Prediction: a Use Case for Predicting a Continuous Target Variable
Here the objective is to predict a continuous variable based on one or more input variables. For example, consider predicting the sales of an ice cream shop from the temperature that day and the number of hours the shop was open. We can build a regression model with temperature and duration of hours as input variables and sales as the target variable.
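A minimal sketch of this setup is shown below. The numbers are made up, and the sales figures are generated from an assumed noise-free linear rule purely so the fitted model is easy to check; real sales data would be noisier and a linear model may not be the best choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented toy data: one row per day, columns are [temperature in °C, hours open].
X = np.array(
    [[18.0, 6.0], [22.0, 8.0], [25.0, 8.0], [28.0, 10.0], [31.0, 10.0], [35.0, 12.0]]
)

# Assumed rule used only to generate illustrative targets:
# sales = 10 * temperature + 20 * hours
y = 10.0 * X[:, 0] + 20.0 * X[:, 1]

# Fit the regression model on the input variables and the target.
model = LinearRegression().fit(X, y)

# Predict sales for a 30 °C day with the shop open for 9 hours.
predicted_sales = model.predict(np.array([[30.0, 9.0]]))
```

Because the targets follow the assumed rule exactly, the learned coefficients should come out close to 10 and 20, which makes the example a convenient sanity check rather than a realistic benchmark.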
Missing Value Imputation for Other Tabular Tasks
In real-world applications, some input values may be missing, whether through human error or because no data was recorded. Considering the example above, say the shopkeeper's watch was broken and they forgot to record the hours for which the shop was open. This leads to a missing value in their dataset. In this case, the missing value could be replaced with zero or with the average number of hours for which the shop is kept open. Another approach is to use the temperature and sales variables to predict the missing hours variable.
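Both imputation strategies described above can be sketched as follows. The records are invented, the column names are assumptions, and the regression-based imputer is a plain linear model chosen for brevity rather than a recommendation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented toy records; NaN marks the day the hours were not recorded.
shop = pd.DataFrame(
    {
        "temperature": [18.0, 22.0, 25.0, 28.0, 31.0, 35.0],
        "hours": [6.0, 8.0, np.nan, 10.0, 10.0, 12.0],
        "sales": [300.0, 380.0, 410.0, 480.0, 510.0, 590.0],
    }
)

# Option 1: fill the gap with the average of the observed hours.
mean_filled = shop["hours"].fillna(shop["hours"].mean())

# Option 2: fit a regression on the complete rows, then predict the missing hours
# from temperature and sales.
known = shop[shop["hours"].notna()]
missing = shop[shop["hours"].isna()]
imputer = LinearRegression().fit(known[["temperature", "sales"]], known["hours"])

model_filled = shop["hours"].copy()
model_filled.loc[missing.index] = imputer.predict(missing[["temperature", "sales"]])
```

Mean imputation ignores the other columns, while the regression-based fill uses them; which one is appropriate depends on how strongly the missing column relates to the rest of the table.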
Model Training
A simple regression model can be created using sklearn as follows:
from sklearn.linear_model import LinearRegression
# data is assumed to be a pandas DataFrame that has already been loaded
# set the input features
X = data[["Feature 1", "Feature 2", "Feature 3"]]
# set the target variable
y = data["Target Variable"]
# initialize the model
model = LinearRegression()
# fit the model
model.fit(X, y)
Model Hosting and Inference
You can use skops for model hosting and inference on the Hugging Face Hub. This library is built to improve production workflows of various libraries that are used to train tabular models, including sklearn and xgboost. Using skops you can:
- Easily use Inference Endpoints,
- Build neat UIs with one line of code,
- Programmatically create model cards,
- Securely serialize your models. (See limitations of using pickle here.)
You can push your model as follows:
from skops import hub_utils
# initialize a repository with a trained model
local_repo = "/path_to_new_repo"
hub_utils.init(model, dst=local_repo)
# push to Hub!
hub_utils.push("username/my-awesome-model", source=local_repo)
Once the model is pushed, you can run inference on it easily.
import skops.hub_utils as hub_utils
import pandas as pd
data = pd.DataFrame(your_data)
# Get predictions from the model hosted on the Hub
res = hub_utils.get_model_output("username/my-awesome-model", data)
You can launch a UI for your model with only one line of code!
import gradio as gr
gr.Interface.load("huggingface/username/my-awesome-model").launch()
Useful Resources
Check out interactive sklearn examples built with ❤️ using Gradio.
For starting with tabular regression:
- Doing Exploratory Data Analysis for tabular data.
- The data considered here consists of details of Olympic athletes and medal results from Athens 1896 to Rio 2016.
- Here you can learn more about how to explore, analyse, and visualize the data in order to get a better understanding of the dataset.
- Building your first ML model.
Intermediate level tutorials on tabular regression:
- A Short Chronology of Deep Learning for Tabular Data by Sebastian Raschka.
Training your own model in just a few seconds
We have built a baseline trainer application to which you can drag and drop your dataset. It will train a baseline and push it to your Hugging Face Hub profile with a model card containing information about the model.
This page was made possible thanks to the efforts of Brenden Connors and Ayush Bihani.
Note Fish weight prediction based on length measurements and species.
Note A comprehensive curation of datasets covering all benchmarks.
Note An application that can predict weight of a fish based on set of attributes.
- mse: Mean Squared Error (MSE) is the average of the squared differences between the predicted and actual values.
- r-squared: The coefficient of determination (or R-squared) is a measure of how well the model fits the data. A higher R-squared is considered a better fit.
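To make the two metrics concrete, here is a small sketch that computes them from their usual definitions in plain Python. The actual values are the MPG figures from the table at the top of the page; the predictions are made up, and the helper names are hypothetical rather than taken from any library.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """One minus the ratio of the residual sum of squares to the total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Actual MPG values from the example table; predictions are invented for illustration.
actual_mpg = [17.0, 18.0, 31.0]
predicted_mpg = [18.0, 20.0, 29.0]

mse = mean_squared_error(actual_mpg, predicted_mpg)  # (1 + 4 + 4) / 3 = 3.0
r2 = r_squared(actual_mpg, predicted_mpg)            # 1 - 9 / 122
```

An R-squared of 1 would mean the predictions match the targets exactly, while a value near 0 means the model does no better than always predicting the mean of the targets.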