What libraries can I use for Tabular Regression?

The sklearn library is compatible with Tabular Regression.

What models can I use for Tabular Regression?

The scikit-learn/Fish-Weight model can be used for Tabular Regression.

What datasets can I use for Tabular Regression?

The inria-soda/tabular-benchmark dataset can be used for Tabular Regression.

What metrics can I use for Tabular Regression?

The mse (mean squared error) and r-squared metrics can be used for Tabular Regression.
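As a rough illustration of what these two metrics measure, the sketch below computes them from their standard definitions in plain Python. The target values are the MPG figures from the example table on this page; the predictions are made up for illustration. In practice, `sklearn.metrics` provides `mean_squared_error` and `r2_score` for the same purpose.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """One minus the ratio of the residual sum of squares to the total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# True MPG values from the example table; predictions are hypothetical.
y_true = [17.0, 18.0, 31.0]
y_pred = [16.0, 20.0, 30.0]

mse = mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(f"MSE: {mse:.3f}, R^2: {r2:.3f}")
```

A lower MSE means smaller errors on average, while an R-squared closer to 1 means the model explains more of the variance in the target.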

Car Name	Horsepower	Weight	MPG (miles per gallon)
ford torino	140	3,449	17
amc hornet	97	2,774	18
toyota corolla	65	1,773	31

About the Task

Tabular regression is the task of predicting a numerical value given a set of attributes/features. "Tabular" means that the data is stored in a table (like an Excel sheet), with each sample in its own row. The features used to predict the target can be numerical or categorical. However, categorical features often require additional preprocessing or feature engineering (a few models, like CatBoost, accept categorical features directly). An example of tabular regression is predicting the weight of a fish given its species and length.
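The fish example can be sketched as follows, assuming one-hot encoding as the preprocessing step for the categorical feature. The measurements, column names, and species values are made up for illustration, and one-hot encoding is only one of several possible encodings.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up fish measurements, for illustration only.
fish = pd.DataFrame(
    {
        "species": ["bream", "perch", "pike", "bream", "perch", "pike"],
        "length_cm": [30.0, 20.0, 40.0, 32.0, 22.0, 45.0],
        "weight_g": [450.0, 110.0, 500.0, 520.0, 140.0, 700.0],
    }
)

# One-hot encode the categorical column; pass the numerical column through unchanged.
preprocess = ColumnTransformer(
    transformers=[("species", OneHotEncoder(handle_unknown="ignore"), ["species"])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)

# Chain preprocessing and the regressor so both are applied at fit and predict time.
model = make_pipeline(preprocess, LinearRegression())
model.fit(fish[["species", "length_cm"]], fish["weight_g"])

# Predict the weight of a new, hypothetical fish.
new_fish = pd.DataFrame({"species": ["perch"], "length_cm": [21.0]})
predicted_weight = model.predict(new_fish)
print(predicted_weight)
```

Wrapping the encoder and the regressor in one pipeline keeps the preprocessing identical between training and inference.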

Use Cases

Sales Prediction: a Use Case for Predicting a Continuous Target Variable

Here the objective is to predict a continuous variable from one or more input variables. For example, an ice cream shop's sales can be predicted from the temperature and the number of hours the shop was open. A regression model for this uses temperature and hours open as input variables and sales as the target variable.
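A minimal sketch of the ice cream shop example, assuming a linear model is adequate. The records are hypothetical, and the sales figures were constructed as 10 * temperature + 20 * hours so that the fitted model is easy to check by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical daily records: [temperature in degrees Celsius, hours open].
X = np.array(
    [
        [20.0, 6.0],
        [25.0, 8.0],
        [30.0, 8.0],
        [35.0, 10.0],
        [28.0, 7.0],
    ]
)
# Hypothetical sales, constructed as 10 * temperature + 20 * hours.
y = np.array([320.0, 410.0, 460.0, 550.0, 420.0])

model = LinearRegression().fit(X, y)

# Predict sales for a 32-degree day with the shop open for 9 hours.
predicted_sales = model.predict(np.array([[32.0, 9.0]]))[0]
print(f"coefficients: {model.coef_}, predicted sales: {predicted_sales:.1f}")
```

Real sales data would be noisy, so the fit would not be exact as it is here.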

Missing Value Imputation for Other Tabular Tasks

In real-world applications, some input values can be missing or never recorded, due to human error or other reasons. Considering the example above, say the shopkeeper's watch was broken and they forgot to record the hours for which the shop was open. This leads to a missing value in their dataset. In this case, the missing value could be replaced with zero or with the average number of hours the shop is kept open. Another approach is to use the temperature and sales variables to predict the missing hours value.
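Both approaches can be sketched as below, using a small hypothetical dataset in which one `hours_open` value is missing. The column names and numbers are assumptions for illustration; the regression-based imputer is a plain `LinearRegression`, which is only one possible choice.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical shop records; the hours for the third day were never recorded.
shop = pd.DataFrame(
    {
        "temperature": [20.0, 25.0, 30.0, 35.0, 28.0],
        "hours_open": [6.0, 8.0, np.nan, 10.0, 7.0],
        "sales": [320.0, 410.0, 460.0, 550.0, 420.0],
    }
)

# Option 1: replace the missing value with the mean of the observed hours.
mean_filled = shop["hours_open"].fillna(shop["hours_open"].mean())

# Option 2: fit a regression on the complete rows, then predict the missing hours
# from temperature and sales.
known = shop[shop["hours_open"].notna()]
missing = shop[shop["hours_open"].isna()]
imputer = LinearRegression().fit(known[["temperature", "sales"]], known["hours_open"])

model_filled = shop["hours_open"].copy()
model_filled.loc[missing.index] = imputer.predict(missing[["temperature", "sales"]])

print(mean_filled.tolist())
print(model_filled.tolist())
```

Mean imputation ignores the other columns, while the regression-based fill uses them, which can give a more plausible value when the features are related.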

Model Training

A simple regression model can be created using sklearn as follows:

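from sklearn.linear_model import LinearRegression

# `data` is assumed to be a pandas DataFrame that has already been loaded
# Set the input features
X = data[["Feature 1", "Feature 2", "Feature 3"]]
# Set the target variable
y = data["Target Variable"]
# Initialize the model
model = LinearRegression()
# Fit the model
model.fit(X, y)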
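A fitted model is usually judged on data it has not seen. The sketch below holds out a test split and reports the mse and r-squared metrics on it; the synthetic data, the 25% split size, and the random seeds are assumptions chosen only to make the example self-contained.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: 3 features, a linear target, and small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Hold out 25% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit on the training rows only, then predict the held-out rows.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"test MSE: {mse:.4f}, test R^2: {r2:.4f}")
```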

Model Hosting and Inference

You can use skops for model hosting and inference on the Hugging Face Hub. This library is built to improve production workflows of various libraries that are used to train tabular models, including sklearn and xgboost. Using skops you can:

Easily use Inference Endpoints,
Build neat UIs with one line of code,
Programmatically create model cards,
Securely serialize your models. (See limitations of using pickle here.)

You can push your model as follows:

from skops import hub_utils
# initialize a repository with a trained model
local_repo = "/path_to_new_repo"
hub_utils.init(model, dst=local_repo)
# push to Hub!
hub_utils.push("username/my-awesome-model", source=local_repo)

Once the model is pushed, you can run inference on it:

import skops.hub_utils as hub_utils
import pandas as pd
# `your_data` is a placeholder for the rows you want predictions for
data = pd.DataFrame(your_data)
# Get predictions from the model hosted on the Hub
res = hub_utils.get_model_output("username/my-awesome-model", data)

You can launch a UI for your model with only one line of code!

import gradio as gr
gr.Interface.load("huggingface/username/my-awesome-model").launch()

Useful Resources

Skops documentation
Check out interactive sklearn examples built with ❤️ using Gradio.
Notebook: Persisting your scikit-learn model using skops
For starting with tabular regression:
- Doing Exploratory Data Analysis for tabular data.
  - The data considered here consists of details of Olympic athletes and medal results from Athens 1896 to Rio 2016.
  - Here you can learn more about how to explore, analyse, and visualize the data in order to get a better understanding of the dataset.
- Building your first ML model.
Intermediate level tutorials on tabular regression:
- A Short Chronology of Deep Learning for Tabular Data by Sebastian Raschka.

Training your own model in just a few seconds

We have built a baseline trainer application to which you can drag and drop your dataset. It will train a baseline and push it to your Hugging Face Hub profile with a model card containing information about the model.

This page was made possible thanks to the efforts of Brenden Connors and Ayush Bihani.
