Train Custom Models on Hugging Face Spaces with AutoTrain SpaceRunner

Community Article Published May 9, 2024

Did you know you can train your custom models on Hugging Face Spaces? Yes, it's possible and super easy to do with AutoTrain SpaceRunner 💥 All you need is a Hugging Face account (which you probably already have) and a payment method attached to your account (only if you want to use GPUs; CPU training is free!). So stop spending time setting everything up on other cloud providers and use AutoTrain SpaceRunner to train your models: the training environment is already set up for you, and you can install or uninstall any requirements your project needs! Sounds exciting? Let's see how to do it!

The first step is to create a project folder. The folder can contain anything, but it must have a script.py file, which is the entry point:

-- my_project
---- some_module
---- some_other_module
---- script.py
---- requirements.txt

requirements.txt is optional; you only need it if you want to add or remove packages. For example, the following requirements.txt removes xgboost, which is preinstalled, and then installs catboost:

-xgboost
catboost

A - in front of a package name means uninstall.
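To make the convention concrete, here is a minimal sketch of how such a file could be split into packages to uninstall and packages to install. This is only an illustration of the format described above, not AutoTrain's actual implementation; the function name `split_requirements` is made up, and treating every leading `-` as "uninstall" is an assumption of this sketch.

```python
def split_requirements(text):
    """Split requirements.txt content into (to_uninstall, to_install) lists."""
    to_uninstall, to_install = [], []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("-"):
            # Assumption: a leading "-" marks a package to remove
            to_uninstall.append(line[1:].strip())
        else:
            to_install.append(line)
    return to_uninstall, to_install


if __name__ == "__main__":
    remove, add = split_requirements("-xgboost\ncatboost\n")
    print("uninstall:", remove)
    print("install:", add)
```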

What should script.py look like?

Well, however you want it to look. Here's a sample:

for _ in range(10):
    print("Hello World!")

You can do anything you want in script.py. Imports from local modules are also possible, as long as the modules are present in the project directory.

The final step is to run the code on Spaces. Here's how to do it.

Install AutoTrain if you haven't already: pip install -U autotrain-advanced. Then run autotrain spacerunner --help to see all the available arguments.

❯ autotrain spacerunner --help
usage: autotrain <command> [<args>] spacerunner [-h] --project-name PROJECT_NAME --script-path SCRIPT_PATH --username USERNAME --token TOKEN
                                                --backend {spaces-a10gl,spaces-a10gs,spaces-a100,spaces-t4m,spaces-t4s,spaces-cpu,spaces-cpuf}
                                                [--env ENV] [--args ARGS]

✨ Run AutoTrain SpaceRunner

options:
  -h, --help            show this help message and exit
  --project-name PROJECT_NAME
                        Name of the project. Must be unique.
  --script-path SCRIPT_PATH
                        Path to the script
  --username USERNAME   Hugging Face Username, can also be an organization name
  --token TOKEN         Hugging Face API Token
  --backend {spaces-a10gl,spaces-a10gs,spaces-a100,spaces-t4m,spaces-t4s,spaces-cpu,spaces-cpuf}
                        Hugging Face backend to use
  --env ENV             Environment variables, e.g. --env FOO=bar;FOO2=bar2;FOO3=bar3
  --args ARGS           Arguments to pass to the script, e.g. --args foo=bar;foo2=bar2;foo3=bar3;store_true_arg

--project-name is the unique name used to create the Space and the dataset (containing your project files) on the Hugging Face Hub. Everything is stored privately, and you can delete it after your script has run.

--script-path is the local path to the directory that contains script.py.

Need to pass environment variables? Use --env. Need to pass arguments to script.py? Use --args.
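The help text describes both --args and --env as semicolon-separated strings. Going by that description and the example command in this article, key=value entries appear to become `--key value` and bare names become flags. Here is a minimal sketch of that mapping. It is not AutoTrain's actual implementation, and the function names `args_to_cli` and `env_to_dict` are made up for illustration.

```python
def args_to_cli(args):
    """Turn 'padding=right;push_to_hub' into ['--padding', 'right', '--push_to_hub']."""
    cli = []
    for item in args.split(";"):
        item = item.strip()
        if not item:
            continue  # ignore empty entries such as a trailing ";"
        if "=" in item:
            key, value = item.split("=", 1)
            cli.extend([f"--{key}", value])
        else:
            # Assumption: a bare name is a store_true style flag
            cli.append(f"--{item}")
    return cli


def env_to_dict(env):
    """Turn 'FOO=bar;FOO2=bar2' into {'FOO': 'bar', 'FOO2': 'bar2'}."""
    result = {}
    for item in env.split(";"):
        item = item.strip()
        if not item:
            continue
        if "=" not in item:
            raise ValueError(f"expected KEY=value, got {item!r}")
        key, value = item.split("=", 1)
        result[key] = value
    return result
```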

You can choose any of the spaces-* backends to run your code. The Space will pause itself (thus saving you money) when done. 🚀

Here's an example command:

$ autotrain spacerunner \
    --project-name custom_llama_training \
    --script-path /path/to/script/py/ \
    --username abhishek \
    --token $HF_WRITE_TOKEN \
    --backend spaces-a10g-large \
    --args "padding=right;push_to_hub" \
    --env "TOKENIZERS_PARALLELISM=false;TRANSFORMERS_VERBOSITY=error"

This is equivalent to running the script locally like this:

$ TOKENIZERS_PARALLELISM=false TRANSFORMERS_VERBOSITY=error python script.py --padding right --push_to_hub
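For script.py to accept the arguments from the command above, it needs to parse them itself. Here is a minimal sketch using argparse from the standard library. The argument names match the example command; the defaults, the allowed values for --padding, and the placeholder where training would happen are assumptions for illustration.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the arguments used in the example command."""
    parser = argparse.ArgumentParser(description="Example training entry point")
    # Assumption: padding is either "left" or "right", defaulting to "right"
    parser.add_argument("--padding", default="right", choices=["left", "right"])
    # A flag with no value, matching the bare "push_to_hub" entry in --args
    parser.add_argument("--push_to_hub", action="store_true")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Environment variables set through --env can be read with os.environ
    verbosity = os.environ.get("TRANSFORMERS_VERBOSITY", "warning")
    print(f"padding={args.padding} push_to_hub={args.push_to_hub} verbosity={verbosity}")
    # ... your training code goes here ...
    return args


if __name__ == "__main__":
    main()
```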

Available Backends:

"spaces-a10g-large": "a10g-large",
"spaces-a10g-small": "a10g-small",
"spaces-a100-large": "a100-large",
"spaces-t4-medium": "t4-medium",
"spaces-t4-small": "t4-small",
"spaces-cpu-upgrade": "cpu-upgrade",
"spaces-cpu-basic": "cpu-basic",
"spaces-l4x1": "l4x1",
"spaces-l4x4": "l4x4",
"spaces-a10g-largex2": "a10g-largex2",
"spaces-a10g-largex4": "a10g-largex4",

After running the spacerunner command, you will be given a link to the Space where you can monitor your training. As simple as that!

Note: autotrain spacerunner will not save artifacts on its own, so your script.py must include code to save its artifacts/outputs. P.S. save them in a Hugging Face dataset repo ;)
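One way to do that at the end of script.py is sketched below, using the huggingface_hub library to upload an output folder to a private dataset repo. The environment variable names HF_TOKEN and OUTPUT_REPO_ID, the outputs folder, and the contents of metrics.json are all assumptions for illustration; you would pass the variables in yourself (for example through --env).

```python
import json
import os
from pathlib import Path


def write_artifacts(output_dir, metrics):
    """Write outputs to a local folder; real training would also save model files here."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "metrics.json").write_text(json.dumps(metrics, indent=2))
    return out


def push_artifacts(output_dir, repo_id, token):
    """Upload the folder to a private dataset repo on the Hugging Face Hub."""
    # Imported lazily so the rest of the script works without huggingface_hub
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    api.create_repo(repo_id=repo_id, repo_type="dataset", private=True, exist_ok=True)
    api.upload_folder(folder_path=str(output_dir), repo_id=repo_id, repo_type="dataset")


if __name__ == "__main__":
    out = write_artifacts("outputs", {"status": "done"})  # placeholder content
    token = os.environ.get("HF_TOKEN")  # assumption: set by you via --env
    repo_id = os.environ.get("OUTPUT_REPO_ID")  # hypothetical, e.g. "username/my-outputs"
    if token and repo_id:
        push_artifacts(out, repo_id, token)
```

The upload only runs when both variables are set, so the same script can still be tested locally without touching the Hub.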

Questions? Comments? Feature requests? Issues? Use the GitHub issues for AutoTrain Advanced: https://github.com/huggingface/autotrain-advanced ⭐️
