Train Custom Models on Hugging Face Spaces with AutoTrain SpaceRunner
Did you know you could train your custom models on Hugging Face Spaces?! Yes, it's possible, and super easy to do with AutoTrain SpaceRunner 💥 All you need is a Hugging Face account (which you probably have already) and, if you want to use GPUs, a payment method attached to your account (CPU training is free!). So stop spending time setting everything up on other cloud providers and use AutoTrain SpaceRunner to train your models: the training environment is already set up for you, and you can install or uninstall any requirements your project needs. Sounds exciting? Let's see how to do it!
The first step is to create a project folder. The project folder can contain anything, but it must have a script.py file. This script file is the entry-point:
-- my_project
---- some_module
---- some_other_module
---- script.py
---- requirements.txt
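Since the only hard rule is that script.py exists, you might want a quick local check before launching anything. The sketch below is a hypothetical helper (`check_project` is our name, not part of AutoTrain) that rebuilds the example layout in a temporary directory, assuming some_module and some_other_module are Python packages with an `__init__.py`, and reports what it finds.

```python
import tempfile
from pathlib import Path


def check_project(project_dir: str) -> dict:
    """Check the one hard rule for a SpaceRunner project folder: script.py must exist.

    requirements.txt is optional and everything else is up to you, so this
    hypothetical helper only reports on them.
    """
    root = Path(project_dir)
    if not (root / "script.py").is_file():
        raise FileNotFoundError(f"{root} has no script.py entry-point")
    files = sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
    return {"has_requirements": (root / "requirements.txt").is_file(), "files": files}


# Recreate the example layout in a throwaway directory (modules assumed to be packages).
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "my_project"
    for module in ("some_module", "some_other_module"):
        (project / module).mkdir(parents=True)
        (project / module / "__init__.py").write_text("")
    (project / "script.py").write_text('print("Hello World!")\n')
    (project / "requirements.txt").write_text("-xgboost\ncatboost\n")
    report = check_project(str(project))

print(report["files"])
```

A folder without script.py raises `FileNotFoundError` right away, which is cheaper than discovering the problem after a Space has started.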
requirements.txt is optional and only needed if you want to add or remove packages. For example, the following requirements.txt removes xgboost, which is preinstalled, and then installs catboost:
-xgboost
catboost
A - in front of a package name means uninstall.
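To make the "-" convention concrete, here is a minimal sketch of how such a file could be split into packages to uninstall and packages to install, and turned into the equivalent pip calls (uninstall first, then install, matching the order described above). This is an illustration only, not AutoTrain's own parser: `split_requirements` and `pip_commands` are hypothetical names, and the sketch treats every leading "-" as "uninstall", so pip options such as `-r other.txt` are not handled.

```python
from __future__ import annotations

import sys


def split_requirements(text: str) -> tuple[list[str], list[str]]:
    """Split requirements.txt content into (to_uninstall, to_install).

    A leading "-" marks a package to uninstall; blank lines and comments are skipped.
    """
    uninstall, install = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("-"):
            uninstall.append(line[1:].strip())
        else:
            install.append(line)
    return uninstall, install


def pip_commands(uninstall: list[str], install: list[str]) -> list[list[str]]:
    """Build (but do not run) pip invocations: uninstall first, then install."""
    commands = []
    if uninstall:
        commands.append([sys.executable, "-m", "pip", "uninstall", "-y", *uninstall])
    if install:
        commands.append([sys.executable, "-m", "pip", "install", *install])
    return commands


to_remove, to_add = split_requirements("-xgboost\ncatboost\n")
print(to_remove, to_add)  # ['xgboost'] ['catboost']
for cmd in pip_commands(to_remove, to_add):
    print(" ".join(["python", *cmd[1:]]))  # show the commands instead of running them
```

Passing each list to `subprocess.run(cmd, check=True)` would apply the changes to your local environment if you want to mirror the Space setup.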
What should script.py look like? Well, however you want! Here's a sample:
for _ in range(10):
    print("Hello World!")
You can do anything you want in script.py. Imports from local modules are also possible, as long as those modules are present in the project directory.
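Local imports work because running `python script.py` puts the script's own directory at the front of `sys.path`. The sketch below demonstrates this locally: it writes a hypothetical `some_module` package with a `greet` function next to a script.py that imports it, then runs the script in a subprocess from the project directory. It mirrors a local run; how the Space invokes the script beyond what this post shows is not covered here.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    # Hypothetical local package living next to script.py
    (project / "some_module").mkdir()
    (project / "some_module" / "__init__.py").write_text(
        "def greet(name):\n    return f'Hello {name}!'\n"
    )
    # script.py imports from the local package like any other module
    (project / "script.py").write_text(
        "from some_module import greet\n"
        "for _ in range(3):\n"
        "    print(greet('World'))\n"
    )
    # Running script.py puts its directory first on sys.path, so the import resolves
    result = subprocess.run(
        [sys.executable, "script.py"], cwd=project, capture_output=True, text=True, check=True
    )

print(result.stdout)
```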
The final step is to run the code on Spaces. Here's how to do it.
Install AutoTrain if you haven't already: pip install -U autotrain-advanced. Then run autotrain spacerunner --help to see all the available arguments:
❯ autotrain spacerunner --help
usage: autotrain <command> [<args>] spacerunner [-h] --project-name PROJECT_NAME --script-path SCRIPT_PATH --username USERNAME --token TOKEN
--backend {spaces-a10gl,spaces-a10gs,spaces-a100,spaces-t4m,spaces-t4s,spaces-cpu,spaces-cpuf}
[--env ENV] [--args ARGS]
✨ Run AutoTrain SpaceRunner
options:
-h, --help show this help message and exit
--project-name PROJECT_NAME
Name of the project. Must be unique.
--script-path SCRIPT_PATH
Path to the script
--username USERNAME Hugging Face Username, can also be an organization name
--token TOKEN Hugging Face API Token
--backend {spaces-a10gl,spaces-a10gs,spaces-a100,spaces-t4m,spaces-t4s,spaces-cpu,spaces-cpuf}
Hugging Face backend to use
--env ENV Environment variables, e.g. --env FOO=bar;FOO2=bar2;FOO3=bar3
--args ARGS Arguments to pass to the script, e.g. --args foo=bar;foo2=bar2;foo3=bar3;store_true_arg
--project-name is the unique name used to create the Space and the dataset (containing your project files) on the Hugging Face Hub. Everything is stored privately, and you can delete it after your script has run.
--script-path is the local path to the directory that contains script.py.
Need to pass environment variables? Use --env. Use --args if you need to pass arguments to script.py.
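Judging from the help text and the example further down (`--args padding=right;push_to_hub` ends up as `--padding right --push_to_hub`), both flags take a semicolon-separated list. The sketch below spells out that mapping with two hypothetical helpers; it is an illustration of the documented behaviour, not AutoTrain's internal code, and it does not handle values that themselves contain ";" or "=".

```python
from __future__ import annotations


def args_to_argv(args: str) -> list[str]:
    """Turn an --args string like "foo=bar;flag" into ["--foo", "bar", "--flag"]."""
    argv = []
    for item in filter(None, args.split(";")):
        key, sep, value = item.partition("=")
        argv.append(f"--{key}")
        if sep:  # "key=value" pairs carry a value; bare names act as store_true flags
            argv.append(value)
    return argv


def env_to_dict(env: str) -> dict[str, str]:
    """Turn an --env string like "FOO=bar;FOO2=bar2" into {"FOO": "bar", "FOO2": "bar2"}."""
    pairs = (item.partition("=") for item in filter(None, env.split(";")))
    return {key: value for key, _, value in pairs}


print(args_to_argv("padding=right;push_to_hub"))  # ['--padding', 'right', '--push_to_hub']
print(env_to_dict("TOKENIZERS_PARALLELISM=false;TRANSFORMERS_VERBOSITY=error"))
```

Because the separator is ";", remember to quote these values in your shell, otherwise the shell treats ";" as the end of the command.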
You can choose any of the spaces-* backends to run your code. The Space will pause itself when done, thus saving you money. 🚀
Here's an example command:
$ autotrain spacerunner \
--project-name custom_llama_training \
--script-path /path/to/script/py/ \
--username abhishek \
--token $HF_WRITE_TOKEN \
--backend spaces-a10g-large \
--args "padding=right;push_to_hub" \
--env "TOKENIZERS_PARALLELISM=false;TRANSFORMERS_VERBOSITY=error"
This is equivalent to running the script locally like this:
$ TOKENIZERS_PARALLELISM=false TRANSFORMERS_VERBOSITY=error python script.py --padding right --push_to_hub
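On the receiving side, script.py just parses ordinary command-line flags and reads ordinary environment variables. Here is a minimal sketch of a hypothetical script.py for the command above using argparse; the flag names come from the example, while the `choices`, the `left` default, and the printed message are our own assumptions.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Example SpaceRunner entry-point")
    # --args padding=right  ->  --padding right
    parser.add_argument("--padding", choices=["left", "right"], default="left")
    # --args push_to_hub    ->  --push_to_hub (a store_true flag)
    parser.add_argument("--push_to_hub", action="store_true")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Values passed with --env arrive as plain environment variables
    verbosity = os.environ.get("TRANSFORMERS_VERBOSITY")
    print(f"padding={args.padding} push_to_hub={args.push_to_hub} verbosity={verbosity}")
    return args


if __name__ == "__main__":
    main()
```

Keeping `parse_args` and `main` separate lets you test the script with an explicit argument list before sending it to a Space.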
Available backends (the value you pass to --backend, followed by the Spaces hardware it maps to):
"spaces-a10g-large": "a10g-large",
"spaces-a10g-small": "a10g-small",
"spaces-a100-large": "a100-large",
"spaces-t4-medium": "t4-medium",
"spaces-t4-small": "t4-small",
"spaces-cpu-upgrade": "cpu-upgrade",
"spaces-cpu-basic": "cpu-basic",
"spaces-l4x1": "l4x1",
"spaces-l4x4": "l4x4",
"spaces-a10g-largex2": "a10g-largex2",
"spaces-a10g-largex4": "a10g-largex4",
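The list above is essentially a lookup table. If you launch runs from your own tooling, a small check like the sketch below (a hypothetical helper; the table is copied from the list above and may change between AutoTrain versions) catches a mistyped backend name before anything is created on the Hub.

```python
# --backend value -> Spaces hardware flavor, copied from the list above
BACKENDS = {
    "spaces-a10g-large": "a10g-large",
    "spaces-a10g-small": "a10g-small",
    "spaces-a100-large": "a100-large",
    "spaces-t4-medium": "t4-medium",
    "spaces-t4-small": "t4-small",
    "spaces-cpu-upgrade": "cpu-upgrade",
    "spaces-cpu-basic": "cpu-basic",
    "spaces-l4x1": "l4x1",
    "spaces-l4x4": "l4x4",
    "spaces-a10g-largex2": "a10g-largex2",
    "spaces-a10g-largex4": "a10g-largex4",
}


def hardware_for(backend: str) -> str:
    """Return the hardware flavor for a backend, failing early on typos."""
    if backend not in BACKENDS:
        valid = ", ".join(sorted(BACKENDS))
        raise ValueError(f"unknown backend {backend!r}; choose one of: {valid}")
    return BACKENDS[backend]


print(hardware_for("spaces-a10g-large"))  # a10g-large
```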
After running the spacerunner command, you will get a link to the Space where you can monitor your training. As simple as that!
Note: autotrain spacerunner will not save artifacts on its own, so your script.py must include code that saves the artifacts/outputs. P.S. Save them in a Hugging Face datasets repo ;)
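One way to follow that advice is sketched below using the `huggingface_hub` library's `HfApi.create_repo` and `HfApi.upload_folder`. The environment variable names `HF_TOKEN` and `ARTIFACTS_REPO` are hypothetical (you would pass them yourself, for example via --env), the metrics are placeholders, and the upload only runs when both variables are set.

```python
import json
import os
from pathlib import Path


def save_artifacts(output_dir: str, metrics: dict) -> Path:
    """Write outputs to a local folder first (here just a metrics.json file)."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "metrics.json"
    path.write_text(json.dumps(metrics, indent=2))
    return path


def push_artifacts(output_dir: str, repo_id: str, token: str) -> None:
    """Upload the folder to a private Hugging Face dataset repo."""
    # Imported lazily so the local part runs even without huggingface_hub installed
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    api.create_repo(repo_id=repo_id, repo_type="dataset", private=True, exist_ok=True)
    api.upload_folder(folder_path=output_dir, repo_id=repo_id, repo_type="dataset")


if __name__ == "__main__":
    save_artifacts("outputs", {"epochs": 3})  # placeholder metrics
    # HF_TOKEN and ARTIFACTS_REPO are hypothetical names you would pass via --env
    token, repo_id = os.environ.get("HF_TOKEN"), os.environ.get("ARTIFACTS_REPO")
    if token and repo_id:
        push_artifacts("outputs", repo_id, token)
```

Writing everything to one local folder and uploading it once at the end keeps the Hub history to a single commit per run; models, checkpoints, and logs can go into the same folder.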
Questions? Comments? Feature requests? Issues? Use the GitHub issues for AutoTrain Advanced: https://github.com/huggingface/autotrain-advanced ⭐️