Hugging Face Hub API
Below is the documentation for the HfApi class, which serves as a Python wrapper for the Hugging Face Hub's API.
All methods of HfApi are also accessible directly from the package's root; both approaches are detailed below.
The following approach uses the method from the root of the package:
from huggingface_hub import list_models
models = list_models()
The following approach uses the HfApi
class:
from huggingface_hub import HfApi
hf_api = HfApi()
models = hf_api.list_models()
Using the HfApi class directly lets you configure a different endpoint than the Hugging Face Hub's.
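For example, a self-hosted mirror can be targeted by passing an endpoint to the constructor (a minimal sketch; the URL below is a placeholder):
>>> from huggingface_hub import HfApi
>>> # Point the client at a hypothetical mirror instead of huggingface.co
>>> hf_api = HfApi(endpoint="https://hub-mirror.example.com")
>>> models = hf_api.list_models()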
create_repo
< source >(
repo_id: str = None
token: typing.Optional[str] = None
organization: typing.Optional[str] = None
private: typing.Optional[bool] = None
repo_type: typing.Optional[str] = None
exist_ok: typing.Optional[bool] = False
space_sdk: typing.Optional[str] = None
name: typing.Optional[str] = None
)
→
str
Parameters
- repo_id (str) — A namespace (user or an organization) and a repo name separated by a /. Version added: 0.5
- token (str, optional) — An authentication token.
- private (bool, optional) — Whether the model repo should be private.
- repo_type (str, optional) — Set to "dataset" or "space" if uploading to a dataset or space, None or "model" if uploading to a model. Default is None.
- exist_ok (bool, optional, defaults to False) — If True, do not raise an error if the repo already exists.
- space_sdk (str, optional) — Choice of SDK to use if repo_type is "space". Can be "streamlit", "gradio", or "static".
Returns
str
URL to the newly created repo.
Create an empty repo on the HuggingFace Hub.
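Example usage (a sketch; the repo id is a placeholder and a valid stored token with write access is assumed):
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> # Create a private model repo; the repo URL is returned
>>> api.create_repo(repo_id="username/test-model", private=True)
"https://huggingface.co/username/test-model"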
dataset_info
< source >(
repo_id: str
revision: typing.Optional[str] = None
token: typing.Optional[str] = None
timeout: typing.Optional[float] = None
)
→
DatasetInfo
Parameters
- repo_id (str) — A namespace (user or an organization) and a repo name separated by a /.
- revision (str, optional) — The revision of the dataset repository from which to get the information.
- token (str, optional) — An authentication token.
- timeout (float, optional) — The number of seconds to wait for the request to the Hub before timing out.
Returns
DatasetInfo
The dataset repository information.
Get info on one specific dataset on huggingface.co. The dataset can be private if you pass an acceptable token.
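Example usage (a sketch; "squad" is used as an illustrative public dataset id):
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> # Fetch metadata for a public dataset
>>> dataset = api.dataset_info("squad")
>>> # Pin the info to a specific revision
>>> dataset = api.dataset_info("squad", revision="main")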
delete_file
< source >( path_in_repo: str repo_id: str token: typing.Optional[str] = None repo_type: typing.Optional[str] = None revision: typing.Optional[str] = None )
Parameters
- path_in_repo (str) — Relative filepath in the repo, for example: "checkpoints/1fec34a/weights.bin".
- repo_id (str) — The repository from which the file will be deleted, for example: "username/custom_transformers".
- token (str, optional) — Authentication token, obtained with the HfApi.login method. Will default to the stored token.
- repo_type (str, optional) — Set to "dataset" or "space" if the file is in a dataset or space, None or "model" if in a model. Default is None.
- revision (str, optional) — The git revision to commit from. Defaults to the head of the "main" branch.
Deletes a file in the given repo.
Raises the following errors:
- HTTPError if the HuggingFace API returned an error
- ValueError if some parameter value is invalid
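Example usage (a sketch; the repo id and file path are placeholders, and a token with write access is assumed):
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> # Remove a stale weights file from a model repo
>>> api.delete_file(
...     path_in_repo="checkpoints/1fec34a/weights.bin",
...     repo_id="username/custom_transformers",
... )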
delete_repo
< source >( repo_id: str = None token: typing.Optional[str] = None organization: typing.Optional[str] = None repo_type: typing.Optional[str] = None name: str = None )
Parameters
- repo_id (str) — A namespace (user or an organization) and a repo name separated by a /.
- token (str, optional) — An authentication token.
- repo_type (str, optional) — Set to "dataset" or "space" if deleting a dataset or space, None or "model" if deleting a model. Default is None.
Delete a repo from the HuggingFace Hub. CAUTION: this is irreversible.
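Example usage (a sketch; the repo id is a placeholder, write access is assumed, and remember the deletion cannot be undone):
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> # Permanently delete a dataset repo
>>> api.delete_repo(repo_id="username/scratch-dataset", repo_type="dataset")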
get_dataset_tags
< source >( )
Gets all valid dataset tags as a nested namespace object.
get_full_repo_name
< source >(
model_id: str
organization: typing.Optional[str] = None
token: typing.Optional[str] = None
)
→
str
Parameters
- model_id (str) — The name of the model.
- organization (str, optional) — If passed, the repository name will be in the organization namespace instead of the user namespace.
- token (str, optional) — The Hugging Face authentication token.
Returns
str
The repository name in the user’s namespace ({username}/{model_id}) if no organization is passed, and under the organization namespace ({organization}/{model_id}) otherwise.
Returns the repository name for a given model ID and optional organization.
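Example usage (a sketch; the names are placeholders, and the username is resolved from the stored token):
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> api.get_full_repo_name("my-model")
"username/my-model"
>>> api.get_full_repo_name("my-model", organization="myorg")
"myorg/my-model"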
get_model_tags
< source >( )
Gets all valid model tags as a nested namespace object.
list_datasets
< source >( filter: typing.Union[huggingface_hub.utils.endpoint_helpers.DatasetFilter, str, typing.Iterable[str], NoneType] = None author: typing.Optional[str] = None search: typing.Optional[str] = None sort: typing.Union[typing.Literal['lastModified'], str, NoneType] = None direction: typing.Optional[typing.Literal[-1]] = None limit: typing.Optional[int] = None cardData: typing.Optional[bool] = None full: typing.Optional[bool] = None use_auth_token: typing.Optional[str] = None )
Parameters
- filter (DatasetFilter or str or Iterable, optional) — A string or DatasetFilter which can be used to identify datasets on the Hub.
- author (str, optional) — A string which identifies the author of the returned datasets.
- search (str, optional) — A string that will be contained in the returned datasets.
- sort (Literal["lastModified"] or str, optional) — The key with which to sort the resulting datasets. Possible values are the properties of the DatasetInfo class.
- direction (Literal[-1] or int, optional) — Direction in which to sort. The value -1 sorts by descending order while all other values sort by ascending order.
- limit (int, optional) — The limit on the number of datasets fetched. Leaving this option to None fetches all datasets.
- cardData (bool, optional) — Whether to grab the metadata for the dataset as well. Can contain useful information such as the PapersWithCode ID.
- full (bool, optional) — Whether to fetch all dataset data, including the lastModified and the cardData.
- use_auth_token (bool or str, optional) — Whether to use the auth_token provided from the huggingface_hub cli. If not logged in, a valid auth_token can be passed in as a string.
Get the public list of all the datasets on huggingface.co.
Example usage with the filter argument:
>>> from huggingface_hub import DatasetFilter, DatasetSearchArguments, HfApi
>>> api = HfApi()
>>> # List all datasets
>>> api.list_datasets()
>>> # Get all valid search arguments
>>> args = DatasetSearchArguments()
>>> # List only the text classification datasets
>>> api.list_datasets(filter="task_categories:text-classification")
>>> # Using the `DatasetFilter`
>>> filt = DatasetFilter(task_categories="text-classification")
>>> # With `DatasetSearchArguments`
>>> filt = DatasetFilter(task_categories=args.task_categories.text_classification)
>>> api.list_datasets(filter=filt)
>>> # List only the datasets in russian for language modeling
>>> api.list_datasets(
... filter=("languages:ru", "task_ids:language-modeling")
... )
>>> # Using the `DatasetFilter`
>>> filt = DatasetFilter(languages="ru", task_ids="language-modeling")
>>> # With `DatasetSearchArguments`
>>> filt = DatasetFilter(
... languages=args.languages.ru,
... task_ids=args.task_ids.language_modeling,
... )
>>> api.list_datasets(filter=filt)
Example usage with the search argument:
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> # List all datasets with "text" in their name
>>> api.list_datasets(search="text")
>>> # List all datasets with "text" in their name made by google
>>> api.list_datasets(search="text", author="google")
list_metrics
< source >(
)
→
List[MetricInfo]
Returns
List[MetricInfo]
A list of MetricInfo objects.
Get the public list of all the metrics on huggingface.co.
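Example usage (a minimal sketch; each returned MetricInfo exposes an id attribute):
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> metrics = api.list_metrics()
>>> # Inspect a few metric ids
>>> sorted(m.id for m in metrics)[:3]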
list_models
< source >( filter: typing.Union[huggingface_hub.utils.endpoint_helpers.ModelFilter, str, typing.Iterable[str], NoneType] = None author: typing.Optional[str] = None search: typing.Optional[str] = None emissions_thresholds: typing.Union[typing.Tuple[float, float], NoneType] = None sort: typing.Union[typing.Literal['lastModified'], str, NoneType] = None direction: typing.Optional[typing.Literal[-1]] = None limit: typing.Optional[int] = None full: typing.Optional[bool] = None cardData: typing.Optional[bool] = None fetch_config: typing.Optional[bool] = None use_auth_token: typing.Union[bool, str, NoneType] = None )
Parameters
- filter (ModelFilter or str or Iterable, optional) — A string or ModelFilter which can be used to identify models on the Hub.
- author (str, optional) — A string which identifies the author (user or organization) of the returned models.
- search (str, optional) — A string that will be contained in the returned models.
- emissions_thresholds (Tuple, optional) — A tuple of two ints or floats representing a minimum and maximum carbon footprint, in grams, to filter the resulting models with.
- sort (Literal["lastModified"] or str, optional) — The key with which to sort the resulting models. Possible values are the properties of the ModelInfo class.
- direction (Literal[-1] or int, optional) — Direction in which to sort. The value -1 sorts by descending order while all other values sort by ascending order.
- limit (int, optional) — The limit on the number of models fetched. Leaving this option to None fetches all models.
- full (bool, optional) — Whether to fetch all model data, including the lastModified, the sha, the files and the tags. This is set to True by default when using a filter.
- cardData (bool, optional) — Whether to grab the metadata for the model as well. Can contain useful information such as carbon emissions, metrics, and datasets trained on.
- fetch_config (bool, optional) — Whether to fetch the model configs as well. This is not included in full due to its size.
- use_auth_token (bool or str, optional) — Whether to use the auth_token provided from the huggingface_hub cli. If not logged in, a valid auth_token can be passed in as a string.
Get the public list of all the models on huggingface.co.
Example usage with the filter argument:
>>> from huggingface_hub import HfApi, ModelFilter, ModelSearchArguments
>>> api = HfApi()
>>> # List all models
>>> api.list_models()
>>> # Get all valid search arguments
>>> args = ModelSearchArguments()
>>> # List only the text classification models
>>> api.list_models(filter="text-classification")
>>> # Using the `ModelFilter`
>>> filt = ModelFilter(task="text-classification")
>>> # With `ModelSearchArguments`
>>> filt = ModelFilter(task=args.pipeline_tags.TextClassification)
>>> api.list_models(filter=filt)
>>> # Using `ModelFilter` and `ModelSearchArguments` to find text classification in both PyTorch and TensorFlow
>>> filt = ModelFilter(
... task=args.pipeline_tags.TextClassification,
... library=[args.library.PyTorch, args.library.TensorFlow],
... )
>>> api.list_models(filter=filt)
>>> # List only models from the AllenNLP library
>>> api.list_models(filter="allennlp")
>>> # Using `ModelFilter` and `ModelSearchArguments`
>>> filt = ModelFilter(library=args.library.allennlp)
>>> api.list_models(filter=filt)
Example usage with the search argument:
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> # List all models with "bert" in their name
>>> api.list_models(search="bert")
>>> # List all models with "bert" in their name made by google
>>> api.list_models(search="bert", author="google")
list_repo_files
< source >(
repo_id: str
revision: typing.Optional[str] = None
repo_type: typing.Optional[str] = None
token: typing.Optional[str] = None
timeout: typing.Optional[float] = None
)
→
List[str]
Parameters
- repo_id (str) — A namespace (user or an organization) and a repo name separated by a /.
- revision (str, optional) — The revision of the model repository from which to get the information.
- repo_type (str, optional) — Set to "dataset" or "space" if the files are in a dataset or space, None or "model" if in a model. Default is None.
- token (str, optional) — An authentication token.
- timeout (float, optional) — The number of seconds to wait for the request to the Hub before timing out.
Returns
List[str]
The list of files in the given repository.
Get the list of files in a given repo.
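Example usage (a sketch; "bert-base-uncased" is used as an illustrative public repo id):
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> files = api.list_repo_files("bert-base-uncased")
>>> "config.json" in files
True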
login
< source >(
username: str
password: str
)
→
str
Call HF API to sign in a user and get a token if credentials are valid.
Warning: Deprecated, will be removed in v0.7. Please use HfApi.set_access_token() instead.
Raises the following errors:
- HTTPError if credentials are invalid
logout
< source >( token: typing.Optional[str] = None )
Call HF API to log out.
Warning: Deprecated, will be removed in v0.7. Please use HfApi.unset_access_token() instead.
model_info
< source >(
repo_id: str
revision: typing.Optional[str] = None
token: typing.Optional[str] = None
timeout: typing.Optional[float] = None
securityStatus: typing.Optional[bool] = None
)
→
ModelInfo
Parameters
- repo_id (str) — A namespace (user or an organization) and a repo name separated by a /.
- revision (str, optional) — The revision of the model repository from which to get the information.
- token (str, optional) — An authentication token.
- timeout (float, optional) — The number of seconds to wait for the request to the Hub before timing out.
- securityStatus (bool, optional) — Whether to retrieve the security status from the model repository as well.
Returns
ModelInfo
The model repository information.
Get info on one specific model on huggingface.co. The model can be private if you pass an acceptable token or are logged in.
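Example usage (a sketch; "bert-base-uncased" is an illustrative public model id):
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> model = api.model_info("bert-base-uncased")
>>> # Pin the metadata to an exact commit using the returned sha
>>> model = api.model_info("bert-base-uncased", revision=model.sha)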
move_repo
< source >( from_id: str to_id: str repo_type: typing.Optional[str] = None token: typing.Optional[str] = None )
Parameters
- from_id (str) — A namespace (user or an organization) and a repo name separated by a /. Original repository identifier.
- to_id (str) — A namespace (user or an organization) and a repo name separated by a /. Final repository identifier.
- repo_type (str, optional) — Set to "dataset" or "space" if the repo is a dataset or space, None or "model" if it is a model. Default is None.
- token (str, optional) — An authentication token.
Moves a repository from namespace1/repo_name1 to namespace2/repo_name2.
Note there are certain limitations. For more information about moving repositories, please see https://hf.co/docs/hub/main#how-can-i-rename-or-transfer-a-repo.
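Example usage (a sketch; both ids are placeholders and write access to the source repository is assumed):
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> # Transfer a model repo from a user namespace to an organization
>>> api.move_repo(from_id="username/my-model", to_id="myorg/my-model")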
set_access_token
< source >( access_token: str )
Saves the passed access token so git can correctly authenticate the user.
unset_access_token
< source >( )
Resets the user's access token.
update_repo_visibility
< source >( repo_id: str = None private: bool = False token: typing.Optional[str] = None organization: typing.Optional[str] = None repo_type: typing.Optional[str] = None name: str = None )
Parameters
- repo_id (str, optional) — A namespace (user or an organization) and a repo name separated by a /. Version added: 0.5
- private (bool, optional, defaults to False) — Whether the model repo should be private.
- token (str, optional) — An authentication token.
- repo_type (str, optional) — Set to "dataset" or "space" if updating a dataset or space, None or "model" if updating a model. Default is None.
Update the visibility setting of a repository.
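Example usage (a sketch; the repo id is a placeholder and write access is assumed):
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> # Make a model repo private
>>> api.update_repo_visibility(repo_id="username/my-model", private=True)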
upload_file
< source >(
path_or_fileobj: typing.Union[str, bytes, typing.IO]
path_in_repo: str
repo_id: str
token: typing.Optional[str] = None
repo_type: typing.Optional[str] = None
revision: typing.Optional[str] = None
identical_ok: bool = True
)
→
str
Parameters
- path_or_fileobj (str, bytes, or IO) — Path to a file on the local machine or binary data stream / fileobj / buffer.
- path_in_repo (str) — Relative filepath in the repo, for example: "checkpoints/1fec34a/weights.bin".
- repo_id (str) — The repository to which the file will be uploaded, for example: "username/custom_transformers".
- token (str, optional) — Authentication token, obtained with the HfApi.login method. Will default to the stored token.
- repo_type (str, optional) — Set to "dataset" or "space" if uploading to a dataset or space, None or "model" if uploading to a model. Default is None.
- revision (str, optional) — The git revision to commit from. Defaults to the head of the "main" branch.
- identical_ok (bool, optional, defaults to True) — When set to False, an HTTPError is raised if the file you are trying to upload already exists on the Hub and its content did not change.
Returns
str
The URL to visualize the uploaded file on the Hub.
Upload a local file (up to 5GB) to the given repo. The upload is done through an HTTP POST request and doesn't require git or git-lfs to be installed.
Raises the following errors:
- HTTPError if the HuggingFace API returned an error
- ValueError if some parameter value is invalid
Example usage:
>>> with open("./local/filepath", "rb") as fobj:
... upload_file(
... path_or_fileobj=fobj,
... path_in_repo="remote/file/path.h5",
... repo_id="username/my-dataset",
... repo_type="datasets",
... token="my_token",
... )
"https://huggingface.co/datasets/username/my-dataset/blob/main/remote/file/path.h5"
>>> upload_file(
... path_or_fileobj=".\\local\\file\\path",
... path_in_repo="remote/file/path.h5",
... repo_id="username/my-model",
... token="my_token",
... )
"https://huggingface.co/username/my-model/blob/main/remote/file/path.h5"
whoami
< source >( token: typing.Optional[str] = None )
Call HF API to know “whoami”.
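Example usage (a sketch, assuming a valid token is stored locally; the returned mapping includes the account name):
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> user = api.whoami()
>>> user["name"]
"username"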
Hugging Face local storage
huggingface_hub stores the authentication information locally so that it may be re-used in subsequent methods.
It does this using the HfFolder utility, which saves data at the root of the user's home folder.
delete_token
Deletes the token from storage. Does not fail if the token does not exist.
get_token
Retrieves the token.
save_token
Saves the token, creating the folder as needed.
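A minimal sketch of how these helpers fit together (the token value is a placeholder):
>>> from huggingface_hub import HfFolder
>>> # Persist a token, read it back, then remove it
>>> HfFolder.save_token("hf_xxx")
>>> HfFolder.get_token()
"hf_xxx"
>>> HfFolder.delete_token()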
Filtering helpers
Some helpers to filter repositories on the Hub are available in the huggingface_hub
package.
class huggingface_hub.DatasetFilter
< source >( author: str = None benchmark: typing.Union[str, typing.List[str]] = None dataset_name: str = None language_creators: typing.Union[str, typing.List[str]] = None languages: typing.Union[str, typing.List[str]] = None multilinguality: typing.Union[str, typing.List[str]] = None size_categories: typing.Union[str, typing.List[str]] = None task_categories: typing.Union[str, typing.List[str]] = None task_ids: typing.Union[str, typing.List[str]] = None )
Parameters
- author (str, optional) — A string that can be used to identify datasets on the Hub by the original uploader (author or organization), such as facebook or huggingface.
- benchmark (str or List, optional) — A string or list of strings that can be used to identify datasets on the Hub by their official benchmark.
- dataset_name (str, optional) — A string that can be used to identify datasets on the Hub by their name, such as SQAC or wikineural.
- language_creators (str or List, optional) — A string or list of strings that can be used to identify datasets on the Hub by how the data was curated, such as crowdsourced or machine_generated.
- languages (str or List, optional) — A string or list of strings representing a two-character language to filter datasets by on the Hub.
- multilinguality (str or List, optional) — A string or list of strings representing a filter for datasets that contain multiple languages.
- size_categories (str or List, optional) — A string or list of strings that can be used to identify datasets on the Hub by the size of the dataset, such as 100K<n<1M or 1M<n<10M.
- task_categories (str or List, optional) — A string or list of strings that can be used to identify datasets on the Hub by the designed task, such as audio_classification or named_entity_recognition.
- task_ids (str or List, optional) — A string or list of strings that can be used to identify datasets on the Hub by the specific task, such as speech_emotion_recognition or paraphrase.
A class that converts human-readable dataset search parameters into ones compatible with the REST API. For all parameters capitalization does not matter.
Examples:
>>> from huggingface_hub import DatasetFilter
>>> # Using author
>>> new_filter = DatasetFilter(author="facebook")
>>> # Using benchmark
>>> new_filter = DatasetFilter(benchmark="raft")
>>> # Using dataset_name
>>> new_filter = DatasetFilter(dataset_name="wikineural")
>>> # Using language_creators
>>> new_filter = DatasetFilter(language_creators="crowdsourced")
>>> # Using languages
>>> new_filter = DatasetFilter(languages="en")
>>> # Using multilinguality
>>> new_filter = DatasetFilter(multilinguality="multilingual")
>>> # Using size_categories
>>> new_filter = DatasetFilter(size_categories="100K<n<1M")
>>> # Using task_categories
>>> new_filter = DatasetFilter(task_categories="audio_classification")
>>> # Using task_ids
>>> new_filter = DatasetFilter(task_ids="paraphrase")
class huggingface_hub.ModelFilter
< source >( author: str = None library: typing.Union[str, typing.List[str]] = None language: typing.Union[str, typing.List[str]] = None model_name: str = None task: typing.Union[str, typing.List[str]] = None trained_dataset: typing.Union[str, typing.List[str]] = None tags: typing.Union[str, typing.List[str]] = None )
Parameters
- author (str, optional) — A string that can be used to identify models on the Hub by the original uploader (author or organization), such as facebook or huggingface.
- library (str or List, optional) — A string or list of strings of foundational libraries models were originally trained from, such as pytorch, tensorflow, or allennlp.
- language (str or List, optional) — A string or list of strings of languages, both by name and country code, such as "en" or "English".
- model_name (str, optional) — A string that contains complete or partial names for models on the Hub, such as "bert" or "bert-base-cased".
- task (str or List, optional) — A string or list of strings of tasks models were designed for, such as "fill-mask" or "automatic-speech-recognition".
- tags (str or List, optional) — A string tag or a list of tags to filter models on the Hub by, such as text-generation or spacy.
- trained_dataset (str or List, optional) — A string tag or a list of string tags of the dataset a model was trained on.
A class that converts human-readable model search parameters into ones compatible with the REST API. For all parameters capitalization does not matter.
Examples:
>>> from huggingface_hub import ModelFilter
>>> # For the author
>>> new_filter = ModelFilter(author="facebook")
>>> # For the library
>>> new_filter = ModelFilter(library="pytorch")
>>> # For the language
>>> new_filter = ModelFilter(language="french")
>>> # For the model_name
>>> new_filter = ModelFilter(model_name="bert")
>>> # For the task
>>> new_filter = ModelFilter(task="text-classification")
>>> # Retrieving tags using the `HfApi.get_model_tags` method
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> # To list model tags
>>> api.get_model_tags()
>>> # To list dataset tags
>>> api.get_dataset_tags()
>>> new_filter = ModelFilter(tags="benchmark:raft")
>>> # Related to the dataset
>>> new_filter = ModelFilter(trained_dataset="common_voice")