Hugging Face Hub API

You are viewing the documentation for v0.5.1. A newer version, v0.26.2, is available.

Below is the documentation for the HfApi class, which serves as a Python wrapper for the Hugging Face Hub’s API.

All HfApi methods are also accessible directly from the package’s root. Both approaches are detailed below.

The following approach uses the method from the root of the package:

from huggingface_hub import list_models

models = list_models()

The following approach uses the HfApi class:

from huggingface_hub import HfApi

hf_api = HfApi()
models = hf_api.list_models()

Using the HfApi class directly lets you set an endpoint other than that of the Hugging Face Hub.

class huggingface_hub.HfApi

( endpoint = None )

create_repo

( repo_id: str = None, token: typing.Optional[str] = None, organization: typing.Optional[str] = None, private: typing.Optional[bool] = None, repo_type: typing.Optional[str] = None, exist_ok: typing.Optional[bool] = False, space_sdk: typing.Optional[str] = None, name: typing.Optional[str] = None ) -> str

Parameters

  • repo_id (str) — A namespace (user or an organization) and a repo name separated by a /.

    Version added: 0.5

  • token (str, optional) — An authentication token.
  • private (bool, optional) — Whether the model repo should be private.
  • repo_type (str, optional) — Set to "dataset" or "space" if uploading to a dataset or space, None or "model" if uploading to a model. Default is None.
  • exist_ok (bool, optional, defaults to False) — If True, do not raise an error if repo already exists.
  • space_sdk (str, optional) — Choice of SDK to use if repo_type is “space”. Can be “streamlit”, “gradio”, or “static”.

Returns

str

URL to the newly created repo.

Create an empty repo on the HuggingFace Hub.

dataset_info

( repo_id: str, revision: typing.Optional[str] = None, token: typing.Optional[str] = None, timeout: typing.Optional[float] = None ) -> DatasetInfo

Parameters

  • repo_id (str) — A namespace (user or an organization) and a repo name separated by a /.
  • revision (str, optional) — The revision of the dataset repository from which to get the information.
  • token (str, optional) — An authentication token.
  • timeout (float, optional) — Timeout, in seconds, for the request to the Hub.

Returns

DatasetInfo

The dataset repository information.

Get info on one specific dataset on huggingface.co

The dataset can be private if you pass an acceptable token.

delete_file

( path_in_repo: str, repo_id: str, token: typing.Optional[str] = None, repo_type: typing.Optional[str] = None, revision: typing.Optional[str] = None )

Parameters

  • path_in_repo (str) — Relative filepath in the repo, for example: "checkpoints/1fec34a/weights.bin"
  • repo_id (str) — The repository from which the file will be deleted, for example: "username/custom_transformers"
  • token (str, optional) — Authentication token, obtained with HfApi.login method. Will default to the stored token.
  • repo_type (str, optional) — Set to "dataset" or "space" if the file is in a dataset or space, None or "model" if in a model. Default is None.
  • revision (str, optional) — The git revision to commit from. Defaults to the head of the "main" branch.

Deletes a file in the given repo.

Raises the following errors:

  • HTTPError if the HuggingFace API returned an error
  • ValueError if some parameter value is invalid

delete_repo

( repo_id: str = None, token: typing.Optional[str] = None, organization: typing.Optional[str] = None, repo_type: typing.Optional[str] = None, name: str = None )

Parameters

  • repo_id (str) — A namespace (user or an organization) and a repo name separated by a /.

    Version added: 0.5

  • token (str, optional) — An authentication token.
  • repo_type (str, optional) — Set to "dataset" or "space" if deleting a dataset or space, None or "model" if deleting a model.

Delete a repo from the HuggingFace Hub. CAUTION: this is irreversible.

get_dataset_tags

( )

Gets all valid dataset tags as a nested namespace object.

get_full_repo_name

( model_id: str, organization: typing.Optional[str] = None, token: typing.Optional[str] = None ) -> str

Parameters

  • model_id (str) — The name of the model.
  • organization (str, optional) — If passed, the repository name will be in the organization namespace instead of the user namespace.
  • token (str, optional) — The Hugging Face authentication token

Returns

str

The repository name in the user’s namespace ({username}/{model_id}) if no organization is passed, and under the organization namespace ({organization}/{model_id}) otherwise.

Returns the repository name for a given model ID and optional organization.
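
The naming rule described above can be sketched in plain Python. The helper below, `build_full_repo_name`, is hypothetical and not part of huggingface_hub: it takes the username as an explicit argument rather than resolving it from a token, and only illustrates how the documented return value is assembled.

```python
from typing import Optional


def build_full_repo_name(
    model_id: str, username: str, organization: Optional[str] = None
) -> str:
    """Hypothetical helper mirroring the documented return value."""
    # Use the organization namespace when one is given, else the user's.
    namespace = organization if organization is not None else username
    return f"{namespace}/{model_id}"


user_repo = build_full_repo_name("my-model", username="alice")
org_repo = build_full_repo_name("my-model", username="alice", organization="my-org")
print(user_repo)
print(org_repo)
```

With the real method, the same two cases would correspond to calling `get_full_repo_name` with and without the `organization` argument.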

get_model_tags

( )

Gets all valid model tags as a nested namespace object.

list_datasets

( filter: typing.Union[huggingface_hub.utils.endpoint_helpers.DatasetFilter, str, typing.Iterable[str], NoneType] = None, author: typing.Optional[str] = None, search: typing.Optional[str] = None, sort: typing.Union[typing.Literal['lastModified'], str, NoneType] = None, direction: typing.Optional[typing.Literal[-1]] = None, limit: typing.Optional[int] = None, cardData: typing.Optional[bool] = None, full: typing.Optional[bool] = None, use_auth_token: typing.Optional[str] = None )

Parameters

  • filter (DatasetFilter or str or Iterable, optional) — A string or DatasetFilter which can be used to identify datasets on the hub.
  • author (str, optional) — A string which identifies the author of the returned datasets.
  • search (str, optional) — A string that will be contained in the names of the returned datasets.
  • sort (Literal["lastModified"] or str, optional) — The key with which to sort the resulting datasets. Possible values are the properties of the DatasetInfo class.
  • direction (Literal[-1] or int, optional) — Direction in which to sort. The value -1 sorts by descending order while all other values sort by ascending order.
  • limit (int, optional) — The limit on the number of datasets fetched. Leaving this option as None fetches all datasets.
  • cardData (bool, optional) — Whether to grab the metadata for the dataset as well. Can contain useful information such as the PapersWithCode ID.
  • full (bool, optional) — Whether to fetch all dataset data, including the lastModified and the cardData.
  • use_auth_token (bool or str, optional) — Whether to use the auth_token provided from the huggingface_hub cli. If not logged in, a valid auth_token can be passed in as a string.
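
The interplay of sort, direction and limit described in the list above can be illustrated on made-up records. This is only a client-side sketch of the documented semantics: `sort_and_limit` is a hypothetical helper, the records are invented, and nothing here reflects how the Hub implements sorting.

```python
from typing import Any, Dict, List, Optional


def sort_and_limit(
    items: List[Dict[str, Any]],
    sort: Optional[str] = None,
    direction: Optional[int] = None,
    limit: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical illustration of the sort, direction and limit options."""
    result = list(items)
    if sort is not None:
        # -1 means descending; any other value means ascending.
        result.sort(key=lambda item: item[sort], reverse=(direction == -1))
    # limit=None keeps every record, like "fetches all datasets".
    return result if limit is None else result[:limit]


# Invented records standing in for dataset entries.
records = [
    {"id": "a", "lastModified": "2022-01-03"},
    {"id": "b", "lastModified": "2022-03-01"},
    {"id": "c", "lastModified": "2022-02-14"},
]
newest_two = sort_and_limit(records, sort="lastModified", direction=-1, limit=2)
print([record["id"] for record in newest_two])
```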

Get the public list of all the datasets on huggingface.co

Example usage with the filter argument:

>>> from huggingface_hub import HfApi, DatasetFilter, DatasetSearchArguments

>>> api = HfApi()

>>> # List all datasets
>>> api.list_datasets()

>>> # Get all valid search arguments
>>> args = DatasetSearchArguments()

>>> # List only the text classification datasets
>>> api.list_datasets(filter="task_categories:text-classification")
>>> # Using the `DatasetFilter`
>>> filt = DatasetFilter(task_categories="text-classification")
>>> # With `DatasetSearchArguments`
>>> filt = DatasetFilter(task_categories=args.task_categories.text_classification)
>>> api.list_datasets(filter=filt)

>>> # List only the datasets in russian for language modeling
>>> api.list_datasets(
...     filter=("languages:ru", "task_ids:language-modeling")
... )
>>> # Using the `DatasetFilter`
>>> filt = DatasetFilter(languages="ru", task_ids="language-modeling")
>>> # With `DatasetSearchArguments`
>>> filt = DatasetFilter(
...     languages=args.languages.ru,
...     task_ids=args.task_ids.language_modeling,
... )
>>> api.list_datasets(filter=filt)
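
The string filters in the examples above follow a `key:value` pattern. A minimal sketch of building such a tuple by hand follows; `tag_filters` is a hypothetical helper, not a huggingface_hub function, and the resulting tuple is the kind of value passed as `filter=` to `list_datasets`.

```python
from typing import Tuple


def tag_filters(**criteria: str) -> Tuple[str, ...]:
    """Hypothetical helper: build "key:value" filter strings."""
    # Keyword arguments keep their insertion order.
    return tuple(f"{key}:{value}" for key, value in criteria.items())


filters = tag_filters(languages="ru", task_ids="language-modeling")
print(filters)
```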

Example usage with the search argument:

>>> from huggingface_hub import HfApi

>>> api = HfApi()

>>> # List all datasets with "text" in their name
>>> api.list_datasets(search="text")

>>> # List all datasets with "text" in their name made by google
>>> api.list_datasets(search="text", author="google")

list_metrics

( ) -> List[MetricInfo]

Returns

List[MetricInfo]

A list of MetricInfo objects.

Get the public list of all the metrics on huggingface.co

list_models

( filter: typing.Union[huggingface_hub.utils.endpoint_helpers.ModelFilter, str, typing.Iterable[str], NoneType] = None, author: typing.Optional[str] = None, search: typing.Optional[str] = None, emissions_thresholds: typing.Union[typing.Tuple[float, float], NoneType] = None, sort: typing.Union[typing.Literal['lastModified'], str, NoneType] = None, direction: typing.Optional[typing.Literal[-1]] = None, limit: typing.Optional[int] = None, full: typing.Optional[bool] = None, cardData: typing.Optional[bool] = None, fetch_config: typing.Optional[bool] = None, use_auth_token: typing.Union[bool, str, NoneType] = None )

Parameters

  • filter (ModelFilter or str or Iterable, optional) — A string or ModelFilter which can be used to identify models on the Hub.
  • author (str, optional) — A string which identifies the author (user or organization) of the returned models.
  • search (str, optional) — A string that will be contained in the names of the returned models.
  • emissions_thresholds (Tuple, optional) — A tuple of two ints or floats giving the minimum and maximum carbon footprint, in grams, used to filter the resulting models.
  • sort (Literal["lastModified"] or str, optional) — The key with which to sort the resulting models. Possible values are the properties of the ModelInfo class.
  • direction (Literal[-1] or int, optional) — Direction in which to sort. The value -1 sorts by descending order while all other values sort by ascending order.
  • limit (int, optional) — The limit on the number of models fetched. Leaving this option as None fetches all models.
  • full (bool, optional) — Whether to fetch all model data, including the lastModified, the sha, the files and the tags. This is set to True by default when using a filter.
  • cardData (bool, optional) — Whether to grab the metadata for the model as well. Can contain useful information such as carbon emissions, metrics, and datasets trained on.
  • fetch_config (bool, optional) — Whether to fetch the model configs as well. This is not included in full due to its size.
  • use_auth_token (bool or str, optional) — Whether to use the auth_token provided from the huggingface_hub cli. If not logged in, a valid auth_token can be passed in as a string.

Get the public list of all the models on huggingface.co

Example usage with the filter argument:

>>> from huggingface_hub import HfApi, ModelFilter, ModelSearchArguments

>>> api = HfApi()

>>> # List all models
>>> api.list_models()

>>> # Get all valid search arguments
>>> args = ModelSearchArguments()

>>> # List only the text classification models
>>> api.list_models(filter="text-classification")
>>> # Using the `ModelFilter`
>>> filt = ModelFilter(task="text-classification")
>>> # With `ModelSearchArguments`
>>> filt = ModelFilter(task=args.pipeline_tags.TextClassification)
>>> api.list_models(filter=filt)

>>> # Using `ModelFilter` and `ModelSearchArguments` to find text classification in both PyTorch and TensorFlow
>>> filt = ModelFilter(
...     task=args.pipeline_tags.TextClassification,
...     library=[args.library.PyTorch, args.library.TensorFlow],
... )
>>> api.list_models(filter=filt)

>>> # List only models from the AllenNLP library
>>> api.list_models(filter="allennlp")
>>> # Using `ModelFilter` and `ModelSearchArguments`
>>> filt = ModelFilter(library=args.library.allennlp)
>>> api.list_models(filter=filt)

Example usage with the search argument:

>>> from huggingface_hub import HfApi

>>> api = HfApi()

>>> # List all models with "bert" in their name
>>> api.list_models(search="bert")

>>> # List all models with "bert" in their name made by google
>>> api.list_models(search="bert", author="google")

list_repo_files

( repo_id: str, revision: typing.Optional[str] = None, repo_type: typing.Optional[str] = None, token: typing.Optional[str] = None, timeout: typing.Optional[float] = None ) -> List[str]

Parameters

  • repo_id (str) — A namespace (user or an organization) and a repo name separated by a /.
  • revision (str, optional) — The revision of the repository from which to get the information.
  • repo_type (str, optional) — Set to "dataset" or "space" if listing files from a dataset or space, None or "model" if listing files from a model. Default is None.
  • token (str, optional) — An authentication token.
  • timeout (float, optional) — Timeout, in seconds, for the request to the Hub.

Returns

List[str]

The list of files in the given repository.

Get the list of files in a given repo.

login

( username: str, password: str ) -> str

Parameters

  • username (str) — The username of the account with which to login.
  • password (str) — The password of the account with which to login.

Returns

str

The token, if the credentials are valid.

Call HF API to sign in a user and get a token if credentials are valid.

Warning: Deprecated, will be removed in v0.7. Please use HfApi.set_access_token() instead.

Raises the following errors:

logout

( token: typing.Optional[str] = None )

Parameters

  • token (str, optional) — Hugging Face token. Will default to the locally saved token if not provided.

Call HF API to log out.

Warning: Deprecated, will be removed in v0.7. Please use HfApi.unset_access_token() instead.

model_info

( repo_id: str, revision: typing.Optional[str] = None, token: typing.Optional[str] = None, timeout: typing.Optional[float] = None, securityStatus: typing.Optional[bool] = None ) -> ModelInfo

Parameters

  • repo_id (str) — A namespace (user or an organization) and a repo name separated by a /.
  • revision (str, optional) — The revision of the model repository from which to get the information.
  • token (str, optional) — An authentication token.
  • timeout (float, optional) — Timeout, in seconds, for the request to the Hub.
  • securityStatus (bool, optional) — Whether to retrieve the security status from the model repository as well.

Returns

ModelInfo

The model repository information.

Get info on one specific model on huggingface.co

The model can be private if you pass an acceptable token or are logged in.

move_repo

( from_id: str, to_id: str, repo_type: typing.Optional[str] = None, token: typing.Optional[str] = None )

Parameters

  • from_id (str) — A namespace (user or an organization) and a repo name separated by a /. Original repository identifier.
  • to_id (str) — A namespace (user or an organization) and a repo name separated by a /. Final repository identifier.
  • repo_type (str, optional) — Set to "dataset" or "space" if moving a dataset or space, None or "model" if moving a model. Default is None.
  • token (str, optional) — An authentication token.

Move a repository from namespace1/repo_name1 to namespace2/repo_name2.

Note there are certain limitations. For more information about moving repositories, please see https://hf.co/docs/hub/main#how-can-i-rename-or-transfer-a-repo.
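
Both from_id and to_id are expected in the namespace/repo_name form. The sketch below shows one way to check that shape before calling move_repo; `split_repo_id` is a hypothetical helper, not part of huggingface_hub, and the library may apply its own validation.

```python
from typing import Tuple


def split_repo_id(repo_id: str) -> Tuple[str, str]:
    """Hypothetical helper: split "namespace/repo_name" into its two parts."""
    namespace, sep, name = repo_id.partition("/")
    # Reject ids with no slash, an empty part, or more than one slash.
    if not sep or not namespace or not name or "/" in name:
        raise ValueError(f"expected 'namespace/repo_name', got {repo_id!r}")
    return namespace, name


parts = split_repo_id("namespace1/repo_name1")
print(parts)
```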

set_access_token

( access_token: str )

Parameters

  • access_token (str) — The access token to save.

Saves the passed access token so git can correctly authenticate the user.

unset_access_token

( )

Resets the user’s access token.

update_repo_visibility

( repo_id: str = None, private: bool = False, token: typing.Optional[str] = None, organization: typing.Optional[str] = None, repo_type: typing.Optional[str] = None, name: str = None )

Parameters

  • repo_id (str, optional) — A namespace (user or an organization) and a repo name separated by a /.

    Version added: 0.5

  • private (bool, optional, defaults to False) — Whether the model repo should be private.
  • token (str, optional) — An authentication token.
  • repo_type (str, optional) — Set to "dataset" or "space" if the repo is a dataset or space, None or "model" if it is a model. Default is None.

Update the visibility setting of a repository.

upload_file

( path_or_fileobj: typing.Union[str, bytes, typing.IO], path_in_repo: str, repo_id: str, token: typing.Optional[str] = None, repo_type: typing.Optional[str] = None, revision: typing.Optional[str] = None, identical_ok: bool = True ) -> str

Parameters

  • path_or_fileobj (str, bytes, or IO) — Path to a file on the local machine or binary data stream / fileobj / buffer.
  • path_in_repo (str) — Relative filepath in the repo, for example: "checkpoints/1fec34a/weights.bin"
  • repo_id (str) — The repository to which the file will be uploaded, for example: "username/custom_transformers"
  • token (str, optional) — Authentication token, obtained with HfApi.login method. Will default to the stored token.
  • repo_type (str, optional) — Set to "dataset" or "space" if uploading to a dataset or space, None or "model" if uploading to a model. Default is None.
  • revision (str, optional) — The git revision to commit from. Defaults to the head of the "main" branch.
  • identical_ok (bool, optional, defaults to True) — When set to False, raises an HTTPError if the file you’re trying to upload already exists on the Hub with unchanged content.

Returns

str

The URL to visualize the uploaded file on the hub

Upload a local file (up to 5GB) to the given repo. The upload is done through an HTTP POST request and doesn’t require git or git-lfs to be installed.

Raises the following errors:

  • HTTPError if the HuggingFace API returned an error
  • ValueError if some parameter value is invalid

Example usage:

>>> from huggingface_hub import upload_file

>>> with open("./local/filepath", "rb") as fobj:
...     upload_file(
...         path_or_fileobj=fobj,
...         path_in_repo="remote/file/path.h5",
...         repo_id="username/my-dataset",
...         repo_type="dataset",
...         token="my_token",
...     )
"https://huggingface.co/datasets/username/my-dataset/blob/main/remote/file/path.h5"

>>> upload_file(
...     path_or_fileobj=".\\local\\file\\path",
...     path_in_repo="remote/file/path.h5",
...     repo_id="username/my-model",
...     token="my_token",
... )
"https://huggingface.co/username/my-model/blob/main/remote/file/path.h5"
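
Since upload_file is documented as handling files up to 5GB, a local size check before uploading can avoid a failed request. The sketch below is an assumption-laden illustration: `fits_upload_limit` and `MAX_UPLOAD_BYTES` are hypothetical names, and "5GB" is read as 5 * 1024**3 bytes, which may not match the limit the Hub enforces.

```python
import os
import tempfile

# Assumption: "5GB" is read here as 5 * 1024**3 bytes; the exact limit
# enforced by the Hub may differ.
MAX_UPLOAD_BYTES = 5 * 1024**3


def fits_upload_limit(path: str, max_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """Hypothetical pre-flight check: is the local file within the limit?"""
    return os.path.getsize(path) <= max_bytes


# Demonstrate on a small temporary file of 7 bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"weights")
    tmp_path = tmp.name

within = fits_upload_limit(tmp_path)  # compared with the default limit
over = fits_upload_limit(tmp_path, max_bytes=3)  # compared with a 3-byte limit
os.remove(tmp_path)
print(within, over)
```

A caller could run such a check first and only call upload_file when it passes.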

whoami

( token: typing.Optional[str] = None )

Parameters

  • token (str, optional) — Hugging Face token. Will default to the locally saved token if not provided.

Call HF API to know “whoami”.

Hugging Face local storage

huggingface_hub stores the authentication information locally so that it may be re-used in subsequent methods.

It does this using the HfFolder utility, which saves data in the user’s home directory.

class huggingface_hub.HfFolder

( )

delete_token

( )

Deletes the token from storage. Does not fail if token does not exist.

get_token

( ) -> str or None

Returns

str or None

The token, None if it doesn’t exist.

Retrieves the token.

save_token

( token )

Parameters

  • token (str) — The token to save to the HfFolder

Save token, creating folder as needed.
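
The three documented behaviors (saving creates the folder as needed, reading returns None when no token exists, deleting never fails) can be sketched with a small file-backed class. `TokenStore` below is a hypothetical stand-in written for illustration: it does not reproduce HfFolder's actual storage location or internals, and it writes to a temporary directory instead of the user's home.

```python
import os
import tempfile
from typing import Optional


class TokenStore:
    """Hypothetical stand-in mirroring the documented HfFolder behavior."""

    def __init__(self, path: str) -> None:
        self.path = path

    def save_token(self, token: str) -> None:
        # Create the parent folder as needed before writing.
        os.makedirs(os.path.dirname(self.path), exist_ok=True)
        with open(self.path, "w") as f:
            f.write(token)

    def get_token(self) -> Optional[str]:
        # Return None when no token has been saved.
        try:
            with open(self.path) as f:
                return f.read()
        except FileNotFoundError:
            return None

    def delete_token(self) -> None:
        # Deleting a missing token is not an error.
        try:
            os.remove(self.path)
        except FileNotFoundError:
            pass


with tempfile.TemporaryDirectory() as root:
    store = TokenStore(os.path.join(root, "hypothetical_dir", "token"))
    before = store.get_token()
    store.save_token("my_token")
    saved = store.get_token()
    store.delete_token()
    store.delete_token()  # second call finds nothing and does nothing
    after = store.get_token()

print(before, saved, after)
```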

Filtering helpers

Some helpers to filter repositories on the Hub are available in the huggingface_hub package.

class huggingface_hub.DatasetFilter

( author: str = None, benchmark: typing.Union[str, typing.List[str]] = None, dataset_name: str = None, language_creators: typing.Union[str, typing.List[str]] = None, languages: typing.Union[str, typing.List[str]] = None, multilinguality: typing.Union[str, typing.List[str]] = None, size_categories: typing.Union[str, typing.List[str]] = None, task_categories: typing.Union[str, typing.List[str]] = None, task_ids: typing.Union[str, typing.List[str]] = None )

Parameters

  • author (str, optional) — A string that can be used to identify datasets on the Hub by the original uploader (author or organization), such as facebook or huggingface.
  • benchmark (str or List, optional) — A string or list of strings that can be used to identify datasets on the Hub by their official benchmark.
  • dataset_name (str, optional) — A string that can be used to identify datasets on the Hub by name, such as SQAC or wikineural.
  • language_creators (str or List, optional) — A string or list of strings that can be used to identify datasets on the Hub with how the data was curated, such as crowdsourced or machine_generated.
  • languages (str or List, optional) — A string or list of strings giving two-character language codes to filter datasets by on the Hub.
  • multilinguality (str or List, optional) — A string or list of strings representing a filter for datasets that contain multiple languages.
  • size_categories (str or List, optional) — A string or list of strings that can be used to identify datasets on the Hub by the size of the dataset such as 100K<n<1M or 1M<n<10M.
  • task_categories (str or List, optional) — A string or list of strings that can be used to identify datasets on the Hub by the designed task, such as audio_classification or named_entity_recognition.
  • task_ids (str or List, optional) — A string or list of strings that can be used to identify datasets on the Hub by the specific task such as speech_emotion_recognition or paraphrase.

A class that converts human-readable dataset search parameters into ones compatible with the REST API. For all parameters capitalization does not matter.

Examples:

>>> from huggingface_hub import DatasetFilter

>>> # Using author
>>> new_filter = DatasetFilter(author="facebook")

>>> # Using benchmark
>>> new_filter = DatasetFilter(benchmark="raft")

>>> # Using dataset_name
>>> new_filter = DatasetFilter(dataset_name="wikineural")

>>> # Using language_creators
>>> new_filter = DatasetFilter(language_creators="crowdsourced")

>>> # Using languages
>>> new_filter = DatasetFilter(languages="en")

>>> # Using multilinguality
>>> new_filter = DatasetFilter(multilinguality="yes")

>>> # Using size_categories
>>> new_filter = DatasetFilter(size_categories="100K<n<1M")

>>> # Using task_categories
>>> new_filter = DatasetFilter(task_categories="audio_classification")

>>> # Using task_ids
>>> new_filter = DatasetFilter(task_ids="paraphrase")

class huggingface_hub.ModelFilter

( author: str = None, library: typing.Union[str, typing.List[str]] = None, language: typing.Union[str, typing.List[str]] = None, model_name: str = None, task: typing.Union[str, typing.List[str]] = None, trained_dataset: typing.Union[str, typing.List[str]] = None, tags: typing.Union[str, typing.List[str]] = None )

Parameters

  • author (str, optional) — A string that can be used to identify models on the Hub by the original uploader (author or organization), such as facebook or huggingface.
  • library (str or List, optional) — A string or list of strings of foundational libraries models were originally trained from, such as pytorch, tensorflow, or allennlp.
  • language (str or List, optional) — A string or list of strings of languages, given either by name or by code, such as “en” or “English”.
  • model_name (str, optional) — A string that contains complete or partial names for models on the Hub, such as “bert” or “bert-base-cased”.
  • task (str or List, optional) — A string or list of strings of tasks models were designed for, such as: “fill-mask” or “automatic-speech-recognition”
  • tags (str or List, optional) — A string tag or a list of tags to filter models on the Hub by, such as text-generation or spacy.
  • trained_dataset (str or List, optional) — A string tag or a list of string tags of the trained dataset for a model on the Hub.

A class that converts human-readable model search parameters into ones compatible with the REST API. For all parameters capitalization does not matter.

>>> from huggingface_hub import ModelFilter

>>> # For the author
>>> new_filter = ModelFilter(author="facebook")

>>> # For the library
>>> new_filter = ModelFilter(library="pytorch")

>>> # For the language
>>> new_filter = ModelFilter(language="french")

>>> # For the model_name
>>> new_filter = ModelFilter(model_name="bert")

>>> # For the task
>>> new_filter = ModelFilter(task="text-classification")

>>> # Retrieving tags using the `HfApi.get_model_tags` method
>>> from huggingface_hub import HfApi

>>> api = HfApi()
>>> # To list model tags
>>> api.get_model_tags()
>>> # To list dataset tags
>>> api.get_dataset_tags()
>>> new_filter = ModelFilter(tags="benchmark:raft")

>>> # Related to the dataset
>>> new_filter = ModelFilter(trained_dataset="common_voice")