Loading methods

Methods for listing and loading evaluation modules:

List

evaluate.list_evaluation_modules

( module_type = None, include_community = True, with_details = False )

Parameters

  • module_type (str, optional, default None) — Type of evaluation modules to list. Has to be one of 'metric', 'comparison', or 'measurement'. If None, all types are listed.
  • include_community (bool, optional, default True) — Include community modules in the list.
  • with_details (bool, optional, default False) — Return the full details of the modules instead of only their IDs.

List all evaluation modules available on the Hugging Face Hub.
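
For example, the following snippet lists the canonical metric modules with their full details, using the parameters documented above:

```python
import evaluate

# List only canonical metric modules, with full details instead of just IDs.
metrics = evaluate.list_evaluation_modules(
    module_type="metric",
    include_community=False,
    with_details=True,
)
print(metrics[0])
```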

Load

evaluate.load

( path: str, config_name: typing.Optional[str] = None, module_type: typing.Optional[str] = None, process_id: int = 0, num_process: int = 1, cache_dir: typing.Optional[str] = None, experiment_id: typing.Optional[str] = None, keep_in_memory: bool = False, download_config: typing.Optional[evaluate.utils.file_utils.DownloadConfig] = None, download_mode: typing.Optional[datasets.download.download_manager.DownloadMode] = None, revision: typing.Union[str, datasets.utils.version.Version, NoneType] = None, **init_kwargs )

Parameters

  • path (str) — path to the evaluation processing script with the evaluation builder. Can be either:
    • a local path to a processing script or to the directory containing the script (if the script has the same name as the directory), e.g. './metrics/rouge' or './metrics/rouge/rouge.py'
    • an evaluation module identifier on the Hugging Face evaluate repo, e.g. 'rouge' or 'bleu', located in 'metrics/', 'comparisons/', or 'measurements/' depending on the provided module_type
  • config_name (str, optional) — selects a configuration for the metric (e.g. the GLUE metric has a configuration for each subset)
  • module_type (str, default 'metric') — type of evaluation module, can be one of 'metric', 'comparison', or 'measurement'.
  • process_id (int, optional) — for distributed evaluation: id of the process
  • num_process (int, optional) — for distributed evaluation: total number of processes
  • cache_dir (str, optional) — path to store the temporary predictions and references (defaults to ~/.cache/huggingface/evaluate/)
  • experiment_id (str) — A specific experiment id. This is used if several distributed evaluations share the same file system. This is useful to compute metrics in distributed setups (in particular non-additive metrics like F1).
  • keep_in_memory (bool) — Whether to store the temporary results in memory (defaults to False)
  • download_config (evaluate.DownloadConfig, optional) — specific download configuration parameters.
  • download_mode (DownloadMode, default REUSE_DATASET_IF_EXISTS) — Download/generate mode.
  • revision (Union[str, evaluate.Version], optional) — if specified, the module will be loaded from the repository at this version. By default it is set to the local version of the library. Specifying a version that differs from your local version of the library might cause compatibility issues.

Load an evaluate.EvaluationModule.
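
A minimal usage sketch, combining identifiers mentioned above ('bleu' as a module identifier and a GLUE subset as config_name); the compute() call follows the standard EvaluationModule API:

```python
import evaluate

# Load the BLEU metric by its Hub identifier.
bleu = evaluate.load("bleu")

# Load a metric that requires a configuration, e.g. a GLUE subset.
glue_mrpc = evaluate.load("glue", config_name="mrpc")

# Compute BLEU; references is a list of lists because each
# prediction may have several reference translations.
results = bleu.compute(
    predictions=["the cat sat on the mat"],
    references=[["the cat sat on the mat"]],
)
print(results["bleu"])
```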