text-generation-inference documentation




TGI exposes multiple metrics that can be collected through the /metrics Prometheus endpoint. These metrics can be used to monitor TGI's performance, to autoscale deployments, and to help identify bottlenecks.

The following metrics are exposed:

| Metric Name | Description | Type | Unit |
|---|---|---|---|
| tgi_batch_current_max_tokens | Maximum tokens for the current batch | Gauge | Count |
| tgi_batch_current_size | Current batch size | Gauge | Count |
| tgi_batch_decode_duration | Time spent decoding a batch per method (prefill or decode) | Histogram | Seconds |
| tgi_batch_filter_duration | Time spent filtering batches and sending generated tokens per method (prefill or decode) | Histogram | Seconds |
| tgi_batch_forward_duration | Batch forward duration per method (prefill or decode) | Histogram | Seconds |
| tgi_batch_inference_count | Inference calls per method (prefill or decode) | Counter | Count |
| tgi_batch_inference_duration | Batch inference duration | Histogram | Seconds |
| tgi_batch_inference_success | Number of successful inference calls per method (prefill or decode) | Counter | Count |
| tgi_batch_next_size | Batch size of the next batch | Histogram | Count |
| tgi_queue_size | Current queue size | Gauge | Count |
| tgi_request_count | Total number of requests | Counter | Count |
| tgi_request_duration | Total time spent processing the request (e2e latency) | Histogram | Seconds |
| tgi_request_generated_tokens | Generated tokens per request | Histogram | Count |
| tgi_request_inference_duration | Request inference duration | Histogram | Seconds |
| tgi_request_input_length | Input token length per request | Histogram | Count |
| tgi_request_max_new_tokens | Maximum new tokens per request | Histogram | Count |
| tgi_request_mean_time_per_token_duration | Mean time per token per request (inter-token latency) | Histogram | Seconds |
| tgi_request_queue_duration | Time spent in the queue per request | Histogram | Seconds |
| tgi_request_skipped_tokens | Speculated tokens per request | Histogram | Count |
| tgi_request_success | Number of successful requests | Counter | Count |
| tgi_request_validation_duration | Time spent validating the request | Histogram | Seconds |
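Prometheus histograms such as tgi_request_duration are exposed as cumulative `_bucket` series (one per `le` upper bound, ending with `le="+Inf"`) plus `_sum` and `_count` series. In a Prometheus server you would normally query something like `histogram_quantile(0.95, rate(tgi_request_duration_bucket[5m]))`. The sketch below reproduces that estimate offline from a single snapshot of bucket counts, using the same linear-interpolation assumption as PromQL (observations spread evenly inside each bucket). Note that a raw snapshot is cumulative since process start, not a windowed rate. The function names are illustrative, not part of TGI.

```python
import math


def histogram_mean(total_sum, total_count):
    """Mean observation, e.g. tgi_request_duration_sum / tgi_request_duration_count."""
    return total_sum / total_count if total_count else math.nan


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) bucket pairs.

    Assumes observations are spread evenly within each bucket and that the
    lowest bucket starts at 0, which holds for durations and token counts.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the open-ended bucket: report the
                # highest finite bound, as PromQL does.
                return prev_bound
            if count == prev_count:
                return bound
            # Linear interpolation inside the bucket (prev_bound, bound].
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound
```

With buckets `[(0.1, 2), (0.5, 6), (1.0, 9), (math.inf, 10)]`, the median estimate is interpolated inside the (0.1, 0.5] bucket, while p95 lands in the +Inf bucket and is reported as 1.0, which is a lower bound on the true value. Choosing bucket boundaries that cover your expected latencies keeps such estimates meaningful.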