Joao Gante

joaogante

https://github.com/gante

AI & ML interests

None yet

Articles

Organizations

Posts 3

Post

2783

New sampling strategy dropped in 🤗 transformers -- Min P sampling 🔥

Are you tired of having top_k arbitrarily discarding high-quality continuations? Or top_p forgetting to exclude low-probability tokens, derailing your generation? Try out the new min_p flag in generate, fresh from a PR merged today! 🥬

Min P is a dynamic token filter. Top K keeps the K most likely tokens, and Top P keeps the most likely tokens up to a fixed cumulative probability; both are static filters. Min P takes a base probability (set with the min_p flag) and multiplies it by the probability of the most likely token in the distribution for the next token. All tokens less likely than the resulting value are filtered out. What happens with this strategy?
👉 High probability token present -> aggressive filter (we don't want to miss out on that high-probability case and risk derailing generation)
👉 No high probability token present -> relaxed filter (there are many continuation possibilities that the model finds plausible)
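
To make the two cases above concrete, here is a minimal pure-Python sketch of the filter as described in this post. The function name `min_p_filter` and the toy distributions are made up for illustration, and this is not the transformers implementation, which works on logits tensors.

```python
def min_p_filter(probs, min_p):
    """Zero out tokens less likely than min_p * max(probs), then renormalize."""
    if not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must be in [0, 1]")
    threshold = min_p * max(probs)  # scales with the most likely token
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)  # always > 0: the most likely token survives
    return [p / total for p in kept]


# Hypothetical next-token distributions, for illustration only.
peaked = [0.90, 0.05, 0.03, 0.02]      # one dominant token
flat = [0.30, 0.25, 0.20, 0.15, 0.10]  # many plausible tokens

peaked_out = min_p_filter(peaked, min_p=0.1)  # threshold 0.09: only the 0.90 token survives
flat_out = min_p_filter(flat, min_p=0.1)      # threshold 0.03: all five tokens survive

print(peaked_out)
print(flat_out)
```

The same min_p value removes three of the four tokens in the peaked case and none in the flat case, which is the "dynamic" behaviour the post refers to.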

You should set min_p to a low value, between 0.05 and 0.1. It behaves particularly well for creative text generation when paired up with temperature > 1.
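
As a rough sketch of why this pairing works, the toy sampler below flattens the distribution with a temperature above 1 and then applies the Min P cut, so the extra randomness is spread over plausible tokens only. The helper `sample_min_p`, the logits, and the choice to apply temperature before the filter are assumptions made for this example. In transformers itself you would pass `do_sample=True`, `temperature`, and `min_p` to `generate` rather than write this by hand.

```python
import math
import random


def sample_min_p(logits, min_p=0.05, temperature=1.5, rng=None):
    """Sample a token index after temperature scaling and Min P filtering."""
    rng = rng or random.Random()
    # Temperature > 1 flattens the distribution (assumed to run before the filter).
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]  # numerically stable softmax
    norm = sum(exps)
    probs = [e / norm for e in exps]
    # Min P: drop tokens less likely than min_p times the top probability.
    threshold = min_p * max(probs)
    kept = [i for i, p in enumerate(probs) if p >= threshold]
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


# Hypothetical logits: three plausible tokens and one very unlikely token.
toy_logits = [5.0, 4.5, 4.0, -5.0]
rng = random.Random(0)
draws = [sample_min_p(toy_logits, rng=rng) for _ in range(500)]
print(sorted(set(draws)))  # the unlikely token (index 3) is never sampled
```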

Kudos to @kalomaze and @menhguin for creating this technique 🔥 Read their discussion in the original issue for benchmarks (https://github.com/huggingface/transformers/issues/27670)

Copy-pasteable version of the example: https://pastebin.com/VqXNtuxd

Have fun experimenting! 😎

Post

2565

Adding a long prompt can help you fight LLM hallucinations. However, if you know exactly how you want your LLM output constrained, there are much better strategies! 💪

Did you know you can force your LLM to ALWAYS generate a valid JSON file? Or to follow a well-defined answer template? You can do that and more with the 🤗 transformers-compatible outlines library.

It doesn't only allow you to master your LLM -- your text generation application will also become faster! 🔥 The more constrained your text generation is, the bigger the speedups you'll see!
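
The toy sketch below illustrates one way such constraints can give both guarantees and speed. It is not the outlines API. It assumes a constrained decoder that only picks among the tokens the template allows at each step and skips the model entirely when a single token is allowed. The names `generate_with_template` and `toy_model`, and the template itself, are hypothetical.

```python
import json


def generate_with_template(template, score_fn):
    """Fill a template where each position lists the tokens allowed there.

    Returns the generated tokens and the number of model calls made.
    """
    output, model_calls = [], 0
    for allowed in template:
        if len(allowed) == 1:
            # Forced token: the constraint decides, no model call needed.
            output.append(allowed[0])
        else:
            model_calls += 1
            scores = score_fn(output)
            # Only allowed tokens compete, whatever the model prefers overall.
            output.append(max(allowed, key=lambda tok: scores[tok]))
    return output, model_calls


def toy_model(prefix):
    """Stand-in for an LLM: fixed scores, with an invalid token ranked first."""
    return {'"yes"': 0.2, '"no"': 0.7, "maybe": 0.9}


# Hypothetical template for a tiny JSON object with one free slot.
template = [["{"], ['"answer"'], [":"], ['"yes"', '"no"'], ["}"]]

tokens, calls = generate_with_template(template, toy_model)
text = "".join(tokens)
print(text, json.loads(text), calls)
```

In this toy setting the output always parses as JSON, and four of the five positions cost no model call, so a tighter template means fewer calls.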

Follow @remi and other outlines folks to stay on top of the constrained generation game 🧠

spaces 6

models 8

joaogante/dummy_synthid_detector

Updated 29 days ago • 174

joaogante/tiny-random-gpt2-with-generation-config

Updated Mar 7 • 159

joaogante/Mistral-7B-Instruct-v0.2-medusa-wikitext

Updated Jan 7 • 4

joaogante/TinyLlama-1.1B-Chat-v1.0-medusa-wikitext

Updated Jan 6

joaogante/Mistral-7B-Instruct-v0.2-medusa-vicuna

Updated Jan 5

joaogante/test_audio

Automatic Speech Recognition • Updated Sep 13, 2023 • 21

joaogante/test_text

Fill-Mask • Updated Jun 15, 2022 • 17

Articles

Universal Assisted Generation: Faster Decoding with Any Assistant Model

Introducing SynthID Text

Faster Assisted Generation with Dynamic Speculation

Google releases Gemma 2 2B, ShieldGemma and Gemma Scope

Code Llama: Llama 2 learns to code

Assisted Generation: a new direction toward low-latency text generation

Faster Text Generation with TensorFlow and XLA

spaces 6

Color Coded Text Generation

Assisted Generation Benchmarks

Tf Xla Generate Benchmarks

Assisted Generation Demo

Medusa Maker

Generate Quality Improvement

models 8

joaogante/dummy_synthid_detector

joaogante/tiny-random-gpt2-with-generation-config

joaogante/Mistral-7B-Instruct-v0.2-medusa-wikitext

joaogante/TinyLlama-1.1B-Chat-v1.0-medusa-wikitext

joaogante/Mistral-7B-Instruct-v0.2-medusa-vicuna

joaogante/test_audio

joaogante/test_text

joaogante/test_img

datasets 1

joaogante/assisted_generation
