Hugging Face – Posts

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

All HF Hub posts

merve

posted an update 2 days ago

Post

2413

I have put together a notebook on Multimodal RAG, where we do not process the documents with hefty pipelines but natively use:
- vidore/colpali for retrieval 📖 it doesn't need indexing with image-text pairs but just images!
- Qwen/Qwen2-VL-2B-Instruct for generation 💬 directly feed images as is to a vision language model with no processing to text!
I used ColPali implementation of the new 🐭 Byaldi library by @bclavie 🤗
https://github.com/answerdotai/byaldi
Link to notebook: https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb

davidberenstein1957

posted an update about 24 hours ago

Post

826

🌟 Argilla v2.1.0 goes multi-modal: Image Field, Dark Mode, Enhanched Hugging Face Hub imports and more!

🖼 Image Field: Seamlessly work with multimodal datasets
🌓 Dark Mode: Reduce eye strain with our sleek new look
🤗 Enhanced Hugging Face Hub import with the SDK
🇪🇸 Spanish UI: Breaking language barriers

Plus more improvements to supercharge your model curation workflow!

Check out the full announcement for details and code examples: https://github.com/argilla-io/argilla/compare/v2.0.1...v2.1.0

m-ric

posted an update 1 day ago

Post

774

🤯 𝗔 𝗻𝗲𝘄 𝟳𝟬𝗕 𝗼𝗽𝗲𝗻-𝘄𝗲𝗶𝗴𝗵𝘁𝘀 𝗟𝗟𝗠 𝗯𝗲𝗮𝘁𝘀 𝗖𝗹𝗮𝘂𝗱𝗲-𝟯.𝟱-𝗦𝗼𝗻𝗻𝗲𝘁 𝗮𝗻𝗱 𝗚𝗣𝗧-𝟰𝗼!

@mattshumer , CEO from Hyperwrite AI, had an idea he wanted to try out: why not fine-tune LLMs to always output their thoughts in specific parts, delineated by <thinking> tags?

Even better: inside of that, you could nest other sections, to reflect critically on previous output. Let’s name this part <reflection>. Planning is also put in a separate step.

He named the method “Reflection tuning” and set out to fine-tune a Llama-3.1-70B with it.

Well it turns out, it works mind-boggingly well!

🤯 Reflection-70B beats GPT-4o, Sonnet-3.5, and even the much bigger Llama-3.1-405B!

𝗧𝗟;𝗗𝗥
🥊 This new 70B open-weights model beats GPT-4o, Claude Sonnet, et al.
⏰ 405B in training, coming soon
📚 Report coming next week
⚙️ Uses GlaiveAI synthetic data
🤗 Available on HF!

I’m starting an Inference Endpoint right now for this model to give it a spin!

Check it out 👉 mattshumer/Reflection-Llama-3.1-70B

bartowski

posted an update about 13 hours ago

Post

538

Reposting from twitter:

Just so you all know, I'll be on vacation for the following two weeks and away from home! I'm hoping to get on at least once a day to load up some quants, but I won't be as bleeding edge and on the ball :) feel free to shoot me a message if you see one I should make!

In the meantime if you need something bleeding edge make sure to check out @MaziyarPanahi or @bullerwins who both put out great work!

1 reply

gabrielmbmb

posted an update about 20 hours ago

Post

527

Yesterday @mattshumer released mattshumer/Reflection-Llama-3.1-70B, an impressive model that achieved incredible results in benchmarks like MMLU. The model was fine-tuned using Reflection-Tuning and the dataset used wasn't released, but I created a small recipe with distilabel that allows generating a dataset with a similar output format:

1. We use MagPie 🐦 in combination with meta-llama/Meta-Llama-3.1-70B-Instruct to generate reasoning instructions.
2. We generate a response again using meta-llama/Meta-Llama-3.1-70B-Instruct, but we steer the LLM to generate an specific output format using a custom system prompt. In the system prompt, we instruct the LLM that it will have first to think 💭 and have reflections that will help resolving ambiguities. After that, we instruct the LLM to generate an output based on the previous thinking

In this dataset gabrielmbmb/distilabel-reflection-tuning you can found 5 rows that I generated with this recipe. You can also found the code of the pipeline in the file called reflection.py.

clem

posted an update about 22 hours ago

Post

633

"LLM inference at scale with TGI". Cool blogpost: https://www.adyen.com/knowledge-hub/llm-inference-at-scale-with-tgi

Well done
@martinigoyanes @rafa-hernandez @Vidusharma @frisokingma @hannahwright @jeanmarcs @antonioramos & the whole https://huggingface.co/adyen team. Could be useful to cross-post here: https://huggingface.co/blog/community

2 replies

bartowski

posted an update 1 day ago

Post

1365

Decided to try to check how many weights in a 70b F32 model would be squashed when converted to F16 (spoiler, it's shockingly few)

The reason for this comparison is that it should represent the same percentage of squishing as bf16 to fp16

Had claude make me a script, using the new Reflection-70B, and these are the results:

Total weights: 70553706496
Fully representable: 70530215524
Squashed: 23490972
Percentage squashed: 0.03%

0.03%!!!!

A couple things to note, this uses a roundtrip of F32 -> F16 -> F32 and then torch.isclose to account for rounding errors that come up by the very nature of extremely accurate numbers, but it uses VERY small tolerances (rtol=1e-5, atol=1e-8)

This is also examining EVERY weight that was stored at F32, and for most layers I was somewhere between 0% and 0.03% of weights being squashed, no major outliers.

Overall, I feel even safer converting to F16 for llama.cpp, the extremely small number of weights that fall outside the range are likely so small that they don't actually play a role in the final output of the model at inference anyways.

5 replies

rwightman

posted an update 1 day ago

Post

851

The timm leaderboard timm/leaderboard has been updated with the ability to select different hardware benchmark sets: RTX4090, RTX3090, two different CPUs along with some NCHW / NHWC layout and torch.compile (dynamo) variations.

Also worth pointing out, there are three rather newish 'test' models that you'll see at the top of any samples/sec comparison:
* test_vit ( timm/test_vit.r160_in1k)
* test_efficientnet ( timm/test_efficientnet.r160_in1k)
* test_byobnet ( timm/test_byobnet.r160_in1k, a mix of resnet, darknet, effnet/regnet like blocks)

They are < 0.5M params, insanely fast and originally intended for unit testing w/ real weights. They have awful ImageNet top-1, it's rare to have anyone bother to train a model this small on ImageNet (the classifier is roughly 30-70% of the param count!). However, they are FAST on very limited hadware and you can fine-tune them well on small data. Could be the model you're looking for?

merve

posted an update 2 days ago

Post

1969

If you have documents that do not only have text and you're doing retrieval or RAG (using OCR and LLMs), give it up and give ColPali and vision language models a try 🤗

Why? Documents consist of multiple modalities: layout, table, text, chart, images. Document processing pipelines often consist of multiple models and they're immensely brittle and slow. 🥲

How? ColPali is a ColBERT-like document retrieval model built on PaliGemma, it operates over image patches directly, and indexing takes far less time with more accuracy. You can use it for retrieval, and if you want to do retrieval augmented generation, find the closest document, and do not process it, give it directly to a VLM like Qwen2-VL (as image input) and give your text query. 🤝

This is much faster + you do not lose out on any information + much easier to maintain too! 🥳

Multimodal RAG merve/multimodal-rag-66d97602e781122aae0a5139 💬
Document AI (made it way before, for folks who want structured input/output and can fine-tune a model) merve/awesome-document-ai-65ef1cdc2e97ef9cc85c898e 📖

m-ric

posted an update 3 days ago

Post

2007

🥳 𝗧𝗿𝗮𝗻𝘀𝗳𝗼𝗿𝗺𝗲𝗿𝘀 𝗔𝗴𝗲𝗻𝘁𝘀 𝗻𝗼𝘄 𝘀𝘂𝗽𝗽𝗼𝗿𝘁𝘀 𝗠𝘂𝗹𝘁𝗶-𝗮𝗴𝗲𝗻𝘁 𝘀𝘆𝘀𝘁𝗲𝗺𝘀!

Multi-agent systems have been introduced in Microsoft's framework Autogen. It simply means having several agents working together to solve your task instead of only one : this paradigm empirically yields better performance on most benchmarks. The reason for this better performance is conceptually simple: for many tasks, rather than using a do-it-all system, you would prefer to specialize units on sub-tasks. Here, having agents with separate tool sets and memories allows to achieve efficient specialization.

You can now easily build hierarchical multi-agent systems with transformers.agents (not released yet, use the dev version)

To do so, encapsulate the agent in a ManagedAgent object. This object needs arguments agent, name, and a description, which will then be embedded in the manager agent's system prompt to let it know how to call this managed agent, as we also do for tools.

Cf the example in the image! We'll keep building on this paradigm in the upcoming weeks 🚀

Read more in the doc 👉 https://github.com/huggingface/transformers/blob/main/docs/source/en/agents_advanced.md

Checkout an advanced multi-agent system that tops the GAIA leaderboard 👉 https://github.com/aymeric-roucher/GAIA/blob/main/gaia_multiagent.py

Recently active users