Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up

All HF Hub posts

merveย 
posted an update 2 days ago
view post
Post
2413
I have put together a notebook on Multimodal RAG, where we do not process the documents with hefty pipelines but natively use:
- vidore/colpali for retrieval ๐Ÿ“– it doesn't need indexing with image-text pairs but just images!
- Qwen/Qwen2-VL-2B-Instruct for generation ๐Ÿ’ฌ directly feed images as is to a vision language model with no processing to text!
I used ColPali implementation of the new ๐Ÿญ Byaldi library by @bclavie ๐Ÿค—
https://github.com/answerdotai/byaldi
Link to notebook: https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb
davidberenstein1957ย 
posted an update about 24 hours ago
view post
Post
826
๐ŸŒŸ Argilla v2.1.0 goes multi-modal: Image Field, Dark Mode, Enhanched Hugging Face Hub imports and more!

๐Ÿ–ผ Image Field: Seamlessly work with multimodal datasets
๐ŸŒ“ Dark Mode: Reduce eye strain with our sleek new look
๐Ÿค— Enhanced Hugging Face Hub import with the SDK
๐Ÿ‡ช๐Ÿ‡ธ Spanish UI: Breaking language barriers

Plus more improvements to supercharge your model curation workflow!

Check out the full announcement for details and code examples: https://github.com/argilla-io/argilla/compare/v2.0.1...v2.1.0
m-ricย 
posted an update 1 day ago
view post
Post
774
๐Ÿคฏ ๐—” ๐—ป๐—ฒ๐˜„ ๐Ÿณ๐Ÿฌ๐—• ๐—ผ๐—ฝ๐—ฒ๐—ป-๐˜„๐—ฒ๐—ถ๐—ด๐—ต๐˜๐˜€ ๐—Ÿ๐—Ÿ๐—  ๐—ฏ๐—ฒ๐—ฎ๐˜๐˜€ ๐—–๐—น๐—ฎ๐˜‚๐—ฑ๐—ฒ-๐Ÿฏ.๐Ÿฑ-๐—ฆ๐—ผ๐—ป๐—ป๐—ฒ๐˜ ๐—ฎ๐—ป๐—ฑ ๐—š๐—ฃ๐—ง-๐Ÿฐ๐—ผ!

@mattshumer , CEO from Hyperwrite AI, had an idea he wanted to try out: why not fine-tune LLMs to always output their thoughts in specific parts, delineated by <thinking> tags?

Even better: inside of that, you could nest other sections, to reflect critically on previous output. Letโ€™s name this part <reflection>. Planning is also put in a separate step.

He named the method โ€œReflection tuningโ€ and set out to fine-tune a Llama-3.1-70B with it.

Well it turns out, it works mind-boggingly well!

๐Ÿคฏ Reflection-70B beats GPT-4o, Sonnet-3.5, and even the much bigger Llama-3.1-405B!

๐—ง๐—Ÿ;๐——๐—ฅ
๐ŸฅŠ This new 70B open-weights model beats GPT-4o, Claude Sonnet, et al.
โฐ 405B in training, coming soon
๐Ÿ“š Report coming next week
โš™๏ธ Uses GlaiveAI synthetic data
๐Ÿค— Available on HF!

Iโ€™m starting an Inference Endpoint right now for this model to give it a spin!

Check it out ๐Ÿ‘‰ mattshumer/Reflection-Llama-3.1-70B
bartowskiย 
posted an update about 13 hours ago
view post
Post
538
Reposting from twitter:

Just so you all know, I'll be on vacation for the following two weeks and away from home! I'm hoping to get on at least once a day to load up some quants, but I won't be as bleeding edge and on the ball :) feel free to shoot me a message if you see one I should make!

In the meantime if you need something bleeding edge make sure to check out @MaziyarPanahi or @bullerwins who both put out great work!
  • 1 reply
ยท
gabrielmbmbย 
posted an update about 20 hours ago
view post
Post
527
Yesterday ย  @mattshumer released mattshumer/Reflection-Llama-3.1-70B, an impressive model that achieved incredible results in benchmarks like MMLU. The model was fine-tuned using Reflection-Tuning and the dataset used wasn't released, but I created a small recipe with distilabel that allows generating a dataset with a similar output format:

1. We use MagPie ๐Ÿฆ in combination with meta-llama/Meta-Llama-3.1-70B-Instruct to generate reasoning instructions.
2. We generate a response again using meta-llama/Meta-Llama-3.1-70B-Instruct, but we steer the LLM to generate an specific output format using a custom system prompt. In the system prompt, we instruct the LLM that it will have first to think ๐Ÿ’ญ and have reflections that will help resolving ambiguities. After that, we instruct the LLM to generate an output based on the previous thinking

In this dataset gabrielmbmb/distilabel-reflection-tuning you can found 5 rows that I generated with this recipe. You can also found the code of the pipeline in the file called reflection.py.

clemย 
posted an update about 22 hours ago
bartowskiย 
posted an update 1 day ago
view post
Post
1365
Decided to try to check how many weights in a 70b F32 model would be squashed when converted to F16 (spoiler, it's shockingly few)

The reason for this comparison is that it should represent the same percentage of squishing as bf16 to fp16

Had claude make me a script, using the new Reflection-70B, and these are the results:

Total weights: 70553706496
Fully representable: 70530215524
Squashed: 23490972
Percentage squashed: 0.03%

0.03%!!!!

A couple things to note, this uses a roundtrip of F32 -> F16 -> F32 and then torch.isclose to account for rounding errors that come up by the very nature of extremely accurate numbers, but it uses VERY small tolerances (rtol=1e-5, atol=1e-8)

This is also examining EVERY weight that was stored at F32, and for most layers I was somewhere between 0% and 0.03% of weights being squashed, no major outliers.

Overall, I feel even safer converting to F16 for llama.cpp, the extremely small number of weights that fall outside the range are likely so small that they don't actually play a role in the final output of the model at inference anyways.
ยท
rwightmanย 
posted an update 1 day ago
view post
Post
851
The timm leaderboard timm/leaderboard has been updated with the ability to select different hardware benchmark sets: RTX4090, RTX3090, two different CPUs along with some NCHW / NHWC layout and torch.compile (dynamo) variations.

Also worth pointing out, there are three rather newish 'test' models that you'll see at the top of any samples/sec comparison:
* test_vit ( timm/test_vit.r160_in1k)
* test_efficientnet ( timm/test_efficientnet.r160_in1k)
* test_byobnet ( timm/test_byobnet.r160_in1k, a mix of resnet, darknet, effnet/regnet like blocks)

They are < 0.5M params, insanely fast and originally intended for unit testing w/ real weights. They have awful ImageNet top-1, it's rare to have anyone bother to train a model this small on ImageNet (the classifier is roughly 30-70% of the param count!). However, they are FAST on very limited hadware and you can fine-tune them well on small data. Could be the model you're looking for?
merveย 
posted an update 2 days ago
view post
Post
1969
If you have documents that do not only have text and you're doing retrieval or RAG (using OCR and LLMs), give it up and give ColPali and vision language models a try ๐Ÿค—

Why? Documents consist of multiple modalities: layout, table, text, chart, images. Document processing pipelines often consist of multiple models and they're immensely brittle and slow. ๐Ÿฅฒ

How? ColPali is a ColBERT-like document retrieval model built on PaliGemma, it operates over image patches directly, and indexing takes far less time with more accuracy. You can use it for retrieval, and if you want to do retrieval augmented generation, find the closest document, and do not process it, give it directly to a VLM like Qwen2-VL (as image input) and give your text query. ๐Ÿค

This is much faster + you do not lose out on any information + much easier to maintain too! ๐Ÿฅณ

Multimodal RAG merve/multimodal-rag-66d97602e781122aae0a5139 ๐Ÿ’ฌ
Document AI (made it way before, for folks who want structured input/output and can fine-tune a model) merve/awesome-document-ai-65ef1cdc2e97ef9cc85c898e ๐Ÿ“–
m-ricย 
posted an update 3 days ago
view post
Post
2007
๐Ÿฅณ ๐—ง๐—ฟ๐—ฎ๐—ป๐˜€๐—ณ๐—ผ๐—ฟ๐—บ๐—ฒ๐—ฟ๐˜€ ๐—”๐—ด๐—ฒ๐—ป๐˜๐˜€ ๐—ป๐—ผ๐˜„ ๐˜€๐˜‚๐—ฝ๐—ฝ๐—ผ๐—ฟ๐˜๐˜€ ๐— ๐˜‚๐—น๐˜๐—ถ-๐—ฎ๐—ด๐—ฒ๐—ป๐˜ ๐˜€๐˜†๐˜€๐˜๐—ฒ๐—บ๐˜€!

Multi-agent systems have been introduced in Microsoft's framework Autogen. It simply means having several agents working together to solve your task instead of only one : this paradigm empirically yields better performance on most benchmarks. The reason for this better performance is conceptually simple: for many tasks, rather than using a do-it-all system, you would prefer to specialize units on sub-tasks. Here, having agents with separate tool sets and memories allows to achieve efficient specialization.

You can now easily build hierarchical multi-agent systems with transformers.agents (not released yet, use the dev version)

To do so, encapsulate the agent in a ManagedAgent object. This object needs arguments agent, name, and a description, which will then be embedded in the manager agent's system prompt to let it know how to call this managed agent, as we also do for tools.

Cf the example in the image! We'll keep building on this paradigm in the upcoming weeks ๐Ÿš€

Read more in the doc ๐Ÿ‘‰ https://github.com/huggingface/transformers/blob/main/docs/source/en/agents_advanced.md

Checkout an advanced multi-agent system that tops the GAIA leaderboard ๐Ÿ‘‰ https://github.com/aymeric-roucher/GAIA/blob/main/gaia_multiagent.py