Matt Valoatto PRO

mvaloatto

AI & ML interests

Image classification, image feature extraction, text classification, design, art, tech, science. 🤗 since 2016.

Recent Activity

updated a Space 20 days ago
mvaloatto/TCTF

Organizations

mvaloatto's activity

reacted to victor's post with 🔥 7 months ago
The hype is real: a mysterious gpt2-chatbot model has appeared on the LLM Arena Leaderboard 👀.
It seems to be at least on par with the top performing models (closed and open).

To try it out: https://chat.lmsys.org/ -> then click on the Direct Chat tab and select gpt2-chatbot.

Take your bet, what do you think it is?
reacted to clem's post with 🤗 8 months ago
Introducing gretelai/synthetic_text_to_sql by https://huggingface.co/gretelai

It stands as the largest and most diverse synthetic Text-to-SQL dataset available to date.

The dataset includes:

- 105,851 records partitioned into 100,000 train and 5,851 test records
- ~23M total tokens, including ~12M SQL tokens
- Coverage across 100 distinct domains/verticals
- Comprehensive array of SQL tasks: data definition, retrieval, manipulation, analytics & reporting
- Wide range of SQL complexity levels, including subqueries, single joins, multiple joins, aggregations, window functions, set operations
- Database context, including table and view create statements
- Natural language explanations of what the SQL query is doing
- Contextual tags to optimize model training

Blogpost: https://gretel.ai/blog/synthetic-text-to-sql-dataset
Dataset: gretelai/synthetic_text_to_sql
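A rough sketch of what one record might look like, with hypothetical field names for illustration (the actual schema may differ; see the dataset card):

```python
# Hypothetical sketch of a single synthetic Text-to-SQL record. The field
# names below are assumptions for illustration, not the documented schema.
record = {
    "domain": "healthcare",
    "sql_context": "CREATE TABLE patients (id INT, name TEXT, age INT);",
    "sql_prompt": "List the names of patients older than 65.",
    "sql": "SELECT name FROM patients WHERE age > 65;",
    "sql_explanation": "Filters the patients table on age and returns names.",
}

# The post's partition numbers: 100,000 train + 5,851 test = 105,851 total.
train, test = 100_000, 5_851
assert train + test == 105_851
```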
reacted to sayakpaul's post with 🔥 8 months ago
We released 🧨 Diffusers 0.27.0, and it's a versatile release 💫

Among other things, we shipped:

* Stable Cascade
* Playground v2.5 and EDM-style training
* EDM-formulated schedulers
* Trajectory Consistency Distillation for accelerated sampling
* A new guide on merging LoRAs
* A new image editing pipeline -- LEDITS++
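The LoRA-merging guide covers the Diffusers API; the underlying arithmetic can be sketched in a few lines of NumPy (illustrative names, not Diffusers code): the low-rank update B·A is baked into the frozen base weight.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                      # hidden size, LoRA rank
W = rng.normal(size=(d, d))      # frozen base weight
A = rng.normal(size=(r, d))      # LoRA down-projection
B = rng.normal(size=(d, r))      # LoRA up-projection
scale = 0.5                      # merging strength (often alpha / r)

# Merging "bakes" the low-rank update into the base weight:
W_merged = W + scale * (B @ A)

x = rng.normal(size=(d,))
# The merged weight reproduces base + adapter applied separately.
assert np.allclose(W_merged @ x, W @ x + scale * (B @ (A @ x)))
```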

Check out the release notes to catch everything that went into the release
https://github.com/huggingface/diffusers/releases/tag/v0.27.0

Thanks to everyone who contributed to the release 🤗
replied to their post 9 months ago

Yes, time will tell! Still, good news for the open AI ecosystem 👍

posted an update 9 months ago
reacted to osanseviero's post with 👍 9 months ago
Diaries of Open Source. Part 3! OS goes to the moon!

💻 OpenCodeInterpreter, a family of very powerful code generation models
Models: m-a-p/opencodeinterpreter-65d312f6f88da990a64da456
Paper: OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement (2402.14658)
Demo m-a-p/OpenCodeInterpreter_demo

🔷🔶 Zephyr 7B Gemma, Gemma fine-tuned with the Zephyr recipe
Model: HuggingFaceH4/zephyr-7b-gemma-v0.1
Demo: HuggingFaceH4/zephyr-7b-gemma-chat
GH Repo: https://github.com/huggingface/alignment-handbook

🪆 The MixedBread folks released a 2D Matryoshka text embedding model, which means you can dynamically change the embedding size and layer counts
Model: mixedbread-ai/mxbai-embed-2d-large-v1
Release blog post: https://www.mixedbread.ai/blog/mxbai-embed-2d-large-v1
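A minimal sketch of the truncation side of the idea, assuming the embeddings are trained so that leading coordinates carry most of the signal (illustrative code, not the mxbai API):

```python
import numpy as np

def truncate_embedding(v, dim):
    """Matryoshka-style truncation: keep the first `dim` coordinates,
    then re-normalize so cosine similarity still makes sense."""
    t = v[:dim]
    return t / np.linalg.norm(t)

full = np.array([0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0])
small = truncate_embedding(full, 4)

assert small.shape == (4,)
assert np.isclose(np.linalg.norm(small), 1.0)  # still unit length
```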

๐Ÿ‹Microsoft released Orca Math, which includes 200K grade school math problems
Dataset: microsoft/orca-math-word-problems-200k

🥷 IBM silently released Merlinite, a cool model trained on Mixtral-generated synthetic data using a novel LAB method ibm/merlinite-7b

🌚 Moondream2 - a small vision language model to run on-device!
Model: vikhyatk/moondream2
Demo: vikhyatk/moondream2

๐Ÿ™๏ธCityDreamer: 3D City Generation
Demo: hzxie/city-dreamer
Repo: https://github.com/hzxie/city-dreamer
Model: hzxie/city-dreamer

๐ŸŒML in all languages
Sailor, a family of South-East Asian languages models sail/sailor-language-models-65e19a749f978976f1959825
Samvaad dataset, which includes 140k QA pairs in Hindi, Bengali, Marathi, Tamil, Telugu, Oriya, Punjabi, and Gujarati GenVRadmin/Samvaad-Mixed-Language-2

You can see the previous part at https://huggingface.co/posts/osanseviero/674644082063278
reacted to Xenova's post with 👍 9 months ago
Introducing the 🤗 Transformers.js WebGPU Embedding Benchmark! ⚡️
👉 Xenova/webgpu-embedding-benchmark 👈

On my device, I was able to achieve a 64.04x speedup over WASM! 🤯 How much does WebGPU speed up ML models running locally in your browser? Try it out and share your results! 🚀
reacted to Tonic's post with 👍 9 months ago
Last day on Spaces of the Week, and we made it to last place on trending. I really thought it couldn't get any better, but I'm crying! 😭

The thing I like most about ZeroGPU (`import spaces`) is that I don't have to always check to see if someone decided to test whether I have hard character limits, and it reloads the application flawlessly.

Drop a like on my Spaces here:
Spaces of the Week : https://huggingface.co/spaces/tonic/starcoder2
9 other ZeroGPU demos : https://huggingface.co/tonic
reacted to akhaliq's post with ❤️ 9 months ago
PixArt-Σ

Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation (2403.04692)

In this paper, we introduce PixArt-Σ, a Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution. PixArt-Σ represents a significant advancement over its predecessor, PixArt-α, offering images of markedly higher fidelity and improved alignment with text prompts. A key feature of PixArt-Σ is its training efficiency. Leveraging the foundational pre-training of PixArt-α, it evolves from the 'weaker' baseline to a 'stronger' model via incorporating higher quality data, a process we term "weak-to-strong training". The advancements in PixArt-Σ are twofold: (1) High-Quality Training Data: PixArt-Σ incorporates superior-quality image data, paired with more precise and detailed image captions. (2) Efficient Token Compression: we propose a novel attention module within the DiT framework that compresses both keys and values, significantly improving efficiency and facilitating ultra-high-resolution image generation. Thanks to these improvements, PixArt-Σ achieves superior image quality and user prompt adherence capabilities with significantly smaller model size (0.6B parameters) than existing text-to-image diffusion models, such as SDXL (2.6B parameters) and SD Cascade (5.1B parameters). Moreover, PixArt-Σ's capability to generate 4K images supports the creation of high-resolution posters and wallpapers, efficiently bolstering the production of high-quality visual content in industries such as film and gaming.
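The paper's actual compression module differs in detail, but the general idea of pooling keys and values while keeping full-resolution queries can be sketched like this (toy NumPy, not the PixArt-Σ code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n, d, stride = 16, 8, 2          # tokens, head dim, compression stride

q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
v = rng.normal(size=(n, d))

# Compress keys/values by average-pooling pairs of tokens; queries keep
# full resolution, so the output sequence length is unchanged.
k_c = k.reshape(n // stride, stride, d).mean(axis=1)
v_c = v.reshape(n // stride, stride, d).mean(axis=1)

out = attention(q, k_c, v_c)
assert out.shape == (n, d)       # same sequence length, half the KV cost
```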


reacted to vladbogo's post with 👍 9 months ago
"Multi-LoRA Composition for Image Generation" introduces two new approaches for combining multiple visual elements in text-to-image generation using Low-Rank Adaptations (LoRAs)! ๐ŸŽจ

Key Points:
* Proposes two methods - LoRA Switch and LoRA Composite - that activate/combine LoRAs during the denoising process rather than merging weights
* LoRA Switch cycles through different LoRAs at each step, while LoRA Composite averages guidance from all LoRAs simultaneously
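A toy sketch of the two schedules (the "LoRAs" here are stand-in functions, not the authors' implementation):

```python
# Each "LoRA" below is just a function that nudges a latent value.
def lora_a(x):  return x + 1.0   # stand-in for guidance from LoRA A
def lora_b(x):  return x + 3.0   # stand-in for guidance from LoRA B

loras = [lora_a, lora_b]

def lora_switch(x, steps):
    # LoRA Switch: a different LoRA is active at each denoising step.
    for i in range(steps):
        x = loras[i % len(loras)](x)
    return x

def lora_composite(x, steps):
    # LoRA Composite: average the guidance from all LoRAs at every step.
    for _ in range(steps):
        x = sum(f(x) for f in loras) / len(loras)
    return x

assert lora_switch(0.0, 4) == 8.0        # +1, +3, +1, +3
assert lora_composite(0.0, 1) == 2.0     # mean of (0+1, 0+3)
```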

Paper: Multi-LoRA Composition for Image Generation (2402.16843)
Project page: https://maszhongming.github.io/Multi-LoRA-Composition

Congrats to the authors for their work!
reacted to clefourrier's post with 👍 9 months ago
🔥 New multimodal leaderboard on the hub: ConTextual!

Many situations require models to parse images containing text: maps, web pages, real world pictures, memes, ... 🖼️
So how do you evaluate performance on this task?

The ConTextual team introduced a brand new dataset of instructions and images to test the reasoning capabilities of LMMs (large multimodal models), and an associated leaderboard (with a private test set).

This is super exciting imo because it has the potential to be a good benchmark both for multimodal models and for assistants' vision capabilities, thanks to the instructions in the dataset.

Congrats to @rohan598 , @hbXNov , @kaiweichang and @violetpeng !!

Learn more in the blog: https://huggingface.co/blog/leaderboard-contextual
Leaderboard: ucla-contextual/contextual_leaderboard
reacted to osanseviero's post with 👍 9 months ago
Diaries of Open Source. Part 2. Open Source is going brrrrr

🚀 The European Space Agency releases MajorTOM, a dataset of earth observation covering half the earth. The dataset has 2.5 trillion pixels! Congrats @aliFrancis and @mikonvergence !
Dataset: Major-TOM/Core-S2L2A
Viewer: Major-TOM/MajorTOM-Core-Viewer

๐ŸžRe-ranking models by MixedBreadAI, with very high quality, Apache 2 license, and easy to use!
Models: https://huggingface.co/models?other=reranker&sort=trending&search=mixedbread-ai
Blog: https://www.mixedbread.ai/blog/mxbai-rerank-v1

🧊 StabilityAI and TripoAI release TripoSR, a super-fast MIT-licensed image-to-3D model!
Model: stabilityai/TripoSR
Demo: stabilityai/TripoSR

๐ŸคTogether AI and HazyResearch release Based
Models and datasets: hazyresearch/based-65d77fb76f9c813c8b94339c
GH repo: https://github.com/HazyResearch/based

🌊 LaVague: an open-source pipeline to turn natural language into browser actions! It can run locally with HuggingFaceH4/zephyr-7b-gemma-v0.1
Read more about it at https://huggingface.co/posts/dhuynh95/717319217106504

๐Ÿ†Berkeley Function-Calling Leaderboard
Read about it: https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html
Leaderboard: https://gorilla.cs.berkeley.edu/leaderboard.html

๐ŸฌSailor-Chat: chat models built on top of OpenOrca and @sarahooker CohereForAI Aya project. They can be used for South-East Asia languages such as Indonesian, Thai, Vietnamese, Malay and Lao!
Models: sail/sailor-language-models-65e19a749f978976f1959825
Demo: sail/Sailor-7B-Chat

🤗 Arabic-OpenHermes-2.5: OpenHermes dataset translated to Arabic 2A2I/Arabic-OpenHermes-2.5

See the previous part here https://huggingface.co/posts/osanseviero/622788932781684
reacted to andrewyng's post with 👍 9 months ago
DeepLearning.AI just announced a new short course: Open Source Models with Hugging Face 🤗, taught by Hugging Face's own Maria Khalusova, Marc Sun and Younes Belkada!

As many of you already know, Hugging Face has been a game changer by letting developers quickly grab any of hundreds of thousands of already-trained open source models to assemble into new applications. This course teaches you best practices for building this way, including how to search and choose among models.

You'll learn to use the Transformers library and walk through multiple models for text, audio, and image processing, including zero-shot image segmentation, zero-shot audio classification, and speech recognition. You'll also learn to use multimodal models for visual question answering, image search, and image captioning. Finally, you'll learn how to demo what you build locally, on the cloud, or via an API using Gradio and Hugging Face Spaces.

Thank you very much to Hugging Face's wonderful team for working with us on this.

You can sign up for the course here: https://www.deeplearning.ai/short-courses/open-source-models-hugging-face/
reacted to osanseviero's post with ❤️ 9 months ago
Diaries of Open Source. Part 1.

What a week! Here are some of the exciting Open Source releases of the week!

1. BigCode releases The Stack v2 and StarCoder 2
Resources in https://huggingface.co/posts/loubnabnl/596860170283496
Blog https://huggingface.co/blog/starcoder2
Collection: bigcode/starcoder2-65de6da6e87db3383572be1a

2. Playground v2.5, a very powerful new text-to-image model
Model: playgroundai/playground-v2.5-1024px-aesthetic
Demo: playgroundai/playground-v2.5
Blog: https://playground.com/blog/playground-v2-5

3. Evo: DNA foundation models
Blog: https://arcinstitute.org/news/blog/evo
Models: togethercomputer/evo-1-131k-base

4. OpenHermesPreferences: a dataset of ~1 million AI Preferences argilla/OpenHermesPreferences

5. SpeechBrain 1.0: a toolkit with hundreds of recipes and pretrained models for audio-related tasks, such as speech recognition, diarization, and enhancement. New major release!
HF repos: https://huggingface.co/speechbrain
Website: https://speechbrain.github.io/

6. Tower: a suite of Llama-based multilingual translation models Unbabel/tower-659eaedfe36e6dd29eb1805c

7. AllenAI releases OLMo-7B-Instruct
allenai/olmo-suite-65aeaae8fe5b6b2122b46778

8. DIBT - A crowdsourced effort to human-rate prompts. Its 10k prompts dataset is released: https://huggingface.co/datasets/DIBT/10k_prompts_ranked

9. ChatMusician: A Llama 2 fine-tuned model for music generation m-a-p/ChatMusician

10. Bonito, a model that converts data into synthetic instruction datasets
GitHub: https://github.com/BatsResearch/bonito
Model: BatsResearch/bonito-v1
Paper: Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation (2402.18334)
reacted to akhaliq's post with ❤️ 9 months ago
VisionLLaMA

A Unified LLaMA Interface for Vision Tasks

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks (2403.00522)

Large language models are built on top of a transformer-based architecture to process textual inputs. For example, LLaMA stands out among many open-source implementations. Can the same transformer be used to process 2D images? In this paper, we answer this question by unveiling a LLaMA-like vision transformer in plain and pyramid forms, termed VisionLLaMA, which is tailored for this purpose. VisionLLaMA is a unified and generic modelling framework for solving most vision tasks. We extensively evaluate its effectiveness using typical pre-training paradigms in a good portion of downstream tasks of image perception and especially image generation. In many cases, VisionLLaMA has exhibited substantial gains over the previous state-of-the-art vision transformers. We believe that VisionLLaMA can serve as a strong new baseline model for vision generation and understanding.
reacted to aliFrancis's post with 🤗 9 months ago
🗺 Major TOM: Expandable Datasets for Earth Observation

🚨 RECORD-BREAKING EO DATASET: the largest ever ML-ready Sentinel-2 dataset! It covers almost every single point on Earth captured by the Copernicus Sentinel-2 satellite. @mikonvergence and I are thrilled to finally announce the release of Major-TOM/Core-S2L2A and Major-TOM/Core-S2L1C

๐ŸŒ About half of the entire planet is covered. That's 2,245,886 patches of 1068 x 1068 pixels, available in both L1C and L2A. At 10 m resolution, we've got 256 million square km with over 2.5 trillion pixels. It's all yours with a few lines of code. See the paper linked below ๐Ÿ”ฝ for more info!

🧱 And this is just the beginning. We are currently preparing more datasets from different satellites for the Major TOM org. TOM stands for Terrestrial Observation Metaset - a simple set of rules for building an ecosystem of ML-ready EO datasets, which can be seamlessly combined as if they were Lego bricks.

๐Ÿšดโ€โ™€๏ธ Want to take the dataset for a spin? We have a viewer app on spaces that lets you go anywhere on Earth and shows you the data, if its available Major-TOM/MajorTOM-Core-Viewer

📰 Preprint paper: Major TOM: Expandable Datasets for Earth Observation (2402.12095)
💻 Colab example: https://colab.research.google.com/github/ESA-PhiLab/Major-TOM/blob/main/03-Filtering-in-Colab.ipynb

Thank you to the amazing 🤗 Hugging Face team for the support on this one! @osanseviero @lhoestq @BrigitteTousi
reacted to multimodalart's post with 👍 9 months ago
The Stable Diffusion 3 research paper broken down, including some overlooked details! 📝

Model
๐Ÿ“ 2 base model variants mentioned: 2B and 8B sizes

๐Ÿ“ New architecture in all abstraction levels:
- ๐Ÿ”ฝ UNet; โฌ†๏ธ Multimodal Diffusion Transformer, bye cross attention ๐Ÿ‘‹
- ๐Ÿ†• Rectified flows for the diffusion process
- ๐Ÿงฉ Still a Latent Diffusion Model
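A minimal sketch of the rectified-flow idea, straight-line paths between data and noise with a constant velocity target (illustrative NumPy, not the SD3 training code):

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4,))        # "data" sample
x1 = rng.normal(size=(4,))        # pure noise sample

# Rectified flow: the forward process is a straight line between data
# and noise, and the model regresses the constant velocity along it.
t = 0.3
x_t = (1 - t) * x0 + t * x1       # point on the straight path
velocity_target = x1 - x0         # what the network is trained to predict

# Integrating the true velocity from x_t back to t=0 recovers the data.
assert np.allclose(x_t - t * velocity_target, x0)
```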

📄 3 text-encoders: 2 CLIPs, one T5-XXL; plug-and-play: removing the larger one maintains competitiveness

๐Ÿ—ƒ๏ธ Dataset was deduplicated with SSCD which helped with memorization (no more details about the dataset tho)

Variants
๐Ÿ” A DPO fine-tuned model showed great improvement in prompt understanding and aesthetics
โœ๏ธ An Instruct Edit 2B model was trained, and learned how to do text-replacement

Results
✅ State of the art in automated evals for composition and prompt understanding
✅ Best win rate in human preference evaluation for prompt understanding, aesthetics and typography (missing some details on how many participants and the design of the experiment)

Paper: https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf
reacted to julien-c's post with 👍 9 months ago
What if you could casually access your remote GPU in HF Spaces from the comfort of your local VSCode 🤯
reacted to chiphuyen's post with 🤗 9 months ago
It feels awkward having my first post sharing my stuff, but this is a weekend project that I really enjoyed working on. I'd love to meet more people interested in random ideas like this.

A hard part of building AI applications is choosing which model to use. What if we don't have to? What if we can predict the best model for any prompt?

Predictive human preference aims to predict which model users might prefer for a specific query.

https://huyenchip.com/2024/02/28/predictive-human-preference.html

One use case is model routing. If we know in advance that for a prompt, users will prefer Claude Instant's response over GPT-4, and Claude Instant is cheaper/faster than GPT-4, we can route this prompt to Claude Instant. Model routing has the potential to increase response quality while reducing costs and latency.

One pattern is that for simple prompts, weak models can do (nearly) as well as strong models. For more challenging prompts, however, users are more likely to prefer stronger models. Here's a visualization of predicted human preference for an easy prompt ("hello, how are you?") and a challenging prompt ("Explain why Planck length …").

Preference predictors make it possible to create leaderboards unique to any prompt and domain.
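A toy sketch of such a router; the preference predictor below is a hypothetical stand-in for a trained model, and the threshold and model names are made up for illustration:

```python
def predict_strong_model_win_rate(prompt: str) -> float:
    # Hypothetical heuristic stand-in: a real system would use a trained
    # preference predictor. Here, longer prompts favour the strong model.
    return 0.9 if len(prompt) > 40 else 0.5

def route(prompt: str, threshold: float = 0.6) -> str:
    # If users are unlikely to prefer the strong model's answer anyway,
    # send the prompt to the cheaper/faster model.
    p = predict_strong_model_win_rate(prompt)
    return "strong-model" if p >= threshold else "cheap-model"

assert route("hello, how are you?") == "cheap-model"
assert route("Explain why the Planck length matters for quantum gravity") == "strong-model"
```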
reacted to yjernite's post with 🤗 9 months ago
๐Ÿ‘ท๐Ÿฝโ€โ™€๏ธ๐Ÿ“š๐Ÿ”จ Announcing the Foundation Model Development Cheatsheet!

My first 🤗Post🤗 ever to announce the release of a fantastic collaborative resource to support model developers across the full development stack: The FM Development Cheatsheet available here: https://fmcheatsheet.org/

The cheatsheet is a growing database of the many crucial resources coming from open research and development efforts to support the responsible development of models. This new resource highlights essential yet often underutilized tools in order to make it as easy as possible for developers to adopt best practices, covering among other aspects:
🧑🏼‍🤝‍🧑🏼 data selection, curation, and governance;
📖 accurate and limitations-aware documentation;
⚡ energy efficiency throughout the training phase;
📊 thorough capability assessments and risk evaluations;
🌍 environmentally and socially conscious deployment strategies.

We strongly encourage developers working on creating and improving models to make full use of the tools listed here, and to help keep the resource up to date by adding the resources that you yourself have developed or found useful in your own practice 🤗

Congrats to all the participants in this effort for the release! Read more about it from:
@Shayne - https://twitter.com/ShayneRedford/status/1763215814860186005
@hails and @stellaathena - https://blog.eleuther.ai/fm-dev-cheatsheet/
@alon-albalak - http://nlp.cs.ucsb.edu/blog/a-new-guide-for-the-responsible-development-of-foundation-models.html

And also to @gabrielilharco @sayashk @kklyman @kylel @mbrauh @fauxneticien @avi-skowron @Bertievidgen Laura Weidinger, Arvind Narayanan, @VictorSanh @Davlan @percyliang Rishi Bommasani, @breakend @sasha 🔥