Just tested Argilla's new data annotation feature - it's a game changer for AI project quality.
Upload CSVs, work with published datasets, or improve existing ones directly on HuggingFace Hub. Setup took < 2 minutes, no code needed (see the example below, where I selected a dataset to classify tweets into categories).
Real-world impact: Missing in Chicago won a Pulitzer using a similar approach - 200 volunteers labeled police misconduct files to train their model. That's the power of good data annotation.
Three immediate use cases I see:
- Build collaborative training sets with your community (surprisingly underused in AI journalism)
- Turn your website chatbot logs into high-quality fine-tuning data
- Compare generated vs published content (great for SEO headlines)
Works for solo projects or teams up to 100 people. All integrated with HuggingFace Hub for immediate model training.
Interesting to see tools like this making data quality more accessible. Data quality is the hidden driver of AI success that we don't talk about enough.
Dive into multi-model evaluations, pinpoint the best model for your needs, and explore insights across top open LLMs, all in one place. Ready to level up your model comparison game?
👷🏽‍♀️ Announcing the Foundation Model Development Cheatsheet!
My first 🤗Post🤗 ever to announce the release of a fantastic collaborative resource to support model developers across the full development stack: the FM Development Cheatsheet, available here: https://fmcheatsheet.org/
The cheatsheet is a growing database of the many crucial resources coming from open research and development efforts to support the responsible development of models. This new resource highlights essential yet often underutilized tools in order to make it as easy as possible for developers to adopt best practices, covering among other aspects:
- 🧑🏼‍🤝‍🧑🏼 data selection, curation, and governance;
- accurate and limitations-aware documentation;
- ⚡ energy efficiency throughout the training phase;
- thorough capability assessments and risk evaluations;
- environmentally and socially conscious deployment strategies.
We strongly encourage developers working on creating and improving models to make full use of the tools listed here, and to help keep the resource up to date by adding the resources that you yourself have developed or found useful in your own practice 🤗
🤯 Plot twist: Size isn't everything in AI! A lean 32B parameter model just showed up to the party and outperformed a 70B one. Efficiency > Scale? The AI world just got more interesting...
Cohere For AI released Aya Expanse, a new family of multilingual models (8B and 32B) spanning 23 popular languages.
This is no Woodstock AI but will be fun nonetheless haha. I'll be hosting a live workshop with team members next week about the Enterprise Hugging Face Hub.
1,000 spots available, first-come first-served, with some surprises during the stream!
I feel like this incredible resource hasn't gotten the attention it deserves in the community!
@clefourrier and the HuggingFace evaluation team put together a fantastic guidebook covering a lot about evaluation, from the basics to advanced tips.
Collaboration between UK AI Safety Institute and Gray Swan AI to create a dataset for measuring harmfulness of LLM agents.
The benchmark contains both harmful and benign sets across 11 categories with varied difficulty levels and detailed evaluation, testing not only success rate but also tool-level accuracy.
We provide refusal and accuracy metrics across a wide range of models in both no-attack and prompt-attack scenarios.
Just started going through the latest "State of AI Report 2024", and I cannot get over the predictions!
The report predicts major developments in AI over the next 12 months, including a $10B+ investment from a sovereign state into a large US AI lab, triggering national security scrutiny, and a viral app created by someone without coding skills.
It forecasts changes in data collection practices due to frontier labs facing trials, softer-than-expected EU AI Act implementation, and the rise of an open-source alternative to OpenAI's GPT-4 that outperforms it on benchmarks.
NVIDIA's dominance will remain largely unchallenged, investment in humanoid robots will decline, Apple's on-device AI research will gain momentum, and a research paper written by an AI Scientist will be accepted at a major conference.
Lastly, a GenAI-based video game is expected to achieve breakout success.
Yet to go through all 200+ pages... will post summarized thoughts later.
Open-source AI creates healthy competition in a field where natural tendencies lead to extreme concentration of power. Imagine a world where only one or two companies could build software. This is the biggest risk and ethical challenge of them all IMO. Let's fight this!
Meta AI vision has been cooking @facebook! They shipped multiple models and demos for their papers at @ECCV 🤗
Here's a compilation of my top picks:
- Sapiens is a family of foundation models for human-centric depth estimation, segmentation, and more. All models have open weights, demos, and even torchscript checkpoints! A collection of models and demos: facebook/sapiens-66d22047daa6402d565cb2fc
- VFusion3D is a state-of-the-art consistent 3D generation model from images
Microsoft researchers dropped a groundbreaking technique that could slash the energy use of transformer computations: their novel "linear-complexity multiplication" (L-Mul) algorithm approximates floating-point multiplication using energy-efficient integer additions instead of costly multiplications.
💡 Quick reminder on how floats are coded on 8 bits (FP8): in the e4m3 FP8 standard, you encode a number as:
Sign (1 bit) | Exponent (4 bits) | Mantissa (3 bits)
Example: 0 (positive) | 1000 (8) | 101 (1/2 + 1/8 = 0.625)
Calculation: you add one to the mantissa, and multiply it by 2 to the power (the exponent - a bias term, which is 7 for e4m3):
➡️ You get (1 + 0.625) × 2^(8-7) = 3.25
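To make the encoding concrete, here's a tiny decoding sketch in Python (my own illustration, not from the paper; it handles normal e4m3 values only and skips subnormals and NaN):

```python
# Decode a normal FP8 e4m3 byte into a Python float.
def decode_e4m3(byte: int) -> float:
    sign = -1.0 if (byte >> 7) & 0x1 else 1.0
    exponent = (byte >> 3) & 0xF   # 4-bit exponent field
    mantissa = byte & 0x7          # 3-bit mantissa field
    bias = 7                       # e4m3 exponent bias
    # Implicit leading 1; the 3 mantissa bits are eighths.
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - bias)

# The worked example above: 0 | 1000 | 101 -> 3.25
assert decode_e4m3(0b01000101) == 3.25
```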
Now back to the paper. Key insights:
โก๏ธ Multiplication is extremely energy-intensive compared to addition. For 32-bit operations, multiplication (3.7 pJ) uses 37x more energy than addition (0.1 pJ)!
🧮 Traditional floating-point multiplication goes like this (writing xm for the mantissa and xe for the exponent): Mul(x,y) = (1 + xm) · 2^xe · (1 + ym) · 2^ye = (1 + xm + ym + xm · ym) · 2^(xe+ye)
💡 L-Mul cleverly approximates this as: L-Mul(x,y) = (1 + xm + ym + 2^-l(m)) · 2^(xe+ye), eliminating the costly xm · ym term
🧠 The l(m) term is set adaptively based on mantissa size for optimal accuracy
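A quick numeric sketch of the approximation (my own illustration on (mantissa, exponent) pairs, not the paper's bit-level integer-adder implementation; the l(m) schedule below is my reading of the paper):

```python
# L-Mul approximation vs exact multiplication, on values written as
# (1 + mantissa) * 2**exponent with mantissa in [0, 1).

def l_of_m(m_bits: int) -> int:
    # Offset schedule (my reading of the paper): l(m) = m if m <= 3,
    # 3 if m == 4, and 4 for larger mantissas.
    if m_bits <= 3:
        return m_bits
    return 3 if m_bits == 4 else 4

def exact_mul(xm, xe, ym, ye):
    return (1 + xm) * (1 + ym) * 2.0 ** (xe + ye)

def l_mul(xm, xe, ym, ye, m_bits=3):
    # Drop the costly xm*ym cross term, compensate with 2**-l(m).
    return (1 + xm + ym + 2.0 ** -l_of_m(m_bits)) * 2.0 ** (xe + ye)

# Two e4m3-style operands, each 1.625 * 2^1 = 3.25:
print(exact_mul(0.625, 1, 0.625, 1))  # 10.5625
print(l_mul(0.625, 1, 0.625, 1))      # 9.5 (error is largest for big mantissas)
```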
📊 Benchmarks on the Llama-3.1-8B-Instruct model show L-Mul preserves precision across various NLP tasks, with performance nearly identical to full BFloat16 precision
💬 Authors claim: "We can achieve the same model inference performance while reducing the energy cost of attention computations by 80%."
This breakthrough is still theoretical and would need implementation on dedicated hardware to confirm real-world gains, but it's a really exciting path for more sustainable AI! 🌱
Maybe, like me, you have always wanted a super easy way to compare llama3.2-1B vs. llama3.2-3B? Or the same model with different temperatures?
Trying and comparing warm Inference API models has never been easier! Just go to https://hf.co/playground, set your token, and you're ready to go. We'll keep improving, feedback welcome!
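If you'd rather script the same comparison, here's a minimal sketch using huggingface_hub's InferenceClient (the model IDs, prompt, and parameters are just placeholder examples, not from the playground itself):

```python
# Minimal sketch: query two Inference API models with the same prompt.
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_...")  # paste your HF token here

messages = [{"role": "user", "content": "Explain beam search in one sentence."}]
for model in ("meta-llama/Llama-3.2-1B-Instruct",
              "meta-llama/Llama-3.2-3B-Instruct"):
    out = client.chat_completion(
        messages=messages,
        model=model,
        max_tokens=100,
        temperature=0.7,  # vary this to compare sampling settings too
    )
    print(f"--- {model} ---\n{out.choices[0].message.content}\n")
```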