ZennyKenny (Kenneth Hamilton)

reacted to jsulz's post with 🚀 about 11 hours ago

Post

1924

In August, the XetHub team joined Hugging Face - https://huggingface.co/blog/xethub-joins-hf - and we’ve been rolling up our sleeves to bring the best of both worlds together. We started with a deep dive into the current state of files stored with Git LFS on the Hub.

Getting this information was no small feat. We had to:
* Analyze a complete database dump of all repositories and files stored in Git LFS across Hugging Face.
* Parse through metadata on file sizes and types to accurately map the storage breakdown across Spaces, Models, and Datasets.
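
As a rough illustration of the second step, here is a minimal Python sketch of a storage breakdown by repository type and file extension. The record layout (`repo_type`, `path`, `size_bytes`) and the sample values are assumptions made up for this example; they are not taken from the actual database dump.

```python
from collections import defaultdict

# Hypothetical file records; the field names and sizes are illustrative only.
SAMPLE_FILES = [
    {"repo_type": "model", "path": "weights.safetensors", "size_bytes": 4_000},
    {"repo_type": "model", "path": "tokenizer.json", "size_bytes": 500},
    {"repo_type": "dataset", "path": "train.parquet", "size_bytes": 2_500},
    {"repo_type": "space", "path": "demo.png", "size_bytes": 1_000},
]


def storage_breakdown(files):
    """Sum stored bytes per (repo_type, file extension) pair."""
    totals = defaultdict(int)
    for record in files:
        path = record["path"]
        # Files without a dot are grouped under an empty extension.
        extension = path.rsplit(".", 1)[-1] if "." in path else ""
        totals[(record["repo_type"], extension)] += record["size_bytes"]
    return dict(totals)


def share_by_repo_type(files):
    """Return each repo type's fraction of the total stored bytes."""
    totals = defaultdict(int)
    for record in files:
        totals[record["repo_type"]] += record["size_bytes"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {repo_type: size / grand_total for repo_type, size in totals.items()}


if __name__ == "__main__":
    print(storage_breakdown(SAMPLE_FILES))
    print(share_by_repo_type(SAMPLE_FILES))
```

Grouping by extension is only a stand-in for "file type"; a real analysis would likely need a more careful mapping.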

You can read more about the findings (with some jaw-dropping stats + charts) here: https://www.linkedin.com/feed/update/urn:li:activity:7244486280351285248

reacted to davanstrien's post with 🚀 1 day ago

Post

1146

huggingface.co/DIBT is dead!

Long live https://huggingface.co/data-is-better-together!

We're working on some very cool projects so we're doing a bit of tidying of the Data is Better Together Hub org 🤓

reacted to ArthurZ's post with 🔥 2 days ago

Post

1980

Native tensor parallel has landed in transformers!!! https://github.com/huggingface/transformers/pull/34184 thanks a lot to the torch team for their support!

Contributions are welcome to support more models! 🔥

reacted to fdaudens's post with 🚀 4 days ago

Post

1153

🪄 MagicQuill: AI that reads your mind for image edits! Point at what bugs you, and it suggests the perfect fixes. No more manual editing headaches. Try it here: AI4Editing/MagicQuill

posted an update 3 months ago

Post

692

Very excited to have made the list and been invited to the London event of OpenAI DevDay 2024 on 30 October! Looking forward to seeing what the future of AI dev holds, connecting with other professionals in the field, and advocating for open source AI!

https://openai.com/devday/

reacted to Taylor658's post with 👍 3 months ago

Post

2345

💡Andrew Ng recently gave a strong defense of Open Source AI models and the need to slow down legislative efforts in the US and the EU to restrict innovation in Open Source AI at Stanford GSB.

🎥See video below
https://youtu.be/yzUdmwlh1sQ?si=bZc690p8iubolXm_

4 replies

replied to Taylor658's post 3 months ago

As usual, Andrew Ng states the cogent position concisely and clearly for people who may not be familiar with the memes of the AI world.

Personally, I think a government committee or agency focused on AI could be a good thing. But having seen regulatory body after regulatory body in the United States fumble well-meaning attempts to stay informed and turn them into suffocating legislation, it seems the only realistic position to advocate is no regulation whatsoever, because any foot-in-the-door oversight or law will simply be warped into red tape and bureaucracy by the ever-changing winds of the election cycle.

replied to KingNish's post 3 months ago

Looking forward to trying!

replied to merve's post 3 months ago

Thank you for sharing.

reacted to merve's post with 🔥 3 months ago

Post

3924

New smol-vision tutorial dropped: QLoRA fine-tuning IDEFICS3-Llama 8B on VQAv2 🐶

Learn how to efficiently fine-tune the latest IDEFICS3-Llama on visual question answering in this notebook 📖
Fine-tuning notebook: https://github.com/merveenoyan/smol-vision/blob/main/Idefics_FT.ipynb
Resulting model: merve/idefics3llama-vqav2

3 replies

reacted to severo's post with 🚀 4 months ago

Post

3437

[New tool] Follow interesting ML persons 👩‍🎨 👨‍🎤 👩‍🏫 with Followgraph

severo/followgraph

Please try it and tell me if it helped you discover high-quality content 👍 👎

I repurposed "Followgraph for Mastodon" (https://followgraph.vercel.app/).

My new follows: @TheBloke @mlabonne @teknium @KnutJaegersberg @SkalskiP @AmelieSchreiber @lbourdois @ceyda @andrewyng @Pclanglais @karpathy

And you?

5 replies

reacted to nroggendorff's post with 😎 4 months ago

Post

4080

Datasets are down, I offer a solution

git lfs install

git clone https://huggingface.co/datasets/{dataset/id}

from datasets import load_dataset

dataset = load_dataset("id")

reacted to qnguyen3's post with 🔥 5 months ago

Post

3738

nanoLLaVA-1.5 is here! Same size (1B), better performance 🔥🔥🔥
And it is much more powerful than v1.0
Try it out now on HF Spaces: qnguyen3/nanoLLaVA
Model: qnguyen3/nanoLLaVA-1.5

3 replies

reacted to dvilasuero's post with 🚀 5 months ago

Post

7939

Today is a huge day in Argilla’s history. We couldn’t be more excited to share this with the community: we’re joining Hugging Face!

We’re embracing a larger mission, becoming part of a brilliant and kind team and a shared vision about the future of AI.

Over the past year, we’ve been collaborating with Hugging Face on countless projects: becoming a launch partner for Docker Spaces, empowering the community to clean Alpaca translations into Spanish and other languages, launching argilla/notus-7b-v1 building on Zephyr’s learnings, running the Data is Better Together initiative with hundreds of community contributors, and releasing argilla/OpenHermesPreferences, one of the largest open preference tuning datasets.

After more than 2,000 Slack messages and over 60 people collaborating for over a year, it already felt like we were part of the same team, pushing in the same direction. After a week of the smoothest transition you can imagine, we’re now the same team.

To those of you who’ve been following us, this won’t be a huge surprise, but it will be a big deal in the coming months. This acquisition means we’ll double down on empowering the community to build and collaborate on high quality datasets, we’ll bring full support for multimodal datasets, and we’ll be in a better place to collaborate with the Open Source AI community. For enterprises, this means that the Enterprise Hub will unlock highly requested features like single sign-on and integration with Inference Endpoints.

As a founder, I am proud of the Argilla team. We're now part of something bigger and a larger team but with the same values, culture, and goals. Grateful to have shared this journey with my beloved co-founders Paco and Amélie.

Finally, huge thanks to the Chief Llama Officer @osanseviero for sparking this and being such a great partner during the acquisition process.

Would love to answer any questions you have so feel free to add them below!

28 replies

reacted to fdaudens's post with 🔥 6 months ago

Post

1399

Impressed by the work of @guipenedo @hynky @loubnabnl @anton-l @craffel @lvwerra @thomwolf on FineWeb.

LLMs are only as good as the data they have been trained on, but the crucial aspect of pretraining data remains obscure. Our approach lifts the veil on building high-quality pretraining datasets by sharing every detail about this process to enable a wider community to build on top of it.

- The FineWeb-Edu dataset, which outperforms all openly accessible web datasets in a number of educational benchmarks. We built it by developing a quality classifier using annotations generated by an LLM.

- A new technical report explaining in detail how to create a large and high-quality web-scale dataset for LLM pretraining such as FineWeb.

👉 HuggingFaceFW/blogpost-fineweb-v1

replied to alielfilali01's post 6 months ago

Great achievement, congratulations to the entire team!

reacted to alielfilali01's post with 🔥 6 months ago

Post

1063

The 100-model milestone on the OALL/Open-Arabic-LLM-Leaderboard was reached within 10 days of the leaderboard's release 🥳

meta-llama/Meta-Llama-3-70B-Instruct is still the king of the leaderboard 👑 with a 3.46-point lead over the runner-up CohereForAI/c4ai-command-r-plus, which took 2nd place 🥈 from its younger brother CohereForAI/c4ai-command-r-v01. The latter now sits in 5th place, just behind Ashmal/MBZUAI-oryx (3rd place 🥉, AFAIK an experimental model from MBZUAI) and https://huggingface.co/core42/jais-30b-chat-v3 (4th place) from Core42.

PS: I should consider a career in sports commentary 😂
Would you recommend me to BeIN Sports 😀?

1 reply

reacted to davanstrien's post with 🔥 6 months ago

Post

1852

I've begun adding valuable blog posts on using/creating synthetic datasets to my curated list.

I am starting with a great post by @MoritzLaurer on utilizing an open LLM to generate data for training a specialized Roberta model.

Read the blog post: https://huggingface.co/blog/synthetic-data-save-costs
See the rest of the list: https://github.com/davanstrien/awesome-synthetic-datasets

posted an update 6 months ago

Post

1172

Thanks to the incredible collaboration of 14 community annotators, @davanstrien of HF, and @dvilasuero et al. of Argilla, DIBT (https://huggingface.co/DIBT) is pleased to release a Russian-language dataset of 500 of the best curated LLM prompts, translated into Russian: https://huggingface.co/datasets/DIBT/MPEP_RUSSIAN.

More to come from the MPEP initiative! Interested in annotating or leading a language team? https://github.com/huggingface/data-is-better-together/tree/main/prompt_translation

2 replies

reacted to davanstrien's post with 🔥 7 months ago

Post

2254

Only 14 languages have DPO-style preference datasets on the Hugging Face Hub (https://huggingface.co/spaces/DIBT/preference_data_by_language). Let's improve that! How?

The Cohere For AI Aya dataset CohereForAI/aya_dataset has human-annotated prompt-completion pairs in 71 languages. We can use this to create DPO datasets for more languages!

Using Aya's prompt/response pairs as a starting point we can use an LLM to generate an additional response to each prompt. We then use an LLM Judge to rank each response.

✅ In some/many languages, human responses may be better than LLM ones but we may want to check that assumption for some languages.
🚀 We use Argilla's distilabel library to push data to Argilla for validation. This also allows us to determine if an LLM judge is effective for different languages.

As an example of what this pipeline produces:
- https://huggingface.co/datasets/DIBT/aya_dutch_dpo a DPO-style dataset for Dutch using Llama 3 as a generator/judge LM.
- An annotation Space that anyone with a HF account can contribute to: https://dibt-demo-argilla-space.hf.space/dataset/924ef8a8-a447-4563-8806-0e2a668a5314/annotation-mode?page=1&status=pending

As part of Data is Better Together we want to build more DPO datasets. Join us here: https://github.com/huggingface/data-is-better-together#4-dpoorpo-datasets-for-more-languages 🤗
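
The pipeline described in this post (generate a second response to each prompt, then have a judge rank the two) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the distilabel pipeline used by the project: `generate` and `judge` are hypothetical callables standing in for the generator and judge LLMs, and the output keys (`prompt`, `chosen`, `rejected`) follow a common DPO dataset layout rather than anything confirmed by the post.

```python
from typing import Callable, Dict


def build_preference_pair(
    prompt: str,
    human_response: str,
    generate: Callable[[str], str],
    judge: Callable[[str, str], float],
) -> Dict[str, str]:
    """Turn one prompt/response pair into a DPO-style preference record.

    `generate` and `judge` are hypothetical stand-ins for LLM calls:
    `generate(prompt)` returns a new response, and `judge(prompt, response)`
    returns a score where higher means better.
    """
    generated_response = generate(prompt)
    human_score = judge(prompt, human_response)
    generated_score = judge(prompt, generated_response)

    # Ties go to the human response; this choice is an assumption of the sketch.
    if generated_score > human_score:
        chosen, rejected = generated_response, human_response
    else:
        chosen, rejected = human_response, generated_response

    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model.
    def toy_generate(prompt: str) -> str:
        return "ok"

    def toy_judge(prompt: str, response: str) -> float:
        return float(len(response))

    print(build_preference_pair("Wat is DPO?", "Een uitgebreid antwoord.", toy_generate, toy_judge))
```

Because the judge's ranking may not be reliable in every language, the records would still need the human validation step the post describes.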
