Lee Junbum PRO

beomi

AI & ML interests

AI/ML GDE. Advancing open-access LLMs for low-resource languages.


beomi's activity

posted an update 21 days ago
# PyTorch == 2.5.0 Breaks Transformers' SDPAttention!

When you encounter "RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans support the graph.",

you can use a workaround like this:

torch.backends.cuda.enable_cudnn_sdp(False)


but this gives up the performance gains from PyTorch 2.5.

It has been addressed (not exactly "fixed", but the default is now to turn off cuDNN SDPA) here -- https://github.com/pytorch/pytorch/pull/138587 -- but that change has not been released yet (you need to install PyTorch from source to get it).

Fastest way for now: `pip install "torch<2.5"`

Ref: https://github.com/huggingface/diffusers/issues/9704#issuecomment-2422585273
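
For reference, here is a minimal sketch of applying the workaround before running a Transformers model; the model name and prompt are placeholders, not from the original post:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Workaround: disable the cuDNN SDPA backend so PyTorch falls back to the
# other scaled-dot-product-attention implementations (flash / efficient / math).
torch.backends.cuda.enable_cudnn_sdp(False)

# Placeholder model; anything loaded with attn_implementation="sdpa" is affected.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", attn_implementation="sdpa").to("cuda")

inputs = tokenizer("Hello", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0]))
```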
reacted to danielhanchen's post with 🤗❤️🚀 4 months ago
Yay we got 500K+ monthly HF downloads on our Unsloth HF repo! :) Super appreciate everyone in the OSS community - and thanks for using Unsloth!!
reacted to Tonic's post with ❤️ 5 months ago
Appreciation post for @osanseviero + Hugging Face staff ( @reach-vb , @merve , and many, many others) who fight hard for weeks and months to fix releases in many organisations and make it easier for us to test out so many things ... 🤗🤗🤗 Thanks for that, folks!
reacted to clem's post with ❤️🚀 5 months ago
Who said you couldn't build a big business based on open-source AI? Congrats Mistral team: https://huggingface.co/mistralai
reacted to maywell's post with 👍🚀 7 months ago
🔥 Transfer a model's chat capability, context length, and knowledge to another model in under a minute, without any training.

Imagine being able to create chat models, expand context, and transfer domain-specific knowledge to models, all within a matter of minutes. Our innovative approach, based on a combination of diff-based techniques and sigmoid ratio calculations, makes this possible.

By considering the diffs between the desired information model (long context or chat) and the base model, as well as the diffs between the base model and the target model, we can efficiently transfer features and expand context without the need for extensive training or resources.

Our method minimizes model degradation and ensures that only the desired information is captured, resulting in high-quality models that can be created with just a single click. Whether you need a chat model, expanded context, or domain-specific knowledge transfer, our approach offers a rapid and effective solution.

In the blog post below, we dive into the details of our method, provide code examples, and showcase the impressive results achieved using our approach. Get ready to revolutionize your model creation process and unlock new possibilities with this powerful technique.

Blog - https://huggingface.co/blog/maywell/llm-feature-transfer
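
As a rough, hypothetical illustration of the diff idea only: the sketch below is a generic task-vector-style transfer, not the code from the blog post; the model names and the constant blend ratio are invented, and the sigmoid-ratio weighting described above is not reproduced.

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical model names: "donor" has the desired feature (chat, long context, ...),
# "base" is the shared ancestor, and "target" is the model that should receive the feature.
donor = AutoModelForCausalLM.from_pretrained("org/donor-chat-model")
base = AutoModelForCausalLM.from_pretrained("org/base-model")
target = AutoModelForCausalLM.from_pretrained("org/target-model")

donor_params = dict(donor.named_parameters())
base_params = dict(base.named_parameters())

alpha = 1.0  # constant blend ratio; the post uses a sigmoid-based ratio instead
with torch.no_grad():
    for name, param in target.named_parameters():
        # the donor-minus-base diff is what encodes the transferred feature
        param.add_(alpha * (donor_params[name] - base_params[name]))

target.save_pretrained("target-with-transferred-feature")
```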
reacted to vikhyatk's post with 🚀🔥 7 months ago
Updated the vikhyatk/lnqa dataset to include images, so you no longer need to separately download them from OpenImages!
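
As a quick way to peek at the updated dataset (streaming is used only to avoid downloading everything, and the "train" split name is an assumption):

```python
from datasets import load_dataset

# Images now ship with the dataset itself, so no separate OpenImages download is needed.
ds = load_dataset("vikhyatk/lnqa", split="train", streaming=True)
sample = next(iter(ds))
print(sample.keys())  # expect an image field alongside the question/answer fields
```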
posted an update 7 months ago
#TPU #PyTorch #Jax

When you're trying to use PyTorch or JAX on a TPU:

- for v2/v3/v4: use `tpu-ubuntu2204-base`
- for v5p: use `v2-alpha-tpuv5`
- for v5e: use `v2-alpha-tpuv5-lite`

You must use these base images for the system to 'boot'.

Previously used tpu-vm-v4-pt-1.13 images might seem to start the VM, but SSH connections do not work.

I thought it was a firewall issue and spent a lot of time on it before realizing it was a problem with the boot image 🥲

https://cloud.google.com/tpu/docs/runtimes#pytorch_and_jax
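
Once the VM boots from one of these base images, a quick sanity check that the runtime actually sees the TPU might look like this (assuming JAX with TPU support is installed on the VM):

```python
import jax

# On a correctly booted TPU VM this should list TpuDevice entries,
# e.g. eight devices for a v4-8 slice.
print(jax.devices())
```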
reacted to tomaarsen's post with 🔥 7 months ago
🚀 Sentence Transformers v2.7.0 is out! Featuring a new loss function, easier Matryoshka model inference & evaluation, CrossEncoder improvements & Intel Gaudi2 Accelerator support. Details:

1๏ธโƒฃ A new loss function: CachedGISTEmbedLoss
This loss function is a combination of CachedMultipleNegativesRankingLoss and the GISTEmbedLoss, both of which are already excellent. The caching mechanism allows for much higher batch sizes with constant memory usage, which boosts training performance. The GIST part introduces a guide model to guide the in-batch negative sample selection. This prevents false negatives, resulting in a stronger training signal.

2๏ธโƒฃ Automatic Matryoshka model truncation
Matryoshka models produce embeddings that are still useful after truncation. However, this truncation always had to be done manually, until now! We've added a truncate_dim option to the Sentence Transformer constructor. This also allows truncation when using HuggingFaceEmbeddings from LlamaIndex or LangChain.

3๏ธโƒฃ Additionally, you can now specify truncate_dim in evaluators to get the performance after truncation. (Hint: it's surprisingly good, even for models not trained with MatryoshkaLoss, and it can speed up e.g. clustering, retrieval, etc.)

4๏ธโƒฃ CrossEncoder improvements
The CrossEncoder now supports 'push_to_hub' to upload trained reranker models to Hugging Face. Additionally, CrossEncoders now support trust_remote_code to load models with custom modelling code.

5๏ธโƒฃ Inference on Intel Gaudi2
If you have an Intel Gaudi2 Accelerator, Sentence Transformers now uses it automatically for even faster inference. No changes are necessary to your code, the device is automatically detected!
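
As a minimal sketch of the new truncate_dim option (the model and dimension are placeholders; any Sentence Transformer model works, though Matryoshka-trained models lose the least quality when truncated):

```python
from sentence_transformers import SentenceTransformer

# Truncate every produced embedding to its first 128 dimensions (requires v2.7.0+).
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", truncate_dim=128)

embeddings = model.encode(["Sentence Transformers v2.7.0 adds truncate_dim"])
print(embeddings.shape)  # (1, 128)
```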

Check out the release notes for all of the details: https://github.com/UKPLab/sentence-transformers/releases/tag/v2.7.0

I'm very excited for the upcoming releases: I'm making great progress with a notable v3 refactor that should heavily improve the training process for embedding models!
replied to their post 7 months ago

I'm testing it with 32K (as claimed in the paper) and 1M sequence lengths.
I'm training those models on the minipile dataset, and for now it seems that minimal continual training (less than 1B tokens) to let the model adapt to the 'memory' could be sufficient.

Training is not finished yet, but once the loss converges I can run the needle-in-a-haystack and inference tests. It won't take long :)

posted an update 7 months ago
🚀 **InfiniTransformer, Gemma/Llama3 based Implementation!** 🌌

> Update @ 2024.04.19: It now supports Llama-3!

> Note: this implementation is unofficial

This implementation is designed to handle virtually infinite context lengths.

Here's the GitHub repo: https://github.com/Beomi/InfiniTransformer

📄 **Read the original Paper:** https://arxiv.org/abs/2404.07143

## **Focus on Infini-Attention**

- **2 Types of Implementation available:** attention-layer-only implementation / model- and train-wise implementation
- **Fixed (segment-dependent) Memory Usage:** Enables training on larger models and longer sequences without the memory overhead typical of standard Transformer implementations.
- **Infinite Context Capability:** Train with unprecedented sequence lengths: imagine handling sequence lengths of up to 1 million tokens on standard hardware!
- You can train Gemma-2B with a 1M sequence length and a 2K segment size on a single H100 GPU.

## **Try InfiniTransformer**

1. **Clone the repository:**

   ```bash
   git clone https://github.com/Beomi/InfiniTransformer
   ```

2. **Install necessary tools:**

   ```bash
   pip install -r requirements.txt
   pip install -e git+https://github.com/huggingface/transformers.git@b109257f4f#egg=transformers
   ```

3. **Dive Deep into Custom Training:**
   - Train with extensive sequence lengths using scripts such as `./train.gemma.infini.noclm.1Mseq.sh`.

For more detailed info, please visit the repo: https://github.com/Beomi/InfiniTransformer

Looking forward to your feedback! 😊

ps. Training loss plot is here 😉
reacted to clefourrier's post with ❤️ 9 months ago
reacted to victor's post with 🤯❤️🤗 9 months ago
🔥 New on HuggingChat: Assistants!

Today we are releasing Assistants on HuggingChat!
Assistants are a fun way to package your prompts and share them with the world - powered by open-source models, of course!

Learn more about Assistants here: huggingchat/chat-ui#357
Browse Assistants here: https://huggingface.co/chat/assistants
reacted to BramVanroy's post with ❤️ 9 months ago
📣 DPO Dutch model release + datasets

After teasing for a while, I am finally releasing **GEITje 7B Ultra**, building upon the great GEITje 7B by @Rijgersberg . New contributions include: large new datasets for SFT (instruction/chat), two datasets for DPO training (i.e. RLAIF), and an SFT and DPO version of GEITje. The READMEs describe everything well (I hope), and I'll also share more info on social media tomorrow.

For me this is a huge release, the datasets more so than the models. I'm especially pleased with UltraChat, which I created with the intent of having a diverse dataset - the model must be able to communicate with different types of users. So the user questions are created as if they were written by different personas, e.g. language learners, young children, experts, critics, etc. The focus with this is "building a good communication bot that is accessible and can handle different kinds of user input".

I wish I could find the time to also write a paper to get some "academic recognition" but that'll have to wait for now. I just want to bring it to the public so that others can play with it and use it to build new, cool stuff!

I hope that you can all appreciate the work. Let's build some cool stuff with it!

Models:
- Demo: https://huggingface.co/spaces/BramVanroy/GEITje-7B-ultra
- DPO Model: BramVanroy/GEITje-7B-ultra
- SFT model (not recommended): BramVanroy/GEITje-7B-ultra-sft

Datasets with GPT-4 turbo completions:
- No robots (~10k instructions): BramVanroy/no_robots_dutch
- UltraChat (~200k instructions): BramVanroy/ultrachat_200k_dutch
- UltraFeedback (DPO with GPT4+GEITje chat, ~50k): BramVanroy/ultra_feedback_dutch
- Orca DPO Pairs (DPO with GPT4+GEITje chat, ~10k): BramVanroy/orca_dpo_pairs_dutch
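
As a minimal, hedged example of trying the DPO model with transformers (the chat-template call assumes the model ships one, and the Dutch prompt is just an illustration):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BramVanroy/GEITje-7B-ultra"  # the DPO model listed above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Hallo! Wie ben jij?"}]  # "Hello! Who are you?"
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```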