Sayak Paul

sayakpaul

AI & ML interests

Diffusion models, representation learning

Articles

Organizations

sayakpaul's activity

posted an update about 1 month ago
view post
Post
2418
Did some little experimentation to resize pre-trained LoRAs on Flux. I explored two themes:

* Decrease the rank of a LoRA
* Increase the rank of a LoRA

The first one is helpful in reducing memory requirements if the LoRA is of a high rank, while the second one is merely an experiment. Another implication of this study is in the unification of LoRA ranks when you would like to torch.compile() them.

Check it out here:
sayakpaul/flux-lora-resizing
  • 1 reply
Β·
reacted to dn6's post with ❀️ 2 months ago
view post
Post
2511
Sharing for anyone using Diffusers from_single_file loading and affected by the Runway SD 1.5 issue.

If you have runwayml/stable-diffusion-v1-5 saved locally in your HF cache then loading single file checkpoints in the following way should still work.

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file("<url or path to single file checkpoint>")


If you do not have the model repo saved in your cache, then automatically inferring the pipeline config will not work since the reference repo runwayml/stable-diffusion-v1-5 doesn't exist anymore.

You can use an alternative SD1.5 repo id to still configure your pipeline.

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file("<url or path to single file checkpoint>", config="Lykon/DreamShaper")


We're working on resolving the issue ASAP.
  • 2 replies
Β·
posted an update 3 months ago
posted an update 3 months ago
view post
Post
4450
Flux.1-Dev like images but in fewer steps.

Merging code (very simple), inference code, merged params: sayakpaul/FLUX.1-merged

Enjoy the Monday πŸ€—
Β·
posted an update 3 months ago
view post
Post
3777
With larger and larger diffusion transformers coming up, it's becoming increasingly important to have some good quantization tools for them.

We present our findings from a series of experiments on quantizing different diffusion pipelines based on diffusion transformers.

We demonstrate excellent memory savings with a bit of sacrifice on inference latency which is expected to improve in the coming days.

Diffusers 🀝 Quanto ❀️

This was a juicy collaboration between @dacorvo and myself.

Check out the post to learn all about it
https://huggingface.co/blog/quanto-diffusers
Β·
reacted to alex-abb's post with πŸ”₯ 5 months ago
view post
Post
4751
Hi everyone!
I'm Alex, I'm 16, I've been an internship at Hugging Face for a little over a week and I've already learned a lot about using and prompting LLM models. With @victor as tutor I've just finished a space that analyzes your feelings by prompting an LLM chat model. The aim is to extend it so that it can categorize hugging face posts.

alex-abb/LLM_Feeling_Analyzer
Β·
posted an update 5 months ago
posted an update 5 months ago
view post
Post
3112
What is your favorite part of our Diffusers integration of Stable Diffusion 3?

My personal favorite is the ability to run it on a variety of different GPUs with minimal code changes.

Learn more about them here:
https://huggingface.co/blog/sd3
replied to lunarflu's post 5 months ago
posted an update 6 months ago
view post
Post
1844
🧨 Diffusers 0.28.0 is out πŸ”₯

It features the first non-generative pipeline of the library -- Marigold πŸ₯

Marigold shines at performing Depth Estimation and Surface Normal Estimation. It was contributed by @toshas , one of the authors of Marigold.

This release also features a massive refactor (led by @DN6 ) of the from_single_file() method, highlighting our efforts for making our library more amenable to community features πŸ€—

Check out the release notes here:
https://github.com/huggingface/diffusers/releases/tag/v0.28.0
reacted to lunarflu's post with ❀️ 6 months ago
view post
Post
1881
cooking up something....anyone interested in a daily activity tracker for HF?
Β·
posted an update 6 months ago
view post
Post
2015
Custom pipelines and components in Diffusers 🎸

Wanted to use customized pipelines and other components (schedulers, unets, text encoders, etc.) in Diffusers?

Found it inflexible?

Since the first dawn on earth, we have supported loading custom pipelines via a custom_pipeline argument πŸŒ„

These pipelines are inference-only, i.e., the assumption is that we're leveraging an existing checkpoint (e.g., runwayml/stable-diffusion-v1-5) and ONLY modifying the pipeline implementation.

We have many cool pipelines, implemented that way. They all share the same benefits available to a DiffusionPipeline, no compromise there πŸ€—

Check them here:
https://github.com/huggingface/diffusers/tree/main/examples/community

Then we might have a requirement of everything customized i.e., custom components along with a custom pipeline. Sure, that's all possible.

All you have to do is keep the implementations of those custom components on the Hub repository you're loading your pipeline checkpoint from.

SDXL Japanese was implemented like this πŸ”₯
stabilityai/japanese-stable-diffusion-xl

Full guide is available here ⬇️
https://huggingface.co/docs/diffusers/main/en/using-diffusers/custom_pipeline_overview

And, of course, these share all the benefits that come with DiffusionPipeline.
reacted to chansung's post with β€οΈπŸ€— 7 months ago
view post
Post
4389
πŸ’» Smoothing the Transition from Service LLM to Local LLM

Imagine your go-to LLM service is down, or you need to use it offline – yikes! This project is all about having that "Plan B" ready to go. Here's LLaMA Duo I've been building with @sayakpaul :

✨ Fine-tune a smaller LLM: We used Hugging Face's alignment-handbook to teach a smaller-sized LLM to mimic my favorite large language model. Think of it as that super-smart AI assistant getting a capable understudy.

πŸ€– Batch Inference: Let's get that fine-tuned LLM working! My scripts generate lots of text like a champ, and we've made sure things run smoothly even with bigger workloads.

🧐 Evaluation: How well is my small LLM doing? We integrated with the Gemini API to use it as an expert judge – it compares my model's work to the original. Talk about a tough critic!

πŸͺ„ Synthetic Data Generation: Need to boost that model's performance? Using Gemini's feedback, we can create even more training data, custom-made to make the LLM better.

🧱 Building Blocks: This isn't just a one-time thing – it's a toolkit for all kinds of LLMOps work. Want to change your evaluation metrics? Bring in models trained differently? Absolutely, let's make it happen.

Why this project is awesome:

πŸ’ͺ Reliability: Keep things running no matter what happens to your main LLM source.
πŸ”’ Privacy: Process sensitive information on your own terms.
πŸ—ΊοΈ Offline capable: No internet connection? No problem!
πŸ•°οΈ Version Control: Lock in your favorite LLM's behavior, even if the service model changes.

We'm excited to share the code on GitHub. Curious to see what you all think! πŸ‘‰πŸ» https://github.com/deep-diver/llamaduo
posted an update 7 months ago
posted an update 8 months ago
view post
Post
2381
Worked on a short blog post discussing how we semi-automated the release process of the diffusers library. The post delves deeper into the workflows responsible for:

* Publishing the package on Test PyPI and main PyPI servers.
* Notifying an internal Slack channel after a release is published on the repository.

Check it out here πŸ‘‰
https://sayak.dev/posts/streamlined-releases.html
posted an update 8 months ago
view post
Post
1909
How about engaging in a creative chat with your favorite video character? πŸ’¬

@chansung and I worked on a weekend project combining the benefits of Gemini 1.0 and powerful chat models like Zephyr to demo this.

We use Gemini 1.0 to produce the personality traits of any character found in an input video. We then prepare a system prompt with the discovered traits to start chatting with an LLM (Zephyr in this case).

Managing a video captioning model is a little out of our expertise, hence Gemini FTW here πŸ˜Άβ€πŸŒ«οΈ

πŸ‘¨β€πŸ’» Code: https://github.com/deep-diver/Vid2Persona
πŸ€— Demo: chansung/vid2persona
reacted to chiphuyen's post with ❀️ 8 months ago
posted an update 8 months ago
view post
Post
We released 🧨 Diffusers 0.27.0, and it's a versatile release πŸ’«

Among other things, we shipped:

* Stable Cascade
* Playground v2.5 and EDM-style training
* EDM-formulated schedulers
* Trajectory Consistency Distillation for accelerated sampling
* A new guide on merging LoRAs
* A new image editing pipeline -- LEDITS++

Check out the release notes to catch everything that went into the release
https://github.com/huggingface/diffusers/releases/tag/v0.27.0

Thanks to everyone that contributed to the release πŸ€—
replied to chansung's post 10 months ago
view reply

I mean we should be able to make the most out of the GPU by reducing the idle-time as much as possible while also ensuring the throughput is really the highest we can get out of the card.

For example, if we are getting 60 QPS, is that the highest we can get out of the card? Is it the maximum limit?