Dmitry Ryumin

DmitryRyumin

AI & ML interests

Machine Learning and Applications, Multi-Modal Understanding

Organizations

DmitryRyumin's activity

reacted to singhsidhukuldeep's post with ๐Ÿ”ฅ 18 days ago
view post
Post
1800
Good folks at @nvidia have released exciting new research on normalized Transformers (nGPT) for faster and more efficient language modeling!

Here is what they are proposing:

1. Remove all normalization layers, like RMSNorm or LayerNorm, from the standard Transformer architecture.

2. Normalize all matrices along their embedding dimension after each training step. This includes input and output embeddings, attention matrices (Q, K, V), output projection matrices, and MLP matrices.

3. Replace the standard residual connections with normalized update equations using learnable eigen learning rates for the attention and MLP blocks.

4. Change the softmax scaling factor in the attention mechanism from 1/sqrt of d_k to sqrt of d_k.

5. Implement rescaling and optional normalization of query (q) and key (k) vectors in the attention mechanism using learnable scaling factors.

6. Rescale the intermediate states of the MLP block using learnable scaling factors.

7. Implement rescaling of the output logits using learnable scaling factors.

8. Remove weight decay and learning rate warmup from the optimization process.

9. Initialize the eigen learning rates and scaling factors with appropriate values as specified in the paper.

10. During training, treat all vectors and matrices as residing on a unit hypersphere, interpreting matrix-vector multiplications as cosine similarities.

11. Implement the update equations for the hidden states using the normalized outputs from attention and MLP blocks, controlled by the eigen learning rates.

12. After each forward pass, normalize all parameter matrices to ensure they remain on the unit hypersphere.

13. Use the Adam optimizer without weight decay for training the model.

14. When computing loss, apply the learnable scaling factor to the logits before the softmax operation.

15. During inference, follow the same normalization and scaling procedures as in training.

Excited to see how it scales to larger models and datasets!
reacted to merve's post with ๐Ÿ”ฅ 18 days ago
reacted to albertvillanova's post with ๐Ÿ‘ 23 days ago
view post
Post
1913
๐Ÿšจ Weโ€™ve just released a new tool to compare the performance of models in the ๐Ÿค— Open LLM Leaderboard: the Comparator ๐ŸŽ‰
open-llm-leaderboard/comparator

Want to see how two different versions of LLaMA stack up? Letโ€™s walk through a step-by-step comparison of LLaMA-3.1 and LLaMA-3.2. ๐Ÿฆ™๐Ÿงต๐Ÿ‘‡

1/ Load the Models' Results
- Go to the ๐Ÿค— Open LLM Leaderboard Comparator: open-llm-leaderboard/comparator
- Search for "LLaMA-3.1" and "LLaMA-3.2" in the model dropdowns.
- Press the Load button. Ready to dive into the results!

2/ Compare Metric Results in the Results Tab ๐Ÿ“Š
- Head over to the Results tab.
- Here, youโ€™ll see the performance metrics for each model, beautifully color-coded using a gradient to highlight performance differences: greener is better! ๐ŸŒŸ
- Want to focus on a specific task? Use the Task filter to hone in on comparisons for tasks like BBH or MMLU-Pro.

3/ Check Config Alignment in the Configs Tab โš™๏ธ
- To ensure youโ€™re comparing apples to apples, head to the Configs tab.
- Review both modelsโ€™ evaluation configurations, such as metrics, datasets, prompts, few-shot configs...
- If something looks off, itโ€™s good to know before drawing conclusions! โœ…

4/ Compare Predictions by Sample in the Details Tab ๐Ÿ”
- Curious about how each model responds to specific inputs? The Details tab is your go-to!
- Select a Task (e.g., MuSR) and then a Subtask (e.g., Murder Mystery) and then press the Load Details button.
- Check out the side-by-side predictions and dive into the nuances of each modelโ€™s outputs.

5/ With this tool, itโ€™s never been easier to explore how small changes between model versions affect performance on a wide range of tasks. Whether youโ€™re a researcher or enthusiast, you can instantly visualize improvements and dive into detailed comparisons.

๐Ÿš€ Try the ๐Ÿค— Open LLM Leaderboard Comparator now and take your model evaluations to the next level!
reacted to m-ric's post with ๐Ÿ‘€ 23 days ago
view post
Post
1678
By far the coolest release of the day!
> The Open LLM Leaderboard, most comprehensive suite for comparing Open LLMs on many benchmarks, just released a comparator tool that lets you dig into the detail of differences between any models.

Here's me checking how the new Llama-3.1-Nemotron-70B that we've heard so much compares to the original Llama-3.1-70B. ๐Ÿค”๐Ÿ”Ž

Try it out here ๐Ÿ‘‰ open-llm-leaderboard/comparator
  • 2 replies
ยท
reacted to TuringsSolutions's post with ๐Ÿ‘ 30 days ago
view post
Post
1639
Microsoft released a method that allows you to vectorize word vectors themselves! It is called VPTQ. You can check out their full paper including the method and all of the math for the algorithm, or you can watch this video where I did all of that for you, then reconstructed their entire method within Python!

https://youtu.be/YwlKzV1y62s
ยท
reacted to nyuuzyou's post with ๐Ÿ‘€ 30 days ago
view post
Post
1921
๐ŸŽ“ Introducing Doc4web.ru Documents Dataset - nyuuzyou/doc4web

Dataset highlights:
- 223,739 documents from doc4web.ru, a document hosting platform for students and teachers
- Primarily in Russian, with some English and potentially other languages
- Each entry includes: URL, title, download link, file path, and content (where available)
- Contains original document files in addition to metadata
- Data reflects a wide range of educational topics and materials
- Licensed under Creative Commons Zero (CC0) for unrestricted use

The dataset can be used for analyzing educational content in Russian, text classification tasks, and information retrieval systems. It's also valuable for examining trends in educational materials and document sharing practices in the Russian-speaking academic community. The inclusion of original files allows for in-depth analysis of various document formats and structures.
reacted to tomaarsen's post with ๐Ÿ”ฅ 30 days ago
view post
Post
6019
๐Ÿ“ฃ Sentence Transformers v3.2.0 is out, marking the biggest release for inference in 2 years! 2 new backends for embedding models: ONNX (+ optimization & quantization) and OpenVINO, allowing for speedups up to 2x-3x AND Static Embeddings for 500x speedups at 10-20% accuracy cost.

1๏ธโƒฃ ONNX Backend: This backend uses the ONNX Runtime to accelerate model inference on both CPU and GPU, reaching up to 1.4x-3x speedup depending on the precision. We also introduce 2 helper methods for optimizing and quantizing models for (much) faster inference.
2๏ธโƒฃ OpenVINO Backend: This backend uses Intel their OpenVINO instead, outperforming ONNX in some situations on CPU.

Usage is as simple as SentenceTransformer("all-MiniLM-L6-v2", backend="onnx"). Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be autoexported for you. Thank me later ๐Ÿ˜‰

๐Ÿ”’ Another major new feature is Static Embeddings: think word embeddings like GLoVe and word2vec, but modernized. Static Embeddings are bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks. They're initialized in one of 2 ways:

1๏ธโƒฃ via Model2Vec, a new technique for distilling any Sentence Transformer models into static embeddings. Either via a pre-distilled model with from_model2vec or with from_distillation where you do the distillation yourself. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.
2๏ธโƒฃ Random initialization. This requires finetuning, but finetuning is extremely quick (e.g. I trained with 3 million pairs in 7 minutes). My final model was 6.6% worse than bge-base-en-v1.5, but 500x faster on CPU.

Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.2.0
Documentation on Speeding up Inference: https://sbert.net/docs/sentence_transformer/usage/efficiency.html
  • 1 reply
ยท
reacted to merve's post with ๐Ÿ”ฅ about 1 month ago
view post
Post
3699
Meta AI vision has been cooking @facebook
They shipped multiple models and demos for their papers at @ECCV ๐Ÿค—

Here's a compilation of my top picks:
- Sapiens is family of foundation models for human-centric depth estimation, segmentation and more, all models have open weights and demos ๐Ÿ‘

All models have their demos and even torchscript checkpoints!
A collection of models and demos: facebook/sapiens-66d22047daa6402d565cb2fc
- VFusion3D is state-of-the-art consistent 3D generation model from images

Model: facebook/vfusion3d
Demo: facebook/VFusion3D

- CoTracker is the state-of-the-art point (pixel) tracking model

Demo: facebook/cotracker
Model: facebook/cotracker
reacted to m-ric's post with ๐Ÿ‘ about 1 month ago
view post
Post
3024
๐Ÿ“œ ๐Ž๐ฅ๐-๐ฌ๐œ๐ก๐จ๐จ๐ฅ ๐‘๐๐๐ฌ ๐œ๐š๐ง ๐š๐œ๐ญ๐ฎ๐š๐ฅ๐ฅ๐ฒ ๐ซ๐ข๐ฏ๐š๐ฅ ๐Ÿ๐š๐ง๐œ๐ฒ ๐ญ๐ซ๐š๐ง๐ฌ๐Ÿ๐จ๐ซ๐ฆ๐ž๐ซ๐ฌ!

Researchers from Mila and Borealis AI just have shown that simplified versions of good old Recurrent Neural Networks (RNNs) can match the performance of today's transformers.

They took a fresh look at LSTMs (from 1997!) and GRUs (from 2014). They stripped these models down to their bare essentials, creating "minLSTM" and "minGRU". The key changes:
โถ Removed dependencies on previous hidden states in the gates
โท Dropped the tanh that had been added to restrict output range in order to avoid vanishing gradients
โธ Ensured outputs are time-independent in scale (not sure I understood that well either, don't worry)

โšก๏ธ As a result, you can use a โ€œparallel scanโ€ algorithm to train these new, minimal RNNs, in parallel, taking 88% more memory but also making them 200x faster than their traditional counterparts for long sequences

๐Ÿ”ฅ The results are mind-blowing! Performance-wise, they go toe-to-toe with Transformers or Mamba.

And for Language Modeling, they need 2.5x fewer training steps than Transformers to reach the same performance! ๐Ÿš€

๐Ÿค” Why does this matter?

By showing there are simpler models with similar performance to transformers, this challenges the narrative that we need advanced architectures for better performance!

๐Ÿ’ฌย Franรงois Chollet wrote in a tweet about this paper:

โ€œThe fact that there are many recent architectures coming from different directions that roughly match Transformers is proof that architectures aren't fundamentally important in the curve-fitting paradigm (aka deep learning)โ€

โ€œCurve-fitting is about embedding a dataset on a curve. The critical factor is the dataset, not the specific hard-coded bells and whistles that constrain the curve's shape.โ€

Itโ€™s the Bitter lesson by Rich Sutton striking again: donโ€™t need fancy thinking architectures, just scale up your model and data!

Read the paper ๐Ÿ‘‰ย  Were RNNs All We Needed? (2410.01201)
  • 2 replies
ยท
reacted to merve's post with ๐Ÿ”ฅ about 1 month ago
view post
Post
2689
NVIDIA just dropped a gigantic multimodal model called NVLM 72B ๐Ÿฆ–
nvidia/NVLM-D-72B
Paper page NVLM: Open Frontier-Class Multimodal LLMs (2409.11402)

The paper contains many ablation studies on various ways to use the LLM backbone ๐Ÿ‘‡๐Ÿป

๐Ÿฆฉ Flamingo-like cross-attention (NVLM-X)
๐ŸŒ‹ Llava-like concatenation of image and text embeddings to a decoder-only model (NVLM-D)
โœจ a hybrid architecture (NVLM-H)

Checking evaluations, NVLM-D and NVLM-H are best or second best compared to other models ๐Ÿ‘

The released model is NVLM-D based on Qwen-2 Instruct, aligned with InternViT-6B using a huge mixture of different datasets

You can easily use this model by loading it through transformers' AutoModel ๐Ÿ˜
reacted to merve's post with ๐Ÿ”ฅ about 1 month ago
view post
Post
3955
If you feel like you missed out for ECCV 2024, there's an app to browse the papers, rank for popularity, filter for open models, datasets and demos ๐Ÿ“

Get started at ECCV/ECCV2024-papers โœจ
reacted to their post with ๐Ÿค—๐Ÿ”ฅ about 1 month ago
view post
Post
2457
๐Ÿ”ฅ๐ŸŽญ๐ŸŒŸ New Research Alert - HeadGAP (Avatars Collection)! ๐ŸŒŸ๐ŸŽญ๐Ÿ”ฅ
๐Ÿ“„ Title: HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors ๐Ÿ”

๐Ÿ“ Description: HeadGAP introduces a novel method for generating high-fidelity, animatable 3D head avatars from few-shot data, using Gaussian priors and dynamic part-based modelling for personalized and generalizable results.

๐Ÿ‘ฅ Authors: @zxz267 , @walsvid , @zhaohu2 , Weiyi Zhang, @hellozhuo , Xu Chang, Yang Zhao, Zheng Lv, Xiaoyuan Zhang, @yongjie-zhang-mail , Guidong Wang, and Lan Xu

๐Ÿ“„ Paper: HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors (2408.06019)

๐ŸŒ Github Page: https://headgap.github.io

๐Ÿš€ CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers

๐Ÿš€ WACV-2024-Papers: https://github.com/DmitryRyumin/WACV-2024-Papers

๐Ÿš€ ICCV-2023-Papers: https://github.com/DmitryRyumin/ICCV-2023-Papers

๐Ÿ“š More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

๐Ÿš€ Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

๐Ÿ” Keywords: #HeadGAP #3DAvatar #FewShotLearning #GaussianPriors #AvatarCreation #3DModeling #MachineLearning #ComputerVision #ComputerGraphics #GenerativeAI #DeepLearning #AI
posted an update about 1 month ago
view post
Post
2457
๐Ÿ”ฅ๐ŸŽญ๐ŸŒŸ New Research Alert - HeadGAP (Avatars Collection)! ๐ŸŒŸ๐ŸŽญ๐Ÿ”ฅ
๐Ÿ“„ Title: HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors ๐Ÿ”

๐Ÿ“ Description: HeadGAP introduces a novel method for generating high-fidelity, animatable 3D head avatars from few-shot data, using Gaussian priors and dynamic part-based modelling for personalized and generalizable results.

๐Ÿ‘ฅ Authors: @zxz267 , @walsvid , @zhaohu2 , Weiyi Zhang, @hellozhuo , Xu Chang, Yang Zhao, Zheng Lv, Xiaoyuan Zhang, @yongjie-zhang-mail , Guidong Wang, and Lan Xu

๐Ÿ“„ Paper: HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors (2408.06019)

๐ŸŒ Github Page: https://headgap.github.io

๐Ÿš€ CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers

๐Ÿš€ WACV-2024-Papers: https://github.com/DmitryRyumin/WACV-2024-Papers

๐Ÿš€ ICCV-2023-Papers: https://github.com/DmitryRyumin/ICCV-2023-Papers

๐Ÿ“š More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

๐Ÿš€ Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

๐Ÿ” Keywords: #HeadGAP #3DAvatar #FewShotLearning #GaussianPriors #AvatarCreation #3DModeling #MachineLearning #ComputerVision #ComputerGraphics #GenerativeAI #DeepLearning #AI
reacted to abidlabs's post with โค๏ธ about 1 month ago
view post
Post
4020
๐Ÿ‘‹ Hi Gradio community,

I'm excited to share that Gradio 5 will launch in October with improvements across security, performance, SEO, design (see the screenshot for Gradio 4 vs. Gradio 5), and user experience, making Gradio a mature framework for web-based ML applications.

Gradio 5 is currently in beta, so if you'd like to try it out early, please refer to the instructions below:

---------- Installation -------------

Gradio 5 depends on Python 3.10 or higher, so if you are running Gradio locally, please ensure that you have Python 3.10 or higher, or download it here: https://www.python.org/downloads/

* Locally: If you are running gradio locally, simply install the release candidate with pip install gradio --pre
* Spaces: If you would like to update an existing gradio Space to use Gradio 5, you can simply update the sdk_version to be 5.0.0b3 in the README.md file on Spaces.

In most cases, thatโ€™s all you have to do to run Gradio 5.0. If you start your Gradio application, you should see your Gradio app running, with a fresh new UI.

-----------------------------

Fore more information, please see: https://github.com/gradio-app/gradio/issues/9463
  • 2 replies
ยท
reacted to fdaudens's post with ๐Ÿš€ about 1 month ago
view post
Post
3329
๐Ÿš€ 1,000,000 public models milestone achieved on Hugging Face! ๐Ÿคฏ

This chart by @cfahlgren1 shows the explosive growth of open-source AI. It's not just about numbers - it's a thriving community combining cutting-edge ML with real-world applications. cfahlgren1/hub-stats

Can't wait to see what's next!
  • 2 replies
ยท
reacted to their post with ๐Ÿค—๐Ÿ”ฅ about 1 month ago
view post
Post
1898
๐Ÿš€๐Ÿ•บ๐ŸŒŸ New Research Alert - ECCV 2024 (Avatars Collection)! ๐ŸŒŸ๐Ÿ’ƒ๐Ÿš€
๐Ÿ“„ Title: Expressive Whole-Body 3D Gaussian Avatar ๐Ÿ”

๐Ÿ“ Description: ExAvatar is a model that generates animatable 3D human avatars with facial expressions and hand movements from short monocular videos using a hybrid mesh and 3D Gaussian representation.

๐Ÿ‘ฅ Authors: Gyeongsik Moon, Takaaki Shiratori, and @psyth

๐Ÿ“… Conference: ECCV, 29 Sep โ€“ 4 Oct, 2024 | Milano, Italy ๐Ÿ‡ฎ๐Ÿ‡น

๐Ÿ“„ Paper: MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos (2407.08414)

๐Ÿ“„ Paper: Expressive Whole-Body 3D Gaussian Avatar (2407.21686)

๐ŸŒ Github Page: https://mks0601.github.io/ExAvatar
๐Ÿ“ Repository: https://github.com/mks0601/ExAvatar_RELEASE

๐Ÿš€ CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers

๐Ÿš€ WACV-2024-Papers: https://github.com/DmitryRyumin/WACV-2024-Papers

๐Ÿš€ ICCV-2023-Papers: https://github.com/DmitryRyumin/ICCV-2023-Papers

๐Ÿ“š More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

๐Ÿš€ Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

๐Ÿ” Keywords: #ExAvatar #3DAvatar #FacialExpressions #HandMotions #MonocularVideo #3DModeling #GaussianSplatting #MachineLearning #ComputerVision #ComputerGraphics #DeepLearning #AI #ECCV2024
posted an update about 1 month ago
view post
Post
1898
๐Ÿš€๐Ÿ•บ๐ŸŒŸ New Research Alert - ECCV 2024 (Avatars Collection)! ๐ŸŒŸ๐Ÿ’ƒ๐Ÿš€
๐Ÿ“„ Title: Expressive Whole-Body 3D Gaussian Avatar ๐Ÿ”

๐Ÿ“ Description: ExAvatar is a model that generates animatable 3D human avatars with facial expressions and hand movements from short monocular videos using a hybrid mesh and 3D Gaussian representation.

๐Ÿ‘ฅ Authors: Gyeongsik Moon, Takaaki Shiratori, and @psyth

๐Ÿ“… Conference: ECCV, 29 Sep โ€“ 4 Oct, 2024 | Milano, Italy ๐Ÿ‡ฎ๐Ÿ‡น

๐Ÿ“„ Paper: MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos (2407.08414)

๐Ÿ“„ Paper: Expressive Whole-Body 3D Gaussian Avatar (2407.21686)

๐ŸŒ Github Page: https://mks0601.github.io/ExAvatar
๐Ÿ“ Repository: https://github.com/mks0601/ExAvatar_RELEASE

๐Ÿš€ CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers

๐Ÿš€ WACV-2024-Papers: https://github.com/DmitryRyumin/WACV-2024-Papers

๐Ÿš€ ICCV-2023-Papers: https://github.com/DmitryRyumin/ICCV-2023-Papers

๐Ÿ“š More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

๐Ÿš€ Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

๐Ÿ” Keywords: #ExAvatar #3DAvatar #FacialExpressions #HandMotions #MonocularVideo #3DModeling #GaussianSplatting #MachineLearning #ComputerVision #ComputerGraphics #DeepLearning #AI #ECCV2024
reacted to fdaudens's post with ๐Ÿค— about 1 month ago