Sebastian Gabarain

Locutusque · SebastianG74019

AI & ML interests

Pushing performance in small language models

Locutusque's activity

upvoted 2 papers 2 months ago

Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On

Paper • 2407.08348 • Published Jul 11 • 51

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

Paper • 2309.03883 • Published Sep 7, 2023 • 33

upvoted an article 4 months ago

Uncensor any LLM with abliteration

Article • Jun 13 • 312

upvoted 3 papers 4 months ago

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

Paper • 2405.19327 • Published May 29 • 43

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 125

LoRA Learns Less and Forgets Less

Paper • 2405.09673 • Published May 15 • 86

upvoted a collection 4 months ago

Yi-1.5 (2024/05)

10 items • Updated May 20 • 89

upvoted an article 4 months ago

Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia?

Article • May 7 • 7

upvoted 2 articles 5 months ago

Introducing the Open Chain of Thought Leaderboard

Article • Apr 23 • 23

Fine-tune Llama 3 with ORPO

Article • Apr 22 • 221

upvoted a collection 5 months ago

Meta Llama 3

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Aug 2 • 673

upvoted a paper 5 months ago

Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck

Paper • 2404.07647 • Published Apr 11 • 4

upvoted a collection 5 months ago

OpenCerebrum-2.0

My open source take on Aether Research's proprietary Cerebrum dataset. • 3 items • Updated Apr 13 • 1

upvoted 2 papers 5 months ago

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Paper • 2404.05961 • Published Apr 9 • 63

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 103

upvoted a paper 6 months ago

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14 • 69

upvoted a collection 6 months ago

Augmentable

A collection of datasets that should be augmented further with gpt-4 • 13 items • Updated Jan 2 • 4

upvoted 2 collections 7 months ago

Hub Models

615 items • Updated 1 day ago • 5

Merges

Experimental LLM merging • 1292 items • Updated Jul 21 • 7

upvoted 4 collections 8 months ago

Qwen1.5

Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated 1 day ago • 205

ZeroGPU Spaces

ZeroGPU Spaces made by the community • 17 items • Updated Jun 6 • 217

Medical Evaluation Datasets

41 items • Updated 29 days ago • 6

Tiny Series

Tiny datasets that empower the foundation of Small Language Models! • 11 items • Updated Jan 26 • 34

upvoted a collection 9 months ago

Pretrained Text-Generation Models Below 250M Parameters

Great candidates for fine-tuning targeting Transformers.js, ordered by number of parameters. • 8 items • Updated Aug 10 • 7

upvoted a paper 9 months ago

WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation

Paper • 2312.14187 • Published Dec 20, 2023 • 49

upvoted 2 collections 9 months ago

smol llama

🚧"raw" pretrained smol_llama checkpoints - WIP 🚧 • 4 items • Updated Apr 29 • 6

Trained Models 🏋️

They may be small, but they're training like giants! • 8 items • Updated May 13 • 16

upvoted a paper 9 months ago

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper • 2312.11514 • Published Dec 12, 2023 • 256

upvoted a collection 9 months ago

InstructWise

InstructWise is a series of models created to act as helpful virtual assistants while maintaining memory efficiency. • 2 items • Updated Dec 3, 2023 • 2