BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices Paper • 2411.10640 • Published 6 days ago • 37
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 10 items • Updated about 7 hours ago • 172
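Since the SmolLM2 checkpoints are plain causal LMs on the Hub, a minimal sketch of running the compact 360M variant with transformers might look like the following; the repo id `HuggingFaceTB/SmolLM2-360M` is an assumption based on the collection's naming, not taken from the entry above.

```python
# Minimal sketch: running a compact SmolLM2 checkpoint with transformers.
# The repo id below is assumed from the collection naming, not confirmed here.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-360M"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```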
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction Paper • 2410.21169 • Published 24 days ago • 29
OPT Collection OPT (Open Pretrained Transformer) is a series of open-source large causal language models that perform comparably to GPT-3. • 12 items • Updated about 6 hours ago • 4
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Sep 18 • 372
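For the instruction-tuned Qwen2.5 variants, the usual chat-template flow applies; a sketch assuming the repo id `Qwen/Qwen2.5-0.5B-Instruct`, inferred from the size list above:

```python
# Sketch: querying an instruction-tuned Qwen2.5 model via the standard
# chat-template API. Repo id inferred from the collection's size list.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Qwen/Qwen2.5-0.5B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

messages = [{"role": "user", "content": "Summarize MoE routing in one sentence."}]
# apply_chat_template inserts the roles/special tokens the model was tuned on
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```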
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published Sep 12 • 66
Switch-Transformers release Collection This release includes various MoE (Mixture of Experts) models based on the T5 architecture. The base models use from 8 to 256 experts. • 9 items • Updated Jul 31 • 15
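Because these models are T5-based, they load through the seq2seq classes; a sketch assuming the 8-expert base checkpoint `google/switch-base-8`:

```python
# Sketch: running a Switch Transformers checkpoint. Being T5-based, it uses
# the seq2seq classes, and the T5 span-corruption sentinels make a natural probe.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "google/switch-base-8"  # assumed 8-expert base repo id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

inputs = tokenizer("A <extra_id_0> walks into a <extra_id_1>.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0]))
```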
Article Improving Hugging Face Training Efficiency Through Packing with Flash Attention • Aug 21 • 22
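The technique the article covers can be sketched as follows: transformers ships a `DataCollatorWithFlattening` that concatenates examples without padding and supplies position ids, so Flash Attention 2 keeps attention from crossing example boundaries. The model id and toy dataset below are placeholders, not the article's exact setup.

```python
# Sketch of padding-free packing: DataCollatorWithFlattening concatenates
# samples into one sequence and passes position ids, so Flash Attention 2
# (required here, plus a CUDA GPU) keeps attention within each original example.
# Model id and toy dataset are placeholders, not the article's exact setup.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorWithFlattening,
    Trainer,
    TrainingArguments,
)

model_id = "facebook/opt-125m"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="flash_attention_2",  # needed for this collator
    torch_dtype="bfloat16",
)

texts = ["Packing removes padding waste.", "Flash Attention keeps it exact."]
train_dataset = [{"input_ids": tokenizer(t)["input_ids"]} for t in texts]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=2),
    train_dataset=train_dataset,
    data_collator=DataCollatorWithFlattening(),  # also builds shifted labels
)
trainer.train()
```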
Article ∞🧙🏼♂️AnyClassifier - Generating Synthetic Data For Text Classification • By kenhktsui • Aug 19 • 8
Article The case for specialized pre-training: ultra-fast foundation models for dedicated tasks • By Pclanglais • Aug 4 • 26
📈 Scaling Laws with Vocabulary Collection Increase your vocabulary size as you scale up your language model • 5 items • Updated Aug 11 • 4