admarcosai's Collections
Alignment: FineTuning-Preference
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper • arXiv:2311.03285 • 28 upvotes
Tailoring Self-Rationalizers with Multi-Reward Distillation
Paper • arXiv:2311.02805 • 3 upvotes
Ultra-Long Sequence Distributed Transformer
Paper • arXiv:2311.02382 • 2 upvotes
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Paper • arXiv:2309.11235 • 16 upvotes
SiRA: Sparse Mixture of Low Rank Adaptation
Paper • arXiv:2311.09179 • 8 upvotes
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Paper • arXiv:2311.13600 • 42 upvotes
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Paper • arXiv:2311.13231 • 26 upvotes
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Paper • arXiv:2311.03099 • 28 upvotes
Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models
Paper • arXiv:2312.07046 • 12 upvotes
"I Want It That Way": Enabling Interactive Decision Support Using Large
Language Models and Constraint Programming
Paper
•
2312.06908
•
Published
•
5
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
Paper • arXiv:2312.06353 • 5 upvotes
TOFU: A Task of Fictitious Unlearning for LLMs
Paper • arXiv:2401.06121 • 14 upvotes
Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models
Paper • arXiv:2401.06102 • 19 upvotes
Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages
Paper • arXiv:2401.05811 • 5 upvotes
LLM Augmented LLMs: Expanding Capabilities through Composition
Paper • arXiv:2401.02412 • 36 upvotes
TrustLLM: Trustworthiness in Large Language Models
Paper • arXiv:2401.05561 • 64 upvotes
Contrastive Preference Learning: Learning from Human Feedback without RL
Paper • arXiv:2310.13639 • 24 upvotes
selfrag/selfrag_train_data
Dataset Viewer • 146k rows • 130 downloads • 66 likes
Efficient Exploration for LLMs
Paper • arXiv:2402.00396 • 21 upvotes
Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models
Paper • arXiv:2401.10716 • 1 upvote
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Paper • arXiv:2401.06080 • 25 upvotes
Secrets of RLHF in Large Language Models Part I: PPO
Paper • arXiv:2307.04964 • 28 upvotes
Transforming and Combining Rewards for Aligning Large Language Models
Paper • arXiv:2402.00742 • 11 upvotes
ReFT: Reasoning with Reinforced Fine-Tuning
Paper • arXiv:2401.08967 • 27 upvotes
SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning
Paper • arXiv:2401.07950 • 4 upvotes
Generative Representational Instruction Tuning
Paper • arXiv:2402.09906 • 51 upvotes
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
Paper • arXiv:2402.10210 • 29 upvotes
RLVF: Learning from Verbal Feedback without Overgeneralization
Paper • arXiv:2402.10893 • 10 upvotes