M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework Paper • 2411.06176 • Published 3 days ago • 28
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique Paper • 2408.10701 • Published Aug 20 • 10
WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models Paper • 2408.03837 • Published Aug 7 • 17
Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming Paper • 2406.11654 • Published Jun 17 • 6
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling Paper • 2406.11617 • Published Jun 17 • 8
Reward Steering with Evolutionary Heuristics for Decoding-time Alignment Paper • 2406.15193 • Published Jun 21 • 12
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic Paper • 2402.11746 • Published Feb 19 • 2
Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations Paper • 2406.11801 • Published Jun 17 • 15
SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation Paper • 2405.18503 • Published May 28 • 9
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization Paper • 2404.09956 • Published Apr 15 • 11
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment Paper • 2308.09662 • Published Aug 18, 2023 • 3
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning Paper • 2307.02053 • Published Jul 5, 2023 • 23
INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models Paper • 2306.04757 • Published Jun 7, 2023 • 6