- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (Paper • 2403.03507 • Published • 189)
- Flora: Low-Rank Adapters Are Secretly Gradient Compressors (Paper • 2402.03293 • Published • 6)
- PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation (Paper • 2401.11316 • Published • 1)
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning (Paper • 2405.12130 • Published • 51)

Collections including paper arxiv:2405.12130

- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (Paper • 2403.03507 • Published • 189)
- RAFT: Adapting Language Model to Domain Specific RAG (Paper • 2403.10131 • Published • 73)
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models (Paper • 2403.13372 • Published • 135)
- InternLM2 Technical Report (Paper • 2403.17297 • Published • 35)

- World Model on Million-Length Video And Language With RingAttention (Paper • 2402.08268 • Published • 41)
- Improving Text Embeddings with Large Language Models (Paper • 2401.00368 • Published • 83)
- Chain-of-Thought Reasoning Without Prompting (Paper • 2402.10200 • Published • 110)
- FiT: Flexible Vision Transformer for Diffusion Model (Paper • 2402.12376 • Published • 49)

- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models (Paper • 2309.12307 • Published • 89)
- NEFTune: Noisy Embeddings Improve Instruction Finetuning (Paper • 2310.05914 • Published • 14)
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (Paper • 2312.15166 • Published • 59)
- Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon (Paper • 2401.03462 • Published • 28)

- LoRA ensembles for large language model fine-tuning (Paper • 2310.00035 • Published • 2)
- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models (Paper • 2310.08659 • Published • 28)
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models (Paper • 2309.14717 • Published • 45)
- LoRA Learns Less and Forgets Less (Paper • 2405.09673 • Published • 89)

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training (Paper • 2403.09611 • Published • 130)
- Evolutionary Optimization of Model Merging Recipes (Paper • 2403.13187 • Published • 59)
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model (Paper • 2402.03766 • Published • 15)
- LLM Agent Operating System (Paper • 2403.16971 • Published • 71)

- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping (Paper • 2402.14083 • Published • 49)
- Linear Transformers are Versatile In-Context Learners (Paper • 2402.14180 • Published • 7)
- Training-Free Long-Context Scaling of Large Language Models (Paper • 2402.17463 • Published • 25)
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (Paper • 2402.17764 • Published • 624)

- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads (Paper • 2401.10774 • Published • 59)
- APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding (Paper • 2401.06761 • Published • 1)
- Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache (Paper • 2401.02669 • Published • 16)
- MambaByte: Token-free Selective State Space Model (Paper • 2401.13660 • Published • 61)

- PockEngine: Sparse and Efficient Fine-tuning in a Pocket (Paper • 2310.17752 • Published • 14)
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters (Paper • 2311.03285 • Published • 32)
- Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization (Paper • 2311.06243 • Published • 22)
- Fine-tuning Language Models for Factuality (Paper • 2311.08401 • Published • 30)

- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models (Paper • 2310.08659 • Published • 28)
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models (Paper • 2309.14717 • Published • 45)
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers (Paper • 2309.16119 • Published • 1)
- LoRA ensembles for large language model fine-tuning (Paper • 2310.00035 • Published • 2)