
tsessk/llm-course-hw2-dpo
Text Generation • 0.1B
llm course @ HSE and vk llm
A collection of SmolLM-135M models fine-tuned with DPO, PPO, and reward modeling to enhance human-like expressiveness.