Question Answering
PEFT
Safetensors
English
rahulgulati committed
Commit d51287b (verified)
1 Parent(s): a658fd9

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -9,18 +9,18 @@ pipeline_tag: question-answering
 
 # Model Details
 
- This model is a fine-tuned version of the base model [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct), trained with LoRA on the [qa_with_chat_template_250201.csv](https://drive.google.com/file/d/1uv-kVP0z3E8u9-u8PWAKA9tkr3ENHeZv/view?usp=sharing) dataset, which combines materials from the FEM courses of [Prof. Krishna Garikipati](https://viterbi.usc.edu/directory/faculty/Garikipati/Krishna).
+ This model is a fine-tuned version of the base model [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct), trained with LoRA on the [train_qa_wo_students.csv](https://drive.google.com/file/d/1uv-kVP0z3E8u9-u8PWAKA9tkr3ENHeZv/view?usp=sharing) dataset, which combines materials from the FEM courses of [Prof. Krishna Garikipati](https://viterbi.usc.edu/directory/faculty/Garikipati/Krishna).
 
- Compared with [TOMMI-0.3](https://huggingface.co/my-ai-university/TOMMI-0.2/), TOMMI-0.35 uses the same hyperparameters on the full dataset (without student-asked QA pairs) and an increased token length of 700, up from 500.
+ Compared with [TOMMI-0.35](https://huggingface.co/my-ai-university/TOMMI-0.35/), TOMMI-1.0 uses the optimal hyperparameters (without student-asked QA pairs) and an increased token length of 700, up from 500.
 
 ## **Hyperparameters**
 
- * learning_rate: 5e-4
+ * learning_rate: 5e-5
 * gradient_accumulation_steps: 2
 * epoch: 5
- * r (lora rank): 17
- * lora_alpha: 40
- * lora_dropout: 0.1
+ * r (lora rank): 45
+ * lora_alpha: 65
+ * lora_dropout: 0.05
 * target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
 
 ## **Usage**
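
For reference, a minimal sketch of how the hyperparameters listed in the updated README would map to a `peft.LoraConfig`. The rank, alpha, dropout, and target modules are taken from the README; the `task_type` and the note on trainer arguments are assumptions, not part of this commit.

```python
# Minimal sketch (not from this commit): the LoRA settings above as a peft.LoraConfig.
from peft import LoraConfig

lora_config = LoraConfig(
    r=45,                      # r (lora rank) from the README
    lora_alpha=65,             # lora_alpha from the README
    lora_dropout=0.05,         # lora_dropout from the README
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",     # assumed task type for chat-style QA fine-tuning
)

# learning_rate=5e-5, gradient_accumulation_steps=2, and 5 epochs would normally be
# passed to the trainer (e.g. transformers.TrainingArguments), not to LoraConfig.
```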
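
The `## **Usage**` section is still empty in this revision. Below is a hedged loading sketch, assuming the model is published as a PEFT adapter on the Hub; the repo id `my-ai-university/TOMMI-1.0` and the example question are hypothetical, and text-only prompting of the vision-instruct base model is assumed.

```python
# Sketch only: load the base model and attach the LoRA adapter with PEFT.
# ADAPTER_ID is a hypothetical repo id; replace it with the actual adapter repo.
import torch
from peft import PeftModel
from transformers import AutoTokenizer, MllamaForConditionalGeneration

BASE_ID = "meta-llama/Llama-3.2-11B-Vision-Instruct"
ADAPTER_ID = "my-ai-university/TOMMI-1.0"  # hypothetical

base = MllamaForConditionalGeneration.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)

# Text-only question answering through the chat template.
messages = [{"role": "user", "content": "State the weak form of the 1D Poisson equation."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(base.device)

# max_new_tokens=700 mirrors the token length mentioned in the README; any value works.
output = model.generate(input_ids, max_new_tokens=700)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```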