Update README.md
README.md (CHANGED)
@@ -64,7 +64,26 @@ Training specifics included:
This configuration resulted in significant performance improvements, particularly in terms of relevant feature recognition and response structuring according to the prescribed evaluation metrics.
**5. Analysis of Failed Model Trainings and the Decision to Use the Qwen2-1.5b Model:**
During the development phase, several models were evaluated for their efficacy in classifying educational content quality. Two notable models that did not meet our final requirements were the BERT regression model and the T5 seq2seq model.
- **BERT Regression:** The BERT regression model was initially promising due to its fast processing, achieving a good quality score of approximately 85%. However, its major limitation was a maximum context length of only 512 tokens. This restriction hindered its ability to process longer texts comprehensively, which is often required for educational materials containing detailed explanations or extensive subject-matter discussions (see the sketch after this list).
- **T5 Seq2Seq:** The T5 seq2seq model likewise supported a maximum context length of only 512 tokens. Although it slightly outperformed BERT with an average quality score of around 88%, it had additional drawbacks: slower processing and inefficient token usage, since the prompt itself consumed part of the input window, further reducing the effective context. Together, these factors made it less suitable for our needs, where prompt flexibility and faster response times were crucial.
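For illustration, a minimal sketch of this style of 512-token regression baseline, assuming a standard Hugging Face `transformers` setup (the checkpoint name and scoring helper are hypothetical, not this project's actual code):

```python
# Hypothetical sketch of a BERT-style quality-regression baseline.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # placeholder; the trained checkpoint is not named here

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# num_labels=1 turns the classification head into a single-output regression head.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=1)

def score_quality(text: str) -> float:
    # truncation=True silently drops everything beyond 512 tokens, which is
    # the limitation described above for long educational texts.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.squeeze().item()
```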
Given these limitations in both models regarding token economy and context length, we explored more robust alternatives, which led us to adopt **Qwen2-1.5b** as our primary model:
- **Qwen2-1.5b:** The decision to use Qwen2 stemmed from its superior performance: it achieved the highest quality rating of the evaluated models, close to ~95%. Notably, it supports a context length of up to 32k tokens, allowing comprehensive analysis of extended texts, which is vital for evaluating educational content spanning multiple academic levels from elementary school through university (a usage sketch follows below).
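As a sketch of what the larger window allows, again assuming a `transformers` sequence-classification interface (the checkpoint name is a placeholder; the released model may expose a different interface):

```python
# Hypothetical sketch: scoring a long document with a Qwen2-1.5b-based classifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINT = "our-org/qwen2-1.5b-edu-quality"  # placeholder name, not a real model id

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=1)

def score_long_document(text: str) -> float:
    # With a 32k-token window, whole chapters or lecture transcripts fit in a
    # single pass instead of being cut off at 512 tokens.
    inputs = tokenizer(text, truncation=True, max_length=32768, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.squeeze().item()
```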
Additionally, although a larger model potentially implies higher computational demands, optimized inference solutions such as Hugging Face's Text Generation Inference (TGI) or vLLM can be integrated to enhance operational efficiency, making real-time applications feasible without compromising analytical depth or accuracy.
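For illustration, a minimal vLLM batch-scoring sketch (the model id, prompt format, and 0-5 scale are assumptions, not this project's actual deployment):

```python
# Hypothetical sketch: batch-scoring documents with vLLM for faster inference.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-1.5B-Instruct", max_model_len=32768)  # base model as placeholder
params = SamplingParams(temperature=0.0, max_tokens=8)  # deterministic, short numeric output

documents = [
    "Photosynthesis converts light energy into chemical energy stored in glucose...",
]
prompts = [
    f"Rate the educational quality of the following text on a scale of 0-5:\n\n{doc}\n\nScore:"
    for doc in documents
]

for output in llm.generate(prompts, params):
    print(output.outputs[0].text.strip())
```

On suitable GPU hardware this batches many documents per call, which is what makes near-real-time scoring feasible despite the larger model.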
In conclusion, while earlier iterations with other models provided valuable insights into necessary features and performance thresholds, transitioning to Qwen2 has significantly advanced the project's capabilities, delivering refined assessments closely aligned with our objectives and ensuring robustness and scalability for future expansion within this domain.
**6. Future Work & Acknowledgements:**
Continued efforts will focus on labeling additional datasets in this domain, which will be made publicly available in our organizational repository. This should improve accessibility for further research in academic settings and for the development of other pedagogical assessment tools.
Special thanks are due to David and Daryoush from Vago Solutions, and to Björn and Jan from Ellamind / DiscoResearch, whose insights into dataset reviews, prompt formulation, and discussions about the final trained model configurations were invaluable throughout this project's lifecycle.