pL-Community/GermanEduScorer-Qwen2-1.5b

@@ -211,9 +211,9 @@ Amongst all tested models 'Command r plus' showed highest precision especially n
 The final model was developed by applying a full ORPO fine-tune to VAGOsolutions/SauerkrautLM-1.5b, which is based on the Qwen2 architecture, and it achieves an accuracy close to that of ‘Command r plus’.
 Training specifics included:
-   * Dataset size : 500k unique entries
-   * Epochs : One
-   * Batch Size : 64
 This configuration resulted in significant performance improvements, particularly in recognizing relevant features and in structuring responses according to the prescribed evaluation metrics.
@@ -236,7 +236,48 @@ Given these limitations observed in both models regarding token economy and cont
 In conclusion, while earlier iterations with other models provided valuable insights into the necessary features and performance thresholds, the transition to Qwen2 has significantly advanced the project's ability to deliver refined assessments that align closely with the set objectives, and it supports robustness, scalability, and future expansion within this domain.
-**6.Future Work & Acknowledgements:**
 Continued efforts will focus on labeling additional datasets within this domain. These will be made publicly available under our organizational repository, improving accessibility for further research in academic settings and for the development of other pedagogical assessment tools.
 Special thanks are due to David and Daryoush from Vago Solutions and to Björn and Jan from Ellamind / DiscoResearch, whose insights during dataset reviews, prompt formulation, and discussions about the final trained model configuration were invaluable throughout this project's lifecycle.

 The final model was developed by applying a full ORPO fine-tune to VAGOsolutions/SauerkrautLM-1.5b, which is based on the Qwen2 architecture, and it achieves an accuracy close to that of ‘Command r plus’.
 Training specifics included:
+   * Dataset size : 380k unique entries
+   * Epochs : 3
+   * Batch Size : 512
 This configuration resulted in significant performance improvements, particularly in recognizing relevant features and in structuring responses according to the prescribed evaluation metrics.
 In conclusion, while earlier iterations with other models provided valuable insights into the necessary features and performance thresholds, the transition to Qwen2 has significantly advanced the project's ability to deliver refined assessments that align closely with the set objectives, and it supports robustness, scalability, and future expansion within this domain.
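To put the two sets of training specifics in this diff side by side, the short sketch below estimates the number of optimizer steps each one implies. It rests on assumptions that the model card does not state: that "k" means exactly 1,000 entries, that the listed batch size is the effective global batch size, and that the last partial batch of each epoch is kept. `optimizer_steps` is a hypothetical helper written for this illustration only.

```python
import math


def optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Estimate optimizer steps, assuming the final partial batch is kept."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


# Previous specifics: 500k entries, 1 epoch, batch size 64 (assumed exact).
old_steps = optimizer_steps(500_000, 64, 1)
# Updated specifics: 380k entries, 3 epochs, batch size 512 (assumed exact).
new_steps = optimizer_steps(380_000, 512, 3)

print("old:", old_steps, "new:", new_steps)
```

Under these assumptions the updated specifics take fewer but much larger steps, while each example is seen three times instead of once.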
+**6. How to use**
+```python
+from transformers import pipeline
+import datasets
+
+# Load the scorer as a chat-style text-generation pipeline on the first GPU.
+pipe = pipeline("text-generation", model="pL-Community/GermanEduScorer-Qwen2-1.5b", device=0)
+# German split of GlotCC-V1, shuffled with a fixed seed for reproducibility.
+ds_eval = datasets.load_dataset("cis-lmu/GlotCC-V1", "deu-Latn", split="train").shuffle(seed=42)
+
+for i in ds_eval:
+    messages = [
+        {"role": "system", "content": """Nachfolgend findest du einen Auszug aus einer Webseite. Beurteile, ob die Seite einen hohen pädagogischen Wert hat und in einem pädagogischen Umfeld für den Unterricht von der Grundschule bis zur Universität nützlich sein könnte, indem du das unten beschriebene 5-Punkte-Bewertungssystem anwendest. Die Punkte werden auf der Grundlage der Erfüllung der am besten passenden Kriterien gewählt:
+- 0 Punkte: Der Inhalt ist nicht organisiert und schwer zu lesen. Der Text enthält Werbung oder irrelevante Informationen zum lehren von Inhalten. Der Text ist nicht neutral sondern enthält persöhnliche Sichtweisen. Beispiel: Tweets, Chatnachrichten oder Forenbeiträge.
+- 1 Punkt: Der Text ist für den privaten Gebrauch bestimmt und enthält Werbung oder irrelevante Informationen. Der Text ist nicht neutral und spiegelt zum Teil persönliche Sichtweisen wider. Beispiel: Ein Blogbeitrag, der hauptsächlich auf persönliche Erfahrungen eingeht und nur gelegentlich nützliche Informationen bietet.
+- 2 Punkte: Der Text ist neutral geschrieben, aber enthält Werbung oder irrelevante Informationen. Die enthaltenen Informationen können zeitlich vergänglich sein. Beispiel: Ein Artikel oder Nachrichtenbeitrag.
+- 3 Punkte: Der Text enthält viele Informationen und ist leicht verständlich. Der Text ist neutral geschrieben und enthält keine Werbung oder irrelevante Informationen. Beispiel: Ein Wikipedia-Artikel.
+- 4 Punkte: Der Text ist neutral geschrieben und enthält keine Werbung oder irrelevante Informationen. Der Text enthält tiefergehendes Wissen und ist für den Unterricht von der Grundschule bis zur Universität nützlich. Beispiel: Ein wissenschaftlicher Artikel oder ein Lehrbuch
+- 5 Punkte: Der Text beeinhaltet tiefergehendes Wissen, ist dabei aber dennoch leicht verständlich, sodass jeder daraus lernen und sich neue Fähigkeiten aneignen kann. Beispielsweise Schritt für Schritt Anleitungen, Erklärungen oder Definitionen.
+Nachdem du den Auszug geprüft hast:
+- Wähle eine Punktzahl von 0 bis 5, die am besten beschreibt, wie nützlich der Inhalt für den Unterricht von der Grundschule bis zur Universität ist.
+- Begründe kurz deine ausgewählte Punktzahl, bis zu 100 Wörter.
+- Antworte im folgenden Format "<Gesamtpunktzahl>"""},
+        {"role": "user", "content": i["text"]},
+    ]
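+    # Greedy decoding; a single new token suffices because the score is one digit.
+    result = pipe(messages, do_sample=False, max_new_tokens=1)
+    pred = result[0]["generated_text"][-1]["content"]
+    try:
+        pred = int(pred)
+    except ValueError:
+        # Skip outputs that are not a plain integer.
+        continue
+    print("Score: ", pred)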
+```
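The snippet above converts the model's reply with `int(...)`, which only works when the reply is a bare integer. A slightly more tolerant parser is sketched below. It assumes that a valid score is a single standalone digit from 0 to 5, as described in the system prompt. `parse_score` is a hypothetical helper for illustration and is not part of the model card or of `transformers`.

```python
import re
from typing import Optional

# A standalone digit 0-5, not adjacent to another digit (so "10" is rejected).
_SCORE_RE = re.compile(r"(?<!\d)[0-5](?!\d)")


def parse_score(generated: str) -> Optional[int]:
    """Return the first standalone score in 0..5, or None if none is found."""
    match = _SCORE_RE.search(generated)
    return int(match.group()) if match else None


# Example replies; only the first two contain a valid score.
for reply in ["3", "Score: 4", "10", "keine Angabe"]:
    print(repr(reply), "->", parse_score(reply))
```

Returning `None` instead of raising lets the caller decide whether to skip, log, or retry a document whose reply could not be parsed.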
+**7. Future Work & Acknowledgements:**
 Continued efforts will focus on labeling additional datasets within this domain. These will be made publicly available under our organizational repository, improving accessibility for further research in academic settings and for the development of other pedagogical assessment tools.
 Special thanks are due to David and Daryoush from Vago Solutions and to Björn and Jan from Ellamind / DiscoResearch, whose insights during dataset reviews, prompt formulation, and discussions about the final trained model configuration were invaluable throughout this project's lifecycle.