
stefan-it committed on Oct 3, 2023

Commit

26477c5

1 Parent(s): cbcc293

readme: add initial version


Files changed (1)

README.md +51 -4

README.md CHANGED

@@ -1,10 +1,57 @@
 ---
 title: README
-emoji: 🔥
-colorFrom: gray
-colorTo: pink
 sdk: static
 pinned: false
 ---
-Edit this `README.md` markdown file to author your organization card.

 ---
 title: README
+emoji: 📚
+colorFrom: indigo
+colorTo: purple
 sdk: static
 pinned: false
 ---
+# hmBERT
+Historical Multilingual Language Models for Named Entity Recognition. The following languages are covered by hmBERT:
+* English (British Library Corpus - Books)
+* German (Europeana Newspaper)
+* French (Europeana Newspaper)
+* Finnish (Europeana Newspaper)
+* Swedish (Europeana Newspaper)
+More details can be found in [our GitHub repository](https://github.com/dbmdz/clef-hipe) and in our
+[hmBERT paper](https://ceur-ws.org/Vol-3180/paper-87.pdf).
+# Leaderboard
+We test our pretrained language models on various datasets from HIPE-2020, HIPE-2022 and Europeana.
+The following table gives an overview of the datasets used:
+| Language | Datasets                                                         |
+|----------|------------------------------------------------------------------|
+| English  | [AjMC] - [TopRes19th]                                            |
+| German   | [AjMC] - [NewsEye] - [HIPE-2020]                                 |
+| French   | [AjMC] - [ICDAR-Europeana] - [LeTemps] - [NewsEye] - [HIPE-2020] |
+| Finnish  | [NewsEye]                                                        |
+| Swedish  | [NewsEye]                                                        |
+| Dutch    | [ICDAR-Europeana]                                                |
+[AjMC]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-ajmc.md
+[NewsEye]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-newseye.md
+[TopRes19th]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-topres19th.md
+[ICDAR-Europeana]: https://github.com/stefan-it/historic-domain-adaptation-icdar
+[LeTemps]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-letemps.md
+[HIPE-2020]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-hipe2020.md
+Results:
+| Model                                                                     | English AjMC | German AjMC  | French AjMC  | German NewsEye | French NewsEye | Finnish NewsEye | Swedish NewsEye | Dutch ICDAR  | French ICDAR | French LeTemps | English TopRes19th | German HIPE-2020 | French HIPE-2020 | Avg.      |
+|---------------------------------------------------------------------------|--------------|--------------|--------------|----------------|----------------|-----------------|-----------------|--------------|--------------|----------------|--------------------|------------------|------------------|-----------|
+| hmBERT (32k) [Schweter et al.](https://ceur-ws.org/Vol-3180/paper-87.pdf) | 85.36 ± 0.94 | 89.08 ± 0.09 | 85.10 ± 0.60 | 39.65 ± 1.01   | 81.47 ± 0.36   | 77.28 ± 0.37    | 82.85 ± 0.83    | 82.11 ± 0.61 | 77.21 ± 0.16 | 65.73 ± 0.56   | 80.94 ± 0.86       | 79.18 ± 0.38     | 83.47 ± 0.80     | 77.65     |
+| [hmTEAMS](https://huggingface.co/hmteams)                                 | 86.41 ± 0.36 | 88.64 ± 0.42 | 85.41 ± 0.67 | 41.51 ± 2.82   | 83.20 ± 0.79   | 79.27 ± 1.88    | 82.78 ± 0.60    | 88.21 ± 0.39 | 78.03 ± 0.39 | 66.71 ± 0.46   | 81.36 ± 0.59       | 80.15 ± 0.60     | 86.07 ± 0.49     | **79.06** |
+# Acknowledgements
+We thank [Luisa März](https://github.com/LuisaMaerz), [Katharina Schmid](https://github.com/schmika) and
+[Erion Çano](https://github.com/erionc) for their fruitful discussions about Historic Language Models.
+Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
+Many thanks for providing access to the TPUs ❤️
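
The `Avg.` column in the results table above appears to be the unweighted mean of the 13 per-dataset scores in each row. The README does not state this explicitly, so the following is only a minimal sketch under that assumption. The scores are copied from the table with the standard deviations dropped, and the variable names are illustrative rather than taken from the repository.

```python
from statistics import mean

# Per-dataset scores copied from the results table, in column order
# (English AjMC ... French HIPE-2020). Standard deviations are omitted.
scores = {
    "hmBERT (32k)": [85.36, 89.08, 85.10, 39.65, 81.47, 77.28, 82.85,
                     82.11, 77.21, 65.73, 80.94, 79.18, 83.47],
    "hmTEAMS": [86.41, 88.64, 85.41, 41.51, 83.20, 79.27, 82.78,
                88.21, 78.03, 66.71, 81.36, 80.15, 86.07],
}

# Assumption: "Avg." is the plain arithmetic mean over all 13 columns,
# rounded to two decimal places.
averages = {model: round(mean(values), 2) for model, values in scores.items()}

for model, avg in averages.items():
    print(f"{model}: {avg:.2f}")
```

Under this assumption the computed values match the reported averages of 77.65 for hmBERT and 79.06 for hmTEAMS.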