metadata
title: README
emoji: 📚
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false

hmBERT

Historical Multilingual Language Models for Named Entity Recognition. The following languages are covered by hmBERT:

  • English (British Library Corpus - Books)
  • German (Europeana Newspaper)
  • French (Europeana Newspaper)
  • Finnish (Europeana Newspaper)
  • Swedish (Europeana Newspaper)

More details can be found in our GitHub repository and in our hmBERT paper.

Leaderboard

We evaluate our pretrained language models on datasets from HIPE-2020, HIPE-2022 and Europeana. The following table lists the datasets used for each language:

| Language | Datasets |
|----------|----------|
| English  | AjMC - TopRes19th |
| German   | AjMC - NewsEye - HIPE-2020 |
| French   | AjMC - ICDAR-Europeana - LeTemps - NewsEye - HIPE-2020 |
| Finnish  | NewsEye |
| Swedish  | NewsEye |
| Dutch    | ICDAR-Europeana |

Results:

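| Model | English AjMC | German AjMC | French AjMC | German NewsEye | French NewsEye | Finnish NewsEye | Swedish NewsEye | Dutch ICDAR | French ICDAR | French LeTemps | English TopRes19th | German HIPE-2020 | French HIPE-2020 | Avg. |
|-------|--------------|-------------|-------------|----------------|----------------|-----------------|-----------------|-------------|--------------|----------------|--------------------|------------------|------------------|------|
| hmBERT (32k) Schweter et al. | 85.36 ± 0.94 | 89.08 ± 0.09 | 85.10 ± 0.60 | 39.65 ± 1.01 | 81.47 ± 0.36 | 77.28 ± 0.37 | 82.85 ± 0.83 | 82.11 ± 0.61 | 77.21 ± 0.16 | 65.73 ± 0.56 | 80.94 ± 0.86 | 79.18 ± 0.38 | 83.47 ± 0.80 | 77.65 |
| hmTEAMS | 86.41 ± 0.36 | 88.64 ± 0.42 | 85.41 ± 0.67 | 41.51 ± 2.82 | 83.20 ± 0.79 | 79.27 ± 1.88 | 82.78 ± 0.60 | 88.21 ± 0.39 | 78.03 ± 0.39 | 66.71 ± 0.46 | 81.36 ± 0.59 | 80.15 ± 0.60 | 86.07 ± 0.49 | 79.06 |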
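
The README does not say how the Avg. column is computed. The sketch below assumes it is the unweighted mean of the 13 per-dataset scores, ignoring the standard deviations. The scores are copied from the results table; the `scores` dictionary and the `macro_average` helper are illustrative names, not part of any released code.

```python
from statistics import mean

# Per-dataset scores copied from the results table, in column order
# (English AjMC ... French HIPE-2020). Standard deviations are omitted.
scores = {
    "hmBERT (32k)": [85.36, 89.08, 85.10, 39.65, 81.47, 77.28, 82.85,
                     82.11, 77.21, 65.73, 80.94, 79.18, 83.47],
    "hmTEAMS": [86.41, 88.64, 85.41, 41.51, 83.20, 79.27, 82.78,
                88.21, 78.03, 66.71, 81.36, 80.15, 86.07],
}


def macro_average(values):
    """Unweighted mean over datasets, rounded to two decimals (assumed definition)."""
    return round(mean(values), 2)


for model, values in scores.items():
    print(f"{model}: {macro_average(values)}")
```

Under this assumption the computed values match the reported averages (77.65 and 79.06), which suggests that each dataset contributes equally regardless of its size.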

Acknowledgements

We thank Luisa März, Katharina Schmid and Erion Çano for their fruitful discussions about Historical Language Models.

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️