---
license: cc-by-nc-sa-4.0
---

# Inclusively Classification Model

This model is fine-tuned from the [Italian BERT model](https://huggingface.co/dbmdz/bert-base-italian-xxl-cased) to classify inclusive language in Italian. It has been trained to detect three classes:

- `inclusive`: the sentence is inclusive (e.g. "Il personale docente e non docente")
- `not_inclusive`: the sentence is not inclusive (e.g. "I professori")
- `not_pertinent`: the sentence is not pertinent to the task (e.g. "La scuola è chiusa")

## Training data

The model has been trained on a dataset containing:

- 8580 training sentences
- 1073 validation sentences
- 1072 test sentences

The dataset was manually annotated by experts in the field of inclusive language. It is not publicly available yet.

## Training procedure

The model has been fine-tuned from the [Italian BERT model](https://huggingface.co/dbmdz/bert-base-italian-xxl-cased) using the following hyperparameters:

- `max_length`: 128
- `batch_size`: 128
- `learning_rate`: 5e-5
- `warmup_steps`: 500
- `epochs`: 10 (best model is selected based on validation accuracy)
- `optimizer`: AdamW

## Evaluation results

The model has been evaluated on the test set and obtained the following results:

| Model | Accuracy | Inclusive F1 | Not inclusive F1 | Not pertinent F1 |
|-------|----------|--------------|------------------|------------------|
| TF-IDF + MLP | 0.68 | 0.63 | 0.69 | 0.66 |
| TF-IDF + SVM | 0.61 | 0.53 | 0.60 | 0.78 |
| TF-IDF + GB | 0.74 | 0.74 | 0.76 | 0.72 |
| multilingual | 0.86 | 0.88 | 0.89 | 0.83 |
| **This** | 0.89 | 0.88 | 0.92 | 0.85 |

Compared with a multilingual model trained on the same data, this model reaches higher accuracy (0.89 vs. 0.86) and higher F1 on the `not_inclusive` and `not_pertinent` classes, and matches it on the `inclusive` class (0.88).
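The per-class F1 scores in the results table can be condensed into a single macro-F1 per model, which makes the comparison easier to read at a glance. The sketch below is illustrative only: macro-F1 is not a metric reported by the authors, the values are recomputed from the two-decimal figures in the table (so they carry rounding error), and the names `PER_CLASS_F1`, `macro_f1`, and `rank_by_macro_f1` are hypothetical helpers introduced here.

```python
from statistics import mean

# Per-class F1 scores copied from the results table, in the order
# (inclusive, not_inclusive, not_pertinent). These are rounded to two
# decimals in the source, so anything derived from them is approximate.
PER_CLASS_F1 = {
    "TF-IDF + MLP": (0.63, 0.69, 0.66),
    "TF-IDF + SVM": (0.53, 0.60, 0.78),
    "TF-IDF + GB": (0.74, 0.76, 0.72),
    "multilingual": (0.88, 0.89, 0.83),
    "This": (0.88, 0.92, 0.85),
}


def macro_f1(per_class_scores):
    """Macro-F1: the unweighted mean of the per-class F1 scores."""
    return mean(per_class_scores)


def rank_by_macro_f1(table):
    """Return (model, macro-F1) pairs sorted from best to worst."""
    scored = [(name, macro_f1(scores)) for name, scores in table.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Print the models from best to worst derived macro-F1.
for name, score in rank_by_macro_f1(PER_CLASS_F1):
    print(f"{name:<14} macro-F1 ~ {score:.3f}")
```

Under these assumptions, this model comes out at roughly 0.883 macro-F1 against roughly 0.867 for the multilingual model, with the TF-IDF baselines well behind.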
## Citation

If you use this model, please make sure to cite the following papers:

**Main paper**:

```bibtex
@article{10.1145/3729237,
  author = {Greco, Salvatore and La Quatra, Moreno and Cagliero, Luca and Cerquitelli, Tania},
  title = {Towards AI-Assisted Inclusive Language Writing in Italian Formal Communications},
  year = {2025},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  issn = {2157-6904},
  url = {https://doi.org/10.1145/3729237},
  doi = {10.1145/3729237},
  note = {Just Accepted},
  journal = {ACM Trans. Intell. Syst. Technol.},
  month = apr,
}
```

**Demo paper**:

```bibtex
@InProceedings{PKDD23_inclusively,
  author="La Quatra, Moreno and Greco, Salvatore and Cagliero, Luca and Cerquitelli, Tania",
  title="Inclusively: An AI-Based Assistant for Inclusive Writing",
  booktitle="Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track",
  year="2023",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="361--365",
  isbn="978-3-031-43430-3",
  doi="10.1007/978-3-031-43430-3_31"
}
```