My publications are listed below. You can also download a full PDF.
2024
Sarti, G., Feldhus, N., Qi, J., Nissim, M., & Bisazza, A. (2024). Democratizing Advanced Attribution Analyses of Generative Language Models with the Inseq Toolkit. Joint Proceedings of the 2nd World Conference on eXplainable Artificial Intelligence Late-Breaking Work, Demos and Doctoral Consortium, xAI-2024: LB/D/DC, 289–296. CEUR Workshop Proceedings (CEUR-WS.org).
@inproceedings{sarti2024democratizing,
title = {Democratizing Advanced Attribution Analyses of Generative Language Models with the Inseq Toolkit},
author = {Sarti, Gabriele and Feldhus, Nils and Qi, Jirui and Nissim, Malvina and Bisazza, Arianna},
booktitle = {Joint Proceedings of the 2nd World Conference on eXplainable Artificial Intelligence Late-Breaking Work, Demos and Doctoral Consortium, xAI-2024: LB/D/DC},
pages = {289--296},
year = {2024},
url = {https://ceur-ws.org/Vol-3793/paper_37.pdf},
organization = {CEUR Workshop Proceedings (CEUR-WS.org)}
}
Inseq is a recent toolkit providing an intuitive and optimized interface to conduct feature attribution analyses of generative language models. In this work, we present the latest improvements to the library, including efforts to simplify the attribution of large language models on consumer hardware, additional attribution approaches, and a new CLI command to detect and attribute context usage in language model generations. We showcase an online demo using Inseq as an attribution backbone for context reliance analysis, and we highlight interesting contextual patterns in language model generations. Ultimately, this release furthers Inseq’s mission of centralizing good interpretability practices and enabling fair and reproducible model evaluations.
Scalena, D., Sarti, G., & Nissim, M. (2024). Multi-property Steering of Large Language Models with Dynamic Activation Composition. In Y. Belinkov, N. Kim, J. Jumelet, H. Mohebbi, A. Mueller, & H. Chen (Eds.), Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP (pp. 577–603). Miami, Florida, US: Association for Computational Linguistics.
@inproceedings{scalena-etal-2024-multi,
title = {Multi-property Steering of Large Language Models with Dynamic Activation Composition},
author = {Scalena, Daniel and Sarti, Gabriele and Nissim, Malvina},
editor = {Belinkov, Yonatan and Kim, Najoung and Jumelet, Jaap and Mohebbi, Hosein and Mueller, Aaron and Chen, Hanjie},
booktitle = {Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP},
month = nov,
year = {2024},
address = {Miami, Florida, US},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2024.blackboxnlp-1.34},
doi = {10.18653/v1/2024.blackboxnlp-1.34},
pages = {577--603}
}
Activation steering methods were shown to be effective in conditioning language model generation by additively intervening over models’ intermediate representations. However, the evaluation of these techniques has so far been limited to single conditioning properties and synthetic settings. In this work, we conduct a comprehensive evaluation of various activation steering strategies, highlighting the property-dependent nature of optimal parameters to ensure a robust effect throughout generation. To address this issue, we propose Dynamic Activation Composition, an information-theoretic approach to modulate the steering intensity of one or more properties throughout generation. Our experiments on multi-property steering show that our method successfully maintains high conditioning while minimizing the impact of conditioning on generation fluency.
Stopponi, S., Pedrazzini, N., Peels-Matthey, S., McGillivray, B., & Nissim, M. (2024). Natural Language Processing for Ancient Greek. Diachronica, 41, 414–435.
@article{stopponi-diachronica-2024,
author = {Stopponi, Silvia and Pedrazzini, Nilo and Peels-Matthey, Saskia and McGillivray, Barbara and Nissim, Malvina},
title = {Natural Language Processing for Ancient Greek},
journal = {Diachronica},
volume = {41},
number = {3},
pages = {414--435},
issn = {0176-4225},
year = {2024},
publisher = {John Benjamins},
url = {https://www.jbe-platform.com/content/journals/10.1075/dia.23013.sto},
keywords = {language models}
}
Computational methods have produced meaningful and usable results to study word semantics, including semantic change. These methods, belonging to the field of Natural Language Processing, have recently been applied to ancient languages; in particular, language modelling has been applied to Ancient Greek, the language on which we focus. In this contribution we explain how vector representations can be computed from word co-occurrences in a corpus and can be used to locate words in a semantic space, and what kind of semantic information can be extracted from language models. We compare three different kinds of language models that can be used to study Ancient Greek semantics: a count-based model, a word embedding model and a syntactic embedding model; and we show examples of how the quality of their representations can be assessed. We highlight the advantages and potential of these methods, especially for the study of semantic change, together with their limitations.
Stopponi, S., Peels-Matthey, S., & Nissim, M. (2024). Viability of Automatic Lexical Semantic Change Detection on a Diachronic Corpus of Literary Ancient Greek. In C. Swaelens, M. Deforche, I. De Vos, & E. Lefever (Eds.), The First Workshop on Data-driven Approaches to Ancient Languages (DAAL 2024) (pp. 47–57). Ghent University.
@inproceedings{stopponi-ghent-2024,
title = {Viability of Automatic Lexical Semantic Change Detection on a Diachronic Corpus of Literary Ancient Greek},
keywords = {semantic change detection, Ancient Greek, language modelling, ancient language, word embeddings, word2vec},
author = {Stopponi, Silvia and Peels-Matthey, Saskia and Nissim, Malvina},
year = {2024},
month = jun,
day = {27},
language = {English},
isbn = {9789078848127},
pages = {47--57},
editor = {Swaelens, Colin and Deforche, Maxime and {De Vos}, Ilse and Lefever, Els},
booktitle = {The First Workshop on Data-driven Approaches to Ancient Languages (DAAL 2024)},
publisher = {Ghent University},
note = {DAAL 2024; Conference date: 27-06-2024},
url = {https://www.dbbe2024.ugent.be/workshop/}
}
We apply two measures of lexical semantic change detection to Word2Vec embeddings trained on a diachronic corpus of literary Ancient Greek texts. The two measures are the Vector Coherence, based on the comparison between vectors of the same word in different time periods, and the J, based on the Jaccard coefficient, which quantifies the overlap between the k nearest neighbours in each possible combination of time slices. Through the analysis of the most stable and unstable words detected with both measures, we show that the two measures are effective at finding non-changed words, while Vector Coherence seems to be more reliable than J at detecting changed words. Still, low J could indicate a real semantic change when the same word also has a low Vector Coherence. For both measures, the detection of changed words is hampered by the presence of lemmatization errors in the training corpus.
Stopponi, S., den Ouden, M., Peels-Matthey, S., & Nissim, M. (2024). AGALMA, the Ancient Greek Accessible Language Models for linguistic Analysis.
@misc{stopponi-AGALMA-2024,
title = {AGALMA, the Ancient Greek Accessible Language Models for linguistic Analysis},
keywords = {language models, ancient Greek, interface},
author = {Stopponi, Silvia and {den Ouden}, Mark and Peels-Matthey, Saskia and Nissim, Malvina},
year = {2024},
language = {English}
}
The aim of the AGALMA web interface is to make language models trained on Ancient Greek available to all interested people, regardless of their coding skills. It allows users to extract the nearest neighbours of lemmas of interest, to calculate the cosine similarity between two lemmas, to create a 3D graphical representation of a semantic space, and to access the Liddell-Scott-Jones dictionary. A FAQ section and extra documentation are also provided.
Lai, H., & Nissim, M. (2024). mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models. In L.-W. Ku, A. Martins, & V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 12012–12026). Bangkok, Thailand: Association for Computational Linguistics.
@inproceedings{lai-nissim-2024-mcot,
title = {m{C}o{T}: Multilingual Instruction Tuning for Reasoning Consistency in Language Models},
author = {Lai, Huiyuan and Nissim, Malvina},
editor = {Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek},
booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = aug,
year = {2024},
address = {Bangkok, Thailand},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2024.acl-long.649},
pages = {12012--12026}
}
Large language models (LLMs) with Chain-of-thought (CoT) have recently emerged as a powerful technique for eliciting reasoning to improve various downstream tasks. As most research mainly focuses on English, with few explorations in a multilingual context, the question of how reliable this reasoning capability is in different languages is still open. To address it directly, we study multilingual reasoning consistency across multiple languages, using popular open-source LLMs. First, we compile the first large-scale multilingual math reasoning dataset, *mCoT-MATH*, covering eleven diverse languages. Then, we introduce multilingual CoT instruction tuning to boost reasoning capability across languages, thereby improving model consistency. While existing LLMs show substantial variation across the languages we consider, and especially low performance for lesser resourced languages, our 7B parameter model *mCoT* achieves impressive consistency across languages, and superior or comparable performance to close- and open-source models even of much larger sizes.
Occhipinti, D., Marchi, M., Mondella, I., Lai, H., Dell’Orletta, F., Nissim, M., & Guerini, M. (2024). Fine-tuning with HED-IT: The impact of human post-editing for dialogical language models. In L.-W. Ku, A. Martins, & V. Srikumar (Eds.), Findings of the Association for Computational Linguistics ACL 2024 (pp. 11892–11907). Bangkok, Thailand and virtual meeting: Association for Computational Linguistics.
@inproceedings{occhipinti-etal-2024-fine,
title = {Fine-tuning with {HED}-{IT}: The impact of human post-editing for dialogical language models},
author = {Occhipinti, Daniela and Marchi, Michele and Mondella, Irene and Lai, Huiyuan and Dell{'}Orletta, Felice and Nissim, Malvina and Guerini, Marco},
editor = {Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek},
booktitle = {Findings of the Association for Computational Linguistics ACL 2024},
month = aug,
year = {2024},
address = {Bangkok, Thailand and virtual meeting},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2024.findings-acl.707},
pages = {11892--11907}
}
Automatic methods for generating and gathering linguistic data have proven effective for fine-tuning Language Models (LMs) in languages less resourced than English. Still, while there has been emphasis on data quantity, less attention has been given to its quality. In this work, we investigate the impact of human intervention on machine-generated data when fine-tuning dialogical models. In particular, we study (1) whether post-edited dialogues exhibit higher perceived quality compared to the originals that were automatically generated; (2) whether fine-tuning with post-edited dialogues results in noticeable differences in the generated outputs; and (3) whether post-edited dialogues influence the outcomes when considering the parameter size of the LMs. To this end we created HED-IT, a large-scale dataset where machine-generated dialogues are paired with the version post-edited by humans. Using both the edited and unedited portions of HED-IT, we fine-tuned three different sizes of an LM. Results from both human and automatic evaluation show that the different quality of training data is clearly perceived and it has an impact also on the models trained on such data. Additionally, our findings indicate that larger models are less sensitive to data quality, whereas this has a crucial impact on smaller models. These results enhance our comprehension of the impact of human intervention on training data in the development of high-quality LMs.
Li, Y., Lai, H., Toral, A., & Nissim, M. (2024). ReproHum #0033-3: Comparable Relative Results with Lower Absolute Values in a Reproduction Study. In S. Balloccu, A. Belz, R. Huidrom, E. Reiter, J. Sedoc, & C. Thomson (Eds.), Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024 (pp. 238–249). Torino, Italia: ELRA and ICCL.
@inproceedings{li-etal-2024-reprohum,
title = {{R}epro{H}um {\#}0033-3: Comparable Relative Results with Lower Absolute Values in a Reproduction Study},
author = {Li, Yiru and Lai, Huiyuan and Toral, Antonio and Nissim, Malvina},
editor = {Balloccu, Simone and Belz, Anya and Huidrom, Rudali and Reiter, Ehud and Sedoc, Joao and Thomson, Craig},
booktitle = {Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024},
month = may,
year = {2024},
address = {Torino, Italia},
publisher = {ELRA and ICCL},
url = {https://aclanthology.org/2024.humeval-1.21},
pages = {238--249}
}
In the context of the ReproHum project aimed at assessing the reliability of human evaluation, we replicated the human evaluation conducted in “Generating Scientific Definitions with Controllable Complexity” by August et al. (2022). Specifically, humans were asked to assess the fluency of automatically generated scientific definitions by three different models, with output complexity varying according to target audience. Evaluation conditions were kept as close as possible to the original study, except for necessary and minor adjustments. Our results, despite yielding lower absolute performance, show that relative performance across the three tested systems remains comparable to what was observed in the original paper. On the basis of lower inter-annotator agreement and feedback received from annotators in our experiment, we also observe that the ambiguity of the concept being evaluated may play a substantial role in human assessment.
Mondella, I., Lai, H., & Nissim, M. (2024). ReproHum #0892-01: The painful route to consistent results: A reproduction study of human evaluation in NLG. In S. Balloccu, A. Belz, R. Huidrom, E. Reiter, J. Sedoc, & C. Thomson (Eds.), Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024 (pp. 261–268). Torino, Italia: ELRA and ICCL.
@inproceedings{mondella-etal-2024-reprohum,
title = {{R}epro{H}um {\#}0892-01: The painful route to consistent results: A reproduction study of human evaluation in {NLG}},
author = {Mondella, Irene and Lai, Huiyuan and Nissim, Malvina},
editor = {Balloccu, Simone and Belz, Anya and Huidrom, Rudali and Reiter, Ehud and Sedoc, Joao and Thomson, Craig},
booktitle = {Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024},
month = may,
year = {2024},
address = {Torino, Italia},
publisher = {ELRA and ICCL},
url = {https://aclanthology.org/2024.humeval-1.24},
pages = {261--268}
}
In spite of the core role human judgement plays in evaluating the performance of NLP systems, the way human assessments are elicited in NLP experiments, and to some extent the nature of human judgement itself, pose challenges to the reliability and validity of human evaluation. In the context of the larger ReproHum project, aimed at running large-scale multi-lab reproductions of human judgement, we replicated the understandability assessment by humans on several generated outputs of simplified text described in the paper “Neural Text Simplification of Clinical Letters with a Domain Specific Phrase Table” by Shardlow and Nawaz, which appeared in the Proceedings of ACL 2019. Although we had to implement a series of modifications compared to the original study, which were necessary to run our human evaluation on exactly the same data, we managed to collect assessments and compare results with the original study. We obtained results consistent with those of the reference study, confirming their findings. The paper is complete with as much information as possible to foster and facilitate future reproduction.
Sarti, G., & Nissim, M. (2024). IT5: Text-to-text Pretraining for Italian Language Understanding and Generation. In N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, & N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 9422–9433). Torino, Italia: ELRA and ICCL.
@inproceedings{sarti-nissim-2024-it5-text,
title = {{IT}5: Text-to-text Pretraining for {I}talian Language Understanding and Generation},
author = {Sarti, Gabriele and Nissim, Malvina},
editor = {Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen},
booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
month = may,
year = {2024},
address = {Torino, Italia},
publisher = {ELRA and ICCL},
url = {https://aclanthology.org/2024.lrec-main.823},
pages = {9422--9433}
}
We introduce IT5, the first family of encoder-decoder transformer models pretrained specifically on Italian. We document and perform a thorough cleaning procedure for a large Italian corpus and use it to pretrain four IT5 model sizes. We then introduce the ItaGen benchmark, which includes a broad range of natural language understanding and generation tasks for Italian, and use it to evaluate the performance of IT5 models and multilingual baselines. We find monolingual IT5 models to provide the best scale-to-performance ratio across tested models, consistently outperforming their multilingual counterparts and setting a new state-of-the-art for Italian language generation.
Sarti, G., Chrupała, G., Nissim, M., & Bisazza, A. (2024). Quantifying the Plausibility of Context Reliance in Neural Machine Translation. The Twelfth International Conference on Learning Representations (ICLR 2024). Vienna, Austria: OpenReview.
@inproceedings{sarti-etal-2023-quantifying,
title = {Quantifying the Plausibility of Context Reliance in Neural Machine Translation},
author = {Sarti, Gabriele and Chrupa{\l}a, Grzegorz and Nissim, Malvina and Bisazza, Arianna},
booktitle = {The Twelfth International Conference on Learning Representations (ICLR 2024)},
month = may,
year = {2024},
address = {Vienna, Austria},
publisher = {OpenReview},
url = {https://openreview.net/forum?id=XTHfNGI3zT}
}
Establishing whether language models can use contextual information in a human-plausible way is important to ensure their safe adoption in real-world settings. However, the questions of when and which parts of the context affect model generations are typically tackled separately, and current plausibility evaluations are practically limited to a handful of artificial benchmarks. To address this, we introduce Plausibility Evaluation of Context Reliance (PECoRe), an end-to-end interpretability framework designed to quantify context usage in language models’ generations. Our approach leverages model internals to (i) contrastively identify context-sensitive target tokens in generated texts and (ii) link them to contextual cues justifying their prediction. We use PECoRe to quantify the plausibility of context-aware machine translation models, comparing model rationales with human annotations across several discourse-level phenomena. Finally, we apply our method to unannotated generations to identify context-mediated predictions and highlight instances of (im)plausible context usage in model translations.
Stopponi, S., Peels-Matthey, S., & Nissim, M. (2024). AGREE: a new benchmark for the evaluation of distributional semantic models of ancient Greek. Digital Scholarship in the Humanities, 39, 373–392.
@article{stopponi2024agree,
author = {Stopponi, Silvia and Peels-Matthey, Saskia and Nissim, Malvina},
title = {{AGREE: a new benchmark for the evaluation of distributional semantic models of ancient Greek}},
journal = {Digital Scholarship in the Humanities},
volume = {39},
number = {1},
pages = {373--392},
year = {2024},
month = jan,
issn = {2055-7671},
doi = {10.1093/llc/fqad087},
url = {https://doi.org/10.1093/llc/fqad087},
eprint = {https://academic.oup.com/dsh/article-pdf/39/1/373/57134494/fqad087.pdf}
}
The last years have seen the application of Natural Language Processing, in particular, language models, to the study of the semantics of Ancient Greek, but only a little work has been done to create gold data for the evaluation of such models. In this contribution we introduce AGREE, the first benchmark for intrinsic evaluation of semantic models of ancient Greek created from expert judgements. In the absence of native speakers, eliciting expert judgements to create a gold standard is a way to leverage a competence that is the closest to that of natives. Moreover, this method allows for collecting data in a uniform way and giving precise instructions to participants. Human judgements about word relatedness were collected via two questionnaires: in the first, experts provided related lemmas to some proposed seeds, while in the second, they assigned relatedness judgements to pairs of lemmas. AGREE was built from a selection of the collected data.
Sivak, E., Pankowska, P., Mendrik, A., Emery, T., Garcia-Bernardo, J., Höcük, S., … Stulp, G. (2024). Combining the strengths of Dutch survey and register data in a data challenge to predict fertility (PreFer). Journal of Computational Social Science, 1–29.
@article{sivak2024combining,
title = {Combining the strengths of Dutch survey and register data in a data challenge to predict fertility (PreFer)},
author = {Sivak, Elizaveta and Pankowska, Paulina and Mendrik, Adri{\"e}nne and Emery, Tom and Garcia-Bernardo, Javier and H{\"o}c{\"u}k, Seyit and Karpinska, Kasia and Maineri, Angelica and Mulder, Joris and Nissim, Malvina and Stulp, Gert},
journal = {Journal of Computational Social Science},
pages = {1--29},
year = {2024},
publisher = {Springer Nature Singapore},
url = {https://link.springer.com/article/10.1007/s42001-024-00275-6}
}
The social sciences have produced an impressive body of research on determinants of fertility outcomes, or whether and when people have children. However, the strength of these determinants and underlying theories are rarely evaluated on their predictive ability on new data. This prevents us from systematically comparing studies, hindering the evaluation and accumulation of knowledge. In this paper, we present two datasets which can be used to study the predictability of fertility outcomes in the Netherlands. One dataset is based on the LISS panel, a longitudinal survey which includes thousands of variables on a wide range of topics, including individual preferences and values. The other is based on the Dutch register data which lacks attitudinal data but includes detailed information about the life courses of millions of Dutch residents. We provide information about the datasets and the samples, and describe the fertility outcome of interest. We also introduce the fertility prediction data challenge PreFer which is based on these datasets and will start in Spring 2024. We outline the ways in which measuring the predictability of fertility outcomes using these datasets and combining their strengths in the data challenge can advance our understanding of fertility behaviour and computational social science. We further provide details for participants on how to take part in the data challenge.
Eikelboom, S., Esteve-Del-Valle, M., & Nissim, M. (2024). Learning from climate change news: Is the world on the same page? PLOS ONE, 19, e0297644.
@article{eikelboom2024learning,
title = {Learning from climate change news: Is the world on the same page?},
author = {Eikelboom, Stijn and Esteve-Del-Valle, Marc and Nissim, Malvina},
journal = {PLOS ONE},
volume = {19},
number = {3},
pages = {e0297644},
year = {2024},
publisher = {Public Library of Science},
url = {https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0297644}
}
Climate change challenges countries around the world, and news media are key to the public’s awareness and perception of it. But how are news media approaching climate change across countries? With the problem of climate change and its solution being global, it is key to determine whether differences in climate change news reports exist and what they are across countries. This study employs supervised machine learning to uncover topical and terminological differences between newspaper articles on climate change. An original dataset of climate change articles is presented, originating from 7 newspapers and 3 countries across the world, and published in English during 26 Conference of the Parties (COP) meetings from the United Nations Framework Convention on Climate Change (UNFCC). Three aspects are used to discriminate between articles, namely (1) countries, (2) political orientations, and (3) COP meetings. Our results reveal differences with regard to how newspaper articles approach climate change globally. Specifically, climate change-related terminology is more prevalent in left-oriented newspapers than in their right-oriented counterparts. Also, over the years, newspapers’ climate change-related terminology has evolved to convey a greater sense of urgency.
Lai, H., & Nissim, M. (2024). A Survey on Automatic Generation of Figurative Language: From Rule-based Systems to Large Language Models. ACM Computing Surveys.
@article{lai2024survey,
title = {A Survey on Automatic Generation of Figurative Language: From Rule-based Systems to Large Language Models},
author = {Lai, Huiyuan and Nissim, Malvina},
journal = {ACM Computing Surveys},
year = {2024},
publisher = {ACM New York, NY},
url = {https://dl.acm.org/doi/10.1145/3654795}
}
Figurative language generation (FLG) is the task of reformulating a given text to include a desired figure of speech, such as a hyperbole, a simile, and several others, while still being faithful to the original context. This is a fundamental, yet challenging task in Natural Language Processing (NLP), which has recently received increased attention due to the promising performance brought by pre-trained language models. Our survey provides a systematic overview of the development of FLG, mostly in English, starting with the description of some common figures of speech, their corresponding generation tasks and datasets. We then focus on various modelling approaches and assessment strategies, leading us to discuss some challenges in this field, and to suggest some potential directions for future research. To the best of our knowledge, this is the first survey that summarizes the progress of FLG including the most recent development in NLP. We also organize corresponding resources, e.g., paper lists and datasets, and make them accessible in an open repository. We hope this survey can help researchers in NLP and related fields to easily track the academic frontier, providing them with a landscape and a roadmap of this area.
2023
Mollanorozy, S., Tanti, M., & Nissim, M. (2023). Cross-lingual Transfer Learning with Persian. In L. Beinborn, K. Goswami, S. Muradoğlu, A. Sorokin, R. Kumar, A. Shcherbakov, … E. Vylomova (Eds.), Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP (pp. 89–95). Dubrovnik, Croatia: Association for Computational Linguistics.
@inproceedings{mollanorozy-etal-2023-cross,
title = {Cross-lingual Transfer Learning with {P}ersian},
author = {Mollanorozy, Sepideh and Tanti, Marc and Nissim, Malvina},
editor = {Beinborn, Lisa and Goswami, Koustava and Murado{\u{g}}lu, Saliha and Sorokin, Alexey and Kumar, Ritesh and Shcherbakov, Andreas and Ponti, Edoardo M. and Cotterell, Ryan and Vylomova, Ekaterina},
booktitle = {Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP},
month = may,
year = {2023},
address = {Dubrovnik, Croatia},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2023.sigtyp-1.9},
doi = {10.18653/v1/2023.sigtyp-1.9},
pages = {89--95}
}
The success of cross-lingual transfer learning for POS tagging has been shown to be strongly dependent, among other factors, on the (typological and/or genetic) similarity of the low-resource language used for testing and the language(s) used in pre-training or to fine-tune the model. We further unpack this finding in two directions by zooming in on a single language, namely Persian. First, still focusing on POS tagging we run an in-depth analysis of the behaviour of Persian with respect to closely related languages and languages that appear to benefit from cross-lingual transfer with Persian. To do so, we also use the World Atlas of Language Structures to determine which properties are shared between Persian and other languages included in the experiments. Based on our results, Persian seems to be a reasonable potential source language for low-resource languages such as Kurmanji and Tagalog for other tasks as well. Second, we test whether previous findings also hold on a task other than POS tagging to pull apart the benefit of language similarity and the specific task for which such benefit has been shown to hold. We gather sentiment analysis datasets for 31 target languages and through a series of cross-lingual experiments analyse which languages most benefit from Persian as the source. The set of languages that benefit from Persian had very little overlap across the two tasks, suggesting a strong task-dependent component in the usefulness of language similarity in cross-lingual transfer.
Lai, H., Toral, A., & Nissim, M. (2023). Multidimensional evaluation for text style transfer using ChatGPT. ArXiv Preprint ArXiv:2304.13462.
@article{lai2023multidimensional,
title = {Multidimensional evaluation for text style transfer using {ChatGPT}},
author = {Lai, Huiyuan and Toral, Antonio and Nissim, Malvina},
journal = {arXiv preprint arXiv:2304.13462},
url = {https://arxiv.org/abs/2304.13462},
year = {2023}
}
We investigate the potential of ChatGPT as a multidimensional evaluator for the task of Text Style Transfer, alongside, and in comparison to, existing automatic metrics as well as human judgements. We focus on a zero-shot setting, i.e. prompting ChatGPT with specific task instructions, and test its performance on three commonly-used dimensions of text style transfer evaluation: style strength, content preservation, and fluency. We perform a comprehensive correlation analysis for two transfer directions (and overall) at different levels. Compared to existing automatic metrics, ChatGPT achieves competitive correlations with human judgments. These preliminary results are expected to provide a first glimpse into the role of large language models in the multidimensional evaluation of stylized text generation.
Belz, A., Thomson, C., Reiter, E., Abercrombie, G., Alonso-Moral, J. M., Arvan, M., … Yang, D. (2023). Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP. In S. Tafreshi, A. Akula, J. Sedoc, A. Drozd, A. Rogers, & A. Rumshisky (Eds.), Proceedings of the Fourth Workshop on Insights from Negative Results in NLP (pp. 1–10). Dubrovnik, Croatia: Association for Computational Linguistics.
@inproceedings{belz-etal-2023-missing,
title = {Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in {NLP}},
author = {Belz, Anya and Thomson, Craig and Reiter, Ehud and Abercrombie, Gavin and Alonso-Moral, Jose M. and Arvan, Mohammad and Braggaar, Anouck and Cieliebak, Mark and Clark, Elizabeth and van Deemter, Kees and Dinkar, Tanvi and Du{\v{s}}ek, Ond{\v{r}}ej and Eger, Steffen and Fang, Qixiang and Gao, Mingqi and Gatt, Albert and Gkatzia, Dimitra and Gonz{\'a}lez-Corbelle, Javier and Hovy, Dirk and H{\"u}rlimann, Manuela and Ito, Takumi and Kelleher, John D. and Klubicka, Filip and Krahmer, Emiel and Lai, Huiyuan and van der Lee, Chris and Li, Yiru and Mahamood, Saad and Mieskes, Margot and van Miltenburg, Emiel and Mosteiro, Pablo and Nissim, Malvina and Parde, Natalie and Pl{\'a}tek, Ond{\v{r}}ej and Rieser, Verena and Ruan, Jie and Tetreault, Joel and Toral, Antonio and Wan, Xiaojun and Wanner, Leo and Watson, Lewis and Yang, Diyi},
editor = {Tafreshi, Shabnam and Akula, Arjun and Sedoc, Jo{\~a}o and Drozd, Aleksandr and Rogers, Anna and Rumshisky, Anna},
booktitle = {Proceedings of the Fourth Workshop on Insights from Negative Results in NLP},
month = may,
year = {2023},
address = {Dubrovnik, Croatia},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2023.insights-1.1},
doi = {10.18653/v1/2023.insights-1.1},
pages = {1--10}
}
We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, and that all but one of the experiments we selected for reproduction were found to have flaws that made the meaningfulness of conducting a reproduction questionable. As a result, we had to change our coordinated study design from a reproduce approach to a standardise-then-reproduce-twice approach. Our overall (negative) finding that the great majority of human evaluations in NLP are not repeatable and/or not reproducible and/or too flawed to justify reproduction paints a dire picture, but presents an opportunity for a rethink about how to design and report human evaluations in NLP.
Minnema, G., Lai, H., Muscato, B., & Nissim, M. (2023). Responsibility Perspective Transfer for Italian Femicide News. In A. Rogers, J. Boyd-Graber, & N. Okazaki (Eds.), Findings of the Association for Computational Linguistics: ACL 2023 (pp. 7907–7918). Toronto, Canada: Association for Computational Linguistics.
@inproceedings{minnema-etal-2023-responsibility,
title = {Responsibility Perspective Transfer for {I}talian Femicide News},
author = {Minnema, Gosse and Lai, Huiyuan and Muscato, Benedetta and Nissim, Malvina},
editor = {Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki},
booktitle = {Findings of the Association for Computational Linguistics: ACL 2023},
month = jul,
year = {2023},
address = {Toronto, Canada},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2023.findings-acl.501},
doi = {10.18653/v1/2023.findings-acl.501},
pages = {7907--7918}
}
Different ways of linguistically expressing the same real-world event can lead to different perceptions of what happened. Previous work has shown that different descriptions of gender-based violence (GBV) influence the reader’s perception of who is to blame for the violence, possibly reinforcing stereotypes which see the victim as partly responsible, too. As a contribution to raise awareness on perspective-based writing, and to facilitate access to alternative perspectives, we introduce the novel task of automatically rewriting GBV descriptions as a means to alter the perceived level of blame on the perpetrator. We present a quasi-parallel dataset of sentences with low and high perceived responsibility levels for the perpetrator, and experiment with unsupervised (mBART-based), zero-shot and few-shot (GPT3-based) methods for rewriting sentences. We evaluate our models using a questionnaire study and a suite of automatic metrics.
Lai, H., Toral, A., & Nissim, M. (2023). Multilingual Multi-Figurative Language Detection. In A. Rogers, J. Boyd-Graber, & N. Okazaki (Eds.), Findings of the Association for Computational Linguistics: ACL 2023 (pp. 9254–9267). Toronto, Canada: Association for Computational Linguistics.
@inproceedings{lai-etal-2023-multilingual,
title = {Multilingual Multi-Figurative Language Detection},
author = {Lai, Huiyuan and Toral, Antonio and Nissim, Malvina},
editor = {Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki},
booktitle = {Findings of the Association for Computational Linguistics: ACL 2023},
month = jul,
year = {2023},
address = {Toronto, Canada},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2023.findings-acl.589},
doi = {10.18653/v1/2023.findings-acl.589},
pages = {9254--9267}
}
Figures of speech help people express abstract concepts and evoke stronger emotions than literal expressions, thereby making texts more creative and engaging. Due to its pervasive and fundamental character, figurative language understanding has been addressed in Natural Language Processing, but it is still highly understudied in a multilingual setting and when considering more than one figure of speech at the same time. To bridge this gap, we introduce multilingual multi-figurative language modelling, and provide a benchmark for sentence-level figurative language detection, covering three common figures of speech and seven languages. Specifically, we develop a framework for figurative language detection based on template-based prompt learning. In so doing, we unify multiple detection tasks that are interrelated across multiple figures of speech and languages, without requiring task- or language-specific modules. Experimental results show that our framework outperforms several strong baselines and may serve as a blueprint for the joint modelling of other interrelated tasks.
Wang, C., Lai, H., Nissim, M., & Bos, J. (2023). Pre-Trained Language-Meaning Models for Multilingual Parsing and Generation. In A. Rogers, J. Boyd-Graber, & N. Okazaki (Eds.), Findings of the Association for Computational Linguistics: ACL 2023 (pp. 5586–5600). Toronto, Canada: Association for Computational Linguistics.
@inproceedings{wang-etal-2023-pre,
title = {Pre-Trained Language-Meaning Models for Multilingual Parsing and Generation},
author = {Wang, Chunliu and Lai, Huiyuan and Nissim, Malvina and Bos, Johan},
editor = {Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki},
booktitle = {Findings of the Association for Computational Linguistics: ACL 2023},
month = jul,
year = {2023},
address = {Toronto, Canada},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2023.findings-acl.345},
doi = {10.18653/v1/2023.findings-acl.345},
pages = {5586--5600}
}
Pre-trained language models (PLMs) have achieved great success in NLP and have recently been used for tasks in computational semantics. However, these tasks do not fully benefit from PLMs since meaning representations are not explicitly included. We introduce multilingual pre-trained language-meaning models based on Discourse Representation Structures (DRSs), including meaning representations besides natural language texts in the same model, and design a new strategy to reduce the gap between the pre-training and fine-tuning objectives. Since DRSs are language neutral, cross-lingual transfer learning is adopted to further improve the performance of non-English tasks. Automatic evaluation results show that our approach achieves the best performance on both the multilingual DRS parsing and DRS-to-text generation tasks. Correlation analysis between automatic metrics and human judgements on the generation task further validates the effectiveness of our model. Human inspection reveals that out-of-vocabulary tokens are the main cause of erroneous results.
Bacco, L., Dell’Orletta, F., Lai, H., Merone, M., & Nissim, M. (2023). A text style transfer system for reducing the physician–patient expertise gap: An analysis with automatic and human evaluations. Expert Systems with Applications, 233, 120874.
@article{bacco2023text,
title = {A text style transfer system for reducing the physician--patient expertise gap: An analysis with automatic and human evaluations},
author = {Bacco, Luca and Dell’Orletta, Felice and Lai, Huiyuan and Merone, Mario and Nissim, Malvina},
journal = {Expert Systems with Applications},
volume = {233},
pages = {120874},
year = {2023},
publisher = {Pergamon},
issn = {0957-4174},
doi = {10.1016/j.eswa.2023.120874},
url = {https://www.sciencedirect.com/science/article/pii/S0957417423013763}
}
Physicians and patients often come from different backgrounds and have varying levels of education, which can result in communication difficulties in the healthcare process. To address this expertise gap, we present a “Text Style Transfer” system. Our system uses Semantic Textual Similarity techniques based on Sentence Transformers models to create pseudo-parallel datasets from a large, non-parallel corpus of lay and expert texts. This approach allowed us to train a denoising autoencoder model (BART), overcoming the limitations of previous systems. Our extensive analysis, which includes both automatic metrics and human evaluations from both lay (patients) and expert (physicians) individuals, shows that our system outperforms state-of-the-art models and is comparable to human-provided gold references in some cases.
Caselli, T., Lieto, A., Nissim, M., & Patti, V. (2023). Sono solo parole. ChatGPT: anatomia e raccomandazioni per l’uso. Sistemi Intelligenti, 35, 307–320.
@article{caselli2023sono,
title = {Sono solo parole. ChatGPT: anatomia e raccomandazioni per l’uso},
author = {Caselli, Tommaso and Lieto, Antonio and Nissim, Malvina and Patti, Viviana},
journal = {Sistemi intelligenti},
volume = {35},
number = {2},
pages = {307--320},
year = {2023},
publisher = {Societ{\`a} editrice il Mulino},
issn = {1120-9550},
doi = {10.1422/108131},
url = {https://www.rivisteweb.it/doi/10.1422/108131},
keywords = {ChatGPT, natural language processing, language generation, large language models, generalized pretrained transformers, fair artificial intelligence}
}
ChatGPT has revolutionised the way people view and interact with language-based artificial agents. But is it a real revolution? And are people using ChatGPT with appropriate knowledge of its inner workings, its abilities, and potential risks? We think ChatGPT is very much in need of some proper contextualisation. In this short contribution we show how ChatGPT has come to life, both historically and technically, describing in detail the anatomy of large language models, and on the basis of this we clarify what ChatGPT can (be expected to) do, and what it cannot. We also discuss its limitations, specifically related to its intrinsic inability to be factual in what it generates, its reflection of societal biases, and the broader ethical implications of its use.
Scalena, D., Sarti, G., Nissim, M., & Fersini, E. (2023). Let the Models Respond: Interpreting Language Model Detoxification Through the Lens of Prompt Dependence. Proceedings of BlackboxNLP at EMNLP 2023.
@inproceedings{scalena2023let,
title = {Let the Models Respond: Interpreting Language Model Detoxification Through the Lens of Prompt Dependence},
author = {Scalena, Daniel and Sarti, Gabriele and Nissim, Malvina and Fersini, Elisabetta},
booktitle = {Proceedings of BlackboxNLP at EMNLP 2023},
year = {2023}
}
Li, Y., Lai, H., Toral, A., & Nissim, M. (2023). Same Trends, Different Answers: Insights from a Replication Study of Human Plausibility Judgments on Narrative Continuations. In A. Belz, M. Popović, E. Reiter, C. Thomson, & J. Sedoc (Eds.), Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems (pp. 190–203). Varna, Bulgaria: INCOMA Ltd., Shoumen, Bulgaria.
@inproceedings{li-etal-2023-trends,
title = {Same Trends, Different Answers: Insights from a Replication Study of Human Plausibility Judgments on Narrative Continuations},
author = {Li, Yiru and Lai, Huiyuan and Toral, Antonio and Nissim, Malvina},
editor = {Belz, Anya and Popovi{\'c}, Maja and Reiter, Ehud and Thomson, Craig and Sedoc, Jo{\~a}o},
booktitle = {Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems},
month = sep,
year = {2023},
address = {Varna, Bulgaria},
publisher = {INCOMA Ltd., Shoumen, Bulgaria},
url = {https://aclanthology.org/2023.humeval-1.15},
pages = {190--203}
}
We reproduced the human-based evaluation of the continuation of narratives task presented by Chakrabarty et al. (2022). This experiment is performed as part of the ReproNLP Shared Task on Reproducibility of Evaluations in NLP (Track C). Our main goal is to reproduce the original study under conditions as similar as possible. Specifically, we follow the original experimental design and perform human evaluations of the data from the original study, while describing the differences between the two studies. We then present the results of these two studies together with an analysis of similarities between them. Inter-annotator agreement (Krippendorff's alpha) in the reproduction study is lower than in the original study, while the human evaluation results of both studies show the same trends; that is, our results support the findings of the original study.
Stopponi, S., Pedrazzini, N., Peels, S., McGillivray, B., & Nissim, M. (2023). Evaluation of Distributional Semantic Models of Ancient Greek: Preliminary Results and a Road Map for Future Work. In A. Anderson, S. Gordin, B. Li, Y. Liu, & M. C. Passarotti (Eds.), Proceedings of the Ancient Language Processing Workshop (pp. 49–58). Varna, Bulgaria: INCOMA Ltd., Shoumen, Bulgaria.
@inproceedings{stopponi-etal-2023-evaluation,
title = {Evaluation of Distributional Semantic Models of {A}ncient {G}reek: Preliminary Results and a Road Map for Future Work},
author = {Stopponi, Silvia and Pedrazzini, Nilo and Peels, Saskia and McGillivray, Barbara and Nissim, Malvina},
editor = {Anderson, Adam and Gordin, Shai and Li, Bin and Liu, Yudong and Passarotti, Marco C.},
booktitle = {Proceedings of the Ancient Language Processing Workshop},
month = sep,
year = {2023},
address = {Varna, Bulgaria},
publisher = {INCOMA Ltd., Shoumen, Bulgaria},
url = {https://aclanthology.org/2023.alp-1.6},
pages = {49--58}
}
We evaluate four count-based and predictive distributional semantic models of Ancient Greek against AGREE, a composite benchmark of human judgements, to assess their ability to retrieve semantic relatedness. On the basis of the observations deriving from the analysis of the results, we design a procedure for a larger-scale intrinsic evaluation of count-based and predictive language models, including syntactic embeddings. We also propose possible ways of exploiting the different layers of the whole AGREE benchmark (including both human- and machine-generated data) and different evaluation metrics.
Bacco, L., Minnema, G., Caselli, T., Dell’Orletta, F., Merone, M., & Nissim, M. (2023). On the instability of further pre-training: Does a single sentence matter to BERT? Natural Language Processing Journal, 5, 100037.
@article{BACCO2023100037,
title = {On the instability of further pre-training: Does a single sentence matter to BERT?},
journal = {Natural Language Processing Journal},
volume = {5},
pages = {100037},
year = {2023},
issn = {2949-7191},
doi = {10.1016/j.nlp.2023.100037},
url = {https://www.sciencedirect.com/science/article/pii/S2949719123000341},
author = {Bacco, Luca and Minnema, Gosse and Caselli, Tommaso and Dell’Orletta, Felice and Merone, Mario and Nissim, Malvina},
keywords = {Natural language processing, Large language models, Further pre-training, Transformers, Instability, BERT}
}
We observe a remarkable instability in BERT-like models: minimal changes in the internal representations of BERT, as induced by one-step further pre-training with even a single sentence, can noticeably change the behaviour of subsequently fine-tuned models. While the pre-trained models seem to be essentially the same, also by means of established similarity assessment techniques, the measurable tiny changes appear to substantially impact the models’ tuning path, leading to significantly different fine-tuned systems and affecting downstream performance. After testing a very large number of combinations, which we briefly summarize, the experiments reported in this short paper focus on an intermediate phase consisting of a single-step and single-sentence masked language modeling stage and its impact on a sentiment analysis task. We discuss a series of unexpected findings which leave some open questions over the nature and stability of further pre-training.
de Vries, W., Wieling, M., & Nissim, M. (2023). DUMB: A Benchmark for Smart Evaluation of Dutch Models. In H. Bouamor, J. Pino, & K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 7221–7241). Singapore: Association for Computational Linguistics.
@inproceedings{de-vries-etal-2023-dumb,
title = {{DUMB}: A Benchmark for Smart Evaluation of {D}utch Models},
author = {de Vries, Wietse and Wieling, Martijn and Nissim, Malvina},
editor = {Bouamor, Houda and Pino, Juan and Bali, Kalika},
booktitle = {Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
month = dec,
year = {2023},
address = {Singapore},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2023.emnlp-main.447},
doi = {10.18653/v1/2023.emnlp-main.447},
pages = {7221--7241}
}
We introduce the Dutch Model Benchmark: DUMB. The benchmark includes a diverse set of datasets for low-, medium- and high-resource tasks. The total set of nine tasks includes four tasks that were previously not available in Dutch. Instead of relying on a mean score across tasks, we propose Relative Error Reduction (RER), which compares the DUMB performance of language models to a strong baseline which can be referred to in the future even when assessing different sets of language models. Through a comparison of 14 pre-trained language models (mono- and multi-lingual, of varying sizes), we assess the internal consistency of the benchmark tasks, as well as the factors that likely enable high performance. Our results indicate that current Dutch monolingual models under-perform and suggest training larger Dutch models with other architectures and pre-training objectives. At present, the highest performance is achieved by DeBERTaV3 (large), XLM-R (large) and mDeBERTaV3 (base). In addition to highlighting best strategies for training larger Dutch models, DUMB will foster further research on Dutch. A public leaderboard is available at https://dumbench.nl.
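The Relative Error Reduction (RER) idea described in this abstract can be sketched as follows. This is a minimal illustration only, assuming accuracy-style scores in [0, 1]; the function and variable names are hypothetical and the benchmark's exact definition may differ in detail:

```python
# Hypothetical sketch of Relative Error Reduction (RER): a model's score on a
# task is expressed as the fraction of a strong baseline's error it removes,
# rather than as a raw mean score across tasks.

def relative_error_reduction(model_score: float, baseline_score: float) -> float:
    """Both scores in [0, 1], higher is better (e.g. accuracy)."""
    baseline_error = 1.0 - baseline_score
    model_error = 1.0 - model_score
    if baseline_error == 0.0:
        raise ValueError("baseline makes no errors; RER is undefined")
    return (baseline_error - model_error) / baseline_error

# A model that halves the baseline's error gets RER = 0.5; a model that
# performs worse than the baseline gets a negative RER.
per_task_rer = [relative_error_reduction(m, b)
                for m, b in [(0.95, 0.90), (0.88, 0.90)]]
overall = sum(per_task_rer) / len(per_task_rer)  # mean RER across tasks
```

Because the baseline is fixed, scores computed this way remain comparable even when a different set of models is evaluated later, which is the property the abstract highlights.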
Sarti, G., Feldhus, N., Sickert, L., van der Wal, O., Nissim, M., & Bisazza, A. (2023). Inseq: An interpretability toolkit for sequence generation models. ArXiv Preprint ArXiv:2302.13942.
@article{sarti2023inseq,
title = {Inseq: An interpretability toolkit for sequence generation models},
author = {Sarti, Gabriele and Feldhus, Nils and Sickert, Ludwig and van der Wal, Oskar and Nissim, Malvina and Bisazza, Arianna},
journal = {arXiv preprint arXiv:2302.13942},
year = {2023},
url = {https://doi.org/10.48550/arXiv.2302.13942}
}
Past work in natural language processing interpretability focused mainly on popular classification tasks while largely overlooking generation settings, partly due to a lack of dedicated tools. In this work, we introduce Inseq, a Python library to democratize access to interpretability analyses of sequence generation models. Inseq enables intuitive and optimized extraction of models’ internal information and feature importance scores for popular decoder-only and encoder-decoder Transformers architectures. We showcase its potential by adopting it to highlight gender biases in machine translation models and locate factual knowledge inside GPT-2. Thanks to its extensible interface supporting cutting-edge techniques such as contrastive feature attribution, Inseq can drive future advances in explainable natural language generation, centralizing good practices and enabling fair and reproducible model evaluations.
2022
de Vries, W., Wieling, M., & Nissim, M. (2022). Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages. In S. Muresan, P. Nakov, & A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 7676–7685). Dublin, Ireland: Association for Computational Linguistics.
@inproceedings{de-vries-etal-2022-make,
title = {Make the Best of Cross-lingual Transfer: Evidence from {POS} Tagging with over 100 Languages},
author = {de Vries, Wietse and Wieling, Martijn and Nissim, Malvina},
editor = {Muresan, Smaranda and Nakov, Preslav and Villavicencio, Aline},
booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = may,
year = {2022},
address = {Dublin, Ireland},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2022.acl-long.529},
doi = {10.18653/v1/2022.acl-long.529},
pages = {7676--7685}
}
Cross-lingual transfer learning with large multilingual pre-trained models can be an effective approach for low-resource languages with no labeled training data. Existing evaluations of zero-shot cross-lingual generalisability of large pre-trained models use datasets with English training data, and test data in a selection of target languages. We explore a more extensive transfer learning setup with 65 different source languages and 105 target languages for part-of-speech tagging. Through our analysis, we show that pre-training of both source and target language, as well as matching language families, writing systems, word order systems, and lexical-phonetic distance significantly impact cross-lingual performance. The findings described in this paper can be used as indicators of which factors are important for effective zero-shot cross-lingual transfer to zero- and low-resource languages.
Sarti, G., & Nissim, M. (2022). IT5: Large-scale text-to-text pretraining for Italian language understanding and generation. ArXiv Preprint ArXiv:2203.03759.
@article{sarti2022it5,
title = {{IT5}: Large-scale text-to-text pretraining for {I}talian language understanding and generation},
author = {Sarti, Gabriele and Nissim, Malvina},
journal = {arXiv preprint arXiv:2203.03759},
url = {https://arxiv.org/abs/2203.03759},
year = {2022}
}
Minnema, G., Gemelli, S., Zanchi, C., Caselli, T., & Nissim, M. (2022). SocioFillmore: A Tool for Discovering Perspectives. In V. Basile, Z. Kozareva, & S. Stajner (Eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 240–250). Dublin, Ireland: Association for Computational Linguistics.
@inproceedings{minnema-etal-2022-sociofillmore,
title = {{S}ocio{F}illmore: A Tool for Discovering Perspectives},
author = {Minnema, Gosse and Gemelli, Sara and Zanchi, Chiara and Caselli, Tommaso and Nissim, Malvina},
editor = {Basile, Valerio and Kozareva, Zornitsa and Stajner, Sanja},
booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations},
month = may,
year = {2022},
address = {Dublin, Ireland},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2022.acl-demo.24},
doi = {10.18653/v1/2022.acl-demo.24},
pages = {240--250}
}
SOCIOFILLMORE is a multilingual tool which helps to bring to the fore the focus or the perspective that a text expresses in depicting an event. Our tool, whose rationale we also support through a large collection of human judgements, is theoretically grounded on frame semantics and cognitive linguistics, and implemented using the LOME frame semantic parser. We describe SOCIOFILLMORE’s development and functionalities, show how non-NLP researchers can easily interact with the tool, and present some example case studies which are already incorporated in the system, together with the kind of analysis that can be visualised.
Lai, H., Toral, A., & Nissim, M. (2022). Multilingual Pre-training with Language and Task Adaptation for Multilingual Text Style Transfer. In S. Muresan, P. Nakov, & A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 262–271). Dublin, Ireland: Association for Computational Linguistics.
@inproceedings{lai-etal-2022-multilingual,
title = {Multilingual Pre-training with Language and Task Adaptation for Multilingual Text Style Transfer},
author = {Lai, Huiyuan and Toral, Antonio and Nissim, Malvina},
editor = {Muresan, Smaranda and Nakov, Preslav and Villavicencio, Aline},
booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = may,
year = {2022},
address = {Dublin, Ireland},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2022.acl-short.29},
doi = {10.18653/v1/2022.acl-short.29},
pages = {262--271}
}
We exploit the pre-trained seq2seq model mBART for multilingual text style transfer. Using machine translated data as well as gold aligned English sentences yields state-of-the-art results in the three target languages we consider. Besides, in view of the general scarcity of parallel data, we propose a modular approach for multilingual formality transfer, which consists of two training strategies that target adaptation to both language and task. Our approach achieves competitive performance without monolingual task-specific parallel data and can be applied to other style transfer tasks as well as to other languages.
Caselli, T., & Nissim, M. (2022). Harvesting Perspectives in Social Media. In P. Vossen & A. Fokkens (Eds.), Creating a More Transparent Internet: The Perspective Web (pp. 244–259). Cambridge University Press.
@incollection{caselli2022harvesting,
title = {Harvesting Perspectives in Social Media},
author = {Caselli, Tommaso and Nissim, Malvina},
booktitle = {Creating a More Transparent Internet: The Perspective Web},
pages = {244--259},
editor = {Vossen, Piek and Fokkens, Antske},
year = {2022},
collection = {Studies in Natural Language Processing},
publisher = {Cambridge University Press},
url = {https://www.cambridge.org/core/books/creating-a-more-transparent-internet/harvesting-perspectives-in-social-media-tommaso-caselli-and-malvina-nissim/D4EA9A2EA6A99C8F6AE37CA4633F3307}
}
Lang, I., Plas, L., Nissim, M., & Gatt, A. (2022). Visually Grounded Interpretation of Noun-Noun Compounds in English. In E. Chersoni, N. Hollenstein, C. Jacobs, Y. Oseki, L. Prévot, & E. Santus (Eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (pp. 23–35). Dublin, Ireland: Association for Computational Linguistics.
@inproceedings{lang-etal-2022-visually,
title = {Visually Grounded Interpretation of Noun-Noun Compounds in {E}nglish},
author = {Lang, Inga and Plas, Lonneke and Nissim, Malvina and Gatt, Albert},
editor = {Chersoni, Emmanuele and Hollenstein, Nora and Jacobs, Cassandra and Oseki, Yohei and Pr{\'e}vot, Laurent and Santus, Enrico},
booktitle = {Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics},
month = may,
year = {2022},
address = {Dublin, Ireland},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2022.cmcl-1.3},
doi = {10.18653/v1/2022.cmcl-1.3},
pages = {23--35}
}
Noun-noun compounds (NNCs) occur frequently in the English language. Accurate NNC interpretation, i.e. determining the implicit relationship between the constituents of a NNC, is crucial for the advancement of many natural language processing tasks. Until now, computational NNC interpretation has been limited to approaches involving linguistic representations only. However, much research suggests that grounding linguistic representations in vision or other modalities can increase performance on this and other tasks. Our work is a novel comparison of linguistic and visuo-linguistic representations for the task of NNC interpretation. We frame NNC interpretation as a relation classification task, evaluating on a large, relationally-annotated NNC dataset. We combine distributional word vectors with image vectors to investigate how visual information can help improve NNC interpretation systems. We find that adding visual vectors increases classification performance on our dataset in many cases.
Lai, H., Mao, J., Toral, A., & Nissim, M. (2022). Human Judgement as a Compass to Navigate Automatic Metrics for Formality Transfer. In A. Belz, M. Popović, E. Reiter, & A. Shimorina (Eds.), Proceedings of the 2nd Workshop on Human Evaluation of NLP Systems (HumEval) (pp. 102–115). Dublin, Ireland: Association for Computational Linguistics.
@inproceedings{lai-etal-2022-human,
title = {Human Judgement as a Compass to Navigate Automatic Metrics for Formality Transfer},
author = {Lai, Huiyuan and Mao, Jiali and Toral, Antonio and Nissim, Malvina},
editor = {Belz, Anya and Popovi{\'c}, Maja and Reiter, Ehud and Shimorina, Anastasia},
booktitle = {Proceedings of the 2nd Workshop on Human Evaluation of NLP Systems (HumEval)},
month = may,
year = {2022},
address = {Dublin, Ireland},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2022.humeval-1.9},
doi = {10.18653/v1/2022.humeval-1.9},
pages = {102--115}
}
Although text style transfer has witnessed rapid development in recent years, there is as yet no established standard for evaluation, which is performed using several automatic metrics, lacking the possibility of always resorting to human judgement. We focus on the task of formality transfer, and on the three aspects that are usually evaluated: style strength, content preservation, and fluency. To cast light on how such aspects are assessed by common and new metrics, we run a human-based evaluation and perform a rich correlation analysis. We are then able to offer some recommendations on the use of such metrics in formality transfer, also with an eye to their generalisability (or not) to related tasks.
de Graaf, E., Stopponi, S., Bos, J. K., Peels-Matthey, S., & Nissim, M. (2022). AGILe: The First Lemmatizer for Ancient Greek Inscriptions. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, … S. Piperidis (Eds.), Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 5334–5344). Marseille, France: European Language Resources Association.
@inproceedings{de-graaf-etal-2022-agile,
title = {{AGIL}e: The First Lemmatizer for {A}ncient {G}reek Inscriptions},
author = {de Graaf, Evelien and Stopponi, Silvia and Bos, Jasper K. and Peels-Matthey, Saskia and Nissim, Malvina},
editor = {Calzolari, Nicoletta and B{\'e}chet, Fr{\'e}d{\'e}ric and Blache, Philippe and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Isahara, Hitoshi and Maegaard, Bente and Mariani, Joseph and Mazo, H{\'e}l{\`e}ne and Odijk, Jan and Piperidis, Stelios},
booktitle = {Proceedings of the Thirteenth Language Resources and Evaluation Conference},
month = jun,
year = {2022},
address = {Marseille, France},
publisher = {European Language Resources Association},
url = {https://aclanthology.org/2022.lrec-1.571},
pages = {5334--5344}
}
To facilitate corpus searches by classicists as well as to reduce data sparsity when training models, we focus on the automatic lemmatization of ancient Greek inscriptions, which have not received as much attention in this sense as literary text data has. We show that existing lemmatizers for ancient Greek, trained on literary data, are not performant on epigraphic data, due to major language differences between the two types of texts. We thus train the first inscription-specific lemmatizer, achieving above 80% accuracy, and make both the models and the lemmatized data available to the community. We also provide a detailed error analysis of the peculiarities of inscriptions, which further underscores the importance of a lemmatizer dedicated to this type of text.
Lai, H., & Nissim, M. (2022). Multi-Figurative Language Generation. In N. Calzolari, C.-R. Huang, H. Kim, J. Pustejovsky, L. Wanner, K.-S. Choi, … S.-H. Na (Eds.), Proceedings of the 29th International Conference on Computational Linguistics (pp. 5939–5954). Gyeongju, Republic of Korea: International Committee on Computational Linguistics.
@inproceedings{lai-nissim-2022-multi,
title = {Multi-Figurative Language Generation},
author = {Lai, Huiyuan and Nissim, Malvina},
editor = {Calzolari, Nicoletta and Huang, Chu-Ren and Kim, Hansaem and Pustejovsky, James and Wanner, Leo and Choi, Key-Sun and Ryu, Pum-Mo and Chen, Hsin-Hsi and Donatelli, Lucia and Ji, Heng and Kurohashi, Sadao and Paggio, Patrizia and Xue, Nianwen and Kim, Seokhwan and Hahm, Younggyun and He, Zhong and Lee, Tony Kyungil and Santus, Enrico and Bond, Francis and Na, Seung-Hoon},
booktitle = {Proceedings of the 29th International Conference on Computational Linguistics},
month = oct,
year = {2022},
address = {Gyeongju, Republic of Korea},
publisher = {International Committee on Computational Linguistics},
url = {https://aclanthology.org/2022.coling-1.519},
pages = {5939--5954}
}
Figurative language generation is the task of reformulating a given text in the desired figure of speech while still being faithful to the original context. We take the first step towards multi-figurative language modelling by providing a benchmark for the automatic generation of five common figurative forms in English. We train mFLAG employing a scheme for multi-figurative language pre-training on top of BART, and a mechanism for injecting the target figurative information into the encoder; this enables the generation of text with the target figurative form from another figurative form without parallel figurative-figurative sentence pairs. Our approach outperforms all strong baselines. We also offer some qualitative analysis and reflections on the relationship between the different figures of speech.
Minnema, G., Gemelli, S., Zanchi, C., Caselli, T., & Nissim, M. (2022). Dead or Murdered? Predicting Responsibility Perception in Femicide News Reports. In Y. He, H. Ji, S. Li, Y. Liu, & C.-H. Chang (Eds.), Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 1078–1090). Online only: Association for Computational Linguistics.
@inproceedings{minnema-etal-2022-dead,
title = {Dead or Murdered? Predicting Responsibility Perception in Femicide News Reports},
author = {Minnema, Gosse and Gemelli, Sara and Zanchi, Chiara and Caselli, Tommaso and Nissim, Malvina},
editor = {He, Yulan and Ji, Heng and Li, Sujian and Liu, Yang and Chang, Chua-Hui},
booktitle = {Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
month = nov,
year = {2022},
address = {Online only},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2022.aacl-main.79},
pages = {1078--1090}
}
Different linguistic expressions can conceptualize the same event from different viewpoints by emphasizing certain participants over others. Here, we investigate a case where this has social consequences: how do linguistic expressions of gender-based violence (GBV) influence who we perceive as responsible? We build on previous psycholinguistic research in this area and conduct a large-scale perception survey of GBV descriptions automatically extracted from a corpus of Italian newspapers. We then train regression models that predict the salience of GBV participants with respect to different dimensions of perceived responsibility. Our best model (fine-tuned BERT) shows solid overall performance, with large differences between dimensions and participants: salient _focus_ is more predictable than salient _blame_, and perpetrators’ salience is more predictable than victims’ salience. Experiments with ridge regression models using different representations show that features based on linguistic theory perform similarly to word-based features. Overall, we show that different linguistic choices do trigger different perceptions of responsibility, and that such perceptions can be modelled automatically. This work can be a core instrument to raise awareness of the consequences of different perspectivizations in the general public and in news producers alike.
Minnema, G., Ruggiero, G., Bartl, M., Gemelli, S., Caselli, T., Zanchi, C., … Nissim, M. (2022). Responsibility Framing under the Magnifying Lens of NLP: The Case of Gender-based Violence and Traffic Danger. Computational Linguistics in the Netherlands Journal, 12, 207–233.
@article{gosse2022responsibility,
title = {Responsibility Framing under the Magnifying Lens of NLP: The Case of Gender-based Violence and Traffic Danger},
author = {Minnema, Gosse and Ruggiero, Gaetana and Bartl, Marion and Gemelli, Sara and Caselli, Tommaso and Zanchi, Chiara and Patti, Viviana and te Br{\"o}mmelstroet, Marco and Nissim, Malvina},
journal = {Computational Linguistics in the Netherlands Journal},
volume = {12},
pages = {207--233},
year = {2022},
url = {https://clinjournal.org/clinj/article/view/155}
}
We introduce a framework for the computational analysis of how responsibility is framed in the reporting of two types of socially relevant events: gender-based violence (specifically, femicides in the Italian press), and traffic danger (specifically, traffic crashes in Dutch and Flemish news reports). We advocate for the parallel analysis of these two phenomena under the same theoretical framework, which draws on Frame Semantics, Critical Discourse Analysis and Natural Language Processing. Reusing two existing event-text datasets we show how computational experiments and the resulting analyses can be run. This work supports the testing and development of tools for NLP practitioners, as well as large-scale linguistic analyses for activists and journalists, in the context of socially impacting events.
Nissim, M., & Pannitto, L. (2022). Che cos’è la linguistica computazionale (p. 128). Le Bussole, Carocci editore.
@book{nissim2022che,
title = {{Che cos'{\`e} la linguistica computazionale}},
author = {Nissim, Malvina and Pannitto, Ludovica},
pages = {128},
year = {2022},
publisher = {Le Bussole, Carocci editore},
url = {https://www.carocci.it/prodotto/che-cose-la-linguistica-computazionale}
}
2021
de Vries, W., & Nissim, M. (2021). As Good as New. How to Successfully Recycle English GPT-2 to Make Models for Other Languages. In C. Zong, F. Xia, W. Li, & R. Navigli (Eds.), Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 836–846). Online: Association for Computational Linguistics.
@inproceedings{de-vries-nissim-2021-good,
title = {As Good as New. How to Successfully Recycle {E}nglish {GPT}-2 to Make Models for Other Languages},
author = {de Vries, Wietse and Nissim, Malvina},
editor = {Zong, Chengqing and Xia, Fei and Li, Wenjie and Navigli, Roberto},
booktitle = {Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021},
month = aug,
year = {2021},
address = {Online},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2021.findings-acl.74},
doi = {10.18653/v1/2021.findings-acl.74},
pages = {836--846}
}
Large generative language models have been very successful for English, but other languages lag behind, in part due to data and computational limitations. We propose a method that may overcome these problems by adapting existing pre-trained models to new languages. Specifically, we describe the adaptation of English GPT-2 to Italian and Dutch by retraining lexical embeddings without tuning the Transformer layers. As a result, we obtain lexical embeddings for Italian and Dutch that are aligned with the original English lexical embeddings. Additionally, we scale up complexity by transforming relearned lexical embeddings of GPT-2 small to the GPT-2 medium embedding space. This method minimises the amount of training and prevents losing information during adaptation that was learned by GPT-2. English GPT-2 models with relearned lexical embeddings can generate realistic sentences in Italian and Dutch. Though on average these sentences are still identifiable as artificial by humans, they are assessed on par with sentences generated by a GPT-2 model fully trained from scratch.
Messina, L., Busso, L., Combei, C. R., Miaschi, A., Pannitto, L., Sarti, G., & Nissim, M. (2021). A dissemination workshop for introducing young Italian students to NLP. In D. Jurgens, V. Kolhatkar, L. Li, M. Mieskes, & T. Pedersen (Eds.), Proceedings of the Fifth Workshop on Teaching NLP (pp. 52–54). Online: Association for Computational Linguistics.
@inproceedings{messina-etal-2021-dissemination,
title = {A dissemination workshop for introducing young {I}talian students to {NLP}},
author = {Messina, Lucio and Busso, Lucia and Combei, Claudia Roberta and Miaschi, Alessio and Pannitto, Ludovica and Sarti, Gabriele and Nissim, Malvina},
editor = {Jurgens, David and Kolhatkar, Varada and Li, Lucy and Mieskes, Margot and Pedersen, Ted},
booktitle = {Proceedings of the Fifth Workshop on Teaching NLP},
month = jun,
year = {2021},
address = {Online},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2021.teachingnlp-1.7},
doi = {10.18653/v1/2021.teachingnlp-1.7},
pages = {52--54}
}
We describe and make available the game-based material developed for a laboratory run at several Italian science festivals to popularize NLP among young students.
Pannitto, L., Busso, L., Combei, C. R., Messina, L., Miaschi, A., Sarti, G., & Nissim, M. (2021). Teaching NLP with Bracelets and Restaurant Menus: An Interactive Workshop for Italian Students. In D. Jurgens, V. Kolhatkar, L. Li, M. Mieskes, & T. Pedersen (Eds.), Proceedings of the Fifth Workshop on Teaching NLP (pp. 160–170). Online: Association for Computational Linguistics.
@inproceedings{pannitto-etal-2021-teaching,
title = {Teaching {NLP} with Bracelets and Restaurant Menus: An Interactive Workshop for {I}talian Students},
author = {Pannitto, Ludovica and Busso, Lucia and Combei, Claudia Roberta and Messina, Lucio and Miaschi, Alessio and Sarti, Gabriele and Nissim, Malvina},
editor = {Jurgens, David and Kolhatkar, Varada and Li, Lucy and Mieskes, Margot and Pedersen, Ted},
booktitle = {Proceedings of the Fifth Workshop on Teaching NLP},
month = jun,
year = {2021},
address = {Online},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2021.teachingnlp-1.26},
doi = {10.18653/v1/2021.teachingnlp-1.26},
pages = {160--170}
}
Although Natural Language Processing is at the core of many tools young people use in their everyday life, high school curricula (in Italy) do not include any computational linguistics education. This lack of exposure makes the use of such tools less responsible than it could be, and makes choosing computational linguistics as a university degree unlikely. To raise awareness, curiosity, and longer-term interest in young people, we have developed an interactive workshop designed to illustrate the basic principles of NLP and computational linguistics to high school Italian students aged between 13 and 18 years. The workshop takes the form of a game in which participants play the role of machines needing to solve some of the most common problems a computer faces in understanding language: from voice recognition to Markov chains to syntactic parsing. Participants are guided through the workshop with the help of instructors, who present the activities and explain core concepts from computational linguistics. The workshop was presented at numerous outlets in Italy between 2019 and 2020, both face-to-face and online.
de Vries, W., Bartelds, M., Nissim, M., & Wieling, M. (2021). Adapting Monolingual Models: Data can be Scarce when Language Similarity is High. In C. Zong, F. Xia, W. Li, & R. Navigli (Eds.), Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 4901–4907). Online: Association for Computational Linguistics.
@inproceedings{de-vries-etal-2021-adapting,
title = {Adapting Monolingual Models: Data can be Scarce when Language Similarity is High},
author = {de Vries, Wietse and Bartelds, Martijn and Nissim, Malvina and Wieling, Martijn},
editor = {Zong, Chengqing and Xia, Fei and Li, Wenjie and Navigli, Roberto},
booktitle = {Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021},
month = aug,
year = {2021},
address = {Online},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2021.findings-acl.433},
doi = {10.18653/v1/2021.findings-acl.433},
pages = {4901--4907}
}
For many (minority) languages, the resources needed to train large models are not available. We investigate the performance of zero-shot transfer learning with as little data as possible, and the influence of language similarity in this process. We retrain the lexical layers of four BERT-based models using data from two low-resource target language varieties, while the Transformer layers are independently fine-tuned on a POS-tagging task in the model’s source language. By combining the new lexical layers and fine-tuned Transformer layers, we achieve high task performance for both target languages. With high language similarity, 10MB of data appears sufficient to achieve substantial monolingual transfer performance. Monolingual BERT-based models generally achieve higher downstream task performance after retraining the lexical layer than multilingual BERT, even when the target language is included in the multilingual model.
Lai, H., Toral, A., & Nissim, M. (2021). Thank you BART! Rewarding Pre-Trained Models Improves Formality Style Transfer. In C. Zong, F. Xia, W. Li, & R. Navigli (Eds.), Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) (pp. 484–494). Online: Association for Computational Linguistics.
@inproceedings{lai-etal-2021-thank,
title = {Thank you {BART}! Rewarding Pre-Trained Models Improves Formality Style Transfer},
author = {Lai, Huiyuan and Toral, Antonio and Nissim, Malvina},
editor = {Zong, Chengqing and Xia, Fei and Li, Wenjie and Navigli, Roberto},
booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
month = aug,
year = {2021},
address = {Online},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2021.acl-short.62},
doi = {10.18653/v1/2021.acl-short.62},
pages = {484--494}
}
Scarcity of parallel data causes formality style transfer models to have scarce success in preserving content. We show that fine-tuning pre-trained language (GPT-2) and sequence-to-sequence (BART) models boosts content preservation, and that this is possible even with limited amounts of parallel data. Augmenting these models with rewards that target style and content, the two core aspects of the task, we achieve a new state of the art.
De Mattei, L., Lai, H., Dell’Orletta, F., & Nissim, M. (2021). Human Perception in Natural Language Generation. In A. Bosselut, E. Durmus, V. P. Gangal, S. Gehrmann, Y. Jernite, L. Perez-Beltrachini, … W. Xu (Eds.), Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021) (pp. 15–23). Online: Association for Computational Linguistics.
@inproceedings{de-mattei-etal-2021-human,
title = {Human Perception in Natural Language Generation},
author = {De Mattei, Lorenzo and Lai, Huiyuan and Dell{'}Orletta, Felice and Nissim, Malvina},
editor = {Bosselut, Antoine and Durmus, Esin and Gangal, Varun Prashant and Gehrmann, Sebastian and Jernite, Yacine and Perez-Beltrachini, Laura and Shaikh, Samira and Xu, Wei},
booktitle = {Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)},
month = aug,
year = {2021},
address = {Online},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2021.gem-1.2},
doi = {10.18653/v1/2021.gem-1.2},
pages = {15--23}
}
We ask subjects whether they perceive a set of texts as human-produced; some of the texts are actually human-written, while others are automatically generated. We use this data to fine-tune a GPT-2 model to push it to generate more human-like texts, and observe that this fine-tuned model produces texts that are indeed perceived as more human-like than those of the original model. Contextually, we show that our automatic evaluation strategy correlates well with human judgements. We also run a linguistic analysis to unveil the characteristics of human- vs machine-perceived language.
Caselli, T., Schelhaas, A., Weultjes, M., Leistra, F., van der Veen, H., Timmerman, G., & Nissim, M. (2021). DALC: the Dutch Abusive Language Corpus. In A. Mostafazadeh Davani, D. Kiela, M. Lambert, B. Vidgen, V. Prabhakaran, & Z. Waseem (Eds.), Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021) (pp. 54–66). Online: Association for Computational Linguistics.
@inproceedings{caselli-etal-2021-dalc,
title = {{DALC}: the {D}utch Abusive Language Corpus},
author = {Caselli, Tommaso and Schelhaas, Arjan and Weultjes, Marieke and Leistra, Folkert and van der Veen, Hylke and Timmerman, Gerben and Nissim, Malvina},
editor = {Mostafazadeh Davani, Aida and Kiela, Douwe and Lambert, Mathias and Vidgen, Bertie and Prabhakaran, Vinodkumar and Waseem, Zeerak},
booktitle = {Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)},
month = aug,
year = {2021},
address = {Online},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2021.woah-1.6},
doi = {10.18653/v1/2021.woah-1.6},
pages = {54--66}
}
As socially unacceptable language becomes pervasive in social media platforms, the need for automatic content moderation becomes more pressing. This contribution introduces the Dutch Abusive Language Corpus (DALC v1.0), a new dataset with tweets manually annotated for abusive language. The resource addresses a gap in language resources for Dutch and adopts a multi-layer annotation scheme modeling the explicitness and the target of the abusive messages. Baseline experiments on all annotation layers have been conducted, achieving a macro F1 score of 0.748 for binary classification on the explicitness layer and 0.489 for target classification.
Lai, H., Toral, A., & Nissim, M. (2021). Generic resources are what you need: Style transfer tasks without task-specific parallel training data. In M.-F. Moens, X. Huang, L. Specia, & S. W.-tau Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 4241–4254). Online and Punta Cana, Dominican Republic: Association for Computational Linguistics.
@inproceedings{lai-etal-2021-generic,
title = {Generic resources are what you need: Style transfer tasks without task-specific parallel training data},
author = {Lai, Huiyuan and Toral, Antonio and Nissim, Malvina},
editor = {Moens, Marie-Francine and Huang, Xuanjing and Specia, Lucia and Yih, Scott Wen-tau},
booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
month = nov,
year = {2021},
address = {Online and Punta Cana, Dominican Republic},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2021.emnlp-main.349},
doi = {10.18653/v1/2021.emnlp-main.349},
pages = {4241--4254}
}
Style transfer aims to rewrite a source text in a different target style while preserving its content. We propose a novel approach to this task that leverages generic resources, and without using any task-specific parallel (source–target) data outperforms existing unsupervised approaches on the two most popular style transfer tasks: formality transfer and polarity swap. In practice, we adopt a multi-step procedure which builds on a generic pre-trained sequence-to-sequence model (BART). First, we strengthen the model’s ability to rewrite by further pre-training BART on both an existing collection of generic paraphrases, as well as on synthetic pairs created using a general-purpose lexical resource. Second, through an iterative back-translation approach, we train two models, each in a transfer direction, so that they can provide each other with synthetically generated pairs, dynamically in the training process. Lastly, we let our best resulting model generate static synthetic pairs to be used in a supervised training regime. Besides methodology and state-of-the-art results, a core contribution of this work is a reflection on the nature of the two tasks we address, and how their differences are highlighted by their response to our approach.
Minnema, G., & Nissim, M. (2021). Breeding Fillmore’s Chickens and Hatching the Eggs: Recombining Frames and Roles in Frame-Semantic Parsing. In S. Zarrieß, J. Bos, R. van Noord, & L. Abzianidze (Eds.), Proceedings of the 14th International Conference on Computational Semantics (IWCS) (pp. 155–165). Groningen, The Netherlands (online): Association for Computational Linguistics.
@inproceedings{minnema-nissim-2021-breeding,
title = {Breeding {F}illmore{'}s Chickens and Hatching the Eggs: Recombining Frames and Roles in Frame-Semantic Parsing},
author = {Minnema, Gosse and Nissim, Malvina},
editor = {Zarrie{\ss}, Sina and Bos, Johan and van Noord, Rik and Abzianidze, Lasha},
booktitle = {Proceedings of the 14th International Conference on Computational Semantics (IWCS)},
month = jun,
year = {2021},
address = {Groningen, The Netherlands (online)},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2021.iwcs-1.15},
pages = {155--165}
}
Frame-semantic parsers traditionally predict predicates, frames, and semantic roles in a fixed order. This paper explores the ‘chicken-or-egg’ problem of interdependencies between these components theoretically and practically. We introduce a flexible BERT-based sequence labeling architecture that allows for predicting frames and roles independently from each other or combining them in several ways. Our results show that our setups can approximate more complex traditional models’ performance, while allowing for a clearer view of the interdependencies between the pipeline’s components, and of how frame and role prediction models make different use of BERT’s layers.
Minnema, G., Gemelli, S., Zanchi, C., Patti, V., Caselli, T., Nissim, M., & others. (2021). Frame semantics for social NLP in Italian: Analyzing responsibility framing in femicide news reports. Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-It 2021), 3033, 1–8. CEUR-WS.
@inproceedings{minnema2021frame,
title = {Frame semantics for social NLP in Italian: Analyzing responsibility framing in femicide news reports},
author = {Minnema, Gosse and Gemelli, Sara and Zanchi, Chiara and Patti, Viviana and Caselli, Tommaso and Nissim, Malvina and others},
booktitle = {Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2021)},
volume = {3033},
pages = {1--8},
year = {2021},
organization = {CEUR-WS},
url = {https://ceur-ws.org/Vol-3033/paper32.pdf}
}
We propose using a FrameNet-based approach for analyzing how socially relevant events are framed in media discourses. Taking femicides as an example, we perform a preliminary investigation on a large dataset of news reports and event data covering recent femicides in Italy. First, we revisit the EVALITA 2011 shared task on Italian frame labeling, and test a recent multilingual frame-semantic parser against this benchmark. Then, we experiment with specializing this model for Italian and perform a human evaluation to test our model’s real-world applicability. We show how FrameNet-based analyses can help to identify linguistic constructions that background the agentivity and responsibility of femicide perpetrators in Italian news.
2020
Nissim, M., van Noord, R., & van der Goot, R. (2020). Fair Is Better than Sensational: Man Is to Doctor as Woman Is to Doctor. Computational Linguistics, 46, 487–497.
@article{nissim-etal-2020-fair,
title = {Fair Is Better than Sensational: Man Is to Doctor as Woman Is to Doctor},
author = {Nissim, Malvina and van Noord, Rik and van der Goot, Rob},
journal = {Computational Linguistics},
volume = {46},
number = {2},
month = jun,
year = {2020},
url = {https://aclanthology.org/2020.cl-2.7},
doi = {10.1162/coli_a_00379},
pages = {487--497},
publisher = {MIT Press}
}
Analogies such as man is to king as woman is to X are often used to illustrate the amazing power of word embeddings. Concurrently, they have also been used to expose how strongly human biases are encoded in vector spaces trained on natural language, with examples like man is to computer programmer as woman is to homemaker. Recent work has shown that analogies are in fact not an accurate diagnostic for bias, but this does not mean that they are not used anymore, or that their legacy is fading. Instead of focusing on the intrinsic problems of the analogy task as a bias detection tool, we discuss a series of issues involving implementation as well as subjective choices that might have yielded a distorted picture of bias in word embeddings. We stand by the truth that human biases are present in word embeddings, and, of course, the need to address them. But analogies are not an accurate tool to do so, and the way they have been most often used has exacerbated some possibly non-existing biases and perhaps hidden others. Because they are still widely popular, and some of them have become classics within and outside the NLP community, we deem it important to provide a series of clarifications that should put well-known, and potentially new analogies, into the right perspective.
de Vries, W., van Cranenburgh, A., & Nissim, M. (2020). What’s so special about BERT’s layers? A closer look at the NLP pipeline in monolingual and multilingual models. In T. Cohn, Y. He, & Y. Liu (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 4339–4350). Online: Association for Computational Linguistics.
@inproceedings{de-vries-etal-2020-whats,
title = {What{'}s so special about {BERT}{'}s layers? A closer look at the {NLP} pipeline in monolingual and multilingual models},
author = {de Vries, Wietse and van Cranenburgh, Andreas and Nissim, Malvina},
editor = {Cohn, Trevor and He, Yulan and Liu, Yang},
booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2020},
month = nov,
year = {2020},
address = {Online},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2020.findings-emnlp.389},
doi = {10.18653/v1/2020.findings-emnlp.389},
pages = {4339--4350}
}
Peeking into the inner workings of BERT has shown that its layers resemble the classical NLP pipeline, with progressively more complex tasks being concentrated in later layers. To investigate to what extent these results also hold for a language other than English, we probe a Dutch BERT-based model and the multilingual BERT model for Dutch NLP tasks. In addition, through a deeper analysis of part-of-speech tagging, we show that also within a given task, information is spread over different parts of the network and the pipeline might not be as neat as it seems. Each layer has different specialisations, so that it may be more useful to combine information from different layers, instead of selecting a single one based on the best overall performance.
De Mattei, L., Cafagna, M., Dell’Orletta, F., Nissim, M., & Guerini, M. (2020). GePpeTto Carves Italian into a Language Model. Proceedings of CLiC-it 2020.
@inproceedings{de2020geppetto,
title = {{GePpeTto} Carves {I}talian into a Language Model},
author = {De Mattei, Lorenzo and Cafagna, Michele and Dell'Orletta, Felice and Nissim, Malvina and Guerini, Marco},
booktitle = {Proceedings of CLiC-it~2020},
year = {2020},
url = {https://ceur-ws.org/Vol-2769/paper_46.pdf}
}
In the last few years, pre-trained neural architectures have provided impressive improvements across several NLP tasks. Still, generative language models are available mainly for English. We develop GePpeTto, the first generative language model for Italian, built using the GPT-2 architecture. We provide a thorough analysis of GePpeTto’s quality by means of both an automatic and a human-based evaluation. The automatic assessment consists in (i) calculating perplexity across different genres and (ii) a profiling analysis over GePpeTto’s writing characteristics. We find that GePpeTto’s production is a sort of bonsai version of human production, with shorter but yet complex sentences. Human evaluation is performed over a sentence completion task, where GePpeTto’s output is judged as natural more often than not, and much closer to the original human texts than to a simpler language model which we take as baseline.
Haagsma, H., Bos, J., & Nissim, M. (2020). MAGPIE: A Large Corpus of Potentially Idiomatic Expressions. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, … S. Piperidis (Eds.), Proceedings of the Twelfth Language Resources and Evaluation Conference (pp. 279–287). Marseille, France: European Language Resources Association.
@inproceedings{haagsma-etal-2020-magpie,
title = {{MAGPIE}: A Large Corpus of Potentially Idiomatic Expressions},
author = {Haagsma, Hessel and Bos, Johan and Nissim, Malvina},
editor = {Calzolari, Nicoletta and B{\'e}chet, Fr{\'e}d{\'e}ric and Blache, Philippe and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Isahara, Hitoshi and Maegaard, Bente and Mariani, Joseph and Mazo, H{\'e}l{\`e}ne and Moreno, Asuncion and Odijk, Jan and Piperidis, Stelios},
booktitle = {Proceedings of the Twelfth Language Resources and Evaluation Conference},
month = may,
year = {2020},
address = {Marseille, France},
publisher = {European Language Resources Association},
url = {https://aclanthology.org/2020.lrec-1.35},
pages = {279--287},
language = {English},
isbn = {979-10-95546-34-4}
}
Given the limited size of existing idiom corpora, we aim to enable progress in automatic idiom processing and linguistic analysis by creating the largest-to-date corpus of idioms for English. Using a fixed idiom list, automatic pre-extraction, and a strictly controlled crowdsourced annotation procedure, we show that it is feasible to build a high-quality corpus comprising more than 50K instances, an order of magnitude larger than previous resources. Crucial ingredients of crowdsourcing were the selection of crowdworkers, clear and comprehensive instructions, and an interface that breaks down the task into small, manageable steps. Analysis of the resulting corpus revealed strong effects of genre on idiom distribution, providing new evidence for existing theories on what influences idiom usage. The corpus also contains rich metadata, and is made publicly available.
van Rosendaal, J., Caselli, T., & Nissim, M. (2020). Lower Bias, Higher Density Abusive Language Datasets: A Recipe. In J. Monti, V. Basile, M. P. D. Buono, R. Manna, A. Pascucci, & S. Tonelli (Eds.), Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language (pp. 14–19). Marseille, France: European Language Resources Association (ELRA).
@inproceedings{van-rosendaal-etal-2020-lower,
title = {Lower Bias, Higher Density Abusive Language Datasets: A Recipe},
author = {van Rosendaal, Juliet and Caselli, Tommaso and Nissim, Malvina},
editor = {Monti, Johanna and Basile, Valerio and Buono, Maria Pia Di and Manna, Raffaele and Pascucci, Antonio and Tonelli, Sara},
booktitle = {Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language},
month = may,
year = {2020},
address = {Marseille, France},
publisher = {European Language Resources Association (ELRA)},
url = {https://aclanthology.org/2020.restup-1.4},
pages = {14--19},
language = {English},
isbn = {979-10-95546-49-8}
}
Datasets to train models for abusive language detection are at the same time necessary and still scarce. One of the reasons for their limited availability is the cost of their creation: not only is manual annotation expensive, but the phenomenon is also sparse, forcing human annotators to go through a large number of irrelevant examples in order to obtain some significant data. Strategies used until now to increase the density of abusive language and obtain more meaningful data overall include data filtering on the basis of pre-selected keywords and hate-rich sources of data. We suggest a recipe that can provide meaningful data with a possibly higher density of abusive language while also reducing the top-down biases imposed by corpus creators in the selection of the data to annotate. More specifically, we exploit the controversy channel on Reddit to obtain keywords that are used to filter a Twitter dataset. While the method needs further validation and refinement, our preliminary experiments show a higher density of abusive tweets in the filtered vs. unfiltered dataset, and a more meaningful topic distribution after filtering.
De Mattei, L., Cafagna, M., Dell’Orletta, F., & Nissim, M. (2020). Invisible to People but not to Machines: Evaluation of Style-aware Headline Generation in Absence of Reliable Human Judgment. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, … S. Piperidis (Eds.), Proceedings of the Twelfth Language Resources and Evaluation Conference (pp. 6709–6717). Marseille, France: European Language Resources Association.
@inproceedings{de-mattei-etal-2020-invisible,
title = {Invisible to People but not to Machines: Evaluation of Style-aware {H}eadline {G}eneration in Absence of Reliable Human Judgment},
author = {De Mattei, Lorenzo and Cafagna, Michele and Dell{'}Orletta, Felice and Nissim, Malvina},
editor = {Calzolari, Nicoletta and B{\'e}chet, Fr{\'e}d{\'e}ric and Blache, Philippe and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Isahara, Hitoshi and Maegaard, Bente and Mariani, Joseph and Mazo, H{\'e}l{\`e}ne and Moreno, Asuncion and Odijk, Jan and Piperidis, Stelios},
booktitle = {Proceedings of the Twelfth Language Resources and Evaluation Conference},
month = may,
year = {2020},
address = {Marseille, France},
publisher = {European Language Resources Association},
url = {https://aclanthology.org/2020.lrec-1.828},
pages = {6709--6717},
language = {English},
isbn = {979-10-95546-34-4}
}
We automatically generate headlines that are expected to comply with the specific styles of two different Italian newspapers. Through a data alignment strategy and different training/testing settings, we aim at decoupling content from style and preserve the latter in generation. In order to evaluate the generated headlines’ quality in terms of their specific newspaper-compliance, we devise a fine-grained evaluation strategy based on automatic classification. We observe that our models do indeed learn newspaper-specific style. Importantly, we also observe that humans aren’t reliable judges for this task, since although familiar with the newspapers, they are not able to discern their specific styles even in the original human-written headlines. The utility of automatic evaluation goes therefore beyond saving the costs and hurdles of manual annotation, and deserves particular care in its design.
Masini, F., Micheli, M. S., Zaninello, A., Castagnoli, S., & Nissim, M. (2020). MWE_combinet_release_1.0. Associazione Italiana di Linguistica Computazionale.
@misc{masini2020mwe_combinet_release_1,
title = {MWE\_combinet\_release\_1.0},
author = {Masini, Francesca and Micheli, M Silvia and Zaninello, Andrea and Castagnoli, Sara and Nissim, Malvina},
year = {2020},
publisher = {Associazione Italiana di Linguistica Computazionale}
}
Bartl, M., Nissim, M., & Gatt, A. (2020). Unmasking Contextual Stereotypes: Measuring and Mitigating BERT’s Gender Bias. In M. R. Costa-jussà, C. Hardmeier, W. Radford, & K. Webster (Eds.), Proceedings of the Second Workshop on Gender Bias in Natural Language Processing (pp. 1–16). Barcelona, Spain (Online): Association for Computational Linguistics.
@inproceedings{bartl-etal-2020-unmasking,
title = {Unmasking Contextual Stereotypes: Measuring and Mitigating {BERT}{'}s Gender Bias},
author = {Bartl, Marion and Nissim, Malvina and Gatt, Albert},
editor = {Costa-juss{\`a}, Marta R. and Hardmeier, Christian and Radford, Will and Webster, Kellie},
booktitle = {Proceedings of the Second Workshop on Gender Bias in Natural Language Processing},
month = dec,
year = {2020},
address = {Barcelona, Spain (Online)},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2020.gebnlp-1.1},
pages = {1--16}
}
Contextualized word embeddings have been replacing standard embeddings as the representational knowledge source of choice in NLP systems. Since a variety of biases have previously been found in standard word embeddings, it is crucial to assess biases encoded in their replacements as well. Focusing on BERT (Devlin et al., 2018), we measure gender bias by studying associations between gender-denoting target words and names of professions in English and German, comparing the findings with real-world workforce statistics. We mitigate bias by fine-tuning BERT on the GAP corpus (Webster et al., 2018), after applying Counterfactual Data Substitution (CDS) (Maudslay et al., 2019). We show that our method of measuring bias is appropriate for languages such as English, but not for languages with a rich morphology and gender-marking, such as German. Our results highlight the importance of investigating bias and mitigation techniques cross-linguistically, especially in view of the current emphasis on large-scale, multilingual language models.
Bassignana, E., Nissim, M., & Patti, V. (2020). Personal-ITY: a novel YouTube-based corpus for personality prediction in Italian. Proceedings of CLiC-It 2020.
@inproceedings{bassignana2020personal,
title = {Personal-{ITY}: a novel {Y}ou{T}ube-based corpus for personality prediction in {I}talian},
author = {Bassignana, Elisa and Nissim, Malvina and Patti, Viviana},
booktitle = {Proceedings of CLiC-it~2020},
year = {2020}
}
Ruggiero, G., Gatt, A., & Nissim, M. (2020). Datasets and Models for Authorship Attribution on Italian Personal Writings. Proceedings of CLiC-It 2020.
@inproceedings{ruggiero2020datasets,
title = {Datasets and Models for Authorship Attribution on Italian Personal Writings},
author = {Ruggiero, Gaetana and Gatt, Albert and Nissim, Malvina},
booktitle = {Proceedings of CLiC-it~2020},
year = {2020}
}
Bassignana, E., Nissim, M., & Patti, V. (2020). Matching Theory and Data with Personal-ITY: What a Corpus of Italian YouTube Comments Reveals About Personality. In M. Nissim, V. Patti, B. Plank, & E. Durmus (Eds.), Proceedings of the Third Workshop on Computational Modeling of People’s Opinions, Personality, and Emotion’s in Social Media (pp. 11–22). Barcelona, Spain (Online): Association for Computational Linguistics.
@inproceedings{bassignana-etal-2020-matching,
title = {Matching Theory and Data with Personal-{ITY}: What a Corpus of {I}talian {Y}ou{T}ube Comments Reveals About Personality},
author = {Bassignana, Elisa and Nissim, Malvina and Patti, Viviana},
editor = {Nissim, Malvina and Patti, Viviana and Plank, Barbara and Durmus, Esin},
booktitle = {Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotion's in Social Media},
month = dec,
year = {2020},
address = {Barcelona, Spain (Online)},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2020.peoples-1.2},
pages = {11--22}
}
As a contribution to personality detection in languages other than English, we rely on distant supervision to create Personal-ITY, a novel corpus of YouTube comments in Italian, where authors are labelled with personality traits. The traits are derived from one of the mainstream personality theories in psychology research, named MBTI. Using personality prediction experiments, we (i) study the task of personality prediction in itself on our corpus as well as on TWISTY, a Twitter dataset also annotated with MBTI labels; (ii) carry out an extensive, in-depth analysis of the features used by the classifier, and view them specifically under the light of the original theory that we used to create the corpus in the first place. We observe that no single model is best at personality detection, and that while some traits are easier than others to detect, and also to match back to theory, for other, less frequent traits the picture is much more blurred.
Cimino, A., Dell’Orletta, F., & Nissim, M. (2020). TAG-it@EVALITA 2020: Overview of the Topic, Age, and Gender Prediction Task for Italian. Evaluation Campaign of Natural Language Processing and Speech Tools for Italian.
@article{cimino2020tag,
title = {TAG-it@{EVALITA} 2020: Overview of the Topic, Age, and Gender Prediction Task for Italian},
author = {Cimino, Andrea and Dell’Orletta, Felice and Nissim, Malvina},
journal = {Evaluation Campaign of Natural Language Processing and Speech Tools for Italian},
year = {2020},
publisher = {CEUR Workshop Proceedings (CEUR-WS.org)}
}
De Mattei, L., Cafagna, M., Dell’Orletta, F., Nissim, M., & Gatt, A. (2020). CHANGE-IT@EVALITA 2020: Change headlines, adapt news, generate. Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. European Language Resources Association (ELRA).
@inproceedings{de2020change,
title = {{CHANGE-IT}@{EVALITA} 2020: Change headlines, adapt news, generate},
author = {De Mattei, Lorenzo and Cafagna, Michele and Dell’Orletta, Felice and Nissim, Malvina and Gatt, Albert},
booktitle = {Evaluation Campaign of Natural Language Processing and Speech Tools for Italian},
year = {2020},
organization = {European Language Resources Association (ELRA)}
}
Mattei, L. D., Cafagna, M., Lai, H., Dell’Orletta, F., Nissim, M., & Gatt, A. (2020). On the interaction of automatic evaluation and task framing in headline style transfer. In S. Agarwal, O. Dušek, S. Gehrmann, D. Gkatzia, I. Konstas, E. Van Miltenburg, & S. Santhanam (Eds.), Proceedings of the 1st Workshop on Evaluating NLG Evaluation (pp. 38–43). Online (Dublin, Ireland): Association for Computational Linguistics.
@inproceedings{mattei-etal-2020-interaction,
title = {On the interaction of automatic evaluation and task framing in headline style transfer},
author = {Mattei, Lorenzo De and Cafagna, Michele and Lai, Huiyuan and Dell{'}Orletta, Felice and Nissim, Malvina and Gatt, Albert},
editor = {Agarwal, Shubham and Du{\v{s}}ek, Ond{\v{r}}ej and Gehrmann, Sebastian and Gkatzia, Dimitra and Konstas, Ioannis and Van Miltenburg, Emiel and Santhanam, Sashank},
booktitle = {Proceedings of the 1st Workshop on Evaluating NLG Evaluation},
month = dec,
year = {2020},
address = {Online (Dublin, Ireland)},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2020.evalnlgeval-1.5},
pages = {38--43}
}
An ongoing debate in the NLG community concerns the best way to evaluate systems, with human evaluation often being considered the most reliable method, compared to corpus-based metrics. However, tasks involving subtle textual differences, such as style transfer, tend to be hard for humans to perform. In this paper, we propose an evaluation method for this task based on purposely-trained classifiers, showing that it better reflects system differences than traditional metrics such as BLEU.
Masini, F., Micheli, M. S., Zaninello, A., Castagnoli, S., & Nissim, M. (2020). Multiword expressions we live by: a validated usage-based dataset from corpora of written Italian. Italian Conference on Computational Linguistics 2020. CEUR-WS.org.
@inproceedings{masini2020multiword,
title = {Multiword expressions we live by: a validated usage-based dataset from corpora of written Italian},
author = {Masini, Francesca and Micheli, M Silvia and Zaninello, Andrea and Castagnoli, Sara and Nissim, Malvina},
booktitle = {Italian Conference on Computational Linguistics 2020},
year = {2020},
organization = {CEUR-WS.org}
}
Minnema, G., Remijnse, L., Bos, J., Caselli, T., Fokkens, A., Nissim, M., … Vossen, P. (2020). Towards reference-aware FrameNet representations: Bridging generic and specific event knowledge. GeCKo Symposium: Integrating Generic and Contextual Knowledge.
@inproceedings{minnema2020towards,
title = {Towards reference-aware FrameNet representations: Bridging generic and specific event knowledge},
author = {Minnema, Gosse and Remijnse, Levi and Bos, Johan and Caselli, Tommaso and Fokkens, Antske and Nissim, Malvina and Postma, Marten and Vossen, Piek},
booktitle = {GeCKo Symposium: Integrating Generic and Contextual Knowledge},
year = {2020}
}
Cafagna, M., De Mattei, L., & Nissim, M. (2020). Embeddings-based detection of word use variation in Italian newspapers. IJCoL. Italian Journal of Computational Linguistics, 6, 9–22.
@article{cafagna2020embeddings,
title = {Embeddings-based detection of word use variation in Italian newspapers},
author = {Cafagna, Michele and De Mattei, Lorenzo and Nissim, Malvina},
journal = {IJCoL. Italian Journal of Computational Linguistics},
volume = {6},
number = {6-2},
pages = {9--22},
year = {2020},
publisher = {Accademia University Press},
url = {https://journals.openedition.org/ijcol/703}
}
We study how words are used differently in two Italian newspapers at opposite ends of the political spectrum by training embeddings on one newspaper’s corpus, updating the weights on the second one, and observing vector shifts. We run two types of analysis, one top-down, based on a preselection of frequent words in both newspapers, and one bottom-up, on the basis of a combination of the observed shifts and relative and absolute frequency. The analysis is specific to this data, but the method can serve as a blueprint for similar studies.
2019
Basile, A., Gatt, A., & Nissim, M. (2019). You Write like You Eat: Stylistic Variation as a Predictor of Social Stratification. In A. Korhonen, D. Traum, & L. Màrquez (Eds.), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 2583–2593). Florence, Italy: Association for Computational Linguistics.
@inproceedings{basile-etal-2019-write,
title = {You Write like You Eat: Stylistic Variation as a Predictor of Social Stratification},
author = {Basile, Angelo and Gatt, Albert and Nissim, Malvina},
editor = {Korhonen, Anna and Traum, David and M{\`a}rquez, Llu{\'\i}s},
booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
month = jul,
year = {2019},
address = {Florence, Italy},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/P19-1246},
doi = {10.18653/v1/P19-1246},
pages = {2583--2593}
}
Inspired by Labov’s seminal work on stylistic variation as a function of social stratification, we develop and compare neural models that predict a person’s presumed socio-economic status, obtained through distant supervision, from their writing style on social media. The focus of our work is on identifying the most important stylistic parameters to predict socio-economic group. In particular, we show the effectiveness of morpho-syntactic features as predictors of style, in contrast to lexical features, which are good predictors of topic.
Nieuwenhuis, M., & Nissim, M. (2019). The Contribution of Embeddings to Sentiment Analysis on YouTube. CLiC-It 2019.
@inproceedings{nieuwenhuis2019contribution,
title = {The Contribution of Embeddings to Sentiment Analysis on YouTube.},
author = {Nieuwenhuis, Moniek and Nissim, Malvina},
booktitle = {CLiC-it~2019},
year = {2019}
}
Cafagna, M., De Mattei, L., & Nissim, M. (2019). Embeddings Shifts as Proxies for Different Word Use in Italian Newspapers. CLiC-It 2019.
@inproceedings{cafagna2019embeddings,
title = {Embeddings Shifts as Proxies for Different Word Use in Italian Newspapers.},
author = {Cafagna, Michele and De Mattei, Lorenzo and Nissim, Malvina},
booktitle = {CLiC-it~2019},
year = {2019}
}
Cafagna, M., De Mattei, L., Bacciu, D., & Nissim, M. (2019). Suitable Doesn’t Mean Attractive. Human-Based Evaluation of Automatically Generated Headlines. Proceedings of CLiC-It 2019.
@inproceedings{cafagna2019suitable,
title = {Suitable Doesn't Mean Attractive. Human-Based Evaluation of Automatically Generated Headlines.},
author = {Cafagna, Michele and De Mattei, Lorenzo and Bacciu, Davide and Nissim, Malvina},
booktitle = {Proceedings of CLiC-it~2019},
year = {2019}
}
Haagsma, H., Nissim, M., & Bos, J. (2019). Casting a wide net: robust extraction of potentially idiomatic expressions. ArXiv Preprint ArXiv:1911.08829.
@article{haagsma2019casting,
title = {Casting a wide net: robust extraction of potentially idiomatic expressions},
author = {Haagsma, Hessel and Nissim, Malvina and Bos, Johan},
journal = {arXiv preprint arXiv:1911.08829},
year = {2019}
}
De Vries, W., van Cranenburgh, A., Bisazza, A., Caselli, T., van Noord, G., & Nissim, M. (2019). BERTje: A Dutch BERT model. ArXiv Preprint ArXiv:1912.09582.
@article{de2019bertje,
title = {{BERTje}: A {D}utch {BERT} model},
author = {De Vries, Wietse and van Cranenburgh, Andreas and Bisazza, Arianna and Caselli, Tommaso and van Noord, Gertjan and Nissim, Malvina},
journal = {arXiv preprint arXiv:1912.09582},
year = {2019}
}
Haagsma, H., Kreutz, T., Medvedeva, M., Daelemans, W., & Nissim, M. (2019). Overview of the CLIN29 Shared Task on Cross-Genre Gender Prediction in Dutch. In Proceedings of the Shared Task on Cross-Genre Gender Prediction in Dutch at CLIN29 (GxG-CLIN29) (pp. 1–5). CEUR Proceedings 2453.
@incollection{haagsma2019overview,
title = {Overview of the CLIN29 Shared Task on Cross-Genre Gender Prediction in Dutch},
author = {Haagsma, Hessel and Kreutz, Tim and Medvedeva, Masha and Daelemans, Walter and Nissim, Malvina},
booktitle = {Proceedings of the Shared Task on Cross-Genre Gender Prediction in Dutch at CLIN29 (GxG-CLIN29)},
pages = {1--5},
year = {2019},
publisher = {CEUR Proceedings 2453},
url = {https://ceur-ws.org/Vol-2453/paper00.pdf}
}
This overview presents the results of the cross-genre gender prediction task (GxG) organized at CLIN29. Teams were tasked with training a system to predict the gender of authors of tweets, YouTube comments and news articles. In the cross-genre setting, systems were trained on two genres, and tested on the other to assess domain adaptivity of the solutions. Eight teams participated in the shared task. Performance was generally better in the in-genre setting. In the cross-genre settings, performance on news articles declined the most compared to other target genres.
2018
van der Goot, R., Ljubešić, N., Matroos, I., Nissim, M., & Plank, B. (2018). Bleaching Text: Abstract Features for Cross-lingual Gender Prediction. In I. Gurevych & Y. Miyao (Eds.), Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 383–389). Melbourne, Australia: Association for Computational Linguistics.
@inproceedings{van-der-goot-etal-2018-bleaching,
title = {Bleaching Text: Abstract Features for Cross-lingual Gender Prediction},
author = {van der Goot, Rob and Ljube{\v{s}}i{\'c}, Nikola and Matroos, Ian and Nissim, Malvina and Plank, Barbara},
editor = {Gurevych, Iryna and Miyao, Yusuke},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = jul,
year = {2018},
address = {Melbourne, Australia},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/P18-2061},
doi = {10.18653/v1/P18-2061},
pages = {383--389}
}
Gender prediction has typically focused on lexical and social network features, yielding good performance, but making systems highly language-, topic-, and platform dependent. Cross-lingual embeddings circumvent some of these limitations, but capture gender-specific style less. We propose an alternative: bleaching text, i.e., transforming lexical strings into more abstract features. This study provides evidence that such features allow for better transfer across languages. Moreover, we present a first study on the ability of humans to perform cross-lingual gender prediction. We find that human predictive power proves similar to that of our bleached models, and both perform better than lexical models.
Kulmizev, A., Abdou, M., Ravishankar, V., & Nissim, M. (2018). Discriminator at SemEval-2018 Task 10: Minimally Supervised Discrimination. In M. Apidianaki, S. M. Mohammad, J. May, E. Shutova, S. Bethard, & M. Carpuat (Eds.), Proceedings of the 12th International Workshop on Semantic Evaluation (pp. 1008–1012). New Orleans, Louisiana: Association for Computational Linguistics.
@inproceedings{kulmizev-etal-2018-discriminator,
title = {Discriminator at {S}em{E}val-2018 Task 10: Minimally Supervised Discrimination},
author = {Kulmizev, Artur and Abdou, Mostafa and Ravishankar, Vinit and Nissim, Malvina},
editor = {Apidianaki, Marianna and Mohammad, Saif M. and May, Jonathan and Shutova, Ekaterina and Bethard, Steven and Carpuat, Marine},
booktitle = {Proceedings of the 12th International Workshop on Semantic Evaluation},
month = jun,
year = {2018},
address = {New Orleans, Louisiana},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/S18-1167},
doi = {10.18653/v1/S18-1167},
pages = {1008--1012}
}
We participated in the SemEval-2018 shared task on capturing discriminative attributes (Task 10) with a simple system that ranked 8th amongst the 26 teams that took part in the evaluation. Our final score was 0.67, which is competitive with the winning score of 0.75, particularly given that our system is a zero-shot system that requires no training and minimal parameter optimisation. In addition to describing the submitted system, and discussing the implications of the relative success of such a system on this task, we also report on other, more complex models we experimented with.
Haagsma, H., Nissim, M., & Bos, J. (2018). The Other Side of the Coin: Unsupervised Disambiguation of Potentially Idiomatic Expressions by Contrasting Senses. In A. Savary, C. Ramisch, J. D. Hwang, N. Schneider, M. Andresen, S. Pradhan, & M. R. L. Petruck (Eds.), Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018) (pp. 178–184). Santa Fe, New Mexico, USA: Association for Computational Linguistics.
@inproceedings{haagsma-etal-2018-side,
title = {The Other Side of the Coin: Unsupervised Disambiguation of Potentially Idiomatic Expressions by Contrasting Senses},
author = {Haagsma, Hessel and Nissim, Malvina and Bos, Johan},
editor = {Savary, Agata and Ramisch, Carlos and Hwang, Jena D. and Schneider, Nathan and Andresen, Melanie and Pradhan, Sameer and Petruck, Miriam R. L.},
booktitle = {Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions ({LAW}-{MWE}-{C}x{G}-2018)},
month = aug,
year = {2018},
address = {Santa Fe, New Mexico, USA},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/W18-4919},
pages = {178--184}
}
Disambiguation of potentially idiomatic expressions involves determining the sense of a potentially idiomatic expression in a given context, e.g. determining that make hay in ‘Investment banks made hay while takeovers shone.’ is used in a figurative sense. This enables automatic interpretation of idiomatic expressions, which is important for applications like machine translation and sentiment analysis. In this work, we present an unsupervised approach for English that makes use of literalisations of idiom senses to improve disambiguation, which is based on the lexical cohesion graph-based method by Sporleder and Li (2009). Experimental results show that, while literalisation carries novel information, its performance falls short of that of state-of-the-art unsupervised methods.
Basile, A., Dwyer, G., Medvedeva, M., Rawee, J., Haagsma, H., & Nissim, M. (2018). Simply the best: Minimalist system trumps complex models in author profiling. International Conference of the Cross-Language Evaluation Forum for European Languages, 143–156. Springer International Publishing, Cham.
@inproceedings{basile2018simply,
title = {Simply the best: Minimalist system trumps complex models in author profiling},
author = {Basile, Angelo and Dwyer, Gareth and Medvedeva, Maria and Rawee, Josine and Haagsma, Hessel and Nissim, Malvina},
booktitle = {International Conference of the Cross-Language Evaluation Forum for European Languages},
pages = {143--156},
year = {2018},
organization = {Springer International Publishing, Cham}
}
Bai, X., Merenda, F., Zaghi, C., Caselli, T., & Nissim, M. (2018). RuG at GermEval: Detecting offensive speech in German social media. Proceedings of GermEval 2018. Verlag der Österreichischen Akademie der Wissenschaften.
@inproceedings{bai2018rug-Germ,
title = {{RuG} at {GermEval}: Detecting offensive speech in {G}erman social media},
author = {Bai, Xiaoyu and Merenda, Flavio and Zaghi, Claudia and Caselli, Tommaso and Nissim, Malvina},
booktitle = {Proceedings of GermEval~2018},
year = {2018},
publisher = {Verlag der {\"O}sterreichischen Akademie der Wissenschaften}
}
Merenda, F., Zaghi, C., Caselli, T., & Nissim, M. (2018). Source-driven representations for hate speech detection. Proceedings of CLiC-It 2018.
@inproceedings{merenda2018source,
title = {Source-driven representations for hate speech detection},
author = {Merenda, Flavio and Zaghi, Claudia and Caselli, Tommaso and Nissim, Malvina},
booktitle = {Proceedings of CLiC-it~2018},
year = {2018}
}
Basile, V., Novielli, N., Croce, D., Barbieri, F., Nissim, M., & Patti, V. (2018). Sentiment polarity classification at EVALITA: Lessons learned and open challenges. IEEE Transactions on Affective Computing, 12, 466–478.
@article{basile2018sentiment,
title = {Sentiment polarity classification at {EVALITA}: Lessons learned and open challenges},
author = {Basile, Valerio and Novielli, Nicole and Croce, Danilo and Barbieri, Francesco and Nissim, Malvina and Patti, Viviana},
journal = {IEEE Transactions on Affective Computing},
volume = {12},
number = {2},
pages = {466--478},
year = {2018},
publisher = {IEEE}
}
Dell’Orletta, F., & Nissim, M. (2018). Overview of the EVALITA 2018 cross-genre gender prediction (GxG) task. EVALITA Evaluation of NLP and Speech Tools for Italian.
@inproceedings{dell2018overview,
title = {Overview of the {EVALITA} 2018 cross-genre gender prediction ({GxG}) task},
author = {Dell’Orletta, Felice and Nissim, Malvina},
booktitle = {EVALITA Evaluation of NLP and Speech Tools for Italian},
year = {2018}
}
Basile, A., Caselli, T., Merenda, F., & Nissim, M. (2018). Facebook reactions as controversy proxies: Predictive models over Italian news. IJCoL. Italian Journal of Computational Linguistics, 4, 73–89.
@article{basile2018facebook,
title = {Facebook reactions as controversy proxies: Predictive models over Italian news},
author = {Basile, Angelo and Caselli, Tommaso and Merenda, Flavio and Nissim, Malvina},
journal = {IJCoL. Italian Journal of Computational Linguistics},
volume = {4},
number = {4-2},
pages = {73--89},
year = {2018},
publisher = {Accademia University Press}
}
Bai, X., Merenda, F., Zaghi, C., Caselli, T., & Nissim, M. (2018). RuG at EVALITA 2018: Hate speech detection in Italian social media. EVALITA 2018. CEUR Workshop Proceedings (CEUR-WS.org).
@inproceedings{bai2018rug,
title = {{RuG} at {EVALITA} 2018: Hate speech detection in {I}talian social media},
author = {Bai, Xiaoyu and Merenda, Flavio and Zaghi, Claudia and Caselli, Tommaso and Nissim, Malvina},
booktitle = {EVALITA 2018},
year = {2018},
organization = {CEUR Workshop Proceedings (CEUR-WS.org)}
}
Basili, R., Nissim, M., & Satta, G. (2018). CLiC-it 2017: A Retrospective. IJCoL. Italian Journal of Computational Linguistics, 4, 77–88.
@article{basili2018clic,
title = {CLiC-it 2017: A Retrospective},
author = {Basili, Roberto and Nissim, Malvina and Satta, Giorgio},
journal = {IJCoL. Italian Journal of Computational Linguistics},
volume = {4},
number = {4-1},
pages = {77--88},
year = {2018},
publisher = {Accademia University Press}
}
2017
Medvedeva, M., Haagsma, H., & Nissim, M. (2017). An analysis of cross-genre and in-genre performance for author profiling in social media. Experimental IR Meets Multilinguality, Multimodality, and Interaction: 8th International Conference of the CLEF Association, CLEF 2017, Dublin, Ireland, September 11–14, 2017, Proceedings 8, 211–223. Springer International Publishing.
@inproceedings{medvedeva2017analysis,
title = {An analysis of cross-genre and in-genre performance for author profiling in social media},
author = {Medvedeva, Maria and Haagsma, Hessel and Nissim, Malvina},
booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction: 8th International Conference of the CLEF Association, CLEF 2017, Dublin, Ireland, September 11--14, 2017, Proceedings 8},
pages = {211--223},
year = {2017},
organization = {Springer International Publishing}
}
van der Goot, R., Plank, B., & Nissim, M. (2017). To normalize, or not to normalize: The impact of normalization on Part-of-Speech tagging. In L. Derczynski, W. Xu, A. Ritter, & T. Baldwin (Eds.), Proceedings of the 3rd Workshop on Noisy User-generated Text (pp. 31–39). Copenhagen, Denmark: Association for Computational Linguistics.
@inproceedings{van-der-goot-etal-2017-normalize,
title = {To normalize, or not to normalize: The impact of normalization on Part-of-Speech tagging},
author = {van der Goot, Rob and Plank, Barbara and Nissim, Malvina},
editor = {Derczynski, Leon and Xu, Wei and Ritter, Alan and Baldwin, Tim},
booktitle = {Proceedings of the 3rd Workshop on Noisy User-generated Text},
month = sep,
year = {2017},
address = {Copenhagen, Denmark},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/W17-4404},
doi = {10.18653/v1/W17-4404},
pages = {31--39}
}
Does normalization help Part-of-Speech (POS) tagging accuracy on noisy, non-canonical data? To the best of our knowledge, little is known on the actual impact of normalization in a real-world scenario, where gold error detection is not available. We investigate the effect of automatic normalization on POS tagging of tweets. We also compare normalization to strategies that leverage large amounts of unlabeled data kept in its raw form. Our results show that normalization helps, but does not add consistently beyond just word embedding layer initialization. The latter approach yields a tagging model that is competitive with a Twitter state-of-the-art tagger.
Kulmizev, A., Blankers, B., Bjerva, J., Nissim, M., van Noord, G., Plank, B., & Wieling, M. (2017). The power of character n-grams in native language identification. Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, 382–389.
@inproceedings{kulmizev2017power,
title = {The power of character n-grams in native language identification},
author = {Kulmizev, Artur and Blankers, Bo and Bjerva, Johannes and Nissim, Malvina and van Noord, Gertjan and Plank, Barbara and Wieling, Martijn},
booktitle = {Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications},
pages = {382--389},
year = {2017}
}
Nissim, M., Abzianidze, L., Evang, K., van der Goot, R., Haagsma, H., Plank, B., & Wieling, M. (2017). Sharing is caring: The future of shared tasks. Computational Linguistics, 43, 897–904.
@article{nissim2017sharing,
title = {Sharing is caring: The future of shared tasks},
author = {Nissim, Malvina and Abzianidze, Lasha and Evang, Kilian and van der Goot, Rob and Haagsma, Hessel and Plank, Barbara and Wieling, Martijn},
journal = {Computational Linguistics},
volume = {43},
number = {4},
pages = {897--904},
year = {2017},
publisher = {MIT Press}
}
Basile, P., Nissim, M., Patti, V., Sprugnoli, R., & Cutugno, F. (2017). EVALITA Goes Social: Tasks, Data, and Community at the 2016 Edition. Italian Journal of Computational Linguistics, 2017, 93–127.
@article{basile2017evalita,
title = {EVALITA Goes Social: Tasks, Data, and Community at the 2016 Edition},
author = {Basile, Pierpaolo and Nissim, Malvina and Patti, Viviana and Sprugnoli, Rachele and Cutugno, Francesco},
journal = {Italian Journal of Computational Linguistics},
volume = {2017},
number = {1},
pages = {93--127},
year = {2017}
}
Haagsma, H., & Nissim, M. (2017). A Critical Assessment of a Method for Detecting Diachronic Meaning Shifts: Lessons Learnt from Experiments on Dutch. Computational Linguistics in the Netherlands Journal, 7, 65–78.
@article{haagsma2017critical,
title = {A Critical Assessment of a Method for Detecting Diachronic Meaning Shifts: Lessons Learnt from Experiments on {Dutch}},
author = {Haagsma, Hessel and Nissim, Malvina},
journal = {Computational Linguistics in the Netherlands Journal},
volume = {7},
pages = {65--78},
year = {2017}
}
Nissim, M., & Patti, V. (2017). Semantic aspects in sentiment analysis. In Sentiment analysis in social networks (pp. 31–48). Morgan Kaufmann.
@incollection{nissim2017semantic,
title = {Semantic aspects in sentiment analysis},
author = {Nissim, Malvina and Patti, Viviana},
booktitle = {Sentiment analysis in social networks},
pages = {31--48},
year = {2017},
publisher = {Morgan Kaufmann}
}
Nissim, M., & Pietrandrea, P. (2017). MODAL: A multilingual corpus annotated for modality. Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-It 2017), CEUR Proceedings, 2006.
@inproceedings{nissim2017modal,
title = {MODAL: A multilingual corpus annotated for modality},
author = {Nissim, Malvina and Pietrandrea, Paola},
booktitle = {Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017), CEUR Proceedings},
volume = {2006},
year = {2017}
}
Basile, A., Caselli, T., & Nissim, M. (2017). Predicting Controversial News Using Facebook Reactions. Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-It 2017), CEUR Proceedings, 2006.
@inproceedings{basile2017predicting,
title = {Predicting Controversial News Using Facebook Reactions},
author = {Basile, Angelo and Caselli, Tommaso and Nissim, Malvina},
booktitle = {Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017), CEUR Proceedings},
volume = {2006},
year = {2017}
}
Basile, P., Basile, V., Nissim, M., Novielli, N., Patti, V., & others. (2017). Sentiment Analysis of Microblogging Data. In Encyclopedia of Social Network Analysis and Mining (pp. 1–17). Springer Science+Business Media.
@incollection{basile2017sentiment,
title = {Sentiment Analysis of Microblogging Data},
author = {Basile, Pierpaolo and Basile, Valerio and Nissim, Malvina and Novielli, Nicole and Patti, Viviana and others},
booktitle = {Encyclopedia of Social Network Analysis and Mining},
pages = {1--17},
year = {2017},
publisher = {Springer Science+Business Media}
}
Lenci, A., Masini, F., Nissim, M., Castagnoli, S., Lebani, G. E., Passaro, L. C., & Senaldi, M. S. G. (2017). How to harvest Word Combinations from corpora: Methods, evaluation and perspectives. Studi e Saggi Linguistici, 55, 45–68.
@article{lenci2017harvest,
title = {How to harvest Word Combinations from corpora: Methods, evaluation and perspectives},
author = {Lenci, Alessandro and Masini, Francesca and Nissim, Malvina and Castagnoli, Sara and Lebani, Gianluca E and Passaro, Lucia C and Senaldi, Marco SG},
journal = {Studi e saggi linguistici},
volume = {55},
number = {2},
pages = {45--68},
year = {2017},
publisher = {Edizioni ETS}
}
Basile, A., Dwyer, G., Medvedeva, M., Rawee, J., Haagsma, H., & Nissim, M. (2017). N-GrAM: New Groningen Author-profiling Model. Working Notes of CLEF 2017-Conference and Labs of the Evaluation Forum.
@inproceedings{basile2017n,
title = {N-GrAM: New Groningen Author-profiling Model},
author = {Basile, Angelo and Dwyer, Gareth and Medvedeva, Maria and Rawee, Josine and Haagsma, Hessel and Nissim, Malvina},
booktitle = {Working Notes of CLEF 2017-Conference and Labs of the Evaluation Forum},
year = {2017}
}
2016
Castagnoli, S., Lebani, G., Masini, F., Nissim, M., Passaro, L., & others. (2016). POS-patterns or Syntax? Comparing methods for extracting Word Combinations. In Computerised and corpus-based approaches to phraseology: Monolingual and multilingual perspectives (pp. 116–128). Tradulex.
@incollection{sara2016pos,
title = {POS-patterns or Syntax? Comparing methods for extracting Word Combinations},
author = {Castagnoli, Sara and Lebani, Gianluca and Masini, Francesca and Nissim, Malvina and Passaro, Lucia and others},
booktitle = {Computerised and corpus-based approaches to phraseology: Monolingual and multilingual perspectives},
pages = {116--128},
year = {2016},
publisher = {Tradulex}
}
Kreutz, T., & Nissim, M. (2016). Catching Events in the Twitter Stream: A Showcase of Student Projects. SIDEWAYS@LREC, 14–18.
@inproceedings{kreutz2016catching,
title = {Catching Events in the Twitter Stream: A Showcase of Student Projects},
author = {Kreutz, Tim and Nissim, Malvina},
booktitle = {SIDEWAYS@LREC},
pages = {14--18},
year = {2016}
}
Kloppenburg, L., & Nissim, M. (2016). Leveraging Native Data to Correct Preposition Errors in Learners’ Dutch. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), 2819–2824.
@inproceedings{kloppenburg2016leveraging,
title = {Leveraging Native Data to Correct Preposition Errors in Learners’ Dutch},
author = {Kloppenburg, Lennart and Nissim, Malvina},
booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)},
pages = {2819--2824},
year = {2016}
}
op Vollenbroek, M. B., Carlotto, T., Kreutz, T., Medvedeva, M., Pool, C., Bjerva, J., … Nissim, M. (2016). GronUP: Groningen User Profiling. Notebook for PAN at CLEF.
@inproceedings{op2016gronup,
title = {GronUP: Groningen User Profiling},
author = {op Vollenbroek, Mart Busger and Carlotto, Talvany and Kreutz, Tim and Medvedeva, Maria and Pool, Chris and Bjerva, Johannes and Haagsma, Hessel and Nissim, Malvina},
booktitle = {Notebook for PAN at CLEF},
year = {2016}
}
Pool, C., & Nissim, M. (2016). Distant supervision for emotion detection using Facebook reactions. In M. Nissim, V. Patti, & B. Plank (Eds.), Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES) (pp. 30–39). Osaka, Japan: The COLING 2016 Organizing Committee.
@inproceedings{pool-nissim-2016-distant,
title = {Distant supervision for emotion detection using {F}acebook reactions},
author = {Pool, Chris and Nissim, Malvina},
editor = {Nissim, Malvina and Patti, Viviana and Plank, Barbara},
booktitle = {Proceedings of the Workshop on Computational Modeling of People{'}s Opinions, Personality, and Emotions in Social Media ({PEOPLES})},
month = dec,
year = {2016},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee},
url = {https://aclanthology.org/W16-4304},
pages = {30--39}
}
We exploit the Facebook reaction feature in a distant supervised fashion to train a support vector machine classifier for emotion detection, using several feature combinations and combining different Facebook pages. We test our models on existing benchmarks for emotion detection and show that employing only information that is derived completely automatically, thus without relying on any handcrafted lexicon as it’s usually done, we can achieve competitive results. The results also show that there is large room for improvement, especially by gearing the collection of Facebook pages, with a view to the target domain.
Plank, B., & Nissim, M. (2016). When silver glitters more than gold: Bootstrapping an Italian part-of-speech tagger for Twitter. Proceedings of EVALITA 2016.
@inproceedings{plank2016silver,
title = {When silver glitters more than gold: Bootstrapping an Italian part-of-speech tagger for Twitter},
author = {Plank, Barbara and Nissim, Malvina},
booktitle = {Proceedings of {EVALITA} 2016},
year = {2016}
}
Del Tredici, M., Nissim, M., & Zaninello, A. (2016). Tracing metaphors in time through self-distance in vector spaces. Proceedings of CLiC-It 2016.
@inproceedings{del2016tracing,
title = {Tracing metaphors in time through self-distance in vector spaces},
author = {Del Tredici, Marco and Nissim, Malvina and Zaninello, Andrea},
booktitle = {Proceedings of CLiC-it 2016},
year = {2016}
}
Barbieri, F., Basile, V., Croce, D., Nissim, M., Novielli, N., Patti, V., & others. (2016). Overview of the Evalita 2016 SENTIment POLarity Classification Task. CEUR Workshop Proceedings, 1749. CEUR-WS.
@inproceedings{barbieri2016overview,
title = {Overview of the Evalita 2016 SENTIment POLarity Classification Task},
author = {Barbieri, Francesco and Basile, Valerio and Croce, Danilo and Nissim, Malvina and Novielli, Nicole and Patti, Viviana and others},
booktitle = {CEUR Workshop Proceedings},
volume = {1749},
year = {2016},
organization = {CEUR-WS}
}
Kloppenburg, L., & Nissim, M. (2016). Native-data models for detecting and correcting errors in learners’ Dutch. Computational Linguistics in the Netherlands Journal, 6, 39–55.
@article{kloppenburg2016native,
title = {Native-data models for detecting and correcting errors in learners’ Dutch},
author = {Kloppenburg, Lennart and Nissim, Malvina},
journal = {Computational Linguistics in the Netherlands Journal},
volume = {6},
pages = {39--55},
year = {2016}
}
Nissim, M., Patti, V., & Plank, B. (Eds.). (2016). Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES). Osaka, Japan: The COLING 2016 Organizing Committee.
@proceedings{nissim2016proceedings,
title = {Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)},
editor = {Nissim, Malvina and Patti, Viviana and Plank, Barbara},
year = {2016},
address = {Osaka, Japan},
publisher = {The COLING 2016 Organizing Committee}
}
Ghia, E., Kloppenburg, L., Nissim, M., Pietrandrea, P., & Cervoni, V. (2016). A construction-centered approach to the annotation of modality. In H. Bunt (Ed.), Proceedings of the 12th ISO Workshop on Interoperable Semantic Annotation (Vol. 29). Portoroz.
@inproceedings{ghia2016construction,
title = {A construction-centered approach to the annotation of modality},
author = {Ghia, Elisa and Kloppenburg, Lennart and Nissim, Malvina and Pietrandrea, Paola and Cervoni, Valerio},
booktitle = {Proceedings of the 12th ISO Workshop on Interoperable Semantic Annotation},
editor = {Bunt, Harry},
address = {Portoroz},
volume = {29},
year = {2016}
}
Basile, P., Cutugno, F., Nissim, M., Patti, V., Sprugnoli, R., & others. (2016). Preface to the EVALITA 2016 Proceedings. CEUR WORKSHOP PROCEEDINGS, 1749. CEUR-WS.
@inproceedings{basile2016preface,
title = {Preface to the EVALITA 2016 Proceedings},
author = {Basile, Pierpaolo and Cutugno, Franco and Nissim, Malvina and Patti, Viviana and Sprugnoli, Rachele and others},
booktitle = {CEUR Workshop Proceedings},
volume = {1749},
year = {2016},
organization = {CEUR-WS}
}
Basile, P., Cutugno, F., Nissim, M., Patti, V., Sprugnoli, R., & others. (2016). EVALITA 2016: Overview of the 5th evaluation campaign of natural language processing and speech tools for Italian. CEUR Workshop Proceedings, 1749, 1–4. CEUR-WS.
@inproceedings{basile2016evalita,
title = {EVALITA 2016: Overview of the 5th evaluation campaign of natural language processing and speech tools for Italian},
author = {Basile, Pierpaolo and Cutugno, Franco and Nissim, Malvina and Patti, Viviana and Sprugnoli, Rachele and others},
booktitle = {CEUR Workshop Proceedings},
volume = {1749},
pages = {1--4},
year = {2016},
organization = {CEUR-WS}
}
2015
Bos, J., & Nissim, M. (2015). Uncovering Noun-Noun Compound Relations by Gamification. In B. Megyesi (Ed.), Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015) (pp. 251–255). Vilnius, Lithuania: Linköping University Electronic Press, Sweden.
@inproceedings{bos-nissim-2015-uncovering,
title = {Uncovering Noun-Noun Compound Relations by Gamification},
author = {Bos, Johan and Nissim, Malvina},
editor = {Megyesi, Be{\'a}ta},
booktitle = {Proceedings of the 20th Nordic Conference of Computational Linguistics ({NODALIDA} 2015)},
month = may,
year = {2015},
address = {Vilnius, Lithuania},
publisher = {Link{\"o}ping University Electronic Press, Sweden},
url = {https://aclanthology.org/W15-1832},
pages = {251--255}
}
Lenci, A., Lebani, G., Senaldi, M., Castagnoli, S., Masini, F., & Nissim, M. (2015). Mapping the Constructicon with SYMPAThy. Italian Word Combinations between fixedness and productivity. CEUR Workshop Proceedings, 1347, 144–149. CEUR-WS.org.
@inproceedings{lenci2015mapping,
title = {Mapping the Constructicon with SYMPAThy. Italian Word Combinations between fixedness and productivity},
author = {Lenci, Alessandro and Lebani, Gianluca and Senaldi, Marco and Castagnoli, Sara and Masini, Francesca and Nissim, Malvina},
booktitle = {CEUR Workshop Proceedings},
volume = {1347},
pages = {144--149},
year = {2015},
organization = {CEUR-WS.org}
}
Hürlimann, M., Weck, B., van den Berg, E., Suster, S., & Nissim, M. (2015). GLAD: Groningen Lightweight Authorship Detection. CLEF (Working Notes), Toulouse.
@inproceedings{hurlimann2015glad,
title = {GLAD: Groningen Lightweight Authorship Detection},
author = {H{\"u}rlimann, Manuela and Weck, Benno and van den Berg, Esther and Suster, Simon and Nissim, Malvina},
booktitle = {CLEF (Working Notes)},
year = {2015},
address = {Toulouse, France}
}
Basile, P., Basile, V., Nissim, M., & Novielli, N. (2015). Deep tweets: from entity linking to sentiment analysis. Proceedings of the Italian Computational Linguistics Conference (CLiC-It 2015).
@inproceedings{basile2015deep,
title = {Deep tweets: from entity linking to sentiment analysis},
author = {Basile, Pierpaolo and Basile, Valerio and Nissim, Malvina and Novielli, Nicole},
booktitle = {Proceedings of the Italian Computational Linguistics Conference (CLiC-it 2015)},
year = {2015}
}
Nissim, M., Castagnoli, S., Masini, F., Lebani, G., & Passaro, L. (2015). Automatic extraction of Word Combinations from corpora: evaluating methods and benchmarks. Proceedings of the Second Italian Conference on Computational Linguistics CLiC-It 2015. Accademia University Press.
@inproceedings{malvina2015automatic,
title = {Automatic extraction of Word Combinations from corpora: evaluating methods and benchmarks},
author = {Nissim, Malvina and Castagnoli, Sara and Masini, Francesca and Lebani, Gianluca and Passaro, Lucia},
booktitle = {Proceedings of the Second Italian Conference on Computational Linguistics CLiC-it 2015},
year = {2015},
organization = {Accademia University Press}
}
Pavlick, E., Bos, J., Nissim, M., Beller, C., Van Durme, B., & Callison-Burch, C. (2015). Adding Semantics to Data-Driven Paraphrasing. In C. Zong & M. Strube (Eds.), Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 1512–1522). Beijing, China: Association for Computational Linguistics.
@inproceedings{pavlick-etal-2015-adding,
title = {Adding Semantics to Data-Driven Paraphrasing},
author = {Pavlick, Ellie and Bos, Johan and Nissim, Malvina and Beller, Charley and Van Durme, Benjamin and Callison-Burch, Chris},
editor = {Zong, Chengqing and Strube, Michael},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
month = jul,
year = {2015},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/P15-1146},
doi = {10.3115/v1/P15-1146},
pages = {1512--1522}
}
2014
Nissim, M., Castagnoli, S., & Masini, F. (2014). Extracting MWEs from Italian corpora: A case study for refining the POS-pattern methodology. In V. Kordoni, M. Egg, A. Savary, E. Wehrli, & S. Evert (Eds.), Proceedings of the 10th Workshop on Multiword Expressions (MWE) (pp. 57–61). Gothenburg, Sweden: Association for Computational Linguistics.
@inproceedings{castagnoli-2014-extracting,
title = {Extracting {MWE}s from {I}talian corpora: A case study for refining the {POS}-pattern methodology},
author = {Nissim, Malvina and Castagnoli, Sara and Masini, Francesca},
editor = {Kordoni, Valia and Egg, Markus and Savary, Agata and Wehrli, Eric and Evert, Stefan},
booktitle = {Proceedings of the 10th Workshop on Multiword Expressions ({MWE})},
month = apr,
year = {2014},
address = {Gothenburg, Sweden},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/W14-0809},
doi = {10.3115/v1/W14-0809},
pages = {57--61}
}
Bjerva, J., Bos, J., van der Goot, R., & Nissim, M. (2014). The Meaning Factory: Formal Semantics for Recognizing Textual Entailment and Determining Semantic Similarity. In P. Nakov & T. Zesch (Eds.), Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014) (pp. 642–646). Dublin, Ireland: Association for Computational Linguistics.
@inproceedings{bjerva-etal-2014-meaning,
title = {The Meaning Factory: Formal Semantics for Recognizing Textual Entailment and Determining Semantic Similarity},
author = {Bjerva, Johannes and Bos, Johan and van der Goot, Rob and Nissim, Malvina},
editor = {Nakov, Preslav and Zesch, Torsten},
booktitle = {Proceedings of the 8th International Workshop on Semantic Evaluation ({S}em{E}val 2014)},
month = aug,
year = {2014},
address = {Dublin, Ireland},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/S14-2114},
doi = {10.3115/v1/S14-2114},
pages = {642--646}
}
Lenci, A., Lebani, G. E., Castagnoli, S., Masini, F., & Nissim, M. (2014). SYMPAThy: Towards a comprehensive approach to the extraction of Italian Word Combinations. Proceedings of the First Italian Conference on Computational Linguistics CLiC-it 2014 and of the Fourth International Workshop EVALITA 2014: 9-11 December 2014, Pisa, 234–238. Pisa University Press.
@inproceedings{lenci2014sympathy,
title = {SYMPAThy: Towards a comprehensive approach to the extraction of Italian Word Combinations},
author = {Lenci, Alessandro and Lebani, Gianluca E and Castagnoli, Sara and Masini, Francesca and Nissim, Malvina},
booktitle = {Proceedings of the First Italian Conference on Computational Linguistics CLiC-it 2014 and of the Fourth International Workshop EVALITA 2014: 9-11 December 2014, Pisa},
pages = {234--238},
year = {2014},
organization = {Pisa University Press}
}
Basile, V., Bolioli, A., Nissim, M., Patti, V., & Rosso, P. (2014). Overview of the Evalita 2014 SENTIment POLarity Classification Task. Proceedings of EVALITA 2014, 50–57.
@inproceedings{basile2014overview,
title = {Overview of the Evalita 2014 SENTIment POLarity Classification Task},
author = {Basile, Valerio and Bolioli, Andrea and Nissim, Malvina and Patti, Viviana and Rosso, Paolo},
booktitle = {Proceedings of EVALITA 2014},
pages = {50--57},
year = {2014}
}
Del Tredici, M., & Nissim, M. (2014). A Modular System for Rule-based Text Categorisation. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, … S. Piperidis (Eds.), Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14). Reykjavik, Iceland: European Language Resources Association (ELRA).
@inproceedings{del-tredici-nissim-2014-modular,
title = {A Modular System for Rule-based Text Categorisation},
author = {Del Tredici, Marco and Nissim, Malvina},
editor = {Calzolari, Nicoletta and Choukri, Khalid and Declerck, Thierry and Loftsson, Hrafn and Maegaard, Bente and Mariani, Joseph and Moreno, Asuncion and Odijk, Jan and Piperidis, Stelios},
booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation ({LREC}'14)},
month = may,
year = {2014},
address = {Reykjavik, Iceland},
publisher = {European Language Resources Association (ELRA)},
url = {http://www.lrec-conf.org/proceedings/lrec2014/pdf/941_Paper.pdf}
}
We introduce a modular rule-based approach to text categorisation which is more flexible and less time consuming to build than a standard rule-based system because it works with a hierarchical structure and allows for re-usability of rules. When compared to currently more wide-spread machine learning models on a case study, our modular system shows competitive results, and it has the advantage of reducing manual effort over time, since only fewer rules must be written when moving to a (partially) new domain, while annotation of training data is always required in the same amount.
Mariotti, A., & Nissim, M. (2014). Parting ways with the partitive view: a corpus-based account of the Italian particle “ne.” Proceedings of the First Italian Conference on Computational Linguistics CLiC-it 2014 and of the Fourth International Workshop EVALITA, 249–253.
@inproceedings{mariotti2014parting,
title = {Parting ways with the partitive view: a corpus-based account of the Italian particle “ne”},
author = {Mariotti, Alice and Nissim, Malvina},
booktitle = {Proceedings of the First Italian Conference on Computational Linguistics CLiC-it 2014 and of the Fourth International Workshop EVALITA},
pages = {249--253},
year = {2014}
}
Basile, V., Bolioli, A., Nissim, M., Patti, V., & Rosso, P. (2014). Evalita 2014 Sentipolc task: Task guidelines. Technical report.
@techreport{basile2014evalita-guidelines,
title = {Evalita 2014 Sentipolc task: Task guidelines},
author = {Basile, Valerio and Bolioli, Andrea and Nissim, Malvina and Patti, Viviana and Rosso, Paolo},
year = {2014},
institution = {Technical report}
}
Basile, V., Bolioli, A., Bosco, C., Nissim, M., Patti, V., Rosso, P., … others. (2014). Evalita 2014: Sentipolc Twitter dataset. Dipartimento di Informatica, Università degli Studi di Torino.
@misc{basile2014evalita,
title = {Evalita 2014: Sentipolc Twitter dataset},
author = {Basile, Valerio and Bolioli, Andrea and Bosco, Cristina and Nissim, Malvina and Patti, Viviana and Rosso, Paolo and Rabellino, Sergio and others},
year = {2014},
publisher = {Dipartimento di Informatica, Universit{\`a} degli Studi di Torino}
}
Castagnoli, S., Nissim, M., Masini, F., & others. (2014). Metodi e risorse computazionali per l’estrazione di combinazioni di parole da corpora. Alma Mater Studiorum-Università di Bologna.
@techreport{castagnoli2014metodi,
title = {Metodi e risorse computazionali per l’estrazione di combinazioni di parole da corpora},
author = {Castagnoli, Sara and Nissim, Malvina and Masini, Francesca and others},
year = {2014},
institution = {Alma Mater Studiorum-Universit{\`a} di Bologna}
}
2013
Nissim, M., Pietrandrea, P., Sansò, A., & Mauri, C. (2013). Cross-linguistic annotation of modality: a data-driven hierarchical model. Proceedings of the 9th Joint ISO-ACL SIGSEM Workshop on Interoperable Semantic Annotation, 7–14.
@inproceedings{nissim2013cross,
title = {Cross-linguistic annotation of modality: a data-driven hierarchical model},
author = {Nissim, Malvina and Pietrandrea, Paola and Sans\`{o}, Andrea and Mauri, Caterina},
booktitle = {Proceedings of the 9th Joint ISO-ACL SIGSEM Workshop on Interoperable Semantic Annotation},
pages = {7--14},
year = {2013}
}
Oltramari, A., Vetere, G., Chiari, I., Jezek, E., Zanzotto, F. M., Nissim, M., & Gangemi, A. (2013). Senso Comune: A collaborative knowledge resource for Italian. In The People’s Web Meets NLP: Collaboratively Constructed Language Resources (pp. 45–67). Springer Berlin Heidelberg.
@incollection{oltramari2013senso,
title = {Senso Comune: A collaborative knowledge resource for {I}talian},
author = {Oltramari, Alessandro and Vetere, Guido and Chiari, Isabella and Jezek, Elisabetta and Zanzotto, Fabio Massimo and Nissim, Malvina and Gangemi, Aldo},
booktitle = {The People’s Web Meets NLP: Collaboratively Constructed Language Resources},
pages = {45--67},
year = {2013},
publisher = {Springer Berlin Heidelberg}
}
Basile, V., & Nissim, M. (2013). Sentiment analysis on Italian tweets. Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 100–107.
@inproceedings{basile2013sentiment,
title = {Sentiment analysis on Italian tweets},
author = {Basile, Valerio and Nissim, Malvina},
booktitle = {Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis},
pages = {100--107},
year = {2013},
url = {https://aclanthology.org/W13-1614.pdf}
}
Nissim, M., & Zaninello, A. (2013). A Repository of Variation Patterns for Multiword Expressions. Proceedings of the 9th Workshop on Multiword Expressions, 101–105.
@inproceedings{nissim2013repository,
title = {A Repository of Variation Patterns for Multiword Expressions},
author = {Nissim, Malvina and Zaninello, Andrea},
booktitle = {Proceedings of the 9th Workshop on Multiword Expressions},
pages = {101--105},
year = {2013}
}
Nissim, M., & Zaninello, A. (2013). Modeling the internal variability of multiword expressions through a pattern-based method. ACM Transactions on Speech and Language Processing (TSLP), 10, 1–26.
@article{nissim2013modeling,
title = {Modeling the internal variability of multiword expressions through a pattern-based method},
author = {Nissim, Malvina and Zaninello, Andrea},
journal = {ACM Transactions on Speech and Language Processing (TSLP)},
volume = {10},
number = {2},
pages = {1--26},
year = {2013},
publisher = {ACM New York, NY, USA},
url = {https://dl.acm.org/doi/10.1145/2483691.2483696}
}
2012
Bos, J., Evang, K., & Nissim, M. (2012). Annotating semantic roles in a lexicalised grammar environment. Proceedings of the Eighth Joint ACL-ISO Workshop on Interoperable Semantic Annotation (ISA-8), 9–12.
@inproceedings{bos2012annotating,
title = {Annotating semantic roles in a lexicalised grammar environment},
author = {Bos, Johan and Evang, Kilian and Nissim, Malvina},
booktitle = {Proceedings of the Eighth Joint ACL-ISO Workshop on Interoperable Semantic Annotation (ISA-8)},
pages = {9--12},
year = {2012}
}
2011
Grandi, N., Nissim, M., & Tamburini, F. (2011). Noun-clad adjectives. On the adjectival status of non-head constituents of Italian attributive compounds. Lingue e Linguaggio, 10, 161–160.
@article{grandi2011noun,
title = {Noun-clad adjectives. On the adjectival status of non-head constituents of Italian attributive compounds},
author = {Grandi, Nicola and Nissim, Malvina and Tamburini, Fabio},
journal = {Lingue e linguaggio},
volume = {10},
number = {1},
pages = {161--0},
year = {2011},
publisher = {Societ{\`a} editrice il Mulino},
url = {https://www.rivisteweb.it/doi/10.1418/34543},
issn = {1720-9331}
}
Nissim, M., & Zaninello, A. (2011). A quantitative study on the morphology of Italian multiword expressions. Lingue e Linguaggio, 10, 283–300.
@article{nissim2011quantitative,
title = {A quantitative study on the morphology of Italian multiword expressions},
author = {Nissim, Malvina and Zaninello, Andrea},
journal = {Lingue e linguaggio},
volume = {10},
number = {2},
pages = {283--300},
year = {2011},
publisher = {Societ{\`a} editrice il Mulino},
url = {https://www.rivisteweb.it/doi/10.1418/35844}
}
2010
Zaninello, A., & Nissim, M. (2010). Creation of Lexical Resources for a Characterisation of Multiword Expressions in Italian. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, … D. Tapias (Eds.), Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). Valletta, Malta: European Language Resources Association (ELRA).
@inproceedings{zaninello-nissim-2010-creation,
title = {Creation of Lexical Resources for a Characterisation of Multiword Expressions in {I}talian},
author = {Zaninello, Andrea and Nissim, Malvina},
editor = {Calzolari, Nicoletta and Choukri, Khalid and Maegaard, Bente and Mariani, Joseph and Odijk, Jan and Piperidis, Stelios and Rosner, Mike and Tapias, Daniel},
booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation ({LREC}'10)},
month = may,
year = {2010},
address = {Valletta, Malta},
publisher = {European Language Resources Association (ELRA)},
url = {http://www.lrec-conf.org/proceedings/lrec2010/pdf/567_Paper.pdf}
}
The theoretical characterisation of multiword expressions (MWEs) is tightly connected to their actual occurrences in data and to their representation in lexical resources. We present three lexical resources for Italian MWEs, namely an electronic lexicon, a series of example corpora and a database of MWEs represented around morphosyntactic patterns. These resources are matched against, and created from, a very large web-derived corpus for Italian that spans across registers and domains. We can thus test expressions coded by lexicographers in a dictionary, thereby discarding unattested expressions, revisiting lexicographers’ choices on the basis of frequency information, and at the same time creating an example sub-corpus for each entry. We organise MWEs on the basis of the morphosyntactic information obtained from the data in an electronic, flexible knowledge-base containing structured annotation exploitable for multiple purposes. We also suggest further work directions towards characterising MWEs by analysing the data organised in our database through lexico-semantic information available in WordNet or MultiWordNet-like resources, also in the perspective of expanding their set through the extraction of other similar compact expressions.
2009
Markert, K., & Nissim, M. (2009). Data and models for metonymy resolution. Language Resources and Evaluation, 43, 123–138.
@article{markert2009data,
title = {Data and models for metonymy resolution},
author = {Markert, Katja and Nissim, Malvina},
journal = {Language Resources and Evaluation},
volume = {43},
pages = {123--138},
year = {2009},
publisher = {Springer Netherlands},
url = {https://doi.org/10.1007/s10579-009-9087-y}
}
Celli, F., & Nissim, M. (2009). Automatic identification of semantic relations in Italian complex nominals. In H. Bunt (Ed.), Proceedings of the Eighth International Conference on Computational Semantics (pp. 45–60). Tilburg, The Netherlands: Association for Computational Linguistics.
@inproceedings{celli-nissim-2009-automatic,
title = {Automatic identification of semantic relations in {I}talian complex nominals},
author = {Celli, Fabio and Nissim, Malvina},
editor = {Bunt, Harry},
booktitle = {Proceedings of the Eighth International Conference on Computational Semantics},
month = jan,
year = {2009},
address = {Tilburg, The Netherlands},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/W09-3707},
pages = {45--60}
}
Nissim, M., & Bos, J. (2009). Using the Web as a Corpus in Natural Language Processing. In Linguistica e Modelli Tecnologici di Ricerca (pp. 345–351). Bulzoni.
@incollection{nissim2009using,
title = {Using the Web as a Corpus in Natural Language Processing},
author = {Nissim, Malvina and Bos, Johan},
booktitle = {Linguistica e Modelli Tecnologici di Ricerca},
pages = {345--351},
year = {2009},
publisher = {Bulzoni}
}
Bos, J., Nissim, M., Ahn, B. G., Clark, S., Haggerty, J., Herbelot, A., & Zhang, Y. (2009). From shallow to deep Natural language processing: A hands-on tutorial. Springer.
@misc{bos2009shallow,
title = {From shallow to deep Natural language processing: A hands-on tutorial},
author = {Bos, Johan and Nissim, Malvina and Ahn, Byung Gyu and Clark, Stephen and Haggerty, James and Herbelot, Aurelie and Zhang, Yue},
year = {2009},
publisher = {Springer}
}
Calhoun, S., Carletta, J., Jurafsky, D., Nissim, M., Ostendorf, M., & Zaenen, A. (2009). NXT switchboard annotations. Linguistic Data Consortium Corpus.
@article{calhoun2009nxt,
title = {NXT switchboard annotations},
author = {Calhoun, Sasha and Carletta, Jean and Jurafsky, Daniel and Nissim, Malvina and Ostendorf, Mari and Zaenen, Annie},
journal = {Linguistic Data Consortium Corpus},
url = {http://catalog.ldc.upenn.edu/LDC2009T26},
year = {2009}
}
2008
Bos, J., & Nissim, M. (2008). Combining discourse representation theory with FrameNet. Frames, Corpora, and Knowledge Representation, 169–183.
@article{bos2008combining,
title = {Combining discourse representation theory with {FrameNet}},
author = {Bos, Johan and Nissim, Malvina},
journal = {Frames, Corpora, and Knowledge Representation},
pages = {169--183},
year = {2008},
publisher = {Bononia University Press}
}
Nissim, M., & Perboni, S. (2008). The Italian Particle “ne”: Corpus Construction and Analysis. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08). Marrakech, Morocco: European Language Resources Association (ELRA).
@inproceedings{nissim-perboni-2008-italian,
title = {The {I}talian Particle {``}ne{''}: Corpus Construction and Analysis},
author = {Nissim, Malvina and Perboni, Sara},
editor = {Calzolari, Nicoletta and Choukri, Khalid and Maegaard, Bente and Mariani, Joseph and Odijk, Jan and Piperidis, Stelios and Tapias, Daniel},
booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation ({LREC}'08)},
month = may,
year = {2008},
address = {Marrakech, Morocco},
publisher = {European Language Resources Association (ELRA)},
url = {http://www.lrec-conf.org/proceedings/lrec2008/pdf/795_paper.pdf}
}
The Italian particle “ne” exhibits interesting anaphoric properties that have not yet been explored in depth from a corpus and computational linguistic perspective. We provide: (i) an overview of the phenomenon; (ii) a set of annotation schemes for marking up occurrences of “ne”; (iii) the description of a corpus annotated for this phenomenon; (iv) a first assessment of the resolution task. We show that the schemes we developed are reliable, and that the actual distribution of partitive and non-partitive uses of “ne” is inversely proportional to the amount of attention that the two different uses have received in the linguistic literature. As an assessment of the complexity of the resolution task, we find that a recency-based baseline yields an accuracy of less than 30% on both development and test data.
Grover, C., Klein, E., Manning, C., Markert, K., & Nissim, M. (2008). Machine learning of entity recognizers for modular retargetable natural language processing. University of Edinburgh.
@techreport{grover2008machine,
title = {Machine learning of entity recognizers for modular retargetable natural language processing},
author = {Grover, Claire and Klein, Ewan and Manning, Chris and Markert, Katja and Nissim, Malvina},
year = {2008},
institution = {University of Edinburgh}
}
2007
Gangemi, A., Lehmann, J., Presutti, V., Nissim, M., & Catenacci, C. (2007). C-ODO: an OWL Meta-model for Collaborative Ontology Design. In N. Noy, H. Alani, G. Stumme, P. Mika, Y. Sure, & D. Vrandecic (Eds.), Proceedings of the Workshop on Social and Collaborative Construction of Structured Knowledge (CKC 2007) (Vol. 273).
@inproceedings{gangemi2007c,
title = {C-ODO: an OWL Meta-model for Collaborative Ontology Design},
author = {Gangemi, Aldo and Lehmann, Jos and Presutti, Valentina and Nissim, Malvina and Catenacci, Carola},
editor = {Noy, N. and Alani, H. and Stumme, G. and Mika, P. and Sure, Y. and Vrandecic, D.},
booktitle = {Proceedings of the Workshop on Social and Collaborative Construction of Structured Knowledge (CKC 2007)},
volume = {273},
year = {2007},
url = {https://hdl.handle.net/11585/57679}
}
Markert, K., & Nissim, M. (2007). SemEval-2007 Task 08: Metonymy Resolution at SemEval-2007. In E. Agirre, L. Màrquez, & R. Wicentowski (Eds.), Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007) (pp. 36–41). Prague, Czech Republic: Association for Computational Linguistics.
@inproceedings{markert-nissim-2007-semeval,
title = {{S}em{E}val-2007 Task 08: Metonymy Resolution at {S}em{E}val-2007},
author = {Markert, Katja and Nissim, Malvina},
editor = {Agirre, Eneko and M{\`a}rquez, Llu{\'\i}s and Wicentowski, Richard},
booktitle = {Proceedings of the Fourth International Workshop on Semantic Evaluations ({S}em{E}val-2007)},
month = jun,
year = {2007},
address = {Prague, Czech Republic},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/S07-1007},
pages = {36--41}
}
Markert, K., & Nissim, M. (2007). Metonymy resolution at SemEval I: Guidelines for participants. Association for Computational Linguistics.
@techreport{markert2007metonymy,
title = {Metonymy resolution at {S}em{E}val {I}: Guidelines for participants},
author = {Markert, Katja and Nissim, Malvina},
note = {Technical report, SemEval},
year = {2007},
institution = {Association for Computational Linguistics}
}
Bos, J., & Nissim, M. (2007). Answer Translation: An Alternative Approach to Cross-Lingual Question Answering. In C. Peters, P. Clough, F. C. Gey, J. Karlgren, B. Magnini, D. W. Oard, … M. Stempfhuber (Eds.), Evaluation of Multilingual and Multi-modal Information Retrieval (pp. 290–299). Berlin, Heidelberg: Springer Berlin Heidelberg.
@inproceedings{10.1007/978-3-540-74999-8_36,
author = {Bos, Johan and Nissim, Malvina},
editor = {Peters, Carol and Clough, Paul and Gey, Fredric C. and Karlgren, Jussi and Magnini, Bernardo and Oard, Douglas W. and de Rijke, Maarten and Stempfhuber, Maximilian},
title = {Answer Translation: An Alternative Approach to Cross-Lingual Question Answering},
booktitle = {Evaluation of Multilingual and Multi-modal Information Retrieval},
year = {2007},
publisher = {Springer Berlin Heidelberg},
address = {Berlin, Heidelberg},
pages = {290--299},
isbn = {978-3-540-74999-8}
}
We approach cross-lingual question answering by using a mono-lingual QA system for the source language and by translating resulting answers into the target language. As far as we are aware, this is the first cross-lingual QA system in the history of CLEF that uses this method—almost without exception, cross-lingual QA systems use translation of the question or query terms instead. We demonstrate the feasibility of our alternative approach by using a mono-lingual QA system for English, and translating answers and finding appropriate documents in Italian and Dutch. For factoid and definition questions, we achieve overall accuracy scores ranging from 13% (EN→NL) to 17% (EN→IT) and lenient accuracy figures from 19% (EN→NL) to 25% (EN→IT). The advantage of this strategy to cross-lingual QA is that translation of answers is easier than translating questions—the disadvantage is that answers might be missing from the source corpus and additional effort is required for finding supporting documents in the target language.
Bos, J., Nissim, M., & others. (2007). Are two heads better than one? Experiments with Italian part-of-speech labelling. Intelligenza Artificiale, 4, 18–19.
@article{bos2007two,
title = {Are two heads better than one? Experiments with Italian part-of-speech labelling.},
author = {Bos, Johan and Nissim, Malvina and others},
journal = {Intelligenza Artificiale},
volume = {4},
pages = {18--19},
year = {2007}
}
2006
Markert, K., & Nissim, M. (2006). Metonymic proper names: A corpus-based account. In A. Stefanowitsch & S. T. Gries (Eds.), Corpora in Cognitive Linguistics - Corpus-Based Approaches to Syntax and Lexis. Volume I: Metaphor and Metonymy (pp. 152–174). Mouton de Gruyter.
@incollection{markert2006metonymic,
title = {Metonymic proper names: A corpus-based account},
author = {Markert, Katja and Nissim, Malvina},
booktitle = {Corpora in Cognitive Linguistics - Corpus-Based Approaches to Syntax and Lexis. Volume I: Metaphor and Metonymy},
pages = {152--174},
year = {2006},
publisher = {Mouton de Gruyter},
url = {https://doi.org/10.1515/9783110199895},
editor = {Stefanowitsch, Anatol and Gries, Stefan Th.}
}
Many proper names are widely used metonymically. Thus, for example, organisation names can be used for products produced by the organisation, members of an organisation or events associated with the organisation. Their treatment is crucial for many natural language processing tasks like question answering and anaphora resolution. At the moment, language resources do not contain the necessary information for large-scale metonymy processing. As a contribution, we describe a general framework for annotating metonymies in domain-independent text that considers the regularity, productivity and underspecification of metonymic usage. We will then concentrate on two fully worked out annotation schemes for location and organisation names and rigorously evaluate these schemes as to their reliability. We also present a gold standard corpus consisting of 4000 annotated occurrences of location and organisation names in the British National Corpus. We use this corpus to examine the distribution of metonymies as well as for experiments in automatic metonymy resolution.
Bos, J., & Nissim, M. (2006). An Empirical Approach to the Interpretation of Superlatives. In D. Jurafsky & E. Gaussier (Eds.), Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (pp. 9–17). Sydney, Australia: Association for Computational Linguistics.
@inproceedings{bos-nissim-2006-empirical,
title = {An Empirical Approach to the Interpretation of Superlatives},
author = {Bos, Johan and Nissim, Malvina},
editor = {Jurafsky, Dan and Gaussier, Eric},
booktitle = {Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing},
month = jul,
year = {2006},
address = {Sydney, Australia},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/W06-1602},
pages = {9--17}
}
Bos, J., & Nissim, M. (2006). Cross-Lingual Question Answering by Answer Translation. CLEF (Working Notes).
@inproceedings{bos2006cross,
title = {Cross-Lingual Question Answering by Answer Translation.},
author = {Bos, Johan and Nissim, Malvina},
booktitle = {CLEF (Working Notes)},
year = {2006}
}
Alex, B., Nissim, M., & Grover, C. (2006). The Impact of Annotation on the Performance of Protein Tagging in Biomedical Text. In N. Calzolari, K. Choukri, A. Gangemi, B. Maegaard, J. Mariani, J. Odijk, & D. Tapias (Eds.), Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06). Genoa, Italy: European Language Resources Association (ELRA).
@inproceedings{alex-etal-2006-impact,
title = {The Impact of Annotation on the Performance of Protein Tagging in Biomedical Text},
author = {Alex, Beatrice and Nissim, Malvina and Grover, Claire},
editor = {Calzolari, Nicoletta and Choukri, Khalid and Gangemi, Aldo and Maegaard, Bente and Mariani, Joseph and Odijk, Jan and Tapias, Daniel},
booktitle = {Proceedings of the Fifth International Conference on Language Resources and Evaluation ({LREC}{'}06)},
month = may,
year = {2006},
address = {Genoa, Italy},
publisher = {European Language Resources Association (ELRA)},
url = {http://www.lrec-conf.org/proceedings/lrec2006/pdf/398_pdf.pdf}
}
In this paper we discuss five different corpora annotated for protein names. We present several within- and cross-dataset protein tagging experiments showing that different annotation schemes severely affect the portability of statistical protein taggers. By means of a detailed error analysis we identify crucial annotation issues that future annotation projects should take into careful consideration.
Nissim, M. (2006). Learning Information Status of Discourse Entities. In D. Jurafsky & E. Gaussier (Eds.), Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (pp. 94–102). Sydney, Australia: Association for Computational Linguistics.
@inproceedings{nissim-2006-learning,
title = {Learning Information Status of Discourse Entities},
author = {Nissim, Malvina},
editor = {Jurafsky, Dan and Gaussier, Eric},
booktitle = {Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing},
month = jul,
year = {2006},
address = {Sydney, Australia},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/W06-1612},
pages = {94--102}
}
Iglesias, M., Caracciolo, C., Jaques, Y., Sini, M., Calderini, F., Keizer, J., … others. (2006). User requirements for the fisheries stock depletion alert system. Rome, Italy.
@techreport{iglesias2006user,
title = {User requirements for the fisheries stock depletion alert system},
author = {Iglesias, M and Caracciolo, C and Jaques, Y and Sini, M and Calderini, F and Keizer, J and Le Hunte Ward, F and Nissim, M and Gangemi, A and others},
year = {2006},
address = {Rome, Italy},
institution = {EU Horizon}
}
Catenacci, C., Gangemi, A., Lehmann, J., Nissim, M., Presutti, V., Steve, G., … others. (2006). Design rationales for collaborative development of networked ontologies: State of the art and the collaborative ontology design ontology. EU Horizon.
@techreport{catenacci2006design,
title = {Design rationales for collaborative development of networked ontologies: State of the art and the collaborative ontology design ontology},
author = {Catenacci, Carola and Gangemi, Aldo and Lehmann, Jos and Nissim, Malvina and Presutti, Valentina and Steve, Gerardo and Guarino, Nicola and Masolo, Claudio and Lewen, Holger and Dellschaft, Klaas and others},
note = {Deliverable d2},
year = {2006},
institution = {EU Horizon}
}
Iglesias, M., Caracciolo, C., Jaques, Y., Sini, M., Calderini, F., Keizer, J., … Gangemi, A. (2006). WP7 User requirements for the fisheries stock depletion alert system.
@misc{iglesias2006wp7,
title = {WP7 User requirements for the fisheries stock depletion alert system},
author = {Iglesias, Marta and Caracciolo, Caterina and Jaques, Yves and Sini, Margherita and Calderini, Francesco and Keizer, Johannes and Le Hunte Ward, Fynvola and Nissim, Malvina and Gangemi, Aldo},
year = {2006}
}
2005
Finkel, J., Dingare, S., Manning, C. D., Nissim, M., Alex, B., & Grover, C. (2005). Exploring the boundaries: gene and protein identification in biomedical text. BMC Bioinformatics, 6, 1–9.
@article{finkel2005exploring,
title = {Exploring the boundaries: gene and protein identification in biomedical text},
author = {Finkel, Jenny and Dingare, Shipra and Manning, Christopher D and Nissim, Malvina and Alex, Beatrice and Grover, Claire},
journal = {BMC bioinformatics},
volume = {6},
pages = {1--9},
year = {2005},
publisher = {BioMed Central},
url = {https://doi.org/10.1186/1471-2105-6-S1-S5}
}
Markert, K., & Nissim, M. (2005). Comparing Knowledge Sources for Nominal Anaphora Resolution. Computational Linguistics, 31, 367–402.
@article{markert-nissim-2005-comparing,
title = {Comparing Knowledge Sources for Nominal Anaphora Resolution},
author = {Markert, Katja and Nissim, Malvina},
journal = {Computational Linguistics},
volume = {31},
number = {3},
year = {2005},
url = {https://aclanthology.org/J05-3004},
doi = {10.1162/089120105774321064},
pages = {367--402},
publisher = {MIT Press}
}
Dingare, S., Nissim, M., Finkel, J., Manning, C., & Grover, C. (2005). A system for identifying named entities in biomedical text: how results from two evaluations reflect on both the system and the evaluations. Comparative and Functional Genomics, 6, 77–85.
@article{dingare2005system,
title = {A system for identifying named entities in biomedical text: how results from two evaluations reflect on both the system and the evaluations},
author = {Dingare, Shipra and Nissim, Malvina and Finkel, Jenny and Manning, Christopher and Grover, Claire},
journal = {Comparative and functional genomics},
volume = {6},
number = {1-2},
pages = {77--85},
year = {2005},
publisher = {John Wiley \& Sons, Ltd.},
url = {https://doi.org/10.1002/cfg.457}
}
We present a maximum entropy-based system for identifying named entities (NEs) in biomedical abstracts and present its performance in the only two biomedical named entity recognition (NER) comparative evaluations that have been held to date, namely BioCreative and Coling BioNLP. Our system obtained an exact match F-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation. We discuss our system in detail, including its rich use of local features, attention to correct boundary identification, innovative use of external knowledge resources, including parsing and web searches, and rapid adaptation to new NE sets. We also discuss in depth problems with data annotation in the evaluations which caused the final performance to be lower than optimal.
Calhoun, S., Nissim, M., Steedman, M., & Brenier, J. (2005). A Framework for Annotating Information Structure in Discourse. In A. Meyers (Ed.), Proceedings of the Workshop on Frontiers in Corpus Annotations II: Pie in the Sky (pp. 45–52). Ann Arbor, Michigan: Association for Computational Linguistics.
@inproceedings{calhoun-etal-2005-framework,
title = {A Framework for Annotating Information Structure in Discourse},
author = {Calhoun, Sasha and Nissim, Malvina and Steedman, Mark and Brenier, Jason},
editor = {Meyers, Adam},
booktitle = {Proceedings of the Workshop on Frontiers in Corpus Annotations {II}: Pie in the Sky},
month = jun,
year = {2005},
address = {Ann Arbor, Michigan},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/W05-0307},
pages = {45--52}
}
Ahn, K., Bos, J., Kor, D., Nissim, M., Webber, B. L., & Curran, J. R. (2005). Question Answering with QED at TREC 2005. Proceedings of TREC.
@inproceedings{ahn2005question,
title = {Question Answering with QED at TREC 2005.},
author = {Ahn, Kisuh and Bos, Johan and Kor, David and Nissim, Malvina and Webber, Bonnie L and Curran, James R},
booktitle = {Proceedings of TREC},
year = {2005}
}
Nissim, M., & Markert, K. (2005). Learning to buy a Renault and talk to BMW: A supervised approach to conventional metonymy. International Workshop on Computational Semantics (IWCS 2005).
@inproceedings{nissim2005learning,
title = {Learning to buy a Renault and talk to BMW: A supervised approach to conventional metonymy},
author = {Nissim, Malvina and Markert, Katja},
booktitle = {International Workshop on Computational Semantics (IWCS 2005)},
year = {2005}
}
2004
Finkel, J., Dingare, S., Nguyen, H., Nissim, M., Manning, C., & Sinclair, G. (2004). Exploiting Context for Biomedical Entity Recognition: From Syntax to the Web. In N. Collier, P. Ruch, & A. Nazarenko (Eds.), Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP) (pp. 91–94). Geneva, Switzerland: COLING.
@inproceedings{finkel-etal-2004-exploiting,
title = {Exploiting Context for Biomedical Entity Recognition: From Syntax to the Web},
author = {Finkel, Jenny and Dingare, Shipra and Nguyen, Huy and Nissim, Malvina and Manning, Christopher and Sinclair, Gail},
editor = {Collier, Nigel and Ruch, Patrick and Nazarenko, Adeline},
booktitle = {Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications ({NLPBA}/{B}io{NLP})},
month = aug,
year = {2004},
address = {Geneva, Switzerland},
publisher = {COLING},
url = {https://aclanthology.org/W04-1217},
pages = {91--94}
}
Nissim, M., Dingare, S., Carletta, J., & Steedman, M. (2004). An Annotation Scheme for Information Status in Dialogue. In M. T. Lino, M. F. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04). Lisbon, Portugal: European Language Resources Association (ELRA).
@inproceedings{nissim-etal-2004-annotation,
title = {An Annotation Scheme for Information Status in Dialogue},
author = {Nissim, Malvina and Dingare, Shipra and Carletta, Jean and Steedman, Mark},
editor = {Lino, Maria Teresa and Xavier, Maria Francisca and Ferreira, F{\'a}tima and Costa, Rute and Silva, Raquel},
booktitle = {Proceedings of the Fourth International Conference on Language Resources and Evaluation ({LREC}{'}04)},
month = may,
year = {2004},
address = {Lisbon, Portugal},
publisher = {European Language Resources Association (ELRA)},
url = {http://www.lrec-conf.org/proceedings/lrec2004/pdf/638.pdf}
}
Nissim, M., Matheson, C., Reid, J., & others. (2004). Recognising geographical entities in Scottish historical documents. Proceedings of the Workshop on Geographic Information Retrieval at SIGIR 2004, 35.
@inproceedings{nissim2004recognising,
title = {Recognising geographical entities in Scottish historical documents},
author = {Nissim, Malvina and Matheson, Colin and Reid, James and others},
booktitle = {Proceedings of the Workshop on Geographic Information Retrieval at SIGIR 2004},
volume = {35},
year = {2004}
}
Carletta, J., Dingare, S., Nissim, M., & Nikitina, T. (2004). Using the NITE XML Toolkit on the Switchboard Corpus to Study Syntactic Choice: a Case Study. In M. T. Lino, M. F. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04). Lisbon, Portugal: European Language Resources Association (ELRA).
@inproceedings{carletta-etal-2004-using,
title = {Using the {NITE} {XML} Toolkit on the Switchboard Corpus to Study Syntactic Choice: a Case Study},
author = {Carletta, Jean and Dingare, Shipra and Nissim, Malvina and Nikitina, Tatiana},
editor = {Lino, Maria Teresa and Xavier, Maria Francisca and Ferreira, F{\'a}tima and Costa, Rute and Silva, Raquel},
booktitle = {Proceedings of the Fourth International Conference on Language Resources and Evaluation ({LREC}{'}04)},
month = may,
year = {2004},
address = {Lisbon, Portugal},
publisher = {European Language Resources Association (ELRA)},
url = {http://www.lrec-conf.org/proceedings/lrec2004/pdf/636.pdf}
}
Hachey, B., Nguyen, H., Nissim, M., Alex, B., & Grover, C. (2004). Grounding gene mentions with respect to gene database identifiers. BioCreAtIvE Workshop Handouts.
@inproceedings{hachey2004grounding,
title = {Grounding gene mentions with respect to gene database identifiers},
author = {Hachey, Ben and Nguyen, Huy and Nissim, Malvina and Alex, Bea and Grover, Claire},
booktitle = {BioCreAtIvE Workshop Handouts},
year = {2004}
}
Nissim, M., & others. (2004). Lexical information and choice of determiners. In Possessives and Beyond (pp. 133–152). GLSA Publications.
@incollection{nissim2004lexical,
title = {Lexical information and choice of determiners},
author = {Nissim, Malvina and others},
booktitle = {Possessives and Beyond},
pages = {133--152},
year = {2004},
publisher = {GLSA Publications}
}
2003
Markert, K., Nissim, M., & Modjeska, N. (2003). Using the web for nominal anaphora resolution. Proc. 10th European Chapter of the Association for Computational Linguistics (EACL 03) Workshop on the Computational Treatment of Anaphora, 39–46.
@inproceedings{markert2003using,
title = {Using the web for nominal anaphora resolution},
author = {Markert, Katja and Nissim, Malvina and Modjeska, Natalia},
booktitle = {Proc. 10th European Chapter of the Association for Computational Linguistics (EACL 03) Workshop on the Computational Treatment of Anaphora},
pages = {39--46},
year = {2003}
}
Modjeska, N. N., Markert, K., & Nissim, M. (2003). Using the Web in Machine Learning for Other-Anaphora Resolution. Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, 176–183.
@inproceedings{modjeska-etal-2003-using,
title = {Using the Web in Machine Learning for Other-Anaphora Resolution},
author = {Modjeska, Natalia N. and Markert, Katja and Nissim, Malvina},
booktitle = {Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing},
year = {2003},
url = {https://aclanthology.org/W03-1023},
pages = {176--183}
}
Nissim, M., & Markert, K. (2003). Syntactic features and word similarity for supervised metonymy resolution. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, 56–63.
@inproceedings{nissim2003syntactic,
title = {Syntactic features and word similarity for supervised metonymy resolution},
author = {Nissim, Malvina and Markert, Katja},
booktitle = {Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics},
pages = {56--63},
year = {2003},
url = {https://aclanthology.org/P03-1008.pdf}
}
Markert, K., & Nissim, M. (2003). Corpus-based metonymy analysis. Metaphor and Symbol, 18, 175–188.
@article{markert2003corpus,
title = {Corpus-based metonymy analysis},
author = {Markert, Katja and Nissim, Malvina},
journal = {Metaphor and symbol},
volume = {18},
number = {3},
pages = {175--188},
year = {2003},
publisher = {Routledge}
}
In this article, we make the case for corpus-based metonymy analysis and show that many interesting linguistic and statistical questions can only be answered by working with real texts. To facilitate such studies, we present a method for annotating metonymies in domain- and genre-independent text. We advocate an annotation scheme that builds on regularities in metonymic usage, that takes underspecification in metonymic reference into account, and that is organized hierarchically. We combine previous metonymy classification proposals with insights from a corpus study to present a fully worked-out annotation scheme for location names, illustrating the previously mentioned principles. We present several experiments measuring annotation agreement and show that the annotation scheme is reliable and has wide coverage. We also provide a gold standard for annotations of this kind consisting of 2,000 annotated occurrences of country names in the British National Corpus. We use the resulting corpus to study metonymy distributions and the factors that influence the choice of literal versus metonymic readings in real texts.
Nissim, M. (2003). The role of metonymy in named entity recognition. In A. G. Ramat & E. Rigotti (Eds.), Linguistics and the New Professions. FrancoAngeli.
@incollection{nissim2003role,
title = {The role of metonymy in named entity recognition},
author = {Nissim, Malvina},
year = {2003},
editor = {Ramat, Anna Giacalone and Rigotti, Eddo},
booktitle = {Linguistics and the New Professions},
publisher = {FrancoAngeli}
}
2002
Markert, K., & Nissim, M. (2002). Towards a Corpus Annotated for Metonymies: the Case of Location Names. In M. González Rodríguez & C. P. Suarez Araujo (Eds.), Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02). Las Palmas, Canary Islands, Spain: European Language Resources Association (ELRA).
@inproceedings{markert-nissim-2002-towards,
title = {Towards a Corpus Annotated for Metonymies: the Case of Location Names},
author = {Markert, Katja and Nissim, Malvina},
editor = {Gonz{\'a}lez Rodr{\'\i}guez, Manuel and Suarez Araujo, Carmen Paz},
booktitle = {Proceedings of the Third International Conference on Language Resources and Evaluation ({LREC}{'}02)},
month = may,
year = {2002},
address = {Las Palmas, Canary Islands - Spain},
publisher = {European Language Resources Association (ELRA)},
url = {http://www.lrec-conf.org/proceedings/lrec2002/pdf/11.pdf}
}
Markert, K., & Nissim, M. (2002). Metonymy Resolution as a Classification Task. Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), 204–213. Association for Computational Linguistics.
@inproceedings{markert-nissim-2002-metonymy,
title = {Metonymy Resolution as a Classification Task},
author = {Markert, Katja and Nissim, Malvina},
booktitle = {Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing ({EMNLP} 2002)},
month = jul,
year = {2002},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/W02-1027},
doi = {10.3115/1118693.1118720},
pages = {204--213}
}
Bresnan, J., Carletta, J., Crouch, R., Nissim, M., Steedman, M., Wasow, T., & Zaenen, A. (2002). Paraphrase analysis for improved generation, Stanford-link project. Stanford, CA: HCRC Edinburgh-CSLI Stanford.
@misc{bresnan2002paraphrase,
title = {Paraphrase analysis for improved generation, Stanford-link project},
author = {Bresnan, Joan and Carletta, Jean and Crouch, Richard and Nissim, Malvina and Steedman, Mark and Wasow, Tom and Zaenen, Annie},
year = {2002},
publisher = {Stanford, CA: HCRC Edinburgh-CSLI Stanford}
}
2001
Poesio, M., & Nissim, M. (2001). Salience and possessive NPs: the effect of animacy and pronominalization. Proc. of AMLAP (Poster Session).
@inproceedings{poesio2001salience,
title = {Salience and possessive NPs: the effect of animacy and pronominalization},
author = {Poesio, Massimo and Nissim, Malvina},
booktitle = {Proc. of AMLAP (Poster Session)},
year = {2001}
}
Nissim, M. (2001). Underlying relations in genitives and bridging. In Pragmatics in 2000 (pp. 445–457). International Pragmatics Association.
@incollection{nissim2000underlying,
title = {Underlying relations in genitives and bridging},
author = {Nissim, Malvina},
booktitle = {Pragmatics in 2000},
pages = {445--457},
year = {2001},
publisher = {International Pragmatics Association}
}
Nissim, M. (2001). Bridging Definites and Possessives: Distribution of Determiners in Anaphoric Noun Phrases (PhD thesis). Università degli Studi di Pavia.
@phdthesis{nissim2001bridging,
title = {Bridging Definites and Possessives: Distribution of Determiners in Anaphoric Noun Phrases},
author = {Nissim, Malvina},
year = {2001},
school = {Universit\`{a} degli Studi di Pavia}
}
2000
Nissim, M. (2000). On the referential role of genitives. In C. Piliere (Ed.), Proceedings of the ESSLLI Student Session.
@inproceedings{nissim2000referential,
title = {On the referential role of genitives},
author = {Nissim, Malvina},
editor = {Piliere, Catherine},
booktitle = {Proceedings of the ESSLLI Student Session},
year = {2000}
}
1999
Nissim, M., Sansò, A., & Soria, C. (1999). Towards a compositional frame semantics. In S. Bagnara (Ed.), Proceedings of the European Conference on Cognitive Science (ECCS99).
@inproceedings{nissim1999towards,
title = {Towards a compositional frame semantics},
author = {Nissim, Malvina and Sans\`{o}, Andrea and Soria, Claudia},
booktitle = {Proceedings of the European Conference on Cognitive Science (ECCS99)},
year = {1999},
editor = {Bagnara, Sebastiano},
location = {Certosa di Pontignano, Siena, Italy}
}