About

HR-XR-XTEND is one of eight FSTP subprojects of the larger Horizon Europe funded project Unified transcription and translation for extended reality (UTTER). The project's aim is to develop a large language model (LLM) for the Croatian language, trained on a massive dataset of Croatian text. Its goals are to collect at least 6 billion tokens of Croatian text and prepare that data for LLM training, to create an LLM for Croatian using monolingual data only, and to evaluate the LLM on downstream tasks. In doing so, the project builds resources for XR models and extends XR models to a new language. The work proceeds in three phases: the experimental phase focuses on developing and evaluating the model architecture and training process, the training phase is used to train the LLM, and the integration phase integrates the LLM into the UTTER platform. The project results will be made accessible to the research community and the public under permissive licenses in the HR-CLARIN repository.

Latest news

HR-XR-XTEND presented in a paper at LREC-COLING 2024

Research on applying existing LLMs to one of the downstream tasks of the HR-XR-XTEND project, sentiment analysis, was presented at the main conference of LREC-COLING 2024, held in Turin, Italy. The paper "M2SA: Multimodal and Multilingual Model for Sentiment Analysis of Tweets" can be found here. (2024-05-27)

Resources

A list of produced resources and other project results will be presented in this section.

Contact

Address
University of Zagreb, Faculty of Humanities and Social Sciences, Institute of Linguistics

Phone
+385 1 4092142

E-mail
dasa.farkas [at] ffzg.unizg.hr

Acknowledgments
This work was funded by the European Union's Horizon Europe Research and Innovation Programme under Grant Agreement No 101070631 "Unified transcription and translation for extended reality (UTTER)" and by UK Research and Innovation (UKRI) under the UK government's Horizon Europe funding guarantee (Grant No 10039436).