CAIDAS publishes first all-German large language model “LLäMmlein”
We are presenting a milestone for German-language large language models: At the University of Würzburg (JMU), the first all-German large language model has been created and trained at the Center for Artificial Intelligence and Data Science (CAIDAS), with computing resources provided by NHR@FAU.
So far, most large language models have been trained primarily on English data sets. This is where Professor Dr. Andreas Hotho from CAIDAS and his team started their work: “With LLäMmlein we have created models that were trained exclusively on German-language data. This not only focuses on German language processing and opens up new possibilities for applications tailored specifically to the German language, but also advances the study of German language models.”
“We are presenting two models of different sizes: LLäMmlein 120M and 1B. These offer insight into how model size influences performance. In addition, we provide dedicated chat versions optimized for interactive applications,” explains Andreas Hotho.
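For readers who want to try the released models, the sketch below shows how such a checkpoint could be loaded with the Hugging Face transformers library. The repository identifier used here is an assumption for illustration only; the actual name may differ from the official release.

```python
# Minimal sketch: loading a LLäMmlein checkpoint with Hugging Face transformers.
# The model identifier below is assumed for illustration; consult the official
# release page for the actual repository name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LSX-UniWue/LLaMmlein_1B"  # hypothetical/assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short German continuation from a prompt.
prompt = "Die Universität Würzburg ist"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```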
This project marks the kick-off for the development of even larger models. The extensive computations were carried out on the Alex cluster of NHR@FAU and required 50,000 computing hours on A100 GPUs with 80 GB of memory for the 1B model. The training took around five weeks on 64 GPUs.
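As a rough back-of-the-envelope check, the reported figures are consistent with one another: 50,000 GPU hours spread across 64 GPUs comes out to roughly 780 wall-clock hours, a little under five weeks.

```python
# Rough consistency check using the figures quoted in the text.
gpu_hours = 50_000    # total A100 GPU hours for the 1B model
num_gpus = 64         # GPUs used in parallel

wall_clock_hours = gpu_hours / num_gpus          # about 781 hours
wall_clock_weeks = wall_clock_hours / (24 * 7)   # about 4.7 weeks
print(f"{wall_clock_hours:.0f} h is roughly {wall_clock_weeks:.1f} weeks")
```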