Europe Boosts AI with Multilingual and Cultural Diversity Initiative

Brussels, Friday, 28 March 2025.
The European Commission’s ALT-EDIC aims to make AI more inclusive by supporting multilingual solutions, preserving cultural heritage, and bolstering European tech sovereignty, with 26 Member States and regions involved as participants or observers.

Revolutionary Language Initiative Launch

In a significant move to reshape Europe’s AI landscape, the European Commission has launched the Alliance for Language Technologies European Digital Infrastructure Consortium (ALT-EDIC) and the Language Data Space (LDS). Formed in February 2024, the consortium now includes 17 participating Member States and 9 observer Member States and regions [1]. This initiative addresses a critical shortage of European language data necessary for training large language models, marking a strategic step toward technological sovereignty [1].

Comprehensive Language Data Ecosystem

The European Language Data Space (LDS) is designed as an ecosystem in which organizations can share, monetize, and connect language data while remaining compliant with EU regulations [2]. Its significance was highlighted at the Data Spaces Symposium 2025, held in Warsaw on 11-12 March 2025, where industry leaders and policymakers convened to discuss the evolving landscape of data spaces [2].

Strategic Sector Implementation

The initiative has already gained substantial momentum with the launch of LLMs4Europe, which brings together over 70 partners across Europe. This project, coordinated by ALT-EDIC, aims to develop specialized large language models for five strategic sectors: Energy, Telecommunications, Tourism, Public Services, and Science [4]. The Science Pilot, coordinated by CrossLang, unites prominent research institutions including OpenAIRE, Athena Research Center, CNRS, CNR, and BSC to create sophisticated LLM-based tools for researchers and policymakers [4].

Addressing Global Language Disparities

This initiative comes at a crucial time when AI systems show significant language biases. Currently, nearly half of the data used for training AI models is in English, despite only 20% of the global population being native English speakers [8]. The European initiative aims to correct this imbalance, ensuring that AI systems can effectively serve users across all EU languages [1]. Through the Language Data Space, the project will create a cohesive marketplace for language data, enhancing the collection and sharing of multilingual data to support European large language models [1].

Sources
