AI Singapore (AISG) has joined forces with Google Research to enhance inclusivity in Large Language Models (LLMs) for Southeast Asian languages. Together, they have launched Project SEALD (Southeast Asian Languages in One Network Data), aimed at improving datasets used to train LLMs in languages spoken across Southeast Asia (SEA).
LLMs are artificial intelligence models capable of understanding and generating human-language text. As part of Project SEALD, data in languages including Indonesian, Thai, Tamil, Filipino, and Burmese will be used to train them. These languages will help build a dataset that reflects the linguistic diversity of the SEA region, under AISG's SEA-LION (Southeast Asian Languages in One Network) initiative.
Project SEALD also involves developing translocalisation and translation models, establishing best practices for instruction-tuning datasets, creating tools that enable translocalisation at scale, and publishing pre-training recipes for SEA languages.
Yolyn Ang, Vice President, Knowledge and Information Partnerships at Google Asia Pacific, expressed pride in partnering with AISG to advance AI model development in Singapore and SEA. The collaboration aims to significantly improve existing datasets and evaluation benchmarks for SEA languages, ultimately improving communication with under-represented populations such as migrant workers in Singapore.
Furthermore, the datasets developed through Project SEALD will support AI solutions built under the AI Trailblazers initiative, facilitating outreach in areas such as redressing worker grievances and extending assistance schemes.
Project SEALD will engage academia, industry, and government partners for data collection, curation, and quality checks. Additionally, AISG is collaborating with Google Cloud to make SEA-LION LLMs available in Google Cloud's Model Garden on Vertex AI, allowing organisations to access and customise these models for their own applications.
Leslie Teo, Senior Director of AI Products at AISG, emphasised the importance of building a community and ecosystem to improve the quality and capabilities of SEA-LION LLMs. He described Google as a key part of the SEA-LION ecosystem and expressed optimism about building better datasets through the Project SEALD collaboration with Google.