India Should Think About 'Open Source AI'

Sep 13, 2024

Dear Reader,

Last week, Telangana hosted its first-ever Global AI Summit on the theme ‘Making AI work for Everyone.’ Over the two-day conference, more than 2,000 global delegates, including some of the most influential voices in AI, came together to deliberate on how AI technologies can be developed and used in a manner that is inclusive, sustainable, and responsible. The government also signed a slew of Memoranda of Understanding (MoUs) with various tech companies, and one MoU in particular caught our attention: Meta’s two-year partnership with Telangana’s IT, Electronics & Communications Department involving the giant’s ‘open-source AI’ technologies.

While no concrete details of the partnership seem to be out yet, Meta India has announced that the tech giant will leverage its ‘Open Source Generative AI technologies,’ including its latest Llama 3.1 model, to improve the state’s public service delivery and government efficiency. In 2023, Meta signed an MoU with the Union Government to achieve similar ends. In July 2024, Meta organised an ‘Open Source GenAI Grand Challenge’ along with Nasscom (India’s leading tech trade body) and C-DAC (an R&D organisation under the Union IT ministry), aimed at incentivising Indian startups and developers to build GenAI use-cases in the public interest. MoUs tend to indicate only the broader intent of the parties involved, so in the present instance they do not shed much light on the concrete shape that these partnerships with Meta could take. They do, however, indicate India’s growing interest in ‘open source AI’ solutions for driving socio-economic impact.

There are two key reasons why ‘open-source AI’ is beneficial to India. The first is the meaningful role it is playing in boosting India’s local GenAI ecosystem. Broadly speaking, language models such as Meta’s Llama, Mistral, and Google’s Gemma can be freely downloaded, modified, and deployed. Several Indian startups and developers, though not all, rely on these models as the base on which to build their own Indic large language models (LLMs) that cater to India’s diverse linguistic needs. For example, Sarvam AI, an Indian startup that has raised a total of USD 53 million, built a Hindi LLM using Meta’s Llama 2-7B model in a two-phase training process. Similar LLMs in Tamil, Kannada, and Marathi have also been developed using such foundation models. Another example is Navarasa 2.0, a multilingual variant of Google’s Gemma 7B/2B model that has been built by Indian developers and has generative capabilities covering 15 Indian languages besides English. Startups have therefore been able to use such models as their foundation instead of having to start from scratch each time.

The second is the positive effect that releasing AI technology through ‘open-source’ frameworks could theoretically have on democratising access to AI, a principle that India has aligned itself with in its leadership role in the Global South on AI development and deployment.

India, however, must also prepare to engage with some of the most pressing issues surrounding ‘open source AI’ (OS AI), most importantly its scope.

Currently, a key challenge for the ecosystem is the very term ‘open source AI,’ which is undergoing a shift in meaning, making it harder to determine what qualifies and what doesn’t. At first blush, marketing a product as OS AI could signal the company’s core principles on tech distribution, including a desire to democratise AI. However, AI researchers and commentators are increasingly raising concerns about the “open” branding of AI models. For instance, Meta itself has been criticised for ‘open washing’ by portraying its Llama 2 model as open source when, in reality, its licence does not entirely comply with the most widely used standard for OS software. Its licence also does not apply to Meta’s key commercial competitors, and insists on a special licence for deployment by apps/services that, as of the model’s release date, had more than 700 million monthly active users. Some are even calling this cap anti-competitive.

But beyond preventing open washing, ascertaining what constitutes OS AI also matters because of its legal consequences. First, because AI regulations such as the recently passed EU AI Act carve out certain legal exemptions for OS AI models, where the boundary of OS AI is drawn could determine which AI companies are held to relatively lighter obligations. Second, in the absence of a baseline consensus on what it means for AI models to be ‘open source,’ AI licensing could prove to be a very complex exercise.

Conventionally understood in the context of software, something is called OS if its source code is publicly available and is published under a licence that permits people to access, modify, and use the code in their own projects, so long as their reuse meets the terms of the licence. A number of OS licences are in use today, and depending on their terms, they are often described as being permissive or restrictive. However, AI works in fundamentally different ways, and the OS framework for software cannot be neatly applied to AI. This is because a host of elements make up present-day AI models: the algorithm or code, the weights (the parameters a model learns during training), the training data, the underlying architecture, and so on. Should each of these elements be openly available for the model to be deemed OS? Can it be the case that only specific elements such as training data are made open and not the weights, or vice versa? How will OS licences apply to AI models that only open source some of these elements? And most hotly discussed, is ‘open data’ an essential part of ‘open source’? Such questions remain to be substantially answered, giving India an opportunity to steward some of these conversations.

That’s all for this week.

Thank you for reading Digital Republic.

As always, please feel free to send in your thoughts or suggestions at digitalrepublic@evamlp.com.

Have a great weekend!

Best,

Shruti Mittal

Digital Republic
