Breaking `bhasha' barriers

Print edition : January 14, 2005

Indian language tools developed by C-DAC.

C-DAC pioneered Indian language computing tools in the 1980s. Today it continues to explore ways to overcome the problems of those for whom language remains a hurdle to self-empowerment.

WHEN Mohan Tambe joined the newly created C-DAC, fresh from his work at the Indian Institute of Technology, Kanpur (IIT-K), he was already grappling with the problem of how to modify the essentially Roman script-based personal computing software of the day so that it could work just as well for Devanagari and several other Indian language scripts. The problem had to be addressed in two ways. First, the standard English "QWERTY" keyboard known to generations of typewriter users had to be modified for the syllable-based Indian scripts. Second, a new Indian standard had to be evolved to encode the Indian scripts electronically so that the computer could decipher the keystrokes. In Kanpur, Tambe had worked on a project to create a phonetic keyboard known as Inscript.

At C-DAC, Tambe led a team of engineers who, between them, created the first indigenous local language tool sets for computer users in India. The flagship product became synonymous with Indian language computing: the Graphics and Intelligent Script Terminal or GIST card - a piece of hardware that had to be installed in one of the free slots of the PC to make the keyboard work in an Indian language. The Inscript technology dovetailed with GIST.

C-DAC then began work that culminated in the Indian Standard Code for Information Interchange (ISCII). It is similar to ASCII, the American Standard Code for Information Interchange that underlies English and other Roman script text on computers. With ISCII, Indian alphabets could be represented on 8-bit computers designed for the 7-bit ASCII code for English: the eighth bit doubles the number of available codes, and the extra half is used for the Indian characters while the ASCII half is left untouched.
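The principle can be shown with a short sketch in Python. It is only an illustration of how an 8-bit code can sit on top of 7-bit ASCII, not C-DAC's implementation: the function name `decode_demo` and the table `DEMO_UPPER_HALF` are hypothetical, and the byte values in the table are placeholders chosen for the demonstration, not a reproduction of the published ISCII assignments.

```python
# Sketch: an 8-bit code that keeps ASCII in bytes 0-127 and uses
# bytes 128-255 for Indian script characters.
# ASSUMPTION: the three byte values below are placeholders for this demo;
# they are not presented as the real ISCII code table.

ASCII_LIMIT = 0x80  # 7-bit ASCII covers 0x00-0x7F

DEMO_UPPER_HALF = {
    0xA4: "\u0905",  # DEVANAGARI LETTER A
    0xB3: "\u0915",  # DEVANAGARI LETTER KA
    0xCC: "\u092E",  # DEVANAGARI LETTER MA
}


def decode_demo(data: bytes) -> str:
    """Decode bytes: lower half as ASCII, upper half via the demo table."""
    out = []
    for b in data:
        if b < ASCII_LIMIT:
            # English text passes through unchanged.
            out.append(chr(b))
        else:
            # Bytes with the eighth bit set are looked up in the table;
            # unknown bytes become the Unicode replacement character.
            out.append(DEMO_UPPER_HALF.get(b, "\ufffd"))
    return "".join(out)


if __name__ == "__main__":
    print(decode_demo(b"GIST \xb3\xcc"))
```

Because every byte below 128 is treated exactly as before, English text stored on such a system remains readable, which is the property that made an 8-bit code practical on machines and software built around ASCII.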

Tambe left to start his own company in Bangalore: Innomedia Technologies, a pioneer in delivering digital content via cable. Meanwhile, C-DAC followed up the GIST card development by marketing the technology with a number of add-ons such as Apex Language Processor (ALP); ISM (ISFOC Script Manager), software used to develop customised applications for harnessing the Internet; N-TRANS, software for translating proper nouns from English to Indian languages and vice versa; iplugin, a web application development tool for Indian languages; and LEAP (Language Environment for Aesthetic Publishing), a full-fledged Indian language office suite. The `lite' version of LEAP, iLEAP, is today a free 4 MB download from the website of the Technology Development for Indian Languages group of the Information Technology Ministry. Largely owing to C-DAC's shrewd marketing, a number of private players have been motivated to come up with compelling Indian language computing tools.

As the C-DAC family enlarged in 2003, the resources of new centres, such as the National Centre for Software Technology (NCST), began to enrich its efforts. The NCST was already a respected member of the Open Source community and in its new `avatar' it continued to play a key role in localising open source software in Indian languages. It had created a number of Indian language tools - Aalekh: A bi-lingual text processor in dot-matrix environment; Palatino and Vinyas: Font design packages for Indian and other scripts; Vividha: A multi-lingual text processing software tool; Vidura: Electronic publishing software with Vedic Sanskrit typesetting system; Darshan: Rich text edit control for Indian text; Rupanthar: Transliteration software for name/address conversion from Roman script; Pranam: Bi-lingual certificate printing/processing software; and Mantra: A human-aided machine translation system from English to Hindi.

In October 2004, C-DAC announced the launch of its "Janabhaaratii" project, which focusses on developing a Linux-based software suite for Indian languages. The project aims at inviting, building and supporting community initiatives to produce and disseminate free and open source software systems, breaking language barriers and bridging the digital divide. It will extend the technology to diverse areas through infokiosks, cybercafes, and so on. It draws on the Mumbai C-DAC's INDIX project to make GNU/Linux operating systems "intelligent" when it comes to Indian scripts.

The Thiruvananthapuram Centre of the Electronics Research and Development Centre (ER&DC), another member of the C-DAC `parivar', has a long history of coming up with compelling technology. Early in 2004, it launched a Malayalam-based Web mail server, "Sandesam" (a useful tool for Malayalam language Internet Service Providers as well as multilingual portal developers), to run on the Red Hat flavour of Linux, or on Windows.

The C-DAC Noida centre has created the "Gyan-Nidhi Corpus", a parallel corpus of one million pages in 11 Indian languages and English. It has also worked on a speech corpus for three Indian languages, which has become an essential part of the linguistic resources for the development of NLP-based systems.

The C-DAC headquarters has developed an OCR system, "Chitrankan", for the Devanagari script. Its user-friendly features include automatic detection of text and picture regions, recognition of Hindi and Marathi along with embedded English text, and support for a number of popular fonts. Similarly, C-DAC in Noida has been working on an optical character recognition technology, "Chitraksharika", for Hindi and Marathi (Devanagari script). Both these systems are based on the technology developed by the Indian Statistical Institute, Kolkata.

The Kolkata centre of C-DAC has worked on a speech corpus for Bengali. It has developed a Bengali text-to-speech (TTS) system called "Bangla-Vani" using the ESNOLA technique.

C-DAC's work on its Machine-Assisted Translation System, MANTRA, won the Computerworld Smithsonian Award and is part of the "1999 Innovation Collection" in the National Museum of American History.

The newest initiative of C-DAC is LILA - Learning Indian Languages through Artificial Intelligence - a multimedia-based intelligent self-tutoring application for learning Hindi as a second language through southern Indian languages. Users can study for the Hindi Prabodh, Praveen and Pragya examinations through the medium of Kannada, Malayalam, Tamil and Telugu by tapping into a tool on the World Wide Web. Facilities include reading and writing of the Hindi alphabet; a speech interface to enable correct pronunciation and intonation; an online bilingual dictionary; a tool to record and compare one's own voice with standard pronunciation; and grammar notes, interactive exercises and tests. The web site (https://lilappp.cdacindia.com/) is a gateway to instruction in the four southern languages as well as in Hindi and English.

Through such services, C-DAC continues to help Indians breach the barriers of `bhasha'. Its mission remains to bring computers closer to the millions of people for whom a lack of English language skills is still an obstacle.
