An Indian adventure

Print edition : September 02, 2000

Indian IT firms are gearing up to cash in on database-related software opportunities following the sequencing of the human genome, but in order to move up the value chain in the field, more attention needs to be paid to sharpening specialised skills.


THE hype is gathering momentum. The buzzword today in two important segments of the Indian corporate world, the information technology (IT) industry and the pharma-biotech industry, is genomics. Interestingly, more than the pharmaceutical and biotechnology companies, it is the IT industry that appears to be gung-ho about genomics or, more precisely, bio-informatics, the application of IT to genomics. Industry associations are up to their usual act of hyping it up by organising seminars and round-table conferences and setting up committees. Sections of the media are adding their bit to this falsetto.

A computer-generated illustration of the double-helix structure of the DNA. An understanding of biology and genetics coupled with software skills is required for professionals involved in the field in order for India to move up the value chain in the specialised field of handling genomics data.-THE WELLCOME TRUST/REUTERS

In July, the two major industry associations in the country, the Confederation of Indian Industry (CII) and the Federation of Indian Chambers of Commerce and Industry (FICCI), hosted meetings in New Delhi on two consecutive days to bring together the research community and the industry. One had thought that these would provide the opportunity to know the Indian industry perspectives on the emerging field of bio-medical research and development (R&D) and business potential in the post-genome sequence phase of biotechnology. Sadly, one did not come out wiser from these meetings.

Notwithstanding all the hoopla, industry participation in these meetings, particularly that of IT companies, was disappointing. The few industry presentations that were made wallowed in generalities. One failed to see what Indian industry's strengths are and where the advantage lies for the much-touted "Public-Private Partnership in Genomics, IT and Medicine", as the FICCI meeting was billed.

The CII saw bio-informatics as the new cash cow for the Indian software industry. Its argument was that in order to restrict costs, many overseas biotech companies would be forced to outsource software from countries like India because Indian IT firms had a cost advantage. In particular, the CII viewed database development as the area where Indian companies would be sought to offer complete solutions to major pharma and genomics-based biotech companies of the West.

A business weekly ran a cover story that argued that genomics offered the biggest business opportunity since the software boom. It quoted various huge numbers as the size of the potential market for the Indian industry in the area of bio-informatics. Data-mining - making sense, through analysis, of the huge volume of genomics-related data - is believed to be the niche area where Indian software firms can grab a piece of the pie.

All these claims need to be viewed in a balanced perspective.

True, the completion of the sequencing of the human genome has led to the creation of a large variety of genomic databases of enormous sizes, and the mining of these rapidly accumulating data will become important. But June 26 only marked the completion of an international project, whereas data generation has been going on since the programme began in 1989. Besides, before human sequencing was completed, the genomes of nearly 25 other organisms had been sequenced. Today the number has increased to nearly 30, the latest being that of the cholera-causing bacterium. The data from these are by no means small.

Much of the genomic data of other organisms, if not all, are in the public domain and analyses of these have been going on all this while. The much-touted Indian expertise in IT, if really true, should have attracted bio-informatics opportunities all these years. Nothing of that sort has happened. And if Indian software companies were in the business of database solutions and data-mining, they would have sought out such opportunities already. Where were all these companies that today are believed to be queueing up to grab a piece of the genomics pie? That all of a sudden it has dawned on Indian firms that there is a boom waiting - thanks to the human genome project - is wishful thinking.

And where is the evidence for the so-called expertise in database-related software development in the country? Genomics is only one more area of database generation. To do data-mining, there should be data to mine from, to begin with. Within the country there is no tradition of generation of any kind of data - let alone biology- and health-related data - and of doing systematic analysis on them to say that Indian IT companies have an edge in developing database-related software and in data-mining.

A country like the United States has been in the business of database generation since the 1970s and it has a huge amount of data to mine from. And, given the growing field of data-mining in the last few years, Indian IT companies should have already thrived on this emerging area if they had the expertise and were really in the business. According to database expert Mukul Sinha, U.S. agencies do come with the idea of contracting data-mining jobs but realise very soon that there is no expertise in the field in India and go back. At best, he says, the bulk of database- and data-mining-related work being done by IT companies in the country belongs to the low end of the value chain. Body shopping, in other words.

Indeed, the entire genomics area is being viewed as a Y2K-like opportunity for "IT enabled services", the new euphemism for body shopping coined by the IT Task Force. After Y2K and Euro-conversion jobs, the IT industry required something to survive on, and the human genome project may offer just that through genomics data-related services. The oft-repeated statement in IT industry circles that services offer an entry point for Indian companies and they can move up the value chain needs to be taken with loads of salt because, barring perhaps a very few small niche companies, there is no evidence of that having happened over the years even with regard to conventional data. At least the representative of one company at the FICCI meeting was honest enough to admit that it was the services component of bio-informatics that was of interest to the company because it could not invest in specialised manpower at the present stage and, more pertinently, did not have the staying power for potential long-term returns to enter at a higher rung of the value ladder.

In order to move up the value chain in the specialised field of handling genomics data, an understanding of biology and genetics coupled with software skill is required. Experts in the field are few and are found only in a few specialised government-funded academic and R&D institutions, such as the Tata Institute of Fundamental Research (TIFR), the Indian Institute of Science (IISc), the National Institute of Immunology (NII) and the Centre for Cellular and Molecular Biology (CCMB) and some university groups. There is an urgent need to create this specialised cadre of software manpower in order to handle whatever genomics-related work is happening, and is proposed for the future, within the country. As Samir Brahmachari, Director of the Centre for Biochemical Technology (CBT) of the Council of Scientific and Industrial Research (CSIR), one of the main centres of genomics work in the country, says: "What we have in plenty are 'techno-coolies'. What we need here is specialised skill. We have to bring together people from biology, genetics, chemistry, physics, mathematics and computer science and train a new breed."

This is easier said than done. Because of the manner in which higher science education is fashioned in India, such people will be in extremely short supply. Thanks to the policies of the government and the narrow vision of people who administer higher science education in the universities, IT has become an end in itself. Today, in the name of "vocation-oriented education", new-fangled IT courses are introduced at the undergraduate level without giving students a broad-based grounding in the sciences.

These end up only producing 'techno-coolies' who are no different from students passing out of the hundreds of 'computer institutes' across the country.

Compounding the problem is the more serious development of rapidly declining enrolment in basic sciences and substantial drop-out rates among those enrolled. Consider the following. The chemistry department of a well-known college had to close down for all practical purposes for want of teachers and students. This is symptomatic of the state of higher science education in the country. Unfortunately, no one seems to be paying any attention to this serious issue even as grandiose statements are made at the highest political level about the Indian genomics programme and India's participation in the world effort in genomics. Ironically, it is the all-round IT mantra that is killing the sciences in the country. In this scenario of lack of specialised expertise, if anyone hopes that the country will be able to contribute to the genomics effort in a significant measure, think again.

Consider the various elements involved in the analysis of sequence data. While evidently there is a strong component of IT or bio-informatics in the exercises involved, a good understanding of molecular biology and genetics is also essential if one is to use IT to produce significant results at our end. Even to render these as service to someone else's ideas and needs, an understanding of the subject is necessary.

The genome data that are now available - 3.1 billion basic units or letters called base pairs - are raw, that is, without annotation, which imparts meaning to deoxyribonucleic acid (DNA) sequences and adds value. While what annotation is meant to indicate is known in broad terms, there are no set standards or paradigms for annotation. One can, therefore, annotate selected sequences of interest from the perspective of the research programme at hand. Basically, the purpose of annotation is to denote significant features - genes, promoters, pseudogenes, expressed sequence tags (ESTs) and sequence-tagged sites (STSs), tandem repeat sequences, conserved matches in the genomes of other species, and so on - in graphic as well as written forms.
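To make the idea of annotation concrete, here is a minimal sketch in Python. The `Feature` record, the two helper functions and the sample coordinates are all hypothetical, chosen only to illustrate a "written" and a crude "graphic" form; as noted above, there is no set standard for annotation.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """One annotated feature on a sequence (hypothetical layout)."""
    kind: str       # e.g. "gene", "promoter", "tandem_repeat"
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    note: str = ""  # free-text remark; may be empty


def written_form(features):
    """Written form: one tab-separated line per feature, sorted by start."""
    lines = []
    for f in sorted(features, key=lambda f: f.start):
        # rstrip drops the trailing tab when the note is empty.
        lines.append(f"{f.start}-{f.end}\t{f.kind}\t{f.note}".rstrip())
    return lines


def track(seq_len, feature, mark="#"):
    """Crude graphic form: one character per base, marked inside the feature."""
    return "".join(
        mark if feature.start <= pos <= feature.end else "."
        for pos in range(1, seq_len + 1)
    )
```

For a ten-base stretch, `track(10, Feature("gene", 3, 5))` marks bases 3 to 5 and leaves the rest as dots. A real annotation would of course carry far richer information than this.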

Given the whole gamut of characteristic features of coding as well as non-coding sequences of the DNA - the latter too are believed to play a regulatory role - it is obvious that database development will be central. The present challenge, according to Partha Majumdar of the Indian Statistical Institute (ISI), Calcutta, who heads a major human genetic diversity programme under the Indian genome initiative, is to improve database design, software for database access and manipulation, and data-entry procedures that are compatible with the diverse computational platforms that are used in different laboratories.

The focus of the Indian initiative is largely functional genomics - determination of the functions of unknown genes and their expression - and pharmaco-genomics or developing gene-based drugs. From this perspective, new databases and analytical tools need to be developed for studying gene expression and functional data, for modelling complex biological networks and interactions and for collecting and analysing polymorphisms (genetic sequence variations) data and linking genotypic data to phenotypic data (characteristic features of populations).

Given the data at the genomic DNA level - which is what the genome sequence gives - the questions from a researcher's perspective, according to Majumdar, are the following: given a genomic landmark - gene, disease loci, markers, mutations or single nucleotide polymorphisms (SNPs), deletions/insertions, and so on - what is it that is already known about this landmark? Where is it located? Which other landmarks are situated around this landmark? What is known about these landmarks? Are there similar (homologous) chromosomal regions in the genomes of other organisms, say, mouse? Clearly, one should know which of the existing databases have these pieces of information and enable a search for that particular type of landmark and neighbouring landmarks, or develop software that will enable one to do the kind of search that one is interested in.
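Two of these questions, where a landmark is located and which landmarks lie around it, can be sketched as a small relational query. The following is an illustration only, written in Python with the standard-library `sqlite3` module; the table layout, the landmark names and the coordinates are invented for the example and do not come from any real genome database.

```python
import sqlite3

# Hypothetical schema: one row per genomic landmark (gene, marker, SNP, ...).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE landmark ("
    " name TEXT PRIMARY KEY, kind TEXT, chrom TEXT,"
    " start_bp INTEGER, end_bp INTEGER)"
)
# Illustrative rows only; names and positions are made up.
conn.executemany(
    "INSERT INTO landmark VALUES (?, ?, ?, ?, ?)",
    [
        ("GENE_A", "gene", "chr1", 1000, 5000),
        ("SNP_1", "snp", "chr1", 5200, 5200),
        ("MARKER_X", "marker", "chr1", 90000, 90100),
        ("GENE_B", "gene", "chr2", 4000, 8000),
    ],
)


def locate(name):
    """Where is this landmark? Returns (chrom, start_bp, end_bp) or None."""
    return conn.execute(
        "SELECT chrom, start_bp, end_bp FROM landmark WHERE name = ?",
        (name,),
    ).fetchone()


def neighbours(name, window_bp):
    """Which other landmarks on the same chromosome overlap the region
    extending window_bp on either side of the named landmark?"""
    loc = locate(name)
    if loc is None:
        return []
    chrom, start, end = loc
    rows = conn.execute(
        "SELECT name FROM landmark"
        " WHERE chrom = ? AND name != ?"
        " AND end_bp >= ? AND start_bp <= ?"
        " ORDER BY start_bp",
        (chrom, name, start - window_bp, end + window_bp),
    ).fetchall()
    return [row[0] for row in rows]
```

With these made-up rows, `neighbours("GENE_A", 1000)` finds only the nearby SNP, while a much wider window also picks up the distant marker. Questions about homologous regions in other organisms would need sequence comparison, which this sketch does not attempt.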

The IT community has its tasks cut out: develop data-warehousing techniques, relational database designs, data-mining techniques from single or multiple databases, annotation, user-friendly query systems, and graphic software for generating outputs. Moreover, this cannot be a stand-alone effort if it has to be part of a concerted effort within the country and not merely a service to some outside party. The IT effort has to be coordinated with mathematical and computational efforts in the development of algorithms and methodology for data-mining, pattern recognition and medical-biological or functional genomics efforts that involve clinical databases and genotypic-phenotypic relational databases and three-dimensional modelling of gene structures.

The question is whether the IT industry is up to taking on this challenge, or whether it is content with providing genome data-related services. Largely, it would appear to be the latter. The only way that an IT or even pharma company can benefit from this exploding genomics research is by tying up with leading academic institutions involved in the field. To be fair to the Indian corporate sector, it must be said that some companies have realised this and are talking to laboratories such as the CCMB, the IISc and the CBT.

Important among these is the proposed facility for 'gene chip' technology at the CCMB, Hyderabad, in collaboration with the company Biological Evans, which has earmarked a budget of Rs.1 crore. This is basically a rapid scanning technique for creating clinical genotypic databases and for the diagnosis of diseases. The CCMB is also talking to companies such as the IT major Satyam Online and Reliance Industries to set up a bio-informatics training centre in order to transfer its in-house expertise to the industry.

Another effort is the setting up of a Bio-Informatics Institute at the Bangalore Software Technology Park by the Karnataka government in association with ICICI and the IISc, Bangalore, with a funding of Rs.10 crores. Some niche software companies such as the Bangalore-based Genotypic are slated to make use of this facility.

Similarly, the CBT is scouting around for suitable IT companies to train specialised manpower to handle the bio-informatics component of genomics research. Companies such as the National Institute of Information Technology (NIIT) and even some start-up companies have apparently shown interest, but nothing concrete has emerged so far. The CCMB effort may fructify soon on this front, it seems. As Samir Brahmachari, whose triangular institution-hospital-pharma company linkages are yielding some results, points out: "What we need are quadrangular linkages; bring in the IT companies as well and move straight to the top of the value chain. The IT companies too will then have a share in the Intellectual Property Rights (IPRs) generated from genomics research. IT companies have to move away from the Y2K service type of mindset. They should aim high. We can do it if we work together."

Unfortunately, that does not seem to be happening at the pace one would like. The government initiative on genomics too has been considerably flawed; it has not facilitated the spawning of triangular and quadrangular public-private partnerships, which are essential for any meaningful effort in the post-sequence era of genomics. According to Pushpa Bhargava, former director of the CCMB, who was unsuccessful in convincing the Department of Biotechnology (DBT) to take up human genome sequencing in the country, even later opportunities that came by for the country to contribute substantially were missed.

"We missed the bus of the human genome, or for that matter the 30 other genomes that have been sequenced. But we still can be a leader in genome analysis. The question is whether our scientific bureaucracy will allow this to happen in time," he said. Not entirely happy with the way the human genome initiative under the DBT has been conceptualised - its incomprehensive structure and its sub-critical funding - the Indian Council of Medical Research (ICMR) has decided to support a parallel genomics initiative proposed by Pushpa Bhargava, which goes beyond the DBT's focus, which is only on the human genome.

The mission objective of the ICMR programme is "to acquire and analyse DNA sequence data of human and other organisms (including bacteria, viruses and plants) and to generate value-added knowledge for the national development of health, medicine and agriculture." At a meeting held in the ICMR headquarters on May 4, a Rs.50-crore programme was approved. While the DBT has objected to this parallel programme, Health Minister C.P. Thakur has apparently supported the move and promised an initial funding of Rs.20 crore in the revision of budgetary allocations in October and the rest in the course of the Tenth Plan.

An important component of this is the creation of a National Genome Database in association with the National Informatics Centre (NIC), which will access all genome-related databases from all over the world through a 2 Mb satellite link. A committee headed by the Health Minister is proposed to be constituted and it will include Secretaries of agencies such as the Department of Science and Technology (DST), the DBT, the Indian Council of Agricultural Research (ICAR), the CSIR and the ICMR. Whether the two initiatives will be able to coordinate, converge and strengthen themselves, rather than fragment, and whether they will help enhance the overall private-public partnership to give substance to the current hype remain to be seen.
