An upsurge in research

Published : Aug 29, 2003 00:00 IST

With new vistas in biotech research opening up following genome sequencing, the Department of Biotechnology is according high priority to functional genomics and bioinformatics.

THE science of biology is undergoing an awesome revolution. With the unravelling of the genomes of as many as 149 organisms - from mosquito to mouse and from maize to man - a new paradigm of research is evolving rapidly. A genome is the total DNA (deoxyribonucleic acid) of an organism. It contains the complete genetic code for the form and functions of the organism. The most important of these are the human genome (about 94 per cent of whose draft sequence was completed in 2001) and the rice genome (whose draft was released last year). Indeed, 747 genome projects are going on with phenomenal speed. Naturally, the impact of this on biotechnology is likely to be equally revolutionary. New vistas of biotech research are opening up.

What has been achieved with regard to the genomes of these various organisms is the laying bare of the sequence in which the letters of life's alphabet - the chemicals A, C, G and T - are linked together in the DNA of each of them. The mammoth exercise of making sense of these strings of letters, looking for meaningful words in the form of genes, identifying sentences with structures that translate into functional proteins, and seeing how they are organised in the chromosomal chapters is likely to go on for at least a couple of decades. In this study of "genomics", the human genome, with its 24 chromosomes, will constitute the most challenging of tasks.

While India was not one of the 18 countries that collaborated in the international human genome sequencing effort, it can play a significant role in the analysis and interpretation of the sequence in the years ahead. Even while actively promoting research in pre-genomics biotechnology areas based on genetic engineering, microbial technology and tissue culture - such as plant and agricultural biotechnology, including transgenic crops, biofertilizers and bio-pest control agents; bio-resource development covering biodiversity parks, medicinal and aromatic plants and so on; animal biotechnology, including vaccines, aquaculture and marine biotechnology and seri-biotechnology; medical biotechnology, including diagnostics and vaccines (particularly edible vaccines); and environmental biotechnology - the Department of Biotechnology (DBT) has accorded high priority to post-human genome sequencing activities.

Broadly, these can be classified as "functional genomics" and "bioinformatics". What do these mean? In the post-sequencing phase, annotating the available DNA sequence information is in itself an important area of research. This involves, among other things, defining the nature of specific regions - the number of genes and the sequences involved in regulating genes - and predicting the functions of encoded proteins.

Consider the following: The number of genes in the human genome is estimated to be anywhere between 30,000 and 100,000, of which a mere 5,000 or so are known. For example, when the completely annotated sequence of chromosome 22, one of the shortest among the 24 chromosomes, was announced last December, it was found to carry 247 known genes. In addition, there were 150 genes with DNA sequences similar to other known genes. Computer analyses showed that the chromosome could carry another 200-300 genes, about which nothing is known. Given the genome sequence, locating a gene and its DNA sequence has become an easy computerised task. What is not easy, however, at least at present, is to predict the function of gene products (proteins) from the sequence of the gene. This is the essence of "functional genomics" (correlation of gene to function by an analysis of genetic variations across populations), in particular "proteomics" (correlation of coded proteins to gene functions) and "pharmaco-genomics" (developing gene-based drugs).

With the sequence information telling us what genes we have, what will be critical is information on genetic defects and on when and where these genes are expressed. How does the environment affect gene expression? When some genes are mutant, what are the effects on the expression of other genes? How does gene expression respond to drug treatment? Can one design specific drug treatments depending on the genes a person carries? Answers to these questions are very region-specific. Studies in the West will be useful but by no means definitive, and may perhaps be misplaced. Thus, in the Indian context, mapping diseases and understanding how environmental factors affect the expression of genes in each tissue will be an important area of genomics research.

According to the People of India Project of the Anthropological Survey of India, completed in 1993, there are 4,635 communities in India, of which 75 are endangered tribal groups. In no other country is there such diversity and complexity of human organisation. Understanding this diversity in molecular-genetic terms, by identifying the variations between communities at the level of DNA, can provide answers to a wide range of questions about biological, medical and anthropological issues in the Indian context.

Then there is also the so-called "junk DNA". In the human genome, it is estimated that 97 per cent of the DNA sequences do not code for any protein and a major portion is composed of interspersed repetitive sequences. Most of these repetitive and other non-coding sequences had once been dismissed as having no real function. It is now known that repetitive DNA sequences play an important role in genetic control and regulatory processes. Also, variations (polymorphisms) in these repetitive sequences have been associated with certain types of neurological disorders. So, separating the "wheat from the chaff" will also constitute an important component of post-sequence genomics. This is particularly relevant for disease-based polymorphism studies in Indian populations.

The DBT kick-started its functional genomics activities five years ago at the Delhi-based Centre for Biochemical Technology (CBT) - since renamed the Institute of Genomics and Integrated Biology - helping the institute acquire computational facilities to handle the voluminous human genome sequence data and establish a laboratory for genomics research. In early 2001, the DBT announced plans to earmark up to Rs.250 crores over the next five years for functional genomics, microbial genomics and pharmaco-genomics aimed at genomics-based drug design. DBT-supported research teams at various centres are already involved in sequencing microbes such as Mycobacterium tuberculosis, Helicobacter pylori and Hepatitis C virus.

While India failed to participate in human genome sequencing, the DBT's initiative resulted in India joining the International Rice Genome Sequencing Project (IRGSP) in June 2000. India chose to sequence a part of chromosome 11 of rice (Oryza sativa). It has invested Rs.48.83 crores for the "Indian Initiative for Rice Genome Sequencing (IIRGS)". The IIRGS is a joint effort by the Department of Plant Molecular Biology (DPMB), University of Delhi South Campus (UDSC) and the National Research Centre on Plant Biotechnology (NRCPB) of the Indian Agricultural Research Institute (IARI), New Delhi. The Indian share of chromosome 11 has been equally divided between these two centres. The rice genome sequencing is expected to be completed in 2005, in three phases. Phase II was completed in December 2002.

The target given to India with regard to sequencing chromosome 11 has been achieved. In fact, the total of 15.4 Mbp (million base pairs) of sequence assembled and aligned (with more than 10 times coverage) exceeded the targeted 14.9 Mbp. For this, at least 428,000 sequencing reactions were carried out, which generated about 214 Mbp of primary sequence data. The Phase II sequence of the entire rice genome, along with the Indian contribution, was officially announced on December 18, 2002 in Tokyo, the headquarters of the IRGSP. The sequences were analysed with the help of various gene prediction programmes, and more than 2,500 genes, at a frequency of 1 gene per 6 kbp, were predicted in the region of chromosome 11 sequenced by India. Several genes pertaining to abiotic stress tolerance, disease resistance, cellular networking and transcription regulation were identified. In Phase III, work will continue to fill physical and sequencing gaps, to enrich and validate the sequence, and to annotate it precisely, so as to finish the sequence to Phase III level and identify genes of agronomic importance.
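The figures quoted above hang together arithmetically, and a few lines of Python make the relationships explicit. This is only a back-of-the-envelope sketch: the function names are illustrative, and "coverage" here is simply raw data divided by assembled length, which may not be exactly how the IRGSP defined it.

```python
# Figures quoted in the article for India's share of rice chromosome 11.
READS = 428_000          # sequencing reactions carried out
RAW_MBP = 214.0          # primary sequence data generated, in Mbp
ASSEMBLED_MBP = 15.4     # sequence assembled and aligned, in Mbp
KBP_PER_GENE = 6.0       # predicted frequency: 1 gene per 6 kbp


def average_read_length_bp(raw_mbp, reads):
    """Average bases obtained per sequencing reaction."""
    return raw_mbp * 1_000_000 / reads


def fold_coverage(raw_mbp, assembled_mbp):
    """How many times over, on average, each assembled base was read."""
    return raw_mbp / assembled_mbp


def expected_gene_count(assembled_mbp, kbp_per_gene):
    """Gene count implied by the stated gene frequency."""
    return assembled_mbp * 1_000 / kbp_per_gene


if __name__ == "__main__":
    print(f"read length: {average_read_length_bp(RAW_MBP, READS):.0f} bp")
    print(f"coverage:    {fold_coverage(RAW_MBP, ASSEMBLED_MBP):.1f}x")
    print(f"genes:       {expected_gene_count(ASSEMBLED_MBP, KBP_PER_GENE):.0f}")
```

On these numbers each reaction yields about 500 bp, the raw data cover the assembled region roughly 14 times over (consistent with "more than 10 times"), and one gene per 6 kbp implies a little over 2,500 genes.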

The genome data of any organism that becomes available after the completion of sequencing of a given genome is raw; that is, it lacks the annotation that imparts meaning to DNA sequences and adds value. While what annotation is meant to indicate is known in broad terms, there are no set standards or paradigms for annotation. Basically, the purpose of annotation is to denote significant features - genes, promoters, pseudogenes, expressed sequence tags (ESTs) and sequence-tagged sites (STSs), tandem repeat sequences and conserved matches in genomes of other species - in graphic as well as written forms. It is clear from the above that bioinformatics - generating and handling genomic databases - and computational genomics - locating genes and identifying their sequences through computer algorithms - are areas of research as important as "functional genomics".
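To give a flavour of what "locating genes through computer algorithms" means at its very simplest, the sketch below scans the forward strand of a DNA string for open reading frames (ORFs): stretches that begin with the start codon ATG and run, in the same reading frame, to one of the three standard stop codons. This is a toy illustration and not the method used by the gene prediction programmes mentioned in the article; real predictors also handle the reverse strand, introns and statistical signals. The function name and the `min_codons` threshold are assumptions made for the example.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}   # stop codons of the standard genetic code


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for each forward-strand ORF.

    start is the index of the A in ATG, end is one past the stop codon,
    and min_codons is the minimum number of codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # close it and keep scanning
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAATTTGGGTAACC"
    for begin, end, frame in find_orfs(dna):
        print(frame, begin, end, dna[begin:end])
```

Annotation in the sense described above would then attach such coordinates, together with promoters, repeats and cross-species matches, to the raw sequence.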

Given this whole gamut of characteristic features of coding and non-coding DNA sequences - the latter too are believed to play a regulatory role - it is obvious that database development will be central. The present challenge is to improve database design, software for database access and manipulation and data-entry procedures that are compatible with the diverse computational platforms that are used in different laboratories. From the perspective of the focus of the DBT's genomics initiative, new databases and analytical tools need to be developed for studying gene-expression and functional data, for modelling complex biological networks and interactions, for collecting and analysing polymorphisms (genetic sequence variations) data and linking genotypic data to phenotypic data (characteristic features of populations).


Given the data at the genomic DNA level - which is what the genome sequence gives - the problem from a researcher's perspective is the following: given a genomic landmark - gene, disease locus, marker, mutation or single nucleotide polymorphism (SNP), deletion/insertion and so on - what is already known about this landmark? Where is it located? Which other landmarks are situated around it? Are there similar (homologous) chromosomal regions in the genomes of other organisms, say the mouse? Clearly, one should know which of the existing databases hold this information and allow a search for the particular type of landmark and its neighbouring landmarks. Failing that, one has to develop software for the kind of search one is interested in.
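The two basic queries posed above, "where is this landmark?" and "what lies around it?", can be sketched with a small in-memory index. Everything here is hypothetical: the `Landmark` and `LandmarkIndex` names, the fields and the sample records are invented for illustration and do not describe any existing database or its interface.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Landmark:
    name: str
    kind: str          # e.g. "gene", "SNP", "marker"
    chromosome: str
    position: int      # base-pair coordinate on the chromosome


class LandmarkIndex:
    """Look up landmarks by name and by neighbourhood on a chromosome."""

    def __init__(self, landmarks):
        self._by_name = {lm.name: lm for lm in landmarks}
        self._by_chrom = {}
        # Keep each chromosome's landmarks sorted by position.
        for lm in sorted(landmarks, key=lambda lm: lm.position):
            self._by_chrom.setdefault(lm.chromosome, []).append(lm)

    def locate(self, name):
        return self._by_name[name]

    def neighbours(self, name, window):
        """Other landmarks within `window` bp on the same chromosome."""
        target = self._by_name[name]
        row = self._by_chrom[target.chromosome]
        positions = [lm.position for lm in row]
        lo = bisect_left(positions, target.position - window)
        hi = bisect_right(positions, target.position + window)
        return [lm for lm in row[lo:hi] if lm.name != name]


if __name__ == "__main__":
    # Made-up records, purely for demonstration.
    index = LandmarkIndex([
        Landmark("geneA", "gene", "11", 1_000_000),
        Landmark("snp1", "SNP", "11", 1_004_000),
        Landmark("markerX", "marker", "11", 1_200_000),
        Landmark("geneB", "gene", "22", 1_001_000),
    ])
    print(index.locate("geneA"))
    print([lm.name for lm in index.neighbours("geneA", 10_000)])
```

A production system would sit on a relational database and link out to homologous regions in other genomes, but the sorted-by-position lookup is the core of any "neighbouring landmarks" search.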

Bioinformatics research today is focussed on developing data warehousing techniques, relational database designs, techniques for mining data from single or multiple databases, annotation, user-friendly query systems, and graphic software for generating outputs. Moreover, this cannot be a stand-alone effort. The Information Technology effort has to be coordinated with mathematical and computational efforts in the development of algorithms and methodology for data mining and pattern recognition, and with medical-biological or functional genomics efforts that involve clinical databases, genotypic-phenotypic relational databases and three-dimensional modelling of gene structures. In anticipation of an increasing use of IT tools in biology, the DBT initiated a bioinformatics programme in 1987. It first set up a Biotechnology Information System (BTIS) that linked nine universities and laboratories where biotechnology research was under way. The network today connects 57 institutions.

The DBT has also set up bio-computing facilities for interactive graphics and molecular modelling at six laboratories for research on protein structures, docking algorithms for molecular recognition, identification of active sites in proteins and drug design.

The DBT has estimated that between 2000 and 2005, it will spend Rs.90 crores on bioinformatics research and infrastructure. It has sought government approval to set up a National Bioinformatics Institute and teraflop-scale supercomputing facilities at selected bioinformatics centres. A National Supercomputing Facility has recently been established at the Indian Institute of Technology-Delhi for in-silico (on computer) drug development using bioinformatics resources. The DBT has also set up a high-speed, large-bandwidth virtual private network (VPN) linking bioinformatics centres in public-funded institutions. At present 11 institutions have been linked through the VPN. This network is called BIOGRID INDIA. As part of BIOGRID, mirror sites of internationally recognised databanks such as the genome data bank, protein data bank, plant genome data bank and databases of the European Bioinformatics Institute (EBI), as well as public domain bioinformatics software packages, have been established. These sites are designed to act as "knowledge pathways" for discoveries in biotechnology. The DBT estimates that the BTIS centres' resources are being used by more than 12,000 scientists around the country.

In August 2000, a brainstorming session organised by the DBT led to the biological community recommending projects beyond the immediate follow-up research in functional genomics and bioinformatics in the post-genome era. As a result of these recommendations, the DBT has initiated a number of projects in the areas of microbial, structural, functional and computational genomics and proteomics. Some of these have already begun to show some exciting results with potential applications.

In microbial genomics, under the project titled "Comparative and functional genomics approaches to identify and characterise genes responsible for multi-drug resistance of M. tuberculosis," scientists have designed probes for target genes of the reference strain H37Rv. The probes have been developed using a collection of M. tuberculosis DNA samples from 23 countries. Under another multi-centric programme, "Sequencing and comparative genomics of H. pylori isolated from Indian patients", supported at five centres, genomic DNA has been prepared from H. pylori isolates from different ethnic groups in the country. These are well-characterised isolates from Indian patients with duodenal ulcers and gastric carcinoma. In addition, genomes of isolates from six other countries have been compared.

The department has supported a multi-centric programme on structural genomics of microbial pathogens, under which 40 already identified genes pertaining to the relevant pathways of M. tuberculosis are being cloned and expressed. The programme involves the crystallisation of proteins and X-ray structural analysis of the crystallised proteins. A powerful web-based tool has also been developed for the sequence/structure analysis of M. tuberculosis proteins with high prediction accuracy.

In the functional genomics area, scientists are evaluating a strategy employing dual gene knock-outs on gene function in organisation. Molecular genetic studies of Type II diabetes and diabetic retinopathy have been undertaken. These may help identify a set of polymorphisms and genetic and protein expression profiles that dictate the clinical expression of insulin-resistant/Type II diabetes. A programme on Proteomics of Mycobacteria has identified a unique protein, which appears to have a role in the survival of M. tuberculosis in macrophages.

In the ongoing programme on bio-computing, theoretical simulations of DNA-based devices have been carried out and micro- and nano-electrodes have been designed. Software has been developed for encoding and decoding any image or text in a DNA sequence. DNA-based arithmetic has also been developed under the project. At the bio-supercomputing facility, drug design is carried out and a software package entitled Gene to Drug - A Proof of Concept has been developed. This was officially released on July 31, 2002.
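The article does not say how the DBT-supported software encodes text or images in DNA. One common textbook scheme, shown here purely as an assumed illustration, treats each base as two bits (A=00, C=01, G=10, T=11), so that every byte of data becomes four bases. The function names are hypothetical.

```python
BASES = "ACGT"   # assumed mapping: A=00, C=01, G=10, T=11


def encode(text):
    """Encode text as a DNA string, four bases per UTF-8 byte."""
    data = text.encode("utf-8")
    return "".join(
        BASES[(byte >> shift) & 0b11]      # take two bits at a time
        for byte in data
        for shift in (6, 4, 2, 0)          # most significant pair first
    )


def decode(dna):
    """Invert encode(): rebuild bytes from groups of four bases."""
    if len(dna) % 4 != 0:
        raise ValueError("DNA length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return out.decode("utf-8")


if __name__ == "__main__":
    message = "DBT 2002"
    strand = encode(message)
    print(strand)
    print(decode(strand))
```

The same byte-to-base mapping would work for an image file read as raw bytes; practical schemes add error correction and avoid long runs of a single base, which this sketch ignores.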

The 10-year vision document of the DBT, released in January 2001, notes that modern biotechnological research has a long gestation period and a long innovation chain, which call for industrial partners or enhanced public investment. In the areas of genomics and bioinformatics, it has said that basic research would be more time-targeted and related to the identified products. The department expects the genomics and bioinformatics infrastructure and networking to be completed in two to three years, which means by the next year. After the completion of five years (that is, January 2006), about 1,000 trained experts in bioinformatics are expected to be available with a large number of databases and with abilities in data mining, data annotation, and comparative and functional genomics.
