
Research in Gene Expression Data Processing

The products of gene transcription and translation, mRNA molecules and proteins, are the main factors that define, at each moment, the state of the cell, the tissue and ultimately the whole organism, through complex interaction networks. Microarray technology is a recent achievement that has permitted the simultaneous monitoring of the expression profiles of thousands of genes. This technology has led to an explosion of available data on a genome-wide scale. However, transforming these data into knowledge that can support biological or medical inferences is a challenging task. Microarrays produce large amounts of noisy data. Using these data requires a combination of biological knowledge, statistics, machine learning, and efficient algorithms that are able to select the useful features.

Our lab's work in this area focuses on developing novel feature extraction and clustering/classification algorithms that can discriminate between different groups of samples, e.g. normal vs. disease samples, samples of different diseases, or different subgroups of one disease. Furthermore, the selected features can serve as molecular markers for predicting the outcome of a disease and for identifying unknown processes involved in the generation and progression of a disease.

Implemented methodologies include information theoretic approaches, multidimensional scaling, signal-to-noise statistics and neural network methods, as well as ad hoc algorithms. We also develop new and modify existing clustering (k-NN, hierarchical, SOM) and classification (SVM, Bayesian networks, ANN) algorithms for the categorization of expression data. We aim to study the complexity and performance of these algorithms on microarray data as functions of the type of input data, the size of the input set, the degree of correlation between input features, the number of categories, the signal-to-noise ratio in the data, etc. We mainly focus on applying and extending our current expertise in developing novel, biologically inspired neural network architectures and learning rules for supervised classification tasks.
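As an illustration of the signal-to-noise approach to feature selection mentioned above, here is a minimal sketch in Python. It assumes the commonly used two-class form of the statistic, (mean_a - mean_b) / (sd_a + sd_b), which may differ from the variant our implementations use. The function names, gene names and expression values are hypothetical and serve only as a toy example.

```python
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """Score one gene as (mean_a - mean_b) / (sd_a + sd_b).

    Assumed form of the statistic; each class needs at least two samples.
    """
    spread = stdev(class_a) + stdev(class_b)
    if spread == 0:
        # No within-class variation: the score is undefined.
        # Treating it as 0 is a simplification made for this sketch.
        return 0.0
    return (mean(class_a) - mean(class_b)) / spread


def rank_genes(expression, labels, positive):
    """Return gene names ordered by decreasing absolute score.

    expression: dict mapping gene name -> list of values, one per sample
    labels:     list of class labels, one per sample
    positive:   label of the class compared against all other samples
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == positive]
        b = [v for v, lab in zip(values, labels) if lab != positive]
        scores[gene] = signal_to_noise(a, b)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)


# Toy data (hypothetical): GENE_A separates the classes, GENE_B does not.
expression = {
    "GENE_A": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
}
labels = ["normal"] * 3 + ["disease"] * 3
ranking = rank_genes(expression, labels, positive="disease")
print(ranking)
```

Genes at the top of such a ranking are candidates for marker features. With thousands of genes and few samples, the scores would normally need to be validated, for example by permutation testing or cross-validation, before any gene is treated as informative.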

Furthermore, we are in the process of compiling a user-friendly software package that incorporates a set of efficient and reliable computational methods addressing all stages of microarray data analysis, from normalization to categorization and identification of marker features. This tool will be available to both theoreticians and experimentalists for investigating questions such as locating potential prognostic gene markers, predicting gene function, discovering gene network interactions, identifying subcategories of diseases, etc. To this end we collaborate with experimentalists and bioinformaticians locally, at IMBB and the Institute of Computer Science (ICS) at FORTH, as well as at the International Agency for Research on Cancer. The test bed application for this prototype is a project for the identification of prognostic gene markers in breast cancer. The software package will be used to analyze gene expression data and identify genes that could serve as prognostic markers for the outcome of the disease, i.e. the probability of cancer recurrence, based on the patient’s genetic and clinical profile.
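To make two of these analysis stages concrete, the following sketch chains a normalization step to a categorization step. It is not the software package described above: per-gene z-scoring and a nearest-centroid classifier are stand-ins chosen for brevity, and all function names, labels and values are hypothetical. Marker identification is left out of this sketch.

```python
import math
from statistics import mean, pstdev


def zscore_genes(samples):
    """Standardise each gene (column) to zero mean and unit variance.

    samples: list of rows, one row of expression values per sample.
    """
    columns = list(zip(*samples))
    mus = [mean(col) for col in columns]
    # A gene with no variation keeps a divisor of 1.0 to avoid dividing by zero.
    sds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in samples]


def fit_centroids(samples, labels):
    """Compute the mean expression profile of each class."""
    groups = {}
    for row, lab in zip(samples, labels):
        groups.setdefault(lab, []).append(row)
    return {lab: [mean(col) for col in zip(*rows)] for lab, rows in groups.items()}


def predict(centroids, row):
    """Assign a sample to the class with the nearest centroid (Euclidean)."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], row))


# Toy data (hypothetical): four samples, two genes.
samples = [[1.0, 10.0], [1.2, 11.0], [5.0, 10.5], [5.2, 9.5]]
labels = ["no_recurrence", "no_recurrence", "recurrence", "recurrence"]

normalized = zscore_genes(samples)
centroids = fit_centroids(normalized, labels)
predictions = [predict(centroids, row) for row in normalized]
print(predictions)
```

The example predicts on the same samples it was fitted on, which only checks that the steps connect. A real evaluation would hold out samples, and would compute the normalization parameters on the training samples alone.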

Ongoing Projects:

1. Expression Profiling for Breast Cancer Incidence - The Prognochip project

2. Classification of Astrocytic Tumours into their Malignancy Grades using Neural Networks

3. Identification of Informative Genes for Class Prediction in Cancer