CABGen: Research activities



Today, understanding the biological mechanisms involved in the onset of human diseases with multifactorial inheritance requires biostatistical and bioinformatics approaches that can interpret and integrate the huge amount of information obtained through approaches such as "genome-wide" studies. Recent technological developments, which have made it possible to characterize much of human diversity at the genotypic and transcriptional levels, have opened the way to a new era of human genetics aimed at identifying genetic determinants and at understanding their interactions and the molecular mechanisms involved in the onset of multifactorial diseases.

In this context, the aim of our current research is to apply new methods of statistical genetics and bioinformatics to:

  • the study of cerebrovascular diseases, in particular to the characterization of the genetic determinants implicated in the different forms of stroke;
  • the identification of alterations in molecular mechanisms, such as gene expression or alternative splicing, which are susceptibility factors for complex genetic diseases.


Development of bioinformatic tools for the management and analysis of "genome-wide" data

The use of biostatistical and bioinformatics approaches is becoming fundamental in handling, interpreting and integrating the huge amount of information obtained by experimental "genome-wide" analyses. Recent technological developments in the study of gene expression, in the genotyping of entire genomes and in next-generation sequencing require the acquisition of new bioinformatics infrastructure and the development of new approaches to understand biological phenomena at a global level.



In order to contribute to the understanding of the molecular mechanisms involved in the biological processes of human cells, we have developed a new bioinformatics tool: CorrelaGenes. Using expression data derived from microarray experiments publicly available in the Gene Expression Omnibus (GEO) database, this tool allows the identification of groups of genes that show similar expression profiles under different experimental conditions.

CorrelaGenes uses an association rule mining (ARM) algorithm to identify genes that are frequently co-expressed in the dataset extracted from the GEO database. To improve the biological significance of the results, the algorithm was modified to identify only association rules that involve two genes, one of which, called the Target, is defined as a parameter in the initial search. This modification adds a target-driven approach to the standard ARM technique: it yields a list of genes frequently co-expressed with the Target gene, suggesting their coordinated action in the same biological processes.
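The Target-driven search described above can be illustrated with a minimal sketch. It is not the CorrelaGenes implementation: the function name `target_rules`, the thresholds, and the toy transactions are all hypothetical, and it assumes each experiment has already been reduced to the set of genes considered "expressed" in it. For every rule of the form Target → gene it computes the support (fraction of all experiments containing both genes) and the confidence (fraction of Target-containing experiments that also contain the gene).

```python
from collections import Counter

def target_rules(transactions, target, min_support=0.3, min_confidence=0.6):
    """Return (gene, support, confidence) for two-gene rules target -> gene.

    transactions: list of sets, one set of expressed genes per experiment
    (an assumed input format, chosen only for this illustration).
    """
    n = len(transactions)
    with_target = [t for t in transactions if target in t]
    if n == 0 or not with_target:
        return []
    # Count how often each other gene appears together with the target.
    pair_counts = Counter(g for t in with_target for g in t if g != target)
    rules = []
    for gene, count in pair_counts.items():
        support = count / n                    # P(target and gene)
        confidence = count / len(with_target)  # P(gene | target)
        if support >= min_support and confidence >= min_confidence:
            rules.append((gene, support, confidence))
    # Strongest rules first; gene name breaks ties deterministically.
    return sorted(rules, key=lambda r: (-r[2], -r[1], r[0]))

# Toy data, invented for illustration only (not taken from GEO).
toy_transactions = [
    {"APOE", "APP", "PSEN1"},
    {"APOE", "APP"},
    {"APOE", "MAPT"},
    {"APP", "PSEN1"},
    {"APOE", "APP", "MAPT"},
]
rules = target_rules(toy_transactions, target="APOE")
print(rules)
```

Restricting the search to pairs that include the Target keeps the number of candidate rules linear in the number of genes, rather than exploring every possible itemset as a general ARM run would.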

The preliminary results obtained from CorrelaGenes in a simulation with 15 Target genes (ACTG1, AFF3, APOE, APP, CDC5L, DIAPH2, EMD, FOXO1, HIF1A, IL8, MAPT, PRPF19, PSEN1, PSEN2, PTPN22) showed good agreement with the data available in the literature. They contribute to the characterization of the transcriptional profiles of these genes and complement the results that can be obtained from other publicly available tools.


Identification of genetic risk factors and molecular mechanisms in the onset of cerebrovascular disease

Most human diseases have a genetic component among the elements that determine their pathogenesis. In Mendelian disorders, the transmission of the mutant allele by autosomal recessive, autosomal dominant or X-linked inheritance is sufficient to determine the phenotype. However, most diseases do not follow this pattern of inheritance: the pathology is caused by the interaction of multiple genetic and environmental factors, which gives rise to complex multigenic disorders.

In the study of these diseases, an important contribution comes from genetic association analysis, which helps to determine the statistical correlation between a specific allele of a genetic marker and a quantitative or discrete trait of the pathological phenotype (Cordell and Clayton, 2005). Compared with linkage studies, association analysis has greater power to identify alleles with small effects, but it requires a large number of markers. There are different types of association studies; the most common are case-control studies, which analyze a large number of individuals affected by the disease under study (cases) and a large number of individuals from the general population (controls).
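As a minimal sketch of how a single marker might be tested in a case-control design, the example below runs an allelic chi-square test on a 2x2 table of allele counts and reports the odds ratio. The function name `allelic_association` and the counts are hypothetical and serve only as an illustration; they are not data from any study mentioned here. The p-value uses the identity that, for one degree of freedom, the chi-square survival function equals `erfc(sqrt(x / 2))`.

```python
import math

def allelic_association(case_a, case_b, ctrl_a, ctrl_b):
    """Allelic chi-square test (1 df) and odds ratio for allele A.

    Arguments are allele counts: A and B alleles in cases and in controls.
    """
    table = [[case_a, case_b], [ctrl_a, ctrl_b]]
    n = case_a + case_b + ctrl_a + ctrl_b
    row_totals = [sum(row) for row in table]
    col_totals = [case_a + ctrl_a, case_b + ctrl_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    return chi2, p_value, odds_ratio

# Invented allele counts, for illustration only.
chi2, p_value, odds_ratio = allelic_association(60, 40, 40, 60)
print(f"chi2={chi2:.2f}  p={p_value:.4f}  OR={odds_ratio:.2f}")
```

In a genome-wide setting the same test would be repeated for every marker, so the resulting p-values need a correction for multiple testing before any association is declared.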

Since 2005, thanks to methodological advances in the sequencing of the human genome and the progressive characterization of human genetic variability through the identification of many single nucleotide polymorphisms (SNPs), it has been possible to conduct association studies on the entire human genome (genome-wide association studies, GWAS) and to identify many genetic variants that contribute to the phenotype of several diseases.

Among the complex diseases, cerebrovascular diseases are a health issue of considerable importance for Western countries. In developed countries, stroke is the third leading cause of death and a leading cause of neurological disability. There are two main types of stroke: ischemic, caused by the occlusion of a blood vessel, and hemorrhagic, caused by the rupture of the vessel itself. About 80% of strokes are ischemic. Epidemiological data indicate that the risk of stroke has a genetic component; however, this component is difficult to identify because many alleles are involved, each with a small phenotypic effect. Moreover, the effects of some alleles are limited to one or a few subtypes of stroke, and the size of the effect may vary according to the sex and ethnicity of individuals.

This research program aims to analyze the genotypic data obtained from a series of more than 700 patients suffering from different forms of stroke (data obtained through a collaboration with the Neurological Institute C. Besta in Milan) and more than 950 healthy control individuals. Each individual was typed for more than 600,000 markers using Illumina Human610-Quad and Human660W-Quad BeadChip microarrays. The ongoing study is aimed at: (1) the validation of risk variants for the onset of ischemic stroke identified in the Italian population, (2) a genome-wide association study to identify new risk factors, (3) the application of new pathway-based methods for association studies (pathway-based analysis of genome-wide data).


Analysis of the expression profiles of human genes under different experimental conditions

The CABGen group is particularly interested in the study of gene expression profiles obtained by microarray analysis, as required by many research groups within the Institute. In particular, analyses were conducted on variations in expression: (1) in a cellular system that recapitulates the stages of neoplastic transformation, in collaboration with Dr. C. Mondello (Ostano et al., 2012), (2) in a cellular system defective for the LIG1 gene product, in collaboration with Dr. A. Montecucco, (3) in a cellular system developed in the laboratory of Dr. G. Biamonti, in which different growth conditions determine specific changes at the level of splicing mechanisms and chromatin organization.


Statistical Analysis

Statistical data analysis is an essential step in much scientific research. CABGen has developed strong expertise in the statistical and mathematical analysis of biomedical, molecular and population data, which has led to several collaborations with various universities and with laboratories within the IGM itself.

The statistical analyses addressed different stages of the following studies:

  • analysis of the influence of the gene COMT (catechol-O-methyltransferase) as a genetic risk factor for cognitive impairment in a sample of healthy subjects, patients with Alzheimer's disease (AD) and subjects with mild cognitive impairment (MCI). All subjects were analyzed for the COMT rs4680 polymorphism and the APOE genotype. An association was demonstrated between the COMT GG (Val/Val) genotype together with APOE epsilon4 and the risk of AD and MCI. In particular, when the GG genotype is included in a multinomial analysis, the risk of AD and MCI for the APOE epsilon4 allele is increased about 2-3 times; moreover, the risk conferred by the combination of the G and epsilon4 alleles is more pronounced in male patients (Lanni et al., 2012);
  • analysis of the long-term outcome and engraftment of donor cells in patients with Wiskott-Aldrich syndrome (WAS) treated with hematopoietic cell transplantation (HCT). This is a retrospective study that derives from the collaboration of several international medical centers, which collected samples in the period 1980-2009. The study showed a significant improvement after HCT for WAS patients, with important implications for the development of new protocols that aim to achieve full recovery from the disease and to reduce post-HCT complications (Moratto et al., 2011);
  • application of a mathematical-statistical technique (Self-Organizing Maps, SOM) to the study of the geographical distribution of the frequency of 77,451 different surnames (17,579,891 individuals) obtained from the lists of Italian telephone subscribers in the year 1993, with the aim of automatically identifying the geographical origin of the individuals carrying specific surnames. A database with the origin of 49,117 different surnames has been created, which can be of great help in the early stages of sampling individuals enrolled in Italian epidemiological and molecular analyses (Boattini et al., 2012);
  • study of the evolution of Y chromosome haplogroup G for the identification of evolutionary patterns of European and Caucasian populations (Rootsi et al., 2012);
  • identification of molecular markers of apoptosis, cellular stress and DNA damage in lymphocytes of patients with multiple sclerosis (MS). The results showed higher levels of the PAR and gammaH2AX markers in MS patients than in healthy subjects. In addition, there was a positive correlation between the level of aggression and gammaH2AX in MS patients (Grecchi et al., 2012);
  • study of post-translational modifications of the protein encoded by the gene SRSF1 due to replicative stress (Lea et al., 2012).


Copyright © 2014