RESEARCH ARTICLE Open Access

A Boolean-based systems biology approach to predict novel genes associated with cancer: Application to colorectal cancer

Shivashankar H Nagaraj, Antonio Reverter*

Abstract

Background: Cancer has remarkable complexity at the molecular level, with multiple genes, proteins, pathways and regulatory interconnections being affected. We introduce a systems biology approach to study cancer that formally integrates the available genetic, transcriptomic, epigenetic and molecular knowledge on cancer biology and, as a proof of concept, we apply it to colorectal cancer.

Results: We first classified all the genes in the human genome into cancer-associated and non-cancer-associated genes based on extensive literature mining. We then selected a set of functional attributes proven to be highly relevant to cancer biology that includes protein kinases, secreted proteins, transcription factors, post-translational modifications of proteins, DNA methylation and tissue specificity. These cancer-associated genes were used to extract ‘common cancer fingerprints’ through these molecular attributes, and a Boolean logic was implemented in such a way that both the expression data and functional attributes could be rationally integrated, allowing for the generation of a guilt-by-association algorithm to identify novel cancer-associated genes. Finally, these candidate genes are interlaced with the known cancer-related genes in a network analysis aimed at identifying highly conserved gene interactions that impact cancer outcome. We demonstrate the effectiveness of this approach using colorectal cancer as a test case and identify several novel candidate genes that are classified according to their functional attributes.
These genes include the following: 1) secreted proteins as potential biomarkers for the early detection of colorectal cancer (FXYD1, GUCA2B, REG3A); 2) kinases as potential drug candidates to prevent tumor growth (CDC42BPB, EPHB3, TRPM6); and 3) potential oncogenic transcription factors (CDK8, MEF2C, ZIC2).

Conclusion: We argue that this is a holistic approach that faithfully mimics cancer characteristics, efficiently predicts novel cancer-associated genes and has universal applicability to the study and advancement of cancer research.

* Correspondence: Tony.Reverter-Gomez@csiro.au
Computational and Systems Biology, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Division of Livestock Industries, Queensland Bioscience Precinct, 306 Carmody Road, St. Lucia, Brisbane, Queensland 4067, Australia

Nagaraj and Reverter BMC Systems Biology 2011, 5:35
© 2011 Nagaraj and Reverter; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Cancer is a complex genetic disease that exhibits remarkable complexity at the molecular level, with multiple genes, proteins, pathways and regulatory interconnections being affected. Treating cancer is equally complex and depends on a number of factors, including environmental factors, early detection, chemotherapy and surgery. Cancer is being recognized as a systems biology disease [1,2], as illustrated by multiple studies that include molecular data integration and network and pathway analyses in a genome-wide fashion. Such studies have advanced cancer research by providing a global view of cancer biology as molecular circuitry rather than the dysregulation of a single gene or pathway. For instance, reverse-engineering of gene networks derived from expression profiles was used to study prostate cancer [3], from which the androgen receptor (AR) emerged as the top candidate marker to detect the aggressiveness of prostate cancers. Similarly, sub-networks were proposed as potential markers rather than individual genes to distinguish metastatic from non-metastatic tumors in a breast cancer study [4]. The authors in this study argue that sub-network markers are more reproducible than individual marker genes selected without network information and that they achieve higher accuracy in the classification of metastatic versus non-metastatic tumor signaling. Using genome-wide dysregulated interaction data in B-cell lymphomas, novel oncogenes have been predicted in-silico [5]. Finally, taking a signaling-pathway approach, a map of a human cancer signaling network was built [6] by integrating cancer signaling pathways with cancer-associated, genetically and epigenetically altered genes.

Gene expression profiling has been widely used to investigate the molecular circuitry of cancer. In particular, DNA microarrays have been used in almost all of the main cancers and promise to change the way cancer is diagnosed, classified and treated [1]. However, expression analyses often result in hundreds of outliers, or differentially expressed genes between normal and cancer cells or across time points [2]. Owing to the large number of candidate genes, several different hypotheses can be generated to explain the variation in the expression patterns for a given study. In addition, the preferential expression of some tissue-specific genes presents additional challenges in expression data analyses. Nevertheless, recent systems approaches have attempted to prioritize differentially expressed genes by overlaying expression data with molecular data, such as interaction data [3], metabolic data [4] and phenotypic data [5].

Human malignancies are not just confined to genes and gene products, but also include epigenetic modifications such as DNA methylation and chromosomal aberrations. However, in order to effectively capture the properties that emerge in a complex disease, we need analytical methods that provide a robust framework to formally integrate prior knowledge of the biological attributes with the experimental data. The simplest heuristic will search for novel genes with a profile, in terms of differential expression and/or network connectivity, similar to those for which an association to disease has been well established (see, for instance, the approaches of [7,8]).

Boolean logic has been found to be optimal for such tasks. Within the context of cancer, Mukherjee and Speed [9] show how a series of biological attributes, including ligands, receptors and cytosolic proteins, can be included in the network inference. More recently, Mukherjee and co-workers [10] introduced an approach based on sparse Boolean functions and applied it to the responsiveness of breast cancer cell lines to an anti-cancer agent. In addition, large-scale literature-based Boolean models have been used to study apoptosis pathways as well as pathways connected with them.

In this study, we propose a systems biology approach to predict disease-associated genes that are either not previously reported (novel) or poorly characterized, using colorectal cancer as a case study. To achieve this goal, we first implemented a Boolean logic schema derived from cancer-associated genes and developed a guilt-by-association (GBA) algorithm, which is subsequently applied in a genome-wide fashion. Although gene expression data are central to this approach, other biologically relevant functional attributes, such as tissue specificity, are treated as equally important in the Boolean logic informing the GBA algorithm.
Finally, novel cancer-associated genes are interlaced with the known cancer-related genes in a weighted network circuitry aimed at identifying highly conserved gene interactions that impact cancer outcome.

Results and Discussion

Overview of the systems biology approach

Figure 1 shows the schema of the proposed analytical approach. The first phase deals with the analysis of gene expression data to obtain a list of differentially expressed and condition-specific genes. Conceptually, differential expression differs from condition specificity in that the former requires the postulation of a contrast of interest, while the latter enriches for genes that are preferentially expressed in one of the (potentially many) experimental conditions being considered. Nevertheless, the expectation is for a substantial overlap in the genes identified by the two criteria. In the second phase, public databases are mined to compile a list of cancer-associated genes, non-cancer-associated genes and functional attributes that are of relevance in the context of cancer. We considered a total of six functional attributes as follows: tissue specificity (TS), transcription factors (TF), post-translational modifications (PTM), kinases (KIN), secreted proteins (SEC) and CpG island methylation (MET) (see Additional File 1 for the rationale behind choosing these attributes). Table 1 summarizes the general characteristics of the functional attributes with a few prototypic examples of representative genes. Additional File 2 provides the list of 749 cancer-associated genes that we compiled within each attribute. These features were also selected because there is a strong functional interconnection among them, and therefore we see genes overlapping across attributes.

The resulting set of variables (differential expression, condition specificity, and the six functional attributes) are each binarized and used in a Boolean logic framework.
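
As a rough illustration of how binarized variables could feed a guilt-by-association ranking, the sketch below encodes each gene as a tuple of booleans and scores candidate genes by how frequent their Boolean profile is among known cancer-associated genes. This is a minimal sketch under stated assumptions, not the scoring rule used in the study: the fold-change cutoff, the record layout, the frequency-based score, and all gene symbols (KNOWN1, CAND_A, and so on) are hypothetical.

```python
from collections import Counter

# Assumed variable order: two expression-derived flags plus the six attributes.
VARIABLES = ("DE", "CS", "TS", "TF", "PTM", "KIN", "SEC", "MET")


def binarize(gene, log2_fc_cutoff=1.0):
    """Return one boolean per entry of VARIABLES for a gene record."""
    de = abs(gene["log2_fold_change"]) >= log2_fc_cutoff  # hypothetical cutoff
    cs = bool(gene["condition_specific"])
    attributes = tuple(name in gene["attributes"] for name in VARIABLES[2:])
    return (de, cs) + attributes


def fingerprint_frequencies(cancer_genes):
    """Relative frequency of each Boolean profile among known cancer genes."""
    counts = Counter(binarize(g) for g in cancer_genes)
    total = sum(counts.values())
    return {profile: n / total for profile, n in counts.items()}


def rank_candidates(candidates, frequencies):
    """Order candidates by the frequency of their profile among cancer genes."""
    scored = [(frequencies.get(binarize(g), 0.0), g["symbol"]) for g in candidates]
    # Highest score first; ties are broken alphabetically by symbol.
    return [symbol for _, symbol in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Invented toy records, for illustration only.
known = [
    {"symbol": "KNOWN1", "log2_fold_change": 2.0, "condition_specific": False,
     "attributes": {"KIN", "PTM"}},
    {"symbol": "KNOWN2", "log2_fold_change": -1.5, "condition_specific": False,
     "attributes": {"KIN", "PTM"}},
    {"symbol": "KNOWN3", "log2_fold_change": 0.2, "condition_specific": True,
     "attributes": {"SEC"}},
]
candidates = [
    {"symbol": "CAND_C", "log2_fold_change": 3.0, "condition_specific": False,
     "attributes": {"TF"}},
    {"symbol": "CAND_B", "log2_fold_change": 0.1, "condition_specific": True,
     "attributes": {"SEC"}},
    {"symbol": "CAND_A", "log2_fold_change": 1.2, "condition_specific": False,
     "attributes": {"KIN", "PTM"}},
]

frequencies = fingerprint_frequencies(known)
ranking = rank_candidates(candidates, frequencies)
print(ranking)
```

Scoring by exact profile match is the simplest possible choice; a candidate whose profile never occurs among the known cancer genes scores zero, so a real implementation would likely need a softer similarity measure.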
The Boolean logic is then applied to cancer-associated genes to develop a GBA algorithm. When applied to non-cancer-associated genes, the GBA algorithm preferentially ranks those genes whose behavior across all variables most closely mimics that of cancer-associated genes. Finally, in order to gain a global understanding of the novel candidate genes, we generate a series of gene co-expression networks. The resulting networks are surveyed with a focus on the interacting partners of candidate genes and within the context of the original functional attributes.

Differentially expressed and condition specific genes

We explored three measures of differential expression (DE1 = Carcinoma - Normal; DE2 = Carcinoma - Adenoma; and DE3 = Carcinoma - Inflammation) and identified 444, 658 and 179 differentially expressed genes for DE1, DE2, and DE3, respectively. We observed several overlaps among the three differentially expressed gene categories, and 15 genes were found to be differentially expressed in all three categories (Figure 2). Among them, we highlight CLCA4, CRNDE, DEFA5, DUOXA2, GCG, KLK10, and UGT2A3. In particular, CRNDE (colorectal neoplasia differentially expressed) was the most differentially expressed (up-regulated) gene, with a 16-fold change in expression. The CRNDE gene is localized to chromosome 16 (16q12.2) and is poorly characterized, with no functional information on its role in colorectal cancer except its differential expression from the EST data (UniGene Id: 167645). Another differentially expressed gene, KLK10, is a member of the kallikrein gene family, which is a well-documented biomarker for the detection of colon, ovarian and pancreatic cancers [8,11].

In addition, we identified 83, 61, 23, and 48 condition-specific genes for Normal, Adenoma, Carcinoma and Inflammation, respectively. Among these genes, 23 were found to be specific to carcinoma (CS3) (see Additional File 1 Table S1).
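
The three contrasts and the condition-specific calls reported here can be illustrated with a small sketch. It assumes mean log2 expression values per condition and simple fixed cutoffs; the statistical criteria actually used in the study are not given in this section, so the thresholds, the function names and the toy genes (GENE_1 to GENE_4) are all hypothetical.

```python
# Invented mean log2 expression per condition, for illustration only.
means = {
    "GENE_1": {"Normal": 5.0, "Adenoma": 5.2, "Carcinoma": 8.0, "Inflammation": 5.1},
    "GENE_2": {"Normal": 6.0, "Adenoma": 7.5, "Carcinoma": 7.6, "Inflammation": 7.4},
    "GENE_3": {"Normal": 4.0, "Adenoma": 4.1, "Carcinoma": 4.2, "Inflammation": 4.0},
    "GENE_4": {"Normal": 9.0, "Adenoma": 9.1, "Carcinoma": 6.0, "Inflammation": 8.8},
}

# The three contrasts named in the text, each as (condition, reference).
CONTRASTS = {
    "DE1": ("Carcinoma", "Normal"),
    "DE2": ("Carcinoma", "Adenoma"),
    "DE3": ("Carcinoma", "Inflammation"),
}


def differentially_expressed(means, contrast, cutoff=1.0):
    """Genes whose absolute log2 difference for the contrast reaches the cutoff."""
    condition, reference = contrast
    return {g for g, m in means.items() if abs(m[condition] - m[reference]) >= cutoff}


def condition_specific(means, condition, margin=1.0):
    """Genes expressed at least `margin` higher in `condition` than in every other one."""
    return {
        g for g, m in means.items()
        if all(m[condition] - value >= margin
               for other, value in m.items() if other != condition)
    }


de_sets = {name: differentially_expressed(means, pair) for name, pair in CONTRASTS.items()}
shared_by_all = set.intersection(*de_sets.values())  # genes in DE1, DE2 and DE3
carcinoma_specific = condition_specific(means, "Carcinoma")

print(sorted(shared_by_all), sorted(carcinoma_specific))
```

The sketch also shows why the two criteria overlap without coinciding: a down-regulated gene can be differentially expressed in every contrast while not being preferentially expressed in carcinoma.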
Notably, CCDC3, EREG, IL6, PAPPA, SERPINE1, TFPI2 and THBS2 are a few examples of the condition-specific genes that appeared as top candidates. In particular, the CCDC3 (coiled-coil domain containing 3) and TFPI2 (tissue factor pathway inhibitor 2) genes were the most carcinoma-specific genes. TFPI2 has been proposed to be a tumor suppressor gene, as it is frequently methylated in colorectal cancer [7]. The CCDC3-encoded protein is predicted to be localized to the extracellular matrix [12], with no previous association with colorectal cancer. Higher IL-6 levels might be a prognostic indicator in colorectal cancer, as they are associated with increasing tumor stage and tumor size, with metastasis and with decreased survival [13].

Figure 1 The schema for the identification of novel genes associated with complex diseases. The expression profiles from the cancer data are analyzed to predict differentially expressed and condition-specific genes. The functional attributes over-represented in cancer are selected and representative datasets from public resources mined. The common cancer fingerprints from cancer-associated genes are processed through Boolean logic to develop a guilt-by-association classifier which, applied to non-cancer-associated genes, predicts novel candidate cancer-associated genes. Finally, novel candidate genes are further analyzed using network theory approaches.

Expression-profiling analyses often result in hundreds of candidate genes. The challenge is exacerbated when the expression data are gathered at different time points or in multiple conditions, as in the current study with a number of differentially expressed and condition-specific genes. Nevertheless, it is common practice to stop the in-silico expression analysis with the list of outliers and select one or more genes for experimental characterization based on the underlying biology.
Often, expression data analyses are accompanied by downstream bioinformatics investigations such as Gene Ontology (GO)

Table 1 Overview of the genetic, epigenetic and molecular information used in this study

Functional attribute: Cancer-associated genes
Role in cancer: Genes with at least 2 mutations causally implicated in cancer. Includes oncogenes and tumor suppressor genes.
Potential application: Potential drug targets and diagnostic or prognostic markers
Examples: Oncogenes: BCL2, c-Jun, ERG, ERBB2, RAS, c-MYC, c-SRC. Tumor suppressor genes: RB1, P53, APC, BRCA-1, BRCA-2
Data source: genetics/CGP/ Census/; reviews (Futreal et al, 2004; Hahn et al, 2002; Mitelman, 2000; Vogelstein et al, 2004)
Reference: NA

Functional attribute: Non-cancer-associated genes
Role in cancer: There is no previous report of any causal mutation.
Potential application: If cancer association is established, these genes are either potential drug targets or diagnostic or prognostic markers
Examples: AMN, B3GNTL1, CDC42BPB, S100A9, TRPM6, VNN1, ZIC2
Data source: NCBI - Human Genome projects/genome/ guide/human/
Reference: NA

Functional attribute: Kinases
Role in cancer: More than 30% of cancer-related genes are kinases, and the most common domain encoded by cancer genes is the protein kinase domain
Potential application: Drug targets through inhibitors
Examples: c-Src, c-Abl, RAS, mitogen-activated protein (MAP) kinase, phosphatidylinositol-3-kinase (PI3K), AKT, and the epidermal growth factor receptor (EGFR)
Data source: Human Kinome Consortium human/kinome/
Reference: [15], [17,51]

Functional attribute: Excretory-secretory proteins
Role in cancer: Malignant tumors secrete increased levels of ES proteins
Potential application: Non-invasive diagnostic or prognostic markers for early detection
Examples: alpha-fetoprotein, CD44, kallikrein 6, kallikrein 10, MIC-1
Data source: Secreted Protein Database (SPD)
Reference: [52,53], [54], [55]

Functional attribute: Transcription factors
Role in cancer: Overactivity of TFs at different stages of cancer is well documented, and novel treatment strategies have been suggested for targeted inhibition of oncogenic TFs
Potential application: Alternative therapeutic strategy, potential drug targets
Examples: C-MYB, NF-kappaB, AP-1, STAT and ETS transcription factors
Data source: Genomatix
Reference: [15,56], [57], [58]

Functional attribute: DNA methylation
Role in cancer: Methylation patterns are altered in cancer cells, as shown in hypomethylation of oncogenes and hypermethylation of tumor suppressors, resulting in gene silencing or gene inactivation
Potential application: CpG island methylation could be used as a biomarker of malignant cells
Examples: hMLH1, BRCA1, MGMT, p16(INK4a), p14(ARF), p15(INK4b), DAPK, APAF-1
Data source: Human Colon Methylome from [29]
Reference: [27,59], [28], [60,61]

Functional attribute: Post-translational modifications
Role in cancer: Key proteins driving oncogenesis can undergo PTM. Although phosphorylation is partially covered in the kinases section, other PTMs, such as glycosylation and ubiquitination, that are reported to play a role in malignancies are included as separate functional gene attributes.
Examples: BRCA1, EGFR, c-Src, c-Abl, RAS, TP53
Data source: HPRD
Reference: [18]; Burger and Seth, 2004