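Summary: The paper uses public databases to analyze pan-cancer TFs and disease-specific TFs, and examines the links between TF families and diseases.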
Abstract
Transcription factors (TFs) act as key regulators in biological processes through controlling gene expression.
We revealed that the average expression levels of TFs in normal tissues were less than 50% of those of non-TFs, whereas TF expression was increased in cancers. TFs that are specifically expressed in an individual tissue or cancer may be potential marker genes.
For instance, TGIF2LX/Y were preferentially expressed in testis and NEUROG1, PRDM14, SRY, ZNF705A and ZNF716 were specifically highly expressed in germ cell tumors.
We found different distributions of target genes and TF co-regulation across TF families. Some small TF families have a large number of protein interaction pairs, suggesting their central roles in transcriptional regulation. The bZIP family is a small family involved in many signaling pathways. Survival analysis indicated that most TFs significantly affect survival in one or more cancers.
Some survival-related TFs were also specifically highly expressed in the corresponding cancer types, which may be potential targets for cancer therapy.
Finally, we identified 43 TFs whose mutations were closely correlated with survival, suggesting their cancer-driver roles. The systematic analysis of TFs provides useful clues for further investigation of TF regulatory mechanisms and the role of TFs in diseases.
Key Points
• Reveal the expression pattern of TFs in cancer and normal tissues.
• Different TF families are involved in different diseases or phenotypes.
• Different distributions of target genes and co-regulation of TFs in different TF families.
• Some survival-related TFs are also specifically highly expressed in the corresponding cancer types, which may be potential cancer therapeutic targets.
• Reveal that mutations in key TFs are strongly associated with survival, suggesting their cancer-driver roles.
Results
TF expression in normal tissues and cancers
Compared with non-TF genes, TF genes tended to be expressed at lower levels in normal tissues (Figure 1A).
TF expression was higher in seven cancers, including liver and lung cancers.
The expression of TFs was significantly higher in cancers than in normal tissues.
Specifically expressed TFs in normal tissues and cancers
To further analyze the expression specificity of TFs, we identified 236 and 476 specifically expressed TF genes (SEG-TFs, see Materials and methods) in normal tissues and cancers, respectively (Figure 2A).
Target genes of TF regulation
We collected a total of 2,712,247 TF–target gene pairs from the hTFtarget database, which integrates comprehensive human TF ChIP-seq data, involving 542 TFs from 56 TF families and the corresponding 19,369 target genes (protein-coding genes).
Of these, 325 TFs regulated more than 1000 target genes (Figure 3A).
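Counting targets per TF from the pair list is a simple grouping step. The following is a minimal Python sketch, assuming the hTFtarget pairs have already been loaded as (TF, target gene) tuples; the loading step, the gene names and the small threshold in the example are hypothetical.

```python
from collections import defaultdict


def targets_per_tf(pairs):
    """Map each TF to the set of distinct target genes it regulates."""
    targets = defaultdict(set)
    for tf, target in pairs:
        targets[tf].add(target)  # a set ignores duplicate TF-target pairs
    return targets


def tfs_above_threshold(pairs, min_targets=1000):
    """Return TFs that regulate more than `min_targets` distinct targets, sorted."""
    counts = targets_per_tf(pairs)
    return sorted(tf for tf, genes in counts.items() if len(genes) > min_targets)


# Toy pairs with hypothetical names; a real run would read hTFtarget pairs from a file.
toy_pairs = [("TF_A", "G1"), ("TF_A", "G2"), ("TF_A", "G3"),
             ("TF_B", "G1"), ("TF_A", "G1")]
print(tfs_above_threshold(toy_pairs, min_targets=2))  # ['TF_A']
```

Using sets means a pair reported twice is counted once, which matters when several ChIP-seq datasets support the same TF–target link.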
Co-regulation of TFs
Different TFs may cooperate to co-regulate target genes flexibly and elaborately [33]. Here we considered only co-regulation by pairs of TFs, called co-regulated TFs.
In total, 56 TF families were involved in co-regulation (Figure 3E).
TF–protein interactions
We integrated the TF–protein interaction pairs from HPRD and BioGRID, requiring that the TF and the protein be expressed in the same tissue or cancer. Finally, we obtained a total of 44,729 TF–protein pairs involving 1430 TFs and 8679 interaction partners (Figure 4A).
The distribution of TF–TF interaction pairs (10,212 pairs) was similar to that of TF–protein interactions (Figure 4A).
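The co-expression requirement amounts to merging the two databases and keeping pairs whose members share at least one tissue or cancer. A minimal Python sketch, assuming interactions are (TF, partner) tuples and that per-gene "expressed in" sets were computed beforehand; the expression cutoff behind those sets and all names below are hypothetical.

```python
def merge_sources(*sources):
    """Union interaction pairs from several databases, dropping exact duplicates."""
    merged = set()
    for source in sources:
        merged.update(source)
    return merged


def coexpressed_pairs(pairs, expressed_in):
    """Keep (tf, partner) pairs whose members share at least one tissue or cancer.

    `expressed_in` maps a gene symbol to the set of contexts where it counts as
    expressed; how "expressed" is defined (e.g. a TPM cutoff) is an assumption.
    """
    kept = []
    for tf, partner in sorted(pairs):
        shared = expressed_in.get(tf, set()) & expressed_in.get(partner, set())
        if shared:
            kept.append((tf, partner))
    return kept


# Hypothetical toy inputs, not real HPRD/BioGRID records or expression calls.
hprd = [("TF_A", "P1"), ("TF_A", "P2")]
biogrid = [("TF_A", "P1"), ("TF_B", "P3")]
expressed_in = {
    "TF_A": {"liver", "BRCA"},
    "P1": {"liver"},
    "P2": {"testis"},
    "TF_B": {"BRCA"},
    "P3": {"BRCA"},
}
print(coexpressed_pairs(merge_sources(hprd, biogrid), expressed_in))
# [('TF_A', 'P1'), ('TF_B', 'P3')]
```

Only exact duplicates are merged here; a real pipeline would also need to harmonize gene identifiers between HPRD and BioGRID and decide whether (A, B) and (B, A) count as one pair.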
Notably, there were only 3 TFs in the P53 family, but each of them interacted with nearly 50 TFs from 39 TF families (Figure 4B and C and Figure S4), confirming the importance of the P53 family.
TF and diseases
About 68.34% of human TFs (1138 TFs) were annotated with phenotype data (derived from AnimalTFDB3.0, MalaCards and Ensembl), and 333 TFs (20%) had KEGG pathway data. Enrichment analysis showed that the largest number of TFs was associated with ‘Transcriptional misregulation in cancer’ (Figure 5A), which contains 67 TFs from 20 TF families, including the Homeobox and ETS families.
TFs significantly affect cancer survival
Survival-related genes may be potential biomarkers for prognostic prediction. We analyzed the correlation between TF expression and the prognosis of 33 cancers from TCGA by Kaplan–Meier analysis. Results showed that 87% of TFs (1448 significant TFs) had significant effects on the prognosis of at least one cancer (log-rank test P < 0.05) (Figure 6A).
Three TF genes (FOXD1, IRF1 and MYBL2) affected the prognosis of 11 cancer types.
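The log-rank test behind these P values compares observed and expected deaths in two patient groups at every event time. Below is a minimal pure-Python sketch of the standard two-group log-rank test. How patients are split into high- and low-expression groups (for example at the median) is an assumption, since the notes do not say, and the survival times are toy values.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    events: 1 = death observed, 0 = censored.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    o_minus_e = 0.0  # observed minus expected deaths in group A
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        at_risk_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        n = at_risk_a + at_risk_b
        deaths = sum(1 for ti, e, _ in data if ti == t and e == 1)
        deaths_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        o_minus_e += deaths_a - deaths * at_risk_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A at time t
            variance += deaths * at_risk_a * at_risk_b * (n - deaths) / (n * n * (n - 1))

    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy example: group A (say, high expression) dies earlier than group B.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                       [10, 11, 12, 13, 14], [1, 1, 1, 1, 1])
print(round(chi2, 2), p < 0.05)
```

In practice a survival library would be used; the sketch only makes explicit what "log-rank test P < 0.05" measures.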
TF mutation
Highly mutated genes (≥10% mutation frequency in at least one cancer), including TFs, were retained for further analysis.
The mutation frequency of highly mutated TFs (80 TFs) was significantly higher (t-test, P = 0.0003) than that of highly mutated non-TF genes (1130 genes) in overall samples (Figure S6C). All 80 highly mutated TFs and their cancers are depicted in Figure 7A.
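The ≥10% rule is a filter over per-cancer mutation counts. A minimal Python sketch, assuming mutation frequency means the fraction of samples in a cancer that carry at least one mutation in the gene (the notes do not define it further); gene names and counts are made up, and the TF versus non-TF t-test is not reproduced.

```python
def highly_mutated(counts, cutoff=0.10):
    """Return genes whose mutation frequency is >= cutoff in at least one cancer.

    `counts` maps gene -> {cancer: (mutated_samples, total_samples)}.
    The result maps each retained gene to the sorted cancers that pass.
    """
    hits = {}
    for gene, per_cancer in counts.items():
        cancers = sorted(
            cancer
            for cancer, (mutated, total) in per_cancer.items()
            if total > 0 and mutated / total >= cutoff
        )
        if cancers:
            hits[gene] = cancers
    return hits


# Hypothetical counts for illustration only (not TCGA/GSCALite numbers).
counts = {
    "TF_A": {"BRCA": (120, 1000), "LUAD": (5, 500)},
    "TF_B": {"BRCA": (20, 1000)},
    "TF_C": {"SKCM": (50, 400), "LUAD": (60, 500)},
}
print(highly_mutated(counts))
# {'TF_A': ['BRCA'], 'TF_C': ['LUAD', 'SKCM']}
```

Keeping the passing cancers per gene, rather than a yes/no flag, is what allows statements such as "highly mutated only in BRCA".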
Many TFs were highly mutated in only a single cancer. For example, GATA3 was highly mutated only in BRCA (Figure 7A), and it has been reported as a cancer driver gene with frequent mutations in metastatic breast cancer.
Mutations in 43 TFs were significantly associated with cancer survival (P ≤ 0.05) (Figure 7B), of which 33 were among the highly mutated TFs shown in Figure 7A. Over 70% of TF mutations were associated with poor survival (higher risk; Figure 7B).
Correlation analysis revealed that the CNVs of 60 TFs were significantly correlated with their expression (|correlation| ≥ 0.8 and FDR < 1E-3) (Figure 7C).
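The selection combines an effect-size cutoff with a multiple-testing cutoff. A minimal Python sketch under stated assumptions: Pearson correlation is used only for illustration (the notes do not name the correlation method, and a rank-based one such as Spearman is equally plausible), FDR is taken to be Benjamini–Hochberg, and the per-TF p-values are assumed to come from elsewhere (e.g. a statistics package). All numbers are toy values.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_cnv_tfs(correlations, p_values, min_abs_r=0.8, max_fdr=1e-3):
    """Indices of TFs passing |r| >= min_abs_r and FDR < max_fdr."""
    fdr = benjamini_hochberg(p_values)
    return [i for i, (r, q) in enumerate(zip(correlations, fdr))
            if abs(r) >= min_abs_r and q < max_fdr]


print(select_cnv_tfs([0.9, -0.85, 0.5], [1e-6, 1e-5, 1e-8]))  # [0, 1]
```

The absolute value in the cutoff keeps strongly negative correlations as well as positive ones, matching the |correlation| ≥ 0.8 criterion.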
Methods
The human TFs studied here were derived from AnimalTFDB3.0 [8], which contains 1665 fully annotated human TFs.
RNA expression (TPM) data of 37 normal tissues were obtained from the Human Protein Atlas, while RNA expression (RSEM) data of 33 cancer types and their adjacent normal samples were downloaded from TCGA.
Batch effects were removed from the TCGA expression data using the TCGA Batch Effects Viewer (https://bioinformatics.mdanderson.org/public-software/tcga-batch-effects/).
TF–target gene pairs were downloaded from the hTFtarget database.
TF–protein/TF physical interaction data were derived from the HPRD (http://www.hprd.org/) and BioGRID (https://thebiogrid.org/) databases.
Phenotype data for TFs were obtained from the AnimalTFDB3.0 TF-related GWAS phenotypes, MalaCards (https://www.malacards.org/) and Ensembl BioMart. KEGG pathway data were collected from the KEGG database (https://www.kegg.jp/). Survival, SNV and CNV data of TFs were downloaded from the GSCALite database.
Detection of SEGs
We used SEGtool [24] to detect SEGs from the whole gene expression matrix.
In our study, TF genes that were specifically overexpressed or under-expressed in fewer than five tissues or cancers (including a single one) were defined as SEG-H-TFs and SEG-L-TFs, respectively.
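SEGtool's own algorithm is not described in these notes, so the sketch below does not reproduce it. It only illustrates the SEG-H/SEG-L definition above with a hypothetical rule: a tissue or cancer counts as "high" ("low") when the gene's log2(TPM + 1) value lies at least `delta` units above (below) the median across all contexts, and the gene is labeled SEG-H (SEG-L) when one to four contexts qualify. The `delta` value and the toy TPM numbers are assumptions.

```python
import math
import statistics


def classify_seg(expression, delta=2.0, max_contexts=4):
    """Hypothetical SEG rule (NOT SEGtool).

    `expression` maps tissue/cancer -> expression (e.g. TPM) for one gene.
    Returns (high_contexts, low_contexts, labels).
    """
    logged = {ctx: math.log2(value + 1) for ctx, value in expression.items()}
    center = statistics.median(logged.values())
    high = sorted(c for c, v in logged.items() if v >= center + delta)
    low = sorted(c for c, v in logged.items() if v <= center - delta)
    labels = []
    # "Specific" means the gene stands out in at least one but fewer than five contexts.
    if 1 <= len(high) <= max_contexts:
        labels.append("SEG-H")
    if 1 <= len(low) <= max_contexts:
        labels.append("SEG-L")
    return high, low, labels


toy = {"testis": 255, "liver": 3, "lung": 3, "brain": 3, "kidney": 3}
print(classify_seg(toy))  # (['testis'], [], ['SEG-H'])
```

The median is used as the reference so that a single extreme tissue, such as testis for a testis-specific TF, does not shift the baseline.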
Co-regulation analysis
For each gene, we used the 50 kb region upstream of its transcription start site (TSS) as the core region to predict candidate co-regulating TFs.
TFs with high-confidence peaks in the upstream region of the target gene's TSS, with a β model score ≥ 0.517 for each peak, were considered putative co-regulators of the query gene.
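The window-and-score rule can be sketched as an interval filter followed by pairwise enumeration. A minimal Python sketch, assuming each peak is a hypothetical (TF, chromosome, position, score) tuple with a single representative coordinate (e.g. the summit), that the β model score is already computed, and that window bounds are inclusive; none of these details are given in the notes.

```python
from itertools import combinations

WINDOW_BP = 50_000   # 50 kb upstream of the TSS, as in the notes
MIN_SCORE = 0.517    # peak score cutoff, as in the notes


def upstream_window(tss, strand, window=WINDOW_BP):
    """Inclusive (start, end) of the region `window` bp upstream of the TSS."""
    if strand == "+":
        return tss - window, tss
    return tss, tss + window  # on the minus strand, upstream means larger coordinates


def candidate_coregulators(gene, peaks, window=WINDOW_BP, min_score=MIN_SCORE):
    """Sorted TFs with at least one qualifying peak upstream of the gene's TSS.

    gene: (chrom, tss, strand); peaks: iterable of (tf, chrom, position, score).
    """
    chrom, tss, strand = gene
    start, end = upstream_window(tss, strand, window)
    return sorted({tf for tf, peak_chrom, pos, score in peaks
                   if peak_chrom == chrom and start <= pos <= end and score >= min_score})


def coregulated_pairs(gene, peaks, **kwargs):
    """All unordered TF pairs that may co-regulate the gene (pairwise only)."""
    return list(combinations(candidate_coregulators(gene, peaks, **kwargs), 2))


peaks = [
    ("TF_A", "chr1", 60_000, 0.90),
    ("TF_B", "chr1", 99_000, 0.60),
    ("TF_C", "chr1", 99_500, 0.30),   # score below cutoff
    ("TF_D", "chr1", 120_000, 0.90),  # downstream of a plus-strand TSS
    ("TF_E", "chr2", 80_000, 0.90),   # different chromosome
]
print(coregulated_pairs(("chr1", 100_000, "+"), peaks))  # [('TF_A', 'TF_B')]
```

Restricting to pairs mirrors the notes' choice to consider only co-regulation by two TFs; larger combinations would grow combinatorially.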
Enrichment analysis of TFs
Enrichment analyses were performed using the hypergeometric distribution function phyper() in R.
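For reference, the upper-tail probability such a test reports is P(X ≥ k) = Σ_{i=k}^{min(K, n)} C(K, i) C(N − K, n − i) / C(N, n). A minimal Python sketch with exact binomial coefficients; it should give the same value as R's phyper(k - 1, K, N - K, n, lower.tail = FALSE). How N, K and n were defined in the study (e.g. all annotated genes, genes in one KEGG pathway, the TF set) is an assumption, as the notes do not specify it.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: total genes N; successes: genes in the term K;
    draws: size of the query set n (e.g. TFs); k: observed overlap.
    Same value as R's phyper(k - 1, K, N - K, n, lower.tail = FALSE).
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(comb(successes, i) * comb(population - successes, draws - i)
                     for i in range(max(k, 0), upper + 1))
    return favourable / total


print(hypergeom_upper_tail(3, population=10, successes=3, draws=4))  # 7/210, about 0.0333
```

The `k - 1` shift matters: with lower.tail = FALSE, phyper returns P(X > q), so passing the overlap itself would exclude the observed count.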
Correlation between expression/mutations and survival
We extracted survival data for TFs across 33 cancers from the GSCALite database.
CNV analysis
The analysis found that 1592 TFs (95.6% of all TFs) had CNV in at least one cancer (a TF was considered to have CNV in a cancer if its CNV percentage was greater than 5%).
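The 5% rule can be applied per TF and per cancer as follows. A minimal Python sketch, assuming per-sample CNV calls are encoded GISTIC-style as integers with 0 meaning copy-neutral (an assumption; the notes do not give the encoding), and using made-up calls.

```python
def cnv_frequency(calls):
    """Fraction of samples with a non-neutral copy-number call (gain or loss).

    `calls` uses 0 for copy-neutral and any non-zero integer for gain/loss,
    an assumed GISTIC-style encoding.
    """
    return sum(1 for c in calls if c != 0) / len(calls)


def tfs_with_cnv(cnv_by_tf, cutoff=0.05):
    """Sorted TFs whose CNV frequency exceeds `cutoff` in at least one cancer.

    `cnv_by_tf` maps TF -> {cancer: list of per-sample CNV calls}.
    """
    return sorted(tf for tf, per_cancer in cnv_by_tf.items()
                  if any(cnv_frequency(calls) > cutoff for calls in per_cancer.values()))


toy = {
    "TF_A": {"LUAD": [0] * 18 + [1, -1]},  # 2/20 = 10%, passes
    "TF_B": {"LUAD": [0] * 19 + [2]},      # 1/20 = 5%, not strictly greater
}
print(tfs_with_cnv(toy))  # ['TF_A']
```

The comparison is strict (greater than 5%), so a TF sitting exactly at 5% is not counted.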