Figure reproduction: ONECUT2 is a driver of neur

Author: Juan_NF | Published: 2019-04-01 00:17

1 Terminology

NEPC: neuroendocrine prostate cancer
PCa: prostate cancer
t-NEPC: treatment-emergent NEPC
hypoxia-directed therapy

2 Data

PCa Beltran data set
---All BAM files and associated sample information are described in Supplementary Table 11; data are deposited in dbGap phs000909.v.p1 and accessible on the cBIO Portal for Cancer Genomics.

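RNASeqV2 data from TCGA were processed and normalized with RSEM to produce TPM (transcripts per million).
The raw data are very large, and as far as I can tell we generally do not have download access to dbGaP.
So I used the data in data_RNA_Seq_expression_median.txt instead.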

[Image: dbGaP]

PCa Lin data set

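LTL545: possibly combined with a clinical cohort? Not processed for now.

GPL14450 data: quantile normalization

Because this is microarray data, the annotation file for the corresponding platform is needed. I downloaded the annotation txt from GEO: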
https://www.ncbi.nlm.nih.gov/geo/browse/

CCLE data

CCLE: Lung Cancer
CCLE: Nervous system tumor

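Filters used to pick the cell lines (+ include, - exclude; entries whose note marks a duplicate are dropped; the trailing number is the count listed for each group):

SCLC = lung + small_cell + ATCC + Gender (F/M) - note (duplicate): 38
NSCLC = lung - small_cell - large_cell + ATCC|ECACC + Gender (F/M) - note (duplicate): 71
Neuroblastoma = neuroblastoma + Gender (F/M) - note (duplicate): 11
Glioma = glioma + Gender (F/M) - note (duplicate): 33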

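# Line 3 of the GCT file is the column header (one entry per cell line).
zcat CCLE_RNAseq_genes_rpkm_20180929.gct.gz | sed -n '3p' > cell_line.txt
# Transpose the header so each column name sits on its own line; the line number then equals the column number.
awk -F '\t' '{for(i=1;i<=NF;i++){a[FNR,i]=$i}}END{for(i=1;i<=NF;i++){for(j=1;j<=FNR;j++){printf "%s ", a[j,i]}print ""}}' cell_line.txt > tcell_line.txt
cat > num.sh <<'EOF'
#!/bin/bash
# Usage: bash num.sh scc.txt   (scc.txt lists one cell line per line)
# Strip Windows carriage returns so that grep can match, then record "line_number:name" for every hit.
tr -d '\r' < "$1" | while IFS= read -r line
do
  [ -n "$line" ] && grep -n -F -- "$line" tcell_line.txt >> "${1}_num.txt"
done
EOF
##### Lesson learned: scc.txt was pasted from an Excel filter on Windows and then uploaded to the server, so it was not in Unix format and grep kept returning nothing. Only after converting it to Unix format in Notepad++ and uploading it again did the script produce results. Here $1 is the txt file of cell lines that I filtered on Windows according to the paper's description. The goal of this step is to find the column numbers of those cell lines, so that cut can later pull out their expression columns.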
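cat > target.sh <<'EOF'
#!/bin/bash
# Usage: bash target.sh scc.txt_num.txt
# Each input line looks like "N:CELL_LINE", where N is the column number in the GCT file.
while IFS= read -r line
do
  num=${line%%:*}
  # Skip the two GCT header lines, cut column N, and flatten it into one space-separated row.
  zcat CCLE_RNAseq_genes_rpkm_20180929.gct.gz | sed '1,2d' | cut -f "$num" | paste -s -d ' ' - >> "${1}_target.txt"
done < "$1"
EOF
##### This step uses the column numbers from the previous step to cut out each cell line's column. Flattening the column into a single row makes it easy to append to the output file, so the result has one row per cell line.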
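
The two scripts above decompress the GCT file once per cell line. As an alternative, here is a minimal single-pass sketch in awk. It is not the method used in this post: the function name extract_columns is hypothetical, and it assumes the names file holds the exact column headers (full CCLE names), whereas the grep approach also accepts partial matches.

```shell
#!/bin/sh
# extract_columns NAMES_FILE GCT_FILE
# Prints the first (Name) column plus every column whose header appears in NAMES_FILE.
# Assumptions: tab-separated GCT input, header on line 3, exact header matches.
extract_columns() {
  names_file=$1
  shift
  awk -F '\t' -v OFS='\t' '
    # First file: collect wanted headers, tolerating Windows line endings.
    NR == FNR { sub(/\r$/, ""); if ($0 != "") want[$0] = 1; next }
    # Second file: skip the GCT version line and the dimension line.
    FNR <= 2 { next }
    # Header line: remember which column numbers to keep (column 1 is always kept).
    FNR == 3 {
      n = 0
      keep[++n] = 1
      for (i = 2; i <= NF; i++) if ($i in want) keep[++n] = i
    }
    # Header and data lines: print only the kept columns, in file order.
    {
      out = $(keep[1])
      for (k = 2; k <= n; k++) out = out OFS $(keep[k])
      print out
    }
  ' "$names_file" "$@"
}
```

For the compressed file, something like `zcat CCLE_RNAseq_genes_rpkm_20180929.gct.gz | extract_columns scc.txt -` should work, since awk reads standard input when given `-`. The output keeps genes as rows and cell lines as columns, the transpose of what target.sh writes.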

Z-score
For mRNA and microRNA expression data, we typically compute the relative expression of an individual gene and tumor to the gene's expression distribution in a reference population. That reference population is all samples that are diploid for the gene in question (by default for mRNA), or normal samples (when specified), or all profiled samples. The returned value indicates the number of standard deviations away from the mean of expression in the reference population (Z-score). This measure is useful to determine whether a gene is up- or down-regulated relative to the normal samples or all other tumor samples.
The z-scores are calculated using only patient data. Hence, overexpressed in this case implies higher expression than the average patient.
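
To make the definition concrete, here is a minimal sketch that computes per-gene z-scores with awk. It assumes the simplest reference population from the description above, namely all profiled samples in the file, and a tab-separated matrix with a header row and gene IDs in the first column. The function name zscore_rows and the use of the sample standard deviation (n - 1 denominator) are my assumptions, not something the portal documentation specifies here.

```shell
#!/bin/sh
# zscore_rows [FILE]
# Input: tab-separated matrix, first row = header, first column = gene ID.
# Output: same layout, each value replaced by (value - row mean) / row SD.
# Assumption: the reference population is every sample in the row.
zscore_rows() {
  awk -F '\t' -v OFS='\t' '
    NR == 1 { print; next }                  # pass the header through unchanged
    {
      n = NF - 1; sum = 0; sumsq = 0
      for (i = 2; i <= NF; i++) { sum += $i; sumsq += $i * $i }
      mean = sum / n
      # Sample variance; genes with no variation get NA instead of dividing by zero.
      var = (n > 1) ? (sumsq - n * mean * mean) / (n - 1) : 0
      sd = (var > 0) ? sqrt(var) : 0
      out = $1
      for (i = 2; i <= NF; i++)
        out = out OFS ((sd > 0) ? sprintf("%.4f", ($i - mean) / sd) : "NA")
      print out
    }
  ' "$@"
}
```

A gene with values 1, 2, 3 across three samples has mean 2 and standard deviation 1, so its z-scores are -1, 0 and 1. Restricting the reference population to diploid or normal samples would mean computing the mean and SD from only those columns.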

3 R part

The Wilcoxon test was used to calculate the p-value in every comparison, and Benjamini-Hochberg adjustment was conducted to assess the false discovery rates (FDR) of multiple comparisons. Genes co-up-regulated (fold change > 2 and FDR < 0.05) in NE vs. non-NE comparisons of all four data sets were subjected to the following network analysis.
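
The multiple-testing step and the cutoff quoted above can be sketched in shell as follows. This is an illustration only, not the R code for this section: it assumes the Wilcoxon p-values and fold changes have already been computed elsewhere, and the function names bh_adjust and filter_up and the column layouts are hypothetical. It relies on `sort -g`, which GNU and BSD sort provide.

```shell
#!/bin/sh
# bh_adjust [FILE]
# Input:  gene<TAB>p_value
# Output: gene<TAB>p_value<TAB>fdr, sorted by ascending p-value.
# Benjamini-Hochberg: fdr_i = min over j >= i of (p_j * m / j), capped at 1.
bh_adjust() {
  sort -t "$(printf '\t')" -k2,2g "$@" | awk -F '\t' -v OFS='\t' '
    { gene[NR] = $1; p[NR] = $2 }
    END {
      m = NR; running = 1
      # Walk from the largest p-value down, keeping a running minimum.
      for (i = m; i >= 1; i--) {
        q = p[i] * m / i
        if (q < running) running = q
        fdr[i] = running
      }
      for (i = 1; i <= m; i++) print gene[i], p[i], fdr[i]
    }'
}

# filter_up [FILE]
# Input: gene<TAB>fold_change<TAB>fdr
# Prints genes with fold change > 2 and FDR < 0.05.
filter_up() {
  awk -F '\t' '$2 > 2 && $3 < 0.05 { print $1 }' "$@"
}
```

Running filter_up on each of the four data sets and intersecting the resulting gene lists would give the co-up-regulated genes described above.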