Personal Notes

Author: 深山夕照深秋雨OvO | Published 2023-11-14 00:52


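tr 'A-Z' 'a-z' < tmp2 | sed -E 's/(^|\s)\w/\U&/g' | tr ' ' ',' > tmp3
First convert everything to lowercase, then capitalize the first letter of each word (tmp2 is space-delimited), and finally replace the spaces with commas. The -E flag is needed so that | acts as alternation; \s, \w and \U are GNU sed extensions.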
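Here is the same transformation as a small Python function, which is easier to test than the sed expression. This is a minimal sketch. The name `lower_then_title` is hypothetical, and the function assumes words are separated by single spaces, as in tmp2. Note that sed's `\s` would also match tabs, but this function only splits on spaces.

```python
def lower_then_title(line: str) -> str:
    """Lower-case a line, upper-case the first character of every
    space-separated word, then join the words with commas."""
    words = line.rstrip("\n").lower().split(" ")
    # w[:1] is safe for the empty strings produced by consecutive spaces
    return ",".join(w[:1].upper() + w[1:] for w in words)
```

For example, `lower_then_title("HELLO wORLD")` gives `Hello,World`. To process a whole file, apply it to each line of tmp2 and write the results to tmp3.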

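To extract the reads in a given genomic region from a BAM file, any of the following commands works. Region queries (chr:start-end) need a coordinate-sorted BAM with an index (samtools index wgs.sort.bam).

samtools view -hb wgs.sort.bam chr:start-end > target.region.bam
# extract the regions listed in a BED file
samtools view -hb -L target.bed wgs.sort.bam > target.region.bam

bedtools intersect -a wgs.sort.bam -b target.bed > target.region.bam

sambamba view -h -f bam wgs.sort.bam chr:start-end > target.region.bam
# to extract by BED file, use sambamba slice
sambamba slice -L target.bed wgs.sort.bam > target.region.bam

# in my experience, sambamba slice -L is the fastest and uses the fewest resources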
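samtools-style regions (chr:start-end) are 1-based and inclusive, whereas the BED file passed to -L is 0-based with an exclusive end. Below is a minimal sketch that turns one region string into the equivalent BED line, for example to build target.bed. The helper name `region_to_bed` is hypothetical.

```python
import re

# the chromosome name may itself contain ':'; the greedy (.+) keeps the last ':' as the separator
_REGION = re.compile(r"(.+):(\d+)-(\d+)")


def region_to_bed(region: str) -> str:
    """'chr:start-end' (1-based, closed) -> 'chr<TAB>start-1<TAB>end' (BED, 0-based half-open)."""
    m = _REGION.fullmatch(region.strip())
    if m is None:
        raise ValueError(f"not a chr:start-end region: {region!r}")
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    if start < 1 or end < start:
        raise ValueError(f"invalid interval: {region!r}")
    return f"{chrom}\t{start - 1}\t{end}"
```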

Converting GFF/GTF to GenBank format (ref: https://www.biostars.org/p/72220/)
The EMBOSS tool seqret is one option:

seqret -sequence reference.fasta -feature -fformat gff -fopenfile 1.gff -osformat genbank -auto
# the details of the output still have to be adjusted by hand

if/else in awk

awk '{if ($2 < 10) print $1"\t"($2-10); else print $1"\t"($2+10)}' input > output

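Generating sed commands in batch

awk '{printf "sed -i '\''s#%s#%s#g'\'' input\n", $1, $2}' rename.txt > run.sh
sh run.sh
# input is the file to be edited in batch
# rename.txt has two columns; every occurrence of the column-1 text is replaced with the column-2 text
# column-1 values are treated as regular expressions, so characters such as . * [ and # must be escaped
# single-pass alternative: awk '{print "s#"$1"#"$2"#g"}' rename.txt > rename.sed && sed -i -f rename.sed input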
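If escaping regular-expression characters in rename.txt becomes a problem, the renaming can also be done with literal string matching. This is a minimal Python sketch, and the function names are hypothetical. Pairs are applied in file order, just like running the generated sed commands one after another. As a result, a later pair can act on text produced by an earlier one.

```python
def read_pairs(path: str) -> list[tuple[str, str]]:
    """Read a two-column rename table (old new) into a list of pairs."""
    pairs = []
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 2:
                pairs.append((fields[0], fields[1]))
    return pairs


def apply_renames(text: str, pairs: list[tuple[str, str]]) -> str:
    """Replace every literal occurrence of old with new, pair by pair."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text
```

Typical use: read input into a string, call `apply_renames(text, read_pairs("rename.txt"))`, and write the result back.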

Reposted from "Coordinate systems in bioinformatics file formats and converting between them":
https://www.biochen.org/cn/blog/2020/%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E6%96%87%E4%BB%B6%E6%A0%BC%E5%BC%8F%E4%B8%AD%E7%9A%84%E5%9D%90%E6%A0%87%E7%B3%BB/

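Many bioinformatics file formats are based on genomic coordinates, such as the common BED and GTF formats. However, the two define their coordinate systems quite differently. In BED, the first base has index 0 and intervals are half-open: the start is included and the end is excluded, [start, end). In GTF, the first base has index 1 and intervals are closed at both ends, [start, end]. We call the former 0-based and the latter 1-based. The advantage of 0-based (half-open) coordinates is that the length is simply end minus start. The advantage of 1-based coordinates is that they are more intuitive.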

Besides BED and GTF, the original article also tabulates the conventions used by other formats.

Length calculation

Length(0-based) = End(0-based) - Start(0-based)
Length(1-based) = End(1-based) - Start(1-based) + 1

Coordinate conversion

0-based to 1-based
Start(1-based) = Start(0-based) + 1
End(1-based) = End(0-based)

1-based to 0-based
Start(0-based) = Start(1-based) - 1
End(0-based) = End(1-based)
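Here are the length and conversion formulas above as small Python functions. This is a sketch with hypothetical names. Intervals are (start, end) pairs: 0-based half-open for BED-style coordinates and 1-based closed for GTF-style coordinates.

```python
def length_0based(start: int, end: int) -> int:
    # half-open [start, end): the end position is not part of the interval
    return end - start


def length_1based(start: int, end: int) -> int:
    # closed [start, end]: both ends are part of the interval
    return end - start + 1


def to_1based(start0: int, end0: int) -> tuple[int, int]:
    """0-based half-open -> 1-based closed: only the start shifts."""
    return start0 + 1, end0


def to_0based(start1: int, end1: int) -> tuple[int, int]:
    """1-based closed -> 0-based half-open: only the start shifts."""
    return start1 - 1, end1
```

For example, the first 100 bases of a chromosome are `0 100` in BED and `1 100` in GTF, and both give a length of 100.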

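Notes on ChIPseeker annotation

[Figure: example of a ChIPseeker annotation table]

Sometimes the two columns on the left (geneChr/transcriptId) and the third column (distanceToTSS) point to different genes.
This is because the left two columns record which gene the input BED interval (e.g. a peak) falls in,
while the third column gives the distance from that peak to the TSS of the nearest gene.

Without additional information, the first base of a gene's first exon is taken as the TSS. For a minus-strand gene, the first exon is the one with the highest coordinates, and the TSS is its end coordinate.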
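The following minimal sketch uses the TSS rule above to show why the gene a peak falls in and the gene with the nearest TSS can differ. It makes several assumptions:
- Exon coordinates are 1-based.
- All genes are on the peak's chromosome.
- Distance is simply the peak center minus the TSS. This is not strand-adjusted, so it may not match ChIPseeker's own sign convention.

The function names and the toy genes are hypothetical.

```python
def tss_from_exons(exons: list[tuple[int, int]], strand: str) -> int:
    """TSS = first base of the first exon in transcription order."""
    if strand == "+":
        return min(start for start, _ in exons)
    if strand == "-":
        return max(end for _, end in exons)
    raise ValueError("strand must be '+' or '-'")


def nearest_tss(center: int, genes: dict):
    """genes: gene_id -> (exons, strand).
    Returns (gene_id, center - TSS) for the gene whose TSS is closest
    to the peak center, or None if genes is empty."""
    best = None
    for gene_id, (exons, strand) in genes.items():
        dist = center - tss_from_exons(exons, strand)
        if best is None or abs(dist) < abs(best[1]):
            best = (gene_id, dist)
    return best
```

Take geneA with exons (100, 200) and (300, 400) on the + strand, and geneB with exon (500, 600) on the - strand. A peak centered at 380 lies inside geneA. However, `nearest_tss(380, genes)` returns `("geneB", -220)`, because geneB's TSS (600) is 220 bp away while geneA's TSS (100) is 280 bp away.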

What the Von Neumann Entropy (VNE) index means
This likely reflects a more disordered (the high-entropy status) and relaxed chromatin architecture at early development (E38 and E80) (Fig. 2b).
In agreement with the phenomenon that 3D structure in early mammalian embryos is initially obscure but gradually established throughout development [45–47], the relatively loose chromatin folding highlights a highly plastic state for hepatocyte genomes at the early stages of development and may be essential for the rapid functional transitions in the liver before and after birth.

https://doi.org/10.1038/s41421-022-00416-z; Fig. 2a

We observed a significantly higher VNE in the POF stage (0.86, P < 0.016, Wilcoxon rank-sum test) than in the SWF (0.80) and F1 stages (0.79) (Fig. 2a). This is likely due to a more disordered and relaxed chromatin architecture in the POF stage (Fig. 2b), while the architecture is more stable and ordered in mature GCs at the F1 stages, which aligns with the relaxed genome architecture observed during senescence

https://doi.org/10.1038/s41467-021-27800-9 ; Fig. 2a

The following sentence comes from https://doi.org/10.1080/19491034.2021.1910437, which cites two references for it.
This article also provides a tool for computing VNE.
Biologically, genomic regions with high entropy likely correlate with high proportions of euchromatin, as euchromatin is more structurally permissive than heterochromatin [1, 2]
1. MacArthur BD, Lemischka IR. Statistical mechanics of pluripotency. Cell. 2013;154(3):484–489.
2. Rajapakse I, Groudine M, Mesbahi M. What can systems theory of networks offer to biology? PLoS Comput Biol. 2012;8(6):e1002543.

The following two passages come from: https://doi.org/10.1016/j.neo.2020.12.010
In the context of genome structure, the higher the entropy, the more conformations available to the system [46] . If the distant ends of a genomic region, e.g., a gene, interact to form a loop, there are fewer conformations available to the gene and thus the entropy of that genomic region is reduced.
46. Phillips, Rob, et al. "Physical biology of the cell." American Journal of Physics 78.11 (2010): 1230.
and
We apply one such approach - a derivative of VNE - to measure local chromatin organization of individual gene regions [59]. Higher VNE values indicate that the number of conformations available to the gene and its immediate neighborhood are higher, indicating that chromatin is more accessible.
Judging from these authors' analysis, VNE appears to be positively correlated with gene expression.

The more disordered (and permissive) chromatin in the pgEpiSCs was also evident based on its high-entropy status.
It then refers to Fig. 5d of the paper below, whose legend reads: The extent of disorder in chromatin structure (quantified by the Von Neumann Entropy (VNE))

https://doi.org/10.1038/s41422-021-00592-9; Fig. 5d

We found that Di-SG had higher entropy (Fig. 1C), suggestive of less compact chromatin structural organization in Di-SG.

https://doi.org/10.1016/j.jbc.2021.101559; Fig. 1c
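For intuition, here is the textbook von Neumann entropy of a symmetric positive semi-definite matrix. Divide the eigenvalues by their sum so they behave like probabilities p_i, then take S = -sum(p_i * ln p_i). A flat spectrum (many comparable modes, i.e. many available "conformations") gives high entropy, and a single dominant mode gives entropy near 0.

This is only a sketch. The toolbox may preprocess the Hi-C matrix (for example, turn it into a correlation matrix), use another log base, or normalize the result, so the numbers are not expected to match its output. The function names are hypothetical. Eigenvalues come from a pure-Python cyclic Jacobi iteration to stay dependency-free; with NumPy, numpy.linalg.eigvalsh does the same job.

```python
import math


def symmetric_eigenvalues(matrix, tol: float = 1e-12, max_sweeps: int = 100) -> list[float]:
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def von_neumann_entropy(matrix) -> float:
    """S = -sum(p * ln p) over the eigenvalues divided by their sum.
    Assumes a symmetric positive semi-definite matrix with positive trace."""
    eig = [max(v, 0.0) for v in symmetric_eigenvalues(matrix)]  # clip round-off negatives
    total = sum(eig)
    if total <= 0:
        raise ValueError("matrix must have a positive trace")
    probs = [v / total for v in eig]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

An identity matrix (all modes equal) gives the maximum ln(n), while a rank-one matrix gives 0.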

MATLAB code for computing VNE with the 4DNvestigator toolbox:

close all, clear %%% Close figures and reset variables
restoredefaultpath %%% Ensure no other folders on current path
addpath(genpath('D:\MATLAB-\toolbox\4DNvestigator')) %%% Add all 4DNvestigator folders and files to path

Folder_Result = 'E:\workspace\fenshuChr.2406\juicer'; %%% Output folder
Data_Loc2 = {'E:\workspace\fenshuChr.2406\juicer\Ebaileyi.30.hic'};
bpFrag = 'BP'; %%% base-pair (BP) rather than restriction-fragment (FRAG) resolution
binSize = 1E5; %%% bin size in bp (100 kb)
entropyExample(Data_Loc2, Folder_Result, 1, bpFrag, binSize)
%%% the third argument (1) is the chromosome number

%%% equivalently, with the chromosome in a named variable:
Folder_Result = 'E:\workspace\fenshuChr.2406\juicer'; %%% Output folder
Data_Loc2 = {'E:\workspace\fenshuChr.2406\juicer\Ebaileyi.30.hic'};
bpFrag = 'BP';
binSize = 1E5;
chrSelect = 1; %%% chromosome number
entropyExample(Data_Loc2, Folder_Result, chrSelect, bpFrag, binSize)

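https://github.com/HuiyangYu/PanDepth: an extremely fast, low-memory tool for computing the depth and coverage of a genome (or a gene set) from SAM/BAM/CRAM files. Even a very large BAM (tens of GB) takes only a minute or two. By default it reportedly needs only about one tenth of the memory of BAMdeal, and it is also much faster.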

New work from Heng Li | compleasm: a faster and more accurate alternative to BUSCO for assessing genome completeness
https://github.com/huangnengCSU/compleasm

Rather than reporting so much detail in the abstract, it might be better to make a more general statement like: "Deletions affecting introns and/or coding regions of numerous genes may have contributed to phenotypic differences between A. baiyi and other Ablax species"

Comparative Recombination Rates in the Rat, Mouse, and Human Genomes
10.1101/gr.1970304

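Converting physical position to genetic distance with a constant coefficient (based on the paper above)

awk '{printf "Chr%s\t%s\t%.6f\n", $1, $4, $4*0.000554779412}' Chr27.map | sort -k1,1V -k2,2n > Chr27.genetic.map
# in the PLINK .map file, column 1 is the chromosome and column 4 the physical position (bp)
# printf "%.6f" avoids awk's default output of only 6 significant digits for non-integers
# check that the unit of the coefficient matches the unit of the positions in column 4

i.e. genetic position = SNP physical position × 0.000554779412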
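Here is the same conversion as a Python sketch (the function names are hypothetical). It reads PLINK .map lines (chromosome, variant ID, genetic position, bp position), multiplies the bp position by the constant used above, and sorts by chromosome and then position. Whether the product is in cM or another unit depends on the coefficient, so check the coefficient against the paper before use.

```python
def chrom_key(chrom: str) -> tuple:
    # numeric chromosomes sort numerically and before non-numeric ones (X, Y, MT, ...)
    return (0, int(chrom), "") if chrom.isdigit() else (1, 0, chrom)


def physical_to_genetic(map_lines, coef: float = 0.000554779412) -> list[str]:
    """Return 'Chr<chrom> <TAB> <bp> <TAB> <bp * coef>' lines sorted by chromosome and bp."""
    rows = []
    for line in map_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        rows.append((fields[0], int(fields[3])))
    rows.sort(key=lambda r: (chrom_key(r[0]), r[1]))
    # fixed 6 decimals, like printf "%.6f" in awk
    return [f"Chr{chrom}\t{bp}\t{bp * coef:.6f}" for chrom, bp in rows]
```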

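Phylogenomics: a detailed DensiTree plotting tutorial
A DensiTree overlays the topologies of many phylogenetic trees so that topological conflict among them (gene-tree heterogeneity) becomes visible. It can be drawn with the DensiTree program (now bundled with the BEAST2 package) or with the R package phangorn. The tutorial below walks through the plotting process.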
https://mp.weixin.qq.com/s/PvxX02Pw_NPiV8aTpxL8TQ

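Kinship coefficient thresholds for the four degrees of relatedness
0.0442 (3rd degree) / 0.0884 (2nd degree) / 0.177 (1st degree) / 0.354 (duplicate or monozygotic twin); each value is the lower bound of its category
This paper removed all individuals with kinship above 0.0442, i.e. relatives of the 3rd degree or closer.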

Fig. S3. https://doi.org/10.1073/pnas.1713288114
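Here is a small sketch of how these thresholds could be applied to pairwise kinship estimates (e.g. from KING). The degree labels follow the KING documentation, and the function names are hypothetical. A value that lies exactly on a threshold is assigned to the more distant category, which is an arbitrary choice.

```python
# lower bounds of the kinship ranges, closest relationship first
KINSHIP_DEGREES = [
    (0.354, "duplicate/MZ twin"),
    (0.177, "1st-degree"),
    (0.0884, "2nd-degree"),
    (0.0442, "3rd-degree"),
]


def kinship_degree(phi: float) -> str:
    """Map a kinship coefficient to a relationship degree."""
    for cutoff, label in KINSHIP_DEGREES:
        if phi > cutoff:  # a value exactly on a cutoff falls to the next (more distant) class
            return label
    return "unrelated"


def related_pairs(pairs, cutoff: float = 0.0442):
    """Keep (id1, id2, kinship) tuples above cutoff, i.e. 3rd degree or closer."""
    return [(a, b, k) for a, b, k in pairs if k > cutoff]
```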


https://coolpuppy.readthedocs.io/en/latest/walkthrough.html
Plotting Hi-C pileups

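Transferring files between servers
yum install rsync
rsync -azv -P -e "ssh -p 20338" kuangzhr17@202.201.1.198:/home/kuangzhr17/test.fa /opt/synData
# -a archive mode, -z compress during transfer, -v verbose, -P show progress and keep partial files so interrupted transfers can resume; -e runs rsync over SSH on port 20338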

https://github.com/veg/hyphy-analyses/tree/master/AncestralSequences
Ancestral sequence reconstruction with HyPhy

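Identifying conserved loops
https://github.com/adadiehl/mapLoopLoci
Requires a chain file between the two genomes (query to ref) and the corresponding loop files.

Loop file format:
Columns 1-3 are the loop's left anchor and columns 4-6 its right anchor. Column 7 is a unique identifier for the loop, column 8 the read count, and column 9 the p-value. In practice, columns 7, 8 and 9 can hold arbitrary values.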
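Here is a minimal sketch of writing and reading loop lines in the 9-column layout described above. It assumes the columns are tab-separated. Because columns 7-9 reportedly can be arbitrary, the writer fills them with placeholders: a generated ID (loop_<n>), a count of 1 and a p-value of 0. The names are hypothetical and not part of mapLoopLoci.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    left: tuple[str, int, int]   # columns 1-3: left anchor (chrom, start, end)
    right: tuple[str, int, int]  # columns 4-6: right anchor
    loop_id: str                 # column 7: unique loop identifier
    count: float                 # column 8: read count
    p_value: float               # column 9: p-value


def make_loop_line(left, right, index: int, count=1, p_value=0.0) -> str:
    """Build one tab-separated loop line; columns 7-9 are placeholders."""
    fields = [*left, *right, f"loop_{index}", count, p_value]
    return "\t".join(str(f) for f in fields)


def parse_loop_line(line: str) -> Loop:
    f = line.rstrip("\n").split("\t")
    if len(f) < 9:
        raise ValueError(f"expected 9 columns, got {len(f)}")
    return Loop(
        left=(f[0], int(f[1]), int(f[2])),
        right=(f[3], int(f[4]), int(f[5])),
        loop_id=f[6],
        count=float(f[7]),
        p_value=float(f[8]),
    )
```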

./mapLoopLoci.py query.loop target.loop query.to.target.chain > query.to.target.out
The output takes query.loop as the basis, i.e. it determines which loops in query.loop are conserved and which are XX.

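Extracting CDS and protein (pep) sequences
gffread 35.gff -g 35.genomic.fa -x 35.cds -y 35.pep
# -g genome FASTA, -x writes the spliced CDS, -y writes the translated protein sequences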

[Figure: genetic distances in rat]

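Figured it out: to use the shell variable $i inside awk, wrap it in an extra pair of single quotes, as in awk '{print "'$i'""\t"$1}' (awk -v i="$i" '{print i"\t"$1}' is a cleaner alternative)
awk -v i="$i" '$3=="Chr10" && $1==i' | wc -l
# counts input lines (read from standard input) where column 3 is Chr10 and column 1 equals $i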

plink --vcf input.vcf --allow-extra-chr --double-id --indep-pairwise 50 10 0.1 --out ld
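# the three numbers are: window size (in variants), step size (number of variants the window shifts each time), and the r^2 threshold above which one variant of a pair is pruned

# two output files are written, with suffixes .prune.in and .prune.out
ld.prune.in     # SNPs kept after pruning (approximately independent of each other)
ld.prune.out    # SNPs removed

# assumes variant IDs of the form CHR_POS (for example assigned with --set-missing-var-ids @_#)
awk 'BEGIN{FS="_"; OFS="\t"}{print $1,$2}' ld.prune.in > keep_SNP.list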

bgzip input.vcf
tabix -p vcf input.vcf.gz

bcftools view -R keep_SNP.list input.vcf.gz > ld_pruning.vcf
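The awk step splits IDs on every underscore, which breaks for contig names that contain "_" (e.g. scaffold_12). The minimal Python sketch below splits on the last underscore instead. It assumes IDs look like CHROM_POS, and the function names are hypothetical.

```python
def prune_ids_to_regions(ids) -> list[tuple[str, int]]:
    """Turn CHROM_POS variant IDs into (chrom, pos) pairs, splitting on the last '_'."""
    regions = []
    for raw in ids:
        vid = raw.strip()
        if not vid:
            continue  # skip blank lines
        chrom, _, pos = vid.rpartition("_")
        if not chrom or not pos.isdigit():
            raise ValueError(f"unexpected variant ID: {vid!r}")
        regions.append((chrom, int(pos)))
    return regions


def write_regions(regions, path: str) -> None:
    """Write a tab-separated CHROM<TAB>POS file usable with bcftools view -R."""
    with open(path, "w") as out:
        for chrom, pos in regions:
            out.write(f"{chrom}\t{pos}\n")
```

Typical use: open ld.prune.in, pass the file handle to `prune_ids_to_regions`, and write the result with `write_regions(regions, "keep_SNP.list")`.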
