HOMER Installation and Usage

Author: 笺牒九州的怪咖 | Published 2022-02-22 12:19


1. What does HOMER do?

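HOMER is a suite of tools, written in C++ and Perl, for motif discovery and next-generation sequencing data analysis. It generally takes two kinds of sequence input:

Reference sequences: a genome such as hg19 or mm10, promoter sequences, or custom FASTA sequences
Sequences to analyze: DNA or RNA sequences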

HOMER is designed for finding DNA or RNA motifs in large-scale datasets.

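So what is a motif?

A motif is a recurring pattern, i.e. a characteristic sequence feature; examples include sequence motifs, structure motifs and network motifs. A motif has, or may have, some biological function, for example as a transcription factor binding site.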
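HOMER itself models a motif as a position weight matrix rather than a fixed string, but the simplest way to picture a sequence motif is as a short pattern that recurs in a sequence. The sketch below only counts exact, non-overlapping matches of a consensus string on one strand; the helper name `count_motif` and the toy sequence are made up for illustration.

```bash
# count_motif SEQUENCE MOTIF
# Hypothetical helper: prints how many exact, non-overlapping copies of
# MOTIF occur in SEQUENCE (forward strand only, no mismatches allowed).
count_motif() {
  local seq="$1" motif="$2"
  # grep -o prints each match on its own line; wc -l counts the lines;
  # tr strips the padding some wc implementations add.
  printf '%s\n' "$seq" | grep -o -- "$motif" | wc -l | tr -d ' '
}

# Toy example: TGCATG is the DNA spelling of the UGCAUG motif in section 3.2.
count_motif "AATGCATGCCTTGCATGA" "TGCATG"   # prints 2
```

A real motif scanner scores every position against the matrix and reports sites above a threshold, which is what HOMER does.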

2. How to install HOMER?

HOMER is written in Perl and C++ and runs smoothly on UNIX-like systems. On Windows you first need to install Cygwin or a Unix virtual machine.

This post mainly covers installation on Linux/UNIX, with brief notes for Windows.

If you run into problems, refer to the official HOMER installation guide.

2.1 For Linux/UNIX

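1. Download configureHomer.pl (http://homer.ucsd.edu/homer/configureHomer.pl) into the target directory (e.g. /public/softwares/HOMER/homer), cd into that directory, and run:

perl configureHomer.pl -install        # download and install the base HOMER package
perl configureHomer.pl -install hg19   # optional: install a genome package (here hg19) for findMotifsGenome.pl
vi ~/.bash_profile                     # open ~/.bash_profile and add the following line:
PATH=$PATH:/public/softwares/HOMER/homer/bin/   # append HOMER's bin/ directory to PATH
:wq                                    # save and quit vi
source ~/.bash_profile                 # apply the change to the current shell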
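Editing ~/.bash_profile by hand in vi works, but the same step can be scripted. The sketch below is a hypothetical helper (`add_homer_to_path` is not part of HOMER) that appends the PATH line only if it is not already there, so running it twice does not duplicate the entry. The install path is the example path used above; adjust it to your own.

```bash
# add_homer_to_path PROFILE HOMER_BIN
# Hypothetical helper: append "export PATH=$PATH:HOMER_BIN" to PROFILE once.
add_homer_to_path() {
  local profile="$1" homer_bin="$2"
  # \$PATH stays literal so it is expanded later, when the profile is sourced.
  local line="export PATH=\$PATH:${homer_bin}"
  touch "$profile"
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  grep -qxF -- "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}
```

Usage: `add_homer_to_path ~/.bash_profile /public/softwares/HOMER/homer/bin`, then `source ~/.bash_profile`.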

A note on the difference between ~/.bash_profile and ~/.bashrc:
1. ~/.bash_profile: The personal initialization file, executed for login shells
2. ~/.bashrc: The individual per-interactive-shell startup file

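By name, the startup files fall into two families, "profile" and "rc". They do similar jobs but are used in different situations: whether bash runs the "profile" files or the "rc" files depends on whether it was started as a login shell or as an interactive non-login shell.

(Don't worry if this is not fully clear; it does not affect the steps that follow.)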
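A common way to sidestep the login/interactive distinction is to keep settings such as PATH in one file and have the other file read it. For example, many setups put the following in ~/.bash_profile so that login shells also load ~/.bashrc. This is a general bash convention, not something HOMER requires.

```bash
# ~/.bash_profile: also load ~/.bashrc in login shells, if it exists
if [ -f ~/.bashrc ]; then
  . ~/.bashrc
fi
```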

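2. Type R at the command line to start R, then install two Bioconductor packages, DESeq2 and edgeR (used by HOMER's differential analysis scripts such as getDiffExpression.pl). The old biocLite.R installer was retired with Bioconductor 3.8; current releases use BiocManager:

> if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
> BiocManager::install(c("DESeq2", "edgeR"))
> q() # quit R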

3. Install samtools with conda (from the bioconda channel):

conda install -c bioconda samtools
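Once these steps are done, it is worth checking that the programs are actually reachable from the shell. The hypothetical helper below (not part of HOMER) uses bash's `command -v` to report each program as found or missing, and returns non-zero if anything is missing.

```bash
# check_tools CMD...
# Hypothetical helper: report whether each command is on PATH.
check_tools() {
  local tool status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'ok       %s -> %s\n' "$tool" "$(command -v "$tool")"
    else
      printf 'missing  %s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_tools findMotifsGenome.pl findMotifs.pl samtools R`. A "missing" line for a HOMER script usually means the PATH change has not been sourced yet.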

2.2 For Windows

First download and install Cygwin from http://www.cygwin.com/.

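Notes:

In homer/bin/, remove the ".exe" suffix from the program files (i.e. rename "homer.exe" to "homer").
Set PATH with a line such as PATH=/Users/chucknorris/homer/bin:${PATH}; unlike the Linux example above, this puts the HOMER directory in front of the existing PATH.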
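The renaming in the first note can be done in one go from the Cygwin shell. A minimal sketch, assuming HOMER lives in ~/homer as in the PATH example; `strip_exe` is a hypothetical helper name.

```bash
# strip_exe DIR
# Hypothetical helper: rename every DIR/*.exe to the same name without ".exe".
strip_exe() {
  local dir="$1" f
  for f in "$dir"/*.exe; do
    [ -e "$f" ] || continue   # the glob matched nothing
    mv -- "$f" "${f%.exe}"    # ${f%.exe} removes the trailing ".exe"
  done
}
```

Usage: `strip_exe ~/homer/bin`. Files without the suffix (such as the .pl scripts) are left alone.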

Otherwise, installation is much the same as on Linux.

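If you followed the instructions exactly, ruled out other causes, and still get an error, congratulations: you may have found a bug! Email the author (cbenner@ucsd.edu) with a description of the error.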

3. How to find motifs with HOMER?

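HOMER provides three main motif-finding programs:

1. findMotifs.pl: motif analysis of the promoters (or mRNAs) of a gene list, or of sequences in FASTA files.

2. findMotifsGenome.pl: motif analysis of genomic regions such as ChIP-Seq peaks. By default it performs de novo motif discovery and also checks the enrichment of known motifs.

3. scanMotifGenomeWide.pl: scans a whole genome for occurrences of a given motif.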

Usage of each program is described below.

3.1 Finding motifs in DNA sequences

HOMER was originally developed to find motifs in ChIP-Seq peaks. It is no longer limited to ChIP-Seq: it can find motifs in any set of genomic loci.

You only need to provide a file of genomic coordinates, such as a peak file or a BED file; HOMER takes care of the rest.

To find motifs enriched in a peak file, run:

findMotifsGenome.pl <peak/BED file> <genome> <output directory> -size <#> [options]

Example:

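findMotifsGenome.pl ERpeaks.txt hg18 ER_MotifOutput/ -size 200 -mask
# -mask: use the repeat-masked genome sequence
# -size: size in bp of the region around each peak center used for motif finding (not the motif length; motif length is set with -len, default 8,10,12)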
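To make -size concrete: with -size 200, HOMER analyzes a 200 bp window centred on the middle of each peak rather than the peak as called (HOMER also accepts `-size given` to use each region's own size). HOMER does this internally, so nothing needs to be run; the awk sketch below only reproduces the idea so you can inspect which intervals are used. The helper name `center_regions`, the BED input, and the exact rounding of the centre are assumptions for illustration.

```bash
# center_regions BED [SIZE]
# Hypothetical helper: replace each BED interval with a SIZE-bp window
# centred on the interval's midpoint (start clipped at 0).
center_regions() {
  local bed="$1" size="${2:-200}"
  awk -v size="$size" 'BEGIN { FS = OFS = "\t" }
    {
      mid   = int(($2 + $3) / 2)
      start = mid - int(size / 2)
      if (start < 0) start = 0
      $2 = start
      $3 = start + size
      print
    }' "$bed"
}
```

For example, a peak at chr1:1000-1500 becomes the window chr1:1150-1350.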

The output directory contains the full results, including:

homerMotifs.motifs<#>: output files from de novo motif finding, one per motif length; each represents a separate run of the algorithm.
homerMotifs.all.motifs: simply the concatenation of all the homerMotifs.motifs<#> files.
motifFindingParameters.txt: the command used to run findMotifsGenome.pl.
knownResults.txt: enrichment statistics for known motifs, as a text file (can be opened in Excel).
seq.autonorm.tsv: autonormalization statistics for lower-order oligo normalization.
homerResults.html: formatted output of the de novo motif finding.
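knownResults.txt is tab-delimited, so it can also be inspected without Excel. The sketch below is a hypothetical helper that prints the motif name and q-value of the first N rows. It finds the columns by header text ("Motif Name" and a header starting with "q-value"), which is an assumption about the header based on typical HOMER output; rows are printed in file order, and HOMER normally sorts this file by significance.

```bash
# top_known_motifs FILE [N]
# Hypothetical helper: print "motif name <TAB> q-value" for the first N data rows.
top_known_motifs() {
  local file="$1" n="${2:-10}"
  awk -F'\t' -v n="$n" '
    NR == 1 {
      # locate the columns by header text instead of hard-coding positions
      for (i = 1; i <= NF; i++) {
        if ($i == "Motif Name") name_col = i
        if ($i ~ /^q-value/)    q_col = i
      }
      next
    }
    NR - 1 <= n { print $name_col "\t" $q_col }
  ' "$file"
}
```

Usage: `top_known_motifs ER_MotifOutput/knownResults.txt 5`.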

[Figure: example homerResults.html page]

Reference: http://homer.ucsd.edu/homer/motif/index.html

3.2 Finding motifs in RNA sequences

The difference from DNA motif finding is that you add the -rna option to findMotifs.pl or findMotifsGenome.pl. HOMER then searches only the + strand of the RNA and matches and displays U instead of T.

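Note: HOMER does not yet include a library of RNA motifs, so known-motif analysis is not supported. If you use FASTA input, encode the sequences with T (the DNA alphabet) rather than U.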
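If your FASTA sequences are written with U, they can be converted before running HOMER. A minimal sketch (the helper name `rna_to_dna_fasta` is hypothetical) that rewrites U/u to T/t on sequence lines only, leaving header lines that start with ">" untouched:

```bash
# rna_to_dna_fasta FILE
# Hypothetical helper: print FILE with U->T and u->t on non-header lines.
rna_to_dna_fasta() {
  awk '/^>/ { print; next }            # header line: keep as-is
       { gsub(/U/, "T"); gsub(/u/, "t"); print }' "$1"
}
```

Usage: `rna_to_dna_fasta targets_rna.fa > targets_dna.fa` (file names are placeholders).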

Example 1:

# find motifs enriched in the human mRNAs of the target genes (here, genes down-regulated by miR-1)
findMotifs.pl mir1-downregulated.genes.txt human-mRNA MotifOutput/ -rna -len 8

Result:

[Figure: findMotifs.pl output for this example]
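findMotifs.pl takes a plain list of gene IDs, one per line, such as mir1-downregulated.genes.txt in Example 1. One way to build such a list is from a differential expression table (for example one exported from DESeq2 or edgeR). The sketch below assumes a hypothetical tab-separated table with a header line, gene IDs in column 1 and log2 fold change in column 2; the helper name and the default cutoff of -1 are illustrative choices, and the IDs must be of a type HOMER's organism package recognizes.

```bash
# down_genes TABLE [CUTOFF]
# Hypothetical helper: print IDs (column 1) whose log2 fold change (column 2)
# is below CUTOFF (default -1); the header line and "NA" rows are skipped.
down_genes() {
  local table="$1" cutoff="${2:--1}"
  # "+ 0" forces a numeric comparison
  awk -F'\t' -v cut="$cutoff" \
    'NR > 1 && $2 != "NA" && ($2 + 0) < (cut + 0) { print $1 }' "$table"
}
```

Usage: `down_genes de_results.tsv -1 > mir1-downregulated.genes.txt` (de_results.tsv is a placeholder name).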

Example 2:

# find RNA motifs in CLIP-Seq data
findMotifsGenome.pl fox2.clip.bed hg17 MotifOutput -rna

Result (a UGCAUG FOX motif):

[Figure: findMotifsGenome.pl output for the CLIP-Seq example]

3.3 Genome-wide distribution of a known motif

Use scanMotifGenomeWide.pl:

scanMotifGenomeWide.pl <motif file> <genome> [options] 

# e.g. genome-wide sites of a known motif (PU.1, pu1.motif) in mouse mm10
scanMotifGenomeWide.pl pu1.motif mm10 -bed > pu1.sites.mm10.bed
# -bed : Output file will be in BED format - useful when you want to upload to the UCSC browser.
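Because the -bed output is an ordinary BED file, simple summaries are easy. For example, a quick count of motif sites per chromosome. This is a hypothetical helper; it skips "track" and "#" lines in case the file starts with a header, without assuming that HOMER writes one.

```bash
# sites_per_chrom BED
# Hypothetical helper: print "chrom <TAB> number of sites", most sites first.
sites_per_chrom() {
  awk -F'\t' '/^(track|#)/ { next }      # skip optional header lines
              { n[$1]++ }
              END { for (c in n) print c "\t" n[c] }' "$1" |
    sort -t $'\t' -k2,2nr -k1,1
}
```

Usage: `sites_per_chrom pu1.sites.mm10.bed`.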

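Compared with MEME, I personally find HOMER easier to use.
To find motifs with MEME, you first need the peak coordinates in BED format in order to extract the FASTA sequences.
MEME: http://meme-suite.org/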
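The extra step MEME needs (turning BED coordinates into FASTA) is normally done with a tool such as `bedtools getfasta`. To show what that step does, here is a minimal awk sketch, a hypothetical helper rather than a replacement for bedtools: it loads a small genome FASTA into memory and prints the sequence of each BED interval (0-based start, end exclusive), named chrom:start-end. It ignores strand and is only suitable for small files.

```bash
# bed_to_fasta GENOME_FASTA BED
# Hypothetical helper: extract the sequence of each BED interval.
bed_to_fasta() {
  local fasta="$1" bed="$2"
  awk 'BEGIN { OFS = "" }
       FNR == NR {                      # first file: the genome FASTA
         if ($0 ~ /^>/) { name = substr($1, 2); next }
         seq[name] = seq[name] $0; next
       }
       {                                # second file: BED (chrom, start, end)
         print ">", $1, ":", $2, "-", $3
         print substr(seq[$1], $2 + 1, $3 - $2)
       }' "$fasta" "$bed"
}
```

Usage: `bed_to_fasta genome.fa peaks.bed > peaks.fa` (file names are placeholders).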

Thanks for your attention!