Building a gene expression atlas

Author: 小杜的生信筆記 | Published: 2023-05-14 19:45


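Preface

Hi everyone, I'm Xiao Du! I have had a lot going on lately, so tutorials on this public account have been appearing irregularly. As I said in earlier posts, sharing is only one part of my learning. I do not do this full time (PS: I could not keep it up if I did), and my posts are simply a record of my own learning process. So I only put together and share tutorials when I have spare time (PS: always outside working hours). After all, doing my day job well is what gives me the "capacity" to keep sharing, right?

As a result, questions sent through the account backend mostly get answered only after a long delay. There is not much I can do about it: one person's energy is limited and I cannot get to everything. That said, a small number of readers have gone a bit too far in their private messages and comments... Nothing to be done! I cannot make everyone happy and will not try to, because my abilities are limited. If you find my tutorials useful, keep following; if not, simply unfollow.

People who run public accounts have a strong spirit of sharing, but not everyone is that selfless, so please be kind. For small accounts like ours, the only income comes from promotions and readers' tips (PS: so please do not be too annoyed by the promotions; they are chosen to fit this account's focus, and if you need something you can contact the relevant staff).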

Species gene expression atlas

Paper URL:

https://academic.oup.com/bioinformatics/article/33/15/2397/3096436?login=false

GitHub URL:

https://github.com/solgenomics/Tea/tree/master

I came across this by chance while searching for papers today, although I had already used the website long ago.

http://tea.solgenomics.net/

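What the expression atlas website is for

As I understand it, the site exists to visualize the expression of each gene across the crop's tissues, which makes it easy for those of us working in this area to gauge the expression level of a particular gene.

Click through to the Expression Viewer and you will see the main interface.

There are four datasets in total (almost all covering fruit development through ripening).

I presented the 2018 paper, published in NC (Nature Communications), at a group meeting back when I was a student. The 2022 one I looked up for my current research. The entry highlighted in green in the middle is the paper I have not been able to find; I spent today looking for it.
Below that are filters for fruit developmental stage, Organ, Tissues, and Treatment; it is very thorough. You can also upload data from your own related research, for example for Tomato.
To select genes, the top of the page accepts a gene ID, a BLAST query, or a gene set.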
Visualization

Gene expression level

Visualization in tissues

Scatter plot

Heatmap

--
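The results you get differ depending on which dataset you select.

The drought-stress data include datasets for the stress treatment group and for leaves.

Pretty cool, right? So how is a resource like this actually built?
We can look at the authors' tutorial, which is also shared on GitHub, though completing it takes strong programming, plotting, and bioinformatics skills. It is also really a team effort; I imagine it would be very hard to do alone.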

Title:

The Tomato Expression Atlas

How to build one, following the GitHub tutorial

URL:

https://github.com/solgenomics/Tea/tree/master

Platform requirements

Install Catalyst, Perl and R dependencies
This web tool was developed using the Perl framework Catalyst (http://www.catalystframework.org), so to run the application it is necessary to install Perl, Catalyst and its dependencies.

Check this link in case of doubts installing Catalyst (http://www.catalystframework.org/#install).

To install Catalyst using cpanm, just execute: cpanm Catalyst::Devel

Also, if you are installing it on a new machine you may need to install cpanminus, gcc and make, and then some Perl dependencies like Catalyst, Lucy and Mason:

sudo aptitude install cpanminus
sudo aptitude install make
sudo aptitude install gcc
sudo aptitude install r-base
sudo aptitude install r-base-dev
sudo aptitude install postgresql
sudo aptitude install postgresql-server-dev-11    
cpanm -L ~/local-lib/ Catalyst::Devel
cpanm -L ~/local-lib/ Catalyst::Runtime
cpanm -L ~/local-lib/ Mason
cpanm -L ~/local-lib/ Statistics::R
cpanm -L ~/local-lib/ Catalyst::ScriptRunner
cpanm -L ~/local-lib/ Catalyst::Controller::REST
cpanm -L ~/local-lib/ Catalyst::View::HTML::Mason
cpanm -L ~/local-lib/ Lucy::Simple
cpanm -L ~/local-lib/ Array::Utils
cpanm -L ~/local-lib/ DBIx::Class
cpanm -L ~/local-lib/ Bio::Perl
cpanm -L ~/local-lib/ Bio::BLAST::Database
cpanm -L ~/local-lib/ DBD::Pg

If you are having trouble installing cpanm, there may be an issue with your system's dependencies. Visit (https://library.linode.com/linux-tools/utilities/cpanm) for help with installing dependencies.

In case local-lib is not in the path, you have to add the following line in the .bashrc file (for a local-lib in your home)

 export PERL5LIB=/home/username/local-lib/lib/perl5:$PERL5LIB

You might also need to add the next line to your .bashrc

export PERL5LIB=$PERL5LIB:/home/username/path_to_tea/Tea/

Do not forget to source .bashrc to be sure these changes take effect.
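
Once PERL5LIB is set, it can help to confirm that the modules from the cpanm list above actually load. The following Python sketch is not part of the TEA repository; `TEA_PERL_MODULES` and `missing_perl_modules` are names made up for this illustration. It relies only on the standard `perl -MModule -e 1` idiom, which exits with a non-zero status when a module cannot be loaded.

```python
import subprocess

# Module names copied from the cpanm commands listed earlier in this section.
TEA_PERL_MODULES = [
    "Catalyst::Devel", "Catalyst::Runtime", "Mason", "Statistics::R",
    "Catalyst::ScriptRunner", "Catalyst::Controller::REST",
    "Catalyst::View::HTML::Mason", "Lucy::Simple", "Array::Utils",
    "DBIx::Class", "Bio::Perl", "Bio::BLAST::Database", "DBD::Pg",
]


def missing_perl_modules(modules, run=subprocess.run):
    """Return the modules that `perl -MName -e 1` fails to load.

    `run` is injectable so the function can be exercised without Perl.
    If the perl executable itself is absent, subprocess.run raises
    FileNotFoundError, which is deliberately left to propagate.
    """
    missing = []
    for module in modules:
        result = run(["perl", f"-M{module}", "-e", "1"], capture_output=True)
        if result.returncode != 0:
            missing.append(module)
    return missing
```

Calling `missing_perl_modules(TEA_PERL_MODULES)` in a shell where .bashrc has been sourced should return an empty list when everything is installed; any names it returns are candidates for another `cpanm -L ~/local-lib/` run.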

R v3 must be installed for the interactive heatmap. The R libraries 'd3heatmap', 'NOISeq' and 'htmlwidgets' should also be installed.

Clone Github repository
Go to the TEA repository at GitHub (https://github.com/solgenomics/Tea) and copy the link to clone this repository.

Go to your terminal, to the folder where you want to clone this repository and use the next command (using the link copied from the web):

git clone git@github.com:solgenomics/Tea.git

git clone https://github.com/solgenomics/Tea.git

You can run the local server to check Catalyst is running fine. If you are running it on a server, you should also check that the Apache or Nginx configuration is correct and the ports are open on the firewall.

Go to the folder Tea, created when you cloned the repository, and run the server to check if all the dependencies are installed.

cd Tea/
script/tea_server.pl -r -d --fork

If you get an error, you will probably need to go back to step one and install some dependencies.
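
If the server starts without errors, a quick request can confirm that it is answering. This is a minimal sketch, not something from the TEA documentation: `server_responds` is a hypothetical helper, and the URL assumes the Catalyst development server is listening on its usual default port 3000 on the local machine; adjust it if your setup differs.

```python
import urllib.request


def server_responds(url="http://localhost:3000/",
                    opener=urllib.request.urlopen, timeout=5):
    """Return True if `url` answers with a 2xx or 3xx HTTP status.

    Connection failures, timeouts and HTTP error statuses all surface as
    OSError subclasses from urllib, so they are reported as False.
    `opener` is injectable so the check can be tested offline.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        return False
```

A `False` result only says that the request did not succeed; the terminal output of `script/tea_server.pl` remains the place to look for the actual cause.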

Configuration file

dbhost localhost
dbname my_db
dbuser web_usr
dbpass password

expression_indexes_path /home/user/index_files/expression
correlation_indexes_path /home/user/index_files/correlation
loci_and_description_index_path /home/user/index_files/description

#path to mason folder to overwrite default front-end
<View::Mason>
  add_comp_root /home/user/path_to_new_mason_dir
</View::Mason>

nt_blastdb_path /home/user/blastdbs/cdna_file.fasta
prot_blastdb_path /home/user/blastdbs/prots_file.fasta
tmp_path /home/user/tea_tmp_files

default_gene gene_name
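
Before starting the server it can be worth checking that the directories named in the configuration exist. The sketch below is an illustration only: `parse_tea_config` and `missing_dirs` are hypothetical helpers, and the parser handles just the subset of syntax visible in the example above (`key value` lines, `#` comments, and `<Block>` ... `</Block>` sections), not the full format that Catalyst's configuration loader accepts.

```python
import os
import re


def parse_tea_config(text):
    """Parse `key value` lines and <Block> sections into nested dicts."""
    config = {}
    block = None  # dict of the block currently open, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if re.fullmatch(r"</.+>", line):
            block = None
            continue
        opening = re.fullmatch(r"<([^/].*)>", line)
        if opening:
            block = config.setdefault(opening.group(1), {})
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        target = block if block is not None else config
        target[parts[0]] = value
    return config


def missing_dirs(config, keys, isdir=os.path.isdir):
    """Return the keys that are unset or do not point to a directory."""
    return [k for k in keys if not config.get(k) or not isdir(config[k])]


# Keys from the example above that are expected to name directories.
DIR_KEYS = [
    "expression_indexes_path",
    "correlation_indexes_path",
    "loci_and_description_index_path",
    "tmp_path",
]
```

For example, `missing_dirs(parse_tea_config(open(path).read()), DIR_KEYS)` lists the settings that still need attention, where `path` is wherever you saved the configuration file. The BLAST database entries are left out of `DIR_KEYS` because they name database files rather than directories.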

Create database
Install PostgreSQL, create a database to store your project metadata and import the schema to the database:

On postgres terminal:

CREATE DATABASE my_db;

On Linux terminal create the database schema importing the file create_tea_schema.sql from import_project folder:

psql -U postgres -d my_db -h localhost -a -f create_tea_schema.sql
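
The psql options must be typed with plain ASCII hyphens; commands copied from web pages sometimes pick up typographic dashes, which psql does not recognize as options. One way to sidestep that is to build the argument list in code. The sketch below is an illustration under assumptions: `psql_import_command` and `import_schema` are hypothetical helper names, and the defaults simply mirror the database name, user and host used in the example command.

```python
import subprocess


def psql_import_command(sql_file, dbname="my_db", user="postgres",
                        host="localhost"):
    """Build the psql argument list for importing a schema file.

    -a echoes each input line and -f reads commands from `sql_file`.
    """
    return ["psql", "-U", user, "-d", dbname, "-h", host, "-a", "-f", sql_file]


def import_schema(sql_file, run=subprocess.run, **kwargs):
    """Run the import; check=True raises if psql exits with an error."""
    return run(psql_import_command(sql_file, **kwargs), check=True)
```

Run from the import_project folder, `import_schema("create_tea_schema.sql")` would execute the same command as above.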

Use TEA_project_template.txt and TEA_project_template_example.txt from import_project to create your project import file

# Please use one line per field and one file per project. Do not edit or remove any line starting with #

#organism
organism_species: Solanum lycopersicum
organism_variety: M82
organism_description: Tomato M82
# organism - end

#project
project_name: S. lycopersicum M82 Fruit Development
project_contact: Jocelyn Rose
project_description: Fruit development from anthesis to red ripe for whole fruit and for the cell types from the pericarp obtained by Laser Capture Microdissection (LCM)
expr_unit: RPM
index_dir_name: tomato_index
# project - end


# figure --- All info needed for a cluster of images (usually includes a stage and all its tissues). Copy this block as many times as you need (including as many tissue layer blocks as you need).
figure_name: 10DPA Total Pericarp
conditions: condition 1, condition 2
# write figure metadata

#stage layer
layer_name: 10DPA
layer_description: Ten days post anthesis
layer_type: stage
bg_color:
layer_image: slm82_fruit_10dpa_bg.png
image_width: 250
image_height: 500
cube_ordinal: 10
img_ordinal: 10
organ: fruit
# layer - end

#tissue layer
layer_name: Total_Pericarp
layer_description:
layer_type: tissue
bg_color:
layer_image: cassava_leaf.png
image_width: 250
image_height: 500
cube_ordinal: 100
img_ordinal: 100
organ: fruit
# layer - end

# figure - end
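
To make the layout of the template easier to see, here is a minimal Python sketch that reads the portion shown above into nested dictionaries. It is not the import script from the TEA repository; `parse_project_template` is a hypothetical helper, and the rules it applies (a `figure_name:` line opens a figure, a `layer_name:` line opens a layer, `# layer - end` and `# figure - end` close them) are inferred from this excerpt only.

```python
def parse_project_template(text):
    """Group `key: value` lines into project fields, figures and layers."""
    project = {"fields": {}, "figures": []}
    figure = None  # figure currently being filled, if any
    layer = None   # layer currently being filled, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            marker = line.lstrip("#").strip().lower()
            if marker.startswith("layer - end"):
                layer = None
            elif marker.startswith("figure - end"):
                figure = None
                layer = None
            continue  # every other comment line is ignored
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a `key: value` line
        key, value = key.strip(), value.strip()
        if key == "figure_name":
            figure = {"fields": {}, "layers": []}
            project["figures"].append(figure)
            layer = None
        elif key == "layer_name" and figure is not None:
            layer = {}
            figure["layers"].append(layer)
        if layer is not None:
            layer[key] = value
        elif figure is not None:
            figure["fields"][key] = value
        else:
            project["fields"][key] = value
    return project
```

Applied to the excerpt above, the result has the organism and project fields at the top level and one figure ("10DPA Total Pericarp") holding two layers, one of type `stage` and one of type `tissue`. Empty values such as `bg_color:` are kept as empty strings.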

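There is more after this; see the rest on GitHub.

https://tea.solgenomics.net/ really is a treasure trove of a site; there is a lot there to discover for yourself.

Request for help:
If you have read this far and can obtain or locate the paper behind the drought-stress dataset in the database, please leave a message via the account backend. Thanks!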


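小杜的生信筆記 mainly publishes or collects bioinformatics tutorials, along with R-based analysis and visualization (data analysis, figure drawing, and so on), and shares papers and learning materials of interest!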
