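Introduction
A common problem with genus abundance tables: the same genus name can appear under different families. Using such genera as row names makes R throw an error, because R requires data frame row names to be unique. A good fix is to use the family_genus combination as the genus name. Let's write a function that handles this in one step.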
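The uniqueness requirement can be reproduced with a small sketch. The table below is a made-up toy example (the names `FamilyA`, `FamilyB` and `GenusX` are hypothetical placeholders, not real taxa); it only illustrates that repeated genus names are rejected as row names while the family_genus combination is accepted.

```r
# Hypothetical toy table: the same genus name appears under two families.
tab <- data.frame(
  family = c("FamilyA", "FamilyB"),
  genus = c("GenusX", "GenusX"),
  sample_1 = c(5, 7)
)

# Using the bare genus as row names fails because the values repeat.
result <- tryCatch(
  {
    rownames(tab) <- tab$genus
    "ok"
  },
  error = function(e) "error"
)
print(result)

# Combining family and genus gives unique names, so the assignment works.
rownames(tab) <- paste(tab$family, tab$genus, sep = "_")
print(rownames(tab))
```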
1. Data
Taxonomy = c("Prevotella", "Staphylococcus", "Ralstonia")
sample_1 = c(1, 2, 3)
sample_2 = c(4, 5, 6)
Tax_detail = c("k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Prevotellaceae;g__Prevotella;", "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;", "k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Oxalobacteraceae;g__Ralstonia;")
data = data.frame(Taxonomy, sample_1, sample_2, Tax_detail)
data
![](https://img.haomeiwen.com/i19404765/b5f262712d5a59c3.png)
2. Building the family_genus names
# Function
handle = function(data, prefix="name")
{
  # Remove taxa with unclear annotation (pattern is matched against the last column)
  unknown = grep("Unspecified|unclassified|metagenome", data[, length(data[1,])])
  data = data[-unknown,]
  # Extract family and genus
  new_name = c()
  for(i in 1:length(data[,1]))
  {
    input = as.character(data$Tax_detail[i])
    new_name = c(new_name, paste(unlist(strsplit(input, split="__|;"))[c(10, 12)], collapse="_"))
  }
  # Save the mapping file (step omitted in this listing)
  # Drop the first column and the last column, use family_genus as row names
  data = data[, -c(1, length(data[1,]))]
  rownames(data) = new_name
  # Return value
  return(data)
}
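To see why the function picks positions 10 and 12, it helps to split a single lineage string by hand. This is a small sketch using the first `Tax_detail` value from the data above; the variable names `lineage`, `tokens` and `family_genus` are introduced here for illustration only.

```r
# One lineage string in the same format as the Tax_detail column.
lineage <- "k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Prevotellaceae;g__Prevotella;"

# Splitting on "__" or ";" yields alternating rank letters and taxon names.
tokens <- unlist(strsplit(lineage, split = "__|;"))
print(tokens)

# With all six ranks present, positions 10 and 12 are the family and genus names.
family_genus <- paste(tokens[c(10, 12)], collapse = "_")
print(family_genus)
```

The fixed positions only hold when every string lists the ranks k, p, c, o, f, g in this exact layout; a lineage with a missing or extra rank would shift the positions.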
3. Running it and the result
data2 = handle(data, "data")
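Error message: Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length
The call fails on this toy data, yet the function works fine on data read from an external file, and running the body line by line outside the function wrapper also works.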
![](https://img.haomeiwen.com/i19404765/499b653d5146c2b3.png)
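A likely cause of the error, given the code above: in the toy data no row matches the filter pattern, so `grep()` returns `integer(0)`, and `data[-unknown, ]` with an empty index keeps zero rows instead of all of them. The loop over `1:length(data[,1])` then iterates over `1:0`, that is twice, so `new_name` ends up with two entries for a table with zero rows, and the row name assignment fails. Real tables usually contain at least one unclassified row, which would explain why external data works. The sketch below shows the behaviour and one possible variant using a logical filter; `handle_fixed` is a hypothetical name, and the `prefix` argument and mapping-file step are left out.

```r
# Same toy table as in section 1.
data <- data.frame(
  Taxonomy = c("Prevotella", "Staphylococcus", "Ralstonia"),
  sample_1 = c(1, 2, 3),
  sample_2 = c(4, 5, 6),
  Tax_detail = c(
    "k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Prevotellaceae;g__Prevotella;",
    "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;",
    "k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Oxalobacteraceae;g__Ralstonia;"
  )
)

# No row matches, so grep() returns integer(0) ...
unknown <- grep("Unspecified|unclassified|metagenome", data$Tax_detail)
print(length(unknown))
# ... and negative indexing with an empty vector keeps zero rows.
print(nrow(data[-unknown, ]))

# handle_fixed (hypothetical name): filter with a logical vector instead,
# which behaves the same whether or not anything matches.
handle_fixed <- function(data) {
  last <- ncol(data)
  keep <- !grepl("Unspecified|unclassified|metagenome", data[, last])
  data <- data[keep, , drop = FALSE]
  # Build family_genus for every remaining row without an explicit loop.
  new_name <- vapply(
    strsplit(as.character(data$Tax_detail), split = "__|;"),
    function(tokens) paste(tokens[c(10, 12)], collapse = "_"),
    character(1)
  )
  # Drop the first and last columns, then use family_genus as row names.
  data <- data[, -c(1, last), drop = FALSE]
  rownames(data) <- new_name
  data
}

data2 <- handle_fixed(data)
print(data2)
```

The logical filter `data[keep, ]` avoids the special case entirely, and `vapply()` over the split strings returns an empty character vector when no rows remain, so the lengths always agree.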