Error when reading a table into R

Author: CrimsonUMO | Published 2023-03-23 12:16

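A senior labmate sent me some new transcriptome data to process. It is a 103 MB txt file, and since my little computer is slow, I tried reading it straight into R to see what would happen.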

exp <- data.frame(read.table("./2023年转录组/Expression_with_annotation-XHW78.txt"))

But it produced the following error:

Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec,  : 
  invalid multibyte string at '<ff><fe><47>'
In addition: Warning messages:
1: In read.table("./2023年转录组/Expression_with_annotation.txt") :
  line 1 appears to contain embedded nulls
2: In read.table("./2023年转录组/Expression_with_annotation.txt") :
  line 2 appears to contain embedded nulls
3: In read.table("./2023年转录组/Expression_with_annotation.txt") :
  line 3 appears to contain embedded nulls
4: In read.table("./2023年转录组/Expression_with_annotation.txt") :
  line 4 appears to contain embedded nulls
5: In read.table("./2023年转录组/Expression_with_annotation.txt") :
  line 5 appears to contain embedded nulls
6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  embedded nul(s) found in input

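At first I did not read the error carefully and assumed it was caused by empty values, so out of habit I filled in sep, fill, and header and tried again. It still failed. This error in particular looked very suspicious: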

Error in make.names(col.names, unique = TRUE) : 
  invalid multibyte string at '<ff><fe><47>'
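
One quick way to check what the '<ff><fe><47>' in this message refers to is to look at the first raw bytes of the file, before R tries to decode anything. The following is a minimal sketch, not the original data: it writes a tiny stand-in file whose opening bytes are assumed to mimic the problem file (FF FE, then "G" stored as 47 00), and reads them back with readBin().

```r
# Stand-in file, NOT the author's data: FF FE followed by "G" stored as
# two bytes (47 00). The 00 byte is an assumption based on the
# "embedded nul" warnings shown earlier.
tmp <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xff, 0xfe, 0x47, 0x00)), tmp)

# Read the first four bytes as raw values, with no text decoding at all
first_bytes <- readBin(tmp, what = "raw", n = 4)
print(first_bytes)
#> [1] ff fe 47 00
```

For a real file, passing its path to readBin() in place of tmp shows the same kind of byte prefix.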

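So I searched around, and one explanation was that the problem is related to encoding.
The encoding argument of read.table() offers two encodings to choose from: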

encoding to be assumed for input strings. It is used to mark character strings as known to be in Latin-1 or UTF-8 (see Encoding): it is not used to re-encode the input, but allows R to handle encoded strings in their native encoding (if one of those two). See ‘Value’ and ‘Note’.

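The text above is from the help page, which mentions two options: Latin-1 and UTF-8. So I tried both, and neither worked. In hindsight that matches the help text itself: encoding only marks strings and does not re-encode the input.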
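
The difference between marking a string and re-encoding it can be seen on a single string. This is only an illustrative sketch: the word "café" and the variable names are made up for the example and have nothing to do with the transcriptome file.

```r
# "café" stored as Latin-1 bytes: 63 61 66 e9 (illustrative string only)
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
bytes_before <- charToRaw(x)

# Marking the string as Latin-1 only changes the declared encoding
Encoding(x) <- "latin1"
bytes_after <- charToRaw(x)
same_bytes <- identical(bytes_before, bytes_after)  # the bytes are untouched

# Re-encoding is a separate step that actually rewrites the bytes
utf8_bytes <- charToRaw(enc2utf8(x))
print(utf8_bytes)
```

Marking leaves the four bytes as they were, while enc2utf8() replaces the single e9 byte with the two-byte UTF-8 sequence c3 a9. A file stored in an encoding other than Latin-1 or UTF-8 therefore cannot be repaired by the encoding argument alone.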

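Searching further, I found people suggesting that the cause might be the header bytes of the txt file, or that the file is not encoded as UTF-8. So I did a Save As, and that was indeed it: the Save As dialog showed the document's encoding as UTF-16 by default. That explains the messages: ff fe is the UTF-16 little-endian byte order mark, and UTF-16 stores each ASCII character with an extra zero byte, hence the "embedded nul" warnings. After saving the file as UTF-8, it could be read in.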
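
As an alternative to re-saving the file by hand, read.table() also has a fileEncoding argument which, unlike encoding, does re-encode the input while reading. The sketch below is written under several assumptions: it builds a small tab-separated stand-in file in UTF-16LE with a byte order mark (the gene names and values are invented), and it assumes the real file is also little-endian and tab-separated. As far as I know, R strips the byte order mark itself when the encoding is given as "UTF-16LE"; if the first column name comes back with a stray prefix, that assumption does not hold on your setup.

```r
# Stand-in file with invented content: UTF-16LE text preceded by the FF FE mark
tmp <- tempfile(fileext = ".txt")
txt <- "Gene\tSampleA\nTP53\t12.5\nBRCA1\t3.0\n"
body <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), body), tmp)

# fileEncoding converts the file's bytes to the session encoding while reading
dat <- read.table(tmp, header = TRUE, sep = "\t", fileEncoding = "UTF-16LE")
print(dat)

# Optional: write a UTF-8 copy, the scripted equivalent of "Save As UTF-8"
utf8_copy <- tempfile(fileext = ".txt")
write.table(dat, utf8_copy, sep = "\t", quote = FALSE,
            row.names = FALSE, fileEncoding = "UTF-8")
dat2 <- read.table(utf8_copy, header = TRUE, sep = "\t")
```

For the real data, the path to the original txt file would take the place of tmp, with sep and header adjusted to match the actual layout.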
