Error when reading a table into R

Author: CrimsonUMO | Published 2023-03-23 12:16

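A senior labmate sent me some new transcriptome data to process: a 103 MB txt file. My little computer is slow, so I tried reading the file straight into R.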

exp <- data.frame(read.table("./2023年转录组/Expression_with_annotation-XHW78.txt"))

But I got the following error:

Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec,  : 
  invalid multibyte string at '<ff><fe><47>'
In addition: Warning messages:
1: In read.table("./2023年转录组/Expression_with_annotation.txt") :
  line 1 appears to contain embedded nulls
2: In read.table("./2023年转录组/Expression_with_annotation.txt") :
  line 2 appears to contain embedded nulls
3: In read.table("./2023年转录组/Expression_with_annotation.txt") :
  line 3 appears to contain embedded nulls
4: In read.table("./2023年转录组/Expression_with_annotation.txt") :
  line 4 appears to contain embedded nulls
5: In read.table("./2023年转录组/Expression_with_annotation.txt") :
  line 5 appears to contain embedded nulls
6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  embedded nul(s) found in input

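At first I didn't read the error carefully and assumed it was caused by empty values, so out of habit I set sep, fill and header and tried again. It still failed. This error in particular looked very suspicious: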

Error in make.names(col.names, unique = TRUE) : 
  invalid multibyte string at '<ff><fe><47>'
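
The bytes quoted in that message are a useful clue: ff fe is the byte order mark (BOM) that UTF-16 little-endian files commonly start with, and 47 is the letter "G". A minimal sketch of checking a file's first bytes from R is below. The helper name detect_bom and the throwaway demo file are assumptions made for illustration, not part of the original workflow, and the helper does not try to tell UTF-16 apart from UTF-32.

```r
# Hypothetical helper: guess a file's encoding from its byte order mark (BOM).
# It only looks at the first three bytes and does not distinguish UTF-32.
detect_bom <- function(path) {
  b <- readBin(path, what = "raw", n = 3L)
  starts_with <- function(sig) {
    length(b) >= length(sig) && all(b[seq_along(sig)] == as.raw(sig))
  }
  if (starts_with(c(0xef, 0xbb, 0xbf))) return("UTF-8 with BOM")
  if (starts_with(c(0xff, 0xfe))) return("UTF-16LE")
  if (starts_with(c(0xfe, 0xff))) return("UTF-16BE")
  "no BOM found"
}

# Demo on a throwaway file that starts with ff fe followed by "G" in UTF-16LE,
# mimicking the bytes shown in the error message.
demo_path <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xff, 0xfe, 0x47, 0x00)), demo_path)
bom_guess <- detect_bom(demo_path)
print(bom_guess)
```

Running detect_bom() on the real expression file would be the analogous check. A file without a BOM can still be UTF-16, so "no BOM found" is not conclusive.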

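So I searched around, and one explanation was that the problem is related to encoding.
The encoding argument of read.table() offers two encodings to choose from: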

encoding: encoding to be assumed for input strings. It is used to mark character strings as known to be in Latin-1 or UTF-8 (see Encoding): it is not used to re-encode the input, but allows R to handle encoded strings in their native encoding (if one of those two). See ‘Value’ and ‘Note’.

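That is the description from the help page. It mentions two options, Latin-1 and UTF-8, so I tried both. Neither worked. In hindsight this fits the help text: encoding only marks strings and does not re-encode the input, so it cannot decode a file that is stored in some other encoding.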
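
To make "mark, not re-encode" from the help text concrete, here is a small sketch using base R's Encoding() and iconv(). The sample string is made up for illustration. Marking leaves the underlying bytes untouched, while iconv() actually converts them.

```r
# "caf\xe9" is "café" written as Latin-1 bytes (e9 is é in Latin-1).
x <- "caf\xe9"
bytes_before <- charToRaw(x)

# Marking the string as Latin-1 changes only the label, not the bytes.
Encoding(x) <- "latin1"
bytes_after <- charToRaw(x)
print(identical(bytes_before, bytes_after))

# iconv() re-encodes: in UTF-8 the é takes two bytes instead of one.
y <- iconv(x, from = "latin1", to = "UTF-8")
print(charToRaw(y))
```

The encoding argument of read.table() behaves like the marking step here, which would explain why neither value helps with a file that is neither Latin-1 nor UTF-8.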

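Searching further, I found people suggesting that the cause might be the header bytes of the txt file, or that the file is not encoded in UTF-8. So I opened the file and used Save As, and that was indeed the case: the Save As dialog showed the document's current encoding as UTF-16. This matches the <ff><fe> bytes at the start of the error message, which are the UTF-16 little-endian byte order mark. After I saved a copy as UTF-8, the file could be read in.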
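
Re-saving is not the only route. read.table() also has a fileEncoding argument, which, unlike encoding, re-encodes the input while reading. The sketch below builds a tiny UTF-16LE file to stand in for the real data and reads it directly. The demo contents, the tab separator and the header row are assumptions for illustration, since the layout of the real file is not shown here.

```r
# Build a tiny UTF-16LE file with a BOM to stand in for the real data.
demo_path <- tempfile(fileext = ".txt")
body <- iconv("Gene\tS1\nA\t1\nB\t2\n",
              from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), body), demo_path)

# fileEncoding tells R how the file is stored, so it is decoded on the fly.
# sep and header are assumptions about the layout of this demo file.
demo <- read.table(demo_path, header = TRUE, sep = "\t",
                   fileEncoding = "UTF-16LE")
print(demo)
```

For the real file, the analogous call would pass fileEncoding = "UTF-16LE" to the original read.table() call, with sep and header set to whatever that file actually uses. I have not tested this on the 103 MB file.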