Play with data: Batch-Reading and Merging Data

Author: Bio_Infor | Published: 2022-11-20 20:26

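Bio_Infor's comeback post

This is a very short post, but I still think it is useful. When the day comes that you need it, you may well feel the same.

Background

Suppose we have 1,000 files whose columns all share the same layout, that is, each column carries the same kind of information in every file. That means the files can be stacked on top of one another, column matched to column, into a single table. This is only a simple example, of course; your situation may be more complicated than just reading files in bulk and merging them.

Solutions

Bronze tier

Everyone knows the bronze-tier solution without needing an example: read the files one at a time, then rbind() them. If that doesn't strike you as tedious, go ahead; nobody will stop you.

Gold tier

Gold-tier players have their own approach, such as combining shell or Perl with R. With shell, they will most likely do this: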

#in shell
cat *.txt > combine.txt
#in R
data <- read.table(file = 'combine.txt', ...)

Others, of course, will solve it with Perl:

#perl script
#!/usr/bin/perl

use 5.010;
use strict;
use warnings;
use autodie;
use utf8;

#this script can be used to combine several files;
#the format of use:
#   combine.pl [files] [dest.files]
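# require at least one input file plus the destination file;
# with fewer arguments, <> would silently wait on STDIN
if (@ARGV < 2 || $ARGV[0] eq "--help" || $ARGV[0] eq "-h"){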
    die "The usage of this script:\n\t$0 [input files] [dest files]\n";
}

my $out = pop @ARGV;

open my $out_fh, '>>', $out;
while (<>){
    print { $out_fh } $_;
}

close $out_fh;

Then call the script:

combine.pl *.txt combine.txt

Then simply read the result in R.
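One caveat with plain concatenation: if every input file starts with a header line, the combined file contains that header once per input file. The following is a minimal sketch of the effect and one way to clean it up. It assumes tab-separated files with a header row; the file names and the columns `id` and `value` are made up for illustration, and `readLines()`/`writeLines()` stand in for `cat`.

```r
# Hypothetical example data: two tab-separated files, each with a header line.
tmp <- tempfile("combine_demo_")
dir.create(tmp)
writeLines(c("id\tvalue", "a\t1", "b\t2"), file.path(tmp, "part1.txt"))
writeLines(c("id\tvalue", "c\t3"), file.path(tmp, "part2.txt"))

# Emulate `cat *.txt > combine.txt`: append the files line by line.
parts <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)
all_lines <- unlist(lapply(parts, readLines))
combined_file <- file.path(tmp, "combine.tsv")
writeLines(all_lines, combined_file)

# With header = TRUE the first line becomes the column names, but the
# header of every later file survives as an ordinary data row.
raw <- read.table(combined_file, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)

# Drop the repeated header rows, then let R re-guess the column types.
clean <- raw[raw$id != "id", ]
clean[] <- lapply(clean, type.convert, as.is = TRUE)
rownames(clean) <- NULL
```

If the files have no header line, none of this is needed and the combined file can be read directly.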

Platinum tier

Platinum-tier players solve everything in R. The trick is simply making good use of the apply family of functions and Reduce():

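# list all .txt files in the working directory
# ('\\.txt$' matches only a literal ".txt" extension; 'txt$' would also match e.g. "footxt")
files <- list.files(path = './', pattern = '\\.txt$')
# read every file into a list of data frames
data <- lapply(files, FUN = function(file){
  read.table(file = file, ...)
})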
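To make this step concrete, here is a self-contained sketch. It assumes tab-separated files with a header row (the original leaves the `read.table()` arguments elided as `...`), and it writes three small hypothetical files to a temporary directory so that it can run anywhere. `full.names = TRUE` makes `list.files()` return paths that work regardless of the current working directory.

```r
# Hypothetical input: three small tab-separated files with identical columns.
demo_dir <- tempfile("batch_read_")
dir.create(demo_dir)
for (i in 1:3) {
  df <- data.frame(sample = paste0("s", i), count = c(i, i * 10))
  write.table(df, file.path(demo_dir, sprintf("file%d.txt", i)),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# List every .txt file; full.names = TRUE returns usable paths.
files <- list.files(path = demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file into a data frame; the result is a list of data frames.
data <- lapply(files, function(file) {
  read.table(file = file, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
})

# Name the list elements after their source files for easier inspection.
names(data) <- basename(files)
```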

Then merge them with Reduce():

combine <- Reduce(function(dtf1, dtf2)rbind(dtf1, dtf2), data)
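As a small illustration of what `Reduce()` does here, the sketch below uses three hypothetical in-memory data frames in place of the files read from disk. `Reduce()` folds the list pairwise, while `do.call(rbind, data)` is a common base-R alternative that passes every data frame to a single `rbind()` call. For data frames with matching columns the two should give the same result.

```r
# Hypothetical stand-ins for the data frames read from disk.
data <- list(
  data.frame(sample = "s1", count = 1:2),
  data.frame(sample = "s2", count = 3:4),
  data.frame(sample = "s3", count = 5:6)
)

# Pairwise fold: rbind(rbind(data[[1]], data[[2]]), data[[3]])
combine_reduce <- Reduce(rbind, data)

# Single call: rbind(data[[1]], data[[2]], data[[3]])
combine_docall <- do.call(rbind, data)
```

With many files the single-call form tends to be faster, because the pairwise fold copies the growing result at every step.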

Besides the base function Reduce(), the reduce() function from the purrr package does the same job.

library(magrittr)  # provides the %>% pipe
combine <- data %>% purrr::reduce(rbind)

Link to this article: https://www.haomeiwen.com/subject/iytixdtx.html