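Splitting
The built-in open function is all you need (pandas would also work). The code to split a large text file into smaller ones is as follows: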
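# Split large_file.txt into pieces of 10,000 lines each
output_file = None
with open("large_file.txt", "r") as input_file:
    for line_num, line in enumerate(input_file):
        # Start a new output file every 10,000 lines
        if line_num % 10000 == 0:
            if output_file is not None:
                output_file.close()
            output_file = open("output_file_{}.txt".format(line_num), "w")
        output_file.write(line)
# Close the last output file (none exists if the input was empty)
if output_file is not None:
    output_file.close()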
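This splits the large text file named "large_file.txt" into smaller files of 10,000 lines each; the last one may be shorter. Each smaller file is named "output_file_xxx.txt", where "xxx" is the starting line number of that file, counting from 0 (output_file_0.txt, output_file_10000.txt, and so on).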
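The same idea can be wrapped in a reusable function. What follows is only a minimal sketch, not the code used above: the function name split_file, its parameters, and the part_NNNNN.txt naming scheme are all assumptions made for illustration. It uses itertools.islice to pull a fixed number of lines at a time, so only one chunk is held in memory at once.

```python
import itertools
import os


def split_file(path, lines_per_file=10000, out_dir="."):
    """Split a text file into pieces of at most lines_per_file lines.

    Hypothetical helper: returns the list of files it created.
    """
    created = []
    with open(path, "r") as input_file:
        for part in itertools.count():
            # Read the next chunk of lines; an empty chunk means end of file
            chunk = list(itertools.islice(input_file, lines_per_file))
            if not chunk:
                break
            # Zero-padded part numbers keep the pieces in order when sorted by name
            out_path = os.path.join(out_dir, "part_{:05d}.txt".format(part))
            with open(out_path, "w") as output_file:
                output_file.writelines(chunk)
            created.append(out_path)
    return created
```

For example, split_file("large_file.txt") would write part_00000.txt, part_00001.txt, and so on into the current directory.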
Merging
The code to merge multiple text files into a single file:
import glob

output_name = "merged_file.txt"
# Get a list of all the files to be merged, skipping the output file itself
# in case it already exists; sorted so the merge order is alphabetical
file_list = sorted(name for name in glob.glob("*.txt") if name != output_name)
# Open the output file for writing
with open(output_name, "w") as output_file:
    # Iterate through each input file and write its contents to the output file
    for file_name in file_list:
        with open(file_name, "r") as input_file:
            for line in input_file:
                output_file.write(line)
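The glob module is used to get a list of all text files in the current directory with the ".txt" extension.
They are then merged line by line into "merged_file.txt".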
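For large inputs, copying in blocks rather than line by line is a common alternative. The sketch below is an illustration only: merge_files and its parameters are hypothetical names, and it swaps the line loop for shutil.copyfileobj from the standard library. It also leaves out the output file if it happens to match the pattern, and sorts the inputs so the order is predictable.

```python
import glob
import os
import shutil


def merge_files(pattern, output_path):
    """Concatenate every file matching pattern into output_path.

    Hypothetical helper: returns the list of input files, in merge order.
    """
    output_abs = os.path.abspath(output_path)
    # Sort for a predictable order and leave out the output file itself
    inputs = sorted(
        name for name in glob.glob(pattern)
        if os.path.abspath(name) != output_abs
    )
    # Binary mode copies the bytes unchanged, whatever the text encoding is
    with open(output_path, "wb") as output_file:
        for name in inputs:
            with open(name, "rb") as input_file:
                shutil.copyfileobj(input_file, output_file)
    return inputs
```

A call such as merge_files("*.txt", "merged_file.txt") would cover the same case as above. As with the line-by-line version, a file that does not end with a newline runs straight into the first line of the next file.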
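Overall, Python really is simpler than C++ for this; at the very least, the approach to writing the code is much easier to follow. I have not tested the performance yet, so I can't say how it compares.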
If this helped you, a like would be nice.