[Python] Office Automation: Extracting Data from Excel into Word Tables

Author: 半为花间酒 | Published: 2020-04-22 22:57


Please credit when reposting: 陈熹 chenx6542@foxmail.com (Jianshu ID: 半为花间酒)
To repost on a WeChat official account, please contact the official account 早起Python

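What you will mainly learn from this article:

Reading Excel content with openpyxl

Reading and writing Word files with docx

Small tricks you will pick up:

Getting the desktop path with os

Batch-converting doc to docx with win32com (Windows users only)

(A download link for the original data files is at the end of the article)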

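Today a reader of the 早起Python official account raised a request:
(Because the files are confidential, the actual content has been altered)

The data in each column has to be filled into a Word template according to certain rules. The rules and the template look roughly like this:

These are the parts that need to be filled in; the full template is somewhat more complex:

There is one more requirement: the final Word file must be named as follows:
Column C values, deduplicated and joined with & + G2 + sum of column V + column P values, deduplicated and joined with & + today's date (e.g. 2020年04月22日) + 验货报告 (inspection report)

Judging from the requirements and the file formats, reading, writing and parsing these files is fairly involved, and writing the code and thinking it through will take a while. So one question needs to be settled first:
Is the workload large, or will the task recur over a long period, so that Python can really free up your hands?

If not, the job can simply be done by hand, and automating it would be pointless

OK, let's start coding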

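1. Parsing the Excel data

Unzip the original data and put the resulting folder on the desktop
Of course, you can put it somewhere else; in that case specify the absolute path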

from openpyxl import load_workbook
import os

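# Get the path to the desktop
def GetDesktopPath():
    return os.path.join(os.path.expanduser("~"), 'Desktop')

path = GetDesktopPath() + '/资料/' # Build the folder path once so it can be reused later

workbook = load_workbook(filename=path + '数据.xlsx')
sheet = workbook.active # Get the active worksheet

# The data range can be obtained in code, which is also handy for batch processing in a loop
# Get the range that contains data
print(sheet.dimensions)
# A1:W10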
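
The string concatenation above relies on a hard-coded `/` separator. As a minimal sketch of the same path handling with `pathlib`, assuming the same `资料` folder and `数据.xlsx` file names (`get_data_dir` is a hypothetical helper name, not part of the original code):

```python
from pathlib import Path


def get_data_dir(folder_name="资料"):
    """Return <home>/Desktop/<folder_name> as a Path (hypothetical helper)."""
    return Path.home() / "Desktop" / folder_name


data_dir = get_data_dir()
xlsx_path = data_dir / "数据.xlsx"

# exists() shows up front whether the folder was unzipped to the expected place
print(xlsx_path, xlsx_path.exists())
```

`load_workbook` should accept the resulting path directly or via `str(xlsx_path)`.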

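openpyxl offers several ways to read cells:

cells = sheet['A1:A4']  # A1 to A4: a tuple of rows, each row itself a tuple of cells
cells = sheet['A'] # Column A: a flat tuple of cells
cells = sheet['A:C'] # Columns A to C: a tuple of columns
cells = sheet[5] # Row 5: a flat tuple of cells

# Note: a range such as 'A1:A4' returns a nested tuple
for cell in sheet['A1:A4']:
    print(cell[0].value) # Each item is a one-cell row tuple, so take element 0 to reach the value

# Get all cells in a range
# iter_cols can be used instead to iterate by column
for row in sheet.iter_rows(min_row=1, max_row=3,
                           min_col=2, max_col=4):
    for cell in row:
        print(cell.value)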
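
To make the different return shapes concrete, here is a small sketch that builds a throwaway in-memory workbook rather than reading `数据.xlsx`; the cell contents are made up for illustration:

```python
from openpyxl import Workbook

# Build a tiny in-memory sheet so the example does not depend on 数据.xlsx
wb = Workbook()
ws = wb.active
for r in range(1, 6):
    ws.append([f"A{r}", f"B{r}", f"C{r}"])

rng = ws["A1:A4"]   # tuple of 4 rows, each a 1-tuple of cells
col = ws["A"]       # flat tuple of the cells in column A
cols = ws["A:C"]    # tuple of 3 columns
row = ws[5]         # flat tuple of the cells in row 5

range_values = [r[0].value for r in rng]
column_values = [c.value for c in col]
row_values = [c.value for c in row]

# values_only=True yields plain values instead of Cell objects
plain_rows = list(ws.iter_rows(min_row=1, max_row=2, values_only=True))

print(range_values, column_values, row_values, plain_rows)
```

Only the range form needs the extra `[0]`; a single column or row can be iterated directly.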

Once the principle is clear, we can parse the data out of the Excel file

# SQE
SQE = sheet['Q2'].value

# Supplier & manufacturer
supplier = sheet['G2'].value

# Purchase order numbers
C2_10 = sheet['C2:C10'] # Returns a tuple of one-cell row tuples
# Use a list comprehension; the same pattern is used below
vC2_10 = [str(cell[0].value) for cell in C2_10]
# Simple dedupe with set (order is not preserved), joined with ',' for the Word table
order_num = ','.join(set(vC2_10))
# Simple dedupe with set, joined with '&' for the Word file name
order_num_title = '&'.join(set(vC2_10))

# Product models
T2_10 = sheet['T2:T10']
vT2_10 = [str(cell[0].value) for cell in T2_10]
ptype = ','.join(set(vT2_10))

# Product descriptions
P2_10 = sheet['P2:P10']
vP2_10 = [str(cell[0].value) for cell in P2_10]
info = ','.join(set(vP2_10))
info_title = '&'.join(set(vP2_10))

# Date
# Use the datetime library to get today's date and format it
import datetime
today = datetime.datetime.today()
time = today.strftime('%Y年%m月%d日')

# Inspected quantity
V2_10 = sheet['V2:V10']
vV2_10 = [int(cell[0].value) for cell in V2_10]
total_num = sum(vV2_10) # Total quantity

# Inspected box count
W2_10 = sheet['W2:W10']
vW2_10 = [int(cell[0].value) for cell in W2_10]
box_num = sum(vW2_10)


# Build the final Word file name
title = f'{order_num_title}-{supplier}-{total_num}-{info_title}-{time}-验货报告'

print(title)
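
One thing to be aware of with the `set` approach above: a set does not guarantee any ordering, so the joined order numbers (and therefore the file name) may come out in a different order from one run to the next. If a stable order matters, a possible alternative is to deduplicate with `dict.fromkeys`, which keeps first-seen order. The sketch below uses made-up values in place of column C, and `unique_in_order` is a hypothetical helper name:

```python
def unique_in_order(values):
    """Drop duplicates while keeping first-seen order (dicts keep insertion order in Python 3.7+)."""
    return list(dict.fromkeys(values))


# Sample values standing in for column C; these are made up for illustration
sample_orders = ["PO-2", "PO-1", "PO-2", "PO-3", "PO-1"]

ordered = unique_in_order(sample_orders)
order_num = ",".join(ordered)
order_num_title = "&".join(ordered)

print(order_num)
print(order_num_title)
```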

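That wraps up the Excel part. Next comes filling in the Word tables

Here we assume the Word file being read is in .docx format, although the reader's actual files were .doc

Windows users can batch-convert doc files with the code below, provided win32com is installed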

# pip install pypiwin32
from win32com import client

docx_path = path + '模板.docx'

# Function that converts doc to docx
def doc2docx(doc_path,docx_path):
    word = client.Dispatch("Word.Application")
    doc = word.Documents.Open(doc_path)
    doc.SaveAs(docx_path, 16) # 16 = wdFormatDocumentDefault (docx)
    doc.Close()
    word.Quit()
    print('\n doc file converted to docx \n')

if not os.path.exists(docx_path):
    doc2docx(docx_path[:-1], docx_path) # docx_path[:-1] drops the trailing 'x', giving the .doc path

There is no good solution for Mac yet; if you have ideas, feel free to share them
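
One option that may work on macOS or Linux is LibreOffice's headless converter. This was not tested in the original article and assumes LibreOffice is installed and its `soffice` executable is on the PATH (on macOS it usually lives inside the application bundle and is not on the PATH by default). `build_convert_command` and `doc_to_docx` are hypothetical helper names:

```python
import os
import shutil
import subprocess


def build_convert_command(doc_path, out_dir):
    """Build the LibreOffice headless command that converts doc_path to docx."""
    return ["soffice", "--headless", "--convert-to", "docx",
            "--outdir", out_dir, doc_path]


def doc_to_docx(doc_path):
    """Convert a .doc file next to itself; returns the expected .docx path."""
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found; is LibreOffice installed and on PATH?")
    out_dir = os.path.dirname(os.path.abspath(doc_path))
    subprocess.run(build_convert_command(doc_path, out_dir), check=True)
    return os.path.splitext(os.path.abspath(doc_path))[0] + ".docx"


# Only builds the command here; call doc_to_docx(...) to actually run the conversion
print(build_convert_command("模板.doc", "."))
```

Complex templates may not convert with identical formatting, so it is worth checking the resulting tables before filling them in.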

With a docx file in hand, we can carry on


docx_path = path + '模板.docx'

from docx import Document

# Instantiate
document = Document(docx_path)

# Read all tables in the Word document
tables = document.tables
# print(len(tables))
# 15

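Once you have worked out which table is which, you can fill them in accordingly

Tables are used much like in openpyxl; note that indices start at 0, as in native Python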

tables[0].cell(1, 1).text = SQE

tables[1].cell(1, 1).text = supplier
tables[1].cell(2, 1).text = supplier
tables[1].cell(3, 1).text = ptype
tables[1].cell(4, 1).text = info
tables[1].cell(5, 1).text = order_num
tables[1].cell(7, 1).text = time

for i in range(2, 11):
    tables[6].cell(i, 0).text = str(sheet[f'T{i}'].value)
    tables[6].cell(i, 1).text = str(sheet[f'P{i}'].value)
    tables[6].cell(i, 2).text = str(sheet[f'C{i}'].value)
    tables[6].cell(i, 4).text = str(sheet[f'V{i}'].value)
    tables[6].cell(i, 5).text = str(sheet[f'V{i}'].value)
    tables[6].cell(i, 6).text = '0'
    tables[6].cell(i, 7).text = str(sheet[f'W{i}'].value)
    tables[6].cell(i, 8).text = '0'

tables[6].cell(12, 4).text = str(total_num)
tables[6].cell(12, 5).text = str(total_num)
tables[6].cell(12, 7).text = str(box_num)

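Two details here:

Data written to Word must be a string, so values read from Excel need to be converted with str
This is also the most time- and energy-consuming part: the table may contain merged cells or other irregularities, so the row and column counts you see may not be the real ones, and you have to keep testing with code. The code above skips column index 3 (the fourth column); try it and see why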

for i in range(2, 11):
    tables[13].cell(i - 1, 0).text = str(sheet[f'T{i}'].value)
    tables[13].cell(i - 1, 1).text = str(sheet[f'U{i}'].value)
    tables[13].cell(i - 1, 2).text = str(sheet[f'U{i}'].value)
    tables[13].cell(i - 1, 3).text = str(sheet[f'U{i}'].value)

That largely completes the requirement. Remember to save

document.save(path + f'{title}.docx')
print('\nFile generated')
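
Because the file name is assembled from cell values, it can end up containing characters that Windows does not allow in file names (for example `/` or `:` in a supplier name or product description), in which case saving would fail. A possible safeguard, sketched with a hypothetical `safe_filename` helper and a made-up title:

```python
import re

# Characters Windows does not allow in file names
_ILLEGAL = r'[\\/:*?"<>|]'


def safe_filename(name, replacement="_"):
    """Replace characters that are illegal in Windows file names (hypothetical helper)."""
    return re.sub(_ILLEGAL, replacement, name).strip()


# Made-up title for illustration
title = "PO-1&PO-2-ACME/Ltd-120-cable: 2m-2020年04月22日-验货报告"
print(safe_filename(title) + ".docx")
```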

Finally, here is the complete code

from openpyxl import load_workbook
from docx import Document
import datetime
# pip install pypiwin32
# from win32com import client
import os


# Get the path to the desktop
def GetDesktopPath():
    return os.path.join(os.path.expanduser("~"), 'Desktop')

path = GetDesktopPath() + '/资料/' # Build the folder path once so it can be reused later

workbook = load_workbook(filename=path + '数据.xlsx')
sheet = workbook.active # Get the active worksheet

# Get the range that contains data
# print(sheet.dimensions)
# A1:W10

# SQE
SQE = sheet['Q2'].value

# Supplier & manufacturer
supplier = sheet['G2'].value

# Purchase order numbers
C2_10 = sheet['C2:C10'] # Returns a tuple of one-cell row tuples
vC2_10 = [str(cell[0].value) for cell in C2_10]
order_num = ','.join(set(vC2_10))
order_num_title = '&'.join(set(vC2_10))

# Product models
T2_10 = sheet['T2:T10']
vT2_10 = [str(cell[0].value) for cell in T2_10]
ptype = ','.join(set(vT2_10))

# Product descriptions
P2_10 = sheet['P2:P10']
vP2_10 = [str(cell[0].value) for cell in P2_10]
info = ','.join(set(vP2_10))
info_title = '&'.join(set(vP2_10))

# Date
today = datetime.datetime.today()
time = today.strftime('%Y年%m月%d日')

# Inspected quantity
V2_10 = sheet['V2:V10']
vV2_10 = [int(cell[0].value) for cell in V2_10]
total_num = sum(vV2_10) # Total quantity

# Inspected box count
W2_10 = sheet['W2:W10']
vW2_10 = [int(cell[0].value) for cell in W2_10]
box_num = sum(vW2_10)

title = f'{order_num_title}-{supplier}-{total_num}-{info_title}-{time}-验货报告'

print(title)

doc_path = path + '模板.doc'
docx_path = doc_path + 'x'

# Function that converts doc to docx
# def doc2docx(doc_path,docx_path):
#     word = client.Dispatch("Word.Application")
#     doc = word.Documents.Open(doc_path)
#     doc.SaveAs(docx_path, 16)
#     doc.Close()
#     word.Quit()
#     print('\n doc file converted to docx \n')

# if not os.path.exists(docx_path):
#     doc2docx(doc_path, docx_path)

document = Document(docx_path)

# Read all tables in the Word document
tables = document.tables
# print(len(tables))
# 15
# Start filling in the tables
tables[0].cell(1, 1).text = SQE

tables[1].cell(1, 1).text = supplier
tables[1].cell(2, 1).text = supplier
tables[1].cell(3, 1).text = ptype
tables[1].cell(4, 1).text = info
tables[1].cell(5, 1).text = order_num
tables[1].cell(7, 1).text = time

for i in range(2, 11):
    tables[6].cell(i, 0).text = str(sheet[f'T{i}'].value)
    tables[6].cell(i, 1).text = str(sheet[f'P{i}'].value)
    tables[6].cell(i, 2).text = str(sheet[f'C{i}'].value)
    tables[6].cell(i, 4).text = str(sheet[f'V{i}'].value)
    tables[6].cell(i, 5).text = str(sheet[f'V{i}'].value)
    tables[6].cell(i, 6).text = '0'
    tables[6].cell(i, 7).text = str(sheet[f'W{i}'].value)
    tables[6].cell(i, 8).text = '0'

tables[6].cell(12, 4).text = str(total_num)
tables[6].cell(12, 5).text = str(total_num)
tables[6].cell(12, 7).text = str(box_num)

for i in range(2, 11):
    tables[13].cell(i - 1, 0).text = str(sheet[f'T{i}'].value)
    tables[13].cell(i - 1, 1).text = str(sheet[f'U{i}'].value)
    tables[13].cell(i - 1, 2).text = str(sheet[f'U{i}'].value)
    tables[13].cell(i - 1, 3).text = str(sheet[f'U{i}'].value)

document.save(path + f'{title}.docx')
print('File generated')

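Closing remarks

If there is an area of office automation you are interested in, or you have a concrete case you would like to solve with Python

feel free to get in touch with me, or leave a message on the 早起python official account

We will pick interesting examples, solve them free of charge, and publish tutorials to share the experience so that more people benefit

If you submit a case, please describe the requirements clearly and provide the original data after you have processed it

Before publishing a tutorial we will sanitize the data to protect privacy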

Original data download:
https://pan.baidu.com/s/1YFZPT7KViB5O-oQe4y_6HQ
Extraction code: ym7p
