Python-145: Reading the table of genera within a family from LPSN 202

Author: RashidinAbdu | Published 2024-02-24 11:25

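I wrote this script because writing an article and compiling its statistics can require looking up species and related taxonomic information. It reads the tables on a web page and saves each one to an Excel file.
Prerequisites:
1. Install requests, BeautifulSoup4, and pandas (you may also need to upgrade pip). pandas additionally needs openpyxl to write .xlsx files and an HTML parser such as lxml for read_html.
2. Copy and paste the URL of the page you want into the script.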
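Before running the script, it can help to check which of the required packages are actually importable. The sketch below uses only the standard library. The helper name `missing_packages` is hypothetical, and including `openpyxl` and `lxml` in the list is an assumption based on what pandas usually needs for `to_excel` and `read_html`.

```python
from importlib.util import find_spec

# Import names, not pip names: beautifulsoup4 is imported as "bs4".
# openpyxl and lxml are assumptions: pandas usually needs them for
# DataFrame.to_excel (.xlsx output) and pd.read_html respectively.
REQUIRED = ["requests", "bs4", "pandas", "openpyxl", "lxml"]


def missing_packages(names=REQUIRED):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Anything reported as missing can then be installed with pip before the scraping script is run.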

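from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

# URL of the page to scrape
url = 'https://lpsn.dsmz.de/family/clostridiaceae'

# Send the HTTP request (the timeout keeps the script from hanging indefinitely)
response = requests.get(url, timeout=30)

if response.status_code == 200:
    # Parse the page content with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find all <table> elements
    tables = soup.find_all('table')

    if tables:
        for i, table in enumerate(tables):
            # Read the table with pandas; recent pandas versions deprecate
            # passing a raw HTML string, so wrap it in StringIO
            df = pd.read_html(StringIO(str(table)))[0]

            # Save the table as an Excel file
            file_name = f'table_{i+1}.xlsx'
            df.to_excel(file_name, index=False)
            print(f"Table saved as {file_name}")
    else:
        print("No table elements found")
else:
    print(f"Could not access the page (status code {response.status_code})")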
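Because the script always writes `table_1.xlsx`, `table_2.xlsx`, and so on, running it for a second family overwrites the files from the first. One possible way around this, sketched below, is to derive a prefix from the last path segment of the URL. The helper name `output_name` is hypothetical, and it assumes LPSN URLs of the form `.../family/<name>`.

```python
from urllib.parse import urlparse


def output_name(url, index):
    """Build a file name such as 'clostridiaceae_table_1.xlsx' from a URL."""
    # Keep only the non-empty path segments, e.g. ['family', 'clostridiaceae'].
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Fall back to a generic prefix when the URL has no path.
    prefix = segments[-1] if segments else "page"
    return f"{prefix}_table_{index}.xlsx"


print(output_name("https://lpsn.dsmz.de/family/clostridiaceae", 1))
# prints: clostridiaceae_table_1.xlsx
```

Inside the loop, `file_name = output_name(url, i + 1)` could then stand in for the fixed `table_{i+1}.xlsx` pattern.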