Batch-Downloading PPTX Resources from a Website with Python

Author: 火卫控 | Published: 2023-10-15 13:55

    The resources are vector illustrations and models commonly used in SCI papers, distributed as pptx files.

    The downloaded files are listed below:

    
    ai_qiangyun@DESKTOP-727JVLV:/mnt/d/Coding/python_gzlab_docu/pachong_gz_vs/scipptx$ tree -a
    .
    ├── Animals.pptx
    ├── Arteries_atherothrombosis.pptx
    ├── Arteries_pathophysiology.pptx
    ├── Arteries_physiology.pptx
    ├── Bacteriology_virology.pptx
    ├── Blood_immunology.pptx
    ├── Bone_fractures.pptx
    ├── Bone_structure.pptx
    ├── Bones.pptx
    ├── Cell_membrane.pptx
    ├── Chemistry.pptx
    ├── Dermatology.pptx
    ├── Diabetes.pptx
    ├── Dietetics.pptx
    ├── Digestive_system.pptx
    ├── Drugs.pptx
    ├── ENT.pptx
    ├── Embryology.pptx
    ├── Endocrinology.pptx
    ├── General-items.pptx
    ├── Genetics.pptx
    ├── Heart_pathophysiology.pptx
    ├── Heart_physiology.pptx
    ├── Intracellular_components.pptx
    ├── Lab_apparatus.pptx
    ├── Lipids.pptx
    ├── Lymphatic_system.pptx
    ├── Medical_acts.pptx
    ├── Medical_equipment.pptx
    ├── Microbiology_cellculture.pptx
    ├── Muscles.pptx
    ├── Nervous_system.pptx
    ├── Neural_cells.pptx
    ├── Nucleic_acids.pptx
    ├── Oncology.pptx
    ├── Ophthalmology.pptx
    ├── Paraclinical_exams.pptx
    ├── Parasitology.pptx
    ├── People.pptx
    ├── Receptors_channels.pptx
    ├── Reproduction.pptx
    ├── Respiratory_system.pptx
    ├── Risk_Factors.pptx
    ├── Scientific_graphs.pptx
    ├── Tissues.pptx
    ├── Urinary_system.pptx
    ├── Veins.pptx
    ├── World_maps.pptx
    ├── sci2ppt.py
    ├── scippt.html
    └── scipptx.py
    
    0 directories, 51 files
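
A .pptx file is a ZIP container, so one quick sanity check after a batch download is to confirm that every saved file opens as a ZIP archive. A file that fails the check is probably an HTML error page saved under a .pptx name. The following is a minimal sketch and not part of the original script; the function name `find_broken_pptx` and the demo file names are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def find_broken_pptx(folder):
    """Return the names of .pptx files in folder that are not valid ZIP archives."""
    broken = []
    for path in sorted(Path(folder).glob("*.pptx")):
        # A real .pptx is an Office Open XML package, i.e. a ZIP archive.
        if not zipfile.is_zipfile(path):
            broken.append(path.name)
    return broken


# Demo on a throwaway directory so that no real downloads are needed.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good_demo.pptx"
    with zipfile.ZipFile(good, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
    bad = Path(tmp) / "bad_demo.pptx"
    bad.write_text("<html>404 Not Found</html>")
    demo_result = find_broken_pptx(tmp)

print(demo_result)
```

Calling `find_broken_pptx(".")` from the download directory would list any file worth fetching again.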
    

    The source code is as follows:

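    import os
    import random  # only needed for the optional delay at the end of the loop
    import re
    import time  # only needed for the optional delay at the end of the loop
    from urllib.parse import urljoin
    
    import requests
    from bs4 import BeautifulSoup
    
    # Use the directory that contains this script as the working directory
    homePath = os.path.dirname(os.path.abspath(__file__))
    os.chdir(homePath)
    
    # scippt.html is a locally saved copy of the page that lists the files
    with open("./scippt.html") as f:
        html = f.read()
    # print(html)
    
    soup = BeautifulSoup(html, 'html.parser')
    # Find every tag (including <a> tags) whose href contains ".pptx";
    # the dot is escaped so that it matches only a literal "."
    links = soup.find_all(href=re.compile(r"\.pptx"))
    print(links)
    
    # The request headers are identical for every file, so build them once
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'
    }
    
    for link in links:
        href = link.get('href')
        print(href)
    
        # rsplit splits from the right; with maxsplit=1 the last piece is the
        # part of the URL after the final "/", which is used as the file name
        pptname = href.rsplit("/", 1)[1]
        print(pptname)
    
        # The hrefs are site-relative, so resolve them against the site root
        # "https://smart.servier.com/"; urljoin avoids a doubled "/"
        r = requests.get(urljoin("https://smart.servier.com/", href), headers=headers, timeout=30)
        r.raise_for_status()  # stop instead of saving an error page as a .pptx
    
        # Save into the current directory
        with open(pptname, mode="wb") as f:
            f.write(r.content)  # write the downloaded pptx bytes to the file
    
        # x = random.randint(1, 4)  # random integer from 1 to 4, inclusive
        # time.sleep(x)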
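
The two string operations inside the loop, taking the file name from the href and resolving the href against the site root, can be tried without any network access. The sketch below is an illustration under stated assumptions: the helper name `resolve_download` is hypothetical, and it relies on `urllib.parse.urljoin` from the standard library, which avoids the doubled slash that plain concatenation of "https://smart.servier.com/" and an href starting with "/" would give. The sample href is one of those printed in the run output.

```python
from urllib.parse import urljoin

BASE_URL = "https://smart.servier.com/"


def resolve_download(href, base=BASE_URL):
    """Return (absolute_url, file_name) for a site-relative href."""
    # rsplit("/", 1) splits once, starting from the right, so the last
    # element is everything after the final "/".
    file_name = href.rsplit("/", 1)[-1]
    # urljoin resolves the href against the base, so a leading "/" in the
    # href does not produce "//" in the path.
    return urljoin(base, href), file_name


sample_href = "/wp-content/uploads/2016/10/Cell_membrane.pptx"
url, name = resolve_download(sample_href)
concatenated = BASE_URL + sample_href  # for comparison only

print(url)
print(name)
print(concatenated)
```

Indexing with `[-1]` rather than `[1]` is a small defensive choice: it still returns the whole string when an href contains no "/" at all.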
    
    

    The run produces the following output:

    (base) D:\Coding\python_gzlab_docu>D:/ruanjian/labsoft/anaconda3/python.exe d:/Coding/python_gzlab_docu/pachong_gz_vs/scipptx/sci2ppt.py
    [<a href="/wp-content/uploads/2016/10/Cell_membrane.pptx" style="font-size: 16px;">Cell membrane</a>, <a href="/wp-content/uploads/2016/10/Receptors_channels.pptx" style="font-size: 16px;">Receptors and Channels</a>, <a href="/wp-content/uploads/2016/10/Intracellular_components.pptx" style="font-size: 16px;">Intracellular components</a>, <a href="/wp-content/uploads/2016/10/Nucleic_acids.pptx" style="font-size: 16px;">Nucleic acids</a>, <a href="/wp-content/uploads/2016/10/Genetics.pptx" style="font-size: 16px;">Genetics</a>, <a href="/wp-content/uploads/2016/10/Tissues.pptx" style="font-size: 16px;">Tissues</a>, <a href="/wp-content/uploads/2016/10/Oncology.pptx" style="font-size: 16px;">Oncology</a>, <a href="/wp-content/uploads/2016/10/Heart_physiology.pptx" style="font-size: 16px;">Heart – Physiology</a>, <a href="/wp-content/uploads/2016/10/Heart_pathophysiology.pptx" style="font-size: 16px;">Heart – Pathophysiology</a>, <a href="/wp-content/uploads/2016/10/Blood_immunology.pptx" style="font-size: 16px;">Blood and Immunology</a>, <a href="/wp-content/uploads/2016/10/Arteries_physiology.pptx" style="font-size: 16px;">Arteries – Physiology</a>, <a href="/wp-content/uploads/2016/10/Arteries_atherothrombosis.pptx" style="font-size: 16px;">Arteries – Atherothrombosis</a>, <a href="/wp-content/uploads/2016/10/Arteries_pathophysiology.pptx" style="font-size: 16px;">Arteries – Pathophysiology</a>, <a href="/wp-content/uploads/2016/10/Veins.pptx" style="font-size: 16px;">Veins</a>, <a href="/wp-content/uploads/2016/10/Lymphatic_system.pptx" style="font-size: 16px;">Lymphatic system</a>, <a href="/wp-content/uploads/2016/10/Urinary_system.pptx" style="font-size: 16px;">Urinary system</a>, <a href="/wp-content/uploads/2016/10/Reproduction.pptx" style="font-size: 16px;">Reproduction</a>, <a href="/wp-content/uploads/2016/10/Embryology.pptx" style="font-size: 16px;">Embryology</a>, <a href="/wp-content/uploads/2016/10/Endocrinology.pptx" style="font-size: 
    16px;">Endocrinology</a>, <a href="/wp-content/uploads/2016/10/Diabetes.pptx" style="font-size: 16px;">Diabetes</a>, <a href="/wp-content/uploads/2016/10/Nervous_system.pptx" style="font-size: 16px;">Nervous system</a>, <a href="/wp-content/uploads/2016/10/Neural_cells.pptx" style="font-size: 16px;">Neural cells</a>, <a href="/wp-content/uploads/2016/10/Bones.pptx" style="font-size: 16px;">Skeletons and Bones</a>, 
    <a href="/wp-content/uploads/2016/10/Bone_structure.pptx" style="font-size: 16px;">Bone structure</a>, <a href="/wp-content/uploads/2016/10/Bone_fractures.pptx" style="font-size: 16px;">Fractures</a>, <a href="/wp-content/uploads/2016/10/Bacteriology_virology.pptx" style="font-size: 16px;">Bacteriology and virology</a>, <a href="/wp-content/uploads/2016/10/Parasitology.pptx" style="font-size: 16px;">Parasitology</a>, <a href="/wp-content/uploads/2016/10/Digestive_system.pptx" style="font-size: 16px;">Digestive system</a>, <a href="/wp-content/uploads/2016/10/Respiratory_system.pptx" style="font-size: 16px;">Respiratory system</a>, <a href="/wp-content/uploads/2016/10/ENT.pptx" style="font-size: 16px;">ENT</a>, 
    <a href="/wp-content/uploads/2016/10/Muscles.pptx" style="font-size: 16px;">Muscles</a>, <a href="/wp-content/uploads/2016/10/Ophthalmology.pptx" style="font-size: 16px;">Ophthalmology</a>, <a href="/wp-content/uploads/2016/10/Dermatology.pptx" style="font-size: 16px;">Dermatology</a>, <a href="/wp-content/uploads/2016/10/Risk_Factors.pptx" style="font-size: 16px;">Risk Factors</a>, <a href="/wp-content/uploads/2016/10/Lipids.pptx" style="font-size: 16px;">Lipids</a>, <a href="/wp-content/uploads/2016/10/Dietetics.pptx" style="font-size: 16px;">Dietetics</a>, <a href="/wp-content/uploads/2016/10/Medical_equipment.pptx" style="font-size: 16px;">Medical equipment</a>, <a href="/wp-content/uploads/2016/10/Medical_acts.pptx" style="font-size: 16px;">Medical acts</a>, <a href="/wp-content/uploads/2016/10/Paraclinical_exams.pptx" style="font-size: 16px;">Paraclinical Exams</a>, <a href="/wp-content/uploads/2016/10/Drugs.pptx" style="font-size: 16px;">Drugs</a>, <a href="/wp-content/uploads/2016/10/Microbiology_cellculture.pptx" style="font-size: 16px;">Cell culture and 
    microbiology</a>, <a href="/wp-content/uploads/2016/10/Chemistry.pptx" style="font-size: 16px;">Chemistry</a>, <a href="/wp-content/uploads/2016/10/Lab_apparatus.pptx" style="font-size: 16px;">Lab apparatus</a>, <a href="/wp-content/uploads/2016/10/People.pptx" style="font-size: 16px;">People</a>, <a href="/wp-content/uploads/2016/10/World_maps.pptx" style="font-size: 16px;">World maps</a>, <a href="/wp-content/uploads/2016/10/Animals.pptx" style="font-size: 16px;">Animals</a>, <a href="/wp-content/uploads/2016/10/Scientific_graphs.pptx" style="font-size: 16px;">Scientific graphs</a>, <a href="/wp-content/uploads/2016/10/General-items.pptx" style="font-size: 16px;">General Items</a>]
    /wp-content/uploads/2016/10/Cell_membrane.pptx
    Cell_membrane.pptx
    /wp-content/uploads/2016/10/Receptors_channels.pptx
    Receptors_channels.pptx
    /wp-content/uploads/2016/10/Intracellular_components.pptx
    Intracellular_components.pptx
    /wp-content/uploads/2016/10/Nucleic_acids.pptx
    Nucleic_acids.pptx
    /wp-content/uploads/2016/10/Genetics.pptx
    Genetics.pptx
    /wp-content/uploads/2016/10/Tissues.pptx
    Tissues.pptx
    /wp-content/uploads/2016/10/Oncology.pptx
    Oncology.pptx
    /wp-content/uploads/2016/10/Heart_physiology.pptx
    Heart_physiology.pptx
    /wp-content/uploads/2016/10/Heart_pathophysiology.pptx
    Heart_pathophysiology.pptx
    /wp-content/uploads/2016/10/Blood_immunology.pptx
    Blood_immunology.pptx
    /wp-content/uploads/2016/10/Arteries_physiology.pptx
    Arteries_physiology.pptx
    /wp-content/uploads/2016/10/Arteries_atherothrombosis.pptx
    Arteries_atherothrombosis.pptx
    /wp-content/uploads/2016/10/Arteries_pathophysiology.pptx
    Arteries_pathophysiology.pptx
    /wp-content/uploads/2016/10/Veins.pptx
    Veins.pptx
    /wp-content/uploads/2016/10/Lymphatic_system.pptx
    Lymphatic_system.pptx
    /wp-content/uploads/2016/10/Urinary_system.pptx
    Urinary_system.pptx
    /wp-content/uploads/2016/10/Reproduction.pptx
    Reproduction.pptx
    /wp-content/uploads/2016/10/Embryology.pptx
    Embryology.pptx
    /wp-content/uploads/2016/10/Endocrinology.pptx
    Endocrinology.pptx
    /wp-content/uploads/2016/10/Diabetes.pptx
    Diabetes.pptx
    /wp-content/uploads/2016/10/Nervous_system.pptx
    Nervous_system.pptx
    /wp-content/uploads/2016/10/Neural_cells.pptx
    Neural_cells.pptx
    /wp-content/uploads/2016/10/Bones.pptx
    Bones.pptx
    /wp-content/uploads/2016/10/Bone_structure.pptx
    Bone_structure.pptx
    /wp-content/uploads/2016/10/Bone_fractures.pptx
    Bone_fractures.pptx
    /wp-content/uploads/2016/10/Bacteriology_virology.pptx
    Bacteriology_virology.pptx
    /wp-content/uploads/2016/10/Parasitology.pptx
    Parasitology.pptx
    /wp-content/uploads/2016/10/Digestive_system.pptx
    Digestive_system.pptx
    /wp-content/uploads/2016/10/Respiratory_system.pptx
    Respiratory_system.pptx
    /wp-content/uploads/2016/10/ENT.pptx
    ENT.pptx
    /wp-content/uploads/2016/10/Muscles.pptx
    Muscles.pptx
    /wp-content/uploads/2016/10/Ophthalmology.pptx
    Ophthalmology.pptx
    /wp-content/uploads/2016/10/Dermatology.pptx
    Dermatology.pptx
    /wp-content/uploads/2016/10/Risk_Factors.pptx
    Risk_Factors.pptx
    /wp-content/uploads/2016/10/Lipids.pptx
    Lipids.pptx
    /wp-content/uploads/2016/10/Dietetics.pptx
    Dietetics.pptx
    /wp-content/uploads/2016/10/Medical_equipment.pptx
    Medical_equipment.pptx
    /wp-content/uploads/2016/10/Medical_acts.pptx
    Medical_acts.pptx
    /wp-content/uploads/2016/10/Paraclinical_exams.pptx
    Paraclinical_exams.pptx
    /wp-content/uploads/2016/10/Drugs.pptx
    Drugs.pptx
    /wp-content/uploads/2016/10/Microbiology_cellculture.pptx
    Microbiology_cellculture.pptx
    /wp-content/uploads/2016/10/Chemistry.pptx
    Chemistry.pptx
    /wp-content/uploads/2016/10/Lab_apparatus.pptx
    Lab_apparatus.pptx
    /wp-content/uploads/2016/10/People.pptx
    People.pptx
    /wp-content/uploads/2016/10/World_maps.pptx
    World_maps.pptx
    /wp-content/uploads/2016/10/Animals.pptx
    Animals.pptx
    /wp-content/uploads/2016/10/Scientific_graphs.pptx
    Scientific_graphs.pptx
    /wp-content/uploads/2016/10/General-items.pptx
    General-items.pptx
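
Every href in this output contains a literal ".pptx". When such links are selected with a regular expression, it helps to remember that an unescaped dot matches any character, and that Beautiful Soup applies a compiled pattern with its `search()` method, so the match may occur anywhere in the attribute value. The following minimal sketch uses only the standard `re` module to compare three patterns; the second and third sample strings are made up for illustration.

```python
import re

hrefs = [
    "/wp-content/uploads/2016/10/Cell_membrane.pptx",  # taken from the run output
    "/downloads/notapptx",                             # made-up: no extension at all
    "/downloads/slides.pptx.bak",                      # made-up: different suffix
]

patterns = {
    "loose": re.compile(".pptx"),        # "." matches any single character
    "escaped": re.compile(r"\.pptx"),    # "\." matches only a literal dot
    "anchored": re.compile(r"\.pptx$"),  # the match must also end the string
}

# For each pattern, keep the hrefs in which search() finds a match.
results = {
    label: [h for h in hrefs if pattern.search(h)]
    for label, pattern in patterns.items()
}

for label, matched in results.items():
    print(label, len(matched), matched)
```

On a page like this one, where every link ends in ".pptx", all three patterns select the same tags; the stricter forms only matter when the page mixes in other links.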
    

