Using Selenium in headless mode on Ubuntu 20.04

Author: awker | Published: 2024-06-27 14:32

    1. Install Selenium and the Chrome browser

    # pip install selenium
    
    # wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
    # dpkg -i google-chrome-stable_current_amd64.deb
    ## Check the installed Chrome version; here it is 126.0.6478.126-1
    # dpkg -l | grep chrome
    ii  google-chrome-stable                  126.0.6478.126-1                      amd64        The web browser from Google
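If the version needs to be picked up by a script rather than read by eye, the `dpkg -l` line above can be split on whitespace. The following is only a minimal sketch: the function name `chrome_version_from_dpkg` is hypothetical, and it assumes the column order shown in the output above (status, package, version, architecture, description).

```python
def chrome_version_from_dpkg(line: str) -> str:
    """Return the version column from one `dpkg -l` output line."""
    # Columns are whitespace-separated: status, package, version, arch, description
    fields = line.split()
    if len(fields) < 3 or fields[1] != "google-chrome-stable":
        raise ValueError(f"not a google-chrome-stable line: {line!r}")
    return fields[2]


# The sample line is copied from the dpkg output above
line = (
    "ii  google-chrome-stable                  126.0.6478.126-1"
    "                      amd64        The web browser from Google"
)
version = chrome_version_from_dpkg(line)
major = version.split(".")[0]  # the part that chromedriver has to match
print(version, major)
```

The major version (the part before the first dot) is what matters when choosing a chromedriver build.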
    

    2. Install chromedriver
    The chromedriver version must match the Chrome browser version; for example, both should be 126.x.x.x.
    Based on the Chrome version installed above (126.0.6478.126-1), download the matching chromedriver from https://googlechromelabs.github.io/chrome-for-testing/#stable , for example https://storage.googleapis.com/chrome-for-testing-public/126.0.6478.126/linux64/chromedriver-linux64.zip

    # cd /opt/
    # wget https://storage.googleapis.com/chrome-for-testing-public/126.0.6478.126/linux64/chromedriver-linux64.zip
    # unzip chromedriver-linux64.zip
    # ls /opt/chromedriver-linux64/chromedriver
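The matching rule and the download link from step 2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an official API: the helper names `chromedriver_url` and `same_major` are hypothetical, the URL layout is copied from the example link in this step and may change, and a chromedriver build is not guaranteed to exist for every exact Chrome version.

```python
# URL layout taken from the example download link in step 2 (assumed stable)
URL_TEMPLATE = (
    "https://storage.googleapis.com/chrome-for-testing-public/"
    "{version}/linux64/chromedriver-linux64.zip"
)


def chromedriver_url(dpkg_version: str) -> str:
    """Build the chromedriver download URL for a Chrome version reported by dpkg."""
    # dpkg appends a Debian package revision ("-1"); the download path does not use it
    upstream = dpkg_version.split("-", 1)[0]
    return URL_TEMPLATE.format(version=upstream)


def same_major(chrome_version: str, driver_version: str) -> bool:
    """True when both versions share the same major number, e.g. 126."""
    return chrome_version.split(".")[0] == driver_version.split(".")[0]


print(chromedriver_url("126.0.6478.126-1"))
print(same_major("126.0.6478.126-1", "126.0.6478.126"))
```

Checking only the major number mirrors the "both should be 126.x.x.x" rule stated above.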
    

    3. Demo scraping code

    # cat get_bn_listing_demo.py 
    import time

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    
    
    def main():
        # Run Chrome in headless mode
        options = Options()
        options.add_argument("--headless")
        options.add_argument('--no-sandbox')
        options.add_argument('--disable-dev-shm-usage')
        # Path to the chromedriver downloaded in step 2
        chromedriver_path = "/opt/chromedriver-linux64/chromedriver"
        driver = webdriver.Chrome(service=Service(executable_path=chromedriver_path), options=options)
    
        # Example page to scrape: get the time and title of the latest announcements
        url = "https://www.binance.com/en/support/announcement/new-cryptocurrency-listing?c=48&navId=48&hl=en"
    
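        driver.get(url)
        # Crude fixed wait so the JavaScript-rendered list has time to load
        time.sleep(5)
        # These CSS class names look auto-generated and may change when the site is updated
        titles = [e.text for e in driver.find_elements(By.CLASS_NAME, 'css-1yxx6id')]
        dts = [e.text for e in driver.find_elements(By.CLASS_NAME, 'css-eoufru')]
        driver.quit()
    
        # zip() pairs each title with a date and stops at the shorter list,
        # so building the DataFrame cannot fail on mismatched lengths
        df = pd.DataFrame(list(zip(titles, dts)), columns=["title", "datetime"])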
        # print(df)
        filtered_df = df[df['title'].str.contains('Will List')]
        print(filtered_df)
        for index, row in filtered_df.iterrows():
            print(f"Title: {row['title']}")
            print(f"Date: {row['datetime']}")
    
    
    if __name__ == "__main__":
        main()
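The demo waits a fixed five seconds with `time.sleep(5)`, which is either too long or too short depending on the network. One alternative is to poll until the elements actually appear. The helper below is a minimal sketch in plain Python; the name `poll_until` and its default values are hypothetical. Selenium also ships its own explicit-wait mechanism, `WebDriverWait` together with `expected_conditions`, which serves the same purpose.

```python
import time


def poll_until(fetch, timeout=10.0, interval=0.5):
    """Call fetch() repeatedly until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing returned within {timeout} seconds")
        time.sleep(interval)


# Hypothetical use inside main(), in place of time.sleep(5):
#     title = poll_until(lambda: driver.find_elements(By.CLASS_NAME, 'css-1yxx6id'))
```

Because `find_elements` returns an empty list when nothing matches, the lambda stays falsy until the page has rendered at least one matching element.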
    
    
    

    4. Output


    5. Known issues
    If you run the code from Xshell, a window asking about X11 forwarding may pop up; just close it as prompted.

