Collecting Data with Beautiful Soup - Python Notes


Author: 自走炮 | Published 2020-08-15 01:24
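    from bs4 import BeautifulSoup  # pip install beautifulsoup4
    import requests  # pip install requests

    PAGE_URL = "http://www7b.biglobe.ne.jp/~browneye/english/TOEIC400-1.htm"
    OUTPUT_FILE = "toeic_words.txt"
    COLUMNS = 4  # the script treats every 4 consecutive <td> cells as one row

    def run():
        # Download the page; stop on HTTP errors instead of parsing an error page
        r = requests.get(PAGE_URL, timeout=10)
        r.raise_for_status()
        # The declared charset may be wrong for Japanese pages,
        # so let requests guess the encoding from the response body
        r.encoding = r.apparent_encoding
        soup = BeautifulSoup(r.text, features="html.parser")

        # Text of every <td> cell; strip=True also removes the
        # ideographic space (U+3000) that fills empty cells
        td_values = [td.get_text(strip=True) for td in soup.find_all("td")]

        # Regroup the flat cell list into rows of COLUMNS cells
        rows = []
        for index in range(0, len(td_values), COLUMNS):
            word_row = td_values[index:index + COLUMNS]
            # Skip an incomplete trailing row and rows whose first cell is blank
            if len(word_row) < COLUMNS or not word_row[0]:
                continue
            rows.append(word_row)

        # Write the 2nd and 3rd cell of each row, comma-separated;
        # UTF-8 is set explicitly so Japanese text is saved correctly on every OS
        with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
            for row in rows:
                f.write("{},{}\n".format(row[1], row[2]))
        print("Yes, done.")

    if __name__ == "__main__":
        run()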
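
The row-grouping and file-writing steps can be tried offline with a minimal sketch. It assumes the page has already been reduced to a flat list of cell strings, and `group_rows` and `write_pairs` are hypothetical helper names, not part of Beautiful Soup or the original script. It also swaps the hand-formatted `"{},{}\n"` line for Python's standard `csv` module, so a cell that itself contains a comma is quoted instead of spilling into an extra column. The sample cell values are made up for illustration.

```python
import csv
import io


def group_rows(cells, columns=4):
    """Split a flat list of cell texts into rows of `columns` cells.

    Rows whose first cell is blank after stripping (str.strip also
    removes U+3000) and an incomplete trailing row are dropped.
    """
    rows = []
    for start in range(0, len(cells), columns):
        row = [c.strip() for c in cells[start:start + columns]]
        if len(row) < columns or not row[0]:
            continue
        rows.append(row)
    return rows


def write_pairs(rows, out):
    """Write the 2nd and 3rd cell of each row to `out` as CSV.

    csv.writer quotes any cell that contains a comma or a quote,
    which plain string formatting does not do.
    """
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([row[1], row[2]])


if __name__ == "__main__":
    # Made-up values standing in for the scraped <td> texts
    cells = ["1", "abandon", "捨てる, 断念する", "",
             "\u3000", "", "", "",      # blank row, skipped
             "2", "ability", "能力", "",
             "3", "able"]               # incomplete last row, skipped
    buf = io.StringIO()
    write_pairs(group_rows(cells), buf)
    print(buf.getvalue(), end="")
```

To write a real file instead of a `StringIO`, open it with `open("toeic_words.txt", "w", encoding="utf-8", newline="")` and pass that to `write_pairs`; `newline=""` is what the `csv` documentation recommends for files handed to `csv.writer`.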
    


Article link: https://www.haomeiwen.com/subject/xssgrktx.html