downloading video urls from yout

Author: 狼无雨雪 | Published: 2019-07-05 12:57 | Views: 0




"""
Collect video URLs from YouTube search results by scrolling the results page
"""
import os
import random
import time

from bs4 import BeautifulSoup
from selenium import webdriver

down_loading_urls = ["Aerial Freestyle Skiing",
                     "Freestyle Skiing Aerials",
                     "Freestyle Skiing Men Aerials",
                     "Freestyle Skiing Women Aerials",
                     "Freestyle Skiing - Ladies' Aerials",
                     "Men's Freestyle Skiing",
                     "Women's Freestyle Skiing",
                     "自由式滑雪空中技巧"]

if __name__ == "__main__":
    baidu_path = 'Skiing-youtube'  # output directory

    temp_path = baidu_path + "/" + "temp_youtube.txt"  # every URL seen, appended as it is found
    path = baidu_path + "/" + "youtube.txt"  # final de-duplicated list

    # os.environ["PATH"] += os.pathsep + 'D:\google-art-downloader-master'
    if not os.path.exists(baidu_path):
        os.makedirs(baidu_path)
    # option = webdriver.ChromeOptions()
    # option.add_argument('--headless')
    # option.add_argument('--disable-gpu')
    # browser = webdriver.Chrome(chrome_options = option)
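    fireFoxOptions = webdriver.FirefoxOptions()
    # set_headless() and the firefox_options= keyword are deprecated; use the argument and options= instead
    fireFoxOptions.add_argument("--headless")
    browser = webdriver.Firefox(options=fireFoxOptions)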

    asserts_all=set()  # unique video URLs across all queries

    now_len = 0    # unique URLs after the current scroll
    pre_len = 0    # unique URLs after the previous scroll
    count_all = 0  # consecutive scrolls that found nothing new

    try:
        for down_loading_url in down_loading_urls:
            print(down_loading_url)
            # Reset the stall counter so an exhausted query does not end the next one early
            count_all = 0
            original_url =  'https://www.youtube.com/results?search_query='+ down_loading_url.replace(" ","+")
            browser.get(original_url)
        #  js="var q=document.documentElement.scrollTop=100000"
        #  browser.execute_script(js)
            while(True):
                time.sleep(random.randint(1,3))
                browser.execute_script("window.scrollBy(0,1000)")
        #         print(browser.find_element_by_xpath('//*[@id="smb"]'))

                pageSource = browser.page_source
                soup = BeautifulSoup(pageSource,'lxml')
                asserts = soup.find_all('a', {"id":"video-title"})
                for line in asserts:
                    href = line.get("href")
                    # Skip anchors without a usable href before building the URL
                    if not href:
                        continue
                    try:
                        with open(temp_path,'a',encoding="utf-8") as w_file:
                            w_file.write("https://www.youtube.com" + href + "\n")
                        asserts_all.add("https://www.youtube.com" + href)
                    except Exception as e_t:
                        print("temp write", e_t,line)
                print(len(asserts_all))
                now_len = len(asserts_all)
                if now_len == pre_len:
                    count_all += 1
                else:
                    count_all = 0

                if count_all >=10:
                    break
                pre_len = now_len

    except Exception as e:
        print("global",e)
    finally:
        with open(path,'w',encoding="utf8") as write_file:
            for line in asserts_all:
                write_file.write(str(line)+"\n")
        # quit() ends the whole driver session, not just the current window
        browser.quit()
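
The script builds each search URL by replacing spaces with "+", which leaves characters such as the apostrophe in "Men's" and the Chinese query unescaped. A minimal sketch of percent-encoding the query with the standard library instead; the helper name `build_search_url` and the constant `SEARCH_ENDPOINT` are hypothetical and not part of the original script:

```python
from urllib.parse import quote_plus

# Assumption: the same search endpoint the script above uses.
SEARCH_ENDPOINT = "https://www.youtube.com/results?search_query="


def build_search_url(query):
    """Return a search URL with the query percent-encoded (hypothetical helper)."""
    # quote_plus turns spaces into "+" and escapes characters such as "'"
    # or non-ASCII text as UTF-8 percent sequences.
    return SEARCH_ENDPOINT + quote_plus(query)


if __name__ == "__main__":
    for query in ["Men's Freestyle Skiing", "自由式滑雪空中技巧"]:
        print(build_search_url(query))
```

In the script, this would stand in for the `down_loading_url.replace(" ","+")` expression when forming `original_url`.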

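The scroll loop stops once the number of collected URLs has stayed the same for 10 consecutive scrolls. That rule can be sketched on its own, which makes it easy to test without a browser; the class name `StallDetector` and the `patience` parameter are hypothetical names introduced for illustration:

```python
class StallDetector:
    """Report a stop once the count stays flat for `patience` checks in a row."""

    def __init__(self, patience=10):
        self.patience = patience  # the script uses 10
        self.previous = 0         # mirrors pre_len starting at 0
        self.unchanged = 0        # mirrors count_all

    def should_stop(self, current_count):
        # Count consecutive checks with no growth; any growth resets the run.
        if current_count == self.previous:
            self.unchanged += 1
        else:
            self.unchanged = 0
        self.previous = current_count
        return self.unchanged >= self.patience


if __name__ == "__main__":
    detector = StallDetector(patience=3)
    for count in [5, 5, 5, 5]:
        print(count, detector.should_stop(count))
```

Creating a fresh detector for each search query would keep a stall in one search from carrying over into the next.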

Article link: https://www.haomeiwen.com/subject/qlxdhctx.html
