Dynamic data - page content is loaded via AJAX, as on Jianshu
Solution - use Selenium with Scrapy
https://stackoverflow.com/questions/17975471/selenium-with-scrapy-for-dynamic-page
What is Selenium?
Selenium is a tool that automates web browsers, mainly for testing web applications. It needs a browser-specific WebDriver (for example chromedriver for Chrome) to start.
- It can be used on its own or along with Scrapy; it works better with Scrapy, since Scrapy is faster and more lightweight
Steps
- Selenium drives the browser and runs the page's JavaScript, so the AJAX content gets loaded
- Install: pip install selenium && brew install chromedriver
- Override start_requests so that it returns a request
- Use a selector to extract the data
- Return an HTTP request with a URL and a callback
Last words
Using Selenium to scrape gets pretty repetitive. To simulate an AJAX GET or POST request, you basically find the element that triggers it and call click() on it, much like jQuery's click(), and the browser sends the request for you.