Week 1_Practice 1.2_Crawling Ite

Author: Li_Tang | Published: 2016-12-26 14:34


Some key information was crawled from a web page. The page is shown below:

The information we need is "item title", "image", "review number", "price", and "star". The result is shown here:

The general web-crawling process is as follows (from the course website):

1) An HTML file can be opened for reading ('r') or writing ('w') with the open() function. There are two ways to read it:

(1) file = open('absolute or relative file path','r'); print(file.read()); file.close()

(2) with open('absolute or relative file path','r') as file: print(file.read())
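As a minimal sketch of the two ways above, the snippet below first writes a small sample file and then reads it back both ways. The file name and HTML content are hypothetical and exist only to make the example runnable.

```python
import os
import tempfile

# Hypothetical sample file, created here only so the sketch can run.
sample_html = "<html><body><h1>Hello</h1></body></html>"
path = os.path.join(tempfile.mkdtemp(), "sample.html")
with open(path, "w") as f:  # 'w' opens the file for writing
    f.write(sample_html)

# Way (1): open and close explicitly; try/finally closes the file even on error.
file = open(path, "r")
try:
    content_1 = file.read()
finally:
    file.close()

# Way (2): the with statement closes the file automatically.
with open(path, "r") as file:
    content_2 = file.read()

print(content_1 == content_2)
```

Way (2) is usually the safer choice, because the file is closed even if reading raises an exception.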

2) Identify a unique CSS selector (CSS path) for each target element in the HTML file. In the browser's developer tools, the relevant commands are Inspect and Copy selector.

An example of such a CSS path:

"body > div > div > div.col-md-9 > div > div > div > div.ratings > p:nth-of-type(2)"

Note: the selector copied from the browser uses "nth-child(n)"; change it to "nth-of-type(n)" for BeautifulSoup (older versions of BeautifulSoup support only nth-of-type).
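The substitution described in the note can be sketched as a plain string replacement. The helper name to_nth_of_type is hypothetical, and the replacement is purely textual: nth-child counts all sibling elements while nth-of-type counts only siblings with the same tag, so the index may need adjusting by hand after conversion.

```python
def to_nth_of_type(selector: str) -> str:
    """Hypothetical helper: rewrite every :nth-child(n) as :nth-of-type(n)."""
    return selector.replace(":nth-child(", ":nth-of-type(")


copied = "div.ratings > p:nth-child(2)"  # as copied from the browser
converted = to_nth_of_type(copied)
print(converted)  # div.ratings > p:nth-of-type(2)
```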

3) Pass the CSS path to soup.select('css path') to get a list of the matching tags:

"stars = soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p:nth-of-type(2)')"

Here "stars" is a list of tags.
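To illustrate that select() returns a list, here is a small sketch run against an inline HTML string. The markup is a hypothetical stand-in for the course page, shortened so the selector stays readable, and it is parsed with Python's built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two items, each with a review count and a star row.
html = """
<div class="ratings">
  <p class="pull-right">15 reviews</p>
  <p><span class="glyphicon glyphicon-star"></span></p>
</div>
<div class="ratings">
  <p class="pull-right">7 reviews</p>
  <p><span class="glyphicon glyphicon-star"></span><span class="glyphicon glyphicon-star"></span></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# One match per item: the second <p> inside each div.ratings.
stars = soup.select("div.ratings > p:nth-of-type(2)")
print(len(stars))

# The result behaves like a list, so it can be indexed and iterated.
first = stars[0]
print(first.name)

# A selector that matches nothing yields an empty list, not an error.
missing = soup.select("div.no-such-class")
print(len(missing))
```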

4) To work with the individual results, use the zip() function with a for ... in loop to iterate over the lists in parallel:

"for title,image,review,price,star in zip(titles,images,reviews,prices,stars):"

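The behaviour of zip() can be seen with plain lists. The values below are made up for illustration; one list is deliberately shorter to show that zip() stops at the shortest input, which means a selector that misses an element on the page silently drops rows.

```python
titles = ["Item A", "Item B", "Item C"]
prices = ["$10.00", "$20.00", "$30.00"]
stars = [5, 3]  # one entry short on purpose

rows = []
for title, price, star in zip(titles, prices, stars):
    # Each iteration yields the elements at the same position in every list.
    rows.append((title, price, star))

print(rows)  # "Item C" is missing because stars has only two entries
```

Comparing the lengths of the lists before zipping is a cheap way to catch a selector that matched fewer elements than expected.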
5) Use the get_text(), get('src'), or get("href") methods to retrieve the desired content from each tag.

data = {
    'title': title.get_text(),  # get_text() extracts the tag's text
    'image': image.get('src'),  # get('src') extracts the image link from the src attribute
    'review': review.get_text(),
    'price': price.get_text(),
    # find_all() returns a list of the spans styled as a filled star;
    # len() of that list is therefore the number of stars
    'star': len(star.find_all("span", class_='glyphicon glyphicon-star')) * '★'
}
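Steps 3) to 5) can be put together in one runnable sketch. The HTML string, the image path, and the shortened selectors are assumptions made for illustration and are not the real course page; only the select(), zip(), get_text(), get() and find_all() calls follow the text above.

```python
from bs4 import BeautifulSoup

# Hypothetical single-item page fragment.
html = """
<div class="thumbnail">
  <img src="img/pic_0001.jpg">
  <div class="caption">
    <h4 class="pull-right">$24.99</h4>
    <h4><a href="#">Sample item</a></h4>
  </div>
  <div class="ratings">
    <p class="pull-right">65 reviews</p>
    <p>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star-empty"></span>
      <span class="glyphicon glyphicon-star-empty"></span>
    </p>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Each select() call returns a list with one tag per item on the page.
titles = soup.select("div.caption > h4 > a")
images = soup.select("div.thumbnail > img")
reviews = soup.select("div.ratings > p.pull-right")
prices = soup.select("div.caption > h4.pull-right")
stars = soup.select("div.ratings > p:nth-of-type(2)")

results = []
for title, image, review, price, star in zip(titles, images, reviews, prices, stars):
    filled = star.find_all("span", class_="glyphicon glyphicon-star")
    results.append({
        "title": title.get_text(),
        "image": image.get("src"),
        "review": review.get_text(),
        "price": price.get_text(),
        "star": len(filled) * "★",  # one ★ per filled-star span
    })

print(results)
```

Passing the full two-class string to class_ matches the exact value of the class attribute, so the "glyphicon-star-empty" spans are not counted.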
