Python 强化训练：第九篇

作者: 谢小路 | 来源:发表于2016-11-12 16:29 被阅读258次

Python 强化训练：第九篇
Python进阶-简单数据结构
Python 强化训练：第一篇
外行学 Python 第十一篇数据可视化
Python 强化训练：第十一篇
Python 强化训练：第六篇
Python 强化训练：第五篇
Python 强化训练：第四篇
Python 强化训练：第三篇
Python 强化训练：第二篇

主题

数据处理

csv文件
json文件
xml: xpath
excel

1.

CSV: 逗号分隔值，其文件以纯文本形式存储表格数据(数字和文本)。
模块：csv
方法：csv.reader(), csv.writer(), csv.Dictreader(), csv.writerow(), csv.writerows()

import csv
headers = ['Symbol', 'Price', 'Date', 'Time', 'Change', 'Volume']
rows = [('AA', 39.48, '6/11/2007', '9:36am', -0.18, 181800),
         ('AIG', 71.38, '6/11/2007', '9:36am', -0.15, 195500),
         ('AXP', 62.58, '6/11/2007', '9:36am', -0.46, 935000),
       ]

with open('name.csv', newline="") as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)
    print(headers)
    print("=====")
    for row in f_csv:
        print(row)
        print("===")

写入文件形式：

1478869402821.png

要求：将name.csv文件中Volume的值大于195500的数据写入name_copy.csv文件中.

import codecs
import csv

with codecs.open("name_copy.csv", 'w') as f_name_copy:
    f_name_one = csv.writer(f_name_copy)
    with codecs.open("name.csv", 'r') as f_name:
        f_name_two = csv.reader(f_name)
        headers = next(f_name_two)
        f_name_one.writerow(headers)
        for one in f_name_two:
            print(one)
            if int(one[5]) > 195500:
                f_name_one.writerow(one)

文件显示：

1478869756196.png

要求：获取雅虎指定股票历史数据，并存入csv文件中.

import requests
import csv
import codecs

response = requests.get('http://table.finance.yahoo.com/table.csv?s=000001.sz')
content = response.text

with codecs.open("pingan.csv", 'w') as f:
    content_all = csv.writer(f)
    for one in content.split('\n'):
        content_all.writerow(one.split(','))

Paste_Image.png

2.

python 如何处理json文件：

json 模块
dumps(),dump(), loads(),load()方法


import json
import codecs
# json.dumps()
# json.loads()
# json.dump()  # 接口是一个文件
# json.load()  # 接口是一个文件

one = {"wuhan": 10, "beijing": 1, "changsha": 6}
two = [1, 2, "apple", 'chuizi', {"a": 1, "b": 2}]
one_json = json.dumps(one, separators=[",  ", ":  "], indent=4)
one_1_json = json.dumps(one, sort_keys=True)
two_json = json.dumps(two, separators=[",", ":"])
print(one_json)
print(one_1_json)
print(two_json)

with codecs.open("one.json", 'w') as f:
     json.dump(one, f)

with codecs.open("one.json", 'r') as f:
    print(json.load(f))

转换对照表：

python	json
dict	object
list,tuple	array
str,unicode	string
int,long,float	number
True	true
False	false
None	null

print(json.dumps(None))
print(json.dumps(True))
print(json.dumps(False))

print(json.loads("null"))
print(json.loads("true"))
print(json.loads("false"))

# with codecs.open("one.json", 'w') as f:
#     json.dump(one, f)

with codecs.open("one.json", 'r') as f:
    print(json.load(f))


res = requests.get("http://www.weather.com.cn/data/cityinfo/101010100.html")
with codecs.open("weather.json", 'w', encoding="utf8") as f_wea:
    json.dump(res.text, f_wea)

with codecs.open("weather.json", 'r') as f_wea_r:
    A = json.load(f_wea_r)

print(A)

3.

xpath语法：

Syntax	Meaning
tag	Selects all child elements with the given tag.
*	Selects all child elements.
.	Selects the current node.
//	Selects all subelements, on all levels beneath the current element.
..	Selects the parent element.
[@atrrib]	Selects all elements that have the given attribute.
[@atrrib='value']	Selects all elements for which the given attribute has the given value.
[tag]	Selects all elements that have a child named tag.
[tag="text"]	Selects all elements that have a child named tag whose complete text content, including descendants, equals the given text.
[position]	Selects all elements that are located at the given position.

from xml.etree.ElementTree import parse
import requests
import codecs
tree = parse("html.xml")
root = tree.getroot()
print(root.tag)
print(root.attrib)
for child in root:
    print(child.tag, child.attrib)

# tag: 查找给定标签的子节点
print(root.findall('country'))

# *：查找所有子节点
print(root.findall("*"))

# . : 查找当前节点
print(root.findall("."))

# // :所有子孙节点
print(root.findall('.//'))

# .. : 父节点
print(root.findall('.//rank/..'))

# [@atrrib] :带有这个属性值的元素
print(root.findall('.//country[@name]'))

# [@atrrib=“value”]
print(root.findall('.//country[@name="Liechtenstein"]'))

# [tag] : 带有tag子节点的节点
print(root.findall('[country]'))

4.

模块： xlrd, xlwt
功能：负责读写操作

book.xlsx文件内容和结构：

1478938867731.png

import xlrd
import xlwt
name = xlrd.open_workbook('book.xlsx')
sheet = name.sheets()
for one in sheet:
    print(one.name)
result = name.sheet_by_name('result')
print(result.nrows, result.ncols)
one_one = result.cell(0, 0)
one_two = result.cell(0, 1)
one_three = result.cell(0, 2)
one_four = result.cell(0, 3)

# 1: text  2: number
print(one_one.ctype, one_one.value)
print(one_two.ctype, one_two.value)
print(one_three.ctype, one_three.value)
print(one_four.ctype, one_four.value)

print(result.row(1))
print(result.row_values(1))
print(result.row_values(1, 1))
print(result.col(1))
print(result.col_values(1))
print(result.col_values(1, 1))

result.put_cell(0, result.ncols, xlrd.XL_CELL_TEXT, u"总分", None)
for row in range(1, result.nrows):
    t = sum(result.row_values(row, 1))
    print(t)
    result.put_cell(row, result.ncols, xlrd.XL_CELL_NUMBER, t, None)

wbook = xlwt.Workbook()
wsheet = wbook.add_sheet("Sheet1")

style = xlwt.easyxf("align: vertical center, horizontal center")
value = [["名称", "hadoop编程实战", "hbase编程实战", "lucene编程实战"], ["价格", "52.3", "45", "36"], ["出版社", "机械工业出版社", "人民邮电出版社", "华夏人民出版社"], ["中文版式", "中", "英", "英"]]
for i in range(0, 4):
    for j in range(0, len(value[i])):
        wsheet.write(i, j, value[i][j], style)
wbook.save("wbook1.xls")

friend = name.sheet_by_index(1)
friend_copy = name.sheet_by_name("friend")

print(friend.nrows, friend.ncols)
print(friend_copy.nrows, friend_copy.ncols)

Python 强化训练：第九篇
主题数据处理 csv文件 json文件 xml: xpath excel 1. CSV: 逗号分隔值，其文件以纯...
Python进阶-简单数据结构
本文为《爬着学Python》系列第九篇文章。从现在开始算是要进入“真刀真枪”的Python学习了。之所以这么说，...
Python 强化训练：第一篇
强化训练: 第一篇目标 0. 校招算是结束了吧！简单回顾几句：校招python岗位极少，多是初创型公司对pyt...
外行学 Python 第十一篇数据可视化
在外行学 Python 爬虫第九篇读取数据库中的数据中完成了使用 API 从数据库中读取所需要的数据，但是...
Python 强化训练：第十一篇
主题正常情况下，程序的运行按顺序执行，但是涉及某些操作，等待结果完成却是非常耗时的操作，比如爬虫进行IO操作等，...
Python 强化训练：第六篇
强化训练：第六篇 1. 深浅拷贝:是否是同一个对象,使用id判断是否指向同一个对象, 深浅拷贝，引用区分可变对象和...
Python 强化训练：第五篇
强化训练：第五篇主题：函数基本定义方法任意数量参数只接受关键字参数显示数据类型默认参数匿名函数 N个...
Python 强化训练：第四篇
强化训练：第四篇问题来源面向对象的语言的重要特性是存在类的概念内容新式类和旧式类定义类的属性和“访问权限...
Python 强化训练：第三篇
强化训练：第三篇问题来源 pythoner面试经常会问到迭代器和生成器的区别内容可迭代对象迭代器：正向迭代...
Python 强化训练：第二篇
强化训练：第二篇摘要：心好累. 问题来源爬虫中会经常会遇到字符串的处理主要内容拆分字符串字符串开头结尾 ...

Python 强化训练：第九篇

主题

1.

2.

3.

4.

相关文章

Python 强化训练：第九篇

Python进阶-简单数据结构

Python 强化训练：第一篇

外行学 Python 第十一篇数据可视化

Python 强化训练：第十一篇

Python 强化训练：第六篇

Python 强化训练：第五篇

Python 强化训练：第四篇

Python 强化训练：第三篇

Python 强化训练：第二篇

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

菜鸟

Python 上手指南

Python 运维

程序员

生活不易我用python

我的Python自学之路

Python3自学爬虫实战