Introductory Python Study Notes

Author: Jason数据分析生信教室 | Published 2020-11-07 17:47

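Introduction to Data Science in Python (DataCamp)
There are many introductory books and references for Python, and their angle depends heavily on the author's background. Beginners should pick one that matches what they want to use Python for. An author with a software-development background, for example, will lead you toward writing small programs. My goal here is manipulating and analyzing data, so the approach is closer to how you would work in R.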

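What is a module?

  • A collection of closely related tools, comparable to a package in R
  • Common examples
    -- matplotlib: plotting
    -- pandas: building and manipulating tabular data
    -- scikit-learn: machine learning
    -- scipy: scientific and numerical computing
    -- nltk: natural language processing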

Loading a module

import numpy as np
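
As a small illustration of what the alias is for, the sketch below calls NumPy through np. The prices array is made up for this example and is not part of the course data.

```python
# Import NumPy under the conventional alias np
import numpy as np

# Hypothetical values, used only to show the alias in action
prices = np.array([1.25, 1.90, 14.25])

# Everything the module provides is reached through the alias
mean_price = np.mean(prices)
highest_price = prices.max()

print(mean_price)
print(highest_price)
```

Any alias works, but np, pd, and plt are the names most tutorials use, so sticking to them makes code easier to compare with other examples.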

Working with DataFrames in pandas

Once the working directory is set so that the CSV file can be found:

# Import pandas under the alias pd
import pandas as pd

# Load the CSV "credit_records.csv"
credit_records = pd.read_csv('credit_records.csv')

# Display the first five rows of credit_records using the .head() method
print(credit_records.head())
            suspect         location              date         item  price
0    Kirstine Smith   Groceries R Us   January 6, 2018     broccoli   1.25
1      Gertrude Cox  Petroleum Plaza   January 6, 2018  fizzy drink   1.90
2  Fred Frequentist   Groceries R Us   January 6, 2018     broccoli   1.25
3      Gertrude Cox   Groceries R Us  January 12, 2018     broccoli   1.25
4    Kirstine Smith    Clothing Club   January 9, 2018        shirt  14.25

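credit_records.head(): shows the first five rows
credit_records.info(): gives an overview of the whole dataset (column names, types, and non-null counts), roughly like str() in R; credit_records.describe() is closer to R's summary()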
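
To try these methods without the original file, here is a minimal sketch that reads a few of the rows shown above from an in-memory string instead of 'credit_records.csv'. Using io.StringIO as a stand-in for the file is an assumption made for this example.

```python
import io

import pandas as pd

# Three rows copied from the output above; the dates are quoted
# because they contain commas
csv_text = """suspect,location,date,item,price
Kirstine Smith,Groceries R Us,"January 6, 2018",broccoli,1.25
Gertrude Cox,Petroleum Plaza,"January 6, 2018",fizzy drink,1.90
Fred Frequentist,Groceries R Us,"January 6, 2018",broccoli,1.25
"""

# read_csv accepts a file-like object as well as a path
credit_records = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows (five if n is omitted)
first_rows = credit_records.head(2)
print(first_rows)

# Column names, dtypes, and non-null counts
credit_records.info()

# Summary statistics for the numeric columns only
stats = credit_records.describe()
print(stats)
```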

Selecting columns

There are two ways. The first uses ['column_name'] to specify the column:

items = credit_records['item']
print(items)

The other way is .column_name:

items = credit_records.item
print(items)

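Both give the same result. The difference is that the .column_name form cannot be used when the column name contains spaces or unusual punctuation.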
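
The sketch below shows the difference on a small, made-up DataFrame. The "unit price" column is hypothetical and exists only to show a name with a space in it.

```python
import pandas as pd

# Hypothetical data; "unit price" contains a space on purpose
credit_records = pd.DataFrame({
    "item": ["broccoli", "fizzy drink", "shirt"],
    "unit price": [1.25, 1.90, 14.25],
})

# For a simple name, both forms return the same Series
items_bracket = credit_records["item"]
items_dot = credit_records.item
print(items_bracket.equals(items_dot))

# A name with a space only works with brackets;
# credit_records.unit price would be a SyntaxError
unit_prices = credit_records["unit price"]
print(unit_prices)
```

Bracket notation works for every column name, so it is the safer default when column names come from a file you do not control.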

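Selecting rows with logical conditions

This works much like it does in R.
Here mpr is a DataFrame, and Age, Status, and Dog Breed are some of its columns. Rows can be selected with comparison operators such as ==, !=, >, and >=.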

# Select the dogs where Age is greater than 2
greater_than_2 = mpr[mpr.Age > 2]
print(greater_than_2)

# Select the dogs whose Status is equal to Still Missing
still_missing = mpr[mpr.Status == 'Still Missing']
print(still_missing)

# Select all dogs whose Dog Breed is not equal to Poodle
not_poodle = mpr[mpr['Dog Breed'] != 'Poodle']
print(not_poodle)
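
To make the mechanism visible, here is a minimal sketch on a made-up stand-in for mpr (the rows are hypothetical). The comparison first produces one True/False value per row, and indexing with that mask keeps the True rows. Combining conditions with & is not covered in the notes above and is included as an extra.

```python
import pandas as pd

# Hypothetical stand-in for the mpr DataFrame
mpr = pd.DataFrame({
    "Age": [1, 3, 5, 2],
    "Status": ["Found", "Still Missing", "Still Missing", "Found"],
    "Dog Breed": ["Poodle", "Beagle", "Poodle", "Corgi"],
})

# A comparison on a column gives one True/False per row
mask = mpr.Age > 2
print(mask)

# Indexing with the mask keeps only the rows where it is True
older = mpr[mask]
print(older)

# Conditions can be combined with & (and) or | (or);
# each condition needs its own parentheses
older_poodles = mpr[(mpr.Age > 2) & (mpr["Dog Breed"] == "Poodle")]
print(older_poodles)
```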

Basic plotting

There are roughly three steps:

  • Import the plotting tool: from xxx import xxx as xxx
  • Build the plot: xxx.plot(x, y)
  • Show the result: xxx.show()
# From matplotlib, import pyplot under the alias plt
from matplotlib import pyplot as plt
# Plot Officer Deshaun's hours_worked vs. day_of_week
plt.plot(deshaun.day_of_week, deshaun.hours_worked)
# Display Deshaun's plot
plt.show()
  • Add a title: plt.title()
  • Add a y-axis label: plt.ylabel()
  • Add a legend: plt.legend()

Line plot

# Lines
plt.plot(deshaun.day_of_week, deshaun.hours_worked, label='Deshaun')
plt.plot(aditya.day_of_week, aditya.hours_worked, label='Aditya')
plt.plot(mengfei.day_of_week, mengfei.hours_worked, label='Mengfei')

# Add a title
plt.title("Officer Deshaun's plot")

# Add y-axis label
plt.ylabel("Hours worked")

# Legend
plt.legend()
# Display plot
plt.show()
  • Annotate a point with text: plt.text(x, y, "Info")
  • linestyle sets the line style
    dotted: ':'
    dashed: '--'
    solid: '-' (an empty string '' draws no line)

  • marker sets the point style
    circle: 'o'
    diamond: 'd'
    square: 's'

[Figure: linestyle and marker examples]
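
The options above can be combined in a single call. The following is a minimal sketch with made-up data for two hypothetical officers. The matplotlib.use("Agg") line is an assumption that lets the example run without a display; remove it to see the plot in a window.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed
from matplotlib import pyplot as plt

# Hypothetical data: hours worked on five consecutive days
days = [1, 2, 3, 4, 5]
hours_a = [8, 7, 9, 6, 8]
hours_b = [6, 8, 7, 9, 7]

# Dotted line with circle markers
line_a, = plt.plot(days, hours_a, linestyle=":", marker="o", label="Officer A")

# Dashed line with square markers
line_b, = plt.plot(days, hours_b, linestyle="--", marker="s", label="Officer B")

# Place a text note at data coordinates (x=3, y=9)
note = plt.text(3, 9, "Busiest day")

plt.xlabel("Day")
plt.ylabel("Hours worked")
plt.legend()
plt.show()
```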

More plotting

Scatter plot

# Explore the data
print(cellphone.head())

# Create a scatter plot of the data from the DataFrame cellphone
plt.scatter(cellphone.x, cellphone.y)

# Add labels
plt.ylabel('Latitude')
plt.xlabel('Longitude')

# Display the plot
plt.show()

Bar chart

  • bar plot
# Display the DataFrame hours using print
print(hours)

# Create a bar plot from the DataFrame hours
plt.bar(hours.officer, hours.avg_hours_worked,
        # Add error bars
        yerr=hours.std_hours_worked)

# Display the plot
plt.show()
  • Stacked bar plot
    Pass the bottom argument so the second set of bars starts on top of the first
# Plot the number of hours spent on desk work
plt.bar(hours.officer, hours.desk_work, label='Desk Work')

# Plot the hours spent on field work on top of desk work
plt.bar(hours.officer, hours.field_work, bottom=hours.desk_work, label="Field Work")

# Add a legend
plt.legend()

# Display the plot
plt.show()
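
To see what bottom actually does, here is a self-contained sketch. The hours values are made up, and matplotlib.use("Agg") is an assumption that lets the example run without a display. Each field-work bar starts at the height of the matching desk-work bar.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical stand-in for the hours DataFrame
hours = pd.DataFrame({
    "officer": ["Deshaun", "Mengfei", "Aditya"],
    "desk_work": [25, 30, 20],
    "field_work": [15, 10, 22],
})

# First layer: bars start at zero
desk = plt.bar(hours.officer, hours.desk_work, label="Desk Work")

# Second layer: each bar starts where the desk-work bar ends
field = plt.bar(hours.officer, hours.field_work,
                bottom=hours.desk_work, label="Field Work")

# Print the starting height of each field-work bar
print([bar.get_y() for bar in field])

plt.ylabel("Hours")
plt.legend()
plt.show()
```

Without bottom, the second call would draw its bars from zero and cover the first set instead of stacking on it.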

Histogram

# Change the range to start at 5 and end at 35
plt.hist(puppies.weight,
         range=(5, 35))

# Add labels
plt.xlabel('Puppy Weight (lbs)')
plt.ylabel('Number of Puppies')

# Display
plt.show()
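
The effect of range is easier to see on a small example. This sketch uses made-up weights, and the bins argument is an addition not used in the notes above. Values outside the range are ignored, and the range is split into equal-width bins. matplotlib.use("Agg") is an assumption so that the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed
from matplotlib import pyplot as plt

# Hypothetical puppy weights in lbs; 3 and 40 fall outside the range
weights = [3, 6, 8, 12, 15, 15, 18, 22, 27, 33, 40]

# Six equal-width bins between 5 and 35: 5-10, 10-15, ..., 30-35
counts, edges, _ = plt.hist(weights, bins=6, range=(5, 35))

# hist returns the count per bin and the bin edges
print(counts)
print(edges)

plt.xlabel("Puppy Weight (lbs)")
plt.ylabel("Number of Puppies")
plt.show()
```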

At this point the basics of data manipulation and plotting are covered, and you can move on to more advanced material.

      Article link: https://www.haomeiwen.com/subject/wqltbktx.html