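Continuing the earlier exercises. The previous posts, for reference:
- pandas examples-Getting to Know Your Data-Chipotle
- pandas examples-Filtering and Sorting-Chipotle
- pandas examples-Data Visualization-Chipotle
- pandas examples-Getting to Know Your Data-Occupation
- pandas examples-Selecting and Filtering-Euro 12
- pandas examples-Selecting and Filtering-Fictional Army
- pandas examples-Aggregation-Alcohol Consumption
- pandas examples-Aggregation-Occupation
- pandas examples-Aggregation-Regiment
- pandas examples-Apply-Student Alcohol Consumption
- pandas examples-Apply-Crime Rates
- pandas examples-Merge-MPG Cars
- pandas examples-Merge-Fictitious Names
- pandas examples-Merge-House Market
- pandas examples-Stats-US_Baby_Names
- pandas examples-Stats-Wind Statistics
- pandas examples-Visualization-Titanic_Desaster
- pandas examples-Visualization-Scores
- pandas examples-Visualization-Online Retail
- pandas examples-Visualization-Tips
- pandas examples-Time Series-Apple Stock
This post is about manipulating data. Similar tasks came up in earlier posts, so treat this one as a review.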
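import numpy as np
import pandas as pd

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
df = pd.read_csv(url)

There is a problem here: the file has no header row, so read_csv takes the first data row as the column names.
We need to read it again and supply the column names ourselves:
df = pd.read_csv(url, header=None, names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])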
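
The effect of header=None can be checked without downloading anything. The sketch below parses three sample rows in the same comma-separated layout from an in-memory string; the sample text and the names wrong, right and cols are assumptions made for illustration, not part of the exercise.

```python
import io
import pandas as pd

# Three rows in the same layout as iris.data (no header line).
sample = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)
cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']

# Default behaviour: the first data row is consumed as the header.
wrong = pd.read_csv(io.StringIO(sample))

# header=None keeps every row as data; names supplies the column labels.
right = pd.read_csv(io.StringIO(sample), header=None, names=cols)

print(wrong.shape)          # one row fewer, because it became the header
print(right.shape)
print(list(right.columns))
```

Comparing the two shapes is a quick way to notice that a row was silently lost to the header.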

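1. Is there any missing value in the dataframe?
The task is to check whether the data contains missing values. The non-null counts reported by df.info() already answer this, but there is another way:
pandas.isna
pandas.isnull
The two behave identically: pandas.isnull is an alias of pandas.isna. From the documentation:
This function takes a scalar or array-like object and indicates whether values are missing (NaN in numeric arrays, None or NaN in object arrays, NaT in datetimelike).
pd.isna(df).sum()
pd.isnull(df).sum()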
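
As a small check of the claim that both functions give the same counts, here is a sketch on a hypothetical two-column frame (demo and its values are made up for this example):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column.
demo = pd.DataFrame({
    'a': [1.0, np.nan, 3.0],
    'b': ['x', None, 'z'],
})

na_counts = pd.isna(demo).sum()        # missing values per column
null_counts = pd.isnull(demo).sum()    # same computation via the other name
any_missing = demo.isna().any().any()  # one True/False for the whole frame

print(na_counts)
print(any_missing)
```

The method form demo.isna() is equivalent to pd.isna(demo); chaining .any().any() collapses the result to a single yes/no answer.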

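2. Let's set the values of the rows 10 to 29 of the column 'petal_length' to NaN
Set a range of rows in one column to NaN. Assign through a single .loc call; chained indexing such as df['petal_length'].iloc[10:30] = np.nan may write to a temporary copy and triggers pandas' chained-assignment warning.
df.loc[10:29, 'petal_length'] = np.nan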
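
The label-based and position-based ways of selecting rows 10 to 29 differ in how they treat the slice end. A minimal sketch on a hypothetical 40-row frame with a default RangeIndex (demo and its values are assumptions, standing in for the iris data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the iris frame: 40 rows, default RangeIndex.
demo = pd.DataFrame({
    'petal_length': np.arange(40, dtype=float),
    'petal_width': np.ones(40),
})

# Label-based: .loc slices include both endpoints, so 10:29 is rows 10..29.
demo.loc[10:29, 'petal_length'] = np.nan

# Position-based: .iloc slices exclude the end, hence 10:30 for the same rows.
col = demo.columns.get_loc('petal_width')
demo.iloc[10:30, col] = np.nan

print(demo['petal_length'].isna().sum())
print(demo['petal_width'].isna().sum())
```

Both assignments go through one indexer call on the frame itself, which avoids the chained-assignment problem.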

3. Good, now let's substitute the NaN values with 1.0
Replace every NaN with 1.0, then look at rows 10 to 29 again to confirm:
df.fillna(1.0, inplace=True)
df['petal_length'].iloc[10:30]
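
Note that df.fillna(1.0) fills NaN in every column. If only one column should be touched, a dict can scope the fill. A sketch on a hypothetical frame (demo and its values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in two columns.
demo = pd.DataFrame({
    'petal_length': [1.4, np.nan, 1.3],
    'petal_width': [0.2, 0.2, np.nan],
})

# A dict restricts the fill to the listed columns; other columns keep their NaN.
filled = demo.fillna({'petal_length': 1.0})

print(filled)
```

Without inplace=True the call returns a new frame, so demo itself still holds its NaN values.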

4. Now let's delete the column class
Drop a single column:
df.drop(columns='class', inplace=True)
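
Running the drop a second time raises a KeyError because the column is already gone, which is easy to hit when re-running notebook cells. A sketch of two variations on a hypothetical two-row frame (demo is an assumption for illustration):

```python
import pandas as pd

# Hypothetical two-row frame with a 'class' column.
demo = pd.DataFrame({
    'petal_length': [1.4, 4.7],
    'class': ['Iris-setosa', 'Iris-versicolor'],
})

# Without inplace=True, drop returns a new frame and leaves demo untouched.
trimmed = demo.drop(columns='class')

# errors='ignore' turns a repeated drop into a no-op instead of a KeyError.
trimmed_again = trimmed.drop(columns='class', errors='ignore')

print(list(trimmed.columns))
```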

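5. Set the first 3 rows as NaN
Set every value in the first 3 rows to NaN. Watch the slice end: iloc[:3] covers rows 0 to 2, while iloc[:4] would blank out one row too many.
df.iloc[:3] = np.nan
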
6. Delete the rows that have NaN
Drop every row that contains a NaN:
df.dropna(inplace=True)
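
By default dropna removes a row if any of its values is NaN. The how and subset parameters change that rule; the sketch below compares the three variants on a hypothetical frame (demo and its values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is entirely NaN, row 2 is partly NaN.
demo = pd.DataFrame({
    'sepal_length': [np.nan, 4.9, 4.7, 4.6],
    'petal_length': [np.nan, 1.4, np.nan, 1.5],
})

any_nan = demo.dropna()                         # drop rows with at least one NaN
all_nan = demo.dropna(how='all')                # drop rows where every value is NaN
by_col = demo.dropna(subset=['sepal_length'])   # only consider one column

print(list(any_nan.index))
print(list(all_nan.index))
print(list(by_col.index))
```

The surviving rows keep their original index labels, which is why the next step resets the index.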

7. Reset the index so it begins with 0 again
Reset the index. drop=True discards the old index instead of keeping it as a new column:
df.reset_index(drop=True, inplace=True)
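
To see what drop=True changes, here is a sketch on a hypothetical frame whose index starts at 3, as it would after the first rows were removed (demo is an assumption for illustration):

```python
import pandas as pd

# Hypothetical frame whose index starts at 3, as after dropping rows 0..2.
demo = pd.DataFrame({'sepal_length': [4.6, 5.0, 5.4]}, index=[3, 4, 5])

kept = demo.reset_index()             # old index is kept as a column named 'index'
clean = demo.reset_index(drop=True)   # old index is discarded

print(list(kept.columns))
print(list(clean.index))
```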

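That's it for this post. The main thing is to get familiar with the functions. You won't remember them all at first, so use them often and check the API documentation as you go.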