pandas Interview Question Challenge 2

Author: 人工智能人话翻译官 | Published: 2019-05-19 18:39 | Views: 76

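6 Get the items that appear only in ser1

Given two Series, ser1 and ser2:

import numpy as np
import pandas as pd
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

Extract the values that appear in ser1 but not in ser2, i.e. 1, 2 and 3.

Solution: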

ser1.isin(ser2)

Output:

0    False
1    False
2    False
3     True
4     True
dtype: bool

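This means that the elements at positions 0-2 of ser1 do not appear in ser2, so their output is False, while the elements at positions 3-4 do appear in ser2, so their output is True.
We can now filter ser1 with these booleans. Because we want to keep positions 0-2, the mask has to be negated first, which is what the "~" operator does.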

~ser1.isin(ser2)

Output:

0     True
1     True
2     True
3    False
4    False
dtype: bool

Finally, pass this boolean Series to ser1 as an index to get the result we need.

ser1[~ser1.isin(ser2)]

Output:

0    1
1    2
2    3
dtype: int64

In short, the whole answer is "ser1[~ser1.isin(ser2)]".
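As a self-contained check of question 6, the sketch below repeats the boolean-mask answer and compares it with NumPy's set difference. The np.setdiff1d variant is an alternative added here for illustration, not part of the original answer, and the names only_in_ser1 and only_in_ser1_np are made up for this example.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# Boolean-mask approach from the text: keep the rows of ser1
# whose value does not occur anywhere in ser2.
only_in_ser1 = ser1[~ser1.isin(ser2)]

# Alternative (an assumption, not from the original post): NumPy set difference.
# It returns sorted unique values as a plain array, so the ser1 index is lost.
only_in_ser1_np = np.setdiff1d(ser1.to_numpy(), ser2.to_numpy())

print(only_in_ser1.tolist())     # [1, 2, 3]
print(only_in_ser1_np.tolist())  # [1, 2, 3]
```

The mask version keeps the original index labels and any duplicates in ser1, which is usually what you want when the Series is a column of a larger table.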

7 Get the items not common to both Series

Given two Series:

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

Extract the values that are not shared by the two Series, i.e. 1, 2, 3, 6, 7 and 8.
The solution is as follows:

ser_u = pd.Series(np.union1d(ser1, ser2))

This takes the union:

0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
dtype: int64

ser_i = pd.Series(np.intersect1d(ser1, ser2))

This takes the intersection:

0    4
1    5
dtype: int64

From here the approach is the same as in the previous question:

~ser_u.isin(ser_i)

Output:

0     True
1     True
2     True
3    False
4    False
5     True
6     True
7     True
dtype: bool

This gives the positions of all the non-common elements.
Finally:

ser_u[~ser_u.isin(ser_i)]

gives the result we want:

0    1
1    2
2    3
5    6
6    7
7    8
dtype: int64
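The union-minus-intersection steps above compute a symmetric difference. The sketch below reruns them end to end and compares the result with np.setxor1d, which is a shortcut added here for illustration rather than the method used in the post. The names non_common and sym_diff are hypothetical.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# Step-by-step version from the text: union, intersection, then mask.
ser_u = pd.Series(np.union1d(ser1.to_numpy(), ser2.to_numpy()))
ser_i = pd.Series(np.intersect1d(ser1.to_numpy(), ser2.to_numpy()))
non_common = ser_u[~ser_u.isin(ser_i)]

# Shortcut (an assumption, not from the original post): np.setxor1d returns
# the sorted unique values that occur in exactly one of the two inputs.
sym_diff = pd.Series(np.setxor1d(ser1.to_numpy(), ser2.to_numpy()))

print(non_common.tolist())  # [1, 2, 3, 6, 7, 8]
print(sym_diff.tolist())    # [1, 2, 3, 6, 7, 8]
```

Note that the two results carry different index labels: non_common keeps the positions from ser_u (0, 1, 2, 5, 6, 7), while sym_diff is numbered from 0.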

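8 Find the minimum, maximum and median of a Series

Given the following Series:

state = np.random.RandomState(seed=100)
# np.random.RandomState returns a seeded random number generator
ser = pd.Series(state.normal(10, 5, 25))
# draw 25 values from a normal distribution with mean 10 and standard deviation 5
ser

Its contents are: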

0      1.251173
1     11.713402
2     15.765179
3      8.737820
4     14.906604
5     12.571094
6     11.105898
7      4.649783
8      9.052521
9     11.275007
10     7.709865
11    12.175817
12     7.082025
13    14.084235
14    13.363604
15     9.477944
16     7.343598
17    15.148663
18     7.809322
19     4.408409
20    18.094908
21    17.708026
22     8.740604
23     5.787821
24    10.922593
dtype: float64

Find the minimum, maximum and median of this Series.
np.percentile handles this conveniently:

np.percentile(ser, q=[0, 50, 100])
# q is the percentile to compute: 0 is the minimum, 50 is the median, 100 is the maximum

Result:

array([ 1.25117263, 10.92259345, 18.0949083 ])
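To make the percentile trick easier to verify by eye, here is a minimal sketch on a small hand-written Series (the five values are invented for this example). It also shows the Series methods min, max, median, idxmin and idxmax, which are not used in the original answer but return the same values and, for the extremes, their positions.

```python
import numpy as np
import pandas as pd

# Small made-up Series so the answers can be checked by hand.
ser = pd.Series([7.5, 1.2, 9.8, 4.4, 6.1])

# Percentiles 0, 50 and 100 are the minimum, median and maximum.
lo, med, hi = np.percentile(ser, q=[0, 50, 100])
print(lo, med, hi)

# The same values through Series methods.
print(ser.min(), ser.median(), ser.max())

# idxmin / idxmax return the index label where the extreme value sits.
print(ser.idxmin(), ser.idxmax())  # 1 2
```

With an even number of elements, both np.percentile (with its default interpolation) and Series.median average the two middle values, so the median need not be an element of the Series.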

9 Count how often each item occurs in a Series

Given the following Series:

ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=20)))
ser

Its contents are:

0     e
1     b
2     g
3     b
4     e
5     c
6     g
7     d
8     h
9     d
10    d
11    f
12    g
13    a
14    d
15    f
16    e
17    a
18    h
19    h
dtype: object

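About np.take
np.take(list('abcdefgh'), np.random.randint(8, size=20))
np.random.randint(8, size=20) produces 20 random integers in the range 0 to 7. These integers are then used as indices into the list.
list('abcdefgh') serves as the data source.
Put together, np.take makes 20 draws from the data source, picking one character from list('abcdefgh') each time.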
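The explanation of np.take above can be checked with fixed indices instead of random ones. In this sketch the index array idx is made up for illustration, and the comparison with plain fancy indexing is an addition of this example.

```python
import numpy as np

letters = list('abcdefgh')          # data source
idx = np.array([4, 1, 6, 1, 0])     # hand-picked positions instead of random ones

# np.take picks letters[i] for every i in idx, in order.
picked = np.take(letters, idx)
print(picked.tolist())  # ['e', 'b', 'g', 'b', 'a']

# Equivalent fancy indexing on an array built from the same list.
same = np.array(letters)[idx]
print(same.tolist())    # ['e', 'b', 'g', 'b', 'a']
```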

Count how many times each element occurs in this Series.
Solution:

ser.value_counts()

Output:

d    4
g    3
e    3
h    3
b    2
a    2
f    2
c    1
dtype: int64
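Because the Series above is random, its counts change on every run. The sketch below uses a small fixed Series (invented for this example) so the result of value_counts is predictable, and also shows the normalize=True option, which the original does not use.

```python
import pandas as pd

# Fixed input so the counts are reproducible.
ser = pd.Series(list('aabbbc'))

# value_counts sorts by frequency, most common first.
counts = ser.value_counts()
print(counts.to_dict())  # {'b': 3, 'a': 2, 'c': 1}

# normalize=True returns relative frequencies instead of raw counts.
shares = ser.value_counts(normalize=True)
print(shares['b'])       # 0.5
```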

10 Keep only the two most frequent values in a Series and drop the rest

Given the following Series:

ser = pd.Series(np.random.randint(1, 5, 12))
ser

Output:

0     2
1     2
2     4
3     4
4     2
5     1
6     4
7     4
8     3
9     2
10    1
11    1
dtype: int64

Counting the values:

print("Top 2 Freq:\n", ser.value_counts())

Output:

Top 2 Freq:
 4    4
2    4
1    3
3    1
dtype: int64

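This shows that 4 occurs 4 times and 2 occurs 4 times, which makes them the two most frequent values. We now keep only 4 and 2 and drop 1 and 3.

Solution:

ser.value_counts().index[:2]

This gives the index of the two most frequent elements: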

Int64Index([4, 2], dtype='int64')

Next, check whether each element of ser is in [4, 2], using the isin method from the earlier questions:

ser.isin(ser.value_counts().index[:2])

Output:

0      True
1      True
2      True
3      True
4      True
5     False
6      True
7      True
8     False
9      True
10    False
11    False
dtype: bool

Next, negate the mask and set every element outside the top two to NaN:

ser[~ser.isin(ser.value_counts().index[:2])] = np.nan
ser

Output:

0     2.0
1     2.0
2     4.0
3     4.0
4     2.0
5     NaN
6     4.0
7     4.0
8     NaN
9     2.0
10    NaN
11    NaN
dtype: float64

Finally, use dropna to remove the NaN entries.

ser.dropna()

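This gives the final result. The rows at index 5, 8, 10 and 11 are gone, and the remaining rows keep their original index labels:

0    2.0
1    2.0
2    4.0
3    4.0
4    2.0
6    4.0
7    4.0
9    2.0
dtype: float64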

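If the gaps in the index bother you, renumber it with reset_index(drop=True). Calling reindex() with no arguments leaves the labels unchanged, so it does not help here.

ser.dropna().reset_index(drop=True)

Output:

0    2.0
1    2.0
2    4.0
3    4.0
4    2.0
5    4.0
6    4.0
7    2.0
dtype: float64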
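Question 10 can also be solved without the NaN detour by filtering with the isin mask directly. This is a variation on the approach above rather than the original answer. The sketch uses the twelve values printed earlier as fixed input, and the names top2 and kept are hypothetical.

```python
import pandas as pd

# The twelve values shown in the example, hard-coded so the run is reproducible.
ser = pd.Series([2, 2, 4, 4, 2, 1, 4, 4, 3, 2, 1, 1])

# Labels of the two most frequent values (2 and 4 here, four occurrences each).
top2 = ser.value_counts().index[:2]

# Keep only the rows whose value is one of those two, then renumber from 0.
# Filtering directly avoids inserting NaN, so the dtype stays integer.
kept = ser[ser.isin(top2)].reset_index(drop=True)

print(sorted(top2.tolist()))  # [2, 4]
print(kept.tolist())          # [2, 2, 4, 4, 2, 4, 4, 2]
```

When two values tie for second place, which one lands in the top two depends on how value_counts orders the tie, so this sketch is only unambiguous when the top two are clear, as they are here.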


Article link: https://www.haomeiwen.com/subject/khpqzqtx.html
