Using pandas.cut()

Author: butters001 | Published 2020-09-25 18:51

pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise')

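Bin a set of continuous values into discrete intervals.
Use cut when you need to segment and sort data values into bins. It is also useful for turning a continuous variable into a categorical one, for example converting ages into age-range groups.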
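As a quick illustration of the age-range use case, here is a minimal sketch. The ages, the edges 0/17/64/120 and the group names are made-up assumptions for the example, not anything pandas prescribes.

```python
import pandas as pd

# Hypothetical ages; edges and group names are illustrative assumptions
ages = pd.Series([5, 17, 18, 30, 64, 65, 90])

# With the default right=True the bins are (0, 17], (17, 64], (64, 120]
age_groups = pd.cut(ages, bins=[0, 17, 64, 120], labels=["child", "adult", "senior"])
print(age_groups)
```

Note that 17 lands in "child" and 64 in "adult" because each bin includes its right edge by default.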

Parameters

x : array-like
The input array of continuous values to be binned. It must be one-dimensional.
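bins : int, sequence of scalars, or pandas.IntervalIndex
The criteria to bin by.
- int:
  The number of equal-width bins to split the range of x into. The range is widened by 0.1% so that the minimum and maximum of x fall inside a bin (with right=True it is the lower edge that moves, which is why 0.994 appears in the examples below).
- sequence of scalars:
  The bin edges themselves, which allows non-uniform widths. The range of x is not extended in this case.
- IntervalIndex:
  The exact intervals to use. They must not overlap.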
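The three forms of bins can be compared side by side. This is a sketch on the same array used in the examples below; the edges [0, 2, 5, 10] and the intervals are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

x = np.array([1, 7, 5, 4, 6, 3])

# int: 3 equal-width bins; the lower edge is pushed out by 0.1% of the range (6 * 0.001)
_, int_edges = pd.cut(x, 3, retbins=True)

# sequence of scalars: the edges are used exactly as given, no extension
_, seq_edges = pd.cut(x, [0, 2, 5, 10], retbins=True)

# IntervalIndex: the exact, non-overlapping intervals (0, 2], (2, 5], (5, 10]
ii = pd.IntervalIndex.from_tuples([(0, 2), (2, 5), (5, 10)])
by_interval = pd.cut(x, ii)

print(int_edges)
print(seq_edges)
print(by_interval)
```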
right : bool, default True
Whether each bin includes its right edge. If bins is [1, 2, 3, 4], the intervals are (1, 2], (2, 3], (3, 4]. If False, each bin includes its left edge instead, giving [1, 2), [2, 3), [3, 4).
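A small sketch of the difference, binning the edge values themselves so the effect of right is visible (the values are chosen only for illustration):

```python
import pandas as pd

values = [1, 2, 3, 4]
edges = [1, 2, 3, 4]

# right=True (default): (1, 2], (2, 3], (3, 4]; the value 1 is outside every bin -> NaN
closed_right = pd.cut(values, edges)

# right=False: [1, 2), [2, 3), [3, 4); now the value 4 is outside every bin -> NaN
closed_left = pd.cut(values, edges, right=False)

print(closed_right)
print(closed_left)
```

In both cases one endpoint value is left out (category code -1), which is what include_lowest addresses for right=True.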
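labels : array or bool, optional
The labels for the returned bins, one per bin. For the three bins (1, 2], (2, 3], (3, 4], labels must have length 3; each label is an alias for its bin. If False, integer indicators of the bins (0, 1, 2, ...) are returned instead of intervals.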
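The length rule and labels=False can be checked directly. This sketch uses three hypothetical values, one per bin; the exact error message text may differ between pandas versions, so only the exception type is relied on.

```python
import pandas as pd

edges = [1, 2, 3, 4]  # three bins: (1, 2], (2, 3], (3, 4]
data = [1.5, 2.5, 3.5]

# One label per bin
named = pd.cut(data, edges, labels=["low", "mid", "high"])

# labels=False: integer bin positions instead of intervals
codes = pd.cut(data, edges, labels=False)

# Two labels for three bins is rejected with a ValueError
try:
    pd.cut(data, edges, labels=["low", "high"])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)

print(named)
print(codes)
```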
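retbins : bool, default False (short for "return bins")
Whether to also return the bin edges; off by default. If True, cut returns two values, and the second is an array such as array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]). This is most useful when bins is an integer, since the computed edges are otherwise not visible.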
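One practical use of retbins is to let cut choose the edges on one dataset and then reuse exactly those edges on another, so both results share the same categories. A minimal sketch, where new_data is a hypothetical second batch:

```python
import numpy as np
import pandas as pd

train = np.array([1, 7, 5, 4, 6, 3])
# Let cut pick 3 equal-width bins and hand the edges back
train_binned, edges = pd.cut(train, 3, retbins=True)

# Hypothetical new data, binned with the same edges as the training data
new_data = np.array([2, 6.5])
new_binned = pd.cut(new_data, edges)

print(edges)
print(new_binned)
```

Passing 3 again instead of edges would recompute the bins from new_data's own range, which is usually not what you want.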
precision : int, default 3
The precision (number of decimal places) used to store and display the bin labels.
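include_lowest : bool, default False
Whether the first interval should include its left edge. With bins = np.arange(0, 101, 10), 0 is excluded by default and the first bin is (0, 10]. If set to True, 0 is included and the first bin is displayed as (-0.001, 10.0].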
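The np.arange(0, 101, 10) case described above, as a sketch with a few hypothetical scores:

```python
import numpy as np
import pandas as pd

scores = [0, 5, 10, 100]
edges = np.arange(0, 101, 10)  # 0, 10, 20, ..., 100

# Default: the first bin is (0, 10], so the score 0 falls outside (NaN, code -1)
default_bins = pd.cut(scores, edges)

# include_lowest=True: 0 now lands in the first bin, displayed as (-0.001, 10.0]
lowest_bins = pd.cut(scores, edges, include_lowest=True)

print(default_bins)
print(lowest_bins)
```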
duplicates : {default 'raise', 'drop'}, optional
What to do when the bin edges are not unique: 'raise' raises a ValueError, 'drop' removes the duplicate edges.
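A sketch with a repeated edge (the edges [0, 1, 1, 3] are made up to trigger the check):

```python
import pandas as pd

values = [0.5, 1.5, 2.5]
edges = [0, 1, 1, 3]  # the edge 1 appears twice

# Default duplicates='raise': repeated edges are rejected
try:
    pd.cut(values, edges)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# duplicates='drop': the repeat is removed, leaving the edges 0, 1, 3
dropped, kept_edges = pd.cut(values, edges, duplicates="drop", retbins=True)
print(kept_edges)
print(dropped)
```

Repeated edges tend to show up when edges are computed from the data, for example from quantiles of a column with many identical values.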

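Returns

out : pandas.Categorical, Series, or ndarray
The bin each value of x falls into. The type depends on the input and on labels.
bins : numpy.ndarray or IntervalIndex
The computed or specified bin edges. Only returned when retbins=True.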
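Which type out has can be checked quickly. This sketch reuses the array from the first example below:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 7, 5, 4, 6, 3])

as_categorical = pd.cut(arr, 3)             # ndarray in -> Categorical out
as_series = pd.cut(pd.Series(arr), 3)       # Series in -> Series with category dtype
as_codes = pd.cut(arr, 3, labels=False)     # labels=False -> ndarray of integer bin positions

print(type(as_categorical).__name__, type(as_series).__name__, type(as_codes).__name__)
```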

# Discretize into three equal-width bins
In [33]: pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3)
Out[33]: 
[(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], (0.994, 3.0]]
Categories (3, interval[float64]): [(0.994, 3.0] < (3.0, 5.0] < (5.0, 7.0]]

# Also return the bins
In [35]: a, b = pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, retbins=True)

In [36]: a
Out[36]: 
[(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], (0.994, 3.0]]
Categories (3, interval[float64]): [(0.994, 3.0] < (3.0, 5.0] < (5.0, 7.0]]

In [37]: b
Out[37]: array([0.994, 3.   , 5.   , 7.   ])

# Assign labels to the same bins; the categories of the returned Categorical are the labels, and they are ordered
In [38]: pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, labels=["bad", "medium", "good"])
Out[38]: 
[bad, good, medium, medium, good, bad]
Categories (3, object): [bad < medium < good]

# labels=False returns only the integer bin indicators, not the intervals
In [40]: pd.cut([0, 1, 1, 2], bins=4, labels=False)
Out[40]: array([0, 1, 1, 3])

# Passing a Series as input returns a new Series with categorical dtype
# Split into 3 bins
In [41]: s = pd.Series(np.array([2, 4, 6, 8, 10]), index=['a', 'b', 'c', 'd', 'e'])

In [42]: s
Out[42]: 
a     2
b     4
c     6
d     8
e    10
dtype: int64

In [43]: pd.cut(s, 3)
Out[43]: 
a    (1.992, 4.667]
b    (1.992, 4.667]
c    (4.667, 7.333]
d     (7.333, 10.0]
e     (7.333, 10.0]
dtype: category
Categories (3, interval[float64]): [(1.992, 4.667] < (4.667, 7.333] < (7.333, 10.0]]

# With explicit edges, each value is mapped to the interval it falls in
In [59]: pd.cut(s, [0, 2, 4, 6, 8, 10])
Out[59]: 
a     (0, 2]
b     (2, 4]
c     (4, 6]
d     (6, 8]
e    (8, 10]
dtype: category
Categories (5, interval[int64]): [(0, 2] < (2, 4] < (4, 6] < (6, 8] < (8, 10]]

# With labels=False, the values 0, 1, 2, 3, 4 are the positions (indices) of the intervals (0, 2] < (2, 4] < (4, 6] < (6, 8] < (8, 10]
In [58]: pd.cut(s, [0, 2, 4, 6, 8, 10], labels=False, retbins=True, right=True)
Out[58]: 
(a    0
 b    1
 c    2
 d    3
 e    4
 dtype: int64, array([ 0,  2,  4,  6,  8, 10]))
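The claim that these integers are interval positions can be verified by comparing them with the category codes of the interval result. A sketch using the same Series s:

```python
import pandas as pd

s = pd.Series([2, 4, 6, 8, 10], index=["a", "b", "c", "d", "e"])
edges = [0, 2, 4, 6, 8, 10]

intervals = pd.cut(s, edges)                # Series of intervals (category dtype)
positions = pd.cut(s, edges, labels=False)  # Series of integer bin positions

# The category codes of the interval result are the same integers
# (compared as lists because the codes use a smaller integer dtype)
print(list(intervals.cat.codes) == list(positions))
```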

Article link: https://www.haomeiwen.com/subject/poifuktx.html