Pandas tutorial: Indexing and Se

Author: 庞贝船长 | Published: 2018-01-25 14:22

.loc

.loc is primarily label based, but may also be used with a boolean array. .loc raises KeyError when the requested items are not found.

The .loc attribute is the primary access method. The following are valid inputs:

A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, not as an integer position along the index)
A list or array of labels, e.g. ['a', 'b', 'c']
A slice object with labels, e.g. 'a':'f' (note that, contrary to usual Python slices, both the start and the stop are included when present in the index)
A boolean array
A callable
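
A minimal sketch of the inputs listed above. The frame, its labels, and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame(
    {"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

row_b = df.loc["b"]                           # single label -> that row as a Series
rows_ac = df.loc[["a", "c"]]                  # list of labels
rows_b_to_d = df.loc["b":"d"]                 # label slice: both endpoints included
big_a = df.loc[df["A"] > 2]                   # boolean array
via_callable = df.loc[lambda d: d["B"] < 30]  # callable returning a boolean Series

# A label that is not in the index raises KeyError.
try:
    df.loc["z"]
    missing_raises = False
except KeyError:
    missing_raises = True
```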

.loc

NOTE

The following assignment will not modify df, because pandas aligns the columns by label before assigning the values:

df.loc[:,['B', 'A']] = df[['A', 'B']]

The correct way to swap the column values is to assign the raw values, which bypasses alignment:

df.loc[:,['B', 'A']] = df[['A', 'B']].values

or simply:

df[['B', 'A']] = df[['A', 'B']]
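
To see the alignment effect on a small frame, the sketch below runs both .loc forms on copies of the same data. The frame and the variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Right-hand side is a DataFrame: its columns are matched to the target
# by label, so each column receives its own values back.
aligned = df.copy()
aligned.loc[:, ["B", "A"]] = aligned[["A", "B"]]

# Right-hand side is a raw array: values are assigned by position,
# so the two columns really are swapped.
swapped = df.copy()
swapped.loc[:, ["B", "A"]] = swapped[["A", "B"]].values
```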

.iloc

.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

You can also assign a dict to a row of a DataFrame:

In [28]: x = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]})

In [29]: x.iloc[1] = dict(x=9, y=99)

In [30]: x
Out[30]: 
   x   y
0  1   3
1  9  99
2  3   5

Allowed inputs are:

An integer
A list or array of integers [4, 3, 0]
A slice object with ints 1:7
A boolean array
A callable function with one argument (the calling Series or DataFrame; Panel has been removed from current pandas) that returns valid output for indexing (one of the above)
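
Each allowed input can be tried on a small frame. This is a sketch only; the data and variable names are assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [10, 20, 30, 40, 50]},
    index=list("abcde"),
)

second_row = df.iloc[1]                              # single integer position
picked = df.iloc[[4, 3, 0]]                          # list of positions, in that order
middle = df.iloc[1:3]                                # integer slice: positions 1 and 2 only
masked = df.iloc[[True, False, True, False, True]]   # boolean array
ends = df.iloc[lambda d: [0, len(d) - 1]]            # callable returning positions
```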

slicing

[]

With Series, the syntax works exactly as with an ndarray, returning a slice of the values and the corresponding labels

With DataFrame, slicing inside of [] slices the rows.

[start: end: step]
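
A short sketch of [] slicing on a Series and on a DataFrame. The labels and values are invented for the example.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=list("abcde"))
df = pd.DataFrame({"v": [1, 2, 3, 4, 5]}, index=list("abcde"))

s_mid = s[1:4]         # positions 1..3, labels come along
s_step = s[::2]        # every second element
df_rows = df[1:3]      # on a DataFrame, [] with a slice selects rows
df_reversed = df[::-1] # negative step reverses the rows
```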

Fast scalar value getting and setting

If you only want to access a scalar value, the fastest way is to use the at and iat accessors, which are implemented on all of the data structures.

Similarly to loc, at provides label-based scalar lookups, while iat provides integer-based lookups analogously to iloc.
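
A minimal sketch of scalar reads and writes through at and iat; the frame and labels are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["r1", "r2"])

by_label = df.at["r2", "B"]    # label-based lookup, like loc
by_position = df.iat[0, 1]     # position-based lookup, like iloc

df.at["r1", "A"] = 100         # scalar set by label
df.iat[1, 0] = 200             # scalar set by position
```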

Boolean indexing

Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses.
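
The three operators can be sketched on a small frame as follows. The data is made up; note the parentheses around each comparison.

```python
import pandas as pd

df = pd.DataFrame({"A": [-2, -1, 0, 1, 2], "B": [5, 4, 3, 2, 1]})

both = df[(df["A"] > 0) & (df["B"] < 3)]      # and
either = df[(df["A"] < -1) | (df["B"] == 3)]  # or
negated = df[~(df["A"] > 0)]                  # not
```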

isin

Consider the isin method of Series, which returns a boolean vector that is True wherever the Series elements exist in the passed list. This allows you to select rows where one or more columns have the values you want.
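
A sketch of row selection with isin; the column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["Oslo", "Rome", "Lima", "Rome"], "n": [1, 2, 3, 4]}
)

in_list = df["city"].isin(["Rome", "Lima"])  # boolean Series, one entry per row
selected = df[in_list]                       # keep only the matching rows
```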

Sample

where()

Selecting values from a Series with a boolean vector generally returns a subset of the data. To guarantee that selection output has the same shape as the original data, you can use the where method in Series and DataFrame.

Selecting values from a DataFrame with a boolean criterion also preserves the input data shape; where is used under the hood as the implementation. For example, df[df < 0] is equivalent to df.where(df < 0).
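
The difference between plain boolean selection and where can be sketched like this, with illustrative data:

```python
import pandas as pd

s = pd.Series([-2, -1, 0, 1, 2])

subset = s[s > 0]            # only the matching elements survive
same_shape = s.where(s > 0)  # same length; non-matching entries become NaN

df = pd.DataFrame({"A": [-1, 2], "B": [3, -4]})
via_brackets = df[df < 0]    # boolean DataFrame inside []
via_where = df.where(df < 0) # explicit where call
```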

mask

mask is the inverse boolean operation of where.
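
A small sketch, assuming a made-up Series, showing that mask hides the entries where the condition holds, which is the same as calling where with the negated condition:

```python
import pandas as pd

s = pd.Series([-2, -1, 0, 1, 2])
cond = s > 0

hidden = s.mask(cond)                        # NaN where cond is True
same_as_where = hidden.equals(s.where(~cond))
```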

Duplicate Data

duplicated
drop_duplicates
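
The two methods listed above can be sketched as follows. The frame and column names are invented; subset and keep are existing parameters of both methods.

```python
import pandas as pd

df = pd.DataFrame(
    {"k": ["a", "b", "a", "c", "b"], "v": [1, 2, 1, 3, 9]}
)

dup_rows = df.duplicated()                             # full-row duplicates
dup_keys = df.duplicated(subset="k")                   # duplicates judged on column k only
deduped = df.drop_duplicates(subset="k", keep="last")  # keep the last row per key
```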

Dictionary-like get() method

lookup()

index object

set_index()

Reset the index

As a convenience, DataFrame has a reset_index method, which transfers the index values into the DataFrame's columns and sets a simple integer index. This is the inverse operation of set_index.
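
A round trip through set_index and reset_index, sketched with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({"key": ["x", "y", "z"], "val": [1, 2, 3]})

indexed = df.set_index("key")     # "key" becomes the index
restored = indexed.reset_index()  # "key" is a column again, index is 0..n-1
```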

Returning a view versus a copy

chained indexing?

MultiIndex (hierarchical index)

You can think of MultiIndex as an array of tuples where each tuple is unique.

All of the MultiIndex constructors accept a names argument which stores string names for the levels themselves. If no names are provided, None will be assigned.
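
A sketch of three of the constructors and the names argument. The level values and names are chosen only for illustration.

```python
import pandas as pd

arrays = [["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]]
names = ["first", "second"]

mi_arrays = pd.MultiIndex.from_arrays(arrays, names=names)
mi_tuples = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=names)
mi_product = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=names
)

# Without names, each level name is None.
unnamed = pd.MultiIndex.from_arrays(arrays)
```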

the level labels

The method get_level_values will return a vector of the labels for each location at a particular level.

One of the important features of hierarchical indexing is that you can select data by a “partial” label identifying a subgroup in the data. Partial selection “drops” levels of the hierarchical index in the result in a completely analogous way to selecting a column in a regular DataFrame.
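
Both get_level_values and partial selection can be sketched on a two-level index. The data and level names are assumptions.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=index)

# One label per row for the requested level.
firsts = df.index.get_level_values("first")

# Partial label: selects the "bar" subgroup and drops the first level.
bar = df.loc["bar"]
```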

Data alignment and using reindex

Using slicers

pd.IndexSlice

Cross-section

The xs method of DataFrame additionally takes a level argument to make selecting data at a particular level of a MultiIndex easier.
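
A minimal sketch of xs with the level argument, using an invented two-level frame:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=index)

# All rows whose "second" level equals "one"; that level is dropped from the result.
ones = df.xs("one", level="second")
```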

Alignment

align

Swapping levels

The swaplevel method switches the order of two levels.

Reordering levels

The reorder_levels method generalizes swaplevel, allowing you to permute the hierarchical index levels in one step.
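
A sketch of both methods on a three-level index. The level names l0, l1, l2 are hypothetical.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2], ["x", "y"]], names=["l0", "l1", "l2"]
)
df = pd.DataFrame({"v": range(8)}, index=index)

swapped = df.swaplevel("l0", "l2")                  # exchange two levels
reordered = df.reorder_levels(["l2", "l0", "l1"])   # arbitrary permutation in one step
```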

Sorting a MultiIndex

.sort_index

The is_lexsorted() method on a MultiIndex shows whether the index is sorted, and the lexsort_depth property returns the sort depth.
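
A sketch of sorting a MultiIndex with sort_index. Because is_lexsorted() is not available in recent pandas releases, this example checks sortedness with the index's is_monotonic_increasing property instead; that substitution is an assumption of this sketch, not part of the original notes.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 1), ("b", 1), ("a", 2)], names=["k", "n"]
)
df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=index)

sorted_df = df.sort_index()  # sorts by all levels, outermost first

before = df.index.is_monotonic_increasing         # unsorted index
after = sorted_df.index.is_monotonic_increasing   # sorted index
```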

Take Methods

.take

Index Types

MultiIndex
DatetimeIndex and PeriodIndex
TimedeltaIndex
CategoricalIndex
Int64Index and RangeIndex
Float64Index
IntervalIndex

Reference

MultiIndex / Advanced Indexing
Indexing and Selecting Data

Article link: https://www.haomeiwen.com/subject/imjvaxtx.html