Pandas tutorial: Indexing and Se

Author: 庞贝船长 | Published: 2018-01-25 14:22

.loc

.loc is primarily label based, but may also be used with a boolean array. .loc raises KeyError when the items are not found.

The .loc attribute is the primary access method. The following are valid inputs:

  • A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, not as an integer position along the index)
  • A list or array of labels, e.g. ['a', 'b', 'c']
  • A slice object with labels, e.g. 'a':'f' (note that, contrary to usual Python slices, both the start and the stop are included when present in the index)
  • A boolean array
  • A callable
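
The inputs listed above can be sketched as follows. The frame, its labels, and the variable names are made up for illustration and are not part of the original notes.

```python
import pandas as pd

# Hypothetical frame with string labels as the index.
df = pd.DataFrame(
    {"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

single = df.loc["a"]                         # one label: the row as a Series
several = df.loc[["a", "c"]]                 # list of labels: a DataFrame
inclusive = df.loc["a":"c"]                  # label slice: both ends included
by_mask = df.loc[df["x"] > 2]                # boolean array
by_callable = df.loc[lambda d: d["y"] < 30]  # callable that receives the frame

# A label that is not in the index raises KeyError.
missing_raises = False
try:
    df.loc["z"]
except KeyError:
    missing_raises = True
```

Note that the label slice returns three rows ("a", "b", "c"), whereas a positional slice of the same length would stop before the end.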

NOTE

The following assignment will not modify df, because column alignment happens before value assignment:

df.loc[:, ['B', 'A']] = df[['A', 'B']]

The correct way to swap the columns is to assign the raw values:

df.loc[:, ['B', 'A']] = df[['A', 'B']].values

or simply:

df[['B', 'A']] = df[['A', 'B']]
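
A small sketch of the three assignments above, assuming a made-up two-column frame. It uses to_numpy() in place of .values; both return the raw array without labels.

```python
import pandas as pd

# Hypothetical data; column names follow the note above.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# 1) The right-hand side is aligned on column labels first, so nothing changes.
aligned = df.copy()
aligned.loc[:, ["B", "A"]] = df[["A", "B"]]

# 2) Raw values carry no labels, so they are assigned by position.
swapped = df.copy()
swapped.loc[:, ["B", "A"]] = df[["A", "B"]].to_numpy()

# 3) Plain [] assignment with a list of columns is also positional.
direct = df.copy()
direct[["B", "A"]] = df[["A", "B"]]
```

After running this, aligned still matches df, while swapped and direct hold the original A values in column B and vice versa.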

.iloc

.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

You can also assign a dict to a row of a DataFrame:

In [28]: x = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]})

In [29]: x.iloc[1] = dict(x=9, y=99)

In [30]: x
Out[30]: 
   x   y
0  1   3
1  9  99
2  3   5

Allowed inputs are:

  • An integer, e.g. 5
  • A list or array of integers, e.g. [4, 3, 0]
  • A slice object with ints, e.g. 1:7
  • A boolean array
  • A callable function with one argument (the calling Series or DataFrame) that returns valid output for indexing (one of the above)
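
A minimal sketch of the positional inputs listed above. The frame is hypothetical; its index labels are deliberately not 0..n-1 to show that .iloc ignores them.

```python
import pandas as pd

# Hypothetical frame whose labels differ from the positions.
df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": list("abcde")},
    index=[10, 20, 30, 40, 50],
)

row = df.iloc[0]                                        # single integer: first row
picked = df.iloc[[4, 3, 0]]                             # list of positions, in that order
window = df.iloc[1:3]                                   # positions 1 and 2, end excluded
masked = df.iloc[[True, False, True, False, False]]     # boolean array
last_two = df.iloc[lambda d: [len(d) - 2, len(d) - 1]]  # callable returning positions
```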

slicing

  • []

With Series, the syntax works exactly as with an ndarray, returning a slice of the values and the corresponding labels.

With DataFrame, slicing inside of [] slices the rows.

The slice syntax is [start:end:step].
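
As an illustration of slicing with [], here is a short sketch on a made-up Series and DataFrame. The integer slices are positional, as with an ndarray.

```python
import pandas as pd

# Hypothetical data with string labels.
s = pd.Series([10, 20, 30, 40, 50], index=list("abcde"))
df = pd.DataFrame({"x": range(5)}, index=list("abcde"))

head = s[:2]          # first two values, labels "a" and "b" come along
every_other = s[::2]  # step of 2
reversed_s = s[::-1]  # negative step reverses the order
rows = df[1:3]        # on a DataFrame, a slice inside [] selects rows
```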

Fast scalar value getting and setting

If you only want to access a scalar value, the fastest way is to use the at and iat accessors, which are implemented on all of the data structures.

Similarly to loc, at provides label based scalar lookups, while iat provides integer based lookups analogously to iloc.
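
A brief sketch of scalar access, assuming a small made-up frame. Both accessors can read and write a single cell.

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["a", "b", "c"])

by_label = df.at["b", "y"]   # label based, like loc
by_position = df.iat[1, 1]   # position based, like iloc; same cell as above

df.at["c", "x"] = 30         # set a scalar by label
df.iat[0, 0] = 10            # set a scalar by position
```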

Boolean indexing

Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. Each condition must be wrapped in parentheses, because | and & bind more tightly than comparison operators such as < and >.
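
For instance, with a made-up integer Series (the variable names are only for illustration):

```python
import pandas as pd

# Hypothetical Series holding -3 .. 3.
s = pd.Series(range(-3, 4))

extremes = s[(s < -1) | (s > 1)]   # or
middle = s[(s > -2) & (s < 2)]     # and
non_negative = s[~(s < 0)]         # not
```

Dropping the parentheses, as in s[s < -1 | s > 1], would be parsed as a chained comparison around -1 | s and typically fails with an error about the ambiguous truth value of a Series.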

isin

Consider the isin method of Series, which returns a boolean vector that is true wherever the Series elements exist in the passed list. This allows you to select rows where one or more columns have the values you want.
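
A short sketch of row selection with isin. The column names and values are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"vals": [1, 2, 3, 4], "ids": ["a", "b", "f", "n"]})

mask = df["ids"].isin(["a", "b"])   # boolean vector, one entry per row
selected = df[mask]                 # rows whose "ids" value is in the list

# Conditions on several columns can be combined with & (or |).
both = df[df["ids"].isin(["a", "f"]) & df["vals"].isin([1, 4])]
```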

Sample

where()

Selecting values from a Series with a boolean vector generally returns a subset of the data. To guarantee that selection output has the same shape as the original data, you can use the where method in Series and DataFrame.

Selecting values from a DataFrame with a boolean criterion also preserves the shape of the input data, because where is used under the hood as the implementation. For example, df[df < 0] is equivalent to df.where(df < 0).
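
The difference in shape can be seen in a small sketch with made-up data:

```python
import pandas as pd

# Hypothetical Series: plain boolean selection drops the non-matching entries.
s = pd.Series([-2, -1, 0, 1, 2])
subset = s[s > 0]             # only the matching elements remain
same_shape = s.where(s > 0)   # same length as s, NaN where the condition is False

# Hypothetical DataFrame: [] with a boolean frame keeps the shape.
df = pd.DataFrame({"a": [-1, 2], "b": [3, -4]})
via_brackets = df[df < 0]
via_where = df.where(df < 0)
```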

mask

mask is the inverse boolean operation of where.
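
A minimal sketch of that relationship, using a made-up Series: where keeps values where the condition is True, while mask replaces them there.

```python
import pandas as pd

# Hypothetical data for illustration.
s = pd.Series([-2, -1, 0, 1, 2])
cond = s > 0

kept_positive = s.where(cond)    # NaN where cond is False
hidden_positive = s.mask(cond)   # NaN where cond is True

# mask(cond) gives the same result as where(~cond).
same = hidden_positive.equals(s.where(~cond))
```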

Duplicate Data

  • duplicated

  • drop_duplicates

Dictionary-like get() method

lookup() (deprecated in pandas 1.2.0 and removed in 2.0.0)

index object

set_index()

Reset the index

As a convenience, DataFrame has a reset_index method, which transfers the index values into the DataFrame's columns and sets a simple integer index. This is the inverse operation of set_index.
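
A round trip through set_index and reset_index might look like this. The column names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical frame with a column that will become the index.
df = pd.DataFrame({"key": ["k1", "k2", "k3"], "val": [1, 2, 3]})

indexed = df.set_index("key")      # "key" moves from the columns to the index
restored = indexed.reset_index()   # "key" becomes a column again, index is 0..n-1
```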

Returning a view versus a copy

chained indexing ?

MultiIndex (hierarchical index)

You can think of MultiIndex as an array of tuples where each tuple is unique.

All of the MultiIndex constructors accept a names argument which stores string names for the levels themselves. If no names are provided, None will be assigned.
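
As a sketch, one of the constructors (from_tuples) with and without the names argument. The tuples and level names are invented for illustration.

```python
import pandas as pd

# Hypothetical tuples, one per index entry.
tuples = [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")]

named = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
unnamed = pd.MultiIndex.from_tuples(tuples)   # level names default to None
```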

the level labels

The method get_level_values will return a vector of the labels for each location at a particular level.

One of the important features of hierarchical indexing is that you can select data by a “partial” label identifying a subgroup in the data. Partial selection “drops” levels of the hierarchical index in the result in a completely analogous way to selecting a column in a regular DataFrame.
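
Both points can be sketched on a small made-up Series with a two-level index:

```python
import pandas as pd

# Hypothetical two-level index and data.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
s = pd.Series([1, 2, 3, 4], index=index)

# One label per row for the chosen level.
first_labels = index.get_level_values("first")

# Partial selection: only the outer label is given, so that level is dropped.
bar_only = s.loc["bar"]
```

Here bar_only is indexed by the remaining "second" level only.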

Data alignment and using reindex

Using slicers

pd.IndexSlice

Cross-section

The xs method of DataFrame additionally takes a level argument to make selecting data at a particular level of a MultiIndex easier.
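
A minimal sketch of xs with the level argument, assuming a made-up frame with levels named "first" and "second":

```python
import pandas as pd

# Hypothetical frame with a two-level row index.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=index)

# Select every row whose "second" level equals "one"; that level is dropped.
ones = df.xs("one", level="second")
```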

Alignment

  • align

Swapping levels

The swaplevel method switches the order of two levels.

Reordering levels

The reorder_levels method generalizes swaplevel, allowing you to permute the hierarchical index levels in one step.
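
Both operations can be sketched on a made-up three-level Series. The level names l0, l1, and l2 are illustrative.

```python
import pandas as pd

# Hypothetical three-level index.
index = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2], ["x", "y"]], names=["l0", "l1", "l2"]
)
s = pd.Series(range(8), index=index)

swapped = s.swaplevel(0, 1)                        # exchange the first two levels
permuted = s.reorder_levels(["l2", "l0", "l1"])    # arbitrary permutation in one step
```

Neither call sorts the data; only the order of the levels within each index entry changes.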

Sorting a MultiIndex

.sort_index

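The is_lexsorted() method of a MultiIndex shows whether the index is sorted, and the lexsort_depth property returns the sort depth. Note that is_lexsorted() was deprecated in pandas 1.3 in favor of is_monotonic_increasing.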
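
A short sketch of sorting a MultiIndex with sort_index, on made-up data. It checks sortedness with is_monotonic_increasing rather than is_lexsorted(), since the latter was deprecated in pandas 1.3.

```python
import pandas as pd

# Hypothetical unsorted two-level index.
index = pd.MultiIndex.from_tuples([("b", 2), ("a", 1), ("b", 1), ("a", 2)])
s = pd.Series([1, 2, 3, 4], index=index)

was_sorted = s.index.is_monotonic_increasing   # False for this input
sorted_s = s.sort_index()                      # sorts by all levels, outermost first
is_sorted = sorted_s.index.is_monotonic_increasing
```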

Take Methods

.take

Index Types

  • MultiIndex
  • DatetimeIndex and PeriodIndex
  • TimedeltaIndex
  • CategoricalIndex
  • Int64Index and RangeIndex
  • Float64Index
  • IntervalIndex

References

  • MultiIndex / Advanced Indexing

  • Indexing and Selecting Data
