.loc
.loc is primarily label based, but may also be used with a boolean array. .loc will raise KeyError when the items are not found.
The .loc attribute is the primary access method. The following are valid inputs:
- A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, not as an integer position along the index)
- A list or array of labels, e.g. ['a', 'b', 'c']
- A slice object with labels, e.g. 'a':'f' (note that, contrary to usual Python slices, both the start and the stop are included when present in the index)
- A boolean array
- A callable
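As a minimal sketch of the input types listed above (the Series `s` and `t` and their labels are made-up example data, not from the original notes):

```python
import pandas as pd

# Hypothetical example data: a Series with string labels.
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

single = s.loc['b']                # single label -> scalar
several = s.loc[['a', 'c']]        # list of labels
sliced = s.loc['b':'c']            # label slice: both 'b' and 'c' are included
masked = s.loc[s > 20]             # boolean array
called = s.loc[lambda x: x < 30]   # callable that returns a boolean array

# With an integer index, 5 is treated as a label, not as a position.
t = pd.Series(['x', 'y'], index=[5, 7])
by_label = t.loc[5]

# A label that is not in the index raises KeyError.
try:
    s.loc['z']
    missing_raises = False
except KeyError:
    missing_raises = True
```

Note that `t.loc[5]` succeeds even though the Series has only two elements, because 5 is looked up among the index labels.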
![](https://img.haomeiwen.com/i3209607/6f6c44b596ab38db.png)
NOTE
The following will not modify df, because column alignment happens before value assignment:
df.loc[:,['B', 'A']] = df[['A', 'B']]
The correct way to swap the column values is to use the raw values:
df.loc[:,['B', 'A']] = df[['A', 'B']].values
or simply:
df[['B', 'A']] = df[['A', 'B']]
.iloc
.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
You can also assign a dict to a row of a DataFrame:
In [28]: x = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]})
In [29]: x.iloc[1] = dict(x=9, y=99)
In [30]: x
Out[30]:
   x   y
0  1   3
1  9  99
2  3   5
Allowed inputs are:
- An integer, e.g. 5
- A list or array of integers, e.g. [4, 3, 0]
- A slice object with ints, e.g. 1:7
- A boolean array
- A callable function with one argument (the calling Series or DataFrame) that returns valid output for indexing (one of the above)
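The allowed inputs above can be sketched as follows; the DataFrame `df` is a made-up example, and the string index is only there to show that .iloc ignores labels:

```python
import pandas as pd

# Hypothetical example data with a non-integer index.
df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [5, 6, 7, 8]}, index=list('abcd'))

row = df.iloc[0]                              # integer -> first row as a Series
picked = df.iloc[[3, 0]]                      # list of integers -> rows 'd' and 'a'
window = df.iloc[1:3]                         # slice -> positions 1 and 2 (stop excluded)
flags = df.iloc[[True, False, True, False]]   # boolean array
evens = df.iloc[lambda d: [0, 2]]             # callable returning a list of positions
```

Unlike .loc label slices, the integer slice 1:3 follows the usual Python convention and excludes the stop position.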
Slicing with []
With Series, the syntax works exactly as with an ndarray, returning a slice of the values and the corresponding labels.
With DataFrame, slicing inside of [] slices the rows.
The general form is [start:end:step].
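A small sketch of slicing with [] on both structures, using made-up data:

```python
import pandas as pd

# Hypothetical example data.
s = pd.Series([1, 2, 3, 4, 5], index=list('abcde'))
df = pd.DataFrame({'v': [10, 20, 30, 40, 50]}, index=list('abcde'))

head = s[:2]           # first two values, with labels 'a' and 'b'
rev = s[::-1]          # negative step reverses the Series
rows = df[1:4]         # on a DataFrame, [] with a slice selects rows
every_other = df[::2]  # every second row
```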
![](https://img.haomeiwen.com/i3209607/3a5ad1fb3efea0d2.png)
Fast scalar value getting and setting
If you only want to access a scalar value, the fastest way is to use the at and iat methods, which are implemented on all of the data structures.
Similarly to loc, at provides label based scalar lookups, while iat provides integer based lookups analogously to iloc.
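For illustration, scalar getting and setting might look like this (the frame and its labels are assumptions made up for the example):

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['r0', 'r1'])

a = df.at['r1', 'B']    # label-based scalar lookup
b = df.iat[0, 1]        # position-based scalar lookup (row 0, column 1)

df.at['r0', 'A'] = 100  # label-based scalar assignment
df.iat[1, 0] = 200      # position-based scalar assignment
```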
Boolean indexing
Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. Each condition must be wrapped in parentheses, because |, & and ~ bind more tightly than comparison operators such as < and ==.
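A minimal sketch of combining boolean vectors, with a made-up Series holding the integers -3 through 3:

```python
import pandas as pd

# Hypothetical example data: the integers -3 .. 3.
s = pd.Series(range(-3, 4))

both = s[(s > -2) & (s < 2)]    # and: keeps -1, 0, 1
either = s[(s < -1) | (s > 1)]  # or: keeps -3, -2, 2, 3
negated = s[~(s < 0)]           # not: keeps 0, 1, 2, 3
```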
isin
Consider the isin method of Series, which returns a boolean vector that is true wherever the Series elements exist in the passed list. This allows you to select rows where one or more columns have the values you want.
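As a sketch, row selection with isin could look like the following; the column names and values are invented for the example:

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({'vals': [1, 2, 3, 4], 'ids': ['a', 'b', 'f', 'n']})

mask = df['ids'].isin(['a', 'b'])   # boolean vector, one entry per row
selected = df[mask]                 # rows whose 'ids' value is in the list

# Conditions on several columns can be combined with &.
combined = df[df['ids'].isin(['a', 'n']) & df['vals'].isin([1, 3])]
```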
Sample
where()
Selecting values from a Series with a boolean vector generally returns a subset of the data. To guarantee that selection output has the same shape as the original data, you can use the where method in Series and DataFrame.
Selecting values from a DataFrame with a boolean criterion also preserves the shape of the input data, because where is used under the hood as the implementation. For example, df[df < 0] is equivalent to df.where(df < 0).
mask
mask is the inverse boolean operation of where.
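The difference between plain boolean selection, where and mask can be sketched on a small made-up Series:

```python
import pandas as pd

# Hypothetical example data.
s = pd.Series([-2, -1, 0, 1, 2])

subset = s[s > 0]           # plain selection: only the matching rows remain
kept = s.where(s > 0)       # same shape; non-matching entries become NaN
filled = s.where(s > 0, 0)  # same shape; non-matching entries replaced by 0
hidden = s.mask(s > 0)      # inverse of where: matching entries become NaN
```

Here `subset` has two elements, while `kept`, `filled` and `hidden` all keep the original five positions.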
Duplicate Data
- duplicated
- drop_duplicates
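The notes only name these two methods, so here is a brief sketch of how they are typically used; the data and the choice of the subset and keep arguments are assumptions for illustration:

```python
import pandas as pd

# Hypothetical example data: rows 0 and 1 are identical.
df = pd.DataFrame({'a': ['one', 'one', 'two', 'two'],
                   'b': ['x', 'x', 'x', 'y']})

dup = df.duplicated()               # True for rows that repeat an earlier row
unique_rows = df.drop_duplicates()  # drops the repeated rows

# Consider only column 'a' and keep the last occurrence of each value.
last_by_a = df.drop_duplicates(subset='a', keep='last')
```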
Dictionary-like get() method
lookup()
index object
set_index()
Reset the index
As a convenience, DataFrame has a reset_index method, which transfers the index values into the DataFrame's columns and sets a simple integer index. This is the inverse operation to set_index.
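A minimal round trip through set_index and reset_index, using a made-up frame:

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({'key': ['k1', 'k2', 'k3'], 'val': [1, 2, 3]})

indexed = df.set_index('key')     # 'key' column becomes the index
restored = indexed.reset_index()  # index goes back to a column; integer index again
```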
Returning a view versus a copy
chained indexing ?
MultiIndex (hierarchical index)
You can think of MultiIndex as an array of tuples where each tuple is unique.
All of the MultiIndex constructors accept a names argument which stores string names for the levels themselves. If no names are provided, None will be assigned.
the level labels
The method get_level_values will return a vector of the labels for each location at a particular level.
One of the important features of hierarchical indexing is that you can select data by a “partial” label identifying a subgroup in the data. Partial selection “drops” levels of the hierarchical index in the result in a completely analogous way to selecting a column in a regular DataFrame.
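The MultiIndex points above (names, get_level_values, partial selection) can be sketched together; the tuples and level names are example values, not from the original notes:

```python
import pandas as pd

# Hypothetical example data: each tuple is one unique index entry.
tuples = [('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('baz', 'two')]

mi = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
unnamed = pd.MultiIndex.from_tuples(tuples)   # level names default to None

s = pd.Series([1, 2, 3, 4], index=mi)

firsts = mi.get_level_values('first')  # label of level 'first' at every location
part = s.loc['bar']                    # partial label: level 'first' is dropped
```

After the partial selection, `part` is indexed only by the remaining 'second' level.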
Data alignment and using reindex
Using slicers
pd.IndexSlice
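As a sketch of slicing with pd.IndexSlice, assuming a made-up two-level index that is already sorted:

```python
import pandas as pd

# Hypothetical example data: a sorted two-level index.
mi = pd.MultiIndex.from_product([['a', 'b'], [1, 2, 3]],
                                names=['outer', 'inner'])
df = pd.DataFrame({'v': range(6)}, index=mi)

idx = pd.IndexSlice
# All 'outer' labels, 'inner' labels 2 through 3 (inclusive), all columns.
sel = df.loc[idx[:, 2:3], :]
```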
Cross-section
The xs method of DataFrame additionally takes a level argument to make selecting data at a particular level of a MultiIndex easier.
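A minimal cross-section sketch; the frame and level names are invented for the example:

```python
import pandas as pd

# Hypothetical example data.
mi = pd.MultiIndex.from_product([['a', 'b'], [1, 2, 3]],
                                names=['outer', 'inner'])
df = pd.DataFrame({'v': range(6)}, index=mi)

by_outer = df.xs('a')            # select on the first level
xsec = df.xs(2, level='inner')   # select on an inner level by name
```

In both cases the level that was selected on is dropped from the result.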
Alignment
align
Swapping levels
The swaplevel function can switch the order of two levels.
Reordering levels
The reorder_levels function generalizes the swaplevel function, allowing you to permute the hierarchical index levels in one step.
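Both level-reordering functions can be sketched on a small made-up frame; with only two levels they produce the same index:

```python
import pandas as pd

# Hypothetical example data.
mi = pd.MultiIndex.from_product([['a', 'b'], [1, 2]],
                                names=['outer', 'inner'])
df = pd.DataFrame({'v': [10, 20, 30, 40]}, index=mi)

swapped = df.swaplevel(0, 1)                        # exchange two levels
reordered = df.reorder_levels(['inner', 'outer'])   # give the full new order
```

Neither call sorts the rows; only the order of the levels inside each index entry changes.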
Sorting a MultiIndex
.sort_index
The is_lexsorted() method on a MultiIndex shows whether the index is sorted, and the lexsort_depth property returns the sort depth.
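A sketch of sorting a MultiIndex with sort_index, on made-up data. Since is_lexsorted() is not available in every pandas release, the check below uses is_monotonic_increasing as a stand-in; that substitution is an assumption of this example, not part of the original notes:

```python
import pandas as pd

# Hypothetical example data: an unsorted two-level index.
mi = pd.MultiIndex.from_tuples([('b', 2), ('a', 1), ('b', 1), ('a', 2)],
                               names=['outer', 'inner'])
s = pd.Series([1, 2, 3, 4], index=mi)

sorted_s = s.sort_index()               # sort by all levels, outermost first
by_inner = s.sort_index(level='inner')  # sort primarily by the 'inner' level

was_sorted = s.index.is_monotonic_increasing        # stand-in for is_lexsorted()
now_sorted = sorted_s.index.is_monotonic_increasing
```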
Take Methods
.take
Index Types
- MultiIndex
- DatetimeIndex and PeriodIndex
- TimedeltaIndex
- CategoricalIndex
- Int64Index and RangeIndex
- Float64Index
- IntervalIndex