Splitting a Dataset by Hand
1. Random numbers
2. Hash table
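Before reaching for a library, both hand-rolled approaches above can be sketched in a few lines. The function names below are illustrative, not from any library: a seeded random permutation gives a reproducible split, while hashing a stable record id keeps each record's train/test membership fixed even after the dataset is re-shuffled or extended.

```python
import hashlib

import numpy as np


def split_by_random(data, test_ratio, seed=42):
    """Shuffle indices with a fixed seed, then cut off a test slice."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(data))
    n_test = int(len(data) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [data[i] for i in train_idx], [data[i] for i in test_idx]


def split_by_hash(data, ids, test_ratio):
    """Route each record by hashing its stable id: membership survives
    re-runs and newly appended rows, unlike a fresh random shuffle."""
    def in_test(identifier):
        # Last byte of the MD5 digest is uniform over 0..255.
        digest = hashlib.md5(str(identifier).encode()).digest()
        return digest[-1] < 256 * test_ratio

    train, test = [], []
    for row, identifier in zip(data, ids):
        (test if in_test(identifier) else train).append(row)
    return train, test
```

The random split gives exact set sizes; the hash split only approximates `test_ratio`, trading precision for stability across runs.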
Tools
1. sklearn.model_selection.train_test_split
Signature: train_test_split(*arrays, **options)
Docstring:
Split arrays or matrices into random train and test subsets.
Quick utility that wraps input validation and
next(ShuffleSplit().split(X, y)) and application to input data
into a single call for splitting (and optionally subsampling) data in a
one-liner.
Read more in the :ref:`User Guide <cross_validation>`.
Parameters
arrays : sequence of indexables with same length / shape[0]
Allowed inputs are lists, numpy arrays, scipy-sparse
matrices or pandas dataframes.
test_size : float, int or None, optional (default=0.25)
If float, it is the proportion of the dataset to include in the test
split; if int, the absolute number of test samples; if None, the value
is set to the complement of the train size. Defaults to 0.25.
train_size : float, int, or None (default=None)
Same as test_size, but for the train split.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator;
if RandomState instance, random_state is the random number generator;
if None, the random number generator is the RandomState instance used
by np.random.
shuffle : boolean, optional (default=True)
Whether to shuffle the data before splitting. If shuffle=False,
stratify must be None.
stratify : array-like or None (default=None)
If not None, data is split in a stratified fashion, using this as
the class labels.
Returns
splitting : list, length=2 * len(arrays)
List containing train-test split of inputs.
.. versionadded:: 0.16
If the input is sparse, the output will be a
scipy.sparse.csr_matrix. Else, output type is the same as the
input type.
Examples
>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
[2, 3],
[4, 5],
[6, 7],
[8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]
>>> X_train, X_test, y_train, y_test = train_test_split(
... X, y, test_size=0.33, random_state=42)
...
>>> X_train
array([[4, 5],
[0, 1],
[6, 7]])
>>> y_train
[2, 0, 3]
>>> X_test
array([[2, 3],
[8, 9]])
>>> y_test
[1, 4]
>>> train_test_split(y, shuffle=False)
[[0, 1, 2], [3, 4]]
File: d:\anaconda3\lib\site-packages\sklearn\model_selection\_split.py
Type: function
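The stratify parameter described above is not exercised by the docstring's examples, so here is a small sketch of it: passing the label vector as stratify=y makes both splits keep the original class ratio, which matters for imbalanced data. All names below are local to the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# An imbalanced label vector: 80 samples of class 0, 20 of class 1.
X = np.arange(100).reshape(100, 1)
y = np.array([0] * 80 + [1] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Both splits preserve the 4:1 class ratio of y.
print("train class counts:", np.bincount(y_train))
print("test class counts:", np.bincount(y_test))
```

Without stratify=y, a small test split can easily end up with too few (or zero) samples of the minority class.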
2. sklearn.model_selection.StratifiedShuffleSplit
Init signature:
StratifiedShuffleSplit(n_splits=10, test_size='default', train_size=None, random_state=None)
Docstring:
Stratified ShuffleSplit cross-validator
Provides train/test indices to split data in train/test sets.
This cross-validation object is a merge of StratifiedKFold and
ShuffleSplit, which returns stratified randomized folds. The folds
are made by preserving the percentage of samples for each class.
Note: like the ShuffleSplit strategy, stratified random splits
do not guarantee that all folds will be different, although this is
still very likely for sizeable datasets.
Read more in the :ref:`User Guide <cross_validation>`.
Parameters
n_splits : int, default 10
Number of re-shuffling & splitting iterations.
test_size : float, int, None, optional
If float, should be between 0.0 and 1.0 and represent the proportion
of the dataset to include in the test split. If int, represents the
absolute number of test samples. If None, the value is set to the
complement of the train size. By default, the value is set to 0.1.
The default will change in version 0.21. It will remain 0.1 only
if train_size is unspecified, otherwise it will complement
the specified train_size.
train_size : float, int, or None, default is None
If float, should be between 0.0 and 1.0 and represent the
proportion of the dataset to include in the train split. If
int, represents the absolute number of train samples. If None,
the value is automatically set to the complement of the test size.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator;
if RandomState instance, random_state is the random number generator;
if None, the random number generator is the RandomState instance used
by np.random.
Examples
>>> import numpy as np
>>> from sklearn.model_selection import StratifiedShuffleSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([0, 0, 0, 1, 1, 1])
>>> sss = StratifiedShuffleSplit(n_splits=5, test_size=0.5, random_state=0)
>>> sss.get_n_splits(X, y)
5
>>> print(sss) # doctest: +ELLIPSIS
StratifiedShuffleSplit(n_splits=5, random_state=0, ...)
>>> for train_index, test_index in sss.split(X, y):
... print("TRAIN:", train_index, "TEST:", test_index)
... X_train, X_test = X[train_index], X[test_index]
... y_train, y_test = y[train_index], y[test_index]
TRAIN: [5 2 3] TEST: [4 1 0]
TRAIN: [5 1 4] TEST: [0 2 3]
TRAIN: [5 0 2] TEST: [4 3 1]
TRAIN: [4 1 0] TEST: [2 3 5]
TRAIN: [0 5 1] TEST: [3 4 2]
File: d:\anaconda3\lib\site-packages\sklearn\model_selection\_split.py
Type: ABCMeta
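In practice the most common use of StratifiedShuffleSplit is to produce a single stratified split rather than several folds. A minimal sketch, assuming the cross-validator behaves as documented above: with n_splits=1, split() yields exactly one (train_index, test_index) pair, which next() pulls out of the generator.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Imbalanced toy data: 8 samples of class 0, 2 of class 1.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

# n_splits=1 -> split() yields a single (train, test) index pair.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
train_idx, test_idx = next(sss.split(X, y))

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
```

Because the split is stratified, the single class-1 pair is divided evenly: one class-1 sample lands in each half, matching the 4:1 ratio of y.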