1. Preparing the dataset

The dataset comes from the Kaggle competition Planet: Understanding the Amazon from Space.
It can be used as-is once downloaded.

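All labels are stored in train_v2.csv. The first column is the image name and the second holds that image's tags, separated by spaces. A quick look with pandas shows what the multi-label data looks like: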

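from fastai.vision import *   # fastai v1; this star import also exposes pd, np and partial

# path: the folder of the downloaded Planet dataset (a pathlib.Path), assumed to be defined earlier
df = pd.read_csv(path/'train_v2.csv')
df.head()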
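To make concrete what splitting the tag column on spaces produces, here is a minimal plain-Python sketch (standard library only, no pandas or fastai). It reads rows in the train_v2.csv layout, splits each tag string on ' ', and turns every image's tags into a multi-hot 0/1 vector over the sorted set of classes, which is conceptually what a multi-label target looks like. The sample rows and the helper names parse_multilabel_csv and multi_hot are hypothetical, for illustration only.

```python
import csv
import io

# Hypothetical rows in the same layout as train_v2.csv: image name, space-separated tags.
SAMPLE_CSV = """image_name,tags
train_0,haze primary
train_1,agriculture clear primary water
train_2,clear primary
"""

def parse_multilabel_csv(text, label_delim=" "):
    """Return (names, tag_lists) from a two-column CSV whose second column holds delimited tags."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    names, tag_lists = [], []
    for name, tags in reader:
        names.append(name)
        tag_lists.append(tags.split(label_delim))
    return names, tag_lists

def multi_hot(tag_lists):
    """Build a sorted class list and one 0/1 vector per image (1 = tag present)."""
    classes = sorted({t for tags in tag_lists for t in tags})
    index = {c: i for i, c in enumerate(classes)}
    vectors = []
    for tags in tag_lists:
        v = [0] * len(classes)
        for t in tags:
            v[index[t]] = 1
        vectors.append(v)
    return classes, vectors

names, tag_lists = parse_multilabel_csv(SAMPLE_CSV)
classes, vectors = multi_hot(tag_lists)
print(classes)     # ['agriculture', 'clear', 'haze', 'primary', 'water']
print(vectors[1])  # [1, 1, 0, 1, 1]
```

Unlike single-label classification, a row can contain several 1s, which is why the model later needs a threshold rather than an argmax.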

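In fastai, the key step in preparing data is building a DataBunch. Earlier we used its subclass ImageDataBunch to set up the data in a single call; here we use the ImageList class (the data block API) instead, which spells out each step.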

First build src: point it at the image folder and the label CSV, say how to split off a validation set, and say how to read the labels:

np.random.seed(42)
src = (ImageList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')
       .split_by_rand_pct(0.2)
       .label_from_df(label_delim=' '))

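The random seed is set first so that every run produces the same training and validation sets. It matters mainly for split_by_rand_pct(), which picks the 20% of validation images at random.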
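Why a fixed seed gives a reproducible split can be shown with a small sketch of a random-percentage split. This is a stand-in, not fastai's code: it uses a private random.Random(seed) from the standard library instead of the NumPy global generator that np.random.seed(42) controls, and the function name rand_pct_split is hypothetical.

```python
import random

def rand_pct_split(n_items, valid_pct=0.2, seed=42):
    """Shuffle item indices with a seeded RNG; the first valid_pct go to validation."""
    rng = random.Random(seed)  # same seed -> same shuffle order on every run
    idx = list(range(n_items))
    rng.shuffle(idx)
    cut = int(valid_pct * n_items)
    return sorted(idx[cut:]), sorted(idx[:cut])  # (train indices, valid indices)

train_a, valid_a = rand_pct_split(100)
train_b, valid_b = rand_pct_split(100)
print(valid_a == valid_b)          # True
print(len(train_a), len(valid_a))  # 80 20
```

Without the seed, each run would shuffle differently, so validation metrics from different runs would not be directly comparable.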

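Next, apply data augmentation, package everything into a DataBunch, and normalize it with ImageNet statistics, since the pretrained model was trained on inputs normalized that way: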

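# tfms is not defined in the text above; assumed here: standard fastai v1 augmentations,
# with vertical flips allowed because satellite images have no fixed "up" direction
tfms = get_transforms(flip_vert=True)
data = (src.transform(tfms, size=128)
        .databunch().normalize(imagenet_stats))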
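normalize(imagenet_stats) rescales each color channel by subtracting a per-channel mean and dividing by a per-channel standard deviation. The sketch below applies that to a single RGB pixel in plain Python; it assumes the widely used ImageNet statistics (mean 0.485/0.456/0.406, std 0.229/0.224/0.225), which is what fastai's imagenet_stats is understood to hold. Real code does this on whole image tensors, and the function name normalize_pixel is hypothetical.

```python
# Commonly used ImageNet per-channel statistics (R, G, B), for pixel values in [0, 1].
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize one RGB pixel channel by channel: (x - mean) / std."""
    return [(x - m) / s for x, m, s in zip(rgb, mean, std)]

print(normalize_pixel([0.485, 0.456, 0.406]))  # [0.0, 0.0, 0.0]
print(normalize_pixel([1.0, 1.0, 1.0]))        # a pure white pixel maps to values above 2
```

A pixel equal to the ImageNet mean maps to zero in every channel, which is the input distribution the pretrained weights were trained on.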

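Annotated step by step, a data block chain of this kind reads as follows. Note that this annotated version uses different names from the code above (planet, 'labels.csv', folder='train', planet_tfms) and ends without a normalize() step: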

data = (ImageList.from_csv(planet, 'labels.csv', folder='train', suffix='.jpg')
        #Where to find the data? -> in planet 'train' folder
        .split_by_rand_pct()
        #How to split in train/valid? -> randomly with the default 20% in valid
        .label_from_df(label_delim=' ')
        #How to label? -> use the second column of the csv file and split the tags by ' '
        .transform(planet_tfms, size=128)
        #Data augmentation? -> use tfms with a size of 128
        .databunch())
        #Finally -> use the defaults for conversion to databunch

As usual, show_batch displays a batch of images together with their labels. For multi-label data the labels appear above each image, separated by ';'.

2. Training the model

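For the model we again use resnet50 as the backbone and fine-tune it. To report results, however, we use accuracy_thresh instead of the usual error_rate or accuracy. The reason is that accuracy takes the argmax, i.e. the single class with the highest probability becomes the predicted label, whereas in a multi-label problem an image can carry several labels at once. So we use a threshold instead: every label whose predicted probability exceeds the threshold counts as predicted.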
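The difference between argmax and thresholding, and what a thresholded accuracy measures, can be sketched in plain Python. The accuracy function below is a minimal sketch of how fastai v1's accuracy_thresh is understood to behave (apply a sigmoid to the raw outputs, compare each probability with thresh, then average the element-wise matches over every image/label pair); it is not the library's code, and the names argmax_label, threshold_labels and accuracy_thresh_sketch are hypothetical.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def argmax_label(probs):
    """Single-label style: keep only the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])

def threshold_labels(probs, thresh=0.2):
    """Multi-label style: keep every class whose probability exceeds thresh."""
    return [i for i, p in enumerate(probs) if p > thresh]

def accuracy_thresh_sketch(logits, targets, thresh=0.2):
    """Fraction of (image, label) pairs where sigmoid(logit) > thresh matches the 0/1 target."""
    hits, total = 0, 0
    for row_logits, row_targets in zip(logits, targets):
        for z, y in zip(row_logits, row_targets):
            pred = 1 if sigmoid(z) > thresh else 0
            hits += int(pred == y)
            total += 1
    return hits / total

probs = [0.9, 0.05, 0.6, 0.3]
print(argmax_label(probs))      # 0: only one label survives
print(threshold_labels(probs))  # [0, 2, 3]: every label above 0.2 survives

# sigmoid(-1.0) is about 0.27 > 0.2, so the last label is a false positive: 3 of 4 pairs match.
print(accuracy_thresh_sketch([[2.0, -3.0, 0.5, -1.0]], [[1, 0, 1, 0]]))  # 0.75
```

Because the accuracy counts every label slot, including the many correctly absent ones, it tends to look high; the threshold value itself is a tunable choice.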

The code that builds the model and its metrics is:

arch = models.resnet50
# Multi-label metrics: a label counts as predicted when its sigmoid probability exceeds 0.2
acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2)
learn = cnn_learner(data, arch, metrics=[acc_02, f_score])

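The Kaggle Planet competition is scored with the F2 score (F-beta with beta=2, which weights recall more heavily than precision). fastai's fbeta uses beta=2 by default, so f_score above computes F2 at the same 0.2 threshold.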
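With precision P and recall R, the score is F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below computes it per image after thresholding and then averages over images, which is how fastai v1's fbeta is understood to work; unlike the library (which applies a sigmoid to raw outputs by default), this sketch takes probabilities directly. The name fbeta_sketch is hypothetical.

```python
def fbeta_sketch(probs, targets, thresh=0.2, beta=2.0, eps=1e-9):
    """Mean per-image F-beta score for multi-label predictions given as probabilities."""
    beta2 = beta ** 2
    scores = []
    for p_row, y_row in zip(probs, targets):
        pred = [1 if p > thresh else 0 for p in p_row]   # thresholded 0/1 predictions
        tp = sum(a * b for a, b in zip(pred, y_row))     # true positives for this image
        precision = tp / (sum(pred) + eps)
        recall = tp / (sum(y_row) + eps)
        scores.append((1 + beta2) * precision * recall / (beta2 * precision + recall + eps))
    return sum(scores) / len(scores)

# One image: both true labels found (recall 1) plus one false positive (precision 2/3).
probs, targets = [[0.9, 0.1, 0.6, 0.3]], [[1, 0, 1, 0]]
print(round(fbeta_sketch(probs, targets), 4))            # 0.9091 (F2 = 10/11)
print(round(fbeta_sketch(probs, targets, beta=1.0), 4))  # 0.8 (F1)
```

The same predictions score higher under F2 than under F1 because F2 forgives false positives more than missed labels, which also favors a fairly low threshold such as 0.2.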
After training for 5 epochs, the results are: