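Using nothing more than fastai's most basic workflow, the model reached close to 100% accuracy, which is a pretty good result!
A brief walkthrough of the code follows. The snippets use the fastai v1 API and assume that `from fastai.vision import *`, `import pandas as pd`, `import torch`, and `from pathlib import Path` have already been run.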
1 Set the folder that holds the training and test data; here it is the default input directory for Kaggle competitions
data_folder = Path("../input")
2 Load the CSV files for the training and test data (the test image names come from sample_submission.csv)
train_df = pd.read_csv("../input/train.csv")
test_df = pd.read_csv("../input/sample_submission.csv")
3 Using the CSVs, load the images with fastai's ImageList
test_img = ImageList.from_df(test_df, path=data_folder/'test', folder='test')
trfm = get_transforms(do_flip=True, flip_vert=True, max_rotate=10.0, max_zoom=1.1, max_lighting=0.2, max_warp=0.2, p_affine=0.75, p_lighting=0.75)
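train_img = (ImageList.from_df(train_df, path=data_folder/'train', folder='train')
        .split_by_rand_pct(0.005)   # hold out a random 0.5% of the images for validation
        .label_from_df()            # labels come from train_df (default: its second column)
        .add_test(test_img)         # attach the unlabeled test images
        .transform(trfm, size=128)  # apply the augmentations and resize to 128 px
        .databunch(path='.', bs=64, device=torch.device('cuda:0'))
        .normalize(imagenet_stats)  # normalize with the ImageNet channel statistics
        )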
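To make the `split_by_rand_pct(0.005)` step concrete, here is a minimal pure-Python sketch of a random percentage split. It is not fastai's implementation; the function name `random_pct_split`, the seed, and the 10,000-item file list are hypothetical and only illustrate how small a 0.5% validation set is.

```python
import random

def random_pct_split(items, valid_pct, seed=42):
    """Shuffle the indices and hold out roughly valid_pct of the items for validation."""
    rng = random.Random(seed)
    idx = list(range(len(items)))
    rng.shuffle(idx)
    cut = int(valid_pct * len(items))  # number of validation items
    valid = [items[i] for i in idx[:cut]]
    train = [items[i] for i in idx[cut:]]
    return train, valid

# Hypothetical file list; the real training-set size is not stated in this post.
files = [f"img_{i}.jpg" for i in range(10_000)]
train_files, valid_files = random_pct_split(files, 0.005)
print(len(train_files), len(valid_files))
```

Because only 0.5% of the images are held out, the validation metrics reported during training are measured on a very small sample and are a coarse estimate of real performance.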
4 Set up the neural network
learn = cnn_learner(train_img, models.densenet161, metrics=[error_rate, accuracy])
5 Start training (see the fastai tutorials for the basic techniques)
epoch | train_loss | valid_loss | error_rate | accuracy | time |
---|---|---|---|---|---|
0 | 0.043829 | 0.002492 | 0.000000 | 1.000000 | 01:41 |
1 | 0.054632 | 0.000278 | 0.000000 | 1.000000 | 01:29 |
2 | 0.025740 | 0.000703 | 0.000000 | 1.000000 | 01:30 |
3 | 0.007295 | 0.000020 | 0.000000 | 1.000000 | 01:29 |
4 | 0.003068 | 0.000012 | 0.000000 | 1.000000 | 01:29 |
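The `error_rate` and `accuracy` columns in the log are complementary: for single-label classification, the error rate is one minus the accuracy. A small plain-Python sketch of the two metrics follows; the labels and predictions are made up for illustration and are not data from this run, and the functions are stand-ins rather than fastai's own metric code.

```python
def accuracy(preds, targets):
    """Fraction of predictions that match the targets."""
    correct = sum(p == t for p, t in zip(preds, targets))
    return correct / len(targets)

def error_rate(preds, targets):
    """Fraction of predictions that do not match the targets."""
    return 1.0 - accuracy(preds, targets)

# Made-up example: 4 of 5 predictions are correct.
targets = [1, 1, 0, 1, 0]
preds = [1, 1, 0, 0, 0]
acc = accuracy(preds, targets)
err = error_rate(preds, targets)
print(acc, err)
```

An error rate of 0.000000 in the log therefore simply means that every image in the small validation set was classified correctly in that epoch.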
6 Predict on the test set and write the submission file
preds,_ = learn.get_preds(ds_type=DatasetType.Test)
test_df.has_cactus = preds.numpy()[:, 0]
test_df.to_csv('submission20190525.csv', index=False)
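For a classifier like this one, the predictions come back as one probability column per class, so the column index used for the submission should correspond to the class being reported. Below is a minimal plain-Python sketch of that lookup. The `classes` list and probability rows are made up, and `positive_class_probs` is a hypothetical helper; in fastai v1 the actual class order can be inspected with `learn.data.classes`, which is worth checking before submitting.

```python
def positive_class_probs(prob_rows, classes, positive_label):
    """From each row of per-class probabilities, pick the entry for positive_label."""
    col = classes.index(positive_label)  # column that holds the positive class
    return [row[col] for row in prob_rows]

# Made-up values: two classes, three test images.
classes = [0, 1]
prob_rows = [
    [0.02, 0.98],
    [0.90, 0.10],
    [0.40, 0.60],
]
has_cactus = positive_class_probs(prob_rows, classes, positive_label=1)
print(has_cactus)
```

Looking the column up by label instead of hard-coding an index keeps the submission correct even if the class order differs from what was expected.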