Will it make a CNN fail (on MNIST)?

1、-"Use the raw pixel values in [0, 255]"

Correct. Almost all CNNs prefer pixel values normalized to a small range such as [-1, 1].
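
A minimal sketch of the normalization mentioned above, assuming 8-bit pixel values and a plain linear rescale to [-1, 1]. The function name and the sample row are hypothetical; real pipelines usually do this on whole arrays with a numerical library.

```python
def normalize_pixels(pixels):
    """Map raw 8-bit pixel values in [0, 255] linearly onto [-1.0, 1.0]."""
    return [p / 127.5 - 1.0 for p in pixels]


# Hypothetical pixel values taken from one row of an image.
raw_row = [0, 64, 128, 255]
scaled_row = normalize_pixels(raw_row)

# 0 maps to -1.0 and 255 maps to 1.0; everything else lies in between.
print(scaled_row)
```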

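2、-"Initialize all the CNN weights to 0"

Correct. Network weights should be initialized randomly. If they all start at 0, the units in a layer stay identical to one another and the network cannot learn distinct features.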
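
To see why zero initialization is a problem, here is a small sketch on a toy network rather than a CNN. It assumes a bias-free two-layer net y_hat = W2 . tanh(W1 x) with a squared-error loss; all names and numbers are made up for illustration. With all-zero weights every gradient is zero, so gradient descent never moves, while a random start gives non-zero gradients.

```python
import math
import random


def gradients(x, y, W1, W2):
    """Gradients of 0.5 * (y_hat - y)^2 for y_hat = W2 . tanh(W1 x)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y_hat = sum(w * hi for w, hi in zip(W2, h))
    err = y_hat - y
    grad_W2 = [err * hi for hi in h]
    # Backpropagate through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
    grad_W1 = [[err * w2 * (1.0 - hi * hi) * xi for xi in x]
               for w2, hi in zip(W2, h)]
    return grad_W1, grad_W2


x, y = [0.5, -1.0], 1.0  # one hypothetical training sample

# All-zero initialization: 3 hidden units, 2 inputs.
zero_W1 = [[0.0, 0.0] for _ in range(3)]
zero_W2 = [0.0] * 3
zero_grad_W1, zero_grad_W2 = gradients(x, y, zero_W1, zero_W2)

# Small random initialization (fixed seed for reproducibility).
rng = random.Random(0)
rand_W1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
rand_W2 = [rng.uniform(-0.5, 0.5) for _ in range(3)]
rand_grad_W1, rand_grad_W2 = gradients(x, y, rand_W1, rand_W2)

print("zero init gradients:  ", zero_grad_W1, zero_grad_W2)
print("random init gradients:", rand_grad_W1, rand_grad_W2)
```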

3、-"Use no intercept (i.e., Wx instead of Wx+b) in the fully connected layer"

No. A network without intercepts (biases) will still work.

Without a bias, every separating surface (the hyperplane Wx = 0 is the decision boundary) passes through the origin.
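
The point about boundaries passing through the origin can be illustrated with a one-dimensional toy classifier. This is only a sketch with made-up data whose classes split at x = 3; it shows the restriction a missing bias imposes on a single linear unit, not that a CNN on MNIST fails without one.

```python
def predict(w, x, b=0.0):
    """Linear classifier: class 1 if w*x + b > 0, otherwise class 0."""
    return 1 if w * x + b > 0 else 0


# Hypothetical 1-D data: the classes split at x = 3, not at the origin.
xs = [1.0, 2.0, 4.0, 5.0]
ys = [0, 0, 1, 1]


def accuracy(w, b=0.0):
    """Fraction of the toy samples classified correctly."""
    return sum(predict(w, x, b) == y for x, y in zip(xs, ys)) / len(xs)


# Without a bias the boundary is x = 0, so all positive inputs get the same
# label. Only the sign of w matters, so checking w = -1 and w = 1 is enough.
no_bias_best = max(accuracy(w) for w in (-1.0, 1.0))

# With a bias the boundary can move to x = 3.
with_bias = accuracy(1.0, b=-3.0)

print(no_bias_best, with_bias)
```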

4、-"The batch size is too small (i.e., one sample per batch)"

No. A small batch size will still work, but it makes the optimization slower.

5、-"The batch size is too big (i.e., use the whole dataset as one batch)"

Correct. By taking the whole dataset as one batch we lose the "stochastic" factor, and the optimization can fall into a bad local minimum.
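
The two batch-size extremes in items 4 and 5 can be compared by counting parameter updates per epoch. This is a rough sketch: `iter_batches` is a hypothetical helper, and 64 is just an example of a typical mini-batch size. One sample per batch means 60000 small, noisy updates per pass over the MNIST training set, while one batch for the whole dataset means a single noise-free update.

```python
def iter_batches(num_samples, batch_size):
    """Yield lists of sample indices, one list per parameter update."""
    for start in range(0, num_samples, batch_size):
        yield list(range(start, min(start + batch_size, num_samples)))


num_samples = 60000  # size of the MNIST training set

# Number of parameter updates in one epoch for each batch size.
updates_per_epoch = {
    batch_size: sum(1 for _ in iter_batches(num_samples, batch_size))
    for batch_size in (1, 64, num_samples)
}

for batch_size, updates in updates_per_epoch.items():
    print(f"batch size {batch_size}: {updates} updates per epoch")
```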

6、-"Do not shuffle the data before training"

Usually correct. Random shuffling has a large effect on CNN training.

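The most obvious case is data sorted by class/target: then you need to shuffle it. Shuffling here makes sure that your training/test/validation sets are representative of the overall distribution of the data.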

Suppose the data is sorted in a specific order, for example by class. If you then select data for training, validation, and test without taking this into account, each split will contain different classes, and the process will fail.
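
A small sketch of the sorted-by-class case, assuming a made-up label list with 10 classes of 100 samples each and a simple 80/20 split (the `split` helper is hypothetical). Without shuffling, the training part never sees the last two classes; shuffling first mixes the classes across both parts.

```python
import random

# Hypothetical dataset: 10 classes, 100 samples each, stored sorted by class.
labels = [c for c in range(10) for _ in range(100)]


def split(items, train_frac=0.8):
    """Split a sequence into a leading training part and a trailing test part."""
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]


# No shuffling: the training part holds classes 0-7, the test part 8 and 9.
train_sorted, test_sorted = split(labels)

# Shuffle a copy first (fixed seed for reproducibility), then split.
shuffled = labels[:]
random.Random(0).shuffle(shuffled)
train_shuffled, test_shuffled = split(shuffled)

print("sorted:  ", sorted(set(train_sorted)), sorted(set(test_sorted)))
print("shuffled:", len(set(train_shuffled)), "classes in train,",
      len(set(test_shuffled)), "classes in test")
```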