
DL Fundamentals

Author: EdwardMa | Published 2018-08-11 13:36
    1. What is Deep Learning?

    Deep learning is a subfield of machine learning focused on using deep (containing more than one hidden layer) artificial neural networks, which are loosely inspired by the brain, for feature extraction and representation. The idea dates back to the mid-1960s, when Alexey Grigorevich Ivakhnenko published the first general, working deep learning network. Deep learning is applicable to a range of fields such as computer vision, speech recognition, and natural language processing.

    2. What is a Neural Network? Why are deep networks better than shallow ones?

    A neural network is a set of connected units (neurons) organized in layers; each connection carries a weight, and each unit applies an activation function to a weighted sum of its inputs. Both shallow and deep networks are capable of approximating any function, but for the same level of accuracy, deeper networks can be much more efficient in terms of computation and number of parameters. Deep networks build hierarchical representations: at every layer, the network learns a new, more abstract representation of the input.

    3. What is a Multilayer Perceptron (MLP)?

    A multilayer perceptron is a feedforward neural network with at least one hidden layer between the input and output layers. Each layer is fully connected to the next, and the units use nonlinear activation functions, which lets the network approximate functions that a single-layer perceptron cannot.
    4. What is Data Normalization and why do we need it?

    Data normalization is a very important preprocessing step, used to rescale values to fit in a specific range and assure better convergence during backpropagation. In general, it boils down to subtracting each feature's mean from the data and dividing by its standard deviation.
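
A minimal NumPy sketch of this feature-wise standardization (the toy matrix X is made up):

```python
import numpy as np

# Toy dataset: 4 samples, 2 features (hypothetical values)
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0],
              [4.0, 500.0]])

# Standardize each feature: subtract its mean, divide by its std
mean = X.mean(axis=0)
std = X.std(axis=0)
X_norm = (X - mean) / std

print(X_norm.mean(axis=0))  # ~0 for every feature
print(X_norm.std(axis=0))   # ~1 for every feature
```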

    5. What is a Boltzmann Machine?

    A Boltzmann machine is a stochastic recurrent neural network with symmetrically connected binary units that learns a probability distribution over its inputs. The restricted variant (the Restricted Boltzmann Machine, with no connections within a layer) is the form used in practice, for example as a building block of deep belief networks.
    6. What is the role of Activation Functions in a neural network?

    The goal of an activation function is to introduce non-linearity into the neural network so that it can learn more complex functions. Without it, the neural network would only be able to learn functions that are linear combinations of its input data.
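
A small NumPy sketch of why the non-linearity matters (the weights and input here are random placeholders): without an activation, two stacked linear layers collapse into a single linear map.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

W1 = np.random.randn(4, 3)   # first layer weights (placeholder)
W2 = np.random.randn(2, 4)   # second layer weights (placeholder)
x = np.random.randn(3)       # input vector (placeholder)

# No activation: W2 @ (W1 @ x) == (W2 @ W1) @ x, still one linear map
linear_only = W2 @ (W1 @ x)

# ReLU between the layers breaks the collapse into a single matrix
nonlinear = W2 @ relu(W1 @ x)
```
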
    7. What is a cost function?

    The cost function tells us how well the neural network is performing. Our goal during training is to find parameters that minimize the cost function.
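
One common concrete choice is the mean squared error; a minimal sketch (the prediction and target vectors are made up):

```python
import numpy as np

def mse_cost(y_pred, y_true):
    # Mean squared error: average squared difference between
    # predictions and targets
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.8])
print(mse_cost(y_pred, y_true))  # small value -> network performs well
```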

    8. What is Gradient Descent?

    Gradient descent is an optimization algorithm used in machine learning to learn values of parameters that minimize the cost function. It is an iterative algorithm: in every iteration, we compute the gradient of the cost function with respect to each parameter and update the parameters via the following rule.
      $\Theta := \Theta - \alpha \frac{\partial}{\partial \Theta} J(\Theta)$
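
A minimal sketch of this update rule on a one-parameter toy cost, $J(\Theta) = (\Theta - 3)^2$ (the learning rate and iteration count are arbitrary choices):

```python
# Minimize the toy cost J(theta) = (theta - 3)^2 with gradient descent.
def grad(theta):
    # dJ/dtheta = 2 * (theta - 3)
    return 2.0 * (theta - 3.0)

theta = 0.0        # initial parameter value (arbitrary)
alpha = 0.1        # learning rate
for _ in range(100):
    theta -= alpha * grad(theta)   # theta := theta - alpha * dJ/dtheta

print(theta)  # converges toward 3.0, the minimizer of J
```
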
    9. What do you understand by Backpropagation?

      Backpropagation is a training algorithm used for multilayer neural networks. It moves the error information from the end of the network back to all the weights inside the network, and thus allows for efficient computation of the gradient.

    The backpropagation algorithm can be divided into several steps:

      1. Forward propagation of training data through the network in order to generate output.
      2. Use target value and output value to compute error derivative with respect to output activations.
      3. Backpropagate to compute the derivative of the error with respect to output activations in the previous layer and continue for all hidden layers.
      4. Use the previously calculated derivatives for output and all hidden layers to calculate the error derivative with respect to weights.
      5. Update the weights.
    • A small simulation demo of backpropagation is sketched below.
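
Following the bullet above, here is a minimal NumPy simulation of these five steps on a tiny one-hidden-layer network (the data, layer sizes, and learning rate are all made up):

```python
import numpy as np

np.random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy data: 4 samples, 3 inputs, 1 target each (made-up values)
X = np.random.randn(4, 3)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# Small network: 3 -> 4 -> 1, weights initialized near zero
W1 = np.random.randn(3, 4) * 0.1
W2 = np.random.randn(4, 1) * 0.1
alpha = 0.5

for _ in range(1000):
    # 1. Forward propagation
    h = sigmoid(X @ W1)          # hidden activations
    out = sigmoid(h @ W2)        # network output

    # 2. Error derivative w.r.t. output activations (squared error + sigmoid)
    d_out = (out - y) * out * (1 - out)

    # 3. Backpropagate to the hidden layer
    d_h = (d_out @ W2.T) * h * (1 - h)

    # 4. Error derivatives w.r.t. the weights
    grad_W2 = h.T @ d_out
    grad_W1 = X.T @ d_h

    # 5. Update the weights
    W2 -= alpha * grad_W2
    W1 -= alpha * grad_W1

print(np.round(out, 2))  # outputs move toward the targets
```
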
    10. What is the difference between a Feedforward Neural Network and a Recurrent Neural Network?

    In a feedforward network, information flows in one direction, from the input layer through the hidden layers to the output, and the network has no memory of earlier inputs. A recurrent neural network (RNN) has feedback connections: its hidden state depends on both the current input and the previous hidden state, which lets it model sequential data.
    11. What are some applications of Recurrent Neural Networks?
      RNNs are used for sequential data: machine translation, speech recognition, language modeling, time-series prediction, image captioning, and handwriting recognition.
    12. What are the Softmax and ReLU functions?

    Softmax turns a vector of real-valued scores into a probability distribution: $\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$. It is typically used in the output layer of a classifier. ReLU (rectified linear unit) is the activation function $f(x) = \max(0, x)$; it is cheap to compute and mitigates the vanishing gradient problem, which makes it the default choice for hidden layers.
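
A minimal NumPy sketch of both functions (the input scores are made up):

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; result sums to 1
    e = np.exp(x - np.max(x))
    return e / e.sum()

def relu(x):
    return np.maximum(0.0, x)

scores = np.array([2.0, 1.0, 0.1])   # made-up logits
print(softmax(scores))               # approx [0.66 0.24 0.10]
print(relu(np.array([-1.0, 0.5])))   # [0.  0.5]
```
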
    13. What are hyperparameters?
      Hyperparameters, as opposed to model parameters, cannot be learned from the data; they are set before the training phase.
    • Learning rate
      It determines how fast we update the weights during optimization. If the learning rate is too small, gradient descent can be slow to find the minimum, and if it is too large, gradient descent may not converge (it can overshoot the minimum). It is considered to be the most important hyperparameter.

    • Number of epochs
      An epoch is defined as one forward pass and one backward pass over all of the training data.

    • Batch size
      The number of training examples in one forward/backward pass.

    14. What will happen if the learning rate is set too low or too high?

    If the learning rate is too low, training progresses very slowly, because the weights receive only tiny updates. If it is too high, the loss can oscillate or diverge, because each update overshoots the minimum.
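
A tiny sketch of both failure modes using gradient descent on $J(\Theta) = \Theta^2$ (the rates and step count are illustrative):

```python
def run_gd(alpha, steps=20):
    # Gradient descent on J(theta) = theta^2, starting from theta = 1
    theta = 1.0
    for _ in range(steps):
        theta -= alpha * 2.0 * theta   # dJ/dtheta = 2 * theta
    return theta

print(run_gd(0.001))  # too low: barely moves from 1.0
print(run_gd(0.1))    # reasonable: close to 0.0
print(run_gd(1.1))    # too high: overshoots and diverges
```
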
    15. What is Dropout and Batch Normalization?

    Dropout is a regularization technique for reducing overfitting in neural networks. At each training step we randomly drop out (set to zero) a set of nodes; we thus create a different model for each training case, and all of these models share weights. It is a form of model averaging.

    Batch normalization normalizes the inputs of a layer over each mini-batch (zero mean and unit variance, followed by a learned scale and shift). It stabilizes and speeds up training and also has a slight regularizing effect.
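
A minimal sketch of (inverted) dropout, the variant that rescales activations at training time so that nothing changes at test time; the drop probability and activations are placeholders:

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True):
    """Inverted dropout: scale surviving units at train time so the
    expected activation is unchanged and test time needs no rescaling."""
    if not training:
        return activations
    mask = (np.random.rand(*activations.shape) >= p_drop)
    return activations * mask / (1.0 - p_drop)

h = np.random.randn(4, 5)        # hypothetical hidden activations
print(dropout(h, p_drop=0.5))    # roughly half the units zeroed
```
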
    16. What is the difference between Batch Gradient Descent and Stochastic Gradient Descent?

    Batch gradient descent computes the gradient over the entire training set and performs a single update per pass; the updates are accurate but each one is expensive on large datasets. Stochastic gradient descent updates the parameters after every single training example; the updates are cheap and noisy, which can also help escape shallow local minima.
    17. Explain Overfitting and Underfitting and how to combat them.

    Overfitting means the model fits the training data too closely, including its noise, and therefore generalizes poorly to unseen data; it can be combated with more training data, regularization (L1/L2, dropout), early stopping, or a simpler model. Underfitting means the model is too simple to capture the underlying pattern and performs poorly even on the training data; it can be combated with a larger model, better features, or longer training.
    18. How are weights initialized in a network?

    Weight initialization is a very important step. Bad weight initialization can prevent a network from learning, while good initialization can lead to quicker convergence and better overall error. Biases can generally be initialized to zero. The general rule for setting the weights is to keep them close to zero without being too small.

    Do not initialize all weights to zero! If all weights in the network are set to zero, all the neurons in each layer produce the same output and the same gradients during backpropagation. The network cannot learn at all because there is no source of asymmetry between neurons. That is why we need to add randomness to the weight initialization process.

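A short sketch of the initializations discussed above (layer sizes are arbitrary); He initialization is one standard recipe for ReLU layers:

```python
import numpy as np

n_in, n_out = 256, 128   # arbitrary layer sizes

# All-zero init: every neuron computes the same thing -> no learning
W_bad = np.zeros((n_in, n_out))

# Small random init: close to zero, but breaks the symmetry
W_small = np.random.randn(n_in, n_out) * 0.01

# He initialization, a common choice for ReLU layers
W_he = np.random.randn(n_in, n_out) * np.sqrt(2.0 / n_in)

b = np.zeros(n_out)   # biases can generally start at zero
```
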
    19. What are the different layers in a CNN?

    A typical CNN is built from convolutional layers (banks of learnable filters that extract local features), activation layers (e.g. ReLU), pooling layers that downsample the feature maps, and one or more fully connected layers at the end that produce the final prediction.
    20. What is Pooling in a CNN and how does it work?

    Pooling reduces the spatial size of the feature maps by summarizing small regions. The most common form is max pooling: a window (e.g. 2x2) slides over the feature map and only the maximum value in each window is kept. This reduces computation for later layers and provides a degree of translation invariance.
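
A minimal NumPy sketch of 2x2 max pooling (the feature map values are made up):

```python
import numpy as np

def max_pool2d(x, size=2):
    """Max pooling with stride equal to the window size."""
    h, w = x.shape
    out = x[:h - h % size, :w - w % size]
    out = out.reshape(h // size, size, w // size, size)
    return out.max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 8, 1, 1],
                 [0, 2, 9, 5],
                 [6, 1, 3, 7]], dtype=float)
print(max_pool2d(fmap))   # [[8. 2.] [6. 9.]]
```
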
    21. Explain the following three variants of gradient descent: batch, stochastic and mini-batch.
    • Stochastic Gradient Descent: uses only a single training example to calculate the gradient and update the parameters.
    • Batch GD: calculates the gradient over the whole dataset and performs just one update per pass.
    • Mini-batch GD: a variation of stochastic gradient descent in which a mini-batch of samples is used instead of a single training example. It is one of the most popular optimization choices; see the sketch after this list.
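
A minimal NumPy sketch contrasting the three variants on a toy linear-regression problem (the data, learning rate, and batch size are made up):

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(100, 3)                       # toy inputs
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(100)

def gradient(w, Xb, yb):
    # Gradient of the MSE loss for the linear model y ~ X @ w
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(3)
alpha = 0.1

# Batch GD: one update per pass over the full dataset
w -= alpha * gradient(w, X, y)

# Stochastic GD: one update per single training example
for i in range(len(X)):
    w -= alpha * gradient(w, X[i:i+1], y[i:i+1])

# Mini-batch GD: one update per small batch (here, 16 examples)
for start in range(0, len(X), 16):
    w -= alpha * gradient(w, X[start:start+16], y[start:start+16])

print(np.round(w, 2))  # should move toward [ 1. -2.  0.5]
```
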
    22. What are the benefits of mini-batch gradient descent?
    • Computationally more efficient than stochastic gradient descent.
    • Improves generalization by finding flat minima.
    • Improves convergence: by using mini-batches we approximate the gradient of the entire training set, which may help avoid local minima.
    23. What is model capacity?
      The ability of a model to approximate a wide variety of functions. The higher the model capacity, the larger the amount of information that can be stored in the network.

    24. What is a convolutional neural network?

    Convolutional neural networks, also known as CNNs, are a type of feedforward neural network that uses convolution in at least one of its layers. A convolutional layer consists of a set of filters (kernels). Each filter slides across the entire input image, computing the dot product between the weights of the filter and the underlying image patch. As a result of training, the network learns filters that can detect specific features.
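
A minimal NumPy sketch of the sliding dot product a convolutional layer performs (strictly speaking the cross-correlation, as in most deep learning libraries; the image and filter are made up):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution (no padding, stride 1): slide the
    kernel over the image and take dot products."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

img = np.random.rand(5, 5)       # made-up grayscale image
edge = np.array([[1.0, -1.0]])   # toy horizontal-edge filter
print(conv2d(img, edge).shape)   # (5, 4)
```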

    25. What is an autoencoder?

    An autoencoder is an artificial neural network that learns a representation (encoding) for a set of data without any supervision. The network learns by copying its input to its output; typically the internal representation has fewer dimensions than the input vector, so the network must learn an efficient way of representing the data. An autoencoder consists of two parts: an encoder that maps the input to the internal representation, and a decoder that maps the internal representation back to the output.
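
A minimal PyTorch sketch of an undercomplete autoencoder (the layer sizes and the random input batch are placeholders):

```python
import torch
import torch.nn as nn

# Tiny autoencoder: 784 -> 32 -> 784 (sizes are illustrative)
class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, 32), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(64, 784)        # placeholder batch of flattened images
for _ in range(10):
    recon = model(x)
    loss = loss_fn(recon, x)   # reconstruction loss: copy input to output
    opt.zero_grad()
    loss.backward()
    opt.step()
```
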
    26. What are some limitations of deep learning?
    • Deep learning usually requires large amounts of training data.
    • Deep neural networks are easily fooled, e.g. by adversarial examples.
    • The successes of deep learning are largely empirical; deep learning algorithms have been criticized as uninterpretable "black boxes".
    • Deep learning has so far not been well integrated with prior knowledge (Deep Learning: A Critical Appraisal, by Gary Marcus).
    Further topics: computation graphs; what is bagging? what is boosting?
