If the basic technical ideas behind deep learning, behind neural networks, have been around for decades, why are they only just now taking off? In this video, let's go over some of the main drivers behind the rise of deep learning, because I think this will help you spot the best opportunities within your own organization to apply these to. Over the last few years a lot of people have asked me, "Andrew, why is deep learning suddenly working so well?" and when I'm asked that question, this is usually the picture I draw for them.

Let's say we plot a figure where on the horizontal axis we plot the amount of data we have for a task, and on the vertical axis we plot the performance of our learning algorithms, such as the accuracy of our spam classifier or our ad click predictor, or the accuracy of our neural net at figuring out the position of other cars for our self-driving car. It turns out that if you plot the performance of a traditional learning algorithm, like a support vector machine or logistic regression, as a function of the amount of data you have, you might get a curve that looks like this, where the performance improves for a while as you add more data, but after a while the performance pretty much plateaus. It's as if those older algorithms didn't know what to do with huge amounts of data.

What happened in our society over the last ten years or so is that, for a lot of problems, we went from having a relatively small amount of data to having a fairly large amount of data. All of this was thanks to the digitization of society, where so much human activity is now in the digital realm. We spend so much time on computers, on websites, on mobile apps, and activity on digital devices creates data. And thanks to the rise of inexpensive cameras built into our cell phones, accelerometers, and all sorts of sensors in the Internet of Things, we have also just been collecting more and more data. So over the last 20 years, for a lot of applications, we accumulated a lot more data, more than traditional
learning algorithms were able to effectively take advantage of. What neural networks let us do is this: it turns out that if you train a small neural net, the performance maybe looks like that. If you train a somewhat larger neural net, call it a medium-sized neural net, the performance is often a little bit better. And if you train a very large neural net, the performance often just keeps getting better and better.

So, a couple of observations. One is that if you want to hit this very high level of performance, you need two things: first, you often need to be able to train a big enough neural network in order to take advantage of the huge amount of data, and second, you need to be out here on the x-axis; you do need a lot of data. So we often say that scale has been driving deep learning progress, and by scale I mean both the size of the neural network, meaning a network with a lot of hidden units, a lot of parameters, a lot of connections, as well as the scale of the data. In fact, today one of the most reliable ways to get better performance with a neural network is often to either train a bigger network or throw more data at it. That only works up to a point, because eventually you run out of data, or eventually the network is so big that it takes too long to train, but just improving scale has actually taken us a long way in the world of deep learning.
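To make the picture concrete, here is a small matplotlib sketch of the kind of figure described above. The curves are purely illustrative, hand-picked saturating shapes meant to mimic the qualitative drawing from the lecture, not measurements of any real algorithm.

```python
import numpy as np
import matplotlib.pyplot as plt

# Amount of labeled data (arbitrary units) on the horizontal axis.
m = np.linspace(0, 100, 500)

# Illustrative saturating curves: each plateaus at a different level,
# mimicking the shapes sketched in the lecture.
traditional = 0.70 * (1 - np.exp(-m / 10))   # plateaus early
small_nn    = 0.80 * (1 - np.exp(-m / 20))
medium_nn   = 0.90 * (1 - np.exp(-m / 30))
large_nn    = 0.98 * (1 - np.exp(-m / 40))   # keeps improving the longest

plt.plot(m, traditional, label="traditional algorithm (e.g. SVM)")
plt.plot(m, small_nn, label="small neural net")
plt.plot(m, medium_nn, label="medium neural net")
plt.plot(m, large_nn, label="large neural net")
plt.xlabel("amount of labeled data (m)")
plt.ylabel("performance")
plt.legend()
plt.show()
```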
To make this diagram a little bit more technically precise, let me add a few more things. I wrote the amount of data on the x-axis; technically, this is the amount of labeled data, where by labeled data I mean training examples for which we have both the input x and the label y. I also want to introduce a little bit of notation that we'll use later in this course: we're going to use the lowercase letter m to denote the size of the training set, or the number of training examples. So that's the horizontal axis.
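As a minimal sketch of this notation, assuming the training set is stored as NumPy arrays with examples stacked column-wise (a convention and variable names chosen here just for illustration):

```python
import numpy as np

# A labeled training set: each example pairs an input x with a label y.
# Here X stacks the inputs column-wise, so X has shape (n_x, m) and
# Y has shape (1, m), where m is the number of training examples.
X = np.array([[0.5, 1.2, -0.3,  2.0],
              [1.0, 0.7,  0.4, -1.1]])   # n_x = 2 features, m = 4 examples
Y = np.array([[1, 0, 1, 0]])             # one label per example

m = X.shape[1]   # lowercase m: the size of the training set
print(m)         # -> 4
```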
A couple of other details about this figure: in this regime of smaller training sets, the relative ordering of the algorithms is actually not very well defined. If you don't have a lot of training data, it is often your skill at hand-engineering features that determines performance. So it's quite possible that if someone training an SVM is more motivated to hand-engineer features than someone training an even larger neural net, then in this small-training-set regime the SVM could do better. In this region to the left of the figure, the relative ordering between the algorithms is not that well defined, and performance depends much more on your skill at engineering features and on other minor details of the algorithms. It is only in the big data regime, the very large training sets, very large m regime on the right, that we more consistently see large neural nets dominating the other approaches. So if any of your friends ask you why neural nets are taking off, I would encourage you to draw this picture for them as well.

I will say that in the early days of the modern rise of deep learning, it was scale of data and scale of computation, just our ability to train very large neural networks either on a CPU or a GPU, that enabled us to make a lot of progress. But increasingly, especially in the last several years, we've seen tremendous algorithmic innovation as well, so I also don't want to understate that.
Interestingly, many of the algorithmic innovations have been about trying to make neural networks run much faster. As a concrete example, one of the huge breakthroughs in neural networks has been switching from a sigmoid function, which looks like this, to a ReLU function, which we talked about briefly in an earlier video, and which looks like this. If you don't understand the details of what I just said, don't worry about it. But it turns out that one of the problems of using sigmoid functions in machine learning is that there are these regions where the slope of the function, the gradient, is nearly zero, and so learning becomes really slow, because when you implement gradient descent and the gradient is nearly zero, the parameters change very slowly. Whereas by changing what's called the activation function of the neural network to use this function, called the ReLU function, or the rectified linear unit, the gradient is equal to one for all positive values of the input, and so the gradient is much less likely to gradually shrink to zero. (The gradient here, the slope of this line, is zero on the left.) It turns out that just switching from the sigmoid function to the ReLU function has made an algorithm called gradient descent work much faster.
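As a small sketch of the point about slopes, the snippet below compares the gradients of the two activation functions and shows why a near-zero gradient slows a gradient descent step. The helper names and the learning rate alpha are illustrative choices, not anything fixed by the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # shrinks toward zero for large |z|

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)    # 1 for all positive inputs, 0 on the left

z = np.array([-10.0, -1.0, 1.0, 10.0])
print(sigmoid(z), sigmoid_grad(z))  # gradient is about 4.5e-5 at |z| = 10
print(relu(z), relu_grad(z))        # gradient stays 1 for positive z

# In a gradient descent update, w := w - alpha * dw, a near-zero slope
# makes dw tiny, so the parameter barely moves and learning is slow.
alpha = 0.01                        # illustrative learning rate
w, dw = 0.5, sigmoid_grad(10.0)
w = w - alpha * dw                  # w changes by only ~4.5e-7 here
```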
This is an example of a relatively simple algorithmic innovation, but ultimately the impact of this algorithmic innovation was that it really helped computation. There are actually quite a lot of examples like this where we change the algorithm because it allows the code to run much faster, and that in turn allows us to train bigger neural networks, or to do so in a reasonable amount of time, even when we have a large network and a lot of data.

The other reason that fast computation is important is that it turns out the process of training a neural network is very iterative. Often you have an idea for a neural network architecture, so you implement your idea in code; implementing your idea then lets you run an experiment, which tells you how well your neural network does; and then by looking at the result, you go back and change the details of your neural network, and you go around this cycle over and over. When your neural network takes a long time to train, it just takes a long time to go around this cycle, and there's a huge difference in your productivity building effective neural networks when you can have an idea, try it, and see whether it works in ten minutes or maybe at most a day, versus having to train your neural network for a month, which sometimes does happen.
Because you get a result back in ten minutes, or maybe in a day, you can just try a lot more ideas and be much more likely to discover a neural network that works well for your application. So faster computation has really helped in terms of speeding up the rate at which you can get an experimental result back, and this has helped both practitioners of neural networks and researchers working in deep learning iterate much faster and improve their ideas much faster. All of this has also been a huge boon to the entire deep learning research community, which has been incredible at inventing new algorithms and making nonstop progress on that front.
So these are some of the forces powering the rise of deep learning, and the good news is that these forces are still working powerfully to make deep learning even better. Take data: society is still throwing off more and more digital data. Or take computation: with the rise of specialized hardware like GPUs, faster networking, and many other types of hardware, I'm actually quite confident that our ability to train very large neural networks, from a computation point of view, will keep on getting better. And take algorithms: the deep learning research community has been continuously phenomenal at innovating on the algorithms front. Because of this, I think we can be optimistic, and I am optimistic, that deep learning will keep on getting better for many years to come.

So with that, let's go on to the last video of this section, where we'll talk a little bit more about what you'll learn in this course.