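Lecture 1: Introduction
The course starts with the history of the field. Official notes: cs231n.github.io
The majority of bits flying around the internet are actually visual data. This visual data is hard to understand, so it is sometimes called the dark matter of the internet.
Visual data is being produced on the internet very quickly, so we need technology that can understand it.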
The history of vision: Evolution's Big Bang. About 543 million years ago the earth was mostly water, and a few species of animals floated around in the ocean. Animals didn't move around much; they had no eyes, so when food swam by they grabbed it, and when it didn't they just floated around. But something very important happened around 540 million years ago. From fossil studies, zoologists found that within a very short period of time (ten million years) the number of animal species exploded. One very convincing explanation has been proposed: animals developed eyes! From then on, hunting became proactive: some predators went after prey, and prey had to escape from predators, so the onset of vision started an evolutionary arms race and animals had to evolve quickly in order to survive as a species. That is the quick history from the biology perspective.
Lecture 2: Image Classification Pipeline
The course mainly uses Python and NumPy. A concise tutorial for both is at http://cs231n.github.io/python-numpy-tutorial/
A somewhat more detailed NumPy tutorial: https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html
![](https://img.haomeiwen.com/i2907791/240bb4fcac4b2a27.png)
Here, rank 1 means a single flat array, such as [1,2,3,4]; rank 2 means an array whose elements are arrays, such as [[1,2,3,4]].
So transposing a rank-1 array does nothing.
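The rank distinction can be checked directly in NumPy. This is a minimal sketch; the variable names `a` and `b` are only for illustration and do not come from the course tutorial.

```python
import numpy as np

a = np.array([1, 2, 3, 4])      # rank 1: a single flat array
b = np.array([[1, 2, 3, 4]])    # rank 2: an array whose elements are arrays

print(a.ndim, a.shape)          # 1 (4,)
print(b.ndim, b.shape)          # 2 (1, 4)

# Transposing a rank-1 array leaves its shape unchanged.
print(a.T.shape)                # (4,)

# Transposing a rank-2 array swaps its two axes.
print(b.T.shape)                # (4, 1)
```

If a column vector is really needed from a rank-1 array, reshaping it explicitly (for example `a.reshape(-1, 1)`) is one option.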
![](https://img.haomeiwen.com/i2907791/91ee0a76c4c73304.png)
![](https://img.haomeiwen.com/i2907791/870e408150fd21c4.png)
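Broadcasting covers arithmetic between arrays of different shapes. My impression is that "different shapes" here mostly means combining a rank-1 array with a rank-2 array, although the general rules apply to arrays of any rank.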
![](https://img.haomeiwen.com/i2907791/2f8d9b1fecdb2af1.png)
![](https://img.haomeiwen.com/i2907791/61005567e617a149.png)
The General Broadcasting Rules are documented at https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
![](https://img.haomeiwen.com/i2907791/a491edffacf8fbb4.png)
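A small sketch of broadcasting between a rank-2 and a rank-1 array may help here. The arrays `x`, `v`, and `col` are made-up examples, not taken from the screenshots above.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])     # shape (4, 3)
v = np.array([1, 0, 1])          # shape (3,)

# v is treated as shape (1, 3) and stretched along the first axis to (4, 3).
y = x + v
print(y)
# [[ 2  2  4]
#  [ 5  5  7]
#  [ 8  8 10]
#  [11 11 13]]

# A (4, 1) column is stretched along the second axis instead.
col = np.array([[10], [20], [30], [40]])
z = x + col
print(z.shape)                   # (4, 3)

# Shapes that do not line up raise an error: (4, 3) and (4,) are incompatible.
try:
    x + np.array([1, 2, 3, 4])
    compatible = True
except ValueError:
    compatible = False
print(compatible)                # False
```

Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or one of them is 1.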
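Image Classification: a core task in computer vision, and a very hard problem for a machine.
The input is an image together with a fixed set of category labels; the output is the label that matches the image (assign it one of these fixed category labels).
The computer really represents the image as a gigantic grid of numbers, for example 800 * 600 * 3. If the input is a picture of a cat, it is very hard for the machine to distill the cat-ness from these numbers. We refer to this problem as the semantic gap.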
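To make the "grid of numbers" concrete, here is a rough sketch that uses a random array as a stand-in for a real photo. The layout (600 rows, 800 columns, 3 color channels, values 0 to 255) is an assumption about how the 800 * 600 * 3 image would be stored.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a photo: height 600, width 800, 3 color channels, one byte each.
image = rng.integers(0, 256, size=(600, 800, 3), dtype=np.uint8)

print(image.shape)    # (600, 800, 3)
print(image.size)     # 1440000 numbers in total
print(image[0, 0])    # the three channel values of the top-left pixel
```

Nothing in these 1,440,000 numbers says "cat" directly, which is the semantic gap described above.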
Why is this a hard problem?
Viewpoint variation: you can change the picture in very small, subtle ways that will cause this pixel grid to change entirely, and our algorithms need to be robust to this.
Illumination: viewpoint is not the only problem. There can be different lighting conditions going on in the scene. Whether the cat appears in a very dark, moody scene or a very bright, sunlit scene, it's still a cat, and our algorithms need to be robust to that. Deformation: cats can also take on many different poses.
![](https://img.haomeiwen.com/i2907791/a4aee589a7355a02.png)
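The sensitivity of the pixel grid can be illustrated with a toy experiment. This is only a sketch: a random array stands in for a photo, "brighter lighting" is approximated by adding a constant, and a "slight viewpoint change" by shifting the image one pixel sideways. Both are crude assumptions, not how real scenes change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in image with values 0..199, so adding 40 cannot overflow uint8.
image = rng.integers(0, 200, size=(600, 800, 3), dtype=np.uint8)

# Crude illumination change: every value becomes 40 brighter.
brighter = image + np.uint8(40)

# Crude viewpoint change: shift the whole image one pixel to the right.
shifted = np.roll(image, shift=1, axis=1)

# Fraction of the grid's numbers that differ from the original.
changed_by_light = np.mean(brighter != image)
changed_by_shift = np.mean(shifted != image)

print(changed_by_light)   # 1.0, every number changed
print(changed_by_shift)   # close to 1 for this random stand-in
```

A person would still see the same content in all three versions, while a comparison of raw numbers sees almost entirely different data.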
Occlusion: you might only see part of a cat, like just the face or, in an extreme example, just a tail peeking out from under the couch cushion.
![](https://img.haomeiwen.com/i2907791/a484abb68aca4b82.png)
Background Clutter: the foreground object (the cat) could actually look quite similar in appearance to the background.
![](https://img.haomeiwen.com/i2907791/67c969ca96e19bca.png)
Intraclass variation: the one notion of cat-ness actually spans a lot of different visual appearances. Cats come in different shapes, sizes, colors, and ages.
![](https://img.haomeiwen.com/i2907791/e125d37bfe7a17c6.png)
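There is no obvious way to hard-code an algorithm for recognizing a cat or other classes, unlike e.g. sorting a list of numbers. Instead we use a data-driven approach with two functions: train, which takes images and labels and returns a model, and predict, which takes the model and test images and returns predicted labels.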
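The train/predict split can be sketched as follows. The notes above do not say which model is used, so this example assumes a simple nearest-neighbor classifier with L1 distance as an illustration; the toy data, the labels, and the dictionary used as the "model" are all made up.

```python
import numpy as np

def train(images, labels):
    """Memorize the training data; the 'model' is just the stored arrays."""
    flat = images.reshape(len(images), -1).astype(np.float64)
    return {"images": flat, "labels": np.asarray(labels)}

def predict(model, test_images):
    """Give each test image the label of its closest training image (L1 distance)."""
    flat = test_images.reshape(len(test_images), -1).astype(np.float64)
    predictions = []
    for x in flat:
        distances = np.abs(model["images"] - x).sum(axis=1)
        predictions.append(model["labels"][np.argmin(distances)])
    return np.array(predictions)

# Toy data: four 2x2 grayscale "images"; label 0 = dark, label 1 = bright.
train_images = np.array([np.full((2, 2), 0), np.full((2, 2), 10),
                         np.full((2, 2), 200), np.full((2, 2), 255)])
train_labels = np.array([0, 0, 1, 1])

model = train(train_images, train_labels)

test_images = np.array([np.full((2, 2), 3), np.full((2, 2), 240)])
predictions = predict(model, test_images)
print(predictions)   # [0 1]
```

The point of the sketch is the interface: no rule about what a class looks like is written by hand; everything the classifier knows comes from the labeled examples passed to train.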
I think the notes at cs231n.github.io/classification are enough for this lecture.