Machine Learning Notes

Author: 前端混合开发 | Source: published 2019-08-08 23:15

    Two stories: 1. "Timely snow promises a good harvest" (a Chinese farming proverb); 2. Walmart, beer and diapers.

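    Why would people who buy beer be more likely to buy diapers as well? Is it because men with young children like beer more than other men do? Or because men who like beer are more devoted to their families? Both guesses seem far-fetched.
    But Walmart does not need to care about the reason behind the pattern. For a business, it is enough to use a discovered pattern to earn tangible profit.
    Machine learning is a part of artificial intelligence. Its core idea is to let a computer program automatically acquire accurate judgment and generalization abilities as data samples accumulate.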



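    In technical terms, the observed phenomena are called the training set.
    The process of analyzing and generalizing from those phenomena is called training.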


    There are many ways to convey information: when you want to express a point or an idea, you can use music, film, text, or painting. It begins with a special skill you likely take for granted: language. All language allows you to take a thought or mental object and break it down into a series of conceptual chunks. These chunks are externalized using a series of signals or symbols. Humans express themselves using variations in sound and physical action, and our bodies are built according to instructions stored in microscopic books known as DNA. All of these are different forms of one thing: information. Information is what allows one mind to influence another. Information, no matter the form, can be measured using a fundamental unit, much as we measure the mass of objects in kilograms. We may not know where the kilogram comes from, but we know that a person weighing 45 kg is not heavy and that one weighing 200 kg probably is.
    Information too can be measured and compared using a measurement called entropy. Think of it as an information scale. So no matter how Alice wants to communicate a specific message (text, music, or movie), each would contain the same number of bits, though in different densities. And a bit is linked to a very simple idea: the answer to a yes-or-no question. So how is information actually measured? Does information have a speed limit, a maximum density? Information theory holds the exciting answer to these questions. It's an idea over three thousand years in the making. But before we can understand this, we must step back and explore perhaps the most powerful invention in human history: the alphabet.
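
    The notes do not write the entropy formula down. As a minimal sketch, assuming the standard Shannon definition H = -sum(p * log2(p)) over the probabilities of the possible symbols, the "information scale" can be computed like this (the example distributions are illustrative, not taken from the video):

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over all outcomes."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Outcomes with probability 0 contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Illustrative distributions (assumed for this example).
fair_coin = entropy_bits([0.5, 0.5])      # one yes/no question per flip
biased_coin = entropy_bits([0.9, 0.1])    # more predictable, so fewer bits
four_symbols = entropy_bits([0.25] * 4)   # two yes/no questions per symbol

print(fair_coin, biased_coin, four_symbols)
```

    A fair coin needs exactly one yes-or-no question per flip, which is why the bit is a natural unit; the more predictable the source, the lower the entropy.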

    About five thousand years ago, humans developed the ability to externalize their inner thoughts ("externalize" is a nicely vivid word). They began to communicate using language. At the time, the universal written language was art.

    The character "中" in "中间" (middle) is a pictograph, because the "口" component really does sit in the middle.
    Meaning plus meaning equals new meaning.

    Hieroglyphics was the written language of the Ancient Egyptians.
    Rosetta Stone

    At the time, the medium used to store the symbols was primarily rock. This was ideal for durable inscriptions, allowing messages to travel into the future. Mobility was not a main concern when communicating messages in this way. However, a new physical medium for storing symbols was emerging at the time. The Egyptians grew a plant called papyrus, which could be cut into strips, soaked, woven together, and pressed. After a few days it dried into thin sheets that weighed almost nothing, ideal for carrying messages across space rather than for durable inscriptions meant to last through time. This shift towards cheap, portable mediums for storing symbols coincided with the spread of writing into the hands of more people for new purposes. By escaping from the heavy medium of stone, thought gained lightness. This led to a new writing system called demotic, around 650 BC.

    Plan for the remaining 3 months of 2016:
    Master 10 data mining algorithms;
    Review/study information theory and master the most basic formulas;

    Information is just the selection from a collection of possible symbols, and over time we have always looked for faster, more efficient ways of transporting information across greater and greater spaces.

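    After watching a video, how much of the information do you still remember?
    What exactly did Alice and Bob do, and what was their example meant to illustrate?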

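    She realizes that the odds of each number being sent follow a simple pattern.
    Rolling 2 dice (each cell is the sum of the row die and the column die):
         1   2   3   4   5   6
    1    2   3   4   5   6   7
    2    3   4   5   6   7   8
    3    4   5   6   7   8   9
    4    5   6   7   8   9  10
    5    6   7   8   9  10  11
    6    7   8   9  10  11  12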
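
    The "simple pattern" in the two-dice table can be checked by counting how many of the 36 equally likely rolls produce each sum. This is a small sketch assuming two fair six-sided dice; the function name is made up for the example:

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_sum_distribution(sides: int = 6) -> dict[int, Fraction]:
    """Exact probability of each sum when rolling two fair dice."""
    faces = range(1, sides + 1)
    # Enumerate every (first die, second die) pair and count the sums.
    counts = Counter(a + b for a, b in product(faces, repeat=2))
    total_rolls = sides * sides
    return {s: Fraction(c, total_rolls) for s, c in sorted(counts.items())}


distribution = two_dice_sum_distribution()
for total, probability in distribution.items():
    print(f"{total:2d}: {probability}")
```

    The probabilities rise from the sum 2 up to the sum 7 and then fall symmetrically, because 7 lies on the longest diagonal of the table.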

    Caltech open course: Machine Learning and Data Mining
    Machine learning is a very broad subject. It goes from very abstract theory to extreme practice, as in rules of thumb. The inclusion of a topic in the course depends on its relevance to machine learning. So some mathematics is useful, because it gives you the conceptual framework, and some practical aspects are useful, because they give you the way to deal with real learning systems. If you look at the topics, these are not meant to be separate topics for each lecture; they just highlight the main content of those lectures. But there is a story line that goes through it, and let me tell you what it is.
    It starts here with: What is learning? Can we learn? How to do it? How to do it well? There is a logical dependency that goes through the course, and there is one exception to that logical dependency: one lecture, the third one, does not really belong here. It's a practical topic, and the reason I included it early on is that I needed to give you some tools to play around with, to test the theoretical and conceptual aspects. If I had waited until the point where it really belongs, the second topic on learning models, the beginning of the course would have been too theoretical for people's tastes. And as you see, if you look at the colors, it is mostly red in the beginning and mostly blue in the end, so it starts by building the concepts and the theory, and then it goes on to the practical aspects.

    Lecture 1: the learning problem

    Outline:

      1. Example of machine learning
      1. Components of learning
      1. A simple model
      1. Types of learning
      1. Puzzle
        What is learning?
        It's a fun example about movies that everybody watches, and after that I'm going to abstract from the learning problem the practical aspects that are common to all learning situations you're going to face. In abstracting them, we will have the mathematical formalization of the learning problem. And then we will get our first algorithm for machine learning today. It's a very simple algorithm, but it will fix the idea of what the role of an algorithm is in this case. And we will survey the types of learning.

    The example of machine learning that I'm going to start with is:
    How would a viewer rate a movie?
    10% improvement = 1 million dollar prize.
    The essence of machine learning:

      1. A pattern exists.
      1. We cannot pin it down mathematically.
      1. We have data on it.
        Components of learning
        Metaphor: credit approval:
      1. Applicant information: age, gender, annual salary, years in residence, years in job, current debt, and so on. These fields are related to creditworthiness.
      1. Formalization: input: x (customer application); output: y (good/bad customer?); target function: f: X -> Y (ideal credit approval formula)
        Data: (x1, y1), (x2, y2), ..., (xN, yN) ----> hypothesis g: X -> Y (formula to be used)
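
    To make the formalization concrete, here is a rough sketch of its pieces in code: the data set, a small hypothesis set, and a "learning algorithm" that simply picks the candidate that agrees best with the data. Everything specific is an assumption for illustration: the two features, the numbers, and the two candidate rules are made up, and the unknown target function f appears only through the labels.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]                 # x: one customer application
Label = int                                # y: +1 = good customer, -1 = bad customer
Hypothesis = Callable[[Features], Label]   # a candidate formula h: X -> Y
Example = Tuple[Features, Label]           # one pair (x_n, y_n)


def training_error(h: Hypothesis, data: List[Example]) -> float:
    """Fraction of examples on which h disagrees with the recorded label."""
    return sum(h(x) != y for x, y in data) / len(data)


def pick_hypothesis(hypothesis_set: List[Hypothesis], data: List[Example]) -> Hypothesis:
    """A toy learning algorithm: return the candidate with the lowest training error."""
    return min(hypothesis_set, key=lambda h: training_error(h, data))


# Made-up toy data: x = (annual salary in $1000s, current debt in $1000s).
data: List[Example] = [
    ((90.0, 5.0), +1),
    ((30.0, 20.0), -1),
    ((60.0, 10.0), +1),
    ((25.0, 15.0), -1),
]


def salary_rule(x: Features) -> Label:
    return +1 if x[0] >= 50 else -1


def debt_rule(x: Features) -> Label:
    return +1 if x[1] <= 4 else -1


# g is the final hypothesis, chosen from the hypothesis set using the data.
g = pick_hypothesis([salary_rule, debt_rule], data)
print(g.__name__, training_error(g, data))
```

    The point of the sketch is the division of labor: f stays unknown, the data and the hypothesis set are given, and g is whatever the algorithm selects.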



    "Hypothesis" is the formal name we are going to give the formula we get to approximate the target function. The hypothesis g supposedly approximates f. While f is unknown to us, g is very much known; actually, we created it, and the hope is that it approximates f well.
    That's the goal of learning.
    The target function is always f.
    The hypothesis we produce, the final hypothesis, will be called g.
    The data will always have this notation: there are capital N points, which are the data set, and the output is always y.

    Now that is an interesting problem, and it's interesting for us because we watch movies. It is very interesting for a company that rents out movies, and indeed one such company, Netflix, wanted to improve its in-house system by a mere 10 percent.
    So they make recommendations when you log in: they recommend movies that they think you would like, that is, movies they think you would rate highly. They had a system, and they wanted to improve it.
    So how much is a 10 percent improvement in performance worth to the company?
    It was actually 1 million dollars, paid out to the first group that actually managed to get the 10 percent improvement. So you ask yourself: why should a 10 percent improvement be worth a million dollars? It's because if the recommendations that the movie company makes are spot on, you will pay more attention to the recommendations, you are likely to rent the movies that they recommend, and they will make lots of money, much more than they expect.

    This is a typical example for machine learning.

    For example, machine learning has applications in financial forecasting. You can imagine that the minutest improvement in financial forecasting can make a lot of money. So the fact that you can actually push the system to be better using machine learning is a very attractive aspect of the technique in a wide spectrum of applications.
    So what did these guys do?
    They gave out the data, and people started working on the problem using different algorithms until someone managed to get the prize.
    Now if you look at the problem of rating a movie, it captures the essence of machine learning, and the essence has 3 components. If you find these 3 components in a problem you have in your field, then you know that machine learning is really an applicable tool.
    The first one is that a pattern exists. If a pattern didn't exist, there would be nothing to look for.
    So what is the pattern here?
    There is no question that the way a person rates a movie is related to how they rated other movies, and is also related to how other people rated that movie. We know that there is a pattern to be discovered. However, we cannot really pin it down mathematically. I cannot ask you to write down a polynomial that captures how people rate movies. So the fact that there is a pattern, and that we cannot pin it down mathematically, is the reason why we are going for machine learning.

    For learning from data: we could not write down a system on our own, so we are going to depend on data in order to find the system.
    There is a missing component that is very important; if you don't have it, you are out of luck. We have to have data; we are learning from data.
    What data do you have?
    If you have data, we are in business; if you don't, you are out of luck.
    If you have these 3 components, you are ready to apply machine learning.
    So here is a system; let me try to focus on part of it.
    We are going to describe a viewer as a vector of factors, a profile if you like:
    whether you prefer blockbusters or fringe movies,
    whether you like the lead actor or not.
    The rating then depends on the match or mismatch between the viewer's factors and the movie's factors.

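    This is actually not machine learning. To implement the approach above, you would have to watch each movie yourself and identify these factors in it, then interview the viewers about their tastes, and then do the matching.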

    Now the idea of machine learning is that you don't have to do any of that. All you do is sit down and sip your tea while the machine does something to come up with this figure on its own. So let's look at the learning approach. We know that the viewer will be a vector of different factors, with a different component for every factor, so this vector will differ from one viewer to another.
    And the way we said we compute the rating is by simply taking these factors and combining them to get the rating.

    Now what machine learning will do is reverse-engineer that process.
    It starts from the rating and then tries to find out what factors would be consistent with that rating. So think of it this way: you start, let's say, with completely random factors. You take these guys (the viewer factors), just random numbers from beginning to end, and these guys (the movie factors), random numbers from beginning to end, for every user and every movie. That's your starting point. Obviously there is no chance in the world that you will get anything that looks like the ratings.
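
    The notes stop at the random starting point. One common way to continue, shown here only as an assumed sketch and not necessarily what the lecture or the prize-winning teams did, is stochastic gradient descent: for each known rating, compare it with the inner product of the current factors and nudge both factor vectors to shrink the gap. The ratings, learning rate, and number of factors below are made up.

```python
import random


def predict(user_factors, movie_factors):
    """Predicted rating: inner product of viewer factors and movie factors."""
    return sum(a * b for a, b in zip(user_factors, movie_factors))


def mean_squared_error(users, movies, ratings):
    return sum((r - predict(users[u], movies[m])) ** 2 for u, m, r in ratings) / len(ratings)


def factorize(ratings, n_users, n_movies, n_factors=2, lr=0.02, epochs=1000, seed=0):
    """Start from random factors, then nudge them toward the observed ratings."""
    rng = random.Random(seed)
    users = [[rng.uniform(-0.5, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    movies = [[rng.uniform(-0.5, 0.5) for _ in range(n_factors)] for _ in range(n_movies)]
    for _ in range(epochs):
        for u, m, rating in ratings:
            error = rating - predict(users[u], movies[m])
            for k in range(n_factors):
                user_k, movie_k = users[u][k], movies[m][k]
                # Gradient step on the squared error for this single rating.
                users[u][k] += lr * error * movie_k
                movies[m][k] += lr * error * user_k
    return users, movies


# Made-up ratings as (viewer index, movie index, rating on a 1-5 scale).
ratings = [(0, 0, 5), (0, 1, 1), (1, 0, 4), (1, 1, 1), (2, 0, 1), (2, 1, 5)]

initial_users, initial_movies = factorize(ratings, n_users=3, n_movies=2, epochs=0)
trained_users, trained_movies = factorize(ratings, n_users=3, n_movies=2)

error_before = mean_squared_error(initial_users, initial_movies, ratings)
error_after = mean_squared_error(trained_users, trained_movies, ratings)
print(error_before, error_after)
```

    With `epochs=0` the function returns the purely random factors, so the two error values compare the random starting point against the factors after training.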

    You apply for a credit card, and the bank decides whether it's a good idea to extend a credit card to you or not. If the bank makes money from you, it is happy; if it loses money, it is not. What they are going to do is rely on historical records of previous customers and how their credit behavior turned out, and then try to reverse-engineer the system. And when they get the system frozen, they are going to apply it to a future customer. That's the deal.

    They don't necessarily uniquely determine it, but they are related.

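    In the movie-rating example, the idea is to use the existing ratings to fit a linear regression formula for each viewer, which can then predict that viewer's ratings for other movies.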

    Upside
    Downside
    From the practical point of view, that's what you do: you say, I want to use a linear formula, I'm going to use a neural network, I'm going to use a support vector machine. So you are already dictating a set of hypotheses. If you happen to be a brave soul and you don't want to restrict yourself at all, then your hypothesis set is the set of all possible hypotheses, so there is no loss of generality in putting it in.
    The hypothesis set will play a pivotal role in the theory of learning. It will tell us whether we can learn, how well we learn, and so on. Therefore having it as an explicit component in the problem statement will make the theory go through.
    Solution components
    The target function is not under your control.

    Implement a learning algorithm on real data if you want to.

    The input x is a d-dimensional vector, and each of its components is a real number.

    Now you add them together, and you add them in a linear form; that's what makes it a perceptron.
    Threshold
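
    The linear form and the threshold combine into a single decision rule: weight each input, add the results, and compare the sum with the threshold. A minimal sketch follows; the weights, threshold, and applicant values are made up for illustration and are not from the lecture.

```python
def perceptron_predict(weights, threshold, x):
    """Return +1 (approve) if the weighted sum of x exceeds the threshold, else -1 (deny)."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > threshold else -1


# Hypothetical applicant features: (annual salary in $1000s, current debt in $1000s).
weights = [0.5, -1.0]   # salary counts in favor, debt counts against
threshold = 10.0

print(perceptron_predict(weights, threshold, [60.0, 10.0]))
print(perceptron_predict(weights, threshold, [30.0, 20.0]))
```

    Different choices of weights and threshold give different hypotheses; the set of all such choices is the perceptron's hypothesis set.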



    Let's assume that the data you are working with are linearly separable.


    Linearly separable means that a straight line can separate the two classes. In this case, for example, you have 9 data points; some of them are good customers and some of them are bad customers, and you would like to apply the perceptron model in order to separate them correctly.
    Will random weights give you any line?

    The perceptron learning algorithm:
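
    The notes give only the algorithm's name here. As a sketch of the standard perceptron learning algorithm: repeatedly pick a misclassified point (x, y) and update the weights with w = w + y * x, until no point is misclassified. The 9 data points below are made up to be linearly separable, and the threshold is folded into the weights as w[0] by prepending a constant 1 to every input.

```python
def sign(value):
    """Return +1 for positive scores and -1 otherwise."""
    return 1 if value > 0 else -1


def classify(weights, x):
    """Perceptron output for x, with a constant 1 prepended for the bias weight."""
    return sign(sum(w * xi for w, xi in zip(weights, (1.0, *x))))


def perceptron_learning(points, labels, max_updates=10_000):
    """Update on one misclassified point at a time until all points are correct."""
    weights = [0.0] * (len(points[0]) + 1)   # weights[0] plays the role of minus the threshold
    for _ in range(max_updates):
        for x, y in zip(points, labels):
            if classify(weights, x) != y:
                # Move the weights toward (y = +1) or away from (y = -1) this point.
                weights = [w + y * xi for w, xi in zip(weights, (1.0, *x))]
                break
        else:
            return weights   # no misclassified point was found
    raise RuntimeError("no separating line found within max_updates")


# Made-up, linearly separable data: 5 good customers (+1) and 4 bad customers (-1).
points = [(3, 3), (4, 2), (4, 4), (5, 3), (3, 5), (1, 1), (2, 0), (0, 2), (1, 2)]
labels = [1, 1, 1, 1, 1, -1, -1, -1, -1]

learned_weights = perceptron_learning(points, labels)
print(learned_weights)
```

    The guard on `max_updates` matters because the loop is only guaranteed to stop when the data really are linearly separable, which is the assumption stated above.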



    The premise of learning, from which the different types of learning came about:
    using a set of observations to uncover an underlying process, namely what we call the target function.
    Statistics:
    The underlying process is a probability distribution, and the observations are samples generated by that distribution. You want to take the samples and predict what the probability distribution is.
    Because this premise is so broad, different disciplines have grown up around it under different names.



    Supervised learning
    Unsupervised learning
    Reinforcement learning

    Supervised learning:
    Anytime you have the data given to you with the output explicitly given.
    Here is the previous customer, and here is their credit behavior. It's as if a supervisor is helping you out, in order to be able to classify future ones. That's why it's called supervised.
    For example, suppose you have a vending machine and you want the system to recognize coin denominations. You have physical measurements of the coins.


    Must a target function always exist?
    There is a separation between the target function (whether there is a pattern to detect) and whether we can learn it.
    How does machine learning relate to statistics?
    Statistics as such is not closely tied to machine learning;
    what matters is the probability distribution.

    For example, linear regression:
    When we talk about linear regression, it will have very few assumptions, and the results will apply to a wide range of situations because we didn't make too many assumptions.
    When you study linear regression under statistics, there is a lot of mathematics that goes with it, and lots of assumptions.
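
    As a small illustration of how little linear regression needs in order to run, here is a sketch of a one-variable least-squares fit using the standard closed-form solution. It assumes nothing about how the data were generated; the data points are made up.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that happens to lie exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

    The fit itself only minimizes squared error on the given points; the extra assumptions studied in statistics come in when one asks what the fitted numbers mean.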

    In general, machine learning tries to make the fewest assumptions and cover the most territory.

    Where does the bank's credit data come from?
    This is the case of using biased data: say the bank uses historical records.

    How much data do we need?
    Let me tell you the theoretical and the practical answer. The theoretical answer is that this is exactly the crux of the theory part that we are going to talk about.

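    Is a larger hypothesis set always better?
    As we mentioned, learning is about being able to predict, so you are using the data not to memorize it.
    The learning algorithm does play a role, although it's a secondary role, after the hypothesis set.
    Think of the perceptron to see what the algorithm is for: it tries to minimize the classification error. That is the error function, and you are trying to minimize it.
    Now the minimization aspect is an optimization question.
    Once you decide that this is the error function, you minimize it.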


          Article title: Machine Learning Notes

          Article link: https://www.haomeiwen.com/subject/jsdqjctx.html