A Comprehensive Summary of Standardization and Normalization in Machine Learning

Author: 婉妃 | Published: 2019-02-26 10:26

1. Common methods

The two Chinese terms "标准化" (standardization) and "归一化" (normalization) are used to refer to four feature scaling methods:

Rescaling

Mean normalization

Standardization

Scaling to unit length
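
The four methods can be written out as follows. This is a minimal sketch in plain Python using the commonly cited definitions: rescaling $x' = (x - \min(x)) / (\max(x) - \min(x))$, mean normalization $x' = (x - \bar{x}) / (\max(x) - \min(x))$, standardization $x' = (x - \bar{x}) / \sigma$, and scaling to unit length $x' = x / \|x\|$. The function names and the sample values are illustrative assumptions, and $\sigma$ is assumed to be the population standard deviation.

```python
import math

def rescale(xs):
    """Min-max rescaling to [0, 1]. Assumes the values are not all equal."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def mean_normalize(xs):
    """Subtract the mean, then divide by the range (max - min)."""
    mean = sum(xs) / len(xs)
    lo, hi = min(xs), max(xs)
    return [(x - mean) / (hi - lo) for x in xs]

def standardize(xs):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / std for x in xs]

def scale_to_unit_length(xs):
    """Divide a vector by its Euclidean norm. Assumes a non-zero vector."""
    norm = math.sqrt(sum(x * x for x in xs))
    return [x / norm for x in xs]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up sample values
print(rescale(heights_cm))               # values in [0, 1]
print(mean_normalize(heights_cm))        # values centered on 0
print(standardize(heights_cm))           # mean 0, standard deviation 1
print(scale_to_unit_length(heights_cm))  # Euclidean norm 1
```

The first three functions act on one feature across all samples, whereas scaling to unit length is usually applied to one sample vector across its features.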

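2. Differences and connections

Normalization (归一化)

Characteristics

Each feature dimension is stretched or compressed so that all features carry the same weight in the objective function; data spread out in a flat, elongated shape is rescaled into a roughly circular one. This also changes the distribution of the original data.

Benefits

1 Faster convergence of iterative solvers
2 Higher accuracy of iterative solvers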
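
The convergence claim can be illustrated with a toy model. The sketch below is not a real regression problem: it assumes a quadratic loss $f(w) = \frac{1}{2}\sum_i c_i w_i^2$, where very different curvatures $c_i$ stand in for features on very different scales (elongated contours) and equal curvatures stand in for rescaled features (circular contours). The function name `gd_steps`, the curvature values, and the tolerance are all hypothetical.

```python
def gd_steps(curvatures, tol=1e-6, max_steps=100_000):
    """Count gradient-descent steps on f(w) = 0.5 * sum(c_i * w_i**2)."""
    # A common safe step size: the inverse of the largest curvature.
    lr = 1.0 / max(curvatures)
    w = [1.0] * len(curvatures)
    for step in range(1, max_steps + 1):
        # The gradient of f with respect to w_i is c_i * w_i.
        w = [wi - lr * c * wi for wi, c in zip(w, curvatures)]
        if max(abs(wi) for wi in w) < tol:
            return step
    return max_steps

steps_elongated = gd_steps([100.0, 1.0])  # features on very different scales
steps_round = gd_steps([1.0, 1.0])        # features on the same scale
print(steps_elongated, steps_round)
```

In the elongated case the step size is limited by the steep direction, so progress along the shallow direction is slow; with equal curvatures the same rule reaches the minimum almost immediately.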

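Standardization (标准化)

Characteristics

Each feature dimension is rescaled so that features measured in different units become comparable, without changing the distribution of the original data.

Benefits

1 Features measured in different units become comparable; their influence on the objective function comes from their geometric distribution rather than from their raw numeric values
2 The distribution of the original data is unchanged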
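
One way to see the comparability claim is that z-scores do not depend on the unit of measurement. The sketch below, with made-up height values and an assumed population standard deviation, standardizes the same heights expressed in centimetres and in metres and checks that the results agree.

```python
import math

def standardize(xs):
    """Z-score using the population standard deviation."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / std for x in xs]

heights_cm = [150.0, 165.0, 172.0, 181.0, 190.0]  # hypothetical samples
heights_m = [h / 100.0 for h in heights_cm]       # same data, different unit

z_cm = standardize(heights_cm)
z_m = standardize(heights_m)

# The two lists agree up to floating-point rounding.
same = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
           for a, b in zip(z_cm, z_m))
print(same)
```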

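Example

Predict a person's health index from their height and weight.

Suppose the raw sample data below is four-dimensional (of course, real data is rarely this boring).

As the two coordinate plots show, the samples are spread differently in their raw numeric values, but their geometric distances are the same. Standardization rescales the sample data along each dimension without changing those geometric distances, that is, without changing the information (distribution) in the original data. The benefit is that feature extraction can ignore the units in which different features are measured while keeping the information (distribution) of the samples along each dimension.

Consider height and weight measured in large units. With standardization, the distribution of the samples along these two dimensions is unchanged, so the left plot keeps its flat, elongated two-dimensional shape. Normalization instead applies a different rescaling to each dimension (mapping each into a fixed interval, which changes the original distances, distribution, and information), so the data becomes roughly circular. The samples lose some of their original information, but this prevents the final solution from being dominated by the features with large numeric values, as happens when a gradient-descent-style optimizer is run directly on the raw data. After normalization, every feature carries the same weight in the objective function, which improves the accuracy of the iterative solution.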
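
The domination effect is easy to reproduce with numbers. In the sketch below, the three (height in metres, weight in kilograms) samples are invented for illustration, and the helper names are hypothetical. It measures what fraction of the squared Euclidean distance between the first two people comes from weight, before and after min-max rescaling of each feature.

```python
def rescale_column(col):
    """Min-max rescale one feature column to [0, 1]."""
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]

def weight_share(p, q):
    """Fraction of the squared Euclidean distance contributed by weight."""
    dh2 = (p[0] - q[0]) ** 2
    dw2 = (p[1] - q[1]) ** 2
    return dw2 / (dh2 + dw2)

# Hypothetical samples: (height in m, weight in kg).
people = [(1.60, 50.0), (1.80, 52.0), (1.62, 80.0)]

# Persons 0 and 1 differ by the full height range but only 2 kg in weight,
# yet weight dominates the raw distance because its numbers are larger.
raw_share = weight_share(people[0], people[1])

heights = rescale_column([p[0] for p in people])
weights = rescale_column([p[1] for p in people])
scaled = list(zip(heights, weights))
scaled_share = weight_share(scaled[0], scaled[1])

print(raw_share, scaled_share)
```

Before rescaling, almost all of the distance comes from the small weight difference; after rescaling, the large relative height difference accounts for most of it.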

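3. Extensions

After vectors are normalized to unit length, their Euclidean distance is a monotonic function of the cosine of the angle between them, so the two measures are equivalent for comparing vectors. Proof: for two vectors $x$ and $y$ with angle $A$ between them, after normalization ($\|x\|=\|y\|=1$) their Euclidean distance is $D=\|x-y\|=\sqrt{\|x\|^2+\|y\|^2-2\|x\|\|y\|\cos A}=\sqrt{2-2\cos A}$, that is, $D=\sqrt{2(1-\cos A)}$.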
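
The identity can be checked numerically. The sketch below uses two arbitrary example vectors and hypothetical helper names; it normalizes both to unit length and compares the Euclidean distance with $\sqrt{2(1-\cos A)}$.

```python
import math

def unit(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

x = unit([3.0, 4.0, 0.0])  # arbitrary example vectors
y = unit([1.0, 2.0, 2.0])

cos_a = dot(x, y)   # for unit vectors the dot product is cos A
d = euclidean(x, y)

identity_holds = math.isclose(d, math.sqrt(2 * (1 - cos_a)), rel_tol=1e-9)
print(identity_holds)
```
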
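In text clustering, should one use Euclidean distance or cosine? According to this Stack Exchange thread, cosine is generally the better choice for sparse vectors (and text vectors are clearly sparse): clustering - Euclidean distance is usually not good for sparse data?
From an information retrieval point of view:
Suppose there are two documents $d_1$ and $d_2$ whose vectors hold a tf-idf value in each dimension. One way to measure their similarity is to accumulate the per-dimension differences between the two vectors, for example the sum of squared differences, which is the squared Euclidean distance, i.e. the squared length of the difference vector $\|d_1-d_2\|$. However, if two documents have similar content but one is long and the other short, the term frequencies (tf) of the long document are larger, so $\|d_1-d_2\|$ comes out large and no longer measures similarity well. To remove this effect of length, the vectors are length-normalized and the distance is computed between unit vectors; after that normalization, Euclidean distance and cosine are equivalent, as shown above.
So the difference between cosine similarity and Euclidean distance is that cosine similarity removes the effect of vector length.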
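
The long-versus-short document effect can be sketched with a toy example. The weights below are invented, not real tf-idf values, and the long document is assumed to be the short one repeated ten times, so every weight is simply scaled by 10.

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def unit(v):
    n = norm(v)
    return [c / n for c in v]

d_short = [1.0, 2.0, 0.0, 1.0]         # hypothetical term weights
d_long = [10.0 * w for w in d_short]   # same content, ten times longer

raw_distance = euclidean(d_short, d_long)               # large
cos_sim = cosine(d_short, d_long)                       # close to 1
unit_distance = euclidean(unit(d_short), unit(d_long))  # close to 0

print(raw_distance, cos_sim, unit_distance)
```
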
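One point remains unclear, though. If the only goal is to remove the effect of length, normalizing tf directly by document length (as BM25 does) and then measuring with Euclidean distance also seems feasible. If vector length is used to remove the effect of document length, however, the vector length also includes idf, and since idf does not depend on document length, it arguably should not be part of the normalization.
In short, use Euclidean distance when differences in scale matter; otherwise cosine similarity gives a more stable similarity measure, and in practice the latter fits most use cases.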

4. Why do we need to normalize a vector?

Q1.

For any vector V = (x, y, z), |V| = sqrt(x*x + y*y + z*z) gives the length of the vector.

When we normalize a vector, we actually calculate V/|V| = (x/|V|, y/|V|, z/|V|).

It is easy to see that a normalized vector has length 1. This is because:

| V/|V| | = sqrt((x/|V|)*(x/|V|) + (y/|V|)*(y/|V|) + (z/|V|)*(z/|V|))
          = sqrt(x*x + y*y + z*z) / |V|
          = |V| / |V|
          = 1

Hence, we can call normalized vectors unit vectors (i.e. vectors with unit length).

Normalizing a vector changes only its magnitude, not its direction. Also, every vector pointing in the same direction gets normalized to the same vector (since magnitude and direction uniquely define a vector). Hence, unit vectors are extremely useful for providing directions.
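
Both claims can be checked with a few lines of Python. This is a small sketch with arbitrary example vectors and hypothetical helper names, not part of the original answer.

```python
import math

def length(v):
    """|V| = sqrt(x*x + y*y + z*z)."""
    return math.sqrt(sum(c * c for c in v))

def normalize(v):
    """V / |V|. Assumes V is not the zero vector."""
    n = length(v)
    return tuple(c / n for c in v)

v = (3.0, 0.0, 4.0)
u = normalize(v)
w = normalize((6.0, 0.0, 8.0))  # same direction as v, twice the magnitude

print(length(u))  # 1 up to floating-point rounding
print(u, w)       # both vectors normalize to the same unit vector
```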

Note however, that all the above discussion was for 3 dimensional Cartesian coordinates (x, y, z). But what do we really mean by Cartesian coordinates?

Turns out, to define a vector in 3D space, we need some reference directions. These reference directions are canonically called i, j, k (or i, j, k with little caps on them - referred to as "i cap", "j cap" and "k cap"). Any vector we think of as V = (x, y, z) can actually then be written as V = xi + yj + zk. (Note: I will no longer call them by caps, I'll just call them i, j, k). i, j, and k are unit vectors in the X, Y and Z directions and they form a set of mutually orthogonal unit vectors. They are the basis of all Cartesian coordinate geometry.

There are other forms of coordinates (such as Cylindrical and Spherical coordinates), and while their coordinates are not as direct to understand as (x, y, z), they too are composed of a set of 3 mutually orthogonal unit vectors which form the basis into which 3 coordinates are multiplied to produce a vector.

So, the above discussion clearly says that we need unit vectors to define other vectors, but why should you care?

Because sometimes, only the magnitude matters. That's when you use a "regular" number (something like 4 or 1/3 or 3.141592653 - nope, for all you OCD freaks, I am NOT going to put Pi there - that shall stay a terminating decimal, just because I am evil incarnate). You would not want to throw in a pesky direction, would you? I mean, does it really make sense to say that I want 4 kilograms of watermelons facing West? Unless you are some crazy fanatic, of course.

Other times, only the direction matters. You just don't care for the magnitude, or the magnitude just is too large to fathom (something like infinity, only that no one really knows what infinity really is - All Hail The Great Infinite, for He has Infinite Infinities... Sorry, got a bit carried away there). In such cases, we use normalization of vectors. For example, it doesn't mean anything to say that we have a line facing 4 km North. It makes more sense to say we have a line facing North. So what do you do then? You get rid of the 4 km. You destroy the magnitude. All you have remaining is the North (and Winter is Coming). Do this often enough, and you will have to give a name and notation to what you are doing. You can't just call it "ignoring the magnitude". That is too crass. You're a mathematician, and so you call it "normalization", and you give it the notation of the "cap" (probably because you wanted to go to a party instead of being stuck with vectors).

BTW, since I mentioned Cartesian coordinates, here's the obligatory XKCD: https://img.haomeiwen.com/i13256158/5a86fb76d6508484.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240

This is a very good explanation. In fact, I always love these kinds of questions because they produce this kind of answer; some of the people here explain the jargon on its own without explaining why it is called that. I am a developer and get caught up in jargon (which I don't care about at all), but sometimes the reason behind the naming matters more to me than what the thing does. For example (Intending vs Intended methods in Espresso unit test) and (map vs flatMap methods in reactive). – Neon Warge J

Q2

Reading the Godot Game Engine documentation about unit vectors, normalization, and the dot product really makes a lot of sense. Here is the article:

Unit vectors

Ok, so we know what a vector is. It has a direction and a magnitude. We also know how to use them in Godot. The next step is learning about unit vectors. Any vector with magnitude of length 1 is considered a unit vector. In 2D, imagine drawing a circle of radius one. That circle contains all unit vectors in existence for 2 dimensions:

So, what is so special about unit vectors? Unit vectors are amazing. In other words, unit vectors have several, very useful properties.

Can’t wait to know more about the fantastic properties of unit vectors, but one step at a time. So, how is a unit vector created from a regular vector?

Normalization

Taking any vector and reducing its magnitude to 1.0 while keeping its direction is called normalization. Normalization is performed by dividing the x and y (and z in 3D) components of a vector by its magnitude:

var a = Vector2(2,4)
var m = sqrt(a.x*a.x + a.y*a.y)  # magnitude of a
a.x /= m  # divide each component by the magnitude
a.y /= m

As you might have guessed, if the vector has magnitude 0 (meaning, it’s not a vector but the origin, also called the null vector), a division by zero occurs and the universe goes through a second big bang, except in reverse polarity and then back. As a result, humanity is safe but Godot will print an error. Remember! Vector(0,0) can’t be normalized!

Of course, Vector2 and Vector3 already provide a method to do this:

a = a.normalized()

Dot product

OK, the dot product is the most important part of vector math. Without the dot product, Quake would have never been made. This is the most important section of the tutorial, so make sure to grasp it properly. Most people trying to understand vector math give up here because, despite how simple it is, they can’t make heads or tails of it. Why? Here’s why, it’s because...

The dot product takes two vectors and returns a scalar:

var s = a.x*b.x + a.y*b.y

Yes, pretty much that. Multiply x from vector a by x from vector b. Do the same with y and add it together. In 3D it’s pretty much the same:

var s = a.x*b.x + a.y*b.y + a.z*b.z

I know, it’s totally meaningless! You can even do it with a built-in function:

var s = a.dot(b)

The order of two vectors does not matter, a.dot(b) returns the same value as b.dot(a).

This is where despair begins and books and tutorials show you this formula:

[Figure: the dot product formula, $a \cdot b = \|a\|\,\|b\|\cos\theta$]

And you realize it’s time to give up making 3D games or complex 2D games. How can something so simple be so complex? Someone else will have to make the next Zelda or Call of Duty. Top down RPGs don’t look so bad after all. Yeah I hear someone did pretty well with one of those on Steam...

So this is your moment, this is your time to shine. DO NOT GIVE UP! At this point, this tutorial will take a sharp turn and focus on what makes the dot product useful. That is, why it is useful. We will focus one by one on the use cases for the dot product, with real-life applications. No more formulas that don’t make any sense. Formulas will make sense once you learn what they are useful for.

Siding

The first useful and most important property of the dot product is to check what side stuff is looking at. Let’s imagine we have any two vectors, a and b. Any direction or magnitude (neither origin). Does not matter what they are, but let’s imagine we compute the dot product between them.

var s = a.dot(b)

The operation will return a single floating point number (but since we are in vector world, we call it a scalar and will keep using that term from now on). This number will tell us the following:

If the number is greater than zero, both are looking towards the same direction (the angle between them is < 90°).
If the number is less than zero, both are looking towards opposite directions (the angle between them is > 90°).
If the number is zero, the vectors form an L shape (the angle between them is 90°).

So let’s think of a real use-case scenario. Imagine Snake is going through a forest, and then there is an enemy nearby. How can we quickly tell if the enemy has discovered Snake? In order to discover him, the enemy must be able to see Snake. Let’s say, then that:

Snake is in position A. The enemy is in position B. The enemy is facing towards direction vector F.

So, let’s create a new vector BA that goes from the guard (B) to Snake (A), by subtracting the two:

var BA = A - B

Ideally, if the guard were looking straight towards Snake, to make eye to eye contact, it would do it in the same direction as vector BA.

If the dot product between F and BA is greater than 0, then Snake will be discovered. This happens because we will be able to tell that the guard is facing towards him:

if (BA.dot(F) > 0):
    print("!")

Seems Snake is safe so far.
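
For readers who want to try the siding test with concrete numbers, here is a small Python sketch of the same check. The positions, the facing directions, and the function name `is_facing` are made up for illustration; vectors are plain `(x, y)` tuples rather than Godot's `Vector2`.

```python
def sub(p, q):
    """Component-wise difference p - q of two 2D points."""
    return (p[0] - q[0], p[1] - q[1])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def is_facing(enemy_pos, enemy_facing, target_pos):
    """True if the target lies in the half-plane the enemy is facing."""
    to_target = sub(target_pos, enemy_pos)  # BA = A - B
    return dot(to_target, enemy_facing) > 0

snake = (2.0, 3.0)  # position A
guard = (5.0, 3.0)  # position B

print(is_facing(guard, (-1.0, 0.0), snake))  # True: guard looks left, towards Snake
print(is_facing(guard, (1.0, 0.0), snake))   # False: guard looks right, away from Snake
```

As in the text, a positive result only means the angle between the facing direction and BA is below 90°, so this is a coarse half-plane test rather than a narrow field of view.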

Siding with unit vectors

Ok, so now we know that dot product between two vectors will let us know if they are looking towards the same side, opposite sides or are just perpendicular to each other.

This works the same with all vectors, no matter the magnitude so unit vectors are not the exception. However, using the same property with unit vectors yields an even more interesting result, as an extra property is added:

If both vectors are facing towards the exact same direction (parallel to each other, angle between them is 0°), the resulting scalar is 1.
If both vectors are facing towards the exact opposite direction (parallel to each other, but angle between them is 180°), the resulting scalar is -1.

This means that the dot product between unit vectors is always in the range from -1 to 1. So again...

If their angle is 0°, the dot product is 1.
If their angle is 90°, the dot product is 0.
If their angle is 180°, the dot product is -1.

Uh.. this is oddly familiar... seen this before... where?

Let’s take two unit vectors. The first one is pointing up, the second too, but we will rotate it all the way from up (0°) to down (180°)...

While plotting the resulting scalar!

Aha! It all makes sense now, this is a Cosine function!

We can say that, then, as a rule...

The dot product between two unit vectors is the cosine of the angle between those two vectors. So, to obtain the angle between two vectors, we must do:

var angle_in_radians = acos( a.dot(b) )
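
The same rule can be tried outside Godot. The Python sketch below normalizes its inputs first, so it also works for vectors that are not unit length, and clamps the cosine to [-1, 1] before calling `acos`; the clamp is a precaution against floating-point rounding added here, not something the article mentions. The function name is hypothetical.

```python
import math

def angle_between(a, b):
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_angle = dot / (norm_a * norm_b)
    # Rounding can push the value slightly outside acos's domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle)

up = (0.0, 1.0)
right = (1.0, 0.0)
down = (0.0, -1.0)

print(math.degrees(angle_between(up, up)))     # about 0
print(math.degrees(angle_between(up, right)))  # about 90
print(math.degrees(angle_between(up, down)))   # about 180
```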

What is this useful for? Well, obtaining the angle directly is probably not as useful, but just being able to tell the angle is useful for reference. One example is in the Kinematic Character demo, when the character moves in a certain direction and then hits an object. How to tell if what we hit is the floor?

By comparing the normal of the collision point with a previously computed angle.
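
A floor check along these lines might look like the following Python sketch. It is only a guess at the idea, not the demo's actual code: the function name, the 45° default, and the choice of `(0, 1)` as the up direction are all assumptions (in Godot's 2D coordinates the y axis points down, so up would be `(0, -1)` there). Because the normal and the up vector are unit vectors, their dot product is the cosine of the angle between them, so it can be compared directly with a precomputed cosine and no `acos` call is needed.

```python
import math

def is_floor(normal, max_slope_degrees=45.0, up=(0.0, 1.0)):
    """Treat a surface as floor if its unit normal is within the slope limit of up."""
    # Cosine of the steepest slope still counted as floor.
    cos_limit = math.cos(math.radians(max_slope_degrees))
    cos_angle = normal[0] * up[0] + normal[1] * up[1]
    # A larger cosine means a smaller angle between the normal and up.
    return cos_angle >= cos_limit

flat_ground = (0.0, 1.0)
wall = (1.0, 0.0)
gentle_slope = (math.sin(math.radians(30)), math.cos(math.radians(30)))
steep_slope = (math.sin(math.radians(60)), math.cos(math.radians(60)))

print(is_floor(flat_ground), is_floor(wall))
print(is_floor(gentle_slope), is_floor(steep_slope))
```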

The beauty of this is that the same code works exactly the same and without modification in 3D. Vector math is, in a great deal, dimension-amount-independent, so adding or removing an axis only adds very little complexity.

This was a really good article. I think they have since edited and cut out many portions of it.
