One-Hot Encoding

Author: ramblelily | Published 2017-12-27 16:57

References:
What is One Hot Encoding? Why And When do you have to use it?
preprocessing categorical features

1. Looking at a small dataset before and after One-Hot Encoding shows clearly what the encoding actually does.
Figure: the data before One-Hot Encoding
Figure: the data after One-Hot Encoding

The process can be summarized in one sentence:

This estimator transforms each categorical feature with m possible values into m binary features, with only one active.
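A minimal plain-Python sketch of that sentence: one feature with m distinct values becomes m binary columns, and exactly one column is 1 in each row. The helper name `one_hot_encode` is hypothetical, and sorting the categories to fix the column order is an assumption of this sketch (scikit-learn's `OneHotEncoder`, the estimator quoted above, is the library implementation).

```python
from typing import List, Sequence, Tuple


def one_hot_encode(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Turn one categorical feature with m distinct values into m binary features."""
    # The m possible values, sorted so the column order is deterministic.
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}

    rows = []
    for v in values:
        row = [0] * len(categories)  # m binary features, all inactive
        row[index[v]] = 1            # exactly one feature is active
        rows.append(row)
    return categories, rows


if __name__ == "__main__":
    cats, encoded = one_hot_encode(["VW", "Acura", "Honda", "Honda"])
    print(cats)  # ['Acura', 'Honda', 'VW']
    for row in encoded:
        print(row)
```

Each printed row contains a single 1, in the column of that row's category.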

2. Why One-Hot Encoding is needed

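When categorical features are vectorized, each category is encoded as a number. Categories have no inherent numerical relationship, yet the resulting numbers implicitly impose one on them, as the following passage explains: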

Let me explain: what this form of organization presupposes is VW > Acura > Honda, based on the categorical values. Suppose your model internally calculates an average; then we get (1 + 3) / 2 = 2. This implies that the average of VW and Honda is Acura. This is definitely a recipe for disaster, and the model's predictions would have a lot of errors.
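The failure described in the quote can be reproduced in a few lines. The integer codes below (Honda = 1, Acura = 2, VW = 3) are an assumption chosen to match the stated ordering VW > Acura > Honda, and `average_code` is a hypothetical helper standing in for "the model internally calculates an average".

```python
# Hypothetical integer (label) encoding consistent with VW > Acura > Honda.
label_code = {"Honda": 1, "Acura": 2, "VW": 3}
code_to_label = {code: name for name, code in label_code.items()}


def average_code(a: str, b: str) -> float:
    """Average the integer codes of two categories, as a numeric model might."""
    return (label_code[a] + label_code[b]) / 2


if __name__ == "__main__":
    avg = average_code("VW", "Honda")
    print(avg)                      # 2.0
    print(code_to_label[int(avg)])  # Acura
```

The arithmetic is valid, but the conclusion "VW averaged with Honda is Acura" is meaningless, because the codes were arbitrary labels to begin with.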

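One-Hot Encoding in effect binarizes the category information: a sample gets 1 in the column for its own category and 0 in every other column. This avoids introducing spurious numerical relationships when categories are encoded.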
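A small check of why binarized columns avoid the problem, assuming the column order Acura, Honda, VW (the order is an arbitrary choice here, and `mean_vector` and `distance` are hypothetical helpers). Averaging two one-hot vectors no longer lands on a third category, and every pair of categories is the same distance apart, so no ordering is implied.

```python
import itertools
import math
from typing import Dict, List

# Assumed column order for the three example categories.
categories = ["Acura", "Honda", "VW"]
one_hot: Dict[str, List[int]] = {
    cat: [1 if i == j else 0 for j in range(len(categories))]
    for i, cat in enumerate(categories)
}


def mean_vector(a: str, b: str) -> List[float]:
    """Element-wise average of two categories' one-hot vectors."""
    return [(x + y) / 2 for x, y in zip(one_hot[a], one_hot[b])]


def distance(a: str, b: str) -> float:
    """Euclidean distance between two categories' one-hot vectors."""
    return math.dist(one_hot[a], one_hot[b])  # math.dist needs Python 3.8+


if __name__ == "__main__":
    avg = mean_vector("VW", "Honda")
    print(avg)                      # [0.0, 0.5, 0.5]
    print(avg == one_hot["Acura"])  # False
    for a, b in itertools.combinations(categories, 2):
        print(a, b, distance(a, b))  # every pair is sqrt(2) apart
```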


Article link: https://www.haomeiwen.com/subject/qbbugxtx.html