One-hot encoding is a technique used to represent categorical data, such as words or tokens in natural language processing (NLP). In one-hot encoding, each word is represented as a binary vector whose length equals the size of the vocabulary: the element at that word's index is set to 1, and all other elements are set to 0.
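To make this concrete, here is a minimal sketch of one-hot encoding over a toy five-word vocabulary in Python; the vocabulary and the chosen word are illustrative, not taken from any particular corpus.

```python
import numpy as np

# Toy vocabulary; in practice this would be built from a corpus.
vocab = ["cat", "dog", "sat", "on", "mat"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word: str) -> np.ndarray:
    """Return a vector of length len(vocab) with a single 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("dog"))  # [0. 1. 0. 0. 0.]
```

Note that the vector length grows with the vocabulary, and no two one-hot vectors share any information: every pair of distinct words is equally "far apart."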
In contrast, word embeddings are dense vector representations of words that are learned from data using techniques like neural networks. Unlike one-hot encoding, word embeddings represent each word as a vector of continuous real numbers with a fixed length, where each element in the vector captures a different aspect of the meaning of the word. Word embeddings are often learned by predicting the surrounding words in a given text corpus.
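As an illustration, the sketch below uses PyTorch's `nn.Embedding` to map a word index to a dense vector. The vocabulary size and embedding dimension are arbitrary choices, and the weights here start out random; in a real system they would be learned from data, for example with a skip-gram or CBOW objective that predicts surrounding words.

```python
import torch
import torch.nn as nn

vocab_size = 5      # same toy vocabulary as above
embedding_dim = 3   # each word becomes a dense 3-dimensional vector

# A lookup table of shape (vocab_size, embedding_dim); weights are trainable.
embedding = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embedding_dim)

word_index = torch.tensor([1])  # index of "dog" in the toy vocabulary
print(embedding(word_index))    # e.g. tensor([[ 0.41, -1.02,  0.37]], grad_fn=...)
```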
Comparing the two representations:
1. Word embeddings are much more compact than one-hot encoding, as they typically have a much lower dimensionality. This makes them more efficient to store and process.
2. Word embeddings capture more semantic information about words than one-hot encoding, as they are able to represent relationships between words based on their usage in context.
3. Word embeddings can be used to initialize the weights of neural network models for NLP tasks, which can lead to better performance on these tasks (see the sketch after this list).
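To illustrate point 3, here is a minimal sketch of seeding a model's embedding layer with pretrained vectors, again assuming PyTorch. The random matrix below is a stand-in for real vectors (e.g. word2vec or GloVe) loaded from disk.

```python
import torch
import torch.nn as nn

# Placeholder for a (vocab_size, embedding_dim) matrix of pretrained vectors.
pretrained = torch.randn(5, 3)

# freeze=False lets the embeddings be fine-tuned on the downstream task;
# freeze=True would keep the pretrained vectors fixed during training.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)

print(embedding(torch.tensor([2])))  # dense vector for word index 2
```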
Overall, while one-hot encoding is a simple and interpretable way to represent words in NLP, word embeddings offer a more powerful and flexible representation that can capture the nuances of language more effectively.