Lecture Plan
Lecture 1: Introduction and Word Vectors
- The course (10 mins)
- Human language and word meaning (15 mins)
- Word2vec introduction (15 mins)
- Word2vec objective function gradients (25 mins)
- Optimization basics (5 mins)
- Looking at word vectors (10 mins or less)
Course logistics in brief
- Instructor: Christopher Manning
- Head TA and co-instructor: Abigail See
- TAs: Many wonderful people! See website
- Time: TuTh 4:30–5:50, Nvidia Aud (→ video)
- Other information: see the class webpage:
  http://cs224n.stanford.edu/
  a.k.a. http://www.stanford.edu/class/cs224n/
  - Syllabus, office hours, “handouts”, TAs, Piazza
- Office hours start this Thursday
- Slides uploaded before each lecture
What do we hope to teach?
- An understanding of the effective modern methods for deep learning
  - Basics first, then key methods used in NLP: recurrent networks, attention, etc.
- A big-picture understanding of human languages and the difficulties in understanding and producing them
- An understanding of, and the ability to build, systems (in PyTorch) for some of the major problems in NLP:
  - Word meaning, dependency parsing, machine translation, question answering
What’s different this year?
- Lectures (including guest lectures) covering new material: character models, transformers, safety/fairness, multitask learning
- 5x one-week assignments instead of 3x two-week assignments
- Assignments covering new material (NMT with attention, ConvNets, subword modeling)
- Using PyTorch rather than TensorFlow
- Assignments due before class (4:30pm), not at midnight!
- Gentler but earlier ramp-up
  - First assignment is easy, but due one week from today!
- No midterm
High-Level Plan for Problem Sets
- HW1 is hopefully an easy on-ramp – an IPython Notebook
- HW2 is pure Python (numpy) but expects you to do (multivariate) calculus so you really understand the basics
- HW3 introduces PyTorch
- HW4 and HW5 use PyTorch on a GPU (Microsoft Azure)
  - Libraries like PyTorch and TensorFlow (and Chainer, MXNet, CNTK, Keras, etc.) are becoming the standard tools of DL
- For the final project, you either:
  - Do the default project, which is SQuAD question answering
    - Open-ended but an easier start; a good choice for most
  - Propose a custom final project, which we approve
    - You will receive feedback from a mentor (TA/prof/postdoc/PhD)
- Can work in teams of 1–3; can use any language
1. How do we represent the meaning of a word?
How do we have usable meaning in a computer?
Problems with resources like WordNet
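To see what such a resource actually provides (and where it falls short: fixed discrete senses, missing nuance, no real notion of similarity), here is a minimal sketch of querying WordNet via NLTK; it assumes nltk is installed and its wordnet corpus downloaded:

```python
# Minimal sketch: querying WordNet through NLTK. Assumes nltk is
# installed and the corpus fetched once via nltk.download('wordnet').
from nltk.corpus import wordnet as wn

poses = {'n': 'noun', 'v': 'verb', 's': 'adj (sat)', 'a': 'adj', 'r': 'adv'}
for synset in wn.synsets("good"):
    # Each synset is one sense of "good"; print its part of speech
    # and the synonymous lemmas it groups together.
    lemmas = ", ".join(l.name() for l in synset.lemmas())
    print(f"{poses[synset.pos()]}: {lemmas}")
```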
Representing words as discrete symbols
Problem with words as discrete symbols
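To make the problem concrete: in a traditional NLP pipeline each word is a one-hot vector, and any two distinct one-hot vectors are orthogonal. A small sketch with a toy three-word vocabulary (hypothetical, for illustration):

```python
import numpy as np

# Toy vocabulary; in practice |V| can be 500,000+.
vocab = ["hotel", "motel", "conference"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

# Distinct one-hot vectors have dot product 0, so "hotel" looks
# no more similar to "motel" than to "conference".
print(one_hot["hotel"] @ one_hot["motel"])       # 0.0
print(one_hot["hotel"] @ one_hot["conference"])  # 0.0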
Representing words by their context
Word vectors
Word meaning as a neural word vector – visualization
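By contrast, dense word vectors make similarity a measurable geometric quantity. A sketch with made-up numbers (real vectors are learned, typically 100-300 dimensional):

```python
import numpy as np

# Hypothetical 4-dimensional word vectors, invented for illustration.
v = {
    "hotel":      np.array([0.9, 0.1, 0.4, -0.2]),
    "motel":      np.array([0.8, 0.2, 0.5, -0.1]),
    "conference": np.array([-0.3, 0.7, 0.1, 0.6]),
}

def cosine(a, b):
    # Cosine similarity: dot product of the normalized vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(v["hotel"], v["motel"]))       # high: similar words
print(cosine(v["hotel"], v["conference"]))  # lower: dissimilar words
```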
3. Word2vec: Overview
Word2vec: objective function
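For reference, the objective built up on these slides can be stated compactly: with corpus positions t = 1, ..., T, window size m, and θ collecting all word vectors, minimize the average negative log-likelihood of each context word given its center word:

```latex
J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-m \le j \le m \\ j \neq 0}} \log P\left(w_{t+j} \mid w_{t}; \theta\right)
```

Minimizing J(θ) is the same as maximizing the probability the model assigns to the observed context words.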
Word2Vec Overview with Vectors
Word2vec: prediction function
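The prediction function turns vector dot products into probabilities with a softmax. Writing v_c for the “center” vector of word c and u_o for the “outside” (context) vector of word o, with the sum over the whole vocabulary V:

```latex
P(o \mid c) = \frac{\exp\left(u_{o}^{\top} v_{c}\right)}{\sum_{w \in V} \exp\left(u_{w}^{\top} v_{c}\right)}
```

The dot product measures how similar o is to c; exponentiating and normalizing converts those scores into a probability distribution over the vocabulary.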
Training a model by optimizing parameters
To train the model: Compute all vector gradients!
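Here θ means every model parameter at once: each word w gets both a center vector v_w and an outside vector u_w, so with d-dimensional vectors and a vocabulary V, θ stacks 2V vectors:

```latex
\theta =
\begin{bmatrix}
v_{\text{aardvark}} \\ \vdots \\ v_{\text{zebra}} \\
u_{\text{aardvark}} \\ \vdots \\ u_{\text{zebra}}
\end{bmatrix}
\in \mathbb{R}^{2dV}
```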
4. Word2vec: derivations of the gradient
Chain Rule
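For reference, the derivation leans on the chain rule: for a composition y = f(u) with u = g(x),

```latex
\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}
```

applied coordinate-wise in the multivariate, vector-valued case.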
Interactive Whiteboard Session!
Calculating all gradients!
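Carrying the chain rule through the softmax above yields the gradient with respect to the center vector, which has a tidy “observed minus expected” form:

```latex
\frac{\partial}{\partial v_{c}} \log P(o \mid c) = u_{o} - \sum_{x \in V} P(x \mid c)\, u_{x}
```

Each update pulls v_c toward the vector of the outside word actually observed, and away from the model’s current expectation over all outside vectors.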
Word2vec: More details
5. Optimization: Gradient Descent
Gradient Descent
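Gradient descent minimizes a cost J(θ) by repeatedly stepping opposite the gradient: θ_new = θ_old − α ∇_θ J(θ_old), where α is the learning rate. A self-contained sketch on a toy quadratic objective (the target values are made up for illustration):

```python
import numpy as np

# Gradient descent on a toy objective J(theta) = ||theta - target||^2,
# whose gradient is grad_J(theta) = 2 * (theta - target).
target = np.array([1.0, -2.0, 3.0])  # made-up values, illustration only

def grad_J(theta):
    return 2.0 * (theta - target)

theta = np.zeros(3)  # initial parameter guess
alpha = 0.1          # learning rate (step size)
for _ in range(100):
    theta = theta - alpha * grad_J(theta)  # step opposite the gradient

print(theta)  # -> approximately [ 1. -2.  3.]
```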
Stochastic Gradient Descent
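Because ∇_θ J(θ) is a sum over every window in the corpus, one full-batch gradient is very expensive; stochastic gradient descent instead updates θ from a single sampled example (or a small minibatch) at a time. A sketch on toy data standing in for per-window losses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for per-window losses: samples scattered around a true
# parameter vector; minimizing the average of ||theta - x||^2 recovers it.
true_theta = np.array([1.0, -2.0, 3.0])
data = true_theta + 0.1 * rng.standard_normal((1000, 3))

theta = np.zeros(3)  # initial parameter guess
alpha = 0.05         # learning rate
for _ in range(2000):
    x = data[rng.integers(len(data))]  # sample ONE example ("window")
    grad = 2.0 * (theta - x)           # gradient of ||theta - x||^2
    theta = theta - alpha * grad       # cheap, noisy update

print(theta)  # hovers near true_theta despite never using a full batch
```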
Reference link:
https://www.youtube.com/watch?v=8rXD5-xhemo&list=PLoROMvodv4rOhcuXMZkNm7j3fVwBBY42z&index=1