Improving a Little Every Day: Tricks

Author: Klaas | Published 2016-02-25 21:04

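I am currently doing deep learning research, mostly in Python. While writing programs I often run into small tricks worth remembering, so I record them here and will keep updating the list. If you have any questions, please leave a comment below or email strikerklaas@gmail.com.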


one-hot vector

One-hot vectors are very important in natural language processing: they are often used as the input to a neural network and act as an index. As the name indicates, a one-hot vector has exactly one element equal to 1 and all others equal to 0. Suppose we have a vocabulary of 4000 words for text generation; then there are 4000 distinct one-hot vectors, one per word. How the vectors are built depends on the task. We start with a small dataset whose labels fall into two classes, 0 and 1.

  • classification
    Suppose there are only two classes, 0 and 1, so the two one-hot vectors are [1,0] and [0,1]. Suppose we have six training samples whose labels are stored in an array such as [0,1,0,1,1,0]. We first build an identity matrix, then index it with the label array so that each label selects its vector, which yields a matrix covering all samples.
>>> import numpy as np
>>> x = np.eye(2) # Two types of vectors
>>> y = np.array([0,1,0,1,1,0]) # classes
>>> x
array([[ 1.,  0.],
       [ 0.,  1.]])
>>> y
array([0, 1, 0, 1, 1, 0])
>>> x[y] # By indexing, we generate a matrix for learning
array([[ 1.,  0.],
       [ 0.,  1.],
       [ 1.,  0.],
       [ 0.,  1.],
       [ 0.,  1.],
       [ 1.,  0.]])
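
The same indexing trick extends to the vocabulary case mentioned above. The following is a minimal sketch, assuming a toy four-word vocabulary; the names `vocab`, `word_to_index`, and `one_hot` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy vocabulary; a real one might hold 4000 words.
vocab = ["the", "cat", "sat", "down"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(indices, num_classes):
    """Return one one-hot row per index by selecting rows of the identity matrix."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(indices)]

sentence = ["cat", "sat", "down"]
indices = [word_to_index[w] for w in sentence]
encoded = one_hot(indices, len(vocab))

print(encoded.shape)           # one row per word, one column per vocabulary entry
print(encoded.argmax(axis=1))  # recovers the original indices
```

For a vocabulary of 4000 words the identity matrix already has 16 million entries, so it may be preferable to keep the integer indices and build the rows only when they are needed.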

float32 (theano)

The default floating-point data type is float64; however, data must be converted to float32 before it can be stored on the GPU.

  • convert to float32
import numpy as np
epsilon = np.float32(0.01)
  • use a shared variable
import theano
import theano.tensor as T
# input_dimension and output_dimension are the layer sizes, defined elsewhere
w = theano.shared(np.random.randn(input_dimension, output_dimension).astype('float32'), name='w')
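
A quick way to check the dtypes involved, sketched with NumPy only so that it runs without Theano or a GPU. The sizes `input_dimension = 3` and `output_dimension = 2` are assumed values for illustration.

```python
import numpy as np

input_dimension, output_dimension = 3, 2  # assumed sizes, for illustration only

w_default = np.random.randn(input_dimension, output_dimension)
w_float32 = w_default.astype(np.float32)  # astype returns a converted copy

print(w_default.dtype)  # float64
print(w_float32.dtype)  # float32

# Mixing a float32 array with a float64 array upcasts the result.
mixed = w_float32 + w_default
print(mixed.dtype)      # float64
```

Because a single float64 array is enough to upcast the result, it helps to convert every array explicitly before handing it to a shared variable.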

MNIST dataset

The MNIST dataset is a widely used dataset for digit recognition. Its characteristics can be summed up as follows:

  1. train set: 50,000, validation set: 10,000, test set: 10,000
  2. 28 x 28 pixels (each training example is represented as a 1-dimensional array of length 784).
    Now we open the dataset in Python and prepare it for GPU acceleration.
    <pre><code>
import cPickle, gzip, numpy, theano
import theano.tensor as T

# Load the dataset (Python 2; on Python 3 use pickle.load(f, encoding='latin1'))
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()

# Next, store the data into GPU memory
def share_dataset(data_xy):
    # Wrap the data in Theano shared variables
    data_x, data_y = data_xy
    shared_x = theano.shared(numpy.asarray(data_x, dtype=theano.config.floatX))
    shared_y = theano.shared(numpy.asarray(data_y, dtype=theano.config.floatX))
    # The following syntax also works:
    # shared_x = theano.shared(data_x.astype('float32'))
    # shared_y = theano.shared(data_y.astype('float32'))
    # Since 'Y' should be integers, not floats, we cast it
    return shared_x, T.cast(shared_y, 'int32')

# Now try it!
test_set_x, test_set_y = share_dataset(test_set)
valid_set_x, valid_set_y = share_dataset(valid_set)
train_set_x, train_set_y = share_dataset(train_set)
</code></pre>
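
To make the 784-element layout concrete, here is a minimal sketch that reshapes flattened rows into 28 x 28 images. It uses random values as a stand-in for real MNIST rows (an assumption, so that the snippet runs without `mnist.pkl.gz`), and it assumes each image is unrolled row by row; `fake_batch` is a hypothetical name.

```python
import numpy as np

rng = np.random.RandomState(0)
# Stand-in for a few rows of train_set[0]: 5 examples, 784 values each in [0, 1).
fake_batch = rng.rand(5, 784).astype(np.float32)

# Assumed layout: each row unrolls one 28 x 28 image row by row.
images = fake_batch.reshape(-1, 28, 28)
print(images.shape)  # (5, 28, 28)

# Flattening again restores the original layout exactly.
restored = images.reshape(-1, 28 * 28)
print(np.array_equal(restored, fake_batch))  # True
```

The reshaped form is convenient for plotting or for convolutional layers, while the flat form matches what the shared variables above hold.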

