计算机基础导论学习总结中

作者: Nautilus1 | 来源:发表于2017-11-28 16:41 被阅读17次

计算机基础导论学习总结中
计算机基础导论学习总结上
计算机基础导论学习总结下
杂谈_碎片整合
1月19日 ITT培训基础导论
《百万讲师赋能训练营》1.19课程总结
计算机导论基础
ITT培训基础导论学习总结
（三）计算机学习系统
关于指针

第四单元

响应查询

本单元将建立一个可行的搜索引擎，可以抓取并建立一系列网页的索引，可以响应关键词查询。首先是建立网络语料库的索引，其结构将会是：列表的每一项是一个关键词，及该关键词对应的一系列URL，后面可能加一项指出共有多少页面含该关键词。

根据这个结构构建索引如下：

index = []

def add_to_index(index,keyword,url):
    for e in index:
        if e[0] == keyword:
            e[1].append(url)
            return index
    index.append([keyword,[url]])

add_to_index(index,'udacity','http://udacity.com')
add_to_index(index,'computing','http://acm.org')
add_to_index(index,'udacity','http://npr.org')
print index
#>>> [['udacity', ['http://udacity.com', 'http://npr.org']], 
#>>> ['computing', ['http://acm.org']]]

根据关键词查找：

index = [['udacity', ['http://udacity.com', 'http://npr.org']],
         ['computing', ['http://acm.org']]]

def lookup(index,keyword):
    for e in index:
        if e[0] == keyword:
            return e[1]
    return []

print lookup(index,'udacity')
#>>> ['http://udacity.com','http://npr.org']

将完整的页面添加进索引：

index = []

def add_to_index(index,keyword,url):
    for entry in index:
        if entry[0] == keyword and not url in entry[1]:               #保证一个关键词的一个URL只记录一次
            entry[1].append(url)
            return
    index.append([keyword,[url]])

def add_page_to_index(index,url,content):
    l = content.split()             #对页面内容粗略的切分
    for e in l:
        ok = 0                    #标记index中是否已存在该关键词
        for x in index:
            if e == x:
                x[1].append(url)
                ok = 1
        if not ok:
            index.append([e,[url]])
    return index

add_page_to_index(index,'fake.text',"This is a test")
print index
#>>> [['This', ['fake.text']], ['is', ['fake.text']], ['a', ['fake.text']],
#>>> ['test',['fake.text']]]

然后在第三单元定义的crawl_web函数中获取到content后加上一句add_page_to_index(index, page, content)即可生成索引，实际上第三单元给出的正是已加上索引的完整版代码。

本单元接下来介绍了因特网的基本原理，首先是页面爬取函数

def get_page(url):
    try:
        import urllib
        return urllib.urlopen(url).read()
    except:
        return ''

网络的定义：

两点：至少3个实体；即使两个实体不直接相连，仍可以通过第三个实体的传递消息而通信。

延迟与带宽：

之前调用系统自带的split函数切分页面内容出现一些问题，下面实现一个更为细致的切分函数

def split_string(source,splitlist):
    output = []
    atsplit = True                #标记是否到分隔符
    for e in source:
        if e in splitlist:           #splitlist是切分符号表，如标点符号等
            atsplit = True
        else:
            if atsplit:                     #前一个字符是分隔符，重新加入新字符
                output.append(e)
                atsplit = False
            else:                        #前一个字符不是分隔符，在最后一个字符串上加字符
                output[-1] = output[-1] + e
                
    return output

out = split_string("This is a test-of the,string separation-code!"," ,!-")
print out
#>>> ['This', 'is', 'a', 'test', 'of', 'the', 'string', 'separation', 'code']

out = split_string("After  the flood   ...  all the colors came out.", " .")
print out
#>>> ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']

out = split_string("First Name,Last Name,Street Address,City,State,Zip Code",",")
print out
#>>>['First Name', 'Last Name', 'Street Address', 'City', 'State', 'Zip Code']

统计点击数，修改索引结构为：

def record_user_click(index,keyword,url):
    l = lookup(index, keyword)
    if l:                        #找到该关键词
        for e in l:
            if e[0] == url:
                e[1] += 1
                return index
        add_to_index(index, keyword, url)
    else:
        add_to_index(index, keyword, url)

def add_to_index(index, keyword, url):
    for entry in index:
        if entry[0] == keyword:
            for urls in entry[1]:               #去重
                if urls[0] == url:
                    return
            entry[1].append([url,0])
            return
    # not found, add new keyword to index
    index.append([keyword, [[url,0]]])

统计一段话中的词语个数

def count_words(page):
    s = 0
    ok = True
    for e in page:
        if e == ' ':
            ok = True
        else:
            if ok:
                s += 1
                ok = False
    return s

passage =("The number of orderings of the 52 cards in a deck of cards "
"is so great that if every one of the almost 7 billion people alive "
"today dealt one ordering of the cards per second, it would take "
"2.5 * 10**40 times the age of the universe to order the cards in every "
"possible way.")
print count_words(passage)
#>>>56

第五单元

程序怎样运行

首先要评估程序的好坏就要测定它运行时所占用的资源，即算法时间复杂度与空间复杂度。以下的程序以秒为单位测定一个程序运行的时间：

import time
def time_execution(code):
    start = time.clock()
    result = eval(code)
    run_time = time.clock() - start
    return result , run_time

def spin_loop(n):
    i = 0
    while i < n:
        i = i + 1
#测试循环1000次加法的时间
time_execution('spin_loop(1000)')
=>
(None, 0.000136000000000025)

其中eval函数的功能是：将字符串str当成有效的表达式来求值并返回计算结果。就可以用来在list,tuple,dict和string之间相互转化：

#字符串转换成列表
>>>a = "[[1,2], [3,4], [5,6], [7,8], [9,0]]"
>>>type(a)
<type 'str'>
>>> b = eval(a)
>>> print b
[[1, 2], [3, 4], [5, 6], [7, 8], [9, 0]]
>>> type(b)
<type 'list'>


#字符串转换成字典
>>> a = "{1: 'a', 2: 'b'}"
>>> type(a)
<type 'str'>
>>> b = eval(a)
>>> print b
{1: 'a', 2: 'b'}
>>> type(b)
<type 'dict'>



#字符串转换成元组
>>> a = "([1,2], [3,4], [5,6], [7,8], (9,0))"
>>> type(a)
<type 'str'>
>>> b = eval(a)
>>> print b
([1, 2], [3, 4], [5, 6], [7, 8], (9, 0))
>>> type(b)
<type 'tuple'>

构建一张大索引表，检测查找关键词的效率：

def add_to_index(index, keyword, url):
    for entry in index:
        if entry[0] == keyword:
            for urls in entry[1]:
                if urls[0] == url:
                    return
            entry[1].append([url])
            return
    # not found, add new keyword to index
    index.append([keyword, [url]])

def lookup(index, keyword):
    for entry in index:
        if entry[0] == keyword:
            return entry[1]
    return None

#将字符列表p中的元素连成字符串s
def make_string(p):
    s = ""
    for e in p:
        s += e
    return s

def make_big_index(size):
    index = []
    letters = ['a','a','a','a','a','a']
    while len(index) < size:
        word = make_string(letters)
        add_to_index(index, word, 'fake')   #每个关键词的URL都是fake
        #按字母从尾到头递增加入索引
        for i in range(len(letters) - 1, 0, -1):
            if letters[i] < 'z':
                letters[i] = chr(ord(letters[i]) + 1)
                break;
            else:
                letters[i] = 'a'
    
    return index 

#构建索引表，测试查找其中最后一个关键词的时间
index = make_big_index(10000)
print time_execution('lookup(index, "aaaoup")')
=>
(['fake'], 0.0017249999998512067)

然后介绍哈希表的思想，及好、坏哈希表的比较：

#坏哈希表，关键词分布不均
def bad_hash(key, size):
    return ord(key[0]) % size

#较好的哈希表实现
def hash_string(keyword,buckets):
    s = 0
    for e in keyword:
        s += ord(e)
    return  s % buckets

#哈希表测试函数，检测每个桶中关键词的分布情况
def test_hash_func(func, keys, size):
    ans = [0] * size
    key_used = []
    for w in keys:
        if w not in key_used:
            hv = func(w, size)
            ans[hv] += 1
            key_used.append(w)
            
    return ans

#测试
def get_page(url):
    try:
        import urllib
        return urllib.urlopen(url).read()
    except:
        return ''

words = get_page('http://www.gutenberg.org/cache/epub/1661/pg1661.txt').split()

cnt = test_hash_func(bad_hash, words, 12)
print cnt
=>
[730, 1541, 1055, 1752, 1784, 839, 1452, 2074, 1409, 754, 924, 899]

cnt = test_hash_func(hash_string, words, 12)
print cnt
=>
[1368, 1268, 1273, 1279, 1284, 1245, 1207, 1228, 1281, 1232, 1233, 1315]

cnt = test_hash_func(hash_string, words, 100)
print cnt
=>
[136, 127, 117, 137, 129, 149, 116, 126, 111, 128, 142, 131, 151, 129, 150, 124, 157, 144, 151, 150, 137, 105, 151, 144, 141, 153, 141, 185, 144, 154, 154, 163, 192, 159, 163, 190, 153, 177, 162, 175, 172, 166, 179, 164, 186, 167, 173, 144, 174, 167, 154, 164, 177, 179, 163, 171, 187, 162, 160, 181, 166, 161, 136, 154, 169, 156, 150, 147, 154, 164, 126, 173, 156, 165, 146, 151, 150, 145, 148, 152, 148, 148, 161, 140, 188, 150, 150, 121, 167, 123, 142, 132, 136, 132, 126, 141, 152, 135, 152, 162]

建立空的哈希表：

def make_hashtable(nbuckets):
    l = []
    for i in range( nbuckets):
        l.append([])
        
    return l

print make_hashtable(3)
=>
[[], [], []]

#按以下方式实现则存在问题
def make_hashtable_not(nbuckets):
    return [[]] * nbuckets
t = make_hashtable_not(3)
t[1].append(['yes',['https://udacity.com']])
print t
=>
[[['yes', ['https://udacity.com']]], [['yes', ['https://udacity.com']]], [['yes', ['https://udacity.com']]]]

这种方式创建的新列表是这个列表的三次复制，但不是三个副本，而是三个指向。外层列表的每一个元素都指向同一个空列表。

接下来要实现哈希表的查找更新等操作，表的结构为：

定义hashtable_get_bucket函数，找到所给关键词对应对哈希桶：

def hashtable_get_bucket(htable,keyword):
    l = len(htable)
    i = hash_string(keyword,l)
    return htable[i]


def hash_string(keyword,buckets):
    out = 0
    for s in keyword:
        out = (out + ord(s)) % buckets
    return out

def make_hashtable(nbuckets):
    table = []
    for unused in range(0,nbuckets):
        table.append([])
    return table

table = [[['Francis', 13], ['Ellis', 11]], [], [['Bill', 17],
['Zoe', 14]], [['Coach', 4]], [['Louis', 29], ['Rochelle', 4], ['Nick', 2]]]

print hashtable_get_bucket(table, "Zoe")
#>>> [['Bill', 17], ['Zoe', 14]]

print hashtable_get_bucket(table, "Brick")
#>>> []

print hashtable_get_bucket(table, "Lilith")
#>>> [['Louis', 29], ['Rochelle', 4], ['Nick', 2]]

定义 hashtable_add(htable,key,value)函数，在哈希表中添加给定关键词及其值。在上一个函数的基础上只需要在对应的桶的最后添加一项：

def hashtable_add(htable,key,value):
    # your code here
    hashtable_get_bucket(htable,key).append([key,value])
    return htable  
    
    
def hashtable_get_bucket(htable,keyword):
    return htable[hash_string(keyword,len(htable))]

def hash_string(keyword,buckets):
    out = 0
    for s in keyword:
        out = (out + ord(s)) % buckets
    return out

def make_hashtable(nbuckets):
    table = []
    for unused in range(0,nbuckets):
        table.append([])
    return table

table = make_hashtable(5)
hashtable_add(table,'Bill', 17)
hashtable_add(table,'Coach', 4)
hashtable_add(table,'Ellis', 11)
hashtable_add(table,'Francis', 13)
hashtable_add(table,'Louis', 29)
hashtable_add(table,'Nick', 2)
hashtable_add(table,'Rochelle', 4)
hashtable_add(table,'Zoe', 14)
print table
#>>> [[['Ellis', 11], ['Francis', 13]], [], [['Bill', 17], ['Zoe', 14]], 
#>>> [['Coach', 4]], [['Louis', 29], ['Nick', 2], ['Rochelle', 4]]]

定义hashtable_lookup(htable,key)函数，查找对应关键词的值。在前几个函数的基础上，首先调用hashtable_lookup(htable,key)判断表中是否有该词，在该关键词不在表中时调用hashtable_add(htable,key,value)插入；在表中时调用hashtable_get_bucket(htable,key)查找该关键词对应的桶，遍历找到后更新：

def hashtable_lookup(htable,key):
    l = hashtable_get_bucket(htable,key)
    for e in l:
        if e[0] == key:
            return e[1]
    return None


def hashtable_add(htable,key,value):
    bucket = hashtable_get_bucket(htable,key)
    bucket.append([key,value])


def hashtable_get_bucket(htable,keyword):
    return htable[hash_string(keyword,len(htable))]

def hash_string(keyword,buckets):
    out = 0
    for s in keyword:
        out = (out + ord(s)) % buckets
    return out

def make_hashtable(nbuckets):
    table = []
    for unused in range(0,nbuckets):
        table.append([])
    return table


table = [[['Ellis', 11], ['Francis', 13]], [], [['Bill', 17], ['Zoe', 14]],
[['Coach', 4]], [['Louis', 29], ['Nick', 2], ['Rochelle', 4]]]

print hashtable_lookup(table, 'Francis')
#>>> 13

print hashtable_lookup(table, 'Louis')
#>>> 29

print hashtable_lookup(table, 'Zoe')
#>>> 14

定义hashtable_update(htable,key,value)
函数，更新给定关键词的值：

def hashtable_update(htable,key,value):
    # Your code here
    x = hashtable_lookup(htable,key)
    if not x:
        hashtable_add(htable,key,value)
    else:
        l = hashtable_get_bucket(htable,key)
        for e in l:
            if e[0] == key:
                e[1] = value
        
    return htable

def hashtable_lookup(htable,key):
    bucket = hashtable_get_bucket(htable,key)
    for entry in bucket:
        if entry[0] == key:
            return entry[1]
    return None

def hashtable_add(htable,key,value):
    bucket = hashtable_get_bucket(htable,key)
    bucket.append([key,value])


def hashtable_get_bucket(htable,keyword):
    return htable[hash_string(keyword,len(htable))]

def hash_string(keyword,buckets):
    out = 0
    for s in keyword:
        out = (out + ord(s)) % buckets
    return out

def make_hashtable(nbuckets):
    table = []
    for unused in range(0,nbuckets):
        table.append([])
    return table


table = [[['Ellis', 11], ['Francis', 13]], [], [['Bill', 17], ['Zoe', 14]],
[['Coach', 4]], [['Louis', 29], ['Nick', 2], ['Rochelle', 4]]]

hashtable_update(table, 'Bill', 42)
hashtable_update(table, 'Rochelle', 94)
hashtable_update(table, 'Zed', 68)
print table
#>>> [[['Ellis', 11], ['Francis', 13]], [['Zed', 68]], [['Bill', 42], 
#>>> ['Zoe', 14]], [['Coach', 4]], [['Louis', 29], ['Nick', 2], 
#>>> ['Rochelle', 94]]]

使用Python自带的字典类型来实现哈希表，首先介绍字典类型，字符串、列表、字典三者的比较：

接下来就是用字典改写crawl_web、add_to_index、lookup三个函数。crawl_web的改动就是将index = [] 改为 index = {}。由于crawl_web调用add_page_to_index，add_page_to_index又调用add_to_index，故只需修改add_to_index：

改为：

lookup改为：

def lookup(index, keyword):
    if keyword in index:
        return index[keyword]
    return None

第五单元主要是修改前面的搜索引擎，使之更高效，能迅速反馈查询。计算机科学思想的核心是分析算法复杂度，设计更有效的数据结构。

习题集

定义缓存，比较斐波那契数列在有缓存和无缓存下的计算效率，无缓存：

import time
def time_execution(code):
    start = time.clock()
    result = eval(code)
    run_time = time.clock() - start
    return result , run_time

def cached_fibo(n):
    if n == 1 or n == 0:
        return n
    else:
        return cached_fibo(n - 1) + cached_fibo(n - 2)
        
print time_execution('cached_fibo(40)')     
#计算第40项，花费约52.5s
=>
(102334155, 52.544790000000006)

有缓存：

import time
def time_execution(code):
    start = time.clock()
    result = eval(code)
    run_time = time.clock() - start
    return result , run_time

def cached_execution(cache, proc, proc_input):
    # Your code here
    if proc_input in cache:
        return cache[proc_input]
    cache[proc_input] = proc(proc_input)
    return cache[proc_input]

# Here is an example showing the desired behavior of cached_execution:

def factorial(n):
    print "Running factorial"
    result = 1
    for i in range(2, n + 1):
        result = result * i
    return result

cache = {}

def cached_fibo(n):
    if n == 1 or n == 0:
        return n
    else:
        return (cached_execution(cache, cached_fibo, n - 1 )
               + cached_execution(cache,  cached_fibo, n - 2 ))
               
 
print time_execution('cached_execution(cache, cached_fibo,100)')
#计算第100项，花费远小于一秒，可见记忆化能避免大多数重复的计算
=>
(354224848179261915075L, 0.00022900000000447562)

移动字符：

#后移一个
def shift(letter):
    return chr( ord('a') + ( ord(letter) + 1- ord('a') )%26)


print shift('a')
#>>> b
print shift('n')
#>>> o
print shift('z')
#>>> a

#后移n个
def shift_n_letters(letter, n):
    l = ord(letter) + n
    l = (l - ord('a')) % 26
    return chr(ord('a') + l)

print shift_n_letters('s', 1)
#>>> t
print shift_n_letters('s', 2)
#>>> u
print shift_n_letters('s', 10)
#>>> c
print shift_n_letters('s', -10)
#>>> i

后移字符串

def shift(c, t):
    l = (ord(c) - ord('a') + t + 26)%26
    l = l + ord('a')
    return chr(l)
    
def rotate(s, t):
    # Your code here
    l = len(s)
    a = ''
    for i in range(l):
        if ord(s[i]) >= ord('a') and ord(s[i]) <= ord('z'):
            a += shift(s[i],t)
        else:
            a += s[i]
    return a
        

print rotate ('sarah', 13)
#>>> 'fnenu'
print rotate('fnenu',13)
#>>> 'sarah'
print rotate('dave',5)
#>>>'ifaj'
print rotate('ifaj',-5)
#>>>'dave'
print rotate(("zw pfli tfuv nfibj tfiivtkcp pfl jyflcu "
                "sv rscv kf ivru kyzj"),-17)
#>>> if your code works correctly you should be able to read this

计算机基础导论学习总结中
第四单元响应查询根据这个结构构建索引如下：根据关键词查找：将完整的页面添加进索引：然后在第三单元定义的c...
计算机基础导论学习总结上
课程大纲：从构建一个简单的搜索引擎项目出发，介绍构建过程中需要用到的技术，大致分为三个部分：爬取数据建立索引 ...
计算机基础导论学习总结下
第六单元如何拥有无穷力量本单元解决搜索引擎对给定查询只返回最佳页面的方法。这实际上是pagerank算法的思...
杂谈_碎片整合
数字媒体技术学习之路：基础学科:高数、线性代数、数据结构、计算机网络技术、数据库导论、数字媒体技术导论通识人文...
1月19日 ITT培训基础导论
1月19日韩颖老师主讲的《ITT培训基础导论》，课程学习完成之后总结如下：（一）我的学习内容总结： 1、清晰了 ...
《百万讲师赋能训练营》1.19课程总结
今天晚上是由韩颖老师主讲的《ITT培训基础导论》，课程学习完成之后总结如下：一、学到的内容总结：（一）清晰了ITT...
计算机导论基础
1.dataint string bool，etc2.operation+-*/，etc3.commentass...
ITT培训基础导论学习总结
一、清晰认识 1.通过基础导论学习对百万讲师赋能训练营有一个清晰的认识。 2.对it t系统构架有了初步的了解。 ...
（三）计算机学习系统
一、《计算机科学计算导论》阅读总结 1、网址：WWW.eingcmg.org/Lecture/toc.html/ ...
关于指针
《POINTERS ON C》一书总结指针以下方面:指针基础警告总结编程提示指针基础的总结计算机内存中的每一个...

计算机基础导论学习总结中

第四单元

响应查询

第五单元

程序怎样运行

习题集

相关文章