Generating Random Text with Bigrams

Author: 青椒rose炒饭 | Published 2019-06-27 14:56

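Producing bigrams (word pairs, that is, combinations of two consecutive words): splitting a token sequence with ngrams(tokens, 2) from the nltk package (or equivalently nltk.bigrams(tokens)) yields bigrams. My current understanding is that n-grams keep together words that tokenization would otherwise split apart. For example, "Kunming University of Science and Technology" (a 6-gram) should stay together as one unit.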
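As a rough illustration of what this splitting yields, the sketch below builds n-grams with only the standard library. The helper name make_ngrams is made up for this example; with NLTK installed, nltk.bigrams(tokens) should produce the same pairs.

```python
def make_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple (hypothetical helper)."""
    # zip over n shifted views of the list: tokens[0:], tokens[1:], ...
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = "Kunming University of Science and Technology".split()

bigrams = make_ngrams(tokens, 2)    # adjacent word pairs
sixgrams = make_ngrams(tokens, 6)   # the whole name as a single 6-gram

print(bigrams)
print(sixgrams)
```

A six-word name gives five bigrams but only one 6-gram, which is why the full name only survives as a unit at n = 6.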

Generating random text

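Building on conditional probability: split all the words into bigrams, build a conditional frequency distribution from them, then repeatedly look up the word that most often follows the current word, and output 5 consecutive words as the "random" text. Because the most frequent successor is always chosen, the output is fully determined by the starting word.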
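To show what such a conditional frequency distribution holds, here is a standard-library sketch of the same procedure on a made-up sentence. The names build_table and most_likely_next are hypothetical, and ties are broken by first occurrence here, which may not match NLTK's behavior.

```python
from collections import Counter, defaultdict

def build_table(tokens):
    """Map each word to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for first, second in zip(tokens, tokens[1:]):
        table[first][second] += 1
    return table

def most_likely_next(table, word):
    """Return the most frequent successor of word, or None if it has none."""
    if word not in table:
        return None
    return table[word].most_common(1)[0][0]

tokens = "the cat sat on the mat and the cat slept".split()
table = build_table(tokens)

# Follow the most frequent successor for at most 5 steps
word, output = "the", []
for _ in range(5):
    word = most_likely_next(table, word)
    if word is None:
        break
    output.append(word)

print(" ".join(output))
```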

import nltk
from nltk.corpus import stopwords

# The string below serves as the training data, more or less
text = '' \
       'On May 25, the “Experience China” social practice named "Artificial Intelligence +X" or the Opening Ceremony of Social Practice and Cultural Experience Base for Overseas Students Supported by Chinese Government Scholarship, hosted by China Scholarship Council and undertaken by Kunming University of Science and Technology, was held in Kunming. KUST invited 100 overseas student representatives funded by Chinese government scholarship from countries including Thailand, Laos, Vietnam, Pakistan, Zambia, Madagascar to participate in the event which had received strong support from Yunnan Guorong Zhichuang Artificial Intelligence Industrial Park, Yunnan Langyi Network Technology Co., Ltd and Yunnan Nengtou Weishi Technology Co., Ltd.'\
'At the ceremony, the head of the School of International Education of KUST and the chairman of Guorong Zhichuang Artificial Intelligence Industrial Park jointly opened the ceremony of the social practice and cultural experience base for overseas students funded by Chinese Government Scholarship.'\
'At the ceremony, the overseas students experienced artificial intelligence, face recognition, AR and VR systems and were given details about the 115kV power grid modernization project in Vientiane, capital of Laos and the connection project between Northern Kachin State and 230KV State Grid in Myanmar. They also touched upon the information about the design, manufacture, transportation and service of the ancillary control system of PAKE hydropower station in Vietnam and how the industrial automation control systems have been applied in other countries. To break the ice, the overseas students played a game called "Drawing Something", through which they showed their fluent Chinese and deep understanding of the Chinese culture. This game has also helped them to further understand and appreciate the Chinese characters, objects and architecture from the perspective of Chinese aesthetics.'\
'This social practice truly impressed the overseas students with the remarkable achievements China has made in the development of high-tech enterprises and technological innovation. It also deepened the communication and contact between KUST and enterprises and offered more opportunities for overseas students to complete placements in enterprises and for schools and enterprises to cooperate. Omega, a Nigeria student from KUST, said that this is his fourth year in China and that after graduation, he wishes to continue to stay in China to do what he can for the Belt and Road Initiative.'\
'The "Experience China" social practice and cultural experience program is a series of activities organized by the China Scholarship Council since 2015 with the aim to enhance the Chinese government scholarship students’ knowledge and understanding about Chinese contemporary development and its culture while building a sense of identity. This event helped the overseas students in KUST further broaden their horizons by learning about the latest achievements of Chinese high-tech enterprises and their management methods and corporate culture. They also had a chance to admire China’s natural and human landscapes.'\
'By integrating the resources of this kind across the country, the "Social Practice and Cultural Experience Base for Overseas Students of Chinese Government Scholarship " is to select and establish a number of state-level educational platforms as social practice and cultural experience bases for overseas students in China. By attending various activities including visits, experiences, lectures, academic exchanges, cultural exhibitions and social practices organized by the bases, those students funded by the Chinese government scholarship are expected to learn about China\'s national conditions and cultures, promote their friendship with local people and make contributions to the national education and foreign affairs.'\
'Translated by:LUO Man, Faculty of Foreign Languages and Cultures '\
'Edited by: LI Junrong, Faculty of Foreign Languages and Cultures '\
'Source: School of International Education '\
'Issued by: Division of Overseas Cooperation (English)'\
'Edited by: KUST News Center' \
       ''

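# Get the English stop words (needs a one-time nltk.download('stopwords'))
stop_words = set(stopwords.words('english'))
# Tokenize the text into runs of word characters
tokens = nltk.tokenize.regexp_tokenize(text, r'\w+')
# Drop stop words; compare in lowercase because the NLTK list is lowercase
tokens = [word for word in tokens if word.lower() not in stop_words]
# Pair each word with the word that follows it (bigrams)
word_next = nltk.bigrams(tokens)
# Build the conditional frequency distribution: word -> counts of next words
cfd = nltk.ConditionalFreqDist(word_next)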

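def getNext(word):
    '''
    :param word: a word
    :return: the word that most often follows it, or None if it has no known successor
    '''
    # Check membership first: max() on an empty distribution raises ValueError
    if word not in cfd:
        return None
    return cfd[word].max()

word = input("Enter a word and the program will predict what follows it: ")
# Look up at most 5 following words
for i in range(5):
    nxt = getNext(word)
    if nxt is None:  # no known successor, so stop early
        break
    print(nxt, end=" ")
    word = nxt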
The result is an interesting piece of text.
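Since the loop above always takes the most frequent successor, the same starting word always gives the same words. One possible variation, which is not part of the original code, is to sample the next word in proportion to its count. The sketch below assumes a table mapping each word to a Counter of successors (NLTK's FreqDist is a Counter subclass, so cfd should fit this shape). The names sample_next and generate are hypothetical, and the toy counts are made up.

```python
import random
from collections import Counter

def sample_next(table, word, rng):
    """Pick a successor of word at random, weighted by how often it was seen."""
    if word not in table or not table[word]:
        return None
    successors = list(table[word].keys())
    weights = list(table[word].values())
    return rng.choices(successors, weights=weights, k=1)[0]

def generate(table, start, length=5, seed=None):
    """Generate up to `length` words after `start` by weighted sampling."""
    rng = random.Random(seed)  # a fixed seed makes the run repeatable
    word, output = start, []
    for _ in range(length):
        word = sample_next(table, word, rng)
        if word is None:  # dead end: the word has no recorded successor
            break
        output.append(word)
    return output

# Toy table with made-up counts, standing in for cfd
table = {
    "social": Counter({"practice": 3, "Practice": 1}),
    "practice": Counter({"cultural": 2, "truly": 1}),
    "cultural": Counter({"experience": 2}),
}

print(" ".join(generate(table, "social", seed=0)))
```

Leaving seed as None gives different text on each run, which is closer to what "random text" suggests.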

Some operations on a ConditionalFreqDist object:

[Figure: cfd operations]
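The figure itself is not available here, so the sketch below lists a few ConditionalFreqDist and FreqDist calls on a made-up sentence. These are not necessarily the operations the original figure showed.

```python
import nltk

tokens = "the cat sat on the mat and the cat slept".split()
cfd = nltk.ConditionalFreqDist(nltk.bigrams(tokens))

conditions = sorted(cfd.conditions())   # every word that has at least one successor
after_the = cfd["the"]                  # FreqDist of the words seen after "the"
count = cfd["the"]["cat"]               # how many times "cat" followed "the"
top = cfd["the"].max()                  # most frequent successor of "the"
ranked = cfd["the"].most_common()       # successors with counts, most frequent first
share = cfd["the"].freq("cat")          # relative frequency of "cat" after "the"
total = cfd.N()                         # total number of bigrams counted

print(conditions)
print(count, top, ranked, share, total)
```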

Article link: https://www.haomeiwen.com/subject/vnulcctx.html