How to Use TextGrocery under Python 3

Author: 郭彦超 | Published: 2019-10-11 15:43

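    TextGrocery is an efficient short-text classification tool. We plan to use it later to train text rules that automatically tag the content of works. The project's author no longer maintains it, and the latest release supports only Python 2, so the following changes are needed to run it under Python 3.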

    First, install TextGrocery with pip:

    pip install tgrocery
    # The project is no longer maintained; the latest version is 0.14
    

    Module-not-found errors

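    • No module named 'converter'
      Do not install a third-party converter package; TextGrocery ships its own converter module in its install directory. Python 3 no longer supports implicit relative imports, so edit the package's __init__.py and put a "." in front of converter.
    # 1. Change /home/bigdata/anaconda3/lib/python3.7/site-packages/tgrocery/__init__.py to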
    from .classifier import *
    from .converter import *
    
    # 2. In ./site-packages/tgrocery/classifier.py, add the "." as well
    from .converter import GroceryTextConverter
    from .learner import *
    from .base import *
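
The reason for the leading dot can be sketched with a throwaway package. The names `tg_demo_pkg` and `tg_demo_helper` below are hypothetical and exist only for this illustration; they are not part of TextGrocery. The sketch shows that a Python 2 style sibling import fails under Python 3, while the dotted form works.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package (hypothetical names) in a temporary directory.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "tg_demo_pkg")
os.makedirs(pkg)

open(os.path.join(pkg, "__init__.py"), "w").close()

with open(os.path.join(pkg, "tg_demo_helper.py"), "w") as f:
    f.write("VALUE = 42\n")

# Python 2 style: implicit relative import of a sibling module.
with open(os.path.join(pkg, "implicit.py"), "w") as f:
    f.write("from tg_demo_helper import VALUE\n")

# Python 3 style: explicit relative import with a leading dot.
with open(os.path.join(pkg, "explicit.py"), "w") as f:
    f.write("from .tg_demo_helper import VALUE\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

try:
    importlib.import_module("tg_demo_pkg.implicit")
    implicit_ok = True
except ImportError:
    # Python 3 treats the name as a top-level module and cannot find it.
    implicit_ok = False

explicit_value = importlib.import_module("tg_demo_pkg.explicit").VALUE
print(implicit_ok, explicit_value)
```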
    
    
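    • No module named 'cPickle'
      Python 2's cPickle module no longer exists in Python 3; its C implementation was merged into the standard pickle module. The fix used here installs pickle5 and then edits the converter file (importing the standard library's pickle as cPickle works as well).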
    pip install pickle5
    
    vi ./site-packages/tgrocery/converter.py
    Change import cPickle to import pickle5 as cPickle
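
As an alternative to installing an extra package, a common compatibility pattern is to fall back to the standard library's `pickle` when `cPickle` is missing. The following is a minimal sketch of that pattern, not the change the article applies; the sample dictionary is made up for the round trip.

```python
import io

try:
    import cPickle  # Python 2 only
except ImportError:
    import pickle as cPickle  # Python 3: the C accelerator is used automatically

# Round-trip a small object through an in-memory binary buffer.
data = {"education": 0, "sports": 1}
buf = io.BytesIO()
cPickle.dump(data, buf, -1)  # -1 selects the highest available protocol
buf.seek(0)
restored = cPickle.load(buf)
print(restored)
```

Note that under Python 3, files passed to `dump` and `load` must be opened in binary mode (`'wb'` and `'rb'`).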
    
    
    • No module named 'base'
    # In site-packages/tgrocery/converter.py, add a "." in front of base
    from .base import *
    
    

    print became a function in Python 3 (parentheses are required)

    # Edit site-packages/tgrocery/base.py
    print(self.draw_table(
        zip(
            ['%.2f%%' % (s * 100) for s in self.accuracy_labels.values()],
            ['%.2f%%' % (s * 100) for s in self.recall_labels.values()]
        ),
        self.accuracy_labels.keys(),
        ('accuracy', 'recall')
    ))
    
    

    NameError: name 'unicode' is not defined

    Python 3 merged unicode into str, so replace every occurrence of unicode in site-packages/tgrocery/classifier.py with str.
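
Where editing every occurrence is inconvenient, a small alias is another way to handle the missing name. This is a sketch of a general porting idiom, not something the article itself does; `text_type` is a hypothetical name chosen for the example.

```python
# Resolve a single text type that works on both Python 2 and Python 3.
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3: str is already Unicode text

sample = text_type("新托福")
print(type(sample).__name__, len(sample))
```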

    TypeError: The argument should be plain text

    Comment out the following statements:

    # vi site-packages/tgrocery/classifier.py
    if not isinstance(text, str):
        raise TypeError('The argument should be plain text')
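
One plausible cause of this error, assuming the surrounding library code encodes text to UTF-8 before the check (an assumption about the source, not something the article confirms), is that `encode` returns `bytes` in Python 3, and `bytes` is not `str`. The sketch below shows that behaviour, plus a hypothetical `check_plain_text` helper that keeps the validation instead of removing it.

```python
def check_plain_text(text):
    """Hypothetical helper: accept str or UTF-8 bytes, reject anything else."""
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    if not isinstance(text, str):
        raise TypeError("The argument should be plain text")
    return text


encoded = "考生必读".encode("utf-8")

# In Python 3 the encoded value is bytes, so a bare str check rejects it.
print(isinstance(encoded, str))
print(check_plain_text(encoded))
```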
    
    

    Point the jieba.cache directory at the current install directory

    # vi site-packages/jieba/__init__.py
    self.tmp_dir = "/home/bigdata/anaconda3/lib/python3.7/site-packages/jieba/"
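
The same effect may be achievable without editing jieba's source: jieba's README describes `tmp_dir` and `cache_file` attributes on its default tokenizer, `jieba.dt`. The sketch below assumes that attribute and uses a stand-in object so it runs without jieba installed; `use_cache_dir` and the directory name are hypothetical.

```python
import os
import tempfile
import types


def use_cache_dir(tokenizer, directory):
    """Point a tokenizer's cache at `directory`, creating it if needed."""
    os.makedirs(directory, exist_ok=True)
    tokenizer.tmp_dir = directory
    return tokenizer


# Stand-in for jieba.dt so the sketch is self-contained.
fake_tokenizer = types.SimpleNamespace(tmp_dir=None)
cache_dir = os.path.join(tempfile.gettempdir(), "jieba_cache_demo")
use_cache_dir(fake_tokenizer, cache_dir)
print(fake_tokenizer.tmp_dir)
```

With jieba installed, the call would be `use_cache_dir(jieba.dt, cache_dir)`, made before the first tokenization so that the setting is in place when the dictionary is loaded.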
    
    

    'dict' object has no attribute 'iteritems'

    In site-packages/tgrocery/converter.py, replace every iteritems with items.
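
For reference, a short sketch of how `items` behaves in Python 3. The dictionaries are made up for the example and are not taken from TextGrocery.

```python
label_to_id = {"education": 0, "sports": 1}

# items() replaces iteritems(); it returns a view, not a list copy.
id_to_label = {idx: label for label, idx in label_to_id.items()}

items_view = label_to_id.items()
label_to_id["tech"] = 2  # the view reflects later changes to the dict

print(id_to_label)
print(len(items_view))
```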

    That's it. The official example runs as follows:

    >>> from tgrocery import Grocery
    >>> grocery = Grocery('sample')
    >>> train_src = [
    ...     ('education', '名师指导托福语法技巧:名词的复数形式'),
    ...     ('education', '中国高考成绩海外认可 是“狼来了”吗?'),
    ...     ('sports', '图文:法网孟菲尔斯苦战进16强 孟菲尔斯怒吼'),
    ...     ('sports', '四川丹棱举行全国长距登山挑战赛 近万人参与')
    ... ]
    >>> grocery.train(train_src)
    Building prefix dict from the default dictionary ...
    Dumping model to file cache /home/bigdata/anaconda3/lib/python3.7/site-packages/jieba/jieba.cache
    Loading model cost 0.595 seconds.
    Prefix dict has been built succesfully.
    *
    optimization finished, #iter = 3
    Objective value = -1.092381
    nSV = 8
    <tgrocery.Grocery object at 0x7ffedbea5290>
    >>> grocery.predict('考生必读:新托福写作考试评分标准')
    <tgrocery.base.GroceryPredictResult object at 0x7ffed68e9610>
    >>> grocery.predict('考生必读:新托福写作考试评分标准').accuracy_labels
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'GroceryPredictResult' object has no attribute 'accuracy_labels'
    >>> grocery.predict('考生必读:新托福写作考试评分标准').dec_values
    {'education': 0.03393735426359166, 'sports': -0.033937354263591644}
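
The `dec_values` dictionary from the session above can be turned into a single label. This sketch assumes that a larger decision value means a stronger match, which the article does not state explicitly; the numbers are copied from the output above.

```python
# Decision values copied from the session output.
dec_values = {
    "education": 0.03393735426359166,
    "sports": -0.033937354263591644,
}

# Pick the label with the largest decision value.
best_label = max(dec_values, key=dec_values.get)

# Rank all labels from strongest to weakest.
ranked = sorted(dec_values, key=dec_values.get, reverse=True)

print(best_label, ranked)
```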
    
    

    Automatic content tagging demo

    Rule-based tagging system

          Article link: https://www.haomeiwen.com/subject/euotmctx.html