A Baseline Sentiment Analysis System Built on Yelp Review Data

Author: 璆_ca09 | Published: 2020-07-15 00:31

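    Goal:

    Build a simple sentiment analysis system that supports Aspect-Based Sentiment Analysis (ABSA): besides the overall sentiment of a review, it should also extract the user's opinion on each aspect of the product. A single review can cover several aspects. For example, from "I'm fairly happy with this product's battery, but it's too expensive!" we can derive 'battery': positive, 'price': negative.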

    Output:

    Business Name: XXXXX

    Overall Rating: X

    Detailed Rating:

    aspect1: {rating: XXX, pos: [XXX], neg: [XXX]}

    aspect2: {rating: XXX, pos: [XXX], neg: [XXX]}

    aspect3: {rating: XXX, pos: [XXX], neg: [XXX]}

    aspect4: {rating: XXX, pos: [XXX], neg: [XXX]}

    For a concrete example, see the sample output at the end of this post.

    Process:

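    1. Preparation: dataset download / Python packages

    Dataset: https://www.yelp.com/dataset/download

    yelp_academic_dataset_business.json: describes a business, including its location, attributes, postal code, and so on

    yelp_academic_dataset_review.json: one user's review of one business, including the review text and the stars.

    Python: nltk (the NLTK corpora are best downloaded through a VPN or proxy if the download server is hard to reach)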

    2. Design: project structure

    --data/

                /yelp_academic_dataset_business.json

                /yelp_academic_dataset_review.json

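    --main.py: the main pipeline of the project

    --model_trainning.py: trains and produces the classifier that judges review polarity

    --sentence.py: a class module that wraps and normalizes sentences (LEMMATIZER, ASPECT_EXTRACTOR, WORD_TOKENIZER, POS_tag)

    --model.pickle: the serialized file of the trained classifier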

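    3. Tricks: implementation notes and a few shortcuts

    1. Memory management

    The review file is large (around 9 GB), so the way it is read into memory needs care; otherwise the process will keep running out of memory.

    In particular:   1. Appending rows with DataFrame.append(dataframe) inside a loop is extremely slow, because each call copies the whole frame (the method was also removed in pandas 2.0)

                2. Casting between np.array and DataFrame consumes a lot of memory; do not do it while loading the data

                3. Python's garbage collector does not kick in just because objects are large. To keep memory usage from growing unchecked, my strategy is to call gc.collect() manually every 50,000 lines to clean up the string objects created inside the loop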
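
The loading strategy above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual loader: the function name `load_reviews`, the set of fields it keeps, and the `gc_every` parameter are assumptions made for the example. It reads the JSON-lines file one line at a time, keeps only the needed fields in a plain list, and calls gc.collect() at a fixed interval.

```python
import gc
import json


def load_reviews(path, fields=("business_id", "stars", "text"), gc_every=50000):
    """Read a JSON-lines file line by line, keeping only selected fields."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Keep only the fields we need so each row stays small.
            rows.append({k: record.get(k) for k in fields})
            if line_no % gc_every == 0:
                gc.collect()  # periodic manual collection, as described above
    return rows
```

If a DataFrame is needed afterwards, the list can be converted in one call with pandas.DataFrame(rows), which avoids appending row by row.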

    2. Extracting the core entities: aspects

    Extraction is rule-based: only the nouns and plural nouns in a review are extracted, and these become the aspect labels
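
A minimal sketch of this noun-only rule, assuming the reviews have already been POS-tagged with Penn Treebank tags (for instance by nltk.pos_tag). The function name `extract_aspects` and the `top_n` cutoff are hypothetical; the stemming or lemmatizing step that would merge forms such as "taco" and "tacos" is left out here.

```python
from collections import Counter


def extract_aspects(tagged_reviews, top_n=4):
    """Return the top_n most frequent nouns across POS-tagged reviews.

    tagged_reviews: iterable of [(word, tag), ...] lists, one per review.
    """
    counts = Counter()
    for tagged in tagged_reviews:
        for word, tag in tagged:
            if tag in ("NN", "NNS"):  # singular and plural common nouns
                counts[word.lower()] += 1
    return [word for word, _ in counts.most_common(top_n)]


# Usage with hand-tagged input (tags written by hand for illustration).
tagged = [
    [("The", "DT"), ("food", "NN"), ("was", "VBD"), ("great", "JJ")],
    [("Cheap", "JJ"), ("tacos", "NNS"), ("and", "CC"), ("good", "JJ"), ("food", "NN")],
]
print(extract_aspects(tagged, top_n=2))  # ['food', 'tacos']
```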

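    3. Extracting the sub-text for each aspect from a review

    Most reviews are long and mention several aspects, so the aspects have to be separated and each one judged for sentiment polarity on its own. The sub-text for an aspect is extracted with a regular expression: it is the span between adjacent delimiters that contains the aspect word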
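
One possible way to sketch this extraction is shown below. It assumes the delimiters are common punctuation marks; the exact character set, the pattern, and the function name `aspect_segments` are assumptions for the example, not the author's actual regular expression. The text is cut into punctuation-delimited segments, and only the segments that mention the aspect are kept.

```python
import re

# Assumed delimiter set; the source does not list the exact characters.
SEGMENT_RE = re.compile(r"[^.,!?;]+[.,!?;]?")


def aspect_segments(text, aspect):
    """Return the punctuation-delimited segments of text that mention aspect."""
    # \b + aspect matches the aspect at the start of a word, so the stem
    # "taco" also matches "tacos".
    aspect_re = re.compile(r"\b" + re.escape(aspect.lower()))
    segments = (s.strip() for s in SEGMENT_RE.findall(text.lower()))
    return [s for s in segments if aspect_re.search(s)]
```

Anchoring the match at a word boundary is a deliberate choice in this sketch: without it, an aspect such as "place" would also match inside "fireplace".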

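    4. Choosing the model

    NLTK's naive Bayes classifier serves as the baseline, for three reasons: 1. its time and memory cost is low; 2. since the project's modeling does not yet use word or sentence vectors, a count-based Bayesian method is an intuitively reasonable fit; 3. it worked acceptably in experiments.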
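
A minimal sketch of training and serializing such a classifier with nltk.NaiveBayesClassifier. The bag-of-words feature function, the whitespace tokenization, and the tiny training set are made up for illustration; how the author builds features and labels the training reviews is not stated in this post.

```python
import pickle

import nltk


def features(text):
    """Bag-of-words features: every lowercase token maps to True."""
    return {word: True for word in text.lower().split()}


# Tiny made-up training set, for illustration only.
train = [
    ("great food", "pos"),
    ("delicious tacos", "pos"),
    ("friendly service", "pos"),
    ("horrible service", "neg"),
    ("terrible food", "neg"),
    ("rude staff", "neg"),
]
classifier = nltk.NaiveBayesClassifier.train(
    [(features(text), label) for text, label in train]
)

# Serialize and restore the model, in the spirit of model.pickle.
restored = pickle.loads(pickle.dumps(classifier))
label = restored.classify(features("great tacos"))
```

In a real run the pickled bytes would be written to a file once after training, so that the main pipeline can load the classifier without retraining it.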

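    4. To be improved / not yet implemented

    1. Text cleaning: 1. a lot of non-review text is still left in, such as links and emoji; 2. spelling correction and normalization of words

    2. Aspect extraction: 1. merging entities that refer to the same fuzzy concept; 2. capturing gerund phrases

    3. Extracting the short text for each aspect: the rule should ideally take the longest sub-text between two adjacent aspects (as split by common punctuation)

    4. Accounting for content diversity within the positive and negative review sets of the same aspect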
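
As a starting point for the text-cleaning item, link and emoji removal could look roughly like the sketch below. It is not part of the original project: the name `clean_text` is hypothetical, and the emoji ranges are a rough approximation that will miss some characters.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Rough emoji ranges (assumption): pictographs/emoticons and misc symbols.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def clean_text(text):
    """Strip URLs and common emoji, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = EMOJI_RE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()
```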

    Sample output

    {

        'id': '--1UhMGODdWsrMastO9DZw',

        'content': {

            'biz_name': 'The Spicy Amigos',

            'stars': '4.0',

            'summary': {

                'taco': {

                    'stars': 3.8747578276357726,

                    'aspect_pos_review': [

                        '. we will definitely be coming back to get our taco fix!',

                        ', tacos,',

                        '. the shrimp especiale taco is unreal.',

                        ', tried the tacos,',

                        'we were in the mood for tacos,',

                        'i have been in search of good grilled steak tacos here in calgary for 2 years.'],

                    'aspect_neg_review': [

                        ". the menus not extremely extensive so don't expect pages of choice but they have a great variety of tacos."]

                },

                'food': {

                    'stars': 4.249734391600525,

                    'aspect_pos_review': [

                        'if you are looking for authentic mexican street food,',

                        '.  very fresh and tasty mexican food.',

                        ', authentic mexican street food that gives appropriate portions relative to the prices.',

                        'great food,',

                        '! the decor is amazing and the food is to die for.',

                        ', but the food is still delicious.'],

                    'aspect_neg_review': [

                        ". the sorry excuses for mexican food i've found in canada so far."]

                },

                'price': {

                    'stars': 4.555049438951228,

                    'aspect_pos_review': [

                        '.  a little on the pricey side but worth it.',

                        ', authentic mexican street food that gives appropriate portions relative to the prices.',

                        ', prices are great for real cooked food.',

                        ', service and price.',

                        ', authentic mexican food for a great price.'],

                    'aspect_neg_review': [

                        '.  honestly i could have taken 1/2 home for later which is why i say good value for the price.']

                },

                'lunch': {

                    'stars': 4.374453193350831,

                    'aspect_pos_review': [

                        'fantastic spot for lunch with great value for your money.',

                        '! this spot opens at 11 and i can visualize it backed out the door for lunch.',

                        ", it might be a decent lunch spot but nothing spectacular and definitely wouldn't go again for dinner."],

                    'aspect_neg_review': [

                    ]

                }

            }

        }

    }

    {

        'id': '--6MefnULPED_I942VcFNA',

        'content': {

            'biz_name': "John's Chinese BBQ Restaurant",

            'stars': '3.0',

            'summary': {

                'pork': {

                    'stars': 3.4998970618511223,

                    'aspect_pos_review': [

                        'the bbq pork is very juicy and i only come here for that.',

                        ', is their bbq pork.',

                        '. the signature roasted pork was juicy and moist with the sweet tangy taste.',

                        ', minced pork meat pie,',

                        'if you want a quick fix for a scrumptious char siu or chinese pork bbq,',

                        ', the best roast pork and bbq pork on highway 7.'],

                    'aspect_neg_review': [

                        'service with boss lady is horrible but the bbq pork is really tasty!']

                },

                'bbq': {

                    'stars': 3.3527439562378683,

                    'aspect_pos_review': [

                        'the bbq pork is very juicy and i only come here for that.',

                        '.  the decor itself is your usual chinese bbq house/restaurant and is nothing to go crazy over,',

                        'if you want a quick fix for a scrumptious char siu or chinese pork bbq,',

                        ', the best roast pork and bbq pork on highway 7.',

                        ". they're well known for the bbq pork,",

                        '.  the honey deep fried oysters and bbq pork were excellent too.'],

                    'aspect_neg_review': [

                        'service with boss lady is horrible but the bbq pork is really tasty!',

                        'i walked by the restaurant more than 5 years ago when i witnessed from the window one of the employees drop a bbq chicken wing on the floor,']

                },

                'place': {

                    'stars': 2.9998965552911967,

                    'aspect_pos_review': [

                        '. the place was a little outdated,',

                        ', this is theeee place.',

                        '.this place has,',

                        'this place is a restaurant and a chinese bbq restaurant.',

                        '.  for the prices this place charges,',

                        '. compare to other places that sell bbq dishes,'],

                    'aspect_neg_review': [

                    ]

                },

                'chines': {

                    'stars': 2.9992501874531365,

                    'aspect_pos_review': [

                        'if you want a quick fix for a scrumptious char siu or chinese pork bbq,',

                        'this place is a restaurant and a chinese bbq restaurant.',

                        ', the chinese spare rib with onion,',

                        '. beef with gai lan (chinese broccoli?'],

                    'aspect_neg_review': [

                    ]

                }

            }

        }

    }

    {

        'id': '--7zmmkVg-IMGaXbuVd0SQ',

        'content': {

            'biz_name': 'Primal Brewery',

            'stars': '4.0',

            'summary': {

                'beer': {

                    'stars': 3.9826899536214895,

                    'aspect_pos_review': [

                        '. primal had great beer,',

                        ', fantastic beer (try the grim creeper!',

                        "if you're a harry potter fan then you will enjoy their variety of butter beers.",

                        ". \n\ni wasn't in the mood for beer,",

                        'the hubby and i stopped in for a quick beer before going home.',

                        ', the beer.'],

                    'aspect_neg_review': [

                        "the beer is horrible and dave doesn't know the difference between a cat and a dog.",

                        '.  i sat at the bar and asked about the type of beer i prefer.']

                },

                'breweri': {

                    'stars': 3.817834742296155,

                    'aspect_pos_review': [

                        ', especially if you combine it with a trip to some of the breweries in cornelius.',

                        ". it's also conveniently located not far up the same road from crafty beer guys where you can find some of primal's and other local breweries on tap.",

                        'on a recent tour of lake norman area breweries,',

                        '. \n\nalso i love it when breweries have "activities".'],

                    'aspect_neg_review': [

                    ]

                },

                'food': {

                    'stars': 3.6664629742792063,

                    'aspect_pos_review': [

                        '.\n\nthey have a food truck on site,',

                        '.  there are often a few food trucks present,',

                        ", out front there's a few umbrella-ed tables and a slew of full-sun seats over by the food truck/corn hole arena if you're so inclined.",

                        "! the fried pickle chips at the food truck outside were some of the best i've ever had (i've had a lot).",

                        '. there was no food truck when we were there but it was raining.',

                        ', food trucks,'],

                    'aspect_neg_review': [

                    ]

                },

                'place': {

                    'stars': 3.9998571479590015,

                    'aspect_pos_review': [

                        'this is exactly the type of place that huntersville has always needed.',

                        '. a little popcorn as i sit by the fireplace drinking my beer!',

                        '. relaxing atmosphere with a fire place!',

                        ', as they have a delightful fireplace and sitting area.',

                        ', a firepit and indoor fireplace,',

                        'primal brewery is a quaint and small place up in huntersville/cornelius.'],

                    'aspect_neg_review': [

                        "i'm a tad reluctant to write a review as i run the risk of spoiling one of the things i love about the place,",

                        "i've been meaning to write a review for this place for a while but now seems fitting."]

                }

            }

        }

    }

    {

        'id': '--9QQLMTbFzLJ_oT-ON3Xw',

        'content': {

            'biz_name': 'Great Clips',

            'stars': '3.0',

            'summary': {

                'hair': {

                    'stars': 2.8459349280824555,

                    'aspect_pos_review': [

                        '. affordable haircuts and pleasant hair cutters.',

                        ', i was greeted by the hair stylist and she asked,'],

                    'aspect_neg_review': [

                        '. would it be ok for me to bring her in to get her hair washed and cut?',

                        ". ask for a hair cut i got seated at stephanie's corals chair.",

                        ". ask for a hair cut i got seated at stephanie's corals chair."]

                },

                'cut': {

                    'stars': 2.7998133457769483,

                    'aspect_pos_review': [

                        '. affordable haircuts and pleasant hair cutters.'],

                    'aspect_neg_review': [

                        ". she told me that they do not cut extensions and that's great clips policy .",

                        'haircut was good and stylist nice but the manager was unpleasant from the time i stepped in until i left.',

                        '. would it be ok for me to bring her in to get her hair washed and cut?',

                        ", blah blah) she seemed uninterested on my answers (it's fine in a boring person) but she would get upset when i keep asking to cut more on the top,",

                        ". ask for a hair cut i got seated at stephanie's corals chair.",

                        ". ask for a hair cut i got seated at stephanie's corals chair."]

                },

                'time': {

                    'stars': 2.3330741028774584,

                    'aspect_pos_review': [

                        'tried this place one more time.'],

                    'aspect_neg_review': [

                        'haircut was good and stylist nice but the manager was unpleasant from the time i stepped in until i left.',

                        'service was quick even though it said waiting time to be 30 mins,',

                        '. she slaps some shampoo on the top of my head several times in the same spot and barely moved it around with her finger tips as if she was grossed out by me.',

                        '. she slaps some shampoo on the top of my head several times in the same spot and barely moved it around with her finger tips as if she was grossed out by me.']

                },

                'didnt': {

                    'stars': 0.0,

                    'aspect_pos_review': [

                    ],

                    'aspect_neg_review': [

                    ]

                }

            }

        }

    }

    Article link: https://www.haomeiwen.com/subject/mhjxhktx.html