Things to watch out for when parsing JSON in Python

Author: leyu | Published 2017-03-02 11:59 | 396 views

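    Today I had to parse a very long JSON string and ran into all kinds of problems along the way, so I have summarized the pitfalls here.
    First, the string. The original was very long; a trimmed-down version looks like this: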

    >>> s="{'product': u'\\u62c9\\u52fe\\u7f51', 'downtime': 3.128, 'monitors': [{'use': 100, 'monitorurl': u'http://oss.lagou.com','monitorweight': 10L,'monitorname': u'\\u804c\\u4f4d\\u641c\\u7d22'}]}"
    

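    This was evidently not produced by a proper json.dumps() call but by str(). The original data structure was built from dicts, lists, strings and long integers, and it also contained Chinese text as Unicode strings. That is: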

    >>> import json
    >>> origin={"product": u"\\u62c9\\u52fe\\u7f51", "downtime": 3.128,
    ...     "monitors": [{"use": 100, "monitorurl": u"http://oss.lagou.com","monitorweight": 10L,
    ...     "monitorname": u"\\u804c\\u4f4d\\u641c\\u7d22"}]}
    >>> json.dumps(origin)
    '{"product": "\\\\u62c9\\\\u52fe\\\\u7f51",
    "monitors": [{"use": 100, "monitorweight": 10,
    "monitorname": "\\\\u804c\\\\u4f4d\\\\u641c\\\\u7d22",
    "monitorurl": "http://oss.lagou.com"}], "downtime": 3.1280000000000001}'
    >>> str(origin)
    "{'product': u'\\\\u62c9\\\\u52fe\\\\u7f51', 
    'monitors': [{'use': 100, 'monitorweight': 10L, 
    'monitorname': u'\\\\u804c\\\\u4f4d\\\\u641c\\\\u7d22', 
    'monitorurl': u'http://oss.lagou.com'}], 'downtime': 3.128}"
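For comparison, here is a minimal sketch of the same contrast in Python 3. The transcripts in this post are Python 2.6, and Python 3 has no u prefix or L suffix, so the dict below is a simplified, hypothetical stand-in for origin. Output from json.dumps() goes straight back through json.loads(); output from str() does not.

```python
import json

# Hypothetical sample data, loosely modeled on the structure above.
origin = {
    "product": "\u62c9\u52fe\u7f51",
    "downtime": 3.128,
    "monitors": [{"use": 100, "monitorurl": "http://oss.lagou.com", "monitorweight": 10}],
}

# json.dumps() emits valid JSON, so json.loads() rebuilds an equal object.
restored = json.loads(json.dumps(origin))
print(restored == origin)  # True

# str() emits a Python repr with single quotes, which json.loads() rejects.
str_rejected = False
try:
    json.loads(str(origin))
except json.JSONDecodeError as exc:
    str_rejected = True
    print("str() output rejected:", exc.msg)
```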
    

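    A string produced by json.dumps() can be turned back into an object directly with json.loads(). A string produced by str(), on the other hand, runs into all sorts of problems. Here is a summary of the ones I hit: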

    1. Keys and string values must be enclosed in double quotes; single quotes are not allowed. Single quotes raise: Expecting property name: line 1 column 1 (char 1)
    >>> s1="{'a':'a'}";s2='{"a":"a"}'
    >>> json.loads(s1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
        return _default_decoder.decode(s)
      File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/lib64/python2.6/json/decoder.py", line 336, in raw_decode
        obj, end = self._scanner.iterscan(s, **kw).next()
      File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
        rval, next_pos = action(m, context)
      File "/usr/lib64/python2.6/json/decoder.py", line 171, in JSONObject
        raise ValueError(errmsg("Expecting property name", s, end))
    ValueError: Expecting property name: line 1 column 1 (char 1)
    >>> json.loads(s2)
    {u'a': u'a'}
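Here is the same check as a small Python 3 sketch. The decoder raises json.JSONDecodeError (a subclass of ValueError), and its pos attribute points at the first single quote. The exact message text differs between Python versions, so the sketch relies only on the position.

```python
import json

s1 = "{'a':'a'}"   # single quotes: not valid JSON
s2 = '{"a":"a"}'   # double quotes: valid JSON

print(json.loads(s2))  # {'a': 'a'}

error_pos = None
try:
    json.loads(s1)
except json.JSONDecodeError as exc:
    # exc.pos is the index of the offending character (the first single quote).
    error_pos = exc.pos
    print(exc.msg, "at char", exc.pos)
```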
    
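    2. After str(), keys and string values end up in single quotes whether the original used single or double quotes, and the REPL shows the whole string wrapped in double quotes. So the single quotes have to be replaced with double quotes. This blanket replacement breaks as soon as a value itself contains a quote character.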
    >>> s={"a":"a"};str(s)
    "{'a': 'a'}"
    >>> s={'a':'a'};str(s)
    "{'a': 'a'}"
    
    >>> s={'a':'a'};s1=str(s)
    >>> s1
    "{'a': 'a'}"
    >>> s2=s1.replace('\'','\"')
    >>> s2
    '{"a": "a"}'
    >>> json.loads(s2)
    {u'a': u'a'}
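The blanket replace() above only works while no value contains a quote character. The Python 3 sketch below shows why, using the made-up value "O'Brien". Python's repr switches to double quotes for such a string, so the replacement produces broken JSON. Parsing the text as a Python literal with the standard library's ast.literal_eval() avoids the quote juggling. This is a swapped-in technique, not the approach used in this post.

```python
import ast
import json

data = {"name": "O'Brien"}   # hypothetical value containing an apostrophe
text = str(data)             # {'name': "O'Brien"}

# Naive approach: turn every single quote into a double quote.
naive = text.replace("'", '"')   # {"name": "O"Brien"} is no longer valid JSON
naive_failed = False
try:
    json.loads(naive)
except json.JSONDecodeError:
    naive_failed = True

# Alternative: parse the repr as a Python literal (no code is executed),
# then re-serialize with json.dumps() if a JSON string is needed.
parsed = ast.literal_eval(text)
as_json = json.dumps(parsed)
print(parsed == data, as_json)
```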
    
    
    3. Unicode strings still carry the u prefix after str(), and it has to be removed.
    >>> s={'a':u'拉勾网'}
    >>> s
    {'a': u'\u62c9\u52fe\u7f51'}
    >>> s1=str(s)
    >>> s1
    "{'a': u'\\u62c9\\u52fe\\u7f51'}"
    >>> json.loads(s1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
        return _default_decoder.decode(s)
      File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/lib64/python2.6/json/decoder.py", line 336, in raw_decode
        obj, end = self._scanner.iterscan(s, **kw).next()
      File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
        rval, next_pos = action(m, context)
      File "/usr/lib64/python2.6/json/decoder.py", line 171, in JSONObject
        raise ValueError(errmsg("Expecting property name", s, end))
    ValueError: Expecting property name: line 1 column 1 (char 1)
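
    Note that this particular error is raised at char 1, that is, by the single quote, so it is really point 1 again. Even after the quotes are replaced, the u in front of each opening quote is still not valid JSON.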
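ast.literal_eval() from the standard library also accepts the u prefix, because Python 3.3 and later allow u'...' literals again. It also decodes the \u escapes, so the Python 2 str() output shown above can be parsed directly. Here is a Python 3 sketch, with that output pasted in as a raw string:

```python
import ast
import json

# The Python 2 str() output from above, as literal text (the raw string keeps "\u" as-is).
py2_text = r"{'a': u'\u62c9\u52fe\u7f51'}"

parsed = ast.literal_eval(py2_text)  # the u prefix is accepted and the escapes are decoded
print(json.dumps(parsed))            # {"a": "\u62c9\u52fe\u7f51"}
```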
    

    4. Long integers still carry the L suffix after str(), which also has to be handled.

    >>> s={"a":10L}
    >>> s1=str(s)
    >>> s1
    "{'a': 10L}"
    >>> # Use double quotes by hand so that only the L is left as a problem
    >>> s1='{"a":10L}'
    >>> json.loads(s1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
        return _default_decoder.decode(s)
      File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/lib64/python2.6/json/decoder.py", line 336, in raw_decode
        obj, end = self._scanner.iterscan(s, **kw).next()
      File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
        rval, next_pos = action(m, context)
      File "/usr/lib64/python2.6/json/decoder.py", line 193, in JSONObject
        raise ValueError(errmsg("Expecting , delimiter", s, end - 1))
    ValueError: Expecting , delimiter: line 1 column 7 (char 7)
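Python 3 dropped the L suffix, so ast.literal_eval() there rejects 10L, and the suffix has to be stripped first. A bare (\d+)L pattern would also rewrite text inside string values such as 'Model 3L'. The minimal sketch below assumes the input is Python 2 str() output. Its regex matches quoted string literals first and copies them through unchanged, so only numbers outside strings lose their L. The helper name strip_long_suffix is hypothetical.

```python
import ast
import re

# Alternative 1 matches a quoted string literal (single or double quotes,
# backslash escapes allowed); alternative 2 matches digits followed by L.
# A literal is matched starting at its opening quote, so digits inside it
# are consumed by alternative 1 and never reach alternative 2.
_TOKEN = re.compile(r'''("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')|\b(\d+)L\b''')


def strip_long_suffix(text):
    """Hypothetical helper: drop Python 2 'L' suffixes outside string literals."""
    def repl(match):
        if match.group(1) is not None:
            return match.group(1)  # a string literal: copy it unchanged
        return match.group(2)      # the digits, without the trailing L
    return _TOKEN.sub(repl, text)


fixed = strip_long_suffix("{'a': 10L, 'b': 'Model 3L'}")
print(fixed)                    # {'a': 10, 'b': 'Model 3L'}
print(ast.literal_eval(fixed))  # {'a': 10, 'b': 'Model 3L'}
```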
    

    Finally, back to the complicated string from the beginning (this version also has 'use': 100L).

    >>> s="{'product': u'\\u62c9\\u52fe\\u7f51', 'downtime': 3.128, 'monitors': [{'use': 100L, 'monitorurl': u'http://oss.lagou.com','monitorweight': 10L,'monitorname': u'\\u804c\\u4f4d\\u641c\\u7d22'}]}"
    >>> # Replace single quotes with double quotes
    >>> s1=s.replace('\'','\"')
    >>> s1
    '{"product": u"\\u62c9\\u52fe\\u7f51", "downtime": 3.128, "monitors": [{"use": 100L, "monitorurl": u"http://oss.lagou.com","monitorweight": 10L,"monitorname": u"\\u804c\\u4f4d\\u641c\\u7d22"}]}'
    >>> # Remove the unicode prefix u (this would also strip the last u of a value such as "menu")
    >>> s2=s1.replace('u\"','\"')
    >>> s2
    '{"product": "\\u62c9\\u52fe\\u7f51", "downtime": 3.128, "monitors": [{"use": 100L, "monitorurl": "http://oss.lagou.com","monitorweight": 10L,"monitorname": "\\u804c\\u4f4d\\u641c\\u7d22"}]}'
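    >>> # str.replace() does not understand patterns such as '..L', so nothing changes
    >>> s3=s2.replace('..L','')
    >>> s3
    '{"product": "\\u62c9\\u52fe\\u7f51", "downtime": 3.128, "monitors": [{"use": 100L, "monitorurl": "http://oss.lagou.com","monitorweight": 10L,"monitorname": "\\u804c\\u4f4d\\u641c\\u7d22"}]}'
    >>> # Remove the L suffix from long integers with a regular expression instead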
    >>> import re
    >>> s3=re.sub(r'(\d+)L','\g<1>',s2)
    >>> s3
    '{"product": "\\u62c9\\u52fe\\u7f51", "downtime": 3.128, "monitors": [{"use": 100, "monitorurl": "http://oss.lagou.com","monitorweight": 10,"monitorname": "\\u804c\\u4f4d\\u641c\\u7d22"}]}'
    >>> # json.loads() finally works
    >>> json.loads(s3)
    {u'product': u'\u62c9\u52fe\u7f51', u'monitors': [{u'use': 100, u'monitorweight': 10, u'monitorname': u'\u804c\u4f4d\u641c\u7d22', u'monitorurl': u'http://oss.lagou.com'}], u'downtime': 3.1280000000000001}
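The three REPL steps can be collected into one function. Below is a sketch in Python 3. py2_str_to_json is a hypothetical name, and the input is assumed to be Python 2 str() output in which no value contains a quote character, so the limitation from point 2 still applies. It makes two small changes to the transcript. The u is removed only when it directly follows {, [, : or , so a value like "menu" survives, and the L pattern is anchored on word boundaries.

```python
import json
import re


def py2_str_to_json(text):
    """Hypothetical helper: apply the three fixes above, then json.loads().

    Assumes no string value contains a quote character (see point 2).
    """
    # 1. Single quotes -> double quotes.
    text = text.replace("'", '"')
    # 2. Drop the u prefix, but only right after {, [, : or , (plus spaces).
    text = re.sub(r'([\[{:,]\s*)u"', r'\1"', text)
    # 3. Drop the L suffix from long integers.
    text = re.sub(r'\b(\d+)L\b', r'\1', text)
    return json.loads(text)


# The sample string from above; raw strings keep the "\u" escapes as plain text.
s = (r"{'product': u'\u62c9\u52fe\u7f51', 'downtime': 3.128, "
     r"'monitors': [{'use': 100L, 'monitorurl': u'http://oss.lagou.com',"
     r"'monitorweight': 10L,'monitorname': u'\u804c\u4f4d\u641c\u7d22'}]}")

result = py2_str_to_json(s)
print(result["monitors"][0]["monitorweight"])  # 10
```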
    


        Article link: https://www.haomeiwen.com/subject/qhhqgttx.html