How Python implements attribute access for data

Author: 9_SooHyun | Published 2023-03-11 15:39

    1. Attribute access for data: Go vs. Python

    I see three main differences:

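    • Carrier structure: structured vs. unstructured
      go: Go data is usually carried in a struct, a structured pure data container.
      python: Python data is usually carried in a dict, which is unstructured. That is like carrying data in a Go map, which is hard to imagine doing, because you have no idea which key-value pairs the map holds. To get a structured carrier, you have to define a data class, then create an instance and assign its fields.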

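    • Carrier creation: flexible vs. somewhat rigid
      go: structs compose freely, so new structs are quick to create. Creating and composing them on the fly yields nested structs, which makes it easy to map a loosely shaped API response as a whole. Python cannot do this as conveniently.
      python: as in most languages that model data through class definitions, defining nested classes is rather inconvenient.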

    • Carrier usage (data exchange): direct vs. indirect
      go: calling Marshal/Unmarshal on a struct exchanges data directly: json bytes <-> struct
      python: to get attribute access to values, an extra dict-to-object structuring step is needed (e.g. dacite.from_dict): json bytes/string <-> dict <-> obj
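The Python pipeline in the last bullet (JSON string -> dict -> obj) can be sketched with the standard library alone. This is a minimal illustration, not dacite: `person_from_dict` is a hypothetical hand-written helper, and the `Address` and `Person` dataclasses are stand-ins defined only for this example.

```python
import json
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    number: int


@dataclass
class Person:
    name: str
    addr: Address


def person_from_dict(d: dict) -> Person:
    # Hand-written structuring step (hypothetical helper):
    # the nested dict has to be converted explicitly.
    return Person(name=d["name"], addr=Address(**d["addr"]))


raw = '{"name": "bob", "addr": {"street": "wall street", "number": 99}}'
as_dict = json.loads(raw)            # JSON string -> dict
person = person_from_dict(as_dict)   # dict -> object
print(person.addr.street)            # attribute access instead of ["addr"]["street"]
```

Every nested type needs its own conversion line here, which is the indirection the comparison points at; libraries such as dacite automate this step from the type hints.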

    By comparison, Go is clearly simpler and easier to use, because it captures the essence:

    A program is no more than logic on the data.

    2. attribute access for data in python

    #! /usr/bin/env python3
    # -*- coding: utf-8 -*-
    
    # Using dicts is a weak-sauce way to do object-oriented programming.
    # Dictionaries are a very poor way to communicate expectations to readers of your code.
    # Using a dictionary, how can you clearly and reusably specify that
    # some dictionary key-value pairs are required, while others aren't?
    
    # This module shows several ways to convert nested dictionaries to nested data structures,
    # so that values can be reached by attribute access (a.k.a. dot access).
    
    # Address is a simple class
    class Address(object):
        def __init__(self, street: str, number: int):
            self.street = street
            self.number = number
    
    # Person is a person with one main address
    class Person(object):
        def __init__(self, name: str, addr: Address):
            self.name = name
            self.addr = addr
    
    # RichPerson is a person with more than one address
    class RichPerson(object):
        def __init__(self, name: str, addrs: dict[str, Address]):
            self.name = name
            self.addrs = addrs
    
    ### 1. use `Dynamic Attribute` to transfer dict to obj ###
    # notes:
    # Dynamic attributes are indeterminate, so it is impossible to know explicitly which attributes an instance has.
    # This means that the interpreter/IDE cannot offer completion for dot access.
    # Dot access on dynamic attributes is essentially no different from dict access, because we still cannot tell which attributes (which keys) exist.
    
    # 1.1 SimpleDynamicStruct (not recommended. if you need dynamic attribute access, use NestedDynamicStruct instead)
    class SimpleDynamicStruct(object):
        def __init__(self, **entries):
            # Updating __dict__ this way only sets the top-level k:v pairs,
            # so only the top-level keys of entries become dynamic attributes of the instance.
            # The outer object gets its attributes, but every nested dict stays a plain dict, so SimpleDynamicStruct is not recommended.
            self.__dict__.update(entries)
    
    def test_dynamic_struct():
        data = {
            "name": "paul",
            "addrs": {
                "usa": {
                    "street": "wall street",
                    "number": 99
                }
            }
        }
        s : SimpleDynamicStruct = SimpleDynamicStruct(**data)
        # only top-level keys become attributes: `s.addrs` works, while `s.addrs.usa` raises AttributeError
        print(s.name, s.addrs["usa"]["number"])
    
    # 1.2 NestedDynamicStruct
    # Now we use recursion to turn SimpleDynamicStruct into NestedDynamicStruct.
    # NestedDynamicStruct provides attribute access (dot access) to nested dict values, similar to a jsonpath expression.
    class NestedDynamicStruct(object):
    
        # generate_dict_for_dynamic_struct is a recursive function
        # that converts all nested sub-dictionaries to NestedDynamicStruct objects.
        # It takes no self, so it is declared as a static method.
        @staticmethod
        def generate_dict_for_dynamic_struct(normal_d : dict) -> dict:
    
            # generate_subobj_list is a recursive helper function
            # that converts all dicts nested in a list to NestedDynamicStruct objects
            def generate_subobj_list(l : list) -> list:
                res = []
                for v in l:
                    if isinstance(v, dict):
                        # dict found and transferred to NestedDynamicStruct object
                        res.append(NestedDynamicStruct(**NestedDynamicStruct.generate_dict_for_dynamic_struct(v)))
                    elif isinstance(v, list):
                        res.append(generate_subobj_list(v))
                    elif isinstance(v, tuple):
                        res.append(generate_subobj_tuple(v))
                    else:
                        res.append(v)
                return res
    
            # generate_subobj_tuple is a recursive helper function
            # to transfer all nested subdictionaries to NestedDynamicStruct object
            def generate_subobj_tuple(t : tuple) -> tuple:
                res = []
                for v in t:
                    if isinstance(v, dict):
                        # dict found and transferred to NestedDynamicStruct object
                        res.append(NestedDynamicStruct(**NestedDynamicStruct.generate_dict_for_dynamic_struct(v)))
                    elif isinstance(v, list):
                        res.append(generate_subobj_list(v))
                    elif isinstance(v, tuple):
                        res.append(generate_subobj_tuple(v))
                    else:
                        res.append(v)
                return tuple(res)
    
            res = {}
            for k, v in normal_d.items():
                if isinstance(v, dict):
                    # dict found and transferred to NestedDynamicStruct object
                    res[k] = NestedDynamicStruct(**NestedDynamicStruct.generate_dict_for_dynamic_struct(v))
                elif isinstance(v, list):
                    res[k] = generate_subobj_list(v)
                elif isinstance(v, tuple):
                    res[k] = generate_subobj_tuple(v)
                else:
                    res[k] = v
            return res
    
        def __init__(self, **data):
            generated_data = NestedDynamicStruct.generate_dict_for_dynamic_struct(data)
            self.__dict__.update(**generated_data)
    
    
    def test_nested_dynamic_struct():
        data = {
            "name": "paul",
            "addrs": [[{
                "usa": {
                    "street": "wall street",
                    "number": 99
                }
            }]],
            "friends": ["bob", "amy"]
        }
        s : NestedDynamicStruct = NestedDynamicStruct(**data)
        print(s.name, s.friends[1], s.addrs, s.addrs[0][0].usa.number)
    
    
    # 1.2.1 types.SimpleNamespace is like NestedDynamicStruct.
    def test_simple_namespace():
        import json
        from types import SimpleNamespace
    
        data = {
            "name": "paul",
            "addrs": [[{
                "usa": {
                    "street": "wall street",
                    "number": 99
                }
            }]],
            "friends": ["bob", "amy"]
        }
    
        # x = SimpleNamespace(**data) # using this method to get x won't get nested SimpleNamespace object 
    
        data_str = json.dumps(data)
        # should use `json.loads` with object_hook, rather than `x = SimpleNamespace(**data)`
        x = json.loads(data_str, object_hook=lambda d: SimpleNamespace(**d))
    
        print(x.name, x.friends[1], x.addrs, x.addrs[0][0].usa.number)
    
    # with dynamic attribute struct, we can use "dot access", but we still don't know exactly which properties exist.
    # so we should transfer to **static** data struct.
    # let's go on.
    
    ### 2. use `jsonpickle` and `type hints` to serialize/deserialize py objects.
    # notes:
    # `jsonpickle` can serialize a py object and restore it with its original types.
    # The only downside is: the serialized json has a `py/object` field, such as
    # {"py/object": "__main__.Person", "name": "Awesome", ...}
    # attention: when encoding, jsonpickle embeds the type metadata in the serialized data, so decoding can use that metadata to rebuild the python obj easily
    def test_jsonpickle():
        import jsonpickle
    
        p = Person('Awesome', addr=Address("wall street",10))
        # encode python object `p` to str
        frozen: str = jsonpickle.encode(p)
        
        print(frozen) # {"py/object": "__main__.Person", "name": "Awesome", "addr": {"py/object": "__main__.Address", "street": "wall street", "number": 10}}
        # decode str to python object
        alive_p: Person = jsonpickle.decode(frozen)
        print(alive_p.name, alive_p.addr.street, alive_p.addr.number) # pass
    
    
        rich_p = RichPerson("paul", addrs={
            "usa":Address("wall street", 99)
            }
        )
        frozen : str = jsonpickle.encode(rich_p)
        print(frozen)
        alive_rp : RichPerson = jsonpickle.decode(frozen)
        print(alive_rp.name, alive_rp.addrs, alive_rp.addrs["usa"].street)
    
    # `jsonpickle` does a good job.
    # But there is an additional field `"py/object"` stored in its json, which ordinary json does not have.
    # So let's see whether there is another conversion approach based on plain json.
    
    
    ### 3. Using `pymarshaler` (which is close to the Go approach)
    # notes:
    # `pymarshaler`'s support for nested classes is not perfect.
    # marshal/unmarshal of Person is ok, but unmarshalling RichPerson gives an unexpected result:
    # it fails to convert the `addrs` values, leaving a dict of dicts instead of a dict of objects
    
    def test_pymarshaler():
        # see https://pythonawesome.com/marshall-python-objects-to-and-from-json/
        from pymarshaler.marshal import Marshal
        import json
    
        m = Marshal(ignore_unknown_fields=True)
    
        # ---marshal/unmarshal Person---
        p = Person(name="bob",addr=Address("wall street", 99))
        json_p : bytes = m.marshal(p)
        returned_p : Person = m.unmarshal(Person, json.loads(json_p))
        print(returned_p, returned_p.addr.street) # pass
    
        # ---marshal/unmarshal RichPerson---
        rich_p = RichPerson("paul", addrs={
            "usa":Address("wall street", 99)
            }
        )
        json_rp : bytes = m.marshal(rich_p)
        returned_rp : RichPerson = m.unmarshal(RichPerson, json.loads(json_rp))
        print(returned_rp, returned_rp.addrs)
    
        # attention: the outer class is constructed, but the inner objects are loaded as dicts.
        print(type(returned_rp.addrs["usa"])) # <class 'dict'>. Attention: not 'Address', hence the error below
        # print(returned_rp.addrs["usa"].street) # -> 'dict' object has no attribute 'street'
    
    ### 4. use `dataclass` and `dacite`
    # notes:
    # of the approaches tried here, `dacite` is the one that fully handles converting nested dictionaries to nested data structures.
    from dataclasses import dataclass, field
    @dataclass
    class DCAddress(object):
        street : str = field(default='')
        number : int = field(default=0)
    
    @dataclass
    class DCRichPerson(object):
        name : str = field(default='')
        addrs : dict[str, DCAddress] = field(default_factory=dict)
    
    def test_dacite():
        import dacite
    
        data = {
            "name": "paul",
            "addrs": {
                "usa": {
                    "street": "wall street",
                    "number": 99
                }
            }
        }
    
        # using data below is also ok
        # data = {
        # "name": "paul",
        # "addrs": {
        #   "usa": DCAddress("wall street", 99)
        # }
        # }
    
        rich_p : DCRichPerson = dacite.from_dict(data_class=DCRichPerson, data=data)
        print(rich_p.name, rich_p.addrs)
        print(type(rich_p.addrs["usa"]), rich_p.addrs["usa"].street)
    
    
    if __name__ == '__main__':
        # test_dynamic_struct()
        # test_nested_dynamic_struct()
        # test_simple_namespace()
        # test_jsonpickle()
        # test_pymarshaler()
        test_dacite()
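The listing above stops at dict -> obj. For the way back (obj -> dict -> JSON), the standard library's `dataclasses.asdict` may be enough. A minimal sketch, assuming dataclasses shaped like `DCAddress` and `DCRichPerson` above; they are redefined here only so the snippet runs on its own:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class DCAddress:
    street: str = ""
    number: int = 0


@dataclass
class DCRichPerson:
    name: str = ""
    addrs: Dict[str, DCAddress] = field(default_factory=dict)


rich_p = DCRichPerson("paul", {"usa": DCAddress("wall street", 99)})

# asdict recurses into nested dataclasses, including ones stored in dicts, lists and tuples
as_dict = asdict(rich_p)
as_json = json.dumps(as_dict)   # plain JSON, with no type metadata such as "py/object"
print(as_json)
```

Together with `dacite.from_dict`, this gives the round trip json string <-> dict <-> obj described in section 1.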
    
    

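    See https://mp.weixin.qq.com/s/tNCphYBgtaBLesEDPHyzLA: when writing Python, think in three levels:
    Level 1: no class is needed. Use list, dict and the like directly, and a function gets the job done.
    Level 2: to express and store data more clearly (a data container), use a dataclass.
    Level 3: when the thing being modeled has both state and a lot of behavior, use a regular class.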
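As a rough illustration of the three levels, here is a small sketch. The shopping-cart names (`total_price`, `Item`, `Cart`) are hypothetical and chosen only for this example; they do not come from the linked article.

```python
from dataclasses import dataclass
from typing import Dict, List


# Level 1: built-in containers plus a function
def total_price(items: List[Dict]) -> int:
    return sum(item["price"] * item["qty"] for item in items)


# Level 2: a dataclass as a plain data container
@dataclass
class Item:
    name: str
    price: int
    qty: int = 1


# Level 3: a regular class that owns state and behavior
class Cart:
    def __init__(self) -> None:
        self._items: List[Item] = []

    def add(self, item: Item) -> None:
        self._items.append(item)

    def total(self) -> int:
        return sum(i.price * i.qty for i in self._items)


cart = Cart()
cart.add(Item("pen", price=2, qty=3))
print(total_price([{"price": 2, "qty": 3}]), cart.total())
```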
