How to implement attribute access for data in Python

Author: 9_SooHyun | Published: 2023-03-11 15:39

1. Attribute access to data: comparing Go and Python

I have summarized three main differences:

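  • Container structure: structured vs. unstructured
    go: In Go, the data container is usually a struct, a structured, pure data container.
    python: In Python, the data container is usually a dict, which is unstructured. That is like carrying data in a Go map, which is hard to imagine, because you simply don't know which k-v pairs the map holds. To use a structured container, you have to define a data class, then create an instance and assign its fields.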

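  • Container creation: flexible vs. somewhat rigid
    go: Structs can be composed freely, so new struct types are quick to create. By defining and composing structs on the spot, you get a nested struct that can map a flexible API response in one piece. Python cannot do this as conveniently.
    python: Any language that does object-oriented design by defining classes makes it somewhat inconvenient to define nested classes.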

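  • Container use (data interchange): direct vs. indirect
    go: Calling Marshal/Unmarshal on a struct exchanges data directly: json bytes <-> struct
    python: To get attribute access to values, you additionally need a structuring step that turns the dict into a Python object (e.g. dacite.from_dict): json bytes/string <-> dict <-> obj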
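The indirect Python path described in the last bullet can be sketched with the standard library alone. This is a minimal illustration, not the author's code: the `Address`/`Person` dataclasses and the sample JSON are assumptions, and the hand-written `Person(..., addr=Address(**...))` line is exactly the structuring step that a library such as dacite automates.

```python
import json
from dataclasses import dataclass, asdict


# Hypothetical data classes used only for this illustration
@dataclass
class Address:
    street: str
    number: int


@dataclass
class Person:
    name: str
    addr: Address


raw = b'{"name": "paul", "addr": {"street": "wall street", "number": 99}}'

# Step 1: json bytes -> dict (this is all json.loads gives us)
d = json.loads(raw)
# d["addr"]["street"] works, but d.addr.street would raise AttributeError

# Step 2: dict -> obj, the extra structuring step that Go's Unmarshal does not need
p = Person(name=d["name"], addr=Address(**d["addr"]))
print(p.addr.street)  # wall street

# Reverse direction: obj -> dict -> json string (asdict recurses into nested dataclasses)
s = json.dumps(asdict(p))
print(s)  # {"name": "paul", "addr": {"street": "wall street", "number": 99}}
```

In Go, both steps collapse into a single `json.Unmarshal` call into a struct; in Python, step 2 must be written by hand or delegated to a library.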

Compared side by side, Go is clearly more concise and easier to use, because it grasps the essence:

program is no more than logic on the data

2. Attribute access for data in Python

#! /usr/bin/env python3
# -*- coding: utf-8 -*-

# Using dicts is a weak-sauce way to do object-oriented programming.
# Dictionaries are a very poor way to communicate expectations to readers of your code.
# Using a dictionary, how can you clearly and reusably specify that
# some dictionary key-value pairs are required, while others aren't?

# This module shows several ways to convert nested dictionaries into nested data structures,
# so that values can be reached with attribute access (a.k.a. dot access).

# Address is a simple class
class Address(object):
    def __init__(self, street: str, number: int):
        self.street = street
        self.number = number

# Person is a person with one main address
class Person(object):
    def __init__(self, name: str, addr: Address):
        self.name = name
        self.addr = addr

# RichPerson is a person with more than one address
class RichPerson(object):
    def __init__(self, name: str, addrs: dict[str, Address]):
        self.name = name
        self.addrs = addrs

### 1. Use dynamic attributes to convert a dict to an object ###
# notes:
# Dynamic attributes are not fixed in advance, so it is impossible to know explicitly which attributes an instance has.
# This means the interpreter/IDE cannot offer completion hints for dot access.
# Dot access on dynamic attributes is not fundamentally different from dict access, because we still cannot see clearly which attributes (which keys) exist.

# 1.1 SimpleDynamicStruct (not recommended; if you need dynamic attribute access, use NestedDynamicStruct instead)
class SimpleDynamicStruct(object):
    def __init__(self, **entries):
        # Updating __dict__ this way only adds the k:v pairs passed in,
        # i.e. only the top-level k-v pairs of `entries` become dynamic attributes of the instance.
        # The outer object is constructed, but inner dicts are kept as plain dicts, so SimpleDynamicStruct is not recommended.
        self.__dict__.update(entries)

def test_dynamic_struct():
    data = {
    "name": "paul",
    "addrs": {
        "usa": {
            "street": "wall street",
            "number": 99
        }
    }
    }
    s : SimpleDynamicStruct = SimpleDynamicStruct(**data)
    # `s.addrs` works (only top-level keys become attributes), but `s.addrs.usa` does not
    print(s.name, s.addrs["usa"]["number"])

# 1.2 NestedDynamicStruct
# Now we use recursion to turn SimpleDynamicStruct into NestedDynamicStruct.
# NestedDynamicStruct provides attribute access (dot access) to nested dicts, somewhat like jsonpath.
class NestedDynamicStruct(object):

    # generate_dict_for_dynamic_struct is a recursive function
    # that converts every nested sub-dictionary into a NestedDynamicStruct object.
    # @staticmethod makes it callable both on the class and on instances.
    @staticmethod
    def generate_dict_for_dynamic_struct(normal_d : dict) -> dict:

        # convert is a recursive helper for a single value:
        # dicts become NestedDynamicStruct objects, and lists/tuples are walked
        # element by element so that dicts nested inside them are converted as well.
        def convert(v):
            if isinstance(v, dict):
                # dict found: NestedDynamicStruct.__init__ converts its contents recursively
                return NestedDynamicStruct(**v)
            elif isinstance(v, list):
                return [convert(x) for x in v]
            elif isinstance(v, tuple):
                return tuple(convert(x) for x in v)
            else:
                return v

        return {k: convert(v) for k, v in normal_d.items()}

    def __init__(self, **data):
        generated_data = NestedDynamicStruct.generate_dict_for_dynamic_struct(data)
        self.__dict__.update(generated_data)


def test_nested_dynamic_struct():
    data =  {
    "name": "paul",
    "addrs": [[{
        "usa": {
            "street": "wall street",
            "number": 99
        }
    }]],
    "friends": ["bob", "amy"]
    }
    s : NestedDynamicStruct = NestedDynamicStruct(**data)
    print(s.name, s.friends[1], s.addrs, s.addrs[0][0].usa.number)


# 1.2.1 types.SimpleNamespace, combined with json.loads(object_hook=...), works like NestedDynamicStruct.
def test_simple_namespace():
    import json
    from types import SimpleNamespace

    data =  {
    "name": "paul",
    "addrs": [[{
        "usa": {
            "street": "wall street",
            "number": 99
        }
    }]],
    "friends": ["bob", "amy"]
    }

    # x = SimpleNamespace(**data) # this only converts the top level; nested dicts do not become SimpleNamespace objects

    data_str = json.dumps(data)
    # should use `json.loads` with object_hook, rather than `x = SimpleNamespace(**data)`
    x = json.loads(data_str, object_hook=lambda d: SimpleNamespace(**d))

    print(x.name, x.friends[1], x.addrs, x.addrs[0][0].usa.number)

# With dynamic-attribute structs we get dot access, but we still don't know exactly which attributes exist,
# so we should convert to a **static** data structure instead.
# Let's go on.

### 2. Use `jsonpickle` and `type hints` to serialize/deserialize Python objects
# notes:
# `jsonpickle` can serialize and deserialize Python objects faithfully.
# The only downside: the serialized JSON contains a `py/object` field, such as
# {"py/object": "__main__.Person", "name": "Awesome", ...}
# attention: when encoding, jsonpickle writes the type metadata into the serialized output itself, so decoding can use that metadata to restore the Python object easily.
def test_jsonpickle():
    import jsonpickle

    p = Person('Awesome', addr=Address("wall street",10))
    # encode python object `p` to str
    frozen: str = jsonpickle.encode(p)
    
    print(frozen) # {"py/object": "__main__.Person", "name": "Awesome", "addr": {"py/object": "__main__.Address", "street": "wall street", "number": 10}}
    # decode str to python object
    alive_p: Person = jsonpickle.decode(frozen)
    print(alive_p.name, alive_p.addr.street, alive_p.addr.number) # pass


    rich_p = RichPerson("paul", addrs={
        "usa":Address("wall street", 99)
        }
    )
    frozen : str = jsonpickle.encode(rich_p)
    print(frozen)
    alive_rp : RichPerson = jsonpickle.decode(frozen)
    print(alive_rp.name, alive_rp.addrs, alive_rp.addrs["usa"].street)

# `jsonpickle` does a good job.
# But there is an additional `"py/object"` field stored in its JSON, which is unusual for ordinary JSON.
# So let's see whether there is another conversion approach based on plain JSON.


### 3. Using `pymarshaler` (which is close to the Go approach)
# notes:
# `pymarshaler`'s support for nested classes is not perfect.
# Marshalling/unmarshalling Person works, but unmarshalling RichPerson gives an unexpected result:
# the values of `addrs` are not converted into Address objects, so we do not get a dict of objects.

def test_pymarshaler():
    # refers to https://pythonawesome.com/marshall-python-objects-to-and-from-json/
    from pymarshaler.marshal import Marshal
    import json

    m = Marshal(ignore_unknown_fields=True)

    # ---marshal/unmarshal Person---
    p = Person(name="bob",addr=Address("wall street", 99))
    json_p : bytes = m.marshal(p)
    returned_p : Person = m.unmarshal(Person, json.loads(json_p))
    print(returned_p, returned_p.addr.street) # pass

    # ---marshal/unmarshal RichPerson---
    rich_p = RichPerson("paul", addrs={
        "usa":Address("wall street", 99)
        }
    )
    json_rp : bytes = m.marshal(rich_p)
    returned_rp : RichPerson = m.unmarshal(RichPerson, json.loads(json_rp))
    print(returned_rp, returned_rp.addrs)

    # attention: the outer object is constructed, but the inner objects are loaded as dicts.
    print(type(returned_rp.addrs["usa"])) # <class 'dict'>. Attention: not 'Address', so we have the error below
    # print(returned_rp.addrs["usa"].street) # -> 'dict' object has no attribute 'street'

### 4. Use `dataclass` and `dacite`
# notes:
# `dacite` is the final and most complete solution here for converting nested dictionaries into nested data structures.
from dataclasses import dataclass, field
@dataclass
class DCAddress(object):
    street : str = field(default='')
    number : int = field(default=0)

@dataclass
class DCRichPerson(object):
    name : str = field(default='')
    addrs : dict[str, DCAddress] = field(default_factory=dict)

def test_dacite():
    import dacite

    data = {
    "name": "paul",
    "addrs": {
        "usa": {
            "street": "wall street",
            "number": 99
        }
    }
    }

    # using data below is also ok
    # data = {
    # "name": "paul",
    # "addrs": {
    #   "usa": DCAddress("wall street", 99)
    # }
    # }

    rich_p : DCRichPerson = dacite.from_dict(data_class=DCRichPerson, data=data)
    print(rich_p.name, rich_p.addrs)
    print(type(rich_p.addrs["usa"]), rich_p.addrs["usa"].street)


if __name__ == '__main__':
    # test_dynamic_struct()
    # test_nested_dynamic_struct()
    # test_simple_namespace()
    # test_jsonpickle()
    # test_pymarshaler()
    test_dacite()
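For comparison with dacite, the same dict-to-dataclass structuring step can be approximated with the standard library's `dataclasses` and `typing` introspection. The following is a minimal sketch, not dacite's implementation: `from_dict` is a hypothetical helper that only handles nested dataclasses, `list[T]` and `dict[K, V]`, assumes every field is an `__init__` parameter, does no type checking, and needs Python 3.9+ for the built-in generic annotations.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, get_args, get_origin, get_type_hints


def from_dict(cls: Any, data: Any) -> Any:
    """Hypothetical minimal converter: nested dicts -> nested dataclass instances.

    Handles dataclasses, list[T] and dict[K, V]; any other type is passed through unchanged.
    No Optional/Union handling, type checking or error reporting (dacite does much more).
    """
    if is_dataclass(cls) and isinstance(data, dict):
        hints = get_type_hints(cls)  # resolves each field's annotation to a real type
        kwargs = {
            f.name: from_dict(hints[f.name], data[f.name])
            for f in fields(cls)
            if f.name in data  # missing keys fall back to the field defaults
        }
        return cls(**kwargs)
    origin = get_origin(cls)
    if origin is list and isinstance(data, list):
        (item_type,) = get_args(cls)
        return [from_dict(item_type, x) for x in data]
    if origin is dict and isinstance(data, dict):
        _, value_type = get_args(cls)
        return {k: from_dict(value_type, v) for k, v in data.items()}
    return data


@dataclass
class DCAddress:
    street: str = ""
    number: int = 0


@dataclass
class DCRichPerson:
    name: str = ""
    addrs: dict[str, DCAddress] = field(default_factory=dict)


data = {"name": "paul", "addrs": {"usa": {"street": "wall street", "number": 99}}}
rich_p = from_dict(DCRichPerson, data)
print(type(rich_p.addrs["usa"]).__name__, rich_p.addrs["usa"].street)  # DCAddress wall street
```

Keeping the converter this small makes the trade-off visible: the type hints on the dataclass drive the recursion, while Optional/Union fields, validation and helpful error messages are what a dedicated library like dacite adds on top.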

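As https://mp.weixin.qq.com/s/tNCphYBgtaBLesEDPHyzLA suggests, there are three levels to consider when writing Python:
Level 1: no classes are needed. Use list, dict and the like directly, and a function does the job.
Level 2: to express and store data more clearly (a data container), use a dataclass.
Level 3: when the thing you are modeling has both state and a lot of behavior, use a regular class.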
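The three levels can be illustrated on one small shopping-cart example. This is a sketch for illustration only: `total_price`, `Item` and `Cart` are hypothetical names, not taken from the referenced article.

```python
from dataclasses import dataclass


# Level 1: plain list/dict plus a function is enough
def total_price(items: list[dict]) -> float:
    return sum(i["price"] * i["qty"] for i in items)


# Level 2: a dataclass as a pure data container with named, typed fields
@dataclass
class Item:
    name: str
    price: float
    qty: int


# Level 3: a regular class that owns state and the behavior that guards it
class Cart:
    def __init__(self) -> None:
        self._items: list[Item] = []

    def add(self, item: Item) -> None:
        # behavior attached to the state: reject invalid quantities
        if item.qty <= 0:
            raise ValueError("qty must be positive")
        self._items.append(item)

    def total(self) -> float:
        return sum(i.price * i.qty for i in self._items)


print(total_price([{"name": "pen", "price": 1.5, "qty": 2}]))  # 3.0
cart = Cart()
cart.add(Item("pen", 1.5, 2))
print(cart.total())  # 3.0
```

Each step up buys something: the dataclass makes the available fields explicit (and IDE-visible), and the regular class adds rules that keep its state valid.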
