1. Comparing [attribute access] to data in Go and Python
I have summarized three main differences:
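- Container structure: structured vs. unstructured
  go: In Go, the data container is usually a struct, which is a structured, pure data container.
  python: In Python, the data container is usually a dict, which is unstructured. That is like holding data in a Go map, which is unthinkable, because you have no idea which key-value pairs the map contains. To get a structured data container, you have to define a data class, then create an instance and assign its fields.
- Container creation: flexible vs. somewhat rigid
  go: Structs can be composed freely, so new struct types are quick to create. By defining and composing structs on the fly you get nested structs, which makes it easy to map the return value of a flexible API directly, in one piece. Python cannot do this as conveniently.
  python: In any language where object-oriented design means defining classes, nested classes are somewhat awkward to define.
- Container usage (data exchange): direct vs. indirect
  go: Marshal/Unmarshal works on a struct directly: json bytes <-> struct
  python: To get attribute access to values, there is an extra step that structures the dict into a Python object (e.g. dacite.from_dict): json bytes/string <-> dict <-> obj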
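The indirect Python path from the last bullet can be sketched with the standard library alone. The `Address`/`Person` dataclasses and the JSON payload below are illustrative assumptions. The dict-to-object step is written by hand here instead of using dacite.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Address:
    street: str
    number: int


@dataclass
class Person:
    name: str
    addr: Address


raw = b'{"name": "bob", "addr": {"street": "wall street", "number": 99}}'

# json bytes -> dict: unstructured, nothing tells the reader which keys exist
d = json.loads(raw)
print(d["addr"]["street"])  # wall street

# dict -> obj: the extra structuring step, done by hand here
# (a library such as dacite.from_dict automates it for nested dataclasses)
p = Person(name=d["name"], addr=Address(**d["addr"]))
print(p.addr.street, p.addr.number)  # wall street 99

# obj -> dict -> json string: the reverse direction
print(json.dumps(asdict(p)))  # {"name": "bob", "addr": {"street": "wall street", "number": 99}}
```

In Go, one `json.Unmarshal` into a struct covers both steps. In Python, the two hops stay visible.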
In comparison, Go is clearly simpler and easier to use, because it captures the essence:
a program is no more than logic on data
2. Attribute access for data in Python
#! /usr/bin/env python3
# -*- coding: utf-8 -*-
# Using dicts is a weak-sauce way to do object-oriented programming.
# Dictionaries are a very poor way to communicate expectations to readers of your code.
# Using a dictionary, how can you clearly and reusably specify that
# some dictionary key-value pairs are required, while others aren't?
# This module provides several methods to `transfer nested dictionaries to nested data structures`
# so we can have `attribute access (a.k.a. dot access)` to values.
# Requires Python 3.9+ for `dict[str, ...]` annotations.


# Address is a simple class
class Address(object):
    def __init__(self, street: str, number: int):
        self.street = street
        self.number = number


# Person is a person with one main address
class Person(object):
    def __init__(self, name: str, addr: Address):
        self.name = name
        self.addr = addr


# RichPerson is a person with more than one address
class RichPerson(object):
    def __init__(self, name: str, addrs: dict[str, Address]):
        self.name = name
        self.addrs = addrs
### 1. use `Dynamic Attribute` to transfer dict to obj ###
# notes:
# Dynamic attributes are indeterminate, so it is impossible to know explicitly what attributes an instance has.
# This means that the interpreter/IDE cannot offer completion for dot access.
# Dot access on dynamic attributes is not fundamentally different from dict access,
# because we still cannot tell which attributes (which keys) exist.

# 1.1 SimpleDynamicStruct (not recommended; if you need dynamic attribute access, use NestedDynamicStruct instead)
class SimpleDynamicStruct(object):
    def __init__(self, **entries):
        # Updating __dict__ this way only copies the given k:v pairs,
        # i.e. only the [top-level k-v pairs] of entries become dynamic attributes of the instance.
        # The outer object is constructed, but inner dicts stay plain dicts, so SimpleDynamicStruct is not recommended.
        self.__dict__.update(entries)


def test_dynamic_struct():
    data = {
        "name": "paul",
        "addrs": {
            "usa": {
                "street": "wall street",
                "number": 99
            }
        }
    }
    s: SimpleDynamicStruct = SimpleDynamicStruct(**data)
    # `s.addrs` works (only top-level keys become attributes), while `s.addrs.usa` does not
    print(s.name, s.addrs["usa"]["number"])
# 1.2 NestedDynamicStruct
# Now we use recursion to turn SimpleDynamicStruct into NestedDynamicStruct.
# NestedDynamicStruct provides attribute access (dot access) to a dict, much like jsonpath.
class NestedDynamicStruct(object):
    # generate_dict_for_dynamic_struct is a recursive function
    # that converts all nested sub-dictionaries to NestedDynamicStruct objects.
    @staticmethod
    def generate_dict_for_dynamic_struct(normal_d: dict) -> dict:
        # generate_subobj_list is a recursive helper function
        # that converts all nested sub-dictionaries in a list to NestedDynamicStruct objects
        def generate_subobj_list(l: list) -> list:
            res = []
            for v in l:
                if isinstance(v, dict):
                    # dict found: NestedDynamicStruct.__init__ converts it recursively
                    res.append(NestedDynamicStruct(**v))
                elif isinstance(v, list):
                    res.append(generate_subobj_list(v))
                elif isinstance(v, tuple):
                    res.append(generate_subobj_tuple(v))
                else:
                    res.append(v)
            return res

        # generate_subobj_tuple is a recursive helper function
        # that converts all nested sub-dictionaries in a tuple to NestedDynamicStruct objects
        def generate_subobj_tuple(t: tuple) -> tuple:
            # reuse the list logic, then turn the result back into a tuple
            return tuple(generate_subobj_list(list(t)))

        res = {}
        for k, v in normal_d.items():
            if isinstance(v, dict):
                # dict found: NestedDynamicStruct.__init__ converts it recursively
                res[k] = NestedDynamicStruct(**v)
            elif isinstance(v, list):
                res[k] = generate_subobj_list(v)
            elif isinstance(v, tuple):
                res[k] = generate_subobj_tuple(v)
            else:
                res[k] = v
        return res

    def __init__(self, **data):
        generated_data = NestedDynamicStruct.generate_dict_for_dynamic_struct(data)
        self.__dict__.update(generated_data)
def test_nested_dynamic_struct():
    data = {
        "name": "paul",
        "addrs": [[{
            "usa": {
                "street": "wall street",
                "number": 99
            }
        }]],
        "friends": ["bob", "amy"]
    }
    s: NestedDynamicStruct = NestedDynamicStruct(**data)
    print(s.name, s.friends[1], s.addrs, s.addrs[0][0].usa.number)
# 1.2.1 types.SimpleNamespace is like NestedDynamicStruct.
def test_simple_namespace():
    import json
    from types import SimpleNamespace
    data = {
        "name": "paul",
        "addrs": [[{
            "usa": {
                "street": "wall street",
                "number": 99
            }
        }]],
        "friends": ["bob", "amy"]
    }
    # x = SimpleNamespace(**data)  # this only converts the top level; nested dicts stay dicts
    data_str = json.dumps(data)
    # use `json.loads` with object_hook rather than `x = SimpleNamespace(**data)`:
    # object_hook is called for every decoded JSON object, innermost first
    x = json.loads(data_str, object_hook=lambda d: SimpleNamespace(**d))
    print(x.name, x.friends[1], x.addrs, x.addrs[0][0].usa.number)


# With a dynamic-attribute struct we can use "dot access", but we still don't know exactly which attributes exist.
# So we should convert to a **static** data structure.
# Let's go on.
### 2. use `jsonpickle` and `type hints` to serialize/deserialize py objects.
# notes:
# `jsonpickle` can serialize/deserialize py objects perfectly.
# The only downside is that the serialized json has a `py/object` field, such as
# {"py/object": "__main__.Person", "name": "Awesome", ...}
# Note: when encoding, jsonpickle stores the type metadata as part of the serialized data,
# so on decoding it can use that metadata to restore the original python object easily.
def test_jsonpickle():
    import jsonpickle
    p = Person('Awesome', addr=Address("wall street", 10))
    # encode python object `p` to str
    frozen: str = jsonpickle.encode(p)
    print(frozen)  # {"py/object": "__main__.Person", "name": "Awesome", "addr": {"py/object": "__main__.Address", "street": "wall street", "number": 10}}
    # decode str to python object
    alive_p: Person = jsonpickle.decode(frozen)
    print(alive_p.name, alive_p.addr.street, alive_p.addr.number)  # pass
    rich_p = RichPerson("paul", addrs={
        "usa": Address("wall street", 99)
    })
    frozen: str = jsonpickle.encode(rich_p)
    print(frozen)
    alive_rp: RichPerson = jsonpickle.decode(frozen)
    print(alive_rp.name, alive_rp.addrs, alive_rp.addrs["usa"].street)


# `jsonpickle` does a good job.
# But there is an additional `"py/object"` field stored in its json, which is unusual.
# So let's see if there is another approach based on plain json.
### 3. Using `pymarshaler` (which is close to the golang approach)
# notes:
# `pymarshaler`'s support for nested classes is not perfect.
# marshal/unmarshal of Person works, but unmarshal of RichPerson gives an unexpected result:
# it fails to convert the values of the dict to Address objects.
def test_pymarshaler():
    # refers to https://pythonawesome.com/marshall-python-objects-to-and-from-json/
    from pymarshaler.marshal import Marshal
    import json
    m = Marshal(ignore_unknown_fields=True)
    # ---marshal/unmarshal Person---
    p = Person(name="bob", addr=Address("wall street", 99))
    json_p: bytes = m.marshal(p)
    returned_p: Person = m.unmarshal(Person, json.loads(json_p))
    print(returned_p, returned_p.addr.street)  # pass
    # ---marshal/unmarshal RichPerson---
    rich_p = RichPerson("paul", addrs={
        "usa": Address("wall street", 99)
    })
    json_rp: bytes = m.marshal(rich_p)
    returned_rp: RichPerson = m.unmarshal(RichPerson, json.loads(json_rp))
    print(returned_rp, returned_rp.addrs)
    # attention: the outer class is constructed, but the inner values are loaded as dicts.
    print(type(returned_rp.addrs["usa"]))  # <class 'dict'>, not 'Address', hence the error below
    # print(returned_rp.addrs["usa"].street)  # -> AttributeError: 'dict' object has no attribute 'street'
### 4. use `dataclass` and `dacite`
# notes:
# `dacite` is the final, complete solution for `transfer nested dictionaries to nested data structures`.
from dataclasses import dataclass, field


@dataclass
class DCAddress(object):
    street: str = field(default='')
    number: int = field(default=0)


@dataclass
class DCRichPerson(object):
    name: str = field(default='')
    addrs: dict[str, DCAddress] = field(default_factory=dict)


def test_dacite():
    import dacite
    data = {
        "name": "paul",
        "addrs": {
            "usa": {
                "street": "wall street",
                "number": 99
            }
        }
    }
    # using the data below also works
    # data = {
    #     "name": "paul",
    #     "addrs": {
    #         "usa": DCAddress("wall street", 99)
    #     }
    # }
    rich_p: DCRichPerson = dacite.from_dict(data_class=DCRichPerson, data=data)
    print(rich_p.name, rich_p.addrs)
    print(type(rich_p.addrs["usa"]), rich_p.addrs["usa"].street)
if __name__ == '__main__':
    # test_dynamic_struct()
    # test_nested_dynamic_struct()
    # test_simple_namespace()
    # test_jsonpickle()
    # test_pymarshaler()
    test_dacite()
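The recursion in NestedDynamicStruct and the json round trip in test_simple_namespace could both be replaced by one short recursive function. The idea behind dacite.from_dict can be sketched the same way.

The code below is a minimal stdlib-only sketch, with these assumptions:
- `to_namespace`, `build` and `from_dict_sketch` are hypothetical helper names, not part of any library.
- `from_dict_sketch` is not dacite's implementation. It only recurses into nested dataclass, `list[...]` and `dict[...]` fields, and does no type checking, Optional/Union handling or validation.
- It requires Python 3.9+.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from types import SimpleNamespace
from typing import Any, get_args, get_origin, get_type_hints


def to_namespace(value: Any) -> Any:
    # recursively turn every dict (also inside lists/tuples) into a SimpleNamespace
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    if isinstance(value, list):
        return [to_namespace(v) for v in value]
    if isinstance(value, tuple):
        return tuple(to_namespace(v) for v in value)
    return value


def build(tp: Any, value: Any) -> Any:
    # convert `value` to the annotated type `tp`
    if is_dataclass(tp) and isinstance(value, dict):
        return from_dict_sketch(tp, value)
    origin = get_origin(tp)
    if origin is list:
        (item_tp,) = get_args(tp)
        return [build(item_tp, v) for v in value]
    if origin is dict:
        _, val_tp = get_args(tp)
        return {k: build(val_tp, v) for k, v in value.items()}
    # plain values (str, int, ...) pass through unchecked
    return value


def from_dict_sketch(cls: type, data: dict) -> Any:
    # hypothetical stand-in for dacite.from_dict: no Optional/Union/validation
    hints = get_type_hints(cls)
    kwargs = {
        f.name: build(hints[f.name], data[f.name])
        for f in fields(cls)
        if f.init and f.name in data
    }
    # fields missing from `data` fall back to their dataclass defaults
    return cls(**kwargs)


@dataclass
class DCAddress:
    street: str = ""
    number: int = 0


@dataclass
class DCRichPerson:
    name: str = ""
    addrs: dict[str, DCAddress] = field(default_factory=dict)


data = {"name": "paul", "addrs": {"usa": {"street": "wall street", "number": 99}}}

ns = to_namespace(data)
print(ns.name, ns.addrs.usa.number)  # paul 99

rich_p = from_dict_sketch(DCRichPerson, data)
print(type(rich_p.addrs["usa"]).__name__, rich_p.addrs["usa"].street)  # DCAddress wall street
```

Unlike the json round trip, `to_namespace` keeps tuples as tuples; `json.dumps` would turn them into lists. The dataclass version is the one that gives the IDE and the reader a fixed set of attributes.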
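For reference, see https://mp.weixin.qq.com/s/tNCphYBgtaBLesEDPHyzLA. When writing Python, there are three levels to consider:
Level one: no classes needed. Use list, dict and the like directly, and a function or two gets the job done.
Level two: to express and store data more clearly (a data container), use a dataclass.
Level three: when the thing you are modeling has both state and a lot of behavior, use a regular class.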
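The three levels can be sketched with a small example. It assumes a hypothetical shopping-cart scenario; the names `total_price`, `Item` and `Cart` are illustrative and are not taken from the referenced article.

```python
from dataclasses import dataclass


# Level 1: plain list/dict plus a function is enough
def total_price(items: list[dict]) -> float:
    return sum(i["price"] * i["qty"] for i in items)


# Level 2: a dataclass as a data container with named, typed fields
@dataclass
class Item:
    name: str
    price: float
    qty: int = 1


# Level 3: a regular class that owns state (the item list) and behavior
class Cart:
    def __init__(self) -> None:
        self._items: list[Item] = []

    def add(self, item: Item) -> None:
        self._items.append(item)

    def total(self) -> float:
        return sum(i.price * i.qty for i in self._items)


print(total_price([{"name": "pen", "price": 1.5, "qty": 2}]))  # 3.0

cart = Cart()
cart.add(Item("pen", 1.5, 2))
cart.add(Item("book", 10.0))
print(cart.total())  # 13.0
```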