A Rough Translation of the marshmallow Documentation


Author: 杨酥饼 | Published: 2017-11-15 22:54

    marshmallow

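    marshmallow is a library for converting complex objects, such as ORM objects, to and from native Python data types. In short, it implements object -> dict, objects -> list, string -> dict, and string -> list.

    To use marshmallow, you first need a class to serialize and deserialize: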

    import datetime as dt
    
    class User(object):
        def __init__(self, name, email):
            self.name = name
            self.email = email
            self.created_at = dt.datetime.now()
    
        def __repr__(self):
            return '<User(name={self.name!r})>'.format(self=self)
    

    Schema

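    To convert between a class and JSON data (that is, to serialize and deserialize, where serializing means turning data into a type that can be stored or transmitted), you need an intermediary, and that intermediary is the Schema. Besides conversion, a Schema can also validate data. Every class you want to convert needs a corresponding Schema: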

    from marshmallow import Schema, fields
    
    class UserSchema(Schema):
        name = fields.Str()
        email = fields.Email()
        created_at = fields.DateTime()
    

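    Serializing

    Serialization uses the schema's dump() and dumps() methods: dump() implements obj -> dict, and dumps() implements obj -> string. Flask can serialize a dict directly (with jsonify), and you will usually want to process the dict further, so there is no need to convert it to a string at this point. When Flask and marshmallow are used together, dump() is usually enough: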

    from marshmallow import pprint
    
    user = User(name="Monty", email="monty@python.org")
    schema = UserSchema()
    result = schema.dump(user)
    pprint(result.data)
    # {"name": "Monty",
    #  "email": "monty@python.org",
    #  "created_at": "2014-08-17T14:54:16.049594+00:00"}
    

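    Filtering Output

    You do not need to output every field of an object every time. Use the only parameter to list the fields you want, which is very common in practice: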

    summary_schema = UserSchema(only=('name', 'email'))
    summary_schema.dump(user).data
    # {"name": "Monty", "email": "monty@python.org"}
    

    You can also use the exclude parameter to leave out fields you do not want in the output.

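    Deserializing

    The counterpart of dump() is load(), which validates a dict or similar input and converts it into an application-level data structure (by default, a dict of Python values):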

    from pprint import pprint
    
    user_data = {
        'created_at': '2014-08-11T05:26:03.869245',
        'email': u'ken@yahoo.com',
        'name': u'Ken'
    }
    schema = UserSchema()
    result = schema.load(user_data)
    pprint(result.data)
    # {'name': 'Ken',
    #  'email': 'ken@yahoo.com',
    #  'created_at': datetime.datetime(2014, 8, 11, 5, 26, 3, 869245)}
    

    For deserialization, turning the incoming dict into an object is more useful. In marshmallow you implement the dict -> object step yourself and mark the method with the post_load decorator:

    from marshmallow import Schema, fields, post_load
    
    class UserSchema(Schema):
        name = fields.Str()
        email = fields.Email()
        created_at = fields.DateTime()
    
        @post_load
        def make_user(self, data):
            return User(**data)
    

    Now every call to load() runs make_user and returns a User object:

    user_data = {
        'name': 'Ronnie',
        'email': 'ronnie@stones.com'
    }
    schema = UserSchema()
    result = schema.load(user_data)
    result.data  # => <User(name='Ronnie')>
    

    Tip: just as dumps() complements dump(), there is a loads() method for string -> object, which is handy in some simple cases.

    Objects <-> List

    The examples above serialize and deserialize a single object. To handle a collection of objects, pass many=True, either to the schema constructor or to dump():

    user1 = User(name="Mick", email="mick@stones.com")
    user2 = User(name="Keith", email="keith@stones.com")
    users = [user1, user2]
    
    # Option 1:
    schema = UserSchema(many=True)
    result = schema.dump(users)
    
    # Option 2:
    schema = UserSchema()
    result = schema.dump(users, many=True)
    result.data
    
    # [{'name': u'Mick',
    #   'email': u'mick@stones.com',
    #   'created_at': '2014-08-17T14:58:57.600623+00:00'},
    #  {'name': u'Keith',
    #   'email': u'keith@stones.com',
    #   'created_at': '2014-08-17T14:58:57.600623+00:00'}]
    

    Validation

    Schema.load() (and loads()) also return a dictionary of validation errors. Some fields, such as Email and URL, have built-in validators.

    data, errors = UserSchema().load({'email': 'foo'})
    errors  # => {'email': ['"foo" is not a valid email address.']}
    # OR, equivalently
    result = UserSchema().load({'email': 'foo'})
    result.errors  # => {'email': ['"foo" is not a valid email address.']}
    

    When validating a collection, the errors dictionary is keyed by the index of each invalid item:

    class BandMemberSchema(Schema):
        name = fields.String(required=True)
        email = fields.Email()
    
    user_data = [
        {'email': 'mick@stones.com', 'name': 'Mick'},
        {'email': 'invalid', 'name': 'Invalid'},  # invalid email
        {'email': 'keith@stones.com', 'name': 'Keith'},
        {'email': 'charlie@stones.com'},  # missing "name"
    ]
    
    result = BandMemberSchema(many=True).load(user_data)
    result.errors
    # {1: {'email': ['"invalid" is not a valid email address.']},
    #  3: {'name': ['Missing data for required field.']}}
    

    You can pass a validate argument to a built-in field to customize validation. Its value may be a function, a lambda, or any object that defines __call__:

    class ValidatedUserSchema(UserSchema):
        # NOTE: This is a contrived example.
        # You could use marshmallow.validate.Range instead of an anonymous function here
        age = fields.Number(validate=lambda n: 18 <= n <= 40)
    
    in_data = {'name': 'Mick', 'email': 'mick@stones.com', 'age': 71}
    result = ValidatedUserSchema().load(in_data)
    result.errors  # => {'age': ['Validator <lambda>(71.0) is False']}
    

    If the validator you pass raises a ValidationError, its message is stored when validation fails:

    from marshmallow import Schema, fields, ValidationError
    
    def validate_quantity(n):
        if n < 0:
            raise ValidationError('Quantity must be greater than 0.')
        if n > 30:
            raise ValidationError('Quantity must not be greater than 30.')
    
    class ItemSchema(Schema):
        quantity = fields.Integer(validate=validate_quantity)
    
    in_data = {'quantity': 31}
    result, errors = ItemSchema().load(in_data)
    errors  # => {'quantity': ['Quantity must not be greater than 30.']}
    

    Note:
    If you need to run several validators on one field, pass a collection (list, tuple, or generator) of callables.

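    Note 2:
    Schema.dump() also returns an errors dictionary, which contains any ValidationErrors raised during serialization. However, required, allow_none, validate, @validates, and @validates_schema only apply during deserialization, that is, Schema.load().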

    Field Validators as Methods

    It is often convenient to write validators as methods. Use the validates decorator to register a method as the validator for a field:

    from marshmallow import fields, Schema, validates, ValidationError
    class ItemSchema(Schema):
        quantity = fields.Integer()
    
        @validates('quantity')
        def validate_quantity(self, value):
            if value < 0:
                raise ValidationError('Quantity must be greater than 0.')
            if value > 30:
                raise ValidationError('Quantity must not be greater than 30.')
    

    strict Mode

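    If you pass strict=True to the Schema constructor or set it in class Meta, a ValidationError is raised when invalid data is passed in, instead of the errors being returned. The dictionary of validation errors is available as ValidationError.messages.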

    Required Fields

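    You can make a field required by passing required=True. An error is recorded when the input to Schema.load() is missing that field.
    To customize the error message for a required field, pass an error_messages argument: a dict whose required key holds the message.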

    class UserSchema(Schema):
        name = fields.String(required=True)
        age = fields.Integer(
            required=True,
            error_messages={'required': 'Age is required.'}
        )
        city = fields.String(
            required=True,
            error_messages={'required': {'message': 'City required', 'code': 400}}
        )
        email = fields.Email()
    
    data, errors = UserSchema().load({'email': 'foo@bar.com'})
    errors
    # {'name': ['Missing data for required field.'],
    #  'age': ['Age is required.'],
    #  'city': {'message': 'City required', 'code': 400}}
    

    Partial Loading

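    In a RESTful design, data is updated with the HTTP methods PUT and PATCH. With PUT the client sends the complete resource to the server; with PATCH it sends only the parts that changed. Fields declared with required can therefore make a PATCH payload fail marshmallow validation. Partial loading avoids this.

    To load partially, pass a partial argument listing the fields that may be missing, either to load() or to the schema constructor: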

    class UserSchema(Schema):
        name = fields.String(required=True)
        age = fields.Integer(required=True)
    
    data, errors = UserSchema().load({'age': 42}, partial=('name',))
    # OR UserSchema(partial=('name',)).load({'age': 42})
    data, errors  # => ({'age': 42}, {})
    

    Schema.validate

    If you only want to validate data with a Schema, without producing the deserialized result, use Schema.validate().

    errors = UserSchema().validate({'name': 'Ronnie', 'email': 'invalid-email'})
    errors  # {'email': ['"invalid-email" is not a valid email address.']}
    

    Specifying Attribute Names

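    By default, a Schema serializes the attributes of the input object that have the same names as its fields. Sometimes you want a field name that differs from the attribute name. In that case, state explicitly which attribute the field should read its value from: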

    class UserSchema(Schema):
        name = fields.String()
        email_addr = fields.String(attribute="email")
        date_created = fields.DateTime(attribute="created_at")
    
    user = User('Keith', email='keith@stones.com')
    ser = UserSchema()
    result, errors = ser.dump(user)
    pprint(result)
    # {'name': 'Keith',
    #  'email_addr': 'keith@stones.com',
    #  'date_created': '2014-08-17T14:58:57.600623+00:00'}
    

    Specifying Deserialization Keys

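    By default, a Schema deserializes an input dict using the same keys as its field names. If the incoming data does not match your schema, pass load_from to name an additional key to load from (the original field name still works, and takes precedence):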

    class UserSchema(Schema):
        name = fields.String()
        email = fields.Email(load_from='emailAddress')
    
    data = {
        'name': 'Mike',
        'emailAddress': 'foo@bar.com'
    }
    s = UserSchema()
    result, errors = s.load(data)
    #{'name': u'Mike',
    # 'email': 'foo@bar.com'}   
    

    Specifying Serialization Keys

    To serialize a field under a different key, use dump_to. It works like load_from, in the opposite direction:

    class UserSchema(Schema):
        name = fields.String(dump_to='TheName')
        email = fields.Email(load_from='CamelCasedEmail', dump_to='CamelCasedEmail')
    
    data = {
        'name': 'Mike',
        'email': 'foo@bar.com'
    }
    s = UserSchema()
    result, errors = s.dump(data)
    #{'TheName': u'Mike',
    # 'CamelCasedEmail': 'foo@bar.com'}
    

    “Read-only” and “Write-only” Fields

    You can make a field apply only to load() (write-only) or only to dump() (read-only):

    class UserSchema(Schema):
        name = fields.Str()
        # password is "write-only"
        password = fields.Str(load_only=True)
        # created_at is "read-only"
        created_at = fields.DateTime(dump_only=True)
    

    Nesting Schemas

    When a model has a foreign key, how is the referenced object defined in a Schema? For example, a Blog has a User object as its author:

    import datetime as dt
    
    class User(object):
        def __init__(self, name, email):
            self.name = name
            self.email = email
            self.created_at = dt.datetime.now()
            self.friends = []
            self.employer = None
    
    class Blog(object):
        def __init__(self, title, author):
            self.title = title
            self.author = author  # A User object
    

    Use a Nested field to represent the relationship, passing in the nested schema class:

    from marshmallow import Schema, fields, pprint
    
    class UserSchema(Schema):
        name = fields.String()
        email = fields.Email()
        created_at = fields.DateTime()
    
    class BlogSchema(Schema):
        title = fields.String()
        author = fields.Nested(UserSchema)
    

    Serializing a blog now includes the user's data:

    user = User(name="Monty", email="monty@python.org")
    blog = Blog(title="Something Completely Different", author=user)
    result, errors = BlogSchema().dump(blog)
    pprint(result)
    # {'title': u'Something Completely Different',
    #  'author': {'name': u'Monty',
    #             'email': u'monty@python.org',
    #             'created_at': '2014-08-17T14:58:57.600623+00:00'}}
    

    If the field holds a collection of objects, pass many=True when declaring it:

    collaborators = fields.Nested(UserSchema, many=True)
    

    If the nested object is a self-reference, pass 'self' as the first argument to Nested.

    Specifying Which Fields to Nest

    To keep only some fields of the nested object in the serialized output, use the only parameter:

    class BlogSchema2(Schema):
        title = fields.String()
        author = fields.Nested(UserSchema, only=["email"])
    
    schema = BlogSchema2()
    result, errors = schema.dump(blog)
    pprint(result)
    # {
    #     'title': u'Something Completely Different',
    #     'author': {'email': u'monty@python.org'}
    # }
    

    When the fields you want are nested several levels deep, use dot notation to select them:

    class SiteSchema(Schema):
        blog = fields.Nested(BlogSchema2)
    
    schema = SiteSchema(only=['blog.author.email'])
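    # ``site`` is not defined in this article; it is assumed to be an object
    # whose ``blog`` attribute is the Blog created above.
    result, errors = schema.dump(site)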
    pprint(result)
    # {
    #     'blog': {
    #         'author': {'email': u'monty@python.org'}
    #     }
    # }
    

    Note

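    If you pass a single field name as a string to only, the nested value is that field's value alone, or a flat list of those values when many=True.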

    class UserSchema(Schema):
        name = fields.String()
        email = fields.Email()
        friends = fields.Nested('self', only='name', many=True)
    # ... create ``user`` ...
    result, errors = UserSchema().dump(user)
    pprint(result)
    # {
    #     "name": "Steve",
    #     "email": "steve@example.com",
    #     "friends": ["Mike", "Joe"]
    # }
    

    You can also use exclude here to drop fields you do not need, and it accepts dot notation as well.

    Two-way Nesting

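    When two objects nest each other, you can refer to the nested schema by its class name as a string instead of by the class itself. This lets you nest a schema that has not been defined yet: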

    class AuthorSchema(Schema):
        # Make sure to use the 'only' or 'exclude' params
        # to avoid infinite recursion
        books = fields.Nested('BookSchema', many=True, exclude=('author', ))
        class Meta:
            fields = ('id', 'name', 'books')
    
    class BookSchema(Schema):
        author = fields.Nested(AuthorSchema, only=('id', 'name'))
        class Meta:
            fields = ('id', 'title', 'author')
    

    For example, an Author has many books, and Book has a many-to-one relationship with Author.

    from marshmallow import pprint
    from mymodels import Author, Book
    
    author = Author(name='William Faulkner')
    book = Book(title='As I Lay Dying', author=author)
    book_result, errors = BookSchema().dump(book)
    pprint(book_result, indent=2)
    # {
    #   "id": 124,
    #   "title": "As I Lay Dying",
    #   "author": {
    #     "id": 8,
    #     "name": "William Faulkner"
    #   }
    # }
    
    author_result, errors = AuthorSchema().dump(author)
    pprint(author_result, indent=2)
    # {
    #   "id": 8,
    #   "name": "William Faulkner",
    #   "books": [
    #     {
    #       "id": 124,
    #       "title": "As I Lay Dying"
    #     }
    #   ]
    # }
    

    Nesting A Schema Within Itself

    To nest a schema within itself, pass "self" (with the quotes) to the Nested constructor:

    class UserSchema(Schema):
        name = fields.String()
        email = fields.Email()
        friends = fields.Nested('self', many=True)
        # Use the 'exclude' argument to avoid infinite recursion
        employer = fields.Nested('self', exclude=('employer', ), default=None)
    
    user = User("Steve", 'steve@example.com')
    user.friends.append(User("Mike", 'mike@example.com'))
    user.friends.append(User('Joe', 'joe@example.com'))
    user.employer = User('Dirk', 'dirk@example.com')
    result = UserSchema().dump(user)
    pprint(result.data, indent=2)
    # {
    #     "name": "Steve",
    #     "email": "steve@example.com",
    #     "friends": [
    #         {
    #             "name": "Mike",
    #             "email": "mike@example.com",
    #             "friends": [],
    #             "employer": null
    #         },
    #         {
    #             "name": "Joe",
    #             "email": "joe@example.com",
    #             "friends": [],
    #             "employer": null
    #         }
    #     ],
    #     "employer": {
    #         "name": "Dirk",
    #         "email": "dirk@example.com",
    #         "friends": []
    #     }
    # }
    


        Article link: https://www.haomeiwen.com/subject/cikkvxtx.html