Serialization Source Code in scrapy_redis and Its Use in Program Design


Author: Python之战 | Published 2019-03-17 21:32

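    Serialization is the process of converting an object's state into a form that can be stored or transmitted. During serialization, an object writes its current state to temporary or persistent storage. The object can later be recreated by reading its state back from storage, that is, by deserializing it.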

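    In scrapy_redis, a Request object is first checked for duplicates by the DupeFilter and then handed to the scheduler, which stores it in Redis. This raises a problem: a Request is a Python object and Redis cannot store such an object directly, so the request has to be serialized before it is stored.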
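
    The idea can be sketched with the standard library alone. In this minimal illustration a plain dict stands in for the Request and a Python list stands in for the Redis list; `request_dict` and `fake_redis_queue` are hypothetical names chosen for the example and are not part of scrapy_redis.

```python
import pickle

# Hypothetical stand-in for a Request: only plain, picklable fields.
request_dict = {
    "url": "https://example.com/page/1",
    "method": "GET",
    "meta": {"depth": 1},
}

# Hypothetical stand-in for a Redis list, which stores bytes rather than objects.
fake_redis_queue = []

# Serialize the request before storing it.
fake_redis_queue.append(pickle.dumps(request_dict, protocol=-1))

# Deserialize it again after fetching it from the queue.
restored = pickle.loads(fake_redis_queue.pop(0))
print(restored == request_dict)
```

    The stored value is an opaque byte string, which is something Redis can hold without knowing anything about the original object.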

    The serialization module in scrapy_redis is as follows:

    # The module is imported like this; its source follows.
    from scrapy_redis import picklecompat

    """A pickle wrapper module with protocol=-1 by default."""
    
    try:
        import cPickle as pickle  # PY2
    except ImportError:
        import pickle
    
    def loads(s):
        return pickle.loads(s)
    
    def dumps(obj):
        return pickle.dumps(obj, protocol=-1)
    
    

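    Python 3 no longer has cPickle and uses the pickle module directly, which is why the import falls back to pickle. The module's two most important functions are the ones shown above: dumps for serialization and loads for deserialization. A serialized object can be stored in a database or in a file and restored quickly.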
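
    A short round trip shows how the two wrapper functions behave. This is a sketch that redefines `dumps` and `loads` locally so it runs without scrapy_redis installed; the sample `data` dict is made up for the illustration.

```python
import pickle

def dumps(obj):
    # A negative protocol number selects the highest protocol available.
    return pickle.dumps(obj, protocol=-1)

def loads(s):
    return pickle.loads(s)

data = {"spider": "demo", "pages": [1, 2, 3]}
blob = dumps(data)

print(type(blob).__name__)  # bytes on Python 3
print(loads(blob) == data)

# protocol=-1 and pickle.HIGHEST_PROTOCOL yield the same byte stream here.
same_as_highest = blob == pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
print(same_as_highest)
```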

    The Memento pattern from design patterns is a natural fit for this approach (see《python设计模式(十九):备忘录模式》). The following objects and data types can be pickled:

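    • None, True, and False
    • Integers, long integers, floating-point numbers, and complex numbers
    • Normal strings and Unicode strings
    • Tuples, lists, sets, and dictionaries that contain only picklable objects
    • Functions defined at the top level of a module
    • Built-in functions defined at the top level of a module
    • Classes defined at the top level of a module
    • Instances of such classes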
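
    The list above can be checked with a quick round trip. This is a sketch for Python 3 in which `samples` is an arbitrary selection; `json.dumps` and `fractions.Fraction` are used only as convenient examples of a top-level function and a top-level class from the standard library.

```python
import json
import pickle
from fractions import Fraction

samples = [
    None, True, False,
    42, 3.14, 2 + 3j,
    "text", b"bytes",
    (1, 2), [1, 2], {1, 2}, {"k": "v"},
    json.dumps,      # function defined at the top level of a module
    len,             # built-in function
    Fraction,        # class defined at the top level of a module
    Fraction(1, 3),  # instance of such a class
]

# Each object should come back equal to the original after a round trip.
results = [pickle.loads(pickle.dumps(obj, protocol=-1)) == obj for obj in samples]
print(all(results))
```

    Functions and classes are pickled by reference, meaning only their module and name are stored, which is why they must be reachable at the top level of an importable module.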

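    Attempting to pickle an unpicklable object raises a PicklingError exception; when this happens, an unspecified number of bytes may already have been written to the underlying file. Attempting to pickle a highly recursive data structure may exceed the maximum recursion depth, in which case a RuntimeError is raised.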
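
    The failure case is easy to reproduce with a lambda, which cannot be looked up by name and is therefore not picklable. The sketch below assumes Python 3; `square` and `outcome` are names invented for the example, and AttributeError is caught alongside PicklingError because some Python versions raise it for objects that cannot be located.

```python
import pickle

# A lambda has no importable name, so pickle cannot store a reference to it.
square = lambda x: x * x

try:
    pickle.dumps(square)
    outcome = "pickled"
except (pickle.PicklingError, AttributeError) as exc:
    outcome = "failed: " + type(exc).__name__

print(outcome)
```

    Because the error is raised by dumps before anything touches a file, pickling to a byte string first and writing afterwards is one way to avoid leaving a partially written file behind.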

    Module API

    The descriptions below are quoted from the Python 2 documentation. In Python 3 the pickled data is a bytes object rather than a string, and the default protocol is higher than 0.

    pickle.dump(obj, file[, protocol])

    • Write a pickled representation of obj to the open file object file. This is equivalent to Pickler(file, protocol).dump(obj).
      If the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.
      Changed in version 2.3: Introduced the protocol parameter.
      file must have a write() method that accepts a single string argument. It can thus be a file object opened for writing, a StringIO object, or any other custom object that meets this interface.

    pickle.load(file)

    • Read a string from the open file object file and interpret it as a pickle data stream, reconstructing and returning the original object hierarchy. This is equivalent to Unpickler(file).load().
      file must have two methods, a read() method that takes an integer argument, and a readline() method that requires no arguments. Both methods should return a string. Thus file can be a file object opened for reading, a StringIO object, or any other custom object that meets this interface.
      This function automatically determines whether the data stream was written in binary mode or not.

    pickle.dumps(obj[, protocol])

    • Return the pickled representation of the object as a string, instead of writing it to a file.
      If the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.
      Changed in version 2.3: The protocol parameter was added.

    pickle.loads(string)

    • Read a pickled object hierarchy from a string. Characters in the string past the pickled object's representation are ignored.
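
    The file-based functions can be tried without touching the disk. The following sketch for Python 3 uses io.BytesIO as the file-like object; `record`, `buffer`, and `trailing` are illustrative names only.

```python
import io
import pickle

record = {"url": "https://example.com", "retries": 2}

# Any object with a write() method can receive the pickled data.
buffer = io.BytesIO()
pickle.dump(record, buffer, protocol=pickle.HIGHEST_PROTOCOL)
pickle.dump([1, 2, 3], buffer, protocol=pickle.HIGHEST_PROTOCOL)

# Rewind, then read the objects back in the order they were written.
buffer.seek(0)
first = pickle.load(buffer)   # each load() call reads one pickled object
second = pickle.load(buffer)

# loads() stops at the end of the first pickled object and ignores the rest.
trailing = pickle.loads(pickle.dumps("x") + b"extra bytes")

print(first, second, trailing)
```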

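    As for application scenarios, the most common ones are:

    restoring the previous state when a program restarts, session storage, and transferring objects over a network.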
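
    The first scenario, restoring state after a restart, might look like the sketch below. The helpers `save_state` and `load_state`, the file name `crawl_state.pkl`, and the shape of the state dict are all assumptions made for the illustration; a temporary directory is used so the example leaves nothing behind in the working directory.

```python
import os
import pickle
import tempfile

def save_state(state, path):
    # Pickle to bytes first so a failed dump never leaves a half-written file.
    payload = pickle.dumps(state, protocol=-1)
    with open(path, "wb") as fh:
        fh.write(payload)

def load_state(path, default):
    # Fall back to a default state on the very first run.
    if not os.path.exists(path):
        return default
    with open(path, "rb") as fh:
        return pickle.load(fh)

state_path = os.path.join(tempfile.mkdtemp(), "crawl_state.pkl")

# First run: no file yet, so the default state is used and then saved.
state = load_state(state_path, default={"seen": set(), "page": 0})
state["seen"].add("https://example.com/1")
state["page"] += 1
save_state(state, state_path)

# Simulated restart: the saved state is read back from disk.
resumed = load_state(state_path, default=None)
print(resumed)
```

    For session storage and network transfer the same dumps and loads calls apply, with the caveat that unpickling can execute arbitrary code, so only data from a trusted source should be loaded.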

