Base64

Author: 犯色戒的和尚 | Published 2019-05-17 17:02

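    Base64 is one of the most common encodings used on the web to transmit 8-bit data: it represents binary data using 64 printable characters. See RFC 2045 through RFC 2049, which contain the detailed MIME specification.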

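    Since iOS 7 (and macOS 10.9), Apple's Foundation framework has provided Base64 encoding and decoding methods as an NSData category: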

    @interface NSData (NSDataBase64Encoding)
    
    /* Create an NSData from a Base-64 encoded NSString using the given options. By default, returns nil when the input is not recognized as valid Base-64.
    */
    - (nullable instancetype)initWithBase64EncodedString:(NSString *)base64String options:(NSDataBase64DecodingOptions)options API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));
    
    /* Create a Base-64 encoded NSString from the receiver's contents using the given options.
    */
    - (NSString *)base64EncodedStringWithOptions:(NSDataBase64EncodingOptions)options API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));
    
    /* Create an NSData from a Base-64, UTF-8 encoded NSData. By default, returns nil when the input is not recognized as valid Base-64.
    */
    - (nullable instancetype)initWithBase64EncodedData:(NSData *)base64Data options:(NSDataBase64DecodingOptions)options API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));
    
    /* Create a Base-64, UTF-8 encoded NSData from the receiver's contents using the given options.
    */
    - (NSData *)base64EncodedDataWithOptions:(NSDataBase64EncodingOptions)options API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));
    
    @end
    

    Still, to really understand Base64, it helps to learn how encoding and decoding work and to walk through an actual implementation.

    How Base64 encoding works

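    Base64 encodes binary data by converting every three 8-bit bytes into four 6-bit units.

    Example: s 1 3

    • Input: s 1 3

    • ASCII codes: 115 49 51

    • In binary: 01110011 00110001 00110011 (three 8-bit bytes)

    • Split every three 8-bit bytes into four 6-bit units:
      Before: 01110011 00110001 00110011
      After:  011100 110011 000100 110011

    • Pad to 8 bits: a byte holds 8 bits, so two high-order zero bits are added to each 6-bit unit

    • After padding: 00011100 00110011 00000100 00110011

    • As decimal values (indices into the alphabet): 28 51 4 51

    • Look up in the alphabet: c z E z

    So the encoded result is: czEz

    s13 ====base64===> czEz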
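    The worked example above can be reproduced in plain C. This is a minimal sketch; `encode_group` and `kAlphabet` are hypothetical names, not from the original source. It packs the three bytes into one 24-bit value and reads it back six bits at a time, which is equivalent to concatenating the bit strings and regrouping them.

```c
/* Standard Base64 alphabet, identical to kBase64EncodeChars below. */
static const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper (not from the article's source): encodes exactly
   three input bytes. idx receives the four 6-bit values (0..63) and out
   receives the four characters plus a terminating NUL. */
void encode_group(const unsigned char in[3], unsigned idx[4], char out[5]) {
    /* Step 1: concatenate the three 8-bit bytes into one 24-bit value. */
    unsigned long bits = ((unsigned long)in[0] << 16) |
                         ((unsigned long)in[1] << 8) |
                         (unsigned long)in[2];
    for (int i = 0; i < 4; i++) {
        /* Step 2: cut out the i-th 6-bit group, counting from the left.
           Storing it in an unsigned int is the "pad with two high zero
           bits" step: the value is simply 0..63. */
        idx[i] = (unsigned)((bits >> (18 - 6 * i)) & 0x3F);
        /* Step 3: look the value up in the alphabet. */
        out[i] = kAlphabet[idx[i]];
    }
    out[4] = '\0';
}
```

    Called with the bytes of "s13", it stores the indices 28, 51, 4, 51 and the string "czEz", matching the steps above.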

    That is how Base64 encoding works; decoding simply performs the same steps in reverse.
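    A minimal decoding sketch for one four-character group, assuming the standard alphabet, no "=" padding, and an ASCII character set (so the letter and digit ranges are contiguous). `decode_char` and `decode_group` are hypothetical helpers: each character is mapped back to its 6-bit value, the four values are concatenated into 24 bits, and those bits are split into three bytes.

```c
/* Hypothetical helper: maps one character of the standard alphabet back
   to its 6-bit value, or returns -1 for anything else (including '='). */
int decode_char(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Hypothetical helper: decodes four unpadded characters into three bytes
   by running the encoding steps backwards. Returns 0 on success, -1 if
   any character is outside the alphabet. */
int decode_group(const char in[4], unsigned char out[3]) {
    unsigned long bits = 0;
    for (int i = 0; i < 4; i++) {
        int v = decode_char(in[i]);
        if (v < 0) return -1;
        /* Append only the 6 significant bits, dropping the 2 padding zeros. */
        bits = (bits << 6) | (unsigned long)v;
    }
    /* Split the 24-bit value back into three 8-bit bytes. */
    out[0] = (unsigned char)(bits >> 16);
    out[1] = (unsigned char)((bits >> 8) & 0xFF);
    out[2] = (unsigned char)(bits & 0xFF);
    return 0;
}
```

    For example, decoding "czEz" yields the bytes of "s13". A complete decoder would also have to handle "=" padding, line breaks, and invalid input.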

    Encoding alphabets

    The standard Base64 alphabet:

    static const char *kBase64EncodeChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    
    

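    However, standard Base64 is not well suited to URLs: URL encoders turn its "/" and "+" characters into "%XX" escapes, and those "%" signs need yet another conversion when stored in a database, because ANSI SQL uses "%" as a wildcard (in LIKE patterns).

    To avoid this, a URL-safe variant of Base64 can be used. It replaces the "+" and "/" of standard Base64 with "-" and "_" respectively; the trailing "=" padding is optional and often omitted, since "=" is typically percent-encoded in URLs as well. This removes the extra conversions during URL encoding/decoding and database storage, keeps the encoded data from growing in the process, and gives object identifiers a uniform format across databases, forms, and so on.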

    static const char *kWebSafeBase64EncodeChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
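    Since the two alphabets differ only at indices 62 and 63, standard output can also be turned into the URL-safe form after encoding. A minimal sketch with a hypothetical helper `to_web_safe`, which also treats dropping the padding as a separate choice:

```c
#include <stddef.h>

/* Hypothetical helper: converts a NUL-terminated standard Base64 string
   to the URL-safe alphabet in place, replacing '+' with '-' and '/' with
   '_'. If strip_padding is nonzero, trailing '=' characters are removed.
   Returns the new length of the string. */
size_t to_web_safe(char *s, int strip_padding) {
    size_t len = 0;
    for (; s[len] != '\0'; len++) {
        if (s[len] == '+') s[len] = '-';
        else if (s[len] == '/') s[len] = '_';
    }
    if (strip_padding) {
        while (len > 0 && s[len - 1] == '=') {
            s[--len] = '\0';
        }
    }
    return len;
}
```

    For example, "+/8=" (the encoding of the bytes 0xFB 0xFF) becomes "-_8=", or "-_8" with the padding stripped.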
    
    

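    Encoding rules

    ① Turn every 3 input bytes into 4 characters.
    ② Insert a line break after every 76 output characters (a MIME requirement from RFC 2045).
    ③ Handle the tail: if the input length is not a multiple of 3, pad the output with "=" so that its length is a multiple of 4.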
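    Rule ② comes from MIME, which limits encoded lines to 76 characters. The baseEncode code analyzed below does not insert line breaks, while Apple's `base64EncodedStringWithOptions:` offers this through the `NSDataBase64Encoding76CharacterLineLength` option. As a sketch of the rule on its own, this hypothetical `wrap_mime` helper inserts CRLF after every 76 characters of already-encoded text:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'd copy of the encoded
   string s with CRLF inserted after every 76 characters, as MIME
   (RFC 2045) requires. No CRLF is added after the last line. The caller
   must free() the result; returns NULL if allocation fails. */
char *wrap_mime(const char *s) {
    const size_t kLineLen = 76;
    size_t len = strlen(s);
    /* Number of line breaks needed between full lines. */
    size_t breaks = len ? (len - 1) / kLineLen : 0;
    char *out = malloc(len + 2 * breaks + 1);
    if (!out) return NULL;
    char *p = out;
    for (size_t i = 0; i < len; i++) {
        if (i > 0 && i % kLineLen == 0) {
            *p++ = '\r';
            *p++ = '\n';
        }
        *p++ = s[i];
    }
    *p = '\0';
    return out;
}
```

    A 200-character input, for example, becomes three lines (76 + 76 + 48 characters) joined by two CRLF pairs, 204 characters in total.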

    Source code analysis:

    The core encoding methods (the code below is from GTMBase64 in Google Toolbox for Mac):

    +(NSData *)encodeData:(NSData *)data {
        return [self baseEncode:[data bytes]
                         length:[data length]
                        charset:kBase64EncodeChars
                         padded:YES];
    }
    
    +(NSData *)baseEncode:(const void *)bytes
                   length:(NSUInteger)length
                  charset:(const char *)charset
                   padded:(BOOL)padded {
        // how big could it be?
        NSUInteger maxLength = CalcEncodedLength(length, padded);
        // make space
        NSMutableData *result = [NSMutableData data];
        [result setLength:maxLength];
        // do it
        NSUInteger finalLength = [self baseEncode:bytes
                                           srcLen:length
                                        destBytes:[result mutableBytes]
                                          destLen:[result length]
                                          charset:charset
                                           padded:padded];
        if (finalLength) {
            _GTMDevAssert(finalLength == maxLength, @"how did we calc the length wrong?");
        } else {
            // shouldn't happen, this means we ran out of space
            result = nil;
        }
        return result;
    }
    

    The code above first computes the size of the encoded output:

    // how big could it be?
    NSUInteger maxLength = CalcEncodedLength(length, padded);
    

    Here is how that size is calculated:

    GTM_INLINE NSUInteger CalcEncodedLength(NSUInteger srcLen, BOOL padded) {
        NSUInteger intermediate_result = 8 * srcLen + 5;
        NSUInteger len = intermediate_result / 6;
        if (padded) {
            len = ((len + 3) / 4) * 4;
        }
        return len;
    }
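    CalcEncodedLength can be checked with a direct C port. This is a sketch; `calc_encoded_length` is a hypothetical name, with NSUInteger and BOOL replaced by size_t and int. The "+ 5" turns integer division into rounding up, so the unpadded length is ceil(8n / 6), and the padded length works out to 4 * ceil(n / 3):

```c
#include <stddef.h>

/* C port of CalcEncodedLength above (NSUInteger -> size_t, BOOL -> int). */
size_t calc_encoded_length(size_t src_len, int padded) {
    /* (8n + 5) / 6 rounds 8n / 6 up: every 6 bits of input, including a
       final partial group, becomes one output character. */
    size_t len = (8 * src_len + 5) / 6;
    if (padded) {
        /* Round up to a multiple of 4 so '=' can fill the last block. */
        len = ((len + 3) / 4) * 4;
    }
    return len;
}
```

    So 1, 2, or 3 input bytes all give 4 padded characters (2, 3, and 4 without padding), and 4 bytes give 8 (6 without padding).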
    

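    The encoding implementation itself (split into several parts here to make it easier to explain). The first part checks the arguments and sets up the read and write cursors: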

    +(NSUInteger)baseEncode:(const char *)srcBytes
                     srcLen:(NSUInteger)srcLen
                  destBytes:(char *)destBytes
                    destLen:(NSUInteger)destLen
                    charset:(const char *)charset
                     padded:(BOOL)padded {
        if (!srcLen || !destLen || !srcBytes || !destBytes) {
            return 0;
        }
        
        char *curDest = destBytes;
        const unsigned char *curSrc = (const unsigned char *)(srcBytes);
    

    The next part encodes each full unit, turning 3 bytes into 4 characters:

        // Three bytes of data encodes to four characters of cyphertext.
        // So we can pump through three-byte chunks atomically.
        while (srcLen > 2) {
            // space?
            _GTMDevAssert(destLen >= 4, @"our calc for encoded length was wrong");
            curDest[0] = charset[curSrc[0] >> 2];
            curDest[1] = charset[((curSrc[0] & 0x03) << 4) + (curSrc[1] >> 4)];
            curDest[2] = charset[((curSrc[1] & 0x0f) << 2) + (curSrc[2] >> 6)];
            curDest[3] = charset[curSrc[2] & 0x3f];
            
            curDest += 4;
            curSrc += 3;
            srcLen -= 3;
            destLen -= 4;
        }
    

    The last part encodes the remaining 1 or 2 bytes and appends the "=" padding:

        // now deal with the tail (<=2 bytes)
        switch (srcLen) {
            case 0:
                // Nothing left; nothing more to do.
                break;
            case 1:
                // One byte left: this encodes to two characters, and (optionally)
                // two pad characters to round out the four-character cypherblock.
                _GTMDevAssert(destLen >= 2, @"our calc for encoded length was wrong");
                curDest[0] = charset[curSrc[0] >> 2];
                curDest[1] = charset[(curSrc[0] & 0x03) << 4];
                curDest += 2;
                destLen -= 2;
                if (padded) {
                    _GTMDevAssert(destLen >= 2, @"our calc for encoded length was wrong");
                    curDest[0] = kBase64PaddingChar;
                    curDest[1] = kBase64PaddingChar;
                    curDest += 2;
                }
                break;
            case 2:
                // Two bytes left: this encodes to three characters, and (optionally)
                // one pad character to round out the four-character cypherblock.
                _GTMDevAssert(destLen >= 3, @"our calc for encoded length was wrong");
                curDest[0] = charset[curSrc[0] >> 2];
                curDest[1] = charset[((curSrc[0] & 0x03) << 4) + (curSrc[1] >> 4)];
                curDest[2] = charset[(curSrc[1] & 0x0f) << 2];
                curDest += 3;
                destLen -= 3;
                if (padded) {
                    _GTMDevAssert(destLen >= 1, @"our calc for encoded length was wrong");
                    curDest[0] = kBase64PaddingChar;
                    curDest += 1;
                }
                break;
        }
        // return the length
        return (curDest - destBytes);
    }
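    To experiment with the three fragments outside Xcode, here is a sketch of them combined into one plain-C function. `base64_encode` and `base64_encode_str` are hypothetical names; the `_GTMDevAssert` checks are replaced by explicit bounds checks that return 0, and the tail `switch` is written as `if`/`else if`. The bit expressions are unchanged from the original.

```c
#include <stddef.h>
#include <string.h>

static const char kBase64EncodeChars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static const char kBase64PaddingChar = '=';

/* Plain-C port of baseEncode:srcLen:destBytes:destLen:charset:padded:.
   Writes at most dest_len characters (no NUL) and returns the number
   written, or 0 if the input is empty or dest is too small. */
size_t base64_encode(const unsigned char *src, size_t src_len,
                     char *dest, size_t dest_len,
                     const char *charset, int padded) {
    if (!src_len || !dest_len || !src || !dest) return 0;
    char *cur = dest;
    /* Full 3-byte groups -> 4 characters each. */
    while (src_len > 2) {
        if (dest_len < 4) return 0;
        cur[0] = charset[src[0] >> 2];
        cur[1] = charset[((src[0] & 0x03) << 4) + (src[1] >> 4)];
        cur[2] = charset[((src[1] & 0x0f) << 2) + (src[2] >> 6)];
        cur[3] = charset[src[2] & 0x3f];
        cur += 4; src += 3; src_len -= 3; dest_len -= 4;
    }
    /* Tail of 1 or 2 bytes: the missing low bits are zero-filled. */
    if (src_len == 1) {
        if (dest_len < (padded ? 4u : 2u)) return 0;
        cur[0] = charset[src[0] >> 2];
        cur[1] = charset[(src[0] & 0x03) << 4];
        cur += 2;
        if (padded) { cur[0] = cur[1] = kBase64PaddingChar; cur += 2; }
    } else if (src_len == 2) {
        if (dest_len < (padded ? 4u : 3u)) return 0;
        cur[0] = charset[src[0] >> 2];
        cur[1] = charset[((src[0] & 0x03) << 4) + (src[1] >> 4)];
        cur[2] = charset[(src[1] & 0x0f) << 2];
        cur += 3;
        if (padded) { cur[0] = kBase64PaddingChar; cur += 1; }
    }
    return (size_t)(cur - dest);
}

/* Convenience wrapper: encodes a NUL-terminated string with the standard
   alphabet and padding, then NUL-terminates the result. */
size_t base64_encode_str(const char *s, char *out, size_t out_size) {
    if (out_size == 0) return 0;
    size_t n = base64_encode((const unsigned char *)s, strlen(s),
                             out, out_size - 1, kBase64EncodeChars, 1);
    out[n] = '\0';
    return n;
}
```

    With the standard alphabet this reproduces the RFC 4648 test vectors, e.g. "foob" → "Zm9vYg==" and "fooba" → "Zm9vYmE=", as well as "s13" → "czEz".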
    
    

    The encoding unit

            curDest[0] = charset[curSrc[0] >> 2];
            curDest[1] = charset[((curSrc[0] & 0x03) << 4) + (curSrc[1] >> 4)];
            curDest[2] = charset[((curSrc[1] & 0x0f) << 2) + (curSrc[2] >> 6)];
            curDest[3] = charset[curSrc[2] & 0x3f];
    

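    This performs the steps described earlier:

    • Split every three 8-bit bytes into four 6-bit units.
    • Pad to 8 bits: a byte holds 8 bits, so the two high-order bits of each unit are zero (in the code this happens implicitly, since each expression yields a value from 0 to 63).
    • After padding: 00011100 00110011 00000100 00110011
    • As decimal values (indices into the alphabet): 28 51 4 51
    • Look up in the alphabet: c z E z

    The code does this with bit operations: ">> n" shifts right and discards the low n bits, "<< n" shifts left, and "& mask" keeps only the bits set in the mask (0x03 keeps the low 2 bits, 0x0f the low 4, 0x3f the low 6).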
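    Applied to "s13" (bytes 115, 49, 51): 115 >> 2 = 28; (115 & 0x03) << 4 = 48 and 49 >> 4 = 3, giving 51; (49 & 0x0f) << 2 = 4 and 51 >> 6 = 0, giving 4; 51 & 0x3f = 51. To confirm that these per-byte expressions always match the "concatenate and regroup into 6-bit units" description, the following sketch (a hypothetical test function, not part of GTM) compares them against direct slicing of a 24-bit value for all 2^24 possible 3-byte inputs:

```c
/* Returns 1 if, for every possible 3-byte input, the per-byte shift/mask
   expressions from the encoding unit produce the same four 6-bit indices
   as packing the bytes into a 24-bit value and slicing it into 6-bit
   groups; returns 0 at the first mismatch. */
int shifts_match_bit_slicing(void) {
    for (unsigned long v = 0; v < (1UL << 24); v++) {
        unsigned char b0 = (unsigned char)(v >> 16);
        unsigned char b1 = (unsigned char)(v >> 8);
        unsigned char b2 = (unsigned char)v;
        /* The four index expressions from the encoding unit above. */
        unsigned gtm[4] = {
            (unsigned)(b0 >> 2),                        /* high 6 bits of b0 */
            (unsigned)(((b0 & 0x03) << 4) + (b1 >> 4)), /* low 2 of b0 + high 4 of b1 */
            (unsigned)(((b1 & 0x0f) << 2) + (b2 >> 6)), /* low 4 of b1 + high 2 of b2 */
            (unsigned)(b2 & 0x3f),                      /* low 6 bits of b2 */
        };
        for (int i = 0; i < 4; i++) {
            unsigned sliced = (unsigned)((v >> (18 - 6 * i)) & 0x3F);
            if (gtm[i] != sliced) return 0;
        }
    }
    return 1;
}
```

    It should return 1: the masks and shifts simply pick out the same bits that the bit-string regrouping does.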


          Article link: https://www.haomeiwen.com/subject/simoaqtx.html