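Base64 is one of the most common encodings on the internet for transmitting 8-bit byte data. It represents binary data using 64 printable characters. See RFC 2045 through RFC 2049 for the detailed MIME specification.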
Since iOS 7, Apple's Foundation framework has included methods for Base64 encoding and decoding:
@interface NSData (NSDataBase64Encoding)
/* Create an NSData from a Base-64 encoded NSString using the given options. By default, returns nil when the input is not recognized as valid Base-64.
*/
- (nullable instancetype)initWithBase64EncodedString:(NSString *)base64String options:(NSDataBase64DecodingOptions)options API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));
/* Create a Base-64 encoded NSString from the receiver's contents using the given options.
*/
- (NSString *)base64EncodedStringWithOptions:(NSDataBase64EncodingOptions)options API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));
/* Create an NSData from a Base-64, UTF-8 encoded NSData. By default, returns nil when the input is not recognized as valid Base-64.
*/
- (nullable instancetype)initWithBase64EncodedData:(NSData *)base64Data options:(NSDataBase64DecodingOptions)options API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));
/* Create a Base-64, UTF-8 encoded NSData from the receiver's contents using the given options.
*/
- (NSData *)base64EncodedDataWithOptions:(NSDataBase64EncodingOptions)options API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));
@end
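Still, to understand it properly we need to look at how encoding and decoding work, and analyze a source implementation.
How Base64 encoding works
Base64 is an encoding scheme: it encodes binary data by converting every three 8-bit bytes into four 6-bit groups.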
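Example: s13
- Before encoding: s 1 3
- ASCII codes: 115 49 51
- In binary: 01110011 00110001 00110011 (three 8-bit bytes)
- Regroup every three 8-bit bytes into four 6-bit groups:
  Before: 01110011 00110001 00110011
  After: 011100 110011 000100 110011
- Padding: a byte on the machine is 8 bits wide, so each 6-bit group is stored with two zero bits added at the high end.
- After padding: 00011100 00110011 00000100 00110011
- As decimal indices: 28 51 4 51
- Looked up in the encoding table: c z E z
So the encoded data is: czEz
s13 ====base64===> czEz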
That is how Base64 encoding works; decoding simply applies the steps above in reverse.
Encoding character sets
The standard Base64 character set
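The walk-through above, together with its reverse, can be sketched in plain C (which also compiles as Objective-C). This is only an illustration: the function names `split_to_sextets` and `join_sextets` are hypothetical and are not part of GTMBase64 or Foundation.

```c
#include <assert.h>
#include <stdint.h>

/* Encoding direction: pack three 8-bit bytes into 24 bits,
 * then cut the 24 bits into four 6-bit table indices. */
void split_to_sextets(const uint8_t in[3], uint8_t out[4]) {
    uint32_t bits = ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];
    out[0] = (bits >> 18) & 0x3F;
    out[1] = (bits >> 12) & 0x3F;
    out[2] = (bits >> 6) & 0x3F;
    out[3] = bits & 0x3F;
}

/* Decoding direction: merge four 6-bit indices back into three bytes. */
void join_sextets(const uint8_t in[4], uint8_t out[3]) {
    for (int i = 0; i < 4; ++i) {
        assert(in[i] < 64); /* every index must fit in 6 bits */
    }
    uint32_t bits = ((uint32_t)in[0] << 18) | ((uint32_t)in[1] << 12) |
                    ((uint32_t)in[2] << 6) | in[3];
    out[0] = (bits >> 16) & 0xFF;
    out[1] = (bits >> 8) & 0xFF;
    out[2] = bits & 0xFF;
}
```

For the bytes of "s13", `split_to_sextets` yields the indices 28, 51, 4, 51 from the example, and `join_sextets` turns them back into the original three bytes.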
static const char *kBase64EncodeChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
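However, standard Base64 is not suitable for placing directly in a URL: a URL encoder turns the "/" and "+" characters of standard Base64 into the "%XX" form, and those "%" signs need yet another conversion when stored in a database, because ANSI SQL uses "%" as a wildcard.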
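To solve this, a modified Base64 for URLs can be used. It typically omits the trailing '=' padding and changes the "+" and "/" of standard Base64 to "-" and "_" respectively. This removes the conversions otherwise needed for URL encoding/decoding and database storage, keeps the encoded data from growing in length along the way, and gives object identifiers a uniform format across databases, forms, and so on.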
static const char *kWebSafeBase64EncodeChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
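The relationship between the two character sets can be sketched as a small in-place conversion in C. The function name `to_websafe` and its `strip_padding` flag are assumptions made for this illustration; GTMBase64 instead encodes directly with the web-safe table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rewrite standard Base64 text in place using the URL-safe alphabet:
 * '+' becomes '-', '/' becomes '_'. If strip_padding is nonzero,
 * trailing '=' characters are removed. Returns the resulting length. */
size_t to_websafe(char *s, int strip_padding) {
    assert(s != NULL);
    size_t len = strlen(s);
    for (size_t i = 0; i < len; ++i) {
        if (s[i] == '+') {
            s[i] = '-';
        } else if (s[i] == '/') {
            s[i] = '_';
        }
    }
    if (strip_padding) {
        while (len > 0 && s[len - 1] == '=') {
            s[--len] = '\0';
        }
    }
    return len;
}
```

Because only two characters of the alphabet differ, the conversion never changes the length except when the padding is stripped.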
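Encoding rules
① Turn every 3 bytes into 4 characters.
② Insert a line break after every 76 characters (a MIME requirement).
③ The end of the data must be handled too (the '=' padding).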
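Rule ② can be sketched separately from the encoding itself. The helper below is a hypothetical illustration (the name `wrap_lines` is not from any library); it assumes the CRLF line break that MIME uses and places it between lines rather than after the last one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst, inserting "\r\n" between every `width` characters
 * (MIME uses width = 76). dst must have room for
 * strlen(src) + 2 * (strlen(src) / width) + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL. */
size_t wrap_lines(const char *src, char *dst, size_t width) {
    assert(src != NULL && dst != NULL && width > 0);
    size_t n = strlen(src);
    size_t out = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i > 0 && i % width == 0) {
            dst[out++] = '\r';
            dst[out++] = '\n';
        }
        dst[out++] = src[i];
    }
    dst[out] = '\0';
    return out;
}
```

With Apple's built-in API, the comparable behavior is requested through the `NSDataBase64Encoding76CharacterLineLength` option.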
Source code analysis:
- Objective-C implementation (taken from the code at https://github.com/r258833095/GTMBase64)
The core encoding methods
+(NSData *)encodeData:(NSData *)data {
  return [self baseEncode:[data bytes]
                   length:[data length]
                  charset:kBase64EncodeChars
                   padded:YES];
}
+(NSData *)baseEncode:(const void *)bytes
               length:(NSUInteger)length
              charset:(const char *)charset
               padded:(BOOL)padded {
  // how big could it be?
  NSUInteger maxLength = CalcEncodedLength(length, padded);
  // make space
  NSMutableData *result = [NSMutableData data];
  [result setLength:maxLength];
  // do it
  NSUInteger finalLength = [self baseEncode:bytes
                                     srcLen:length
                                  destBytes:[result mutableBytes]
                                    destLen:[result length]
                                    charset:charset
                                     padded:padded];
  if (finalLength) {
    _GTMDevAssert(finalLength == maxLength, @"how did we calc the length wrong?");
  } else {
    // shouldn't happen, this means we ran out of space
    result = nil;
  }
  return result;
}
From the code above we can see that it first computes the size of the encoded data:
// how big could it be?
NSUInteger maxLength = CalcEncodedLength(length, padded);
Let's look at how that is calculated:
GTM_INLINE NSUInteger CalcEncodedLength(NSUInteger srcLen, BOOL padded) {
  NSUInteger intermediate_result = 8 * srcLen + 5;
  NSUInteger len = intermediate_result / 6;
  if (padded) {
    len = ((len + 3) / 4) * 4;
  }
  return len;
}
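The arithmetic in `CalcEncodedLength` is easier to follow with concrete numbers. The sketch below restates it in plain C under the hypothetical name `calc_encoded_length`; the overflow guard is an addition made for this example and is not in the original.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

size_t calc_encoded_length(size_t src_len, int padded) {
    /* Added for this sketch: make sure 8 * src_len + 5 cannot overflow. */
    assert(src_len <= (SIZE_MAX - 5) / 8);
    /* ceil(8 * src_len / 6): the number of 6-bit groups needed for all bits. */
    size_t len = (8 * src_len + 5) / 6;
    if (padded) {
        /* Round up to the next multiple of 4 to make room for '='. */
        len = ((len + 3) / 4) * 4;
    }
    return len;
}
```

For 1, 2, 3, and 4 source bytes the unpadded lengths are 2, 3, 4, and 6 characters; with padding they become 4, 4, 4, and 8.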
The actual encoding implementation (split into several parts here for ease of explanation)
+(NSUInteger)baseEncode:(const char *)srcBytes
                 srcLen:(NSUInteger)srcLen
              destBytes:(char *)destBytes
                destLen:(NSUInteger)destLen
                charset:(const char *)charset
                 padded:(BOOL)padded {
  if (!srcLen || !destLen || !srcBytes || !destBytes) {
    return 0;
  }
  char *curDest = destBytes;
  const unsigned char *curSrc = (const unsigned char *)(srcBytes);
The following part encodes each encoding unit (turning 3 bytes into 4 characters):
  // Three bytes of data encodes to four characters of cyphertext.
  // So we can pump through three-byte chunks atomically.
  while (srcLen > 2) {
    // space?
    _GTMDevAssert(destLen >= 4, @"our calc for encoded length was wrong");
    curDest[0] = charset[curSrc[0] >> 2];
    curDest[1] = charset[((curSrc[0] & 0x03) << 4) + (curSrc[1] >> 4)];
    curDest[2] = charset[((curSrc[1] & 0x0f) << 2) + (curSrc[2] >> 6)];
    curDest[3] = charset[curSrc[2] & 0x3f];
    curDest += 4;
    curSrc += 3;
    srcLen -= 3;
    destLen -= 4;
  }
The following part handles the remaining data, appending = as padding:
  // now deal with the tail (<=2 bytes)
  switch (srcLen) {
    case 0:
      // Nothing left; nothing more to do.
      break;
    case 1:
      // One byte left: this encodes to two characters, and (optionally)
      // two pad characters to round out the four-character cypherblock.
      _GTMDevAssert(destLen >= 2, @"our calc for encoded length was wrong");
      curDest[0] = charset[curSrc[0] >> 2];
      curDest[1] = charset[(curSrc[0] & 0x03) << 4];
      curDest += 2;
      destLen -= 2;
      if (padded) {
        _GTMDevAssert(destLen >= 2, @"our calc for encoded length was wrong");
        curDest[0] = kBase64PaddingChar;
        curDest[1] = kBase64PaddingChar;
        curDest += 2;
      }
      break;
    case 2:
      // Two bytes left: this encodes to three characters, and (optionally)
      // one pad character to round out the four-character cypherblock.
      _GTMDevAssert(destLen >= 3, @"our calc for encoded length was wrong");
      curDest[0] = charset[curSrc[0] >> 2];
      curDest[1] = charset[((curSrc[0] & 0x03) << 4) + (curSrc[1] >> 4)];
      curDest[2] = charset[(curSrc[1] & 0x0f) << 2];
      curDest += 3;
      destLen -= 3;
      if (padded) {
        _GTMDevAssert(destLen >= 1, @"our calc for encoded length was wrong");
        curDest[0] = kBase64PaddingChar;
        curDest += 1;
      }
      break;
  }
  // return the length
  return (curDest - destBytes);
}
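To try the main loop and the tail handling outside an Objective-C project, the same logic can be condensed into a plain C function. This is a simplified sketch and not the GTMBase64 code: the name `b64_encode` is hypothetical, it always pads, it always uses the standard character set, and it assumes the caller provides a large enough buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len bytes from src into dst as padded, NUL-terminated Base64.
 * dst must hold at least 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the number of characters written, excluding the NUL. */
size_t b64_encode(const unsigned char *src, size_t len, char *dst) {
    assert(src != NULL && dst != NULL);
    char *out = dst;
    while (len > 2) { /* full 3-byte units become 4 characters */
        out[0] = kAlphabet[src[0] >> 2];
        out[1] = kAlphabet[((src[0] & 0x03) << 4) + (src[1] >> 4)];
        out[2] = kAlphabet[((src[1] & 0x0f) << 2) + (src[2] >> 6)];
        out[3] = kAlphabet[src[2] & 0x3f];
        out += 4;
        src += 3;
        len -= 3;
    }
    if (len == 1) { /* one byte left: two characters plus "==" */
        out[0] = kAlphabet[src[0] >> 2];
        out[1] = kAlphabet[(src[0] & 0x03) << 4];
        out[2] = '=';
        out[3] = '=';
        out += 4;
    } else if (len == 2) { /* two bytes left: three characters plus "=" */
        out[0] = kAlphabet[src[0] >> 2];
        out[1] = kAlphabet[((src[0] & 0x03) << 4) + (src[1] >> 4)];
        out[2] = kAlphabet[(src[1] & 0x0f) << 2];
        out[3] = '=';
        out += 4;
    }
    *out = '\0';
    return (size_t)(out - dst);
}
```

Encoding "s13", "s1", and "s" with this function gives "czEz", "czE=", and "cw==", which shows how the tail length decides the number of '=' characters.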
The encoding unit
curDest[0] = charset[curSrc[0] >> 2];
curDest[1] = charset[((curSrc[0] & 0x03) << 4) + (curSrc[1] >> 4)];
curDest[2] = charset[((curSrc[1] & 0x0f) << 2) + (curSrc[2] >> 6)];
curDest[3] = charset[curSrc[2] & 0x3f];
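performs exactly the operation described earlier:
- Regroup every three 8-bit bytes into four 6-bit groups
- Padding: a byte on the machine is 8 bits wide, so each 6-bit group is stored with two zero bits added at the high end
- After padding: 00011100 00110011 00000100 00110011
- As decimal indices: 28 51 4 51
- Looked up in the encoding table: c z E z
The code above relies on bitwise AND masks and bit shifts.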
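One way to read the masks and shifts is as "take the top n bits" and "take the bottom n bits" of a byte. The sketch below rewrites the four expressions with two hypothetical helpers, `high_bits` and `low_bits`, introduced only for this explanation; each line is annotated with the original expression it corresponds to.

```c
#include <assert.h>

/* Keep the top n bits of an 8-bit value, moved down to the low end. */
unsigned high_bits(unsigned char byte, unsigned n) {
    assert(n >= 1 && n <= 8);
    return (unsigned)byte >> (8 - n);
}

/* Keep the bottom n bits of an 8-bit value.
 * The masks 0x03, 0x0f and 0x3f correspond to n = 2, 4 and 6. */
unsigned low_bits(unsigned char byte, unsigned n) {
    assert(n >= 1 && n <= 8);
    return byte & ((1u << n) - 1u);
}

/* The four table indices of one 3-byte encoding unit. */
void unit_indices(const unsigned char src[3], unsigned idx[4]) {
    /* src[0] >> 2 */
    idx[0] = high_bits(src[0], 6);
    /* ((src[0] & 0x03) << 4) + (src[1] >> 4) */
    idx[1] = (low_bits(src[0], 2) << 4) + high_bits(src[1], 4);
    /* ((src[1] & 0x0f) << 2) + (src[2] >> 6) */
    idx[2] = (low_bits(src[1], 4) << 2) + high_bits(src[2], 2);
    /* src[2] & 0x3f */
    idx[3] = low_bits(src[2], 6);
}
```

For 's' (01110011) and '1' (00110001), the bottom two bits of 's' are 11 and the top four bits of '1' are 0011, so the second index is 110011, which is 51.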