[Repost] NSStringEncoding: How to Solve Text Encoding Problems

Author: MichaelLedger | Published: 2018-04-27 10:08

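    I came across an excellent blog today. I couldn't find a way to follow it, so I'm reposting a few of its most useful posts instead.
    While trying to scrape the home page of Qidian (起点中文网) today, I ran into a problem: if you don't pass the correct encoding, you can't read anything at all.
    This is a bad habit I picked up from using C# too much; I had rarely given encodings any thought. In C#, even with the wrong encoding you still get something back,
    just a screenful of garbage. Cocoa is harsher: the read simply fails, returning nil and an error.
    Below is my first attempt, which downloads the Qidian home page into a string.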

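    NSURL *url = [[NSURL alloc] initWithString:@"http://www.cmfu.com"];
    NSError *error = nil;
    // Decode the response body as UTF-8; returns nil if the bytes are not valid UTF-8.
    NSString *xml = [NSString stringWithContentsOfURL:url encoding:NSUTF8StringEncoding error:&error];
    if (xml == nil) {
        // localizedDescription is always populated; localizedFailureReason may be nil.
        NSLog(@"Error reading url: %@", [error localizedDescription]);
    } else {
        [result setString:xml];  // result is defined elsewhere in the original program
    }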
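
    Why a call like this can come back nil: the decoder is strict, and bytes produced by a legacy multi-byte encoding (GBK, for instance) are usually not well-formed UTF-8. The plain C sketch below illustrates that kind of check. It is not Foundation's actual implementation, and is_valid_utf8 is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if buf[0..len) is well-formed UTF-8: no overlong forms,
 * no surrogates, nothing above U+10FFFF. */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t extra;        /* continuation bytes expected after the lead byte */
        unsigned long cp;    /* code point being decoded */
        unsigned long min;   /* smallest code point allowed for this length */

        if (b < 0x80) { i++; continue; }                 /* plain ASCII */
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return false;   /* stray continuation byte or invalid lead byte */

        if (len - i <= extra) return false;              /* truncated sequence */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80) return false;        /* not 10xxxxxx */
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}
```

    For example, the GBK bytes for "中文" (D6 D0 CE C4) are rejected because D0 is not a valid continuation byte, while the UTF-8 bytes for the same text (E4 B8 AD E6 96 87) pass.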
    

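    It failed no matter what I tried, and the error said the encoding was wrong. Fine; I opened the documentation and looked up all the encodings: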

    enum {
    NSASCIIStringEncoding = 1,
    NSNEXTSTEPStringEncoding = 2,
    NSJapaneseEUCStringEncoding = 3,
    NSUTF8StringEncoding = 4,
    NSISOLatin1StringEncoding = 5,
    NSSymbolStringEncoding = 6,
    NSNonLossyASCIIStringEncoding = 7,
    NSShiftJISStringEncoding = 8,
    NSISOLatin2StringEncoding = 9,
    NSUnicodeStringEncoding = 10,
    NSWindowsCP1251StringEncoding = 11,
    NSWindowsCP1252StringEncoding = 12,
    NSWindowsCP1253StringEncoding = 13,
    NSWindowsCP1254StringEncoding = 14,
    NSWindowsCP1250StringEncoding = 15,
    NSISO2022JPStringEncoding = 21,
    NSMacOSRomanStringEncoding = 30,
    NSUTF16StringEncoding = NSUnicodeStringEncoding,
    NSUTF16BigEndianStringEncoding = 0x90000100,
    NSUTF16LittleEndianStringEncoding = 0x94000100,
    NSUTF32StringEncoding = 0x8c000100,
    NSUTF32BigEndianStringEncoding = 0x98000100,
    NSUTF32LittleEndianStringEncoding = 0x9c000100,
    };
    

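    I tried them one by one,
    and not a single one worked! I was about to lose it. In this day and age, could Cocoa really not support Chinese? Impossible.
    My guess was that the documentation above lists only the most commonly used encodings,
    so I wrote the following code to print every supported encoding: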

    const NSStringEncoding *encodings = [NSString availableStringEncodings];
    NSMutableString *str = [[NSMutableString alloc] init];
    NSStringEncoding encoding;
    // The array returned by availableStringEncodings is zero-terminated.
    while ((encoding = *encodings++) != 0)
    {
        // Cast to int so that %i prints the signed 32-bit values shown in the list below.
        [str appendFormat:@"%@ === %i\n", [NSString localizedNameOfStringEncoding:encoding], (int)encoding];
    }
    [result setString:str];
    

    Sure enough, my guess was right. Here is the full list of supported encodings:

    Western (Mac OS Roman) === 30
    Japanese (Mac OS) === -2147483647
    Traditional Chinese (Mac OS) === -2147483646
    Korean (Mac OS) === -2147483645
    Arabic (Mac OS) === -2147483644
    Hebrew (Mac OS) === -2147483643
    Greek (Mac OS) === -2147483642
    Cyrillic (Mac OS) === -2147483641
    Devanagari (Mac OS) === -2147483639
    Gurmukhi (Mac OS) === -2147483638
    Gujarati (Mac OS) === -2147483637
    Thai (Mac OS) === -2147483627
    Simplified Chinese (Mac OS) === -2147483623
    Tibetan (Mac OS) === -2147483622
    Central European (Mac OS) === -2147483619
    Symbol (Mac OS) === 6
    Dingbats (Mac OS) === -2147483614
    Turkish (Mac OS) === -2147483613
    Croatian (Mac OS) === -2147483612
    Icelandic (Mac OS) === -2147483611
    Romanian (Mac OS) === -2147483610
    Celtic (Mac OS) === -2147483609
    Gaelic (Mac OS) === -2147483608
    Keyboard Symbols (Mac OS) === -2147483607
    Farsi (Mac OS) === -2147483508
    Cyrillic (Mac OS Ukrainian) === -2147483496
    Inuit (Mac OS) === -2147483412
    Unicode (UTF-32LE) === -1677721344
    Unicode (UTF-8) === 4
    Unicode (UTF-16) === 10
    Unicode (UTF-16BE) === -1879047936
    Unicode (UTF-16LE) === -1811939072
    Unicode (UTF-32) === -1946156800
    Unicode (UTF-32BE) === -1744830208
    Western (ISO Latin 1) === 5
    Central European (ISO Latin 2) === 9
    Western (ISO Latin 3) === -2147483133
    Central European (ISO Latin 4) === -2147483132
    Cyrillic (ISO 8859-5) === -2147483131
    Arabic (ISO 8859-6) === -2147483130
    Greek (ISO 8859-7) === -2147483129
    Hebrew (ISO 8859-8) === -2147483128
    Turkish (ISO Latin 5) === -2147483127
    Nordic (ISO Latin 6) === -2147483126
    Thai (ISO 8859-11) === -2147483125
    Baltic Rim (ISO Latin 7) === -2147483123
    Celtic (ISO Latin) === -2147483122
    Western (ISO Latin 9) === -2147483121
    Romanian (ISO Latin 10) === -2147483120
    Latin-US (DOS) === -2147482624
    Greek (DOS) === -2147482619
    Baltic Rim (DOS) === -2147482618
    Western (DOS Latin 1) === -2147482608
    Greek (DOS Greek 1) === -2147482607
    Central European (DOS Latin 2) === -2147482606
    Cyrillic (DOS) === -2147482605
    Turkish (DOS) === -2147482604
    Portuguese (DOS) === -2147482603
    Icelandic (DOS) === -2147482602
    Hebrew (DOS) === -2147482601
    Canadian French (DOS) === -2147482600
    Arabic (DOS) === -2147482599
    Nordic (DOS) === -2147482598
    Cyrillic (DOS) === -2147482597
    Greek (DOS Greek 2) === -2147482596
    Thai (Windows, DOS) === -2147482595
    Japanese (Windows, DOS) === 8
    Simplified Chinese (Windows, DOS) === -2147482591
    Korean (Windows, DOS) === -2147482590
    Traditional Chinese (Windows, DOS) === -2147482589
    Western (Windows Latin 1) === 12
    Central European (Windows Latin 2) === 15
    Cyrillic (Windows) === 11
    Greek (Windows) === 13
    Turkish (Windows Latin 5) === 14
    Hebrew (Windows) === -2147482363
    Arabic (Windows) === -2147482362
    Baltic Rim (Windows) === -2147482361
    Vietnamese (Windows) === -2147482360
    Western (ASCII) === 1
    Japanese (Shift JIS X0213) === -2147482072
    Chinese (GBK) === -2147482063
    Chinese (GB 18030) === -2147482062
    Japanese (ISO 2022-JP) === 21
    Korean (ISO 2022-KR) === -2147481536
    Japanese (EUC) === 3
    Simplified Chinese (EUC) === -2147481296
    Traditional Chinese (EUC) === -2147481295
    Korean (EUC) === -2147481280
    Japanese (Shift JIS) === -2147481087
    Cyrillic (KOI8-R) === -2147481086
    Traditional Chinese (Big 5) === -2147481085
    Western (Mac Mail) === -2147481084
    Simplified Chinese (HZ GB 2312) === -2147481083
    Traditional Chinese (Big 5 HKSCS) === -2147481082
    Ukrainian (KOI8-U) === -2147481080
    Traditional Chinese (Big 5-E) === -2147481079
    Western (NextStep) === 2
    Non-lossy ASCII === 7
    Western (EBCDIC Latin 1) === -2147480574
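
    Most of the numbers in the list are negative because an NSStringEncoding is unsigned and %i prints it as a signed 32-bit integer. The values are consistent with Foundation forming them as 0x80000000 OR-ed with the CoreFoundation encoding number, which is what CFStringConvertEncodingToNSStringEncoding returns for encodings that have no dedicated NSxxxStringEncoding constant. Here is a small C sketch of that arithmetic; the helper names and the CF_ENCODING_* constants are hypothetical stand-ins, with values copied from kCFStringEncodingGBK_95 and kCFStringEncodingGB_18030_2000 in CFStringEncodingExt.h.

```c
#include <stdint.h>

/* Top bit set on NSStringEncoding values derived from CoreFoundation
 * encodings (assumption inferred from the values in the list). */
#define NS_CF_ENCODING_FLAG UINT32_C(0x80000000)

/* Stand-ins for two CFStringEncoding constants. */
enum {
    CF_ENCODING_GBK_95        = 0x0631,  /* kCFStringEncodingGBK_95 */
    CF_ENCODING_GB_18030_2000 = 0x0632   /* kCFStringEncodingGB_18030_2000 */
};

/* Unsigned 32-bit NSStringEncoding value for a CoreFoundation encoding. */
uint32_t ns_encoding_from_cf(uint32_t cf_encoding)
{
    return NS_CF_ENCODING_FLAG | cf_encoding;
}

/* The same 32 bits read as a signed integer, which is what %i displays. */
int32_t as_signed_32(uint32_t value)
{
    if (value <= (uint32_t)INT32_MAX) return (int32_t)value;
    return (int32_t)((int64_t)value - INT64_C(4294967296));  /* value - 2^32 */
}
```

    With these helpers, ns_encoding_from_cf(CF_ENCODING_GBK_95) is 0x80000631, and as_signed_32 of that value is -2147482063, the number listed for Chinese (GBK).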
    

    At last, the familiar GBK encoding, whose value is -2147482063. OK, time to change the original code:

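    NSURL *url = [[NSURL alloc] initWithString:@"http://www.cmfu.com"];
    NSError *error = nil;
    // GBK has no NSxxxStringEncoding constant, so derive it from the CoreFoundation one.
    // The result is 0x80000631, which reads as -2147482063 when viewed as a signed 32-bit int.
    NSStringEncoding gbk = CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGBK_95);
    NSString *xml = [NSString stringWithContentsOfURL:url encoding:gbk error:&error];
    if (xml == nil) {
        NSLog(@"Error reading url: %@", [error localizedDescription]);
    } else {
        [result setString:xml];
    }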
    

    Finally got it working! Seeing familiar Chinese text on screen was a real thrill.
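
    One caveat about the value -2147482063: NSStringEncoding is an unsigned integer type (NSUInteger), so a negative literal only lands on the intended encoding when that type is 32 bits wide. The C sketch below models the conversion for both widths; the function names are hypothetical.

```c
#include <stdint.h>

/* Models storing a negative int literal in an unsigned encoding variable
 * that is 32 bits wide (as on old 32-bit builds). */
uint32_t store_in_32_bit(int32_t literal)
{
    return (uint32_t)literal;   /* wraps modulo 2^32 */
}

/* The same assignment where the variable is 64 bits wide. */
uint64_t store_in_64_bit(int32_t literal)
{
    return (uint64_t)literal;   /* sign-extends, then wraps modulo 2^64 */
}
```

    In the 32-bit case the literal becomes 0x80000631, but in the 64-bit case it becomes 0xFFFFFFFF80000631, a different number that may not be recognized as GBK. Writing the constant as 0x80000631, or deriving it with CFStringConvertEncodingToNSStringEncoding, does not depend on the width.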


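    HTTP request headers

    Accept: MIME types the browser can accept
    Accept-Charset: character sets the browser can accept
    Accept-Encoding: content encodings the browser can decode, such as gzip
    Accept-Language: languages the browser prefers
    Authorization: credentials for authenticating with the server
    Connection: whether a persistent connection is wanted
    Content-Length: length of the request body, in bytes
    Cookie: cookies previously set by the server, sent back with the request
    From: email address of the person sending the request
    Host: host and port from the original URL
    If-Modified-Since: return the content only if it has been modified after the given date; otherwise reply with 304 Not Modified
    Pragma: a value of "no-cache" means the server must return a fresh document, even if it is a proxy server that already holds a local copy of the page
    Referer: the URL of the page from which the user reached the currently requested page
    User-Agent: the type of browser making the request
    UA-Pixels, UA-Color, UA-OS, and UA-CPU: nonstandard request headers giving screen size, color depth, operating system, and CPU type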
    

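    HTTP response headers

    setContentType: sets the Content-Type header. Most servlets use this method.
    setContentLength: sets the Content-Length header. Useful for browsers that support persistent HTTP connections.
    addCookie: sets a cookie
    Allow: the request methods the server supports
    Content-Encoding: how the document is encoded (for example, gzip)
    Content-Length: length of the response body, in bytes
    Content-Type: the MIME type of the document that follows
    Date: the current time in GMT
    Expires: the time after which the document is considered stale and should no longer be cached
    Last-Modified: when the document was last changed
    Location: where the client should go to retrieve the document. It is usually not set directly but through the sendRedirect() method of HttpServletResponse, which also sets the status code to 302.
    Refresh: how long the browser should wait before refreshing the page
    Server: the name of the server
    Set-Cookie: sets a cookie associated with the page
    WWW-Authenticate: the type of authorization information the client should supply in its Authorization header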
    

    Tip: instead of hard-coding UTF-8, take the appropriate encoding from the response.
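
    A minimal sketch of that tip in plain C: pull the charset parameter out of a Content-Type header value such as "text/html; charset=GBK". The function name content_type_charset is hypothetical, and the parsing is deliberately simplified (it does not cover every quoting rule in the HTTP grammar).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASCII case-insensitive comparison of the first n bytes; stops at a NUL. */
static int ascii_ncasecmp(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int ca = tolower((unsigned char)a[i]);
        int cb = tolower((unsigned char)b[i]);
        if (ca != cb || ca == '\0') return ca - cb;
    }
    return 0;
}

/* Copies the charset parameter of a Content-Type value into out
 * (NUL-terminated, at most cap bytes). Returns 1 if found, 0 otherwise. */
int content_type_charset(const char *content_type, char *out, size_t cap)
{
    static const char key[] = "charset=";
    const size_t key_len = sizeof key - 1;

    if (content_type == NULL || out == NULL || cap == 0) return 0;
    out[0] = '\0';

    /* Parameters follow the media type, each introduced by ';'. */
    const char *p = strchr(content_type, ';');
    while (p != NULL) {
        p++;                                     /* skip the ';' */
        while (*p == ' ' || *p == '\t') p++;     /* skip optional whitespace */
        if (ascii_ncasecmp(p, key, key_len) == 0) {
            p += key_len;
            int quoted = (*p == '"');
            if (quoted) p++;
            size_t n = 0;
            while (*p != '\0' && *p != ';'
                   && !(quoted && *p == '"')
                   && !(!quoted && (*p == ' ' || *p == '\t'))) {
                if (n + 1 >= cap) { out[0] = '\0'; return 0; }  /* too long */
                out[n++] = *p++;
            }
            out[n] = '\0';
            return n > 0;
        }
        p = strchr(p, ';');                      /* move to the next parameter */
    }
    return 0;
}
```

    In Cocoa, the textEncodingName property of NSURLResponse reports the same name, and CFStringConvertIANACharSetNameToEncoding followed by CFStringConvertEncodingToNSStringEncoding turns such a name into an NSStringEncoding.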


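    I (the reposter) ran into a similar problem when decoding the file below. Its Chinese comments come out as mojibake, which appears to be GBK bytes read as Mac OS Roman: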

    // Unity/SDKHeader/MIL/FBXParse.h
    
    #if _MSC_VER // this is defined when compiling with Visual Studio
        #ifdef  FBX_PARSE_EXPORT
        #define FBX_PARSE_EXPORT_API(x)     extern "C" __declspec(dllexport) x 
        #else 
        #define FBX_PARSE_EXPORT_API(x)     extern "C" __declspec(dllimport) x  
        #endif
    //#define FBX_PARSE_EXPORT_API __declspec(dllexport) // Visual Studio needs annotating exported functions with this
    #else
    #define FBX_PARSE_EXPORT_API(x) x// XCode does not need annotating exported functions, so define is empty
    #endif
    
    //BASE64Œª¥ÌŒÛ¬Î
    enum
    {
        MXR_SUCCESS,
    
        MXR_ERROR_BUFFER_TOOSMALL       =-0x0010,
        MXR_ERROR_INVALIDDATA           =-0x0012,
    };
    
    
    
    //º”√‹Œ™.mxrfbxŒƒº˛
    FBX_PARSE_EXPORT_API (bool) EncryptFBXFile(const wchar_t* pcFBXOrgPath);
    
    //Ω‚√‹.mxrfbxŒƒº˛,∑µªÿΩ‚√‹◊÷∑˚¥Æµƒ≥§∂»
    //–˵˜”√¡Ω¥Œ,µ⁄“ª¥Œ∑µªÿŒƒº˛¥Û–°£¨µ⁄∂˛¥ŒÕ®π˝pRet∑µªÿΩ‚√‹∫Ûµƒ◊÷∑˚¥Æ
    FBX_PARSE_EXPORT_API (int)  DecryptFBXFile(const wchar_t* pcFBXEncrypt, char* pRet=0);
    
    //º”√‹◊÷∑˚¥Æ
    FBX_PARSE_EXPORT_API(char*) Encryption(const char *pSndBuf, int &iSize, bool bEncryption);
    
    //º”√‹◊÷∑˚¥Æ
    FBX_PARSE_EXPORT_API(int) Encryption2(const char *pSndBuf, int iSize, char*pDst, bool bEncryption);
    
    //º”√‹◊÷∑˚¥Æ£¨≤¢±£¥ÊµΩŒƒº˛
    FBX_PARSE_EXPORT_API(bool) EncryptionFile(const char *pSndBuf, int iSize, const wchar_t *pFullPath);
    
    // Õ∑≈ª∫≥Â
    FBX_PARSE_EXPORT_API(void)  FreeEncryption(char* pEncryptBuffer);
    
    //Ω‚√‹◊÷∑˚¥Æ
    FBX_PARSE_EXPORT_API(bool) Decryption(char * &pBuffer, unsigned int uSize);
    
    //Ω‚√‹◊÷∑˚¥Æ
    FBX_PARSE_EXPORT_API(int) Decryption2(char *pSrc, int iSize, char*pDst);
    
    //Windows Encrypt////////////////////////////////////////////////
    FBX_PARSE_EXPORT_API(bool) Windows_Encrypt(const char* szName, const char* strSrc, char* strDest);
    FBX_PARSE_EXPORT_API(bool) Windows_Decrypt(const char* szName, const char* strSrc, char* strDest);
    
    //BASE64
    FBX_PARSE_EXPORT_API(int) Encrypt_Base64(const unsigned char *pSrc,int iSlen,unsigned char *pDst,int *iDlen);
    FBX_PARSE_EXPORT_API(int) Decrypt_Base64(const unsigned char *pSrc,int iSlen,unsigned char *pDst,int *iDlen);
    
    /*
    *𶃋£∫     —È÷§º§ªÓ¬Î°£
    *𶃋√Ë ˆ£∫µ√µΩ—È÷§–≈œ¢£¨±ÿ–Α⁄≥Öړª‘À–– ±µ˜”√£¨»∑±£‘⁄∆‰À¸Ω”ø⁄µ˜”√÷Æ«∞£¨◊Óœ»µ˜”√°£
    *≤Œ ˝Àµ√˜£∫
    *           pGUID£∫GUID∫≈£ª
    *           pPath£∫º§ªÓ¬Î–≈œ¢Œƒº˛(mxr.dat)¬∑æ∂£ª
    *           bForceGetActivateCode:«ø÷∆µ√µΩº§ªÓ¬Î ˝æ›£¨ƒ¨»œŒ™≤ª«ø÷∆£ª
    *∑µªÿ÷µ£∫   0£∫ ‘”√∞Ê£®”–ÀÆ”°£©£®º”√‹¬ÎŒƒº˛Œfi–ߪÚ≤ª¥Ê‘⁄£©£ª
                1£∫≥ÖÚ≤ªƒ‹‘À––£®º”√‹¬ÎŒƒº˛”––ߣ¨µ´‘À–– ±º‰Œfi–ߪÚπ˝∆⁄£©;
                2: ∑¢≤º∞Ê£®’˝ Ω∞Ê£©
    */
    FBX_PARSE_EXPORT_API(int) VerifyActivateCode(const char* pGUID, const wchar_t* pPath, bool bForceGetActivateCode = false);
    
    /*
    *𶃋£∫      «∑Ò «∑¢≤º∞ʱ氣
    *𶃋√Ë ˆ£∫ «∑Ò «∑¢≤º∞ʱ氣
    *≤Œ ˝Àµ√˜£∫Œfi
    *∑µªÿ÷µ£∫   TRUEŒ™∑¢≤º∞ʱ棨FALSEŒ™ ‘”√∞Ê°£
    */
    FBX_PARSE_EXPORT_API(bool) IsPublishVersion();
    
    /*
    *𶃋£∫      «∑Òœ‘ æLOGO°£
    *𶃋√Ë ˆ£∫ «∑Òœ‘ æLOGO°£
    *≤Œ ˝Àµ√˜£∫Œfi
    *∑µªÿ÷µ£∫   TRUEŒ™œ‘ 棨FALSEŒ™≤ªœ‘ æ°£
    */
    FBX_PARSE_EXPORT_API(bool) IsShowLogo();
    
    /*
    *𶃋£∫     µ√µΩLOGOŒª÷√°£
    *𶃋√Ë ˆ£∫µ√µΩLOGOŒª÷√°£
    *≤Œ ˝Àµ√˜£∫Œfi
    *∑µªÿ÷µ£∫   1~9£®1Œ™…œ◊Û£¨2Œ™…œ÷–£¨3Œ™…œ”“£¨4Œ™÷–◊Û£¨5Œ™÷–÷–£¨6Œ™÷–”“£¨7Œ™œ¬◊Û£¨8Œ™œ¬÷–£¨9Œ™œ¬”“£©
    */
    FBX_PARSE_EXPORT_API(int) GetLogoPosition();
    
    /*
    *𶃋£∫     µ√µΩLOGO±£¥Ê»´¬∑æ∂°£
    *𶃋√Ë ˆ£∫µ√µΩLOGO±£¥Ê»´¬∑æ∂£¨1£©≤Œ ˝…Ë÷√Œ™ø’£¨µ˜”√¥ÀΩ”ø⁄µ√µΩ¬∑æ∂≥§∂»£¨2£©‘Ÿµ˜”√¥ÀΩ”ø⁄£¨µ√µΩ¬∑æ∂£ª
    *≤Œ ˝Àµ√˜£∫
    *           pDestFullPathFile£∫±£¥Ê¬∑æ∂£¨¥Àø’º‰¥Û–°“™◊„πª≥§°£
    *∑µªÿ÷µ£∫   ¬∑æ∂≥§∂»°£
    */
    FBX_PARSE_EXPORT_API(int) GetLogoFile(char* pDestFullPathFile);
    
    //…˙≥…º§ªÓ¬ÎŒƒº˛
    FBX_PARSE_EXPORT_API(bool) CreateEncryptFile(const char* pData, const wchar_t* pPath);
    
    /*
    *𶃋£∫     ¥¥Ω®QRCODE
    *𶃋√Ë ˆ£∫¥¥Ω®QRCODE
    *≤Œ ˝Àµ√˜£∫
    *           pEncodeInfo£∫       ∂˛Œ¨¬Î–≈œ¢£¨ø…“‘Œ™ø’
    *           pFullPicturePath£∫  …˙≥…∂˛Œ¨¬Î¬∑æ∂      
    *∑µªÿ÷µ£∫   trueŒ™≥…𶣨falseŒ™ ß∞‹
    */
    FBX_PARSE_EXPORT_API(bool) GenerateQRCodePicture(const wchar_t* pEncodeInfo, const wchar_t* pFullPicturePath);
    

    Reposted from:
    http://www.cnblogs.com/zhwl/archive/2012/12/31/2840746.html
