A Lua implementation of the JS function charCodeAt

Author: 不李不外的 | Published 2017-03-25 10:16

    charCodeAt by Lua

    @(Lua JavaScript charCodeAt)

    I wanted a charCodeAt function in Lua that works exactly like the one in JavaScript,
    but Lua 5.1 has no built-in support for UTF-8 or Unicode: its strings are plain byte sequences.

    1: How charCodeAt works in JavaScript

    To open the console in Chrome, press F12 (Mac: Cmd+Option+J), then run:

    [
    '你'.charCodeAt(0),
    'ñ'.charCodeAt(0),
    'n'.charCodeAt(0)
    ]
    

    It outputs [20320, 241, 110]. These are the numeric Unicode values: '你' = 20320, 'ñ' = 241, 'n' = 110.

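    The charCodeAt() method returns the UTF-16 code unit at the given index. For code points below 0x10000 this is the same as the numeric Unicode value of the character; code points at or above 0x10000 are represented by two code units (a surrogate pair).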

    To find out how many bytes one UTF-8 character takes, we can use the function utf8.charbytes from alexander-yakushev's utf8.lua:
    [https://github.com/alexander-yakushev/awesompd/blob/master/utf8.lua]

    function utf8.charbytes (s, i)
       -- argument defaults
       i = i or 1
       local c = string.byte(s, i) 
       -- determine bytes needed for character, based on RFC 3629
       if c > 0 and c <= 127 then
          -- UTF8-1 byte
          return 1
       elseif c >= 194 and c <= 223 then
          -- UTF8-2 byte
          return 2
       elseif c >= 224 and c <= 239 then
          -- UTF8-3 byte
          return 3
       elseif c >= 240 and c <= 244 then
          -- UTF8-4 byte
          return 4
       end
    end
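
    As a rough illustration of how this byte count can be used, the sketch below walks a UTF-8 string and splits it into characters. The names char_len and split_chars are hypothetical and not part of utf8.lua; char_len is only modeled on utf8.charbytes, and treating an unrecognized byte as a single unit is an assumption made here.

```lua
-- Hypothetical helper modeled on utf8.charbytes: number of bytes in the
-- UTF-8 character whose first byte is at position i (nil if unrecognized).
local function char_len(s, i)
   local c = string.byte(s, i or 1)
   if c == nil then return nil end
   if c > 0 and c <= 127 then return 1
   elseif c >= 194 and c <= 223 then return 2
   elseif c >= 224 and c <= 239 then return 3
   elseif c >= 240 and c <= 244 then return 4
   end
end

-- Split a UTF-8 string into a table of characters (each one a short string).
local function split_chars(s)
   local chars, i = {}, 1
   while i <= #s do
      -- assumption: an unrecognized lead byte is treated as one unit
      local n = char_len(s, i) or 1
      chars[#chars + 1] = string.sub(s, i, i + n - 1)
      i = i + n
   end
   return chars
end

local chars = split_chars("nñ你")  -- 1 + 2 + 3 bytes in UTF-8
print(#chars)  --> 3
```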
    

    Converting between Unicode and UTF-8

    Unicode code range (hex)    UTF-8 encoding (binary)                Example char
    0000 0000-0000 007F         0xxxxxxx                               n (alphabet)
    0000 0080-0000 07FF         110xxxxx 10xxxxxx                      ñ
    0000 0800-0000 FFFF         1110xxxx 10xxxxxx 10xxxxxx             (most CJK)
    0001 0000-0010 FFFF         11110xxx 10xxxxxx 10xxxxxx 10xxxxxx    other chars
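
    The bit patterns in the table can be turned into a decoder. The following is a minimal sketch, not the author's code: decode_utf8 is a hypothetical name, and because Lua 5.1 has no bitwise operators, masking is done by subtraction and shifting by multiplication. It checks the continuation bytes but does not reject every malformed sequence (overlong encodings, for instance).

```lua
-- Decode the UTF-8 character starting at byte position i of s.
-- Returns the code point and the number of bytes consumed, or nil on error.
local function decode_utf8(s, i)
   i = i or 1
   local b1 = string.byte(s, i)
   if b1 == nil then return nil end
   local cp, n
   if b1 < 0x80 then                      -- 0xxxxxxx
      cp, n = b1, 1
   elseif b1 >= 0xC2 and b1 <= 0xDF then  -- 110xxxxx
      cp, n = b1 - 0xC0, 2
   elseif b1 >= 0xE0 and b1 <= 0xEF then  -- 1110xxxx
      cp, n = b1 - 0xE0, 3
   elseif b1 >= 0xF0 and b1 <= 0xF4 then  -- 11110xxx
      cp, n = b1 - 0xF0, 4
   else
      return nil  -- continuation byte or invalid lead byte
   end
   for k = 1, n - 1 do
      local b = string.byte(s, i + k)
      -- every following byte must look like 10xxxxxx
      if b == nil or b < 0x80 or b > 0xBF then return nil end
      cp = cp * 64 + (b - 0x80)  -- shift left 6 bits, append the low 6 bits
   end
   return cp, n
end

print(decode_utf8("你"))  --> 20320  3
print(decode_utf8("ñ"))   --> 241    2
print(decode_utf8("n"))   --> 110    1
```

    For characters in the first three rows of the table, the decoded code point is already the value that charCodeAt returns in JavaScript.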

    But 4-byte UTF-8 characters (such as emoji) need special attention; they are not that simple.

    Special method

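    JavaScript engines use UTF-16. For characters in the Basic Multilingual Plane, the UTF-16 code unit is the same as the Unicode code point. Characters in the Supplementary Planes are instead stored as two code units, a high surrogate H and a low surrogate L, computed from the code point c with the formula below. The Supplementary Plane characters we usually encounter are emoji like 😝 (a 4-byte UTF-8 character).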

    -- formula 1
    H = Math.floor((c-0x10000) / 0x400)+0xD800 
    L = (c - 0x10000) % 0x400 + 0xDC00
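
    Formula 1 translates directly to Lua 5.1, which has math.floor and the % operator. A minimal sketch, with to_surrogates as a hypothetical name:

```lua
-- Split a supplementary-plane code point c (0x10000..0x10FFFF) into
-- its UTF-16 high and low surrogates, following formula 1.
local function to_surrogates(c)
   local v = c - 0x10000
   local H = math.floor(v / 0x400) + 0xD800
   local L = v % 0x400 + 0xDC00
   return H, L
end

-- 😝 is U+1F61D; its surrogates are 0xD83D and 0xDE1D
print(to_surrogates(0x1F61D))  --> 55357  56861
```

    These are the values JavaScript gives for '😝'.charCodeAt(0) and '😝'.charCodeAt(1).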
    

    The code is here:

    https://github.com/lilien1010/lua-bit
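
    Independently of the repository linked above, here is a rough sketch of how the pieces described in this post might fit together. The name char_code_at is hypothetical, the input is assumed to be valid UTF-8, and nil stands in for the NaN that JavaScript returns for an out-of-range index. As in JavaScript, the index is 0-based and counts UTF-16 code units.

```lua
-- Hypothetical sketch: JavaScript-like charCodeAt for a UTF-8 string s.
-- Assumes s is valid UTF-8; index is 0-based and counts UTF-16 code units.
local function char_code_at(s, index)
   index = index or 0
   local i, unit = 1, 0  -- byte position in s, UTF-16 unit position
   while i <= #s do
      local b1 = string.byte(s, i)
      local cp, n
      if b1 < 0x80 then cp, n = b1, 1         -- 0xxxxxxx
      elseif b1 < 0xE0 then cp, n = b1 - 0xC0, 2  -- 110xxxxx
      elseif b1 < 0xF0 then cp, n = b1 - 0xE0, 3  -- 1110xxxx
      else cp, n = b1 - 0xF0, 4               -- 11110xxx
      end
      for k = 1, n - 1 do
         local b = string.byte(s, i + k)
         if b == nil then return nil end      -- truncated character
         cp = cp * 64 + (b - 0x80)
      end
      if cp < 0x10000 then
         -- BMP character: one code unit, equal to the code point
         if unit == index then return cp end
         unit = unit + 1
      else
         -- supplementary character: two code units (formula 1)
         local v = cp - 0x10000
         if unit == index then return math.floor(v / 0x400) + 0xD800 end
         if unit + 1 == index then return v % 0x400 + 0xDC00 end
         unit = unit + 2
      end
      i = i + n
   end
   return nil  -- index out of range (JavaScript returns NaN)
end

print(char_code_at("你", 0))  --> 20320
print(char_code_at("😝", 0), char_code_at("😝", 1))  --> 55357  56861
```

    Walking the string from the start on every call keeps the sketch short; code that looks up many indexes in the same string would probably want to decode it once and cache the code units.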

    Feedback & Bug Report


    Thank you for reading. If you have a better idea, please share it.
