A small gotcha with ltrim

Author: 小东班吉 | Published: 2019-08-13 20:34

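    ltrim(str, character_mask) strips whitespace (or other characters) from the beginning of a string. character_mask lists those other characters; it is treated as a set of single characters, not as a substring, and it also supports character ranges written with "..", such as "a..z".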
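    As a quick illustration of the three forms described above (no mask, a range, and a plain list of characters), here is a minimal sketch. The input strings are made up for this example.

```php
<?php
// With no mask, ltrim strips leading whitespace only.
var_dump(ltrim("  \t hello "));        // string(6) "hello "

// "a..z" is a range: every lowercase ASCII letter is in the set.
var_dump(ltrim("abc123abc", "a..z"));  // string(6) "123abc"

// Without "..", each character of the mask is a separate member of the set.
var_dump(ltrim("xxyhello", "xy"));     // string(5) "hello"
```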

    Today I ran into a small problem while trying to strip a fixed string from the start of a path. For example:

    php > var_dump (ltrim('/upload/ugc/abc.jpeg',"/upload"));
    string(11) "gc/abc.jpeg"
    php > var_dump (ltrim('upload/ugc/abc.jpeg',"upload"));
    string(13) "/ugc/abc.jpeg"
    

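    At first this was confusing. Thinking it through: in the first call, once "/upload" has been consumed, the "/" and "u" that follow are also in the set of characters to strip, so they are removed too. The scan stops at "g", the first character that is not in the set, and the rest of the string is returned.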
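    If the goal is to remove a fixed prefix rather than a set of characters, one possible approach is to compare the prefix explicitly and cut it off with substr. The sketch below assumes a helper named strip_prefix, which is a hypothetical name for illustration and not a PHP built-in.

```php
<?php
// strip_prefix is a hypothetical helper, not part of PHP.
// It removes $prefix once, and only if $str really starts with it.
function strip_prefix(string $str, string $prefix): string
{
    $n = strlen($prefix);
    if ($n > 0 && strncmp($str, $prefix, $n) === 0) {
        // The cast covers older PHP versions where substr() can return false
        // when the prefix is the whole string.
        return (string) substr($str, $n);
    }
    return $str;
}

var_dump(strip_prefix('/upload/ugc/abc.jpeg', '/upload')); // string(13) "/ugc/abc.jpeg"
var_dump(strip_prefix('/other/ugc/abc.jpeg', '/upload'));  // string(19) "/other/ugc/abc.jpeg"
```

    Unlike ltrim, this leaves the "/u" of "/ugc" alone, because the comparison is against the whole prefix and happens only once.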

    The relevant part of PHP's internal implementation is:

    PHP_FUNCTION(ltrim)
    {
        php_do_trim(INTERNAL_FUNCTION_PARAM_PASSTHRU, 1);
    }
    static zend_always_inline zend_string *php_trim_int(zend_string *str, char *what, size_t what_len, int mode)
    {
       const char *start = ZSTR_VAL(str);
       const char *end = start + ZSTR_LEN(str);
       char mask[256];
       if (what) {
          if (what_len == 1) {
             char p = *what;
             if (mode & 1) {
                while (start != end) {
                   if (*start == p) {
                      start++;
                   } else {
                      break;
                   }
                }
             }
             if (mode & 2) {
                while (start != end) {
                   if (*(end-1) == p) {
                      end--;
                   } else {
                      break;
                   }
                }
             }
          } else {
             php_charmask((unsigned char*)what, what_len, mask);
    
             if (mode & 1) {
                while (start != end) {
                   if (mask[(unsigned char)*start]) {
                      start++;
                   } else {
                      break;
                   }
                }
             }
             if (mode & 2) {
                while (start != end) {
                   if (mask[(unsigned char)*(end-1)]) {
                      end--;
                   } else {
                      break;
                   }
                }
             }
          }
       } else {
          if (mode & 1) {
             while (start != end) {
                unsigned char c = (unsigned char)*start;
    
                if (c <= ' ' &&
                    (c == ' ' || c == '\n' || c == '\r' || c == '\t' || c == '\v' || c == '\0')) {
                   start++;
                } else {
                   break;
                }
             }
          }
          if (mode & 2) {
             while (start != end) {
                unsigned char c = (unsigned char)*(end-1);
    
                if (c <= ' ' &&
                    (c == ' ' || c == '\n' || c == '\r' || c == '\t' || c == '\v' || c == '\0')) {
                   end--;
                } else {
                   break;
                }
             }
          }
       }
    
       if (ZSTR_LEN(str) == end - start) {
          return zend_string_copy(str);
       } else if (end - start == 0) {
          return ZSTR_EMPTY_ALLOC();
       } else {
          return zend_string_init(start, end - start, 0);
       }
    }
    
    static inline int php_charmask(const unsigned char *input, size_t len, char *mask)
    {
        //ltrim('/upload/ugc/abc.jpeg',"/upload")
        //ltrim('peg',"a..b") +3
        //ltrim('peg',"...b") +3
        //ltrim('peg',"..") +1
        //ltrim('peg',"..a") +1
        //ltrim('peg',"a..") +1
        //ltrim('peg',".a") +1   v
        //ltrim('peg',"a.") +1   v
        const unsigned char *end;
        unsigned char c;
        int result = SUCCESS;
    
        memset(mask, 0, 256);
        for (end = input+len; input < end; input++) {
            c=*input;
            if ((input+3 < end) && input[1] == '.' && input[2] == '.'
                    && input[3] >= c) {
                memset(mask+c, 1, input[3] - c + 1); // mask entries are 0 or 1; 1 means the byte is in the set
                input+=3;
            } else if ((input+1 < end) && input[0] == '.' && input[1] == '.') {
                /* Error, try to be as helpful as possible:
                   (a range ending/starting with '.' won't be captured here) */
                if (end-len >= input) { /* there was no 'left' char */
                    php_error_docref(NULL, E_WARNING, "Invalid '..'-range, no character to the left of '..'");
                    result = FAILURE;
                    continue;
                }
                if (input+2 >= end) { /* there is no 'right' char */
                    php_error_docref(NULL, E_WARNING, "Invalid '..'-range, no character to the right of '..'");
                    result = FAILURE;
                    continue;
                }
                if (input[-1] > input[2]) { /* wrong order */
                    php_error_docref(NULL, E_WARNING, "Invalid '..'-range, '..'-range needs to be incrementing");
                    result = FAILURE;
                    continue;
                }
                /* FIXME: better error (a..b..c is the only left possibility?) */
                php_error_docref(NULL, E_WARNING, "Invalid '..'-range");
                result = FAILURE;
                continue;
            } else {
                mask[c]=1;
            }
        }
        return result;
    }
    

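    what is the set of characters we asked to strip and what_len is its length. mode selects the side: 1 means left, 2 means right, 3 means both; ltrim passes 1. In my example what is longer than one character, so execution reaches php_charmask. That function checks whether what contains a ".." range; if it does not, it simply sets mask[c] to 1 for each character in what, so every entry of mask is either 0 or 1. The while loop then walks the string and breaks as soon as it meets a character whose mask entry is 0.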
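    The three mode values can be observed from userland by calling the three trim functions with the same mask. This is a small sketch; it assumes that rtrim and trim pass 2 and 3 to the same internal routine, which matches the description above but is not shown in the excerpt.

```php
<?php
$s = '/upload/';
$mask = '/';

var_dump(ltrim($s, $mask)); // left only (mode 1):  string(7) "upload/"
var_dump(rtrim($s, $mask)); // right only (mode 2): string(7) "/upload"
var_dump(trim($s, $mask));  // both sides (mode 3): string(6) "upload"
```

    Because the mask here is a single character, these calls take the what_len == 1 branch and never build the 256-entry mask table.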

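    The source also shows that the scan is driven by the subject string, not by the mask. Whatever characters were asked to be stripped, the loop keeps going as long as the string has not ended and the current character is in the set; it only stops at the first character that is not.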

    php > var_dump (ltrim('/upload/ugc/abc.jpeg',"/upload"));
    string(11) "gc/abc.jpeg"
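    //1. Put the characters of "/upload" into the mask array: the index is the character, the value is 1.
    //2. Walk the subject string one character at a time and look each one up in mask; keep going while it is found, and stop at the first character that is not.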
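    The two steps above can be sketched in PHP itself. This is only a model for masks without ".." ranges; ltrim_model is a hypothetical name, and the real work is done by the C code shown earlier.

```php
<?php
// ltrim_model is a hypothetical name; it mimics the two steps for masks
// that contain no ".." ranges.
function ltrim_model(string $str, string $what): string
{
    // Step 1: flag every character that appears in $what.
    $mask = [];
    for ($i = 0, $n = strlen($what); $i < $n; $i++) {
        $mask[$what[$i]] = 1;
    }

    // Step 2: advance past leading characters that are flagged.
    $start = 0;
    $len = strlen($str);
    while ($start < $len && isset($mask[$str[$start]])) {
        $start++;
    }

    // The cast covers older PHP versions where substr() can return false.
    return (string) substr($str, $start);
}

var_dump(ltrim_model('/upload/ugc/abc.jpeg', '/upload')); // string(11) "gc/abc.jpeg"
var_dump(ltrim_model('upload/ugc/abc.jpeg', 'upload'));   // string(13) "/ugc/abc.jpeg"
```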
    

    This also means that the order of the characters in the mask does not matter. For example, a shuffled mask gives the same result:

    php > var_dump (ltrim('/upload/ugc/abc.jpeg',"daolpu/"));
    string(11) "gc/abc.jpeg"
    

    Range masks are validated before they are used. Some examples:

    php > var_dump (ltrim('upload/ugc/abc.jpeg',"u.."));
    Warning: ltrim(): Invalid '..'-range, no character to the right of '..' in php shell code on line 1
    string(18) "pload/ugc/abc.jpeg"
    php > var_dump (ltrim('upload/ugc/abc.jpeg',".u"));
    string(18) "pload/ugc/abc.jpeg"
    php > var_dump (ltrim('upload/ugc/abc.jpeg',"..u"));
    Warning: ltrim(): Invalid '..'-range, no character to the left of '..' in php shell code on line 1
    string(18) "pload/ugc/abc.jpeg"
    php > var_dump (ltrim('upload/ugc/abc.jpeg',"u."));
    string(18) "pload/ugc/abc.jpeg"
    php > var_dump (ltrim('up.load/ugc/abc.jpeg',"u."));
    string(19) "p.load/ugc/abc.jpeg"
    php > var_dump (ltrim('u.pload/ugc/abc.jpeg',"u."));
    string(18) "pload/ugc/abc.jpeg"
    php > var_dump (ltrim('u.pload/ugc/abc.jpeg',".u"));
    string(18) "pload/ugc/abc.jpeg"
    php > var_dump (ltrim('u.pload/ugc/abc.jpeg',"...u"));
    string(0) ""
    php > var_dump (ltrim('upload/ugc/abc.jpeg',".."));
    Warning: ltrim(): Invalid '..'-range, no character to the left of '..' in php shell code on line 1
    string(19) "upload/ugc/abc.jpeg"
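    Several of the results above can be reproduced with a simplified PHP model of php_charmask. The sketch below is an approximation under stated assumptions: build_mask and ltrim_with_mask are hypothetical names, and unlike PHP the model emits no warnings for malformed ranges; it only mirrors which characters end up in the mask.

```php
<?php
// build_mask and ltrim_with_mask are hypothetical names for illustration.
function build_mask(string $what): array
{
    $mask = array_fill(0, 256, 0);
    $len = strlen($what);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($what[$i]);
        if ($i + 3 < $len && $what[$i + 1] === '.' && $what[$i + 2] === '.'
            && ord($what[$i + 3]) >= $c) {
            // Well-formed range "x..y": flag every byte from x to y.
            for ($b = $c; $b <= ord($what[$i + 3]); $b++) {
                $mask[$b] = 1;
            }
            $i += 3;
        } elseif ($i + 1 < $len && $what[$i] === '.' && $what[$i + 1] === '.') {
            // Malformed "..": PHP warns here; this model only skips the first dot.
            continue;
        } else {
            $mask[$c] = 1;
        }
    }
    return $mask;
}

function ltrim_with_mask(string $str, string $what): string
{
    $mask = build_mask($what);
    $start = 0;
    $len = strlen($str);
    while ($start < $len && $mask[ord($str[$start])]) {
        $start++;
    }
    // The cast covers older PHP versions where substr() can return false.
    return (string) substr($str, $start);
}

var_dump(ltrim_with_mask('u.pload/ugc/abc.jpeg', '...u')); // string(0) ""
var_dump(ltrim_with_mask('up.load/ugc/abc.jpeg', 'u.'));   // string(19) "p.load/ugc/abc.jpeg"
var_dump(ltrim_with_mask('upload/ugc/abc.jpeg', '..'));    // string(19) "upload/ugc/abc.jpeg"
```

    In the "...u" case the first "." is read as the start of a range that runs from "." to "u", which covers every character in the sample string, so nothing is left.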
    

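    If you are interested, trace each case through the source to see why it behaves this way. These are just my notes; if anything here is wrong, please point it out.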


          Link to this article: https://www.haomeiwen.com/subject/nbhijctx.html