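A small problem I ran into with ltrim
ltrim(str, character_mask) strips whitespace (or other characters) from the beginning of a string. character_mask lists those other characters, and it also supports character ranges written as a..b.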
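When no character_mask is given, the no-mask branch of php_trim_int quoted later in this post treats exactly six bytes as whitespace: space, \n, \r, \t, \v and \0. The following is a minimal C sketch of that default left trim. It assumes the string is passed as a pointer plus a length so that an embedded \0 can be represented. The name ltrim_ws_offset is hypothetical and is not part of PHP:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns how many leading bytes ltrim() would strip
   from s[0..len) when no character_mask is given. The whitespace set
   (space, \n, \r, \t, \v, \0) mirrors the default branch of php_trim_int. */
static size_t ltrim_ws_offset(const char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = (unsigned char)s[i];
        if (c == ' ' || c == '\n' || c == '\r' ||
            c == '\t' || c == '\v' || c == '\0') {
            i++;        /* still whitespace: keep stripping */
        } else {
            break;      /* first non-whitespace byte: stop */
        }
    }
    return i;
}
```

Unlike C's isspace(), this set does not include \f (form feed).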
Today I ran into a small problem while trying to strip a fixed string from the front of another string. For example:
php > var_dump (ltrim('/upload/ugc/abc.jpeg',"/upload"));
string(11) "gc/abc.jpeg"
php > var_dump (ltrim('upload/ugc/abc.jpeg',"upload"));
string(13) "/ugc/abc.jpeg"
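This puzzled me at first. After thinking it over, I worked out what happens in the first case. Once "/upload" has been stripped, the next two characters, "/" and "u", are also in the mask, so they are stripped too. Matching stops only at "g", which is not in the mask, and the remaining string is returned.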
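The following minimal C sketch shows this set-membership behaviour. The mask is treated as a plain set of characters, and each leading character of the subject is tested against it. The sketch assumes the mask contains no '..' ranges. The helper name ltrim_set is hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the first character of s that
   does not occur anywhere in chars. Order and repetition in chars do not
   matter; only membership does. '..' ranges are NOT handled. */
static const char *ltrim_set(const char *s, const char *chars)
{
    /* The *s != '\0' guard matters: strchr(chars, '\0') would match the
       terminator of chars and walk past the end of s. */
    while (*s != '\0' && strchr(chars, *s) != NULL) {
        s++;
    }
    return s;
}
```

With "/upload", the loop strips /, u, p, l, o, a and d. It then strips the second / and u as well, and stops at g, which leaves gc/abc.jpeg.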
PHP's internal implementation looks like this:
PHP_FUNCTION(ltrim)
{
    php_do_trim(INTERNAL_FUNCTION_PARAM_PASSTHRU, 1); /* mode 1: trim the left side only */
}
static zend_always_inline zend_string *php_trim_int(zend_string *str, char *what, size_t what_len, int mode)
{
    const char *start = ZSTR_VAL(str);
    const char *end = start + ZSTR_LEN(str);
    char mask[256];
    if (what) {
        if (what_len == 1) {
            /* one-character mask: compare directly, no lookup table */
            char p = *what;
            if (mode & 1) {
                while (start != end) {
                    if (*start == p) {
                        start++;
                    } else {
                        break;
                    }
                }
            }
            if (mode & 2) {
                while (start != end) {
                    if (*(end-1) == p) {
                        end--;
                    } else {
                        break;
                    }
                }
            }
        } else {
            /* longer mask: build a 256-entry lookup table first */
            php_charmask((unsigned char*)what, what_len, mask);
            if (mode & 1) {
                while (start != end) {
                    if (mask[(unsigned char)*start]) {
                        start++;
                    } else {
                        break;
                    }
                }
            }
            if (mode & 2) {
                while (start != end) {
                    if (mask[(unsigned char)*(end-1)]) {
                        end--;
                    } else {
                        break;
                    }
                }
            }
        }
    } else {
        /* no mask given: default whitespace set */
        if (mode & 1) {
            while (start != end) {
                unsigned char c = (unsigned char)*start;
                if (c <= ' ' &&
                    (c == ' ' || c == '\n' || c == '\r' || c == '\t' || c == '\v' || c == '\0')) {
                    start++;
                } else {
                    break;
                }
            }
        }
        if (mode & 2) {
            while (start != end) {
                unsigned char c = (unsigned char)*(end-1);
                if (c <= ' ' &&
                    (c == ' ' || c == '\n' || c == '\r' || c == '\t' || c == '\v' || c == '\0')) {
                    end--;
                } else {
                    break;
                }
            }
        }
    }
    if (ZSTR_LEN(str) == end - start) {
        return zend_string_copy(str);     /* nothing stripped: reuse the original string */
    } else if (end - start == 0) {
        return ZSTR_EMPTY_ALLOC();        /* everything stripped */
    } else {
        return zend_string_init(start, end - start, 0);
    }
}
static inline int php_charmask(const unsigned char *input, size_t len, char *mask)
{
    // Author's own test inputs (not part of PHP's source):
    //ltrim('/upload/ugc/abc.jpeg',"/upload")
    //ltrim('peg',"a..b") +3
    //ltrim('peg',"...b") +3
    //ltrim('peg',"..") +1
    //ltrim('peg',"..a") +1
    //ltrim('peg',"a..") +1
    //ltrim('peg',".a") +1 v
    //ltrim('peg',"a.") +1 v
    const unsigned char *end;
    unsigned char c;
    int result = SUCCESS;
    memset(mask, 0, 256);
    for (end = input+len; input < end; input++) {
        c=*input;
        if ((input+3 < end) && input[1] == '.' && input[2] == '.'
                && input[3] >= c) {
            memset(mask+c, 1, input[3] - c + 1); // mask value: 1 = character present, 0 = absent
            input+=3;
        } else if ((input+1 < end) && input[0] == '.' && input[1] == '.') {
            /* Error, try to be as helpful as possible:
               (a range ending/starting with '.' won't be captured here) */
            if (end-len >= input) { /* there was no 'left' char */
                php_error_docref(NULL, E_WARNING, "Invalid '..'-range, no character to the left of '..'");
                result = FAILURE;
                continue;
            }
            if (input+2 >= end) { /* there is no 'right' char */
                php_error_docref(NULL, E_WARNING, "Invalid '..'-range, no character to the right of '..'");
                result = FAILURE;
                continue;
            }
            if (input[-1] > input[2]) { /* wrong order */
                php_error_docref(NULL, E_WARNING, "Invalid '..'-range, '..'-range needs to be incrementing");
                result = FAILURE;
                continue;
            }
            /* FIXME: better error (a..b..c is the only left possibility?) */
            php_error_docref(NULL, E_WARNING, "Invalid '..'-range");
            result = FAILURE;
            continue;
        } else {
            mask[c]=1;
        }
    }
    return result;
}
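what is the set of characters we asked to strip, and what_len is its length. mode selects the side: 1 means left, 2 means right, and 3 means both (ltrim passes 1). My mask is longer than one character, so execution reaches php_charmask. That function checks whether what contains a '..' range. For an ordinary character, it simply sets that character's slot in mask to 1 (0 = absent, 1 = present). The while loop then tests each character of the subject against mask and breaks as soon as it finds one that is not present.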
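mode works as a bitmask, so the left and right loops run independently and 3 enables both. The following minimal sketch shows how the two loops narrow the [start, end) window. It assumes a 256-entry mask that has already been filled in. The names trim_bounds, TRIM_LEFT and TRIM_RIGHT are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TRIM_LEFT = 1, TRIM_RIGHT = 2 };   /* hypothetical names for the mode bits */

/* Narrows [*start, *end) the way php_trim_int does:
   mask[c] != 0 means byte c should be stripped. */
static void trim_bounds(const char **start, const char **end,
                        const char mask[256], int mode)
{
    if (mode & TRIM_LEFT) {
        while (*start != *end && mask[(unsigned char)**start]) {
            (*start)++;   /* drop one byte from the left */
        }
    }
    if (mode & TRIM_RIGHT) {
        while (*start != *end && mask[(unsigned char)*(*end - 1)]) {
            (*end)--;     /* drop one byte from the right */
        }
    }
}
```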
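The key point is that matching is driven by the subject string, not by the mask. Whatever the mask contains, the loop keeps consuming characters of the subject for as long as they appear in the mask. It stops only at the first one that does not (or at the end of the string). The mask is a set of characters, not a prefix to be matched.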
php > var_dump (ltrim('/upload/ugc/abc.jpeg',"/upload"));
string(11) "gc/abc.jpeg"
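//1. Put the characters of "/upload" into the array mask: the index is the character, the value is 1.
//2. Walk the subject string character by character and look each one up in mask. If it is present, move on to the next character; otherwise stop.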
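The two steps can be sketched in C with a 256-entry table indexed by byte value. This is the same representation php_charmask fills in. This simplified version skips the '..' range handling, and the names build_mask and ltrim_mask are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Step 1: mark every byte of chars in a 256-entry table.
   Simplification: '..' ranges are treated as ordinary '.' bytes. */
static void build_mask(const char *chars, char mask[256])
{
    memset(mask, 0, 256);
    for (; *chars != '\0'; chars++) {
        mask[(unsigned char)*chars] = 1;
    }
}

/* Step 2: walk s from the left and stop at the first unmarked byte. */
static const char *ltrim_mask(const char *s, const char mask[256])
{
    while (*s != '\0' && mask[(unsigned char)*s]) {
        s++;
    }
    return s;
}
```

Each lookup is a single table read, so the mask string is not rescanned for every subject character. Because this sketch works on C strings, it stops at the first \0. PHP's version works on an explicit length and can continue past embedded \0 bytes.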
This also means the order of the characters in the mask does not matter; reordering them gives the same result:
php > var_dump (ltrim('/upload/ugc/abc.jpeg',"daolpu/"));
string(11) "gc/abc.jpeg"
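The table only records whether a byte is present, so any permutation or repetition of the mask characters produces the same table. The following small check makes that concrete. The helpers fill_mask and same_char_set are hypothetical and ignore '..' ranges:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: mask[b] = 1 for every byte b that appears in chars. */
static void fill_mask(const char *chars, char mask[256])
{
    memset(mask, 0, 256);
    while (*chars != '\0') {
        mask[(unsigned char)*chars++] = 1;
    }
}

/* Returns 1 when two mask strings describe the same set of bytes. */
static int same_char_set(const char *a, const char *b)
{
    char ma[256], mb[256];
    fill_mask(a, ma);
    fill_mask(b, mb);
    return memcmp(ma, mb, sizeof ma) == 0;
}
```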
Ranges in the mask are validated, and malformed ones trigger a warning. Some examples:
php > var_dump (ltrim('upload/ugc/abc.jpeg',"u.."));
Warning: ltrim(): Invalid '..'-range, no character to the right of '..' in php shell code on line 1
string(18) "pload/ugc/abc.jpeg"
php > var_dump (ltrim('upload/ugc/abc.jpeg',".u"));
string(18) "pload/ugc/abc.jpeg"
php > var_dump (ltrim('upload/ugc/abc.jpeg',"..u"));
Warning: ltrim(): Invalid '..'-range, no character to the left of '..' in php shell code on line 1
string(18) "pload/ugc/abc.jpeg"
php > var_dump (ltrim('upload/ugc/abc.jpeg',"u."));
string(18) "pload/ugc/abc.jpeg"
php > var_dump (ltrim('up.load/ugc/abc.jpeg',"u."));
string(19) "p.load/ugc/abc.jpeg"
php > var_dump (ltrim('u.pload/ugc/abc.jpeg',"u."));
string(18) "pload/ugc/abc.jpeg"
php > var_dump (ltrim('u.pload/ugc/abc.jpeg',".u"));
string(18) "pload/ugc/abc.jpeg"
php > var_dump (ltrim('u.pload/ugc/abc.jpeg',"...u"));
string(0) ""
php > var_dump (ltrim('upload/ugc/abc.jpeg',".."));
Warning: ltrim(): Invalid '..'-range, no character to the left of '..' in php shell code on line 1
string(19) "upload/ugc/abc.jpeg"
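To try these cases outside PHP, here is a standalone C port of the php_charmask function quoted above. Two things are assumptions made only so the sketch compiles on its own. First, php_error_docref and SUCCESS/FAILURE are replaced by a count of invalid ranges. Second, the function is renamed to the hypothetical charmask_port:

```c
#include <assert.h>
#include <string.h>

/* Standalone port of php_charmask: mask[b] becomes 1 for every byte b
   that should be stripped. Instead of calling php_error_docref, it
   returns how many invalid '..' ranges it met (0 = PHP's SUCCESS). */
static int charmask_port(const unsigned char *input, size_t len, char *mask)
{
    const unsigned char *end = input + len;
    int invalid = 0;

    memset(mask, 0, 256);
    for (; input < end; input++) {
        unsigned char c = *input;
        /* "end - input > 3" is PHP's "input+3 < end", written without
           forming a pointer past the end of the buffer. */
        if (end - input > 3 && input[1] == '.' && input[2] == '.'
                && input[3] >= c) {
            /* valid range c..input[3]: mark every byte in between */
            memset(mask + c, 1, (size_t)(input[3] - c + 1));
            input += 3;
        } else if (end - input > 1 && input[0] == '.' && input[1] == '.') {
            /* malformed range (nothing on the left or right, decreasing,
               or something like a..b..c): PHP warns, and this '.' is
               not added to the mask */
            invalid++;
        } else {
            mask[c] = 1;  /* ordinary byte */
        }
    }
    return invalid;
}
```

The "...u" case shows why the whole string disappears. The range '.' (0x2E) to 'u' (0x75) also covers '/' and every lowercase letter from a to u, which is every byte of u.pload/ugc/abc.jpeg. In "u.." and "..", the '.' that triggers the warning is skipped, but the final '.' is still added as an ordinary character.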
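If you're interested, trace each of these cases through the source above to see why it behaves the way it does. This is just a note for my own reference; if anything here is wrong, please point it out.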