System.out.println(Arrays.asList("&&".split("&")));
System.out.println(Arrays.asList(StringUtils.split("&&","&")));
输出:
[]
[]
改为以下方式:
System.out.println(Arrays.asList("&&".split("&",-1)));
System.out.println(Arrays.asList(StringUtils.splitPreserveAllTokens("&&","&")));
输出:
[, , ]
[, , ]
说明:
1、字符串的split方法
limit传0会丢弃末尾的空字符串
System.out.println(Arrays.asList("&&".split("&")));
System.out.println(Arrays.asList("&&a".split("&")));
输出:
[]
[, , a]
![](https://img.haomeiwen.com/i10074549/51ccbb01c467e50a.png)
public String[] split(String regex) {
return split(regex, 0);
}
/**
* Splits this string around matches of the given
* <a href="../util/regex/Pattern.html#sum">regular expression</a>.
*
* <p> The array returned by this method contains each substring of this
* string that is terminated by another substring that matches the given
* expression or is terminated by the end of the string. The substrings in
* the array are in the order in which they occur in this string. If the
* expression does not match any part of the input then the resulting array
* has just one element, namely this string.
*
* <p> When there is a positive-width match at the beginning of this
* string then an empty leading substring is included at the beginning
* of the resulting array. A zero-width match at the beginning however
* never produces such empty leading substring.
*
* <p> The {@code limit} parameter controls the number of times the
* pattern is applied and therefore affects the length of the resulting
* array. If the limit <i>n</i> is greater than zero then the pattern
* will be applied at most <i>n</i> - 1 times, the array's
* length will be no greater than <i>n</i>, and the array's last entry
* will contain all input beyond the last matched delimiter. If <i>n</i>
* is non-positive then the pattern will be applied as many times as
* possible and the array can have any length. If <i>n</i> is zero then
* the pattern will be applied as many times as possible, the array can
* have any length, and trailing empty strings will be discarded.
*
* <p> The string {@code "boo:and:foo"}, for example, yields the
* following results with these parameters:
*
* <blockquote><table cellpadding=1 cellspacing=0 summary="Split example showing regex, limit, and result">
* <tr>
* <th>Regex</th>
* <th>Limit</th>
* <th>Result</th>
* </tr>
* <tr><td align=center>:</td>
* <td align=center>2</td>
* <td>{@code { "boo", "and:foo" }}</td></tr>
* <tr><td align=center>:</td>
* <td align=center>5</td>
* <td>{@code { "boo", "and", "foo" }}</td></tr>
* <tr><td align=center>:</td>
* <td align=center>-2</td>
* <td>{@code { "boo", "and", "foo" }}</td></tr>
* <tr><td align=center>o</td>
* <td align=center>5</td>
* <td>{@code { "b", "", ":and:f", "", "" }}</td></tr>
* <tr><td align=center>o</td>
* <td align=center>-2</td>
* <td>{@code { "b", "", ":and:f", "", "" }}</td></tr>
* <tr><td align=center>o</td>
* <td align=center>0</td>
* <td>{@code { "b", "", ":and:f" }}</td></tr>
* </table></blockquote>
*
* <p> An invocation of this method of the form
* <i>str.</i>{@code split(}<i>regex</i>{@code ,} <i>n</i>{@code )}
* yields the same result as the expression
*
* <blockquote>
* <code>
* {@link java.util.regex.Pattern}.{@link
* java.util.regex.Pattern#compile compile}(<i>regex</i>).{@link
* java.util.regex.Pattern#split(java.lang.CharSequence,int) split}(<i>str</i>, <i>n</i>)
* </code>
* </blockquote>
*
*
* @param regex
* the delimiting regular expression
*
* @param limit
* the result threshold, as described above
*
* @return the array of strings computed by splitting this string
* around matches of the given regular expression
*
* @throws PatternSyntaxException
* if the regular expression's syntax is invalid
*
* @see java.util.regex.Pattern
*
* @since 1.4
* @spec JSR-51
*/
public String[] split(String regex, int limit) {
/* fastpath if the regex is a
(1)one-char String and this character is not one of the
RegEx's meta characters ".$|()[{^?*+\\", or
(2)two-char String and the first char is the backslash and
the second is not the ascii digit or ascii letter.
*/
char ch = 0;
if (((regex.value.length == 1 &&
".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
(regex.length() == 2 &&
regex.charAt(0) == '\\' &&
(((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
((ch-'a')|('z'-ch)) < 0 &&
((ch-'A')|('Z'-ch)) < 0)) &&
(ch < Character.MIN_HIGH_SURROGATE ||
ch > Character.MAX_LOW_SURROGATE))
{
int off = 0;
int next = 0;
boolean limited = limit > 0;
ArrayList<String> list = new ArrayList<>();
while ((next = indexOf(ch, off)) != -1) {
if (!limited || list.size() < limit - 1) {
list.add(substring(off, next));
off = next + 1;
} else { // last one
//assert (list.size() == limit - 1);
list.add(substring(off, value.length));
off = value.length;
break;
}
}
// If no match was found, return this
if (off == 0)
return new String[]{this};
// Add remaining segment
if (!limited || list.size() < limit)
list.add(substring(off, value.length));
// Construct result
int resultSize = list.size();
if (limit == 0) {
while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
resultSize--;
}
}
String[] result = new String[resultSize];
return list.subList(0, resultSize).toArray(result);
}
return Pattern.compile(regex).split(this, limit);
}
2、org.apache.commons.lang3.StringUtils.split方法
preserveAllTokens:为true表示相邻分隔符被视为空token分隔符,为false表示相邻的分隔符被视为一个分隔符。
![](https://img.haomeiwen.com/i10074549/5bdbdd72d993a896.png)
/**
* <p>Splits the provided text into an array, separators specified.
* This is an alternative to using StringTokenizer.</p>
*
* <p>The separator is not included in the returned String array.
* Adjacent separators are treated as one separator.
* For more control over the split use the StrTokenizer class.</p>
*
* <p>A {@code null} input String returns {@code null}.
* A {@code null} separatorChars splits on whitespace.</p>
*
* <pre>
* StringUtils.split(null, *) = null
* StringUtils.split("", *) = []
* StringUtils.split("abc def", null) = ["abc", "def"]
* StringUtils.split("abc def", " ") = ["abc", "def"]
* StringUtils.split("abc def", " ") = ["abc", "def"]
* StringUtils.split("ab:cd:ef", ":") = ["ab", "cd", "ef"]
* </pre>
*
* @param str the String to parse, may be null
* @param separatorChars the characters used as the delimiters,
* {@code null} splits on whitespace
* @return an array of parsed Strings, {@code null} if null String input
*/
public static String[] split(final String str, final String separatorChars) {
return splitWorker(str, separatorChars, -1, false);
}
/**
* Performs the logic for the {@code split} and
* {@code splitPreserveAllTokens} methods that return a maximum array
* length.
*
* @param str the String to parse, may be {@code null}
* @param separatorChars the separate character
* @param max the maximum number of elements to include in the
* array. A zero or negative value implies no limit.
* @param preserveAllTokens if {@code true}, adjacent separators are
* treated as empty token separators; if {@code false}, adjacent
* separators are treated as one separator.
* @return an array of parsed Strings, {@code null} if null String input
*/
private static String[] splitWorker(final String str, final String separatorChars, final int max, final boolean preserveAllTokens) {
// Performance tuned for 2.0 (JDK1.4)
// Direct code is quicker than StringTokenizer.
// Also, StringTokenizer uses isSpace() not isWhitespace()
if (str == null) {
return null;
}
final int len = str.length();
if (len == 0) {
return ArrayUtils.EMPTY_STRING_ARRAY;
}
final List<String> list = new ArrayList<>();
int sizePlus1 = 1;
int i = 0, start = 0;
boolean match = false;
boolean lastMatch = false;
if (separatorChars == null) {
// Null separator means use whitespace
while (i < len) {
if (Character.isWhitespace(str.charAt(i))) {
if (match || preserveAllTokens) {
lastMatch = true;
if (sizePlus1++ == max) {
i = len;
lastMatch = false;
}
list.add(str.substring(start, i));
match = false;
}
start = ++i;
continue;
}
lastMatch = false;
match = true;
i++;
}
} else if (separatorChars.length() == 1) {
// Optimise 1 character case
final char sep = separatorChars.charAt(0);
while (i < len) {
if (str.charAt(i) == sep) {
if (match || preserveAllTokens) {
lastMatch = true;
if (sizePlus1++ == max) {
i = len;
lastMatch = false;
}
list.add(str.substring(start, i));
match = false;
}
start = ++i;
continue;
}
lastMatch = false;
match = true;
i++;
}
} else {
// standard case
while (i < len) {
if (separatorChars.indexOf(str.charAt(i)) >= 0) {
if (match || preserveAllTokens) {
lastMatch = true;
if (sizePlus1++ == max) {
i = len;
lastMatch = false;
}
list.add(str.substring(start, i));
match = false;
}
start = ++i;
continue;
}
lastMatch = false;
match = true;
i++;
}
}
if (match || preserveAllTokens && lastMatch) {
list.add(str.substring(start, i));
}
return list.toArray(ArrayUtils.EMPTY_STRING_ARRAY);
}
/**
* <p>Splits the provided text into an array, separators specified,
* preserving all tokens, including empty tokens created by adjacent
* separators. This is an alternative to using StringTokenizer.</p>
*
* <p>The separator is not included in the returned String array.
* Adjacent separators are treated as separators for empty tokens.
* For more control over the split use the StrTokenizer class.</p>
*
* <p>A {@code null} input String returns {@code null}.
* A {@code null} separatorChars splits on whitespace.</p>
*
* <pre>
* StringUtils.splitPreserveAllTokens(null, *) = null
* StringUtils.splitPreserveAllTokens("", *) = []
* StringUtils.splitPreserveAllTokens("abc def", null) = ["abc", "def"]
* StringUtils.splitPreserveAllTokens("abc def", " ") = ["abc", "def"]
* StringUtils.splitPreserveAllTokens("abc def", " ") = ["abc", "", def"]
* StringUtils.splitPreserveAllTokens("ab:cd:ef", ":") = ["ab", "cd", "ef"]
* StringUtils.splitPreserveAllTokens("ab:cd:ef:", ":") = ["ab", "cd", "ef", ""]
* StringUtils.splitPreserveAllTokens("ab:cd:ef::", ":") = ["ab", "cd", "ef", "", ""]
* StringUtils.splitPreserveAllTokens("ab::cd:ef", ":") = ["ab", "", cd", "ef"]
* StringUtils.splitPreserveAllTokens(":cd:ef", ":") = ["", cd", "ef"]
* StringUtils.splitPreserveAllTokens("::cd:ef", ":") = ["", "", cd", "ef"]
* StringUtils.splitPreserveAllTokens(":cd:ef:", ":") = ["", cd", "ef", ""]
* </pre>
*
* @param str the String to parse, may be {@code null}
* @param separatorChars the characters used as the delimiters,
* {@code null} splits on whitespace
* @return an array of parsed Strings, {@code null} if null String input
* @since 2.1
*/
public static String[] splitPreserveAllTokens(final String str, final String separatorChars) {
return splitWorker(str, separatorChars, -1, true);
}
网友评论