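I. Concepts
A regular expression, also called a pattern, is a string of characters with special meanings. It is typically used to match and replace text.
II. Matching a Single Character
1. Matching a fixed single character
Every single letter (upper or lower case), digit, and ordinary symbol is itself a regular expression. It matches exactly one character, namely itself. The example below also passes Pattern.CASE_INSENSITIVE, so "e" matches "E" as well: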
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("e", Pattern.CASE_INSENSITIVE);
String content = "hello Android, Hello Java, HELLO Kotlin";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 14:25:31.765 3018-3018/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-09 14:25:31.766 3018-3018/com.tomorrow.target28 D/MainActivity: zwm, start: 1, end: 2, group: e
2020-09-09 14:25:31.766 3018-3018/com.tomorrow.target28 D/MainActivity: zwm, start: 16, end: 17, group: e
2020-09-09 14:25:31.766 3018-3018/com.tomorrow.target28 D/MainActivity: zwm, start: 28, end: 29, group: E
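The snippets in this article log through Android's Log.d. For trying the patterns outside an Android project, the same find loop can be sketched on a plain JVM. The class name FindAllDemo and the helper findAll are hypothetical names chosen for illustration, and the "start-end:text" output format is an assumption, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Hypothetical helper: returns "start-end:text" for every case-insensitive match.
    static List<String> findAll(String regex, String content) {
        Matcher matcher = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(content);
        List<String> hits = new ArrayList<>();
        while (matcher.find()) {
            hits.add(matcher.start() + "-" + matcher.end() + ":" + matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String content = "hello Android, Hello Java, HELLO Kotlin";
        System.out.println(findAll("e", content)); // [1-2:e, 16-17:e, 28-29:E]
    }
}
```

The offsets agree with the logcat output above: end is exclusive, so a one-character match spans start to start + 1.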
Combining several fixed characters yields an expression that matches a fixed string:
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("ello", Pattern.CASE_INSENSITIVE);
String content = "hello Android, Hello Java, HELLO Kotlin";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 14:30:02.406 7928-7928/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-09 14:30:02.407 7928-7928/com.tomorrow.target28 D/MainActivity: zwm, start: 1, end: 5, group: ello
2020-09-09 14:30:02.407 7928-7928/com.tomorrow.target28 D/MainActivity: zwm, start: 16, end: 20, group: ello
2020-09-09 14:30:02.407 7928-7928/com.tomorrow.target28 D/MainActivity: zwm, start: 28, end: 32, group: ELLO
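2. Matching any single character
"." matches any single character, whether it is a letter, a digit, a symbol, or a literal "." (by default it does not match line terminators):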
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("he.", Pattern.CASE_INSENSITIVE);
String content = "hello Android, Hello Java, HELLO Kotlin";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 14:37:00.984 12212-12212/? D/MainActivity: zwm, testMethod
2020-09-09 14:37:00.985 12212-12212/? D/MainActivity: zwm, start: 0, end: 3, group: hel
2020-09-09 14:37:00.985 12212-12212/? D/MainActivity: zwm, start: 15, end: 18, group: Hel
2020-09-09 14:37:00.985 12212-12212/? D/MainActivity: zwm, start: 27, end: 30, group: HEL
"." can be used several times in a row:
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("he..", Pattern.CASE_INSENSITIVE);
String content = "hello Android, Hello Java, HELLO Kotlin";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 14:38:52.403 12709-12709/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-09 14:38:52.404 12709-12709/com.tomorrow.target28 D/MainActivity: zwm, start: 0, end: 4, group: hell
2020-09-09 14:38:52.404 12709-12709/com.tomorrow.target28 D/MainActivity: zwm, start: 15, end: 19, group: Hell
2020-09-09 14:38:52.404 12709-12709/com.tomorrow.target28 D/MainActivity: zwm, start: 27, end: 31, group: HELL
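One detail the examples above do not exercise is that "." skips line terminators unless the Pattern.DOTALL flag is set. A minimal sketch on a plain JVM, where the class name DotAllDemo and the helper found are hypothetical:

```java
import java.util.regex.Pattern;

class DotAllDemo {
    // Hypothetical helper: true if the regex matches anywhere in the content.
    static boolean found(String regex, int flags, String content) {
        return Pattern.compile(regex, flags).matcher(content).find();
    }

    public static void main(String[] args) {
        String content = "he\nllo"; // a newline directly after "he"
        System.out.println(found("he.", 0, content));              // false
        System.out.println(found("he.", Pattern.DOTALL, content)); // true
    }
}
```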
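3. Matching the "." metacharacter
Sometimes we do not want "." to match arbitrary characters, only a literal ".". In that case, escape it with a backslash. Inside a Java string literal the backslash itself has to be doubled, giving "\\.":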
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("ello\\.", Pattern.CASE_INSENSITIVE);
String content = "hello. Android, Hello. Java, HELLO. Kotlin";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 14:47:01.208 13177-13177/? D/MainActivity: zwm, testMethod
2020-09-09 14:47:01.209 13177-13177/? D/MainActivity: zwm, start: 1, end: 6, group: ello.
2020-09-09 14:47:01.209 13177-13177/? D/MainActivity: zwm, start: 17, end: 22, group: ello.
2020-09-09 14:47:01.210 13177-13177/? D/MainActivity: zwm, start: 30, end: 35, group: ELLO.
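To match a literal "\", escape it in the same way. The regular expression is "\\", which is written as "\\\\" in a Java string literal: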
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("ello\\\\", Pattern.CASE_INSENSITIVE);
String content = "hello\\ Android, Hello\\ Java, HELLO\\ Kotlin";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 14:52:45.112 13852-13852/? D/MainActivity: zwm, testMethod
2020-09-09 14:52:45.113 13852-13852/? D/MainActivity: zwm, start: 1, end: 6, group: ello\
2020-09-09 14:52:45.113 13852-13852/? D/MainActivity: zwm, start: 17, end: 22, group: ello\
2020-09-09 14:52:45.114 13852-13852/? D/MainActivity: zwm, start: 30, end: 35, group: ELLO\
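When the text to match is a literal that may contain several metacharacters, escaping each one by hand is error-prone. The standard Pattern.quote method turns a whole string into a literal pattern. The sketch below assumes a plain JVM; the class name QuoteDemo and the helper countLiteral are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper: counts occurrences of a literal string, with no character treated as special.
    static int countLiteral(String literal, String content) {
        Matcher matcher = Pattern.compile(Pattern.quote(literal)).matcher(content);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Unquoted, "a.b" would also match "axb"; quoted, it matches only the literal text.
        System.out.println(countLiteral("a.b", "a.b axb a.b")); // 2
    }
}
```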
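4. Matching character classes
Basic syntax of character classes
"." is very permissive: it matches almost any single character. Sometimes we only want to match one character out of a limited set. A character class, written with "[" and "]", does exactly that: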
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile(".ea[td]", Pattern.CASE_INSENSITIVE);
String content = "head heat heavy";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 15:07:39.830 13950-13950/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-09 15:07:39.831 13950-13950/com.tomorrow.target28 D/MainActivity: zwm, start: 0, end: 4, group: head
2020-09-09 15:07:39.831 13950-13950/com.tomorrow.target28 D/MainActivity: zwm, start: 5, end: 9, group: heat
Using ranges inside a character class
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("city[1-3]\\.jpg", Pattern.CASE_INSENSITIVE);
String content = "city0.jpg city1.jpg city2.jpg city3.jpg city4.jpg";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 15:11:49.980 14383-14383/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-09 15:11:49.981 14383-14383/com.tomorrow.target28 D/MainActivity: zwm, start: 10, end: 19, group: city1.jpg
2020-09-09 15:11:49.981 14383-14383/com.tomorrow.target28 D/MainActivity: zwm, start: 20, end: 29, group: city2.jpg
2020-09-09 15:11:49.981 14383-14383/com.tomorrow.target28 D/MainActivity: zwm, start: 30, end: 39, group: city3.jpg
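In the same way, "[a-z]" matches any lowercase letter and "[A-Z]" matches any uppercase letter.
To match a literal "-" inside a character class (between "[" and "]"), escape it. Outside the brackets "-" is an ordinary character and needs no escaping: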
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("city[1-3\\-]\\.jpg", Pattern.CASE_INSENSITIVE);
String content = "city0.jpg city1.jpg city2.jpg city3.jpg city-.jpg";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 15:15:35.466 15390-15390/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-09 15:15:35.470 15390-15390/com.tomorrow.target28 D/MainActivity: zwm, start: 10, end: 19, group: city1.jpg
2020-09-09 15:15:35.470 15390-15390/com.tomorrow.target28 D/MainActivity: zwm, start: 20, end: 29, group: city2.jpg
2020-09-09 15:15:35.470 15390-15390/com.tomorrow.target28 D/MainActivity: zwm, start: 30, end: 39, group: city3.jpg
2020-09-09 15:15:35.470 15390-15390/com.tomorrow.target28 D/MainActivity: zwm, start: 40, end: 49, group: city-.jpg
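Escaping is not the only way to get a literal "-" into a character class. In java.util.regex a hyphen placed last in the class cannot form a range and is read literally. The sketch below compares both spellings on a plain JVM; the class name HyphenDemo and the helper matchesWhole are hypothetical.

```java
import java.util.regex.Pattern;

class HyphenDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String escaped = "city[1-3\\-]\\.jpg";  // hyphen escaped, as in the example above
        String trailing = "city[1-3-]\\.jpg";   // hyphen placed last, no escape
        System.out.println(matchesWhole(escaped, "city-.jpg"));  // true
        System.out.println(matchesWhole(trailing, "city-.jpg")); // true
        System.out.println(matchesWhole(trailing, "city0.jpg")); // false
    }
}
```

The escaped form is arguably easier to read, since it does not depend on where the hyphen sits in the class.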
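Negated character classes
Sometimes we need to match any character except certain ones. A negated character class, written with "^" immediately after "[", does this: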
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("city[^1-3]\\.jpg", Pattern.CASE_INSENSITIVE);
String content = "city0.jpg city1.jpg city2.jpg city3.jpg city-.jpg";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 15:21:37.471 16146-16146/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-09 15:21:37.472 16146-16146/com.tomorrow.target28 D/MainActivity: zwm, start: 0, end: 9, group: city0.jpg
2020-09-09 15:21:37.472 16146-16146/com.tomorrow.target28 D/MainActivity: zwm, start: 40, end: 49, group: city-.jpg
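5. Matching special characters
Matching metacharacters
Metacharacters are characters that have a special meaning in a regular expression:
- "." is a metacharacter that matches any single character. To match a literal ".", escape it.
- "\" is also a metacharacter, the escape character. To match a literal "\", escape it.
- "[" and "]" are metacharacters too. To match a literal "[" or "]", escape them.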
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("city\\[9\\]\\.jpg", Pattern.CASE_INSENSITIVE);
String content = "city0.jpg city1.jpg city2.jpg city3.jpg city[9].jpg";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 15:35:27.230 16548-16548/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-09 15:35:27.230 16548-16548/com.tomorrow.target28 D/MainActivity: zwm, start: 40, end: 51, group: city[9].jpg
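Matching specific character types
- "." and "[" are metacharacters on their own. Only when preceded by the escape character "\" do they stand for ordinary characters.
- "r" and "n" are ordinary characters on their own. Only when preceded by "\" (giving "\r" and "\n") do they take on a special meaning: "\r" matches a carriage return and "\n" matches a line feed, both of which are whitespace characters.
Matching digits:
Metacharacter | Matches |
---|---|
\d | Any single digit, same as [0-9] |
\D | Any single non-digit, same as [^0-9] |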
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile(" city[\\d]\\.jpg", Pattern.CASE_INSENSITIVE);
String content = " city0.jpg city1.jpg city2.jpg cityX.jpg cityY.jpg";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 15:53:12.300 18819-18819/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-09 15:53:12.301 18819-18819/com.tomorrow.target28 D/MainActivity: zwm, start: 1, end: 11, group: city0.jpg
2020-09-09 15:53:12.301 18819-18819/com.tomorrow.target28 D/MainActivity: zwm, start: 13, end: 23, group: city1.jpg
2020-09-09 15:53:12.301 18819-18819/com.tomorrow.target28 D/MainActivity: zwm, start: 23, end: 33, group: city2.jpg
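Matching letters, digits, and underscores:
Metacharacter | Matches |
---|---|
\w | Any single letter (upper or lower case), digit, or underscore, same as [a-zA-Z0-9_] |
\W | Any single character that is not a letter, digit, or underscore, same as [^a-zA-Z0-9_] |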
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile(" city[\\w]\\.jpg", Pattern.CASE_INSENSITIVE);
String content = " city0.jpg city1.jpg city2.jpg cityX.jpg city-.jpg";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 16:06:45.379 19095-19095/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-09 16:06:45.384 19095-19095/com.tomorrow.target28 D/MainActivity: zwm, start: 1, end: 11, group: city0.jpg
2020-09-09 16:06:45.384 19095-19095/com.tomorrow.target28 D/MainActivity: zwm, start: 13, end: 23, group: city1.jpg
2020-09-09 16:06:45.384 19095-19095/com.tomorrow.target28 D/MainActivity: zwm, start: 23, end: 33, group: city2.jpg
2020-09-09 16:06:45.384 19095-19095/com.tomorrow.target28 D/MainActivity: zwm, start: 33, end: 43, group: cityX.jpg
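The negated types are handy for cleaning input. As a small sketch, "\D" can strip everything that is not a digit from a phone number typed with separators. The class name DigitsDemo, the helper digitsOnly, and the sample number are made up for illustration.

```java
class DigitsDemo {
    // Hypothetical helper: removes every non-digit character from the input.
    static String digitsOnly(String input) {
        return input.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("150-1234 5678")); // 15012345678
    }
}
```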
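III. Matching Multiple Characters
1. Matching one or more characters
Appending "+" to a single character, a character class, a character type, or "." matches one or more consecutive occurrences of that element: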
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("city\\w+\\.jpg", Pattern.CASE_INSENSITIVE);
String content = "city.jpg city2.jpg cityX.jpg city-.jpg";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 16:21:46.482 24054-24054/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-09 16:21:46.483 24054-24054/com.tomorrow.target28 D/MainActivity: zwm, start: 9, end: 18, group: city2.jpg
2020-09-09 16:21:46.483 24054-24054/com.tomorrow.target28 D/MainActivity: zwm, start: 19, end: 28, group: cityX.jpg
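2. Matching zero or more characters
Appending "*" to a single character, a character class, a character type, or "." matches zero or more consecutive occurrences of that element: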
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("city\\w*\\.jpg", Pattern.CASE_INSENSITIVE);
String content = "city.jpg city2.jpg cityX.jpg city-.jpg";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 16:22:59.815 24456-24456/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-09 16:22:59.815 24456-24456/com.tomorrow.target28 D/MainActivity: zwm, start: 0, end: 8, group: city.jpg
2020-09-09 16:22:59.816 24456-24456/com.tomorrow.target28 D/MainActivity: zwm, start: 9, end: 18, group: city2.jpg
2020-09-09 16:22:59.816 24456-24456/com.tomorrow.target28 D/MainActivity: zwm, start: 19, end: 28, group: cityX.jpg
3. Matching zero or one character
"?" matches zero or one occurrence of the preceding element. It is used in the same way as "+" and "*":
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("flowers?", Pattern.CASE_INSENSITIVE);
String content = "flower flowers flow";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 16:37:45.893 24812-24812/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-09 16:37:45.894 24812-24812/com.tomorrow.target28 D/MainActivity: zwm, start: 0, end: 6, group: flower
2020-09-09 16:37:45.895 24812-24812/com.tomorrow.target28 D/MainActivity: zwm, start: 7, end: 14, group: flowers
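Note: "+", "*" and "?" are all metacharacters. To match them literally, escape them with "\".
4. Matching a specified number of characters
Matching a fixed number of characters
Appending "{n}", where n is a number, to a single character, a character class, a character type, or "." matches exactly n consecutive occurrences of that element: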
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("1[2-5]\\d{9}", Pattern.CASE_INSENSITIVE);
String content = "15012345678 13712345678 1234567890";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 16:47:02.508 25875-25875/? D/MainActivity: zwm, testMethod
2020-09-09 16:47:02.509 25875-25875/? D/MainActivity: zwm, start: 0, end: 11, group: 15012345678
2020-09-09 16:47:02.509 25875-25875/? D/MainActivity: zwm, start: 12, end: 23, group: 13712345678
Matching a number of characters within a range
Use the "{min,max}" syntax to match at least min and at most max occurrences:
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("1[2-5]\\d{7,9}", Pattern.CASE_INSENSITIVE);
String content = "15012345678 13712345678 1234567890";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 16:48:50.741 26407-26407/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-09 16:48:50.742 26407-26407/com.tomorrow.target28 D/MainActivity: zwm, start: 0, end: 11, group: 15012345678
2020-09-09 16:48:50.742 26407-26407/com.tomorrow.target28 D/MainActivity: zwm, start: 12, end: 23, group: 13712345678
2020-09-09 16:48:50.742 26407-26407/com.tomorrow.target28 D/MainActivity: zwm, start: 24, end: 34, group: 1234567890
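Two special cases are worth noting:
- The minimum may be 0, so "{0,1}" is equivalent to "?".
- To leave the maximum unbounded, omit it: "{1,}" is equivalent to "+", and "{0,}" is equivalent to "*".
Note: "{" and "}" are metacharacters too. To match them literally, escape them with "\".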
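The counted quantifiers combine naturally when checking a fixed layout. The sketch below accepts a date written as a four-digit year followed by a one- or two-digit month and day. It only checks the shape of the text, not whether the date exists. The class name QuantifierDemo and the helper looksLikeDate are hypothetical.

```java
class QuantifierDemo {
    // Hypothetical helper: String.matches requires the whole input to match the pattern.
    static boolean looksLikeDate(String input) {
        return input.matches("\\d{4}-\\d{1,2}-\\d{1,2}");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("2020-09-10")); // true
        System.out.println(looksLikeDate("2020-9-1"));   // true
        System.out.println(looksLikeDate("20-09-10"));   // false: the year needs exactly 4 digits
    }
}
```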
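5. Greedy and lazy matching
- Greedy matching: the quantifier matches as many characters as possible. In a backtracking engine such as java.util.regex, a greedy quantifier first consumes as much text as it can. If the rest of the pattern then fails, it gives characters back one at a time until the whole pattern matches or no match is possible. The quantifier therefore ends up with the longest repetition that still lets the pattern match, which is why it is called greedy.
- Lazy matching: the quantifier matches as few characters as possible. It starts with the minimum number of repetitions and takes one more character only when the rest of the pattern fails to match. As soon as the whole pattern matches, that match is reported and the search continues after it, which is why it is called lazy.
Syntax of greedy and lazy quantifiers:
Greedy | Lazy | Matches |
---|---|---|
? | ?? | Zero or one |
+ | +? | One or more |
* | *? | Zero or more |
{n} | {n}? | Exactly n |
{n,m} | {n,m}? | Between n and m |
{n,} | {n,}? | n or more |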
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("start.*end", Pattern.CASE_INSENSITIVE);
String content = "start hello Android end start hello Java end";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 17:29:25.564 26746-26746/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-09 17:29:25.565 26746-26746/com.tomorrow.target28 D/MainActivity: zwm, start: 0, end: 44, group: start hello Android end start hello Java end
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("start.*?end", Pattern.CASE_INSENSITIVE);
String content = "start hello Android end start hello Java end";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-09 17:30:26.096 27158-27158/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-09 17:30:26.097 27158-27158/com.tomorrow.target28 D/MainActivity: zwm, start: 0, end: 23, group: start hello Android end
2020-09-09 17:30:26.097 27158-27158/com.tomorrow.target28 D/MainActivity: zwm, start: 24, end: 44, group: start hello Java end
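Two of these quantifiers deserve a closer look:
- "{n}": because it requires exactly n repetitions, greedy versus lazy makes no difference. "{n}?" is valid syntax, but its result is always the same as "{n}".
- "??": it looks odd at first, because greedy matching is usually discussed with "*" or "+", whereas here the count is only zero or one. "?" is greedy and takes the optional character whenever it can; "??" is lazy and skips it whenever it can.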
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("flowers?", Pattern.CASE_INSENSITIVE);
String content = "I like the flowers";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 08:52:05.264 2040-2040/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-10 08:52:05.264 2040-2040/com.tomorrow.target28 D/MainActivity: zwm, start: 11, end: 18, group: flowers
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("flowers??", Pattern.CASE_INSENSITIVE);
String content = "I like the flowers";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 08:52:40.019 3076-3076/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-10 08:52:40.020 3076-3076/com.tomorrow.target28 D/MainActivity: zwm, start: 11, end: 17, group: flower
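A common place where the greedy/lazy difference matters is text wrapped in delimiters. The sketch below runs "<.+>" and "<.+?>" against the same input on a plain JVM. The class name GreedyLazyDemo, the helper first, and the sample markup are chosen for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyLazyDemo {
    // Hypothetical helper: returns the first match, or null if there is none.
    static String first(String regex, String content) {
        Matcher matcher = Pattern.compile(regex).matcher(content);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String content = "<b>bold</b>";
        System.out.println(first("<.+>", content));  // <b>bold</b>  (greedy runs to the last ">")
        System.out.println(first("<.+?>", content)); // <b>          (lazy stops at the first ">")
    }
}
```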
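IV. Matching Boundaries
1. Matching word boundaries
"\b" matches a position rather than a character: the point between a word character (\w) and a non-word character, or the start or end of the text when it is next to a word character. Placing "\b" before an element requires that element to begin a word.
Here an element means a single character (such as "j"), a character class (such as "[abcde]"), a character type (such as "\d"), an escaped special character (such as "\["), or any single character (".").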
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("\\blike", Pattern.CASE_INSENSITIVE);
String content = " like alike like";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 09:16:37.781 21019-21019/? D/MainActivity: zwm, testMethod
2020-09-10 09:16:37.782 21019-21019/? D/MainActivity: zwm, start: 1, end: 5, group: like
2020-09-10 09:16:37.782 21019-21019/? D/MainActivity: zwm, start: 12, end: 16, group: like
Placing "\b" after an element requires that element to end a word:
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("like\\b", Pattern.CASE_INSENSITIVE);
String content = " like alike like";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 09:19:10.476 21094-21094/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-10 09:19:10.477 21094-21094/com.tomorrow.target28 D/MainActivity: zwm, start: 1, end: 5, group: like
2020-09-10 09:19:10.477 21094-21094/com.tomorrow.target28 D/MainActivity: zwm, start: 7, end: 11, group: like
2020-09-10 09:19:10.477 21094-21094/com.tomorrow.target28 D/MainActivity: zwm, start: 12, end: 16, group: like
To match the whole word exactly, put a word boundary on both sides:
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("\\blike\\b", Pattern.CASE_INSENSITIVE);
String content = " like alike like";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 09:21:16.969 21563-21563/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-10 09:21:16.970 21563-21563/com.tomorrow.target28 D/MainActivity: zwm, start: 1, end: 5, group: like
2020-09-10 09:21:16.970 21563-21563/com.tomorrow.target28 D/MainActivity: zwm, start: 12, end: 16, group: like
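Word boundaries on both sides are what makes a whole-word replace safe. The sketch below replaces a word only where it stands alone. The class name WholeWordDemo and the helper replaceWord are hypothetical, and the helper assumes the word begins and ends with a word character; otherwise "\b" would not sit where expected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordDemo {
    // Hypothetical helper: replaces `word` only where it is delimited by word boundaries.
    static String replaceWord(String content, String word, String replacement) {
        String regex = "\\b" + Pattern.quote(word) + "\\b";
        // quoteReplacement keeps "$" and "\" in the replacement from being interpreted.
        return content.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceWord("cat concatenate cat.", "cat", "dog")); // dog concatenate dog.
    }
}
```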
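2. Boundaries and their relative nature
What counts as a boundary
Spaces, the start and end of a paragraph, commas, periods and similar symbols typically act as boundaries. Note that the hyphen "-" acts as a boundary as well: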
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("\\blike\\b", Pattern.CASE_INSENSITIVE);
String content = " like a-like like";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 09:28:55.432 22111-22111/? D/MainActivity: zwm, testMethod
2020-09-10 09:28:55.433 22111-22111/? D/MainActivity: zwm, start: 1, end: 5, group: like
2020-09-10 09:28:55.433 22111-22111/? D/MainActivity: zwm, start: 8, end: 12, group: like
2020-09-10 09:28:55.433 22111-22111/? D/MainActivity: zwm, start: 13, end: 17, group: like
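The relative nature of boundaries
Keep this property of boundaries firmly in mind:
- When you put a boundary next to a word character such as "s", the boundary is formed by a non-word character such as a space, a hyphen, a comma, or a period.
- When you put a boundary next to a non-word character such as "-" or ",", the boundary is formed by a word character.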
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("\\,\\b", Pattern.CASE_INSENSITIVE);
String content = " like,a-like,like";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 09:35:43.082 22882-22882/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-10 09:35:43.083 22882-22882/com.tomorrow.target28 D/MainActivity: zwm, start: 5, end: 6, group: ,
2020-09-10 09:35:43.083 22882-22882/com.tomorrow.target28 D/MainActivity: zwm, start: 12, end: 13, group: ,
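3. Matching non-word boundaries
Just as the character types come in pairs, "\b" has a counterpart "\B", which matches any position that is not a word boundary: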
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("\\Blike\\B", Pattern.CASE_INSENSITIVE);
String content = " like alike9 like";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 09:42:57.777 23250-23250/? D/MainActivity: zwm, testMethod
2020-09-10 09:42:57.778 23250-23250/? D/MainActivity: zwm, start: 7, end: 11, group: like
4. Matching text boundaries
Matching the start of the text
Placing "^" before the first element of a pattern restricts the match to the very start of the text:
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("^hello", Pattern.CASE_INSENSITIVE);
String content = "hello Android hello Java";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 09:49:35.382 23502-23502/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-10 09:49:35.383 23502-23502/com.tomorrow.target28 D/MainActivity: zwm, start: 0, end: 5, group: hello
One thing to note: a leading space at the start of the text would break this match, so the pattern also has to allow for whitespace:
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("^\\s*hello", Pattern.CASE_INSENSITIVE);
String content = " hello Android hello Java";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 10:14:58.607 28036-28036/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-10 10:14:58.608 28036-28036/com.tomorrow.target28 D/MainActivity: zwm, start: 0, end: 6, group: hello
Matching the end of the text
Placing "$" after the last element of a pattern restricts the match to the very end of the text:
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("hello\\s*$", Pattern.CASE_INSENSITIVE);
String content = " hello Android hello ";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 10:21:21.040 28468-28468/? D/MainActivity: zwm, testMethod
2020-09-10 10:21:21.041 28468-28468/? D/MainActivity: zwm, start: 15, end: 21, group: hello
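By default "^" and "$" anchor to the whole text. With the Pattern.MULTILINE flag they also match at the start and end of each line. A minimal sketch on a plain JVM, where the class name MultilineDemo and the helper count are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Hypothetical helper: counts matches of the regex under the given flags.
    static int count(String regex, int flags, String content) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(content);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String content = "hello Android\nhello Java";
        System.out.println(count("^hello", 0, content));                 // 1: start of text only
        System.out.println(count("^hello", Pattern.MULTILINE, content)); // 2: start of each line
    }
}
```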
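V. Matching Subpatterns
All the quantifiers introduced so far (such as "+", "*" and "{n,m}") apply to a single element.
1. Subpatterns
Wrapping part of a pattern in "(" and ")" forms a subpattern, also called a group. Treated as a whole, a subpattern behaves like a single element, so a quantifier can be applied to it: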
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("(hello){2,}", Pattern.CASE_INSENSITIVE);
String content = " helloAndroid hellohellohello";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 10:36:13.333 28848-28848/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-10 10:36:13.334 28848-28848/com.tomorrow.target28 D/MainActivity: zwm, start: 14, end: 29, group: hellohellohello
2. "Or" matching
"|" splits an expression into two alternatives, "reg1|reg2", which matches any text that satisfies either reg1 or reg2:
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("hello|Android", Pattern.CASE_INSENSITIVE);
String content = " hello Android hello Java";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 10:43:14.778 29103-29103/? D/MainActivity: zwm, testMethod
2020-09-10 10:43:14.778 29103-29103/? D/MainActivity: zwm, start: 1, end: 6, group: hello
2020-09-10 10:43:14.779 29103-29103/? D/MainActivity: zwm, start: 7, end: 14, group: Android
2020-09-10 10:43:14.779 29103-29103/? D/MainActivity: zwm, start: 15, end: 20, group: hello
3. Using "or" matching inside a subpattern
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("(19|20)\\d{2}", Pattern.CASE_INSENSITIVE);
String content = "1998 1999 1883 2020 2100";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 10:47:09.321 29722-29722/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-10 10:47:09.322 29722-29722/com.tomorrow.target28 D/MainActivity: zwm, start: 0, end: 4, group: 1998
2020-09-10 10:47:09.322 29722-29722/com.tomorrow.target28 D/MainActivity: zwm, start: 5, end: 9, group: 1999
2020-09-10 10:47:09.323 29722-29722/com.tomorrow.target28 D/MainActivity: zwm, start: 15, end: 19, group: 2020
4. Nested subpatterns
Subpatterns can contain further subpatterns, which allows more powerful matching:
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("(hell|hi(99|88)){2,}", Pattern.CASE_INSENSITIVE);
String content = "hellhell hi99hi99 hj88hj88 hillhill he66he66";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 11:14:12.815 3726-3726/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-10 11:14:12.816 3726-3726/com.tomorrow.target28 D/MainActivity: zwm, start: 0, end: 8, group: hellhell
2020-09-10 11:14:12.816 3726-3726/com.tomorrow.target28 D/MainActivity: zwm, start: 9, end: 17, group: hi99hi99
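Besides grouping, each "(...)" also captures the text it matched, and Matcher.group(n) returns it, with groups numbered by their opening parenthesis and group(0) being the whole match. The sketch below pulls the parts out of a date string. The class name GroupDemo and the helper parts are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical helper: returns {year, month, day}, or null if the input has a different shape.
    static String[] parts(String date) {
        Matcher matcher = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})").matcher(date);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] {matcher.group(1), matcher.group(2), matcher.group(3)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parts("2020-09-10"))); // [2020, 09, 10]
    }
}
```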
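VI. Backreferences
A backreference is written as "\" followed by a number. The number says which earlier subpattern is referenced, and the backreference matches the same text that subpattern matched: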
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("(\\w+) \\1", Pattern.CASE_INSENSITIVE);
String content = "Is the cost of of of gasline going up up up? Look up of the TV, your mobile phone is there.";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 11:29:19.991 7883-7883/? D/MainActivity: zwm, testMethod
2020-09-10 11:29:19.992 7883-7883/? D/MainActivity: zwm, start: 12, end: 17, group: of of
2020-09-10 11:29:19.992 7883-7883/? D/MainActivity: zwm, start: 35, end: 40, group: up up
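A backreference repeats the text that the group matched, not the group's pattern. The sketch below makes the difference visible: "(\d)\1" needs the same digit twice, whereas "\d\d" would accept any two digits. The class name BackrefDemo and the helper hasDoubledDigit are hypothetical.

```java
import java.util.regex.Pattern;

class BackrefDemo {
    private static final Pattern DOUBLED = Pattern.compile("(\\d)\\1");

    // Hypothetical helper: true if some digit appears twice in a row.
    static boolean hasDoubledDigit(String input) {
        return DOUBLED.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasDoubledDigit("2112")); // true:  "11"
        System.out.println(hasDoubledDigit("1212")); // false: no digit is immediately repeated
    }
}
```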
VII. Text Replacement
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("(\\w+) \\1", Pattern.CASE_INSENSITIVE);
String content = "Is the cost of of of gasline going up up up? Look up of the TV, your mobile phone is there.";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
String newContent = matcher.replaceAll("replace");
Log.d(TAG, "zwm, newContent: " + newContent);
}
2020-09-10 13:22:23.413 15603-15603/? D/MainActivity: zwm, testMethod
2020-09-10 13:22:23.413 15603-15603/? D/MainActivity: zwm, start: 12, end: 17, group: of of
2020-09-10 13:22:23.414 15603-15603/? D/MainActivity: zwm, start: 35, end: 40, group: up up
2020-09-10 13:22:23.414 15603-15603/? D/MainActivity: zwm, newContent: Is the cost replace of gasline going replace up? Look up of the TV, your mobile phone is there.
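The example above replaces each doubled word with a fixed string. A replacement can also refer to a captured group with "$1", which allows repeated words to be collapsed into one copy. The sketch below is a variation on the pattern above, not the original code: it adds "\b" and a repeated group so that runs of three or more are handled in one match. The class name DedupDemo and the helper collapse are hypothetical.

```java
class DedupDemo {
    // Hypothetical helper: collapses runs of the same word ("of of of") into one copy ("of").
    static String collapse(String content) {
        // Group 1 is the word; the second group repeats " word" one or more times.
        return content.replaceAll("\\b(\\w+)( \\1\\b)+", "$1");
    }

    public static void main(String[] args) {
        String content = "Is the cost of of of gasline going up up up?";
        System.out.println(collapse(content)); // Is the cost of gasline going up?
    }
}
```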
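VIII. Lookaround and Non-capturing Matches
1. Positive lookahead
A positive lookahead is written by putting "?=" at the start of a subpattern, as in "(?=X)". It means two things: first, the text immediately after the current position must match X; second, the text matched by X does not become part of the match result: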
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("Http(?=[\\d.]+\\b)", Pattern.CASE_INSENSITIVE);
String content = "Http1.0 Http1.1 HttpX.X";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 14:35:20.708 22257-22257/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-10 14:35:20.709 22257-22257/com.tomorrow.target28 D/MainActivity: zwm, start: 0, end: 4, group: Http
2020-09-10 14:35:20.710 22257-22257/com.tomorrow.target28 D/MainActivity: zwm, start: 8, end: 12, group: Http
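"Non-capturing" here means that the subpattern only constrains the match and does not take part in the result. You can think of a positive lookahead as a custom boundary (like "\b") placed at the end of the expression.
2. Positive lookbehind
A positive lookbehind is written by putting "?<=" at the start of a subpattern, as in "(?<=X)". It means two things: first, the text immediately before the current position must match X; second, the text matched by X does not become part of the match result: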
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("(?<=Http)[\\d.]+\\b", Pattern.CASE_INSENSITIVE);
String content = "Http1.0 Http1.1 HttpX.X";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 14:53:38.750 22743-22743/? D/MainActivity: zwm, testMethod
2020-09-10 14:53:38.751 22743-22743/? D/MainActivity: zwm, start: 4, end: 7, group: 1.0
2020-09-10 14:53:38.752 22743-22743/? D/MainActivity: zwm, start: 12, end: 15, group: 1.1
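You can think of a positive lookbehind as a custom boundary (like "\b") placed at the start of the expression.
3. Combining lookahead and lookbehind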
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("(?<=Http)[\\d.]+(?=Http)", Pattern.CASE_INSENSITIVE);
String content = "Http1.0Http Http1.1Http HttpX.XHttp";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 15:11:42.184 25941-25941/? D/MainActivity: zwm, testMethod
2020-09-10 15:11:42.185 25941-25941/? D/MainActivity: zwm, start: 4, end: 7, group: 1.0
2020-09-10 15:11:42.185 25941-25941/? D/MainActivity: zwm, start: 16, end: 19, group: 1.1
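4. Negative lookahead and negative lookbehind
Negative lookahead
Just as "\b" has its counterpart "\B", the positive lookahead has an inverse, called the negative lookahead. Putting "?!" at the start of a subpattern forms a negative lookahead, which succeeds exactly where "?=" would fail: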
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("Http(?!\\d+[\\d.]+\\b)", Pattern.CASE_INSENSITIVE);
String content = "Http1.0 Http1.1 HttpX.X hello9.9";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 15:44:26.990 29721-29721/? D/MainActivity: zwm, testMethod
2020-09-10 15:44:26.992 29721-29721/? D/MainActivity: zwm, start: 16, end: 20, group: Http
Negative lookbehind
Putting "?<!" at the start of a subpattern forms a negative lookbehind, which succeeds exactly where "?<=" would fail:
private void testMethod() {
Log.d(TAG, "zwm, testMethod");
Pattern pattern = Pattern.compile("(?<!Http)\\d+[\\d.]+\\b", Pattern.CASE_INSENSITIVE);
String content = "Http1.0 Http1.1 HttpX.X hello9.9";
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
Log.d(TAG, "zwm, start: " + matcher.start() + ", end: " + matcher.end() + ", group: " + matcher.group());
}
}
2020-09-10 15:40:33.566 28871-28871/com.tomorrow.target28 D/MainActivity: zwm, testMethod
2020-09-10 15:40:33.568 28871-28871/com.tomorrow.target28 D/MainActivity: zwm, start: 29, end: 32, group: 9.9
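Because lookarounds consume no text, a pattern made only of lookarounds matches positions between characters. A well-known use is inserting thousands separators. The sketch below combines a lookbehind, a lookahead, and a non-capturing group "(?:...)", which groups without capturing. It assumes the input consists of digits only; the class name ThousandsDemo and the helper group are hypothetical.

```java
class ThousandsDemo {
    // Hypothetical helper: inserts "," at every position that has a digit before it
    // and a multiple of three digits between it and the end of the input.
    static String group(String digits) {
        return digits.replaceAll("(?<=\\d)(?=(?:\\d{3})+$)", ",");
    }

    public static void main(String[] args) {
        System.out.println(group("1234567")); // 1,234,567
        System.out.println(group("123"));     // 123
    }
}
```

The lookbehind "(?<=\d)" is what prevents a leading comma in inputs such as "123456".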