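Regular Expressions

Author: 零点知晨 | Published: 2021-07-13 14:40

As a front-end developer you need at least a basic grasp of regular expressions. Not knowing them at all will cost you badly in interviews at large tech companies.

Authoritative reference:
Regular expressions - JavaScript | MDN (mozilla.org)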

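Creating regular expressions in JavaScript

There are two ways:

Regular expression literal: let reg = /abc/
The RegExp constructor: let reg = new RegExp('abc')

Prefer the literal form when the pattern is fixed: a literal is compiled when the script is loaded, before the code runs. Use the constructor when the pattern has to be built at runtime.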
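A minimal sketch of the two creation styles. The variable names and the keyword value are made up for illustration; the point is only that the constructor takes a string, so it can be built at runtime and needs doubled backslashes.

```javascript
// Literal: the pattern is fixed in the source code.
const literalReg = /abc/;

// Constructor: the pattern is assembled from a string at runtime.
const keyword = 'abc'; // hypothetical runtime value, e.g. user input
const dynamicReg = new RegExp(keyword);

console.log(literalReg.test('xxabcxx')); // true
console.log(dynamicReg.test('xxabcxx')); // true

// Inside a string, a backslash must itself be escaped: '\\d+' is the pattern \d+.
const digitReg = new RegExp('\\d+');
console.log(digitReg.source); // \d+
```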

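Matching modes

Simple patterns: match a fixed string directly. For example, /abc/ matches any string that contains "abc" as a contiguous substring; to require that the whole string equals "abc", anchor the pattern as /^abc$/.
Special patterns: used when the text to match is not fixed, which requires special characters such as quantifiers and character classes.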
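The difference between "contains" and "equals" is easy to check. This is a small sketch with made-up variable names:

```javascript
const containsAbc = /abc/;   // "abc" anywhere in the string
const exactlyAbc = /^abc$/;  // the whole string must be "abc"

console.log(containsAbc.test('xxabcxx')); // true
console.log(exactlyAbc.test('xxabcxx'));  // false
console.log(exactlyAbc.test('abc'));      // true
```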

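Special patterns

Categories of special characters:

Assertions
Character classes
Groups and ranges
Quantifiers
Unicode property escapes

Note: most of these tokens describe a single character at a time.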

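A few common ones first:
\: escape character
^: matches the beginning of the input; as the first character inside square brackets it negates the character set
$: matches the end of the input
*: matches the preceding item 0 or more times; equivalent to {0,}
+: matches the preceding item 1 or more times; equivalent to {1,}
?: matches the preceding item 0 or 1 times; equivalent to {0,1}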
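The escape character and the two roles of ^ are the easiest of these to get wrong, so here is a short sketch. The sample strings are invented for the example.

```javascript
// \ turns a special character into a literal one.
const literalPlus = /1\+1/;
console.log(literalPlus.test('1+1')); // true
console.log(/1+1/.test('1+1'));       // false: unescaped, + is a quantifier, so this needs "11"

// ^ outside brackets anchors at the start; as the first character inside brackets it negates the set.
console.log(/^a/.test('abc'));        // true
console.log('abc'.match(/[^a]/)[0]);  // 'b'

// $ anchors at the end.
console.log(/c$/.test('abc'));        // true
```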

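By category

Assertions

Assertions include boundaries, which mark the beginning or end of lines and words, and other patterns that indicate in some way that a match is possible (lookahead, lookbehind, and conditional expressions).

a. Boundary assertions

^ matches the beginning of the input (unlike \b, this is the start of the whole input string).
$ matches the end of the input. /\.vue$/ matches strings that end with ".vue" (the dot must be escaped to be matched literally).
\b matches a word boundary rather than the boundary of the whole input: the position where a word starts or ends. /\bm/g finds two matches in 'moon mabc', one "m" at the start of each word.
\B matches a non-word boundary.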
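The boundary assertions above can be tried out as follows. This is only a sketch; the file names are made up.

```javascript
// \b: one "m" at the start of each word.
const startsOfWords = 'moon mabc'.match(/\bm/g);
console.log(startsOfWords); // [ 'm', 'm' ]

// $ with an escaped dot: the string must end in ".vue".
const vueFile = /\.vue$/;
console.log(vueFile.test('App.vue'));     // true
console.log(vueFile.test('App.vue.bak')); // false

// \B: "oo" is matched because it sits inside the word, not at a word boundary.
console.log('moon'.match(/\Boo/)[0]);     // 'oo'
// With \b the same search fails, because no word starts with "oo" here.
console.log('moon'.match(/\boo/));        // null
```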

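When you pass the result of a match to console.log, you can see several properties besides the matched text, such as index, input, and groups.

input is the entire input string.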
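As a rough illustration of those properties, the sketch below inspects the array returned by exec. The variable name firstM is arbitrary.

```javascript
const firstM = /\bm/.exec('moon mabc');

console.log(firstM[0]);     // 'm'          the matched text
console.log(firstM.index);  // 0            where the match starts
console.log(firstM.input);  // 'moon mabc'  the whole input string
console.log(firstM.groups); // undefined    no named groups in this pattern
```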

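b. Other assertions

x(?=y) Lookahead assertion: matches x only if x is followed by y. For example, /Jack(?=Sprat)/ matches "Jack" only if it is followed by "Sprat". /Jack(?=Sprat|Frost)/ matches "Jack" only if it is followed by "Sprat" or "Frost". Neither "Sprat" nor "Frost" is part of the match result.
x(?!y) Negative lookahead assertion: matches x only if x is not followed by y. For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point. /\d+(?!\.)/.exec('3.141') matches "141" but not "3".
(?<=y)x Lookbehind assertion: matches x only if x is preceded by y. For example, /(?<=Jack)Sprat/ matches "Sprat" only if it is preceded by "Jack".
(?<!y)x Negative lookbehind assertion: matches x only if x is not preceded by y. For example, /(?<!-)\d+/ matches a number only if it is not preceded by a minus sign. /(?<!-)\d+/.exec('3') matches "3".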
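The four lookaround forms can be checked with the same examples. This sketch assumes a runtime that supports lookbehind (ES2018 or later); 'JackBlack' and '-3' are extra inputs added for contrast.

```javascript
// Lookahead: "Jack" only when followed by "Sprat" (or "Frost").
const jackBeforeSprat = 'JackSprat'.match(/Jack(?=Sprat)/)[0];
console.log(jackBeforeSprat);                          // 'Jack' (without "Sprat")
console.log(/Jack(?=Sprat|Frost)/.test('JackFrost')); // true
console.log(/Jack(?=Sprat)/.test('JackBlack'));       // false

// Negative lookahead: digits not followed by a decimal point.
const noDotAfter = /\d+(?!\.)/.exec('3.141')[0];
console.log(noDotAfter);                               // '141'

// Lookbehind: "Sprat" only when preceded by "Jack".
const spratAfterJack = /(?<=Jack)Sprat/.exec('JackSprat')[0];
console.log(spratAfterJack);                           // 'Sprat'

// Negative lookbehind: digits not preceded by a minus sign.
console.log(/(?<!-)\d+/.exec('3')[0]);                 // '3'
console.log(/(?<!-)\d+/.exec('-3'));                   // null
```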

Demo:

const text = 'A quick fox';

const regexpLastWord = /\w+$/;
console.log(text.match(regexpLastWord));
// expected output: Array ["fox"]

const regexpWords = /\b\w+\b/g;
console.log(text.match(regexpWords));
// expected output: Array ["A", "quick", "fox"]

const regexpFoxQuality = /\w+(?= fox)/;
console.log(text.match(regexpFoxQuality));
// expected output: Array ["quick"]

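More exercises:

1. Filter the fruits that start with "A"

let fruits = ["Apple", "Watermelon", "Orange", "Avocado", "Strawberry"];
// Use the regex /^A/ to select the fruits that start with 'A'.
// Here '^' has only one meaning: it matches the beginning of the input.
let fruitsStartsWithA = fruits.filter(fruit => /^A/.test(fruit));

2. Select words ending in "en" or "ed" (the end of a word, not the end of the whole string as with $)

let fruitsWithDescription = ["Red apple", "Orange orange", "Green Avocado"];
// Select the descriptions that contain a word ending in "en" or "ed":
let enEdSelection = fruitsWithDescription.filter(descr => /(en|ed)\b/.test(descr));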

3. Match a run of at least one digit that is not followed by a decimal point

console.log(/\d+(?!\.)/g.exec('3.141')); // [ '141', index: 2, input: '3.141' ]

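Different meanings of '?!' (combining assertions and ranges):
In an assertion: /x(?!y)/ is a negative lookahead
In a character set: [^?!] matches any character other than "?" and "!"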

let orangeNotLemon = "Do you want to have an orange? Yes, I do not want to have a lemon!";

let selectNotLemonRegex = /[^?!]+have(?! a lemon)[^?!]+[?!]/gi
console.log(orangeNotLemon.match(selectNotLemonRegex)); // [ 'Do you want to have an orange?' ]

let selectNotOrangeRegex = /[^?!]+have(?! an orange)[^?!]+[?!]/gi
console.log(orangeNotLemon.match(selectNotOrangeRegex)); // [ ' Yes, I do not want to have a lemon!' ]

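Character classes

. matches any single character except line terminators: \n, \r, \u2028 or \u2029. Inside a character set the dot loses its special meaning: [.] matches a literal dot.
\d matches any digit (Arabic numeral). Equivalent to [0-9]. For example, /\d/ or /[0-9]/ matches "2" in "B2 is the suite number".
\D matches any character that is not a digit (Arabic numeral). Equivalent to [^0-9]. For example, /\D/ or /[^0-9]/ matches "B" in "B2 is the suite number". Only one character is matched because, without the g flag, matching stops at the first match.
\w matches any alphanumeric character from the basic Latin alphabet, including the underscore. Equivalent to [A-Za-z0-9_]. For example, /\w/ matches "a" in "apple" and "5" in "$5.28". Here too, only the first match is returned.
\W matches any character that is not a word character from the basic Latin alphabet. Equivalent to [^A-Za-z0-9_]. For example, /\W/ or /[^A-Za-z0-9_]/ matches "%" in "50%". This is handy for matching special characters.
\s matches a single white space character, including space, tab, form feed, and line feed. Equivalent to [ \f\n\r\t\v ...] (Unicode part omitted). For example, /\s\w*/ matches " bar" in "foo bar.": one white space character followed by any number of word characters.
\S matches a single character other than white space. For example, /\S\w*/ matches "foo" in "foo bar.". The first character of the match cannot be white space, and matching stops at the first match.
There are also \f, \n, \r, \t, and so on.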
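To see why only one character comes back in the examples above, compare the same patterns with and without the g flag. This is a sketch that reuses the sample strings from the list.

```javascript
const suite = 'B2 is the suite number';

// Without g, match() returns only the first match.
console.log(suite.match(/\D/)[0]);   // 'B'
console.log(suite.match(/\d/g));     // [ '2' ]

// With g, every match is returned.
const priceDigits = '$5.28'.match(/\w/g);
console.log(priceDigits);            // [ '5', '2', '8' ]

console.log('50%'.match(/\W/)[0]);   // '%'

// \s\w* starts at white space; \S\w* starts at a non-white-space character.
console.log('foo bar.'.match(/\s\w*/)[0]); // ' bar'
console.log('foo bar.'.match(/\S\w*/)[0]); // 'foo'
const allChunks = 'foo bar.'.match(/\S\w*/g);
console.log(allChunks);              // [ 'foo', 'bar', '.' ]
```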

demo:

// 1. Match a word character followed by a digit
const chessStory = 'He played the King in a8 and she moved her Queen in c2.';
const regexpCoordinates = /\w\d/g;
console.log(chessStory.match(regexpCoordinates));
// expected output: Array [ 'a8', 'c2']

// 2. Match emoji
const moods = 'happy 🙂, confused 😕, sad 😢';
const regexpEmoticons = /[\u{1F600}-\u{1F64F}]/gu;
console.log(moods.match(regexpEmoticons));
// expected output: Array ['🙂', '😕', '😢']

3. Find four-digit numbers

var randomData = "015 354 8787 687351 3512 8735";
var regexpFourDigits = /\b\d{4}\b/g;
// \b indicates a boundary (i.e. do not start matching in the middle of a word)
// \d{4} indicates a digit, four times
// \b indicates another boundary (i.e. do not end matching in the middle of a word)

console.table(randomData.match(regexpFourDigits));
// ['8787', '3512', '8735']

Find words (\b is typically used to find words) that start with a or A followed by at least one [a-zA-Z0-9_] character

var aliceExcerpt = "I’m sure I’m not Ada,’ she said, ‘for her hair goes in such long ringlets, and mine doesn’t go in ringlets at all.";
var regexpWordStartingWithA = /\b[aA]\w+/g;
// \b indicates a boundary (i.e. do not start matching in the middle of a word)
// [aA] indicates the letter a or A
// \w+ indicates any character *from the latin alphabet*, multiple times

console.table(aliceExcerpt.match(regexpWordStartingWithA));
// ['Ada', 'and', 'at', 'all']

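Groups and ranges

Note: many characters change meaning inside a character set. Compare ^ with [^...], and ?! with [?!].

Character sets: the items above each match one predefined kind of character; a character set lets you list the allowed characters yourself.

x|y Alternation: matches either "x" or "y". For example, /green|red/ matches "green" in "green apple" and "red" in "red apple".
[xyz] or [a-z] Character set: matches any one of the enclosed characters. For example, [\w-] is the union of \w and "-" (hyphen), the same as [A-Za-z0-9_-].
[^xyz] or [^a-z] Negated character set: matches any one character that is not enclosed in the brackets.
(x) Capturing group: matches x and remembers the match. For example, /(foo)/ matches and remembers "foo" in "foo bar".
(?<Name>x) Named capturing group: matches "x" and stores it on the groups property of the returned match under the name given by <Name>. The angle brackets (< and >) are required around the group name. For example, /-(?<customName>\w)/ matches "-d" in "web-doc" and stores "d" in groups.customName, which is useful for converting to camelCase.
(?:x) Non-capturing group: matches "x" but does not remember the match, so the matched substring cannot be recalled from the elements of the result array.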
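The camelCase conversion mentioned for named groups could look roughly like this. kebabToCamel and the group name letter are hypothetical names chosen for the sketch, and it assumes a runtime with named-group support (ES2018 or later).

```javascript
// Named group: the letter after each hyphen is stored under groups.letter.
const hyphenLetter = /-(?<letter>\w)/g;

// kebabToCamel is a hypothetical helper name used only for this sketch.
function kebabToCamel(name) {
  return name.replace(hyphenLetter, (...args) => {
    // When the pattern has named groups, the groups object is the last callback argument.
    const groups = args[args.length - 1];
    return groups.letter.toUpperCase();
  });
}

console.log(kebabToCamel('web-doc'));          // 'webDoc'
console.log(kebabToCamel('font-size-adjust')); // 'fontSizeAdjust'

// Non-capturing group: (?:foo) is matched but not stored, so only "bar" is captured.
const nonCapturing = /(?:foo)(bar)/.exec('foobar');
console.log(nonCapturing.length); // 2
console.log(nonCapturing[1]);     // 'bar'
```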

demo

const aliceExcerpt = 'The Caterpillar and Alice looked at each other';
const regexpWithoutE = /\b[a-df-z]+\b/ig;
console.log(aliceExcerpt.match(regexpWithoutE));
// expected output: Array ["and", "at"]

// Match the width and height of an image
const imageDescription = 'This image has a resolution of 1440×900 pixels.';
const regexpSize = /([0-9]+)×([0-9]+)/;
const match = imageDescription.match(regexpSize);
console.log(`Width: ${match[1]} / Height: ${match[2]}.`);
// expected output: "Width: 1440 / Height: 900."

Count the vowels

var aliceExcerpt = "There was a long silence after this, and Alice could only hear whispers now and then.";
var regexpVowels = /[aeiouy]/g;

console.log("Number of vowels:", aliceExcerpt.match(regexpVowels).length);
// Number of vowels: 25

Use groups to record the matched data

let personList = `First_Name: John, Last_Name: Doe
First_Name: Jane, Last_Name: Smith`;

let regexpNames =  /First_Name: (\w+), Last_Name: (\w+)/mg;
let match = regexpNames.exec(personList);
do {
  console.log(`Hello ${match[1]} ${match[2]}`);
} while((match = regexpNames.exec(personList)) !== null);
// Hello John Doe
// Hello Jane Smith

Using named groups

let users= `姓氏: 李, 名字: 雷
姓氏: 韩, 名字: 梅梅`;

let regexpNames =  /姓氏: (?<first>.+), 名字: (?<last>.+)/mg;
let match = regexpNames.exec(users);

do {
  console.log(`Hello ${match.groups.first} ${match.groups.last}`);
} while((match = regexpNames.exec(users)) !== null);

// Hello 李 雷
// Hello 韩 梅梅

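Quantifiers: indicate the number of characters or expressions to match

The count is specified with curly braces {}:

x* matches the preceding item "x" 0 or more times. Equivalent to {0,}.
x+ matches the preceding item "x" 1 or more times. Equivalent to {1,}.
x? matches the preceding item "x" 0 or 1 times. Equivalent to {0,1}.
x{n} matches exactly n occurrences of the preceding item "x". For example, /a{2}/ does not match the "a" in "candy", but it matches all of the a's in "caandy" and the first two a's in "caaandy".
x{n,} matches at least n occurrences of the preceding item "x".
x{n,m} matches at least n and at most m occurrences of the preceding item "x".

Note the other use of ?: placed right after a quantifier (as in x*?, x+?, x??, x{n,m}?), it makes the quantifier non-greedy, so matching stops as soon as the shortest possible match is found.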
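Greedy versus non-greedy matching is easiest to see side by side. The HTML-like string below is invented for the example.

```javascript
const html = '<em>a</em><em>b</em>';

// Greedy: .+ grabs as much as possible, so the match runs to the last ">".
const greedy = html.match(/<.+>/)[0];
console.log(greedy); // '<em>a</em><em>b</em>'

// Non-greedy: .+? stops at the first ">" that completes a match.
const lazy = html.match(/<.+?>/)[0];
console.log(lazy);   // '<em>'

// Counted quantifiers.
console.log('caaandy'.match(/a{2}/)[0]);  // 'aa'
console.log('caaandy'.match(/a{2,}/)[0]); // 'aaa'
console.log('candy'.match(/a{2}/));       // null
```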

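More on *, +, ? and .
*: matches the preceding item 0 or more times; equivalent to {0,}. For example, /ab*c/ matches a single "a" followed by zero or more "b"s and then a "c".
For example, in 'tabbbcd' it matches "abbbc".
+: matches the preceding item 1 or more times; equivalent to {1,}. It behaves much like *. For example, /a+/ matches the "a" in "candy" and all of the a's in "caaaaaaandy".
?: matches the preceding item 0 or 1 times; equivalent to {0,1}. For example, /e?le?/ matches "el" in "angel", "le" in "angle", and "l" in "oslo". In other words, each e is matched if it is present and skipped if it is not.
.: (the dot) by default matches any single character except line terminators. For example, /.n/ matches "an" and "on" in "nay, an apple is on the tree", but not "nay".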
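The examples in this list can be verified directly. The input 'tacd' is an extra case added here to show the zero-occurrence branch of *.

```javascript
// *: zero or more "b" between "a" and "c".
console.log('tabbbcd'.match(/ab*c/)[0]); // 'abbbc'
console.log('tacd'.match(/ab*c/)[0]);    // 'ac'

// +: one or more "a"; the match covers the whole run of a's.
const runOfA = 'caaaaaaandy'.match(/a+/)[0];
console.log(runOfA);

// ?: each "e" is optional.
console.log('angel'.match(/e?le?/)[0]);  // 'el'
console.log('angle'.match(/e?le?/)[0]);  // 'le'
console.log('oslo'.match(/e?le?/)[0]);   // 'l'

// .: any character followed by "n"; the leading "n" of "nay" has nothing before it.
const beforeN = 'nay, an apple is on the tree'.match(/.n/g);
console.log(beforeN);                    // [ 'an', 'on' ]
```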

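const text = 'A quick fox';
const regexpLastWord = /\w+$/; // matches the last word
const regexpWords = /\b\w+\b/g; // matches every word
console.log(text.match(regexpLastWord));
// expected output: Array ["fox"]
console.log(text.match(regexpWords));
// expected output: Array ["A", "quick", "fox"]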

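Notes

1. Most regex tokens match a single character.
2. By default matching stops at the first match instead of scanning the whole input; add the g flag to find every match (covered below).
3. Many special characters change meaning inside a character set. Compare ^ with [^...], and ? with [?=].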

Advanced searching with flags: g, i, u

The ones used most often are g (global search) and u (treat the pattern as a sequence of Unicode code points).
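A short sketch of what each of the three flags changes. The sample phrase is made up; the emoji case shows why u matters for characters outside the Basic Multilingual Plane.

```javascript
const phrase = 'Cat cat CAT';

// No flags: case-sensitive, first match only.
console.log(phrase.match(/cat/).index); // 4

// g: every match. i: ignore case.
console.log(phrase.match(/cat/g));      // [ 'cat' ]
const anyCase = phrase.match(/cat/gi);
console.log(anyCase);                   // [ 'Cat', 'cat', 'CAT' ]

// u: without it, "." matches one UTF-16 code unit, which is only half of this emoji.
const halfEmoji = '😢'.match(/./)[0];
const wholeEmoji = '😢'.match(/./u)[0];
console.log(halfEmoji.length);          // 1
console.log(wholeEmoji === '😢');       // true
```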


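Finally: commonly used patterns

Question: how many characters does a Chinese character count as in a regex? Each character in the range \u4E00-\u9FA5 is a single UTF-16 code unit, so it counts as one character.

Reject whitespace (test for any white space character): /\s+/
Digits only (non-negative integer): /^\d+$/
Numbers only (positive or negative, optional decimal part): /^-?[0-9]\d*\.?\d*$/
Reject Chinese characters (test for any Chinese character): /[\u4E00-\u9FA5]/g
Invoice number, limited to 8 characters:
Vehicle license, exactly 13 characters: /^[\da-zA-Z-]{13}$/
License plate number: /^[京津沪渝冀豫云辽黑湘皖鲁新苏浙赣鄂桂甘晋蒙陕吉闽贵粤青藏川宁琼使领A-Z]{1}[A-Z]{1}[A-Z0-9]{4}[A-Z0-9挂学警港澳]{1}$/
ID card number, 18 characters (may contain x):
URL: [a-zA-Z]+://[^\s]*
IP address: ((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)
Email: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
HTML tag (with content, or self-closing): <(.*)(.*)>.*</\1>|<(.*) />
Chinese characters: [\u4e00-\u9fa5]
Mainland China landline number: (\d{4}-|\d{3}-)?(\d{8}|\d{7})
Password of 6 to 18 characters: /^[a-z0-9_-]{6,18}$/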
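A few of these patterns can be exercised as below. This is a sketch, not a complete validator: the sample inputs are made up, and the ^ and $ anchors on the IP and email patterns are an addition here so that the whole input has to match.

```javascript
// IPv4 address, anchored so the entire string must be an address.
const ipv4 = /^((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)$/;
console.log(ipv4.test('192.168.1.1')); // true
console.log(ipv4.test('256.1.1.1'));   // false

// Email, anchored in the same way.
const email = /^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/;
console.log(email.test('user.name@example.com')); // true
console.log(email.test('not-an-email'));          // false

// Password: 6 to 18 lowercase letters, digits, underscores or hyphens.
const password = /^[a-z0-9_-]{6,18}$/;
console.log(password.test('abc_123')); // true
console.log(password.test('abc'));     // false (too short)

// Detect Chinese characters in the range \u4E00-\u9FA5.
const hasChinese = /[\u4E00-\u9FA5]/;
console.log(hasChinese.test('hello 世界')); // true
console.log(hasChinese.test('hello'));      // false
```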

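A more advanced topic: what parentheses do

Parentheses create capturing groups; details on how to use them are still to be added.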
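Until that section is written, here is a minimal sketch of three common uses of capturing groups: reordering text with $1 and $2 in replace, backreferences with \1, and destructuring the match array. The sample strings are invented for the example.

```javascript
// 1. Reorder captured parts in a replacement string.
const swapped = 'John Doe'.replace(/(\w+) (\w+)/, '$2, $1');
console.log(swapped); // 'Doe, John'

// 2. Backreference: \1 must repeat whatever group 1 captured.
const doubled = /(\w)\1/.exec('hello')[0];
console.log(doubled); // 'll'

// 3. Destructure the groups from the match array (index 0 is the full match).
const [, width, height] = '1440x900'.match(/(\d+)x(\d+)/);
console.log(width, height); // 1440 900
```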
