grep

Basic regular expressions

Download the practice file: wget http://linux.vbird.org/linux_basic/0330regularex/regular_express.txt

$ vim regular_express.txt 
$ grep -n 'the' regular_express.txt 
8:I can't finish the test.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.
# Highlight the matched keyword in color
$ grep -n  --color=auto 'the' regular_express.txt 
8:I can't finish the test.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.
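# Typing --color=auto every time is tedious, so define an alias.
# The alias takes effect immediately in the current shell. To make it permanent, add the alias line to ~/.bashrc and reload that file:
$ alias grep='grep --color=auto'
$ source ~/.bashrc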
# Inverted selection: print the lines that do NOT contain 'the'
$ grep -vn 'the' regular_express.txt
# Match 'the' regardless of case
$ grep -in 'the' regular_express.txt
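
The three options used above (-n, -v, -i) can be compared side by side on a small file. This is a minimal sketch: the four sample sentences and the variable names are made up for illustration and are not part of regular_express.txt.

```shell
# Build a tiny sample file (hypothetical content) in a temporary location.
sample=$(mktemp)
printf '%s\n' 'The sky is blue.' 'I like the sea.' 'Nothing here.' 'Bathe daily.' > "$sample"

# -n prefixes each matching line with its line number.
# Matching is case-sensitive and works on substrings, so 'Bathe' matches too.
with_the=$(grep -n 'the' "$sample")

# -v inverts the selection: lines that do NOT contain 'the'.
without_the=$(grep -vn 'the' "$sample")

# -i ignores case, so 'The' on line 1 now matches as well.
any_case=$(grep -in 'the' "$sample")

printf '%s\n' "$with_the" '---' "$without_the" '---' "$any_case"
rm -f "$sample"
```

Note that -v and the plain search always split the file into two complementary sets of lines.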

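The meaning of some special characters
1. [^...]: inverted selection (matches any single character NOT in the list). ^[...]: a ^ outside the brackets anchors the match at the start of the line. ^[^...]: a line that starts with a character not in the list.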

[^[:lower:]] means the same as [^a-z] (in the C/POSIX locale; in other locales the range a-z may cover different characters)
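
The difference between a ^ inside and outside the brackets is easier to see on concrete lines. The sketch below uses four made-up sample lines and forces the C locale (an assumption made here so that a-z means exactly the ASCII lowercase letters).

```shell
# Hypothetical sample lines.
sample=$(mktemp)
printf '%s\n' 'google is good' 'Food is ready' 'too late' '2 foos' > "$sample"

# [^g]oo : 'oo' preceded by any character other than 'g'.
not_g=$(grep -n '[^g]oo' "$sample")

# ^[a-z] : the ^ outside the brackets anchors at the start of the line,
# so this selects lines that begin with a lowercase letter.
starts_lower=$(LC_ALL=C grep -n '^[a-z]' "$sample")

# ^[^a-zA-Z] : lines that begin with a character that is not a letter.
starts_nonletter=$(LC_ALL=C grep -n '^[^a-zA-Z]' "$sample")

# ^[^[:lower:]] : same result as ^[^a-z] in the C locale.
starts_not_lower=$(LC_ALL=C grep -n '^[^[:lower:]]' "$sample")
starts_not_az=$(LC_ALL=C grep -n '^[^a-z]' "$sample")

printf '%s\n' "$not_g" '---' "$starts_lower" '---' "$starts_nonletter" '---' "$starts_not_lower"
rm -f "$sample"
```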

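Find the lines that end with a period (.)

# The period has a special meaning in a regular expression, so it must be escaped with a backslash (\) to be matched literally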
$ grep '\.$' regular_express.txt 
"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.
this dress doesn't fit me.
motorcycle is cheap than car.
This window is clear.
the symbol '*' is represented as start.
You are the best is mean you are the no. 1.
The world <Happy> is the same with "glad".
I like dog.
google is the best tools for search keyword.
go! go! Let's go.

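Find the empty lines
$ grep -n '^$' regular_express.txt
. (period): matches exactly one arbitrary character
* (asterisk): repeats the preceding character zero or more times; it is always used in combination with the character before it
'.*' therefore matches zero or more arbitrary characters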
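
The patterns described above (., *, .*, the escaped period, and ^$) can be tried on a few short lines. This is only a sketch: the six sample lines are made up, chosen so that each pattern selects a different subset.

```shell
# Hypothetical sample lines; line 4 is empty on purpose.
sample=$(mktemp)
printf '%s\n' 'good' 'glad.' 'gd' '' 'goooogle' 'v1.0' > "$sample"

# g..d : 'g', exactly two arbitrary characters, then 'd'.
two_any=$(grep -n 'g..d' "$sample")

# ooo* : two 'o' followed by zero or more 'o', i.e. at least two 'o' in a row.
two_or_more_o=$(grep -n 'ooo*' "$sample")

# g.*d : 'g', then zero or more arbitrary characters, then 'd'.
g_any_d=$(grep -n 'g.*d' "$sample")

# \.$ : a literal period at the very end of the line.
ends_with_dot=$(grep -n '\.$' "$sample")

# ^$ : start of line immediately followed by end of line, i.e. an empty line.
empty_lines=$(grep -n '^$' "$sample")

printf '%s\n' "$two_any" '---' "$two_or_more_o" '---' "$g_any_d" '---' "$ends_with_dot" '---' "$empty_lines"
rm -f "$sample"
```

Line 3 ('gd') is matched by g.*d but not by g..d, which shows that .* may match nothing at all.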

sed (stream editor)

NAME
       sed - stream editor for filtering and transforming text

SYNOPSIS
       sed [OPTION]... {script-only-if-no-other-script} [input-file]...
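# Mage (马哥): sed 'AddressCommand' file ...
# Address:
1. StartLine,EndLine, e.g. 1,100; $ stands for the last line. Note that sed addresses do not support arithmetic, so $-1 does not select the second-to-last line.
2. /RegExp/, e.g. /^root/
$ sed '/oot/d' /etc/fstab
3. /pattern1/,/pattern2/: all lines from the first line matched by pattern1 through the next line matched by pattern2
4. LineNumber: the single specified line
5. StartLine,+N: StartLine and the N lines that follow it (GNU sed)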
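
Each address form in the list above can be checked against ten numbered lines. This is a sketch under two assumptions: the input is generated here rather than taken from a real file, and the StartLine,+N form relies on GNU sed. The -n option and the p command are used so that only the addressed lines are printed.

```shell
# Ten hypothetical input lines: line1 ... line10.
input=$(printf 'line%s\n' 1 2 3 4 5 6 7 8 9 10)

# StartLine,EndLine
range=$(printf '%s\n' "$input" | sed -n '2,4p')

# $ addresses the last line
last=$(printf '%s\n' "$input" | sed -n '$p')

# /RegExp/ : every line matching the expression (line1 and line10 here)
by_regex=$(printf '%s\n' "$input" | sed -n '/line1/p')

# /pattern1/,/pattern2/ : from the first match of pattern1 to the next match of pattern2
between=$(printf '%s\n' "$input" | sed -n '/line3/,/line5/p')

# A single LineNumber
single=$(printf '%s\n' "$input" | sed -n '7p')

# StartLine,+N : line 8 plus the one line after it (GNU sed extension)
plus_n=$(printf '%s\n' "$input" | sed -n '8,+1p')

printf '%s\n' "$range" '---' "$last" '---' "$by_regex" '---' "$between" '---' "$single" '---' "$plus_n"
```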

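sed: Stream EDitor, a line-oriented editor (as opposed to a full-screen editor such as vi).
sed works on the pattern space: by default it does not modify the source file and edits only the copy of each line held in the pattern space.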
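
That the source file stays untouched is easy to verify. The following is a minimal sketch with a made-up three-line file; the file names are temporary paths created just for the demonstration.

```shell
# Hypothetical three-line file.
file=$(mktemp)
printf '%s\n' 'alpha' 'beta' 'gamma' > "$file"

# sed edits its own copy of each line (the pattern space) and writes the result to stdout.
output=$(sed '2d' "$file")

# The file on disk still has all three lines.
after=$(cat "$file")

# To keep the edited text, redirect stdout to a different file.
sed '2d' "$file" > "$file.new"
saved=$(cat "$file.new")

printf 'sed output:\n%s\nfile afterwards:\n%s\n' "$output" "$after"
rm -f "$file" "$file.new"
```

GNU sed also has an -i option that writes the result back to the file; it is left out here because its syntax differs between sed implementations.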

First, get to know sed's basic options and functions

[Image: niaoge.PNG (鸟哥)]

Adding and deleting whole lines

# List the contents of /etc/passwd with line numbers and, at the same time, delete lines 2 to 5
ubuntu@VM-0-3-ubuntu:~$ nl /etc/passwd | sed '2,5d'
     1  root:x:0:0:root:/root:/bin/bash
     6  games:x:5:60:games:/usr/games:/usr/sbin/nologin
     7  man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
     8  lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin

# Delete only the second line
ubuntu@VM-0-3-ubuntu:~$ nl /etc/passwd | sed '2d'
     1  root:x:0:0:root:/root:/bin/bash
     3  bin:x:2:2:bin:/bin:/usr/sbin/nologin

# To delete from line 3 through the last line, use: nl /etc/passwd | sed '3,$d' (the dollar sign $ stands for the last line)

# Append two lines of text after line 2, for example 'Drink tea or ......' and 'drink beer ?'
ubuntu@VM-0-3-ubuntu:~$ nl /etc/passwd | sed '2a Drink tea or ......\
> > drink beer ?'
     1  root:x:0:0:root:/root:/bin/bash
     2  daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
Drink tea or ......
> drink beer ?
     3  bin:x:2:2:bin:/bin:/usr/sbin/nologin
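# The key point is that we can append more than one line. Each appended line except the last must end with a backslash (\) so that the text continues on the next line; that is why, in the example above, only the first line ends with \.
# \n can also be used to produce a line break (GNU sed)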
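
The append command can be tried without touching /etc/passwd. This is a sketch on three made-up input lines; the one-line form "2a text" is assumed to run under GNU sed, while the form with a backslash directly after the a is the traditional one.

```shell
# Three hypothetical input lines.
input=$(printf '%s\n' 'first' 'second' 'third')

# One-line form (GNU sed): append a single line after line 2.
one=$(printf '%s\n' "$input" | sed '2a Drink tea')

# Two appended lines: every appended line except the last ends with a backslash.
two=$(printf '%s\n' "$input" | sed '2a\
Drink tea or ......\
drink beer ?')

printf '%s\n' "$one" '---' "$two"
```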

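Searching and replacing part of the data
sed 's/old text/new text/g', usually used together with regular expressions.
p: print the pattern space; usually combined with the -n option, which suppresses the default printing of the pattern space
r FILE: append the contents of FILE after each line that matches the address
w FILE: write the lines in the addressed range to FILE
s/pattern/string/: search and replace; by default only the first match on each line is replaced. Flags: g replaces every match on the line; i ignores case (GNU sed). The delimiter is arbitrary: s/ / /, s# # # and s@ @ @ are equivalent, but with the latter two a / inside the pattern needs no escaping.
&: refers to the whole text matched by the pattern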
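
The commands listed above can be exercised together on a scratch directory. Everything in this sketch (file names, sample lines, variable names) is made up for illustration.

```shell
# Scratch directory with a hypothetical input file and a file to be read in.
work=$(mktemp -d)
printf '%s\n' 'root:/bin/bash' 'games:/usr/sbin/nologin' 'Root user' > "$work/in.txt"
printf '%s\n' '# inserted from extra.txt' > "$work/extra.txt"

# -n suppresses automatic printing; p prints only the addressed lines.
only_root=$(sed -n '/^root/p' "$work/in.txt")

# Without g only the first match on a line is replaced; with g every match is.
first_only=$(printf 'aaa\n' | sed 's/a/b/')
all=$(printf 'aaa\n' | sed 's/a/b/g')

# A different delimiter avoids escaping the slashes of a path.
swapped=$(sed -n 's#/bin/bash#/bin/sh#p' "$work/in.txt")

# & stands for the whole matched text.
bracketed=$(printf 'hello\n' | sed 's/ell/[&]/')

# w FILE: write the addressed lines to FILE.
sed -n '/nologin/w '"$work/out.txt" "$work/in.txt"
written=$(cat "$work/out.txt")

# r FILE: append the contents of FILE after each addressed line.
with_extra=$(sed '1r '"$work/extra.txt" "$work/in.txt")

printf '%s\n' "$only_root" "$first_only" "$all" "$swapped" "$bracketed" "$written" '---' "$with_extra"
rm -rf "$work"
```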

ubuntu@VM-0-3-ubuntu:~$ nano sed.sh
ubuntu@VM-0-3-ubuntu:~$ cat sed.sh 
hello,like
hi,my love
# Change like --> liker and love --> lover
ubuntu@VM-0-3-ubuntu:~$ sed 's#l..e#&r#' sed.sh 
hello,liker
hi,my lover
# Or use a back-reference:
ubuntu@VM-0-3-ubuntu:~$ sed 's#\(l..e\)#\1r#' sed.sh 
hello,liker
hi,my lover
# Change like --> Like and love --> Love
ubuntu@VM-0-3-ubuntu:~$ sed 's#l\(..e\)#L\1#g' sed.sh 
hello,Like
hi,my Love
ubuntu@VM-0-3-ubuntu:~$ sed 's#l\(..e\)#L\1#' sed.sh 
hello,Like
hi,my Love

awk

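Compared with sed, which usually operates on whole lines, awk tends to split each line into several fields and process those. awk is therefore well suited to small data-processing tasks. awk usually runs in this form:
awk 'condition1 {action1} condition2 {action2} ...' filename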
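
The condition/action form can be illustrated with a small made-up data set. In this sketch the names and scores are hypothetical; each input line is tested against every condition in turn, and a condition without an action prints the whole line.

```shell
# Hypothetical input: a name and a score on each line.
data=$(printf '%s\n' 'alice 82' 'bob 47' 'carol 65')

# Two condition/action pairs in one program.
result=$(printf '%s\n' "$data" | awk '$2 >= 60 {print $1, "pass"} $2 < 60 {print $1, "fail"}')

# A condition with no action prints the matching line as is.
low=$(printf '%s\n' "$data" | awk '$2 < 60')

# An action with no condition runs on every line.
names=$(printf '%s\n' "$data" | awk '{print $1}')

printf '%s\n' "$result" '---' "$low" '---' "$names"
```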

ubuntu@VM-0-3-ubuntu:~$ last -n 5 
ubuntu   pts/0        218.70.16.106    Wed Nov  7 14:40   still logged in
ubuntu   pts/1        119.86.113.106   Fri Nov  2 23:18 - 02:05  (02:47)
ubuntu   pts/0        113.205.193.254  Fri Nov  2 21:17 - 01:26  (04:08)
ubuntu   pts/0        218.70.16.106    Fri Nov  2 18:50 - 18:54  (00:04)
ubuntu   pts/0        218.70.16.106    Fri Nov  2 15:40 - 18:47  (03:07)

wtmp begins Fri Nov  2 13:13:25 2018

# Select the first and the third column
ubuntu@VM-0-3-ubuntu:~$ last -n 5 | awk '{$1 "\t" $3}'
# Nothing is printed, because print was forgotten
ubuntu@VM-0-3-ubuntu:~$ last -n 5 | awk '{print $1 "\t" $3}'
ubuntu  218.70.16.106
ubuntu  119.86.113.106
ubuntu  113.205.193.254
ubuntu  218.70.16.106
ubuntu  218.70.16.106
# The default field separator is the space or the [TAB] key
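
With the default separator, any run of spaces and tabs counts as a single separator, and leading whitespace is ignored. The sketch below checks this on one made-up line modeled on the last output above; NF is awk's built-in count of fields in the current line.

```shell
# One hypothetical line: leading spaces, then fields separated by several spaces and a tab.
line=$(printf '  ubuntu   pts/0\t218.70.16.106')

# $1 and $3 are found regardless of how much whitespace separates the fields.
fields=$(printf '%s\n' "$line" | awk '{print $1 "\t" $3}')

# NF holds the number of fields awk found on the line.
count=$(printf '%s\n' "$line" | awk '{print NF}')

printf '%s\n%s\n' "$fields" "$count"
```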

Logical tests in awk

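# In /etc/passwd the fields are separated by a colon ":".
# The first field of that file is the account name and the third field is the UID.
# Suppose we want the entries whose third field is less than 10, listing only the account name and the third field. That can be done like this: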
ubuntu@VM-0-3-ubuntu:~$ cat /etc/passwd | awk '{FS=":"} $3 < 10 {print $1 "\t" $3}'
root:x:0:0:root:/root:/bin/bash 
daemon  1
bin 2
sys 3
sync    4
games   5
man 6
lp  7
mail    8
news    9
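# The first line is not displayed correctly.
# When the first line was read, the fields $1, $2, ... had already been split on whitespace. FS=":" was assigned only after that, so it takes effect from the second line onward. To fix this, set awk's variables in advance with the BEGIN keyword: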
ubuntu@VM-0-3-ubuntu:~$ cat /etc/passwd | awk 'BEGIN {FS=":"} $3 < 10 {print $1 "\t" $3}'
root    0
daemon  1
bin 2
sys 3
sync    4
games   5
man 6
lp  7
mail    8
news    9
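
The timing of the FS assignment can be reproduced without reading /etc/passwd. This is a sketch on three made-up passwd-style records; it also shows the -F command-line option, which sets the separator before any input is read. The behavior of the first variant is what common awk implementations such as gawk and mawk do.

```shell
# Hypothetical passwd-style records, fields separated by ':'.
records=$(printf '%s\n' 'root:x:0:0' 'daemon:x:1:1' 'ubuntu:x:1000:1000')

# FS assigned inside the main action: line 1 has already been split on whitespace,
# so $1 is the whole line and $3 is empty.
late=$(printf '%s\n' "$records" | awk '{FS=":"} $3 < 10 {print $1 "\t" $3}')

# FS assigned in BEGIN: it is in effect before the first line is read.
early=$(printf '%s\n' "$records" | awk 'BEGIN {FS=":"} $3 < 10 {print $1 "\t" $3}')

# -F ':' on the command line has the same effect as the BEGIN assignment.
with_flag=$(printf '%s\n' "$records" | awk -F ':' '$3 < 10 {print $1 "\t" $3}')

printf '%s\n' "$late" '---' "$early" '---' "$with_flag"
```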