2019-05-28 Weekly Learning Notes

Author: 李章文 | Published 2019-05-28 23:13

Continued from last week:

  • In Bash scripts, subshells (written with parentheses) are convenient ways to group commands. A common example is to temporarily move to a different working directory, e.g.
    # do something in current dir
    (cd /some/other/dir && other-command)
    # continue in original dir
  • In Bash, note there are lots of kinds of variable expansion. Checking that a variable exists: ${name:?error message}. For example, if a Bash script requires a single argument, just write input_file=${1:?usage: $0 input_file}. Using a default value if a variable is empty: ${name:-default}. If you want to have an additional (optional) parameter added to the previous example, you can use something like output_file=${2:-logfile}. If $2 is omitted and thus empty, output_file will be set to logfile. Arithmetic expansion: i=$(( (i+1) % 5 )). Sequences: {1..10}. Trimming of strings: ${var%suffix} and ${var#prefix}. For example if var=foo.pdf, then echo ${var%.pdf}.txt prints foo.txt.

  • Brace expansion using {...} can reduce having to re-type similar text and automate combinations of items. This is helpful in examples like mv foo.{txt,pdf} some-dir (which moves both files), cp somefile{,.bak} (which expands to cp somefile somefile.bak) or mkdir -p test-{a,b,c}/subtest-{1,2,3} (which expands all possible combinations and creates a directory tree). Brace expansion is performed before any other expansion.

  • The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and filename expansion. (For example, a range like {1..20} cannot be expressed with variables using {$a..$b}. Use seq or a for loop instead, e.g., seq $a $b or for((i=a; i<=b; i++)); do ...; done.)

  • The output of a command can be treated like a file via <(some command) (known as process substitution). For example, compare local /etc/hosts with a remote one:

    diff /etc/hosts <(ssh somehost cat /etc/hosts)
  • When writing scripts you may want to put all of your code in curly braces. If the closing brace is missing, your script will be prevented from executing due to a syntax error. This makes sense when your script is going to be downloaded from the web, since it prevents partially downloaded scripts from executing:
    {
        # Your code here
    }
  • Know about "here documents" in Bash, as a way to pass multi-line input on stdin:
    cat <<EOF
    input
    on multiple lines
    EOF
  • In Bash, redirect both standard output and standard error via: some-command >logfile 2>&1 or some-command &>logfile. Often, to ensure a command does not leave an open file handle to standard input, tying it to the terminal you are in, it is also good practice to add </dev/null.

  • Use man ascii for a good ASCII table, with hex and decimal values. For general encoding info, man unicode, man utf-8, and man latin1 are helpful.

  • Use screen or [tmux](https://tmux.github.io) to multiplex the screen, especially useful on remote ssh sessions and to detach and re-attach to a session. byobu can enhance screen or tmux by providing more information and easier management. A more minimal alternative for session persistence only is [dtach](https://github.com/bogner/dtach).
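    For example, a minimal tmux workflow (the session name is arbitrary) looks like:
    tmux new -s work       # start a new named session
    # ... detach with Ctrl-b d, log out, come back later ...
    tmux attach -t work    # re-attach to the same session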

  • In ssh, knowing how to port tunnel with -L or -D (and occasionally -R) is useful, e.g. to access web sites from a remote server.
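    For instance (host names are placeholders):
    # Local forward: reach a web app bound to port 80 on the server via http://localhost:8080
    ssh -L 8080:localhost:80 user@remote-server
    # Dynamic forward: a SOCKS proxy on local port 1080 routed through the server
    ssh -D 1080 user@remote-server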

  • It can be useful to make a few optimizations to your ssh configuration; for example, this ~/.ssh/config contains settings to avoid dropped connections in certain network environments, uses compression (which is helpful with scp over low-bandwidth connections), and multiplexes channels to the same server with a local control file:

    TCPKeepAlive=yes
    ServerAliveInterval=15
    ServerAliveCountMax=6
    Compression=yes
    ControlMaster auto
    ControlPath /tmp/%r@%h:%p
    ControlPersist yes
  • A few other ssh options are security-sensitive and should be enabled with care, e.g. per subnet or host or in trusted networks: StrictHostKeyChecking=no, ForwardAgent=yes
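    One way to scope these is a per-host block in ~/.ssh/config (the host name here is hypothetical):
    Host build-box.internal.example
        StrictHostKeyChecking no
        ForwardAgent yes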

  • Consider [mosh](https://mosh.mit.edu), an alternative to ssh that uses UDP, avoiding dropped connections and adding convenience on the road (requires server-side setup).

  • To get the permissions on a file in octal form, which is useful for system configuration but not available in ls and easy to bungle, use something like

    stat -c '%A %a %n' /etc/timezone
  • For interactive selection of values from the output of another command, use [percol](https://github.com/mooz/percol) or [fzf](https://github.com/junegunn/fzf).
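    For example, one common fzf pattern (illustrative only) is fuzzy-picking a file to edit:
    vim "$(fzf)"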

  • For interaction with files based on the output of another command (like git), use fpp (PathPicker).

  • For a simple web server for all files in the current directory (and subdirs), available to anyone on your network, use: python -m SimpleHTTPServer 7777 (for port 7777 and Python 2) and python -m http.server 7777 (for port 7777 and Python 3).

  • For running a command as another user, use sudo. Defaults to running as root; use -u to specify another user. Use -i to log in as that user (you will be asked for your password).
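    For example (the user names and commands are illustrative):
    sudo -u postgres psql    # run a single command as the postgres user
    sudo -i -u deploy        # start a login shell as the deploy user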

  • For switching the shell to another user, use su username or su - username. The latter with "-" gets an environment as if another user just logged in. Omitting the username defaults to root. You will be asked for the password of the user you are switching to.

  • Know about the 128K limit on command lines. This "Argument list too long" error is common when wildcard matching large numbers of files. (When this happens, alternatives like find and xargs may help.)
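    For example, instead of rm *.log failing in a huge directory, something like this works (the pattern is illustrative):
    find . -maxdepth 1 -name '*.log' -print0 | xargs -0 rm --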

  • For a basic calculator (and of course access to Python in general), use the python interpreter. For example,

>>> 2+3
5

Processing files and data

  • To locate a file by name in the current directory, find . -iname '*something*' (or similar). To find a file anywhere by name, use locate something (but bear in mind updatedb may not have indexed recently created files).

  • For general searching through source or data files, there are several options more advanced or faster than grep -r, including (in rough order from older to newer) [ack](https://github.com/beyondgrep/ack2), [ag](https://github.com/ggreer/the_silver_searcher) ("the silver searcher"), and [rg](https://github.com/BurntSushi/ripgrep) (ripgrep).
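    For example, a typical ripgrep invocation (the pattern and path are made up):
    rg -i 'connection timeout' src/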

  • To convert HTML to text: lynx -dump -stdin

  • For Markdown, HTML, and all kinds of document conversion, try [pandoc](http://pandoc.org/). For example, to convert a Markdown document to Word format: pandoc README.md --from markdown --to docx -o temp.docx

  • If you must handle XML, xmlstarlet is old but good.
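    A small illustrative example, extracting values with an XPath expression (element names are made up):
    xmlstarlet sel -t -v '//book/title' -n books.xml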

  • For JSON, use [jq](http://stedolan.github.io/jq/). For interactive use, also see [jid](https://github.com/simeji/jid) and [jiq](https://github.com/fiatjaf/jiq).
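    For example (the URL and field name are made up):
    curl -s https://api.example.com/users | jq -r '.[].name'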

  • For YAML, use [shyaml](https://github.com/0k/shyaml).
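    A minimal sketch, assuming the file has a key named server.port:
    shyaml get-value server.port < config.yaml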

  • For Excel or CSV files, [csvkit](https://github.com/onyxfish/csvkit) provides in2csv, csvcut, csvjoin, csvgrep, etc.
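    For example (file and column names are illustrative):
    in2csv report.xlsx > report.csv    # convert an Excel sheet to CSV
    csvcut -c name,email report.csv | csvgrep -c email -m '@example.com'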

  • For Amazon S3, [s3cmd](https://github.com/s3tools/s3cmd) is convenient and [s4cmd](https://github.com/bloomreach/s4cmd) is faster. Amazon's [aws](https://github.com/aws/aws-cli) and the improved [saws](https://github.com/donnemartin/saws) are essential for other AWS-related tasks.
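    For example, with the aws CLI (the bucket name is a placeholder):
    aws s3 sync ./local-dir s3://my-example-bucket/backups/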

  • Know about sort and uniq, including uniq's -u and -d options -- see one-liners below. See also comm.
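    These combine into simple set operations on two text files a and b (assuming neither file contains duplicate lines of its own):
    sort a b | uniq        # union
    sort a b | uniq -d     # intersection
    sort a b b | uniq -u   # difference: lines in a that are not in b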

  • Know about cut, paste, and join to manipulate text files. Many people use cut but forget about join.
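    A small illustrative join on the first comma-separated field of two files (both must be sorted on the join key):
    join -t, -1 1 -2 1 <(sort users.csv) <(sort orders.csv)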

  • Know about wc to count newlines (-l), characters (-m), words (-w) and bytes(-c).

  • Know about tee to copy from stdin to a file and also to stdout, as in ls -al | tee file.txt.

  • For more complex calculations, including grouping, reversing fields, and statistical calculations, consider [datamash](https://www.gnu.org/software/datamash/).
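    For example, a sketch assuming a CSV with a group key in column 1 and values in column 3:
    datamash -t, --header-in --sort -g 1 mean 3 < data.csv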

  • Know that locale affects a lot of command line tools in subtle ways, including sorting order (collation) and performance. Most Linux installations will set LANG or other locale variables to a local setting like US English. But be aware sorting will change if you change locale. And know i18n routines can make sort or other commands run many times slower. In some situations (such as the set operations or uniqueness operations below) you can safely ignore slow i18n routines entirely and use traditional byte-based sort order, using export LC_ALL=C.

  • You can set a specific command's environment by prefixing its invocation with the environment variable settings, as in TZ=Pacific/Fiji date.

  • Know basic awk and sed for simple data munging. See [One-liners](#one-liners) for examples.
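    Two everyday illustrative one-liners:
    awk -F, '{ sum += $3 } END { print sum }' data.csv    # sum the third CSV column
    sed 's/[[:space:]]*$//' input.txt > trimmed.txt       # strip trailing whitespace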

  • To replace all occurrences of a string in place, in one or more files:

    perl -pi.bak -e 's/old-string/new-string/g' my-files-*.txt
  • To rename multiple files and/or search and replace within files, try [repren](https://github.com/jlevy/repren). (In some cases the rename command also allows multiple renames, but be careful as its functionality is not the same on all Linux distributions.)
    # Full rename of filenames, directories, and contents foo -> bar:
    repren --full --preserve-case --from foo --to bar .
    # Recover backup files whatever.bak -> whatever:
    repren --renames --from '(.*)\.bak' --to '\1' *.bak
    # Same as above, using rename, if available:
    rename 's/\.bak$//' *.bak
  • As the man page says, rsync really is a fast and extraordinarily versatile file copying tool. It's known for synchronizing between machines but is equally useful locally. When security restrictions allow, using rsync instead of scp allows recovery of a transfer without restarting from scratch. It also is among the fastest ways to delete large numbers of files:
    mkdir empty && rsync -r --delete empty/ some-dir && rmdir some-dir
  • For monitoring progress when processing files, use [pv](http://www.ivarch.com/programs/pv.shtml), [pycp](https://github.com/dmerejkowsky/pycp), [pmonitor](https://github.com/dspinellis/pmonitor), [progress](https://github.com/Xfennec/progress), rsync --progress, or, for block-level copying, dd status=progress.

  • Use shuf to shuffle or select random lines from a file.

  • Know sort's options. For numbers, use -n, or -h for handling human-readable numbers (e.g. from du -h). Know how keys work (-t and -k). In particular, watch out that you need to write -k1,1 to sort by only the first field; -k1 means sort according to the whole line. Stable sort (sort -s) can be useful. For example, to sort first by field 2, then secondarily by field 1, you can use sort -k1,1 | sort -s -k2,2.

  • If you ever need to write a tab literal in a command line in Bash (e.g. for the -t argument to sort), press ctrl-v [Tab] or write $'\t' (the latter is better as you can copy/paste it).

  • The standard tools for patching source code are diff and patch. See also diffstat for summary statistics of a diff and sdiff for a side-by-side diff. Note diff -r works for entire directories. Use diff -r tree1 tree2 | diffstat for a summary of changes. Use vimdiff to compare and edit files.

  • For binary files, use hd, hexdump or xxd for simple hex dumps and bvi, hexedit or biew for binary editing.

  • Also for binary files, strings (plus grep, etc.) lets you find bits of text.

  • For binary diffs (delta compression), use xdelta3.

  • To convert text encodings, try iconv. Or uconv for more advanced use; it supports some advanced Unicode things. For example:

    # Displays hex codes or actual names of characters (useful for debugging):
    uconv -f utf-8 -t utf-8 -x '::Any-Hex;' < input.txt
    uconv -f utf-8 -t utf-8 -x '::Any-Name;' < input.txt
    # Lowercase and removes all accents (by expanding and dropping them):
    uconv -f utf-8 -t utf-8 -x '::Any-Lower; :: Any-NFD; [:Nonspacing Mark:]>; ::Any-NFC;' < input.txt > output.txt
  • To split files into pieces, see split (to split by size) and csplit (to split by a pattern).
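    For example (sizes and the pattern are arbitrary):
    split -b 100M big.tar.gz part-            # 100 MB pieces named part-aa, part-ab, ...
    csplit server.log '/^=== BEGIN/' '{*}'    # split at every line starting with "=== BEGIN"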

  • Date and time: To get the current date and time in the helpful ISO 8601 format, use date -u +"%Y-%m-%dT%H:%M:%SZ" (other options are problematic). To manipulate date and time expressions, use dateadd, datediff, strptime etc. from dateutils.
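    A sketch of dateutils usage (on some distributions the tools are installed as dadd/ddiff or dateutils.dadd/dateutils.ddiff):
    dateadd 2019-05-28 +7d            # the date one week later
    datediff 2019-01-01 2019-05-28    # number of days between the two dates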

  • Use zless, zmore, zcat, and zgrep to operate on compressed files.

  • File attributes are settable via chattr and offer a lower-level alternative to file permissions. For example, to protect against accidental file deletion, use the immutable flag: sudo chattr +i /critical/directory/or/file

  • Use getfacl and setfacl to save and restore file permissions. For example:

    getfacl -R /some/path > permissions.txt
    setfacl --restore=permissions.txt
  • To create empty files quickly, use truncate (creates a sparse file), fallocate (ext4, xfs, btrfs and ocfs2 filesystems), xfs_mkfile (almost any filesystem; comes in the xfsprogs package), or mkfile (for Unix-like systems like Solaris and Mac OS).
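    For example (sizes are arbitrary):
    truncate -s 10G sparse.img       # sparse file: reports 10 GB but allocates no blocks yet
    fallocate -l 10G prealloc.img    # actually reserves the blocks (on supported filesystems)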

System debugging

  • For web debugging, curl and curl -I are handy, or their wget equivalents, or the more modern httpie.
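    For example, a quick status-and-latency check (the URL is a placeholder):
    curl -s -o /dev/null -w '%{http_code} %{time_total}s\n' https://example.com/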

  • To know current cpu/disk status, the classic tools are top (or the better htop), iostat, and iotop. Use iostat -mxz 15 for basic CPU and detailed per-partition disk stats and performance insight.

  • For network connection details, use netstat and ss.
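    For example, to list listening TCP sockets along with the owning process:
    ss -tlnp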

  • For a quick overview of what's happening on a system, dstat is especially useful. For broadest overview with details, use glances.

  • To know memory status, run and understand the output of free and vmstat. In particular, be aware the "cached" value is memory held by the Linux kernel as file cache, so effectively counts toward the "free" value.

  • Java system debugging is a different kettle of fish, but a simple trick on Oracle's and some other JVMs is that you can run kill -3 <pid> and a full stack trace and heap summary (including generational garbage collection details, which can be highly informative) will be dumped to stderr/logs. The JDK's jps, jstat, jstack, jmap are useful. SJK tools are more advanced.

  • Use mtr as a better traceroute, to identify network issues.

  • For looking at why a disk is full, ncdu saves time over the usual commands like du -sh *.

  • To find which socket or process is using bandwidth, try iftop or nethogs.

  • The ab tool (comes with Apache) is helpful for quick-and-dirty checking of web server performance. For more complex load testing, try siege.
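    For example (the URL and numbers are arbitrary):
    ab -n 1000 -c 50 http://localhost:8080/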

  • For more serious network debugging, wireshark, tshark, or ngrep.

  • Know about strace and ltrace. These can be helpful if a program is failing, hanging, or crashing, and you don't know why, or if you want to get a general idea of performance. Note the profiling option (-c), and the ability to attach to a running process (-p). Use the trace-child option (-f) to avoid missing important calls.
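    For example (the pid is a placeholder):
    strace -c ls > /dev/null    # per-syscall counts and timings for a short-lived command
    strace -f -p 12345          # attach to a running process and follow its children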

  • Know about ldd to check shared libraries etc - but never run it on untrusted files.

  • Know how to connect to a running process with gdb and get its stack traces.
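    For example (the pid is a placeholder):
    gdb -p 12345
    (gdb) thread apply all bt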

  • Use /proc. It's amazingly helpful sometimes when debugging live problems. Examples: /proc/cpuinfo, /proc/meminfo, /proc/cmdline, /proc/xxx/cwd, /proc/xxx/exe, /proc/xxx/fd/, /proc/xxx/smaps (where xxx is the process id or pid).
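    For example (12345 is a placeholder pid):
    cat /proc/12345/status    # memory, state, and other per-process details
    ls -l /proc/12345/fd      # every file descriptor the process has open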
