Clojure: binary and encoding

Author: onedam | Published 2020-04-11 17:48

(fact "Binary: a numeric literal with a leading 0 is octal" :binary
      (Integer/toBinaryString 2) => "10"
      (Integer/toBinaryString 010) => "1000"
      (Integer/toBinaryString 0xf) => "1111"
      (Integer/toBinaryString 2r11) => "11"
      2r11 => 3
      [0x7 0x8 0x9 0xa 0xb 0xc 0xd 0xe 0xf] => [7 8 9 10 11 12 13 14 15]
      (map #(Integer/toBinaryString %) [0 1 2 3 4 5 6 7]) => '("0" "1" "10" "11" "100" "101" "110" "111")
      (map #(format "%03d" (read-string (Integer/toBinaryString %))) [0 1 2 3 4 5 6 7])
      => '("000" "001" "010" "011" "100" "101" "110" "111")
      ;'("000" \坤 "001" \艮 "010" \坎 "011" \巽 "100" \震 "101" \离 "110" \兑 "111" \乾)
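      ; a trailing ~% would append the platform line separator ("\r\n" on Windows, "\n" elsewhere), so it is left out here
      (clojure.pprint/cl-format nil "~{~2r~^ ~}" (range 10)) => "0 1 10 11 100 101 110 111 1000 1001"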

      (byte 0x43) => 67
      (byte 0x11) => 17
      (map byte "11ab0") => '(49 49 97 98 48)
      (map #(Integer/toBinaryString %) (map byte "11ab0")) => '("110001" "110001" "1100001" "1100010" "110000")
      (map byte "ascii") => '(97 115 99 105 105)
      (map #(Integer/toBinaryString %) (.getBytes "feng")) => '("1100110" "1100101" "1101110" "1100111")
      (map #(Integer/toHexString %) (.getBytes "feng")) => '("66" "65" "6e" "67")
      (new java.math.BigInteger (.getBytes "1")) => 49
      (hexify "11") => "3131"
      (unhexify "3131") => "11"
      (bit-and 2r1100 2r1001) => 8
      ; the literal 00111100 is read as octal, so its value is 37440
      (Integer/toBinaryString 37440) => "1001001001000000"
      (Integer/toBinaryString 11) => "1011"
      (Integer/toBinaryString 1) => "1"
      (Integer/toBinaryString 0) => "0"
      (Integer/toBinaryString (int \a)) => "1100001"
      (count "1100001") => 7
      (Integer/toBinaryString 2r11) => "11"
      (Integer/toBinaryString 01111) => "1001001001"
      (Integer/toBinaryString 1111) => "10001010111"
      2r110010 => 50
      2r0000110010 => 50
      [2r1 2r10 2r11 2r100 2r101 2r110 2r111 2r1000 2r1001 2r1010] => (range 1 11)
      [(char 2r1010111) (char 2r101000110101111)] => [\W \冯]
      (count "101000110101111") => 15
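      ; UTF-8 is one of the most important encodings of Unicode; others include UTF-16 and UTF-32.
      ; UTF-8 is a variable-length encoding rather than a fixed-width one: it uses 1 to 4 bytes per
      ; character, depending on the code point.
      ; If the first bit of a byte is 0, that byte is a complete character on its own. Otherwise the
      ; number of leading 1 bits in the first byte gives the total length of the sequence in bytes,
      ; and every following (continuation) byte starts with 10.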
      (Integer/toBinaryString 229) => "11100101"
      (Integer/toBinaryString 134) => "10000110"
      (Integer/toBinaryString 175) => "10101111"
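      ; UTF-8 stores most East Asian (CJK) characters in 3 bytes; the file assii码.txt read in iotest has to be decoded as UTF-8
      (String. (byte-array [229 134 175]) "utf-8") => "冯" ; three bytes indeed; these appear to be the same bytes read back at ast.iotest 52
      (String. (byte-array [229 134 175])) => "冯" ; uses the JVM default charset, which I set to UTF-8 at startup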
      (String. (byte-array [-27 -122 -81])) => "冯"
      (Integer/toBinaryString (long (first (.toCharArray "冯")))) => "101000110101111"
      (vec (.getBytes "冯" "utf-16")) => [-2 -1 81 -81]
      (vec (.getBytes "冯" "utf-32")) => [0 0 81 -81]
      (vec (.getBytes "冯" )) => [-27 -122 -81]
      (vec (.getBytes "f" )) => [102]
      ; getBytes returns signed bytes; when a value is negative, mask it with 0xff (0xff & b) to get the unsigned value
      [(bit-and 0xff -27) (bit-and 0xff -122) (bit-and 0xff -81)] => [229 134 175]
      (map #(Integer/toBinaryString %) (.getBytes "冯")) => '("11111111111111111111111111100101"
                                                             "11111111111111111111111110000110"
                                                             "11111111111111111111111110101111")
      (vec (.getBytes "冯" "UTF-8")) => [-27 -122 -81]
      (vec (.getBytes "冯" "gbk")) => [-73 -21]
      (vec (.getBytes "1")) => [49]
      (Integer/toBinaryString 1) => "1"
      (Integer/toBinaryString (first (.getBytes "1"))) => "110001"
      (Integer/toBinaryString (int \1)) => "110001"
      (Integer/toBinaryString -27) => "11111111111111111111111111100101"
      (Integer/toBinaryString -5) => "11111111111111111111111111111011"
      (count "11111111111111111111111111100101") => 32
      (type 1111) => java.lang.Long
      (type 01111) => java.lang.Long
      (type 2r1111) => java.lang.Long
      (Integer/toBinaryString 00111100) => "1001001001000000"
      )
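
The `hexify` and `unhexify` helpers used in the facts above are not defined in this post. The following is only a minimal sketch of what they might look like, assuming they work on the UTF-8 bytes of a string; the optional `charset` argument is an addition for illustration, not part of the original.

```clojure
(defn hexify
  "Hex string of the bytes of s in the given charset (UTF-8 assumed by default)."
  ([s] (hexify s "UTF-8"))
  ([^String s ^String charset]
   ;; mask each signed byte with 0xff so negative bytes print as two hex digits
   (apply str (map #(format "%02x" (bit-and 0xff %)) (.getBytes s charset)))))

(defn unhexify
  "Inverse of hexify: parse pairs of hex digits into bytes and decode them.
   A trailing unpaired digit is silently dropped by partition."
  ([hex] (unhexify hex "UTF-8"))
  ([^String hex ^String charset]
   (let [bs (map #(unchecked-byte (Integer/parseInt (apply str %) 16))
                 (partition 2 hex))]
     (String. (byte-array bs) charset))))

(hexify "11")      ; "3131"
(unhexify "3131")  ; "11"
(hexify "冯")      ; "e586af"
```

With these definitions, `(unhexify (hexify s))` returns `s` for any string that the chosen charset can represent.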

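; A non-negative integer is stored as its plain binary value (its true form).
; A negative integer is stored in two's complement form.
; Take a 32-bit int with the value 5. In memory it is represented as:
; 00000000 00000000 00000000 00000101
; At the hardware level subtraction needs no separate subtractor: a - b is carried out by the adder as a + (-b), because two's complement values can be added directly.
; That is why Integer.toBinaryString(-5) in Java yields 11111111111111111111111111111011: an Integer is 32 bits wide, and -5 is stored as the bits of 5 inverted, plus 1.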
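
The two's complement rule can be checked at the REPL. This is a small sketch; `bits32` is a hypothetical helper introduced here only to pad the output of `Integer/toBinaryString` to 32 characters.

```clojure
(defn bits32
  "32-character binary string of the low 32 bits of n, zero-padded on the left."
  [n]
  (let [s (Integer/toBinaryString (unchecked-int n))]
    (str (apply str (repeat (- 32 (count s)) "0")) s)))

(bits32 5)                        ; "00000000000000000000000000000101"
;; invert every bit and add 1 to obtain the representation of -5
(bits32 (inc (bit-not 5)))        ; "11111111111111111111111111111011"
(bits32 -5)                       ; same bits as the line above
;; subtraction as addition: 7 - 5 is 7 + (two's complement of 5)
(+ 7 (inc (bit-not 5)))           ; 2
```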
; Java byte: 1 byte, 8 bits, range -128 to 127
; short: 2 bytes, 16 bits
; int: 4 bytes, 32 bits
; long: 8 bytes, 64 bits
; Floating-point types:
; float: 4 bytes, 32 bits
; double: 8 bytes, 64 bits
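
These widths do not have to be memorized: the Java wrapper classes expose them as `SIZE` constants. A short sketch that reads them through interop (the map name `primitive-bits` is made up for this example):

```clojure
(def primitive-bits
  "Width in bits of each Java primitive type, read from the wrapper classes."
  {"byte"   Byte/SIZE
   "short"  Short/SIZE
   "int"    Integer/SIZE
   "long"   Long/SIZE
   "float"  Float/SIZE
   "double" Double/SIZE
   "char"   Character/SIZE})

(primitive-bits "byte")              ; 8
(primitive-bits "long")              ; 64
(primitive-bits "char")              ; 16
[Byte/MIN_VALUE Byte/MAX_VALUE]      ; [-128 127]
```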
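; For char, the encoding has to be taken into account, and the internal encoding must be distinguished from the external encoding.
; Internal encoding: how a language runtime encodes its char and string values in memory.
; External encoding: everything other than the internal encoding.
; Note that the encoding used inside the files produced by compiling source code (executables or class files) counts as an external encoding.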
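; Summary:
; In Java's internal encoding (in memory) a char is a UTF-16 code unit and occupies two bytes, but some characters need two chars (a surrogate pair).
; So one character occupies 2 or 4 bytes.
; Externally, Java text is commonly encoded as UTF-8, where one character occupies 1 to 4 bytes (the modified UTF-8 used inside class files stores a supplementary character as two 3-byte surrogates, 6 bytes in total).
; In UTF-16, English letters take two bytes; the vast majority of Chinese characters (the common ones in particular) take two bytes, and a few (rare characters added to Unicode later) take four bytes.
; In UTF-8, English letters take one byte; the vast majority of Chinese characters take three bytes, and a few take four bytes.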
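
The byte counts in this summary, and the UTF-8 lead-byte rule from the facts, can be verified with a short sketch. `byte-count`, `rare` and `utf8-seq-length` are names invented for this example; U+20000 is used as a sample of a character outside the Basic Multilingual Plane, and `UTF-16BE` is chosen so that no byte order mark is counted.

```clojure
(defn byte-count
  "Number of bytes needed to encode string s in the named charset."
  [^String s ^String charset]
  (alength (.getBytes s charset)))

;; U+20000 is a supplementary CJK ideograph, built from its code point
(def rare (String. (Character/toChars 0x20000)))

(byte-count "a" "UTF-8")        ; 1
(byte-count "冯" "UTF-8")       ; 3
(byte-count rare "UTF-8")       ; 4
(byte-count "a" "UTF-16BE")     ; 2
(byte-count "冯" "UTF-16BE")    ; 2
(byte-count rare "UTF-16BE")    ; 4

(count rare)                                    ; 2 chars (a surrogate pair)
(.codePointCount ^String rare 0 (count rare))   ; 1 code point

(defn utf8-seq-length
  "Length in bytes of the UTF-8 sequence whose first byte is b (unsigned, 0-255),
   or nil when b is a continuation byte (10xxxxxx) or not a valid lead byte."
  [b]
  (cond
    (= (bit-and b 2r10000000) 0)          1
    (= (bit-and b 2r11100000) 2r11000000) 2
    (= (bit-and b 2r11110000) 2r11100000) 3
    (= (bit-and b 2r11111000) 2r11110000) 4
    :else nil))

(utf8-seq-length 229)   ; 3, the lead byte of "冯"
(utf8-seq-length 134)   ; nil, a continuation byte
(utf8-seq-length 102)   ; 1, the single byte of "f"
```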