Storage layer:
Two types of storage devices (not B+ trees; those sit 2-3 layers higher, at the indexing level)
- HDD
Steps to read data from an HDD:
- move the disk arm to the target cylinder (1-3 ms)
- wait for the bits we want to rotate under the disk head (at 10^4 rpm, one rotation takes 60 s / 10^4 = 6 ms; on average we wait half a rotation, ~1/300 s ~ 3 ms)
- read the data (nearly costless by comparison: sequential scan at ~100 MB/s)
--- the first bit takes millions of times longer than each following bit of the sequential scan.
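The latency arithmetic above can be checked with a rough access-time model. This is a sketch under assumed parameters: a 2 ms average seek (the midpoint of the 1-3 ms range), a 10,000 rpm spindle, and 100 MB/s sequential transfer. The function and constant names are illustrative.

```python
# Rough HDD access-time model (assumed parameters, not measurements).
SEEK_S = 2e-3                  # assumed average arm movement: midpoint of 1-3 ms
RPM = 10_000                   # spindle speed from the notes
TRANSFER_BYTES_PER_S = 100e6   # sequential transfer rate from the notes


def rotational_delay_s(rpm: float = RPM) -> float:
    """Average rotational delay: half of one full rotation."""
    seconds_per_rotation = 60.0 / rpm
    return seconds_per_rotation / 2


def access_time_s(num_bytes: int) -> float:
    """Time to read num_bytes stored contiguously: seek + rotate + transfer."""
    return SEEK_S + rotational_delay_s() + num_bytes / TRANSFER_BYTES_PER_S


if __name__ == "__main__":
    print(f"rotational delay: {rotational_delay_s() * 1e3:.1f} ms")  # 3.0 ms
    first_bit_s = SEEK_S + rotational_delay_s()     # wait before the first bit arrives
    next_bit_s = 1 / (TRANSFER_BYTES_PER_S * 8)     # cost of each further bit
    print(f"first bit vs. next bit: {first_bit_s / next_bit_s:,.0f}x")
```

With these numbers the first bit costs about 4 million times as much as each following bit, consistent with the "millions of times" estimate.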
Because of these physical (mechanical) limits of HDDs, SSDs have become popular.
- SSD
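- all electronic, no moving parts.
- up to 5 GB/s sequential transfer rate at the high end.
- ~10K "seeks" per second at the high end (~0.1 ms each); "seek" here means the random-access delay, i.e. the arm-movement and rotation steps of an HDD.
- cost: e.g. a 15 TB Samsung drive costs $5k to $7k.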
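The SSD figures above convert into per-seek latency and price per GB with simple arithmetic. This sketch assumes decimal units (15 TB = 15,000 GB); the helper names are hypothetical.

```python
SSD_SEEKS_PER_S = 10_000          # high-end random accesses per second (from the notes)
SSD_CAPACITY_GB = 15_000          # 15 TB, assuming decimal units
SSD_PRICE_USD = (5_000, 7_000)    # quoted price range for a 15 TB drive


def latency_per_seek_s(seeks_per_s: float) -> float:
    """If a drive sustains N random accesses per second, each takes ~1/N seconds."""
    return 1.0 / seeks_per_s


def price_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Dollars per gigabyte of capacity."""
    return price_usd / capacity_gb


if __name__ == "__main__":
    print(f"latency per seek: {latency_per_seek_s(SSD_SEEKS_PER_S) * 1e3:.1f} ms")  # 0.1 ms
    low, high = (price_per_gb(p, SSD_CAPACITY_GB) for p in SSD_PRICE_USD)
    print(f"price: ${low:.2f} to ${high:.2f} per GB")  # $0.33 to $0.47 per GB
```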
HDD ~$1/GB, SSD ~$1/GB
15 TB: ~150 HDDs vs. ~1 SSD
- aggregate performance is actually similar!
- Latency doesn't change much
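One way to read "aggregate performance is similar" is to assume the ~150 HDDs are striped and accessed in parallel. The sketch below works through that reading with the numbers from these notes plus an assumed ~5 ms per random HDD access (seek + rotation). It is idealized: it ignores controller and bus limits, and the helper names are hypothetical.

```python
# Hypothetical setup: the ~150 HDDs are striped and read fully in parallel,
# with no controller or bus bottleneck (an optimistic assumption).
HDD_COUNT = 150
HDD_SEQ_MB_S = 100          # per-drive sequential transfer (from the HDD notes)
HDD_ACCESS_S = 5e-3         # assumed per-request latency: ~2 ms seek + ~3 ms rotation

SSD_SEQ_MB_S = 5_000        # 5 GB/s high-end sequential transfer
SSD_SEEKS_PER_S = 10_000    # high-end random accesses per second


def array_seq_mb_s(drives: int, per_drive_mb_s: float) -> float:
    """Striping adds up sequential bandwidth across drives."""
    return drives * per_drive_mb_s


def array_random_reads_per_s(drives: int, access_s: float) -> float:
    """Each drive serves ~1/access_s random reads per second; drives add up."""
    return drives / access_s


if __name__ == "__main__":
    print("HDD array seq MB/s:", array_seq_mb_s(HDD_COUNT, HDD_SEQ_MB_S))  # 15000
    print("SSD seq MB/s:", SSD_SEQ_MB_S)                                   # 5000
    print("HDD array reads/s:", round(array_random_reads_per_s(HDD_COUNT, HDD_ACCESS_S)))  # 30000
    print("SSD reads/s:", SSD_SEEKS_PER_S)                                 # 10000
    # Throughput aggregates, but a single request still waits on one device:
    print("per-request latency (ms): HDD", HDD_ACCESS_S * 1e3, "SSD", 1e3 / SSD_SEEKS_PER_S)
```

Under these assumptions the array and the SSD land in the same order of magnitude on both throughput measures, while the latency of any single request is still set by one device; adding more HDDs does not reduce it.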
Given that latency is a major concern, how can we mitigate it?
- <hardware technique> organize data on the device into blocks. A "block" is a group of bits on the device and the atomic unit of access: to get 1 bit, you get all the bits of its block.
- <software technique> organize blocks into pages.
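- <software technique> put related data into the same pages, e.g. a B+ tree stores records with the same or adjacent keys (e.g. timestamps) in the same pages.
- <software technique> use buffering: keep important data at a level of the system close to the CPU (e.g. in memory) so it need not be fetched from the device again.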
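The two software techniques above can be sketched together: records are sorted by key and packed into pages, so a range query touches few pages, and an LRU buffer pool keeps recently used pages in memory so repeated reads skip the device. This is a toy model, not how any particular DBMS does it. `PAGE_SIZE`, `BufferPool`, and `range_scan` are hypothetical names, the per-page key bounds stand in for what an index such as a B+ tree would provide, and LRU is just one common eviction policy, chosen here for simplicity.

```python
import random
from collections import OrderedDict

PAGE_SIZE = 4  # records per page (hypothetical, tiny for illustration)


def pack_into_pages(records, key):
    """Sort records by key and pack them into fixed-size pages,
    so records with adjacent keys end up in the same page."""
    ordered = sorted(records, key=key)
    return [ordered[i:i + PAGE_SIZE] for i in range(0, len(ordered), PAGE_SIZE)]


class BufferPool:
    """Minimal LRU page cache in front of a slow 'device' (a list of pages)."""

    def __init__(self, device_pages, capacity):
        self.device = device_pages
        self.capacity = capacity
        self.frames = OrderedDict()  # page_id -> page, least recently used first
        self.device_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # hit: mark as most recently used
            return self.frames[page_id]
        self.device_reads += 1                # miss: pay the device latency
        page = self.device[page_id]
        self.frames[page_id] = page
        if len(self.frames) > self.capacity:
            self.frames.popitem(last=False)   # evict the least recently used page
        return page


def range_scan(pool, page_bounds, lo, hi):
    """Return records with lo <= ts <= hi, reading only pages whose
    (first_ts, last_ts) bounds overlap the range."""
    out = []
    for page_id, (first, last) in enumerate(page_bounds):
        if last < lo or first > hi:
            continue
        out.extend(r for r in pool.get_page(page_id) if lo <= r["ts"] <= hi)
    return out


if __name__ == "__main__":
    records = [{"ts": t, "value": t * 10} for t in range(100)]
    random.shuffle(records)  # arrival order does not matter once pages are sorted
    pages = pack_into_pages(records, key=lambda r: r["ts"])
    bounds = [(p[0]["ts"], p[-1]["ts"]) for p in pages]
    pool = BufferPool(pages, capacity=8)
    first = range_scan(pool, bounds, 10, 17)  # reads pages 2, 3, 4 from the device
    again = range_scan(pool, bounds, 10, 17)  # all three pages are buffered now
    print(len(first), pool.device_reads)  # 8 3
```

Clustering keeps the first scan down to three page reads for eight records, and buffering makes the repeated scan free of device reads.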
Next time's topic: buffering.