How it works(18): How Geotrellis reads

Author: 默而识之者 | Published: 2021-05-23 21:32

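1. Introduction

The previous article discussed how Geotrellis designs its low-level data type model. How, then, does Geotrellis actually read the data out of a tiff file?

Let us look again at the class structure diagram:

Green: class inheritance
Red: trait implementation

The traits mixed into the UInt32GeoTiffTile class mostly concern two kinds of behavior:

Traits related to Segment
Traits related to macros

We start with the Segment-related traits. Before analyzing the Segment model, we need some background on how data is laid out in a Geotiff.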

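2. How image data is laid out in a tiff file

The official documentation describes the data structure of Tiff files to some extent.
Image data can be laid out in a Tiff file in two ways:

Striped layout (Striped)
Tiled layout (Tiled)

More detailed descriptions are also available elsewhere.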

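2.1 Striped layout

As the name suggests, a striped tiff file stores data at the granularity of strips: the image data is split into a number of strips, each holding a fixed number of rows and therefore a definite size. Together, the strips make up the full image data.

A striped layout is described by 3 TiffTag parameters:

RowsPerStrip: the number of rows in a strip. Every strip in the image must have the same number of rows, except possibly the last strip, which may hold fewer.
StripOffsets: the offset table, giving the starting position of each strip in the tiff file.
- StripOffsets does not constrain ordering, so strips may appear in the file in any order.
- When a reader renders a Tiff as disjointed bands of garbage, a likely cause is that, for speed, it assumed the strips were stored sequentially instead of following the actual offset table.
StripByteCounts: the strip size array, giving the size of each strip in bytes.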
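
To make the interplay of these three tags concrete, here is a minimal sketch. It is not Geotrellis code: the object StripLayoutSketch and its functions are hypothetical names, and the "file" is just an in-memory byte array holding uncompressed strips.

```scala
object StripLayoutSketch {
  // Number of strips needed to cover the image: ceil(imageRows / rowsPerStrip).
  def stripCount(imageRows: Int, rowsPerStrip: Int): Int =
    (imageRows + rowsPerStrip - 1) / rowsPerStrip

  // Rows actually stored in strip i; only the last strip can be shorter.
  def rowsInStrip(i: Int, imageRows: Int, rowsPerStrip: Int): Int =
    math.min(rowsPerStrip, imageRows - i * rowsPerStrip)

  // Read strip i through the offset table instead of assuming sequential storage.
  def readStrip(
    file: Array[Byte],
    stripOffsets: Array[Long],
    stripByteCounts: Array[Long],
    i: Int
  ): Array[Byte] = {
    val start = stripOffsets(i).toInt
    java.util.Arrays.copyOfRange(file, start, start + stripByteCounts(i).toInt)
  }
}
```

With 10 image rows and RowsPerStrip = 4, stripCount gives 3 and the last strip holds 2 rows. Because readStrip always goes through stripOffsets, it still returns the right bytes when strip 0 is stored after strip 1 in the file.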

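Advantages of the striped layout:

Only the strips that are needed have to be read into memory, which saves memory.
Thanks to the offset table, random access to the data is easier.

Disadvantage of the striped layout:

When only a small part of the data is needed but it spans many rows, a lot of redundant data is read.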

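2.2 Tiled layout

Tiff 6.0 introduced the tiled layout. A tile can be thought of as a 2D strip with both a width and a height, and this is now the more common layout.

A tiled layout is described by 4 TiffTag parameters:

TileWidth/TileLength: analogous to RowsPerStrip. Each must be a multiple of 16, but the two need not be equal.
TileOffsets: analogous to StripOffsets.
TileByteCounts: analogous to StripByteCounts.

Compared with the striped layout, the tiled layout has a finer granularity, so fetching a local region costs less, and it also lends itself to compression.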
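
The cost difference can be estimated with a small calculation. The sketch below is only an illustration under simplifying assumptions: it counts decoded pixels, ignores compression, and treats the last strip as full-sized. ReadCostSketch, Window and both functions are hypothetical names.

```scala
object ReadCostSketch {
  // A pixel window: columns [colMin, colMax] and rows [rowMin, rowMax], inclusive.
  final case class Window(colMin: Int, rowMin: Int, colMax: Int, rowMax: Int)

  // Pixels decoded when whole strips (full image width) must be read.
  def pixelsReadStriped(w: Window, imageCols: Int, rowsPerStrip: Int): Long = {
    val strips = w.rowMax / rowsPerStrip - w.rowMin / rowsPerStrip + 1
    strips.toLong * rowsPerStrip * imageCols
  }

  // Pixels decoded when whole tiles must be read.
  def pixelsReadTiled(w: Window, tileCols: Int, tileRows: Int): Long = {
    val across = w.colMax / tileCols - w.colMin / tileCols + 1
    val down = w.rowMax / tileRows - w.rowMin / tileRows + 1
    across.toLong * down * tileCols * tileRows
  }
}
```

For a narrow window 16 pixels wide and 512 pixels tall in a 1024-column image, 16-row strips require decoding 32 strips (524288 pixels), while 256x256 tiles require only 2 tiles (131072 pixels).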

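2.3 Differences and connections between the two models

For Geotiff data, whichever layout the source uses, access boils down to a single pattern:

Walk through each strip/tile using its offset and byte count.

What differs is that the layout determines how concrete coordinates are computed. After all, most real operations target a specific position rather than the entire byte stream of a strip/tile.

This is why Geotrellis designs the Segment data model.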

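3. Overview of the Segment model

The code contains several structures with Segment in their names. Their main roles are as follows (CellType is a placeholder: each data type has its own corresponding class):

SegmentBytes: defines reading Byte data from a ByteReader
- LazySegmentBytes: reads data lazily, on demand
- ArraySegmentBytes: reads all data up front
GeoTiffSegment: defines the abstract get/map logic for a Segment
- CellTypeGeoTiffSegment: implements the map methods
  - CellTypeWithNoDataGeoTiffSegment: implements the get methods
GeoTiffSegmentCollection: defines how Segments are obtained/traversed from Bytes
- CellTypeGeoTiffSegmentCollection: implements decompressing Byte data into the corresponding data type
GeoTiffSegmentLayout: defines the abstract logic for locating a Segment index from a column/row
SegmentTransform: defines locating the index of a value inside a Segment from a column/row
- StripedSegmentTransform: the locating method for the striped layout
- TiledSegmentTransform: the locating method for the tiled layout
GeoTiffSegmentLayoutTransform: defines the logic for accessing concrete values in a Segment

Their containment relationships are roughly:

GeoTiffSegmentCollection contains:
- SegmentBytes
- GeoTiffSegment
GeoTiffSegmentLayoutTransform contains:
- SegmentTransform
- GeoTiffSegmentLayout

We start with reading Byte data into a concrete type.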

4. Reading Byte data into a concrete type

4.1 The GeoTiffSegmentCollection trait

The inheritance diagram shows that UInt32GeoTiffTile implements the UInt32GeoTiffSegmentCollection trait, which in turn extends GeoTiffSegmentCollection.

The code is as follows:

trait GeoTiffSegmentCollection {
  
  type T >: Null <: GeoTiffSegment

  val segmentBytes: SegmentBytes
  val decompressor: Decompressor
  val bandType: BandType

  // Predefined decompression function: turns (Int, Array[Byte]) into a GeoTiffSegment
  val decompressGeoTiffSegment: (Int, Array[Byte]) => T

  // Cache the result of the previous call
  private var _lastSegment: T = null
  private var _lastSegmentIndex: Int = -1

  // Get the Segment for the given segment index
  def getSegment(i: Int): T = {
    if(i != _lastSegmentIndex) {
      _lastSegment = decompressGeoTiffSegment(i, segmentBytes.getSegment(i))
      _lastSegmentIndex = i
    }
    _lastSegment
  }

  // Iterate over the requested Segments
  def getSegments(ids: Traversable[Int]): Iterator[(Int, T)] = {
    for { (id, bytes) <- segmentBytes.getSegments(ids) }
      yield id -> decompressGeoTiffSegment(id, bytes)
  }
}

trait UInt32GeoTiffSegmentCollection extends GeoTiffSegmentCollection {
  type T = UInt32GeoTiffSegment

  val bandType = UInt32BandType
    
  // The concrete decompression function
  lazy val decompressGeoTiffSegment =
    (i: Int, bytes: Array[Byte]) => new UInt32GeoTiffSegment(decompressor.decompress(bytes, i))
}

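Let us focus on the core method of GeoTiffSegmentCollection, getSegment. It provides one key capability: given a Segment index, return a Segment object. The most recently decoded Segment is cached, so repeated reads from the same Segment do not decompress it again.

Reading the code, its logic is:

Read the raw, compressed Byte data from the data source.
Decompress it into uncompressed data.
Put the uncompressed Byte data into a GeoTiffSegment object of the given type and return it.

Each step corresponds to a field/method:

segmentBytes field: reads the Bytes. [defined in the SegmentBytes class]
decompressor field: decompresses the compressed Byte data. [defined in the Decompressor class]
decompressGeoTiffSegment method: converts the decompressed Byte data into the actual type of the Geotiff file (UInt32 in this example), yielding a UInt32GeoTiffSegment object. [defined in the UInt32GeoTiffSegment class]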
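
The three steps and the single-entry cache can be imitated outside Geotrellis. The following is a rough sketch under loud assumptions: it uses java.util.zip Deflate in place of the Decompressor, an in-memory array in place of SegmentBytes, and a plain Int view in place of UInt32GeoTiffSegment. All names in SegmentPipelineSketch are hypothetical.

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.util.zip.{Deflater, Inflater}

object SegmentPipelineSketch {
  // Demo helper: encode Ints as big-endian bytes.
  def encode(values: Array[Int]): Array[Byte] = {
    val bb = ByteBuffer.allocate(values.length * 4)
    values.foreach(v => bb.putInt(v))
    bb.array()
  }

  // Demo helper: Deflate-compress a byte array.
  def compress(raw: Array[Byte]): Array[Byte] = {
    val deflater = new Deflater()
    deflater.setInput(raw)
    deflater.finish()
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](1024)
    while (!deflater.finished()) out.write(buf, 0, deflater.deflate(buf))
    deflater.end()
    out.toByteArray
  }

  // Step 2: inflate a compressed segment to its known uncompressed size.
  def decompress(compressed: Array[Byte], size: Int): Array[Byte] = {
    val inflater = new Inflater()
    inflater.setInput(compressed)
    val out = new Array[Byte](size)
    var filled = 0
    var stalled = false
    while (filled < size && !inflater.finished() && !stalled) {
      val n = inflater.inflate(out, filled, size - filled)
      if (n == 0) stalled = true else filled += n
    }
    inflater.end()
    out
  }

  // Step 3: a typed view over the decompressed bytes.
  final class IntSegment(val bytes: Array[Byte]) {
    private val buffer = ByteBuffer.wrap(bytes).asIntBuffer
    def get(i: Int): Int = buffer.get(i)
  }

  // Step 1 is stood in for by an array of already-loaded compressed segments.
  final class SegmentCache(compressedSegments: Array[Array[Byte]], segmentSize: Int) {
    private var lastIndex = -1
    private var lastSegment: IntSegment = null
    var decodeCount = 0 // counts real decodes, to make the cache visible

    def getSegment(i: Int): IntSegment = {
      if (i != lastIndex) {
        lastSegment = new IntSegment(decompress(compressedSegments(i), segmentSize))
        lastIndex = i
        decodeCount += 1
      }
      lastSegment
    }
  }
}
```

Asking the same SegmentCache for segment 1 twice in a row decodes it only once. The counter goes up again only when a different segment index is requested.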

We start with the SegmentBytes class to see how Geotrellis implements this logic.

4.2 The SegmentBytes trait

Recall the GeoTiffTile constructor: segmentBytes comes from the GeoTiffInfo object passed in at construction:

// Calling the constructor
GeoTiffTile(
    info.segmentBytes, // passed in
    info.decompressor,
    info.segmentLayout,
    info.compression,
    info.cellType,
    Some(info.bandType),
    info.overviews.map(geoTiffSinglebandTile)
)

object GeoTiffTile {
  def apply(
    segmentBytes: SegmentBytes, // formal parameter
    decompressor: Decompressor,
    segmentLayout: GeoTiffSegmentLayout,
    compression: Compression,
    cellType: CellType,
    bandType: Option[BandType] = None,
    overviews: List[GeoTiffTile] = Nil
  ): GeoTiffTile = {
    bandType match {
      case Some(UInt32BandType) =>
        cellType match {
          case ct: FloatCells =>
            new UInt32GeoTiffTile(
              segmentBytes, // passed through
              decompressor,
              segmentLayout,
              compression,
              ct,
              overviews.map(applyOverview(_, compression, cellType, bandType)).collect { case gt: UInt32GeoTiffTile => gt }
            )
    // ... omitted

The actual assignment of segmentBytes:

// In the definition of GeoTiffInfo

val segmentBytes: SegmentBytes =
  if (streaming)
    LazySegmentBytes(byteReader, tiffTags)
  else
    // byteReader is the same ByteReader that was used to read the tiff tags
    // tiffTags has already been fully read at this point
    ArraySegmentBytes(byteReader, tiffTags)

We first look at the definition of the SegmentBytes trait:

trait SegmentBytes extends Seq[Array[Byte]] with Serializable {
  def getSegment(i: Int): Array[Byte]
  def getSegments(indices: Traversable[Int]): Iterator[(Int, Array[Byte])]
  def getSegmentByteCount(i: Int): Int
  def apply(idx: Int): Array[Byte] = getSegment(idx)
  def iterator: Iterator[Array[Byte]] =
    getSegments(0 until length).map(_._2)
}

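We can see that:

SegmentBytes can be viewed as a sequence of byte arrays. Each byte array is one strip/tile, and the sequence of arrays makes up the full image data.
SegmentBytes provides fetching byte arrays (strips/tiles) one at a time or through an iterator.

Because data is always read from the storage medium as bytes, SegmentBytes is the data source of the whole Segment model.

Next we look at the two classes that implement the SegmentBytes trait, ArraySegmentBytes and LazySegmentBytes.

4.2.1 The LazySegmentBytes class

LazySegmentBytes is the class that reads data from a ByteReader into memory: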

class LazySegmentBytes(
  byteReader: ByteReader,
  tiffTags: TiffTags,
  maxChunkSize: Int = 32 * 1024 * 1024,
  maxOffsetBetweenChunks: Int = 1024
) extends SegmentBytes {

  import LazySegmentBytes.Segment

  def length: Int = tiffTags.segmentCount
    
  // Pick the offset table and byte count table according to the layout
  val (segmentOffsets, segmentByteCounts) =
    if (tiffTags.hasStripStorage) {
      val stripOffsets = tiffTags &|->
        TiffTags._basicTags ^|->
        BasicTags._stripOffsets get
      val stripByteCounts = tiffTags &|->
        TiffTags._basicTags ^|->
        BasicTags._stripByteCounts get
      (stripOffsets.get, stripByteCounts.get)
    } else {
      val tileOffsets = tiffTags &|->
        TiffTags._tileTags ^|->
        TileTags._tileOffsets get
      val tileByteCounts = tiffTags &|->
        TiffTags._tileTags ^|->
        TileTags._tileByteCounts get
      (tileOffsets.get, tileByteCounts.get)
    }

  def getSegmentByteCount(i: Int): Int = segmentByteCounts(i).toInt

  // Group Segments into buffered chunks
  protected def chunkSegments(segmentIds: Traversable[Int]): List[List[Segment]]  = {
    {for { id <- segmentIds } yield {
      // Record each Segment's start offset and length, without reading any actual data
      val offset = segmentOffsets(id)
      val length = segmentByteCounts(id)
      Segment(id, offset, offset + length - 1)
    }}.toSeq
      .sortBy(_.startOffset) // Geotiff does not require blocks to be stored in order, so sort by ascending offset to read the file front to back
      .foldLeft((0L, List(List.empty[Segment]))) { case ((chunkSize, headChunk :: commitedChunks), seg) =>
      // chunkSize: size of the current chunk
      // headChunk: first chunk in the list, which is the one most recently started; a List[Segment]
      // commitedChunks: all chunks except the first
      // seg: the incoming Segment

      // Decide whether a new chunk has to be started
      val isSegmentNearChunk =
        // For the very first Segment, headChunk is empty (Nil), so headOption is the safe choice
        headChunk.headOption.map { c =>
          // Check that the gap between the last Segment added and the new one is small enough
          seg.startOffset - c.endOffset <= maxOffsetBetweenChunks
        }.getOrElse(true) // With no data yet, treat the Segment as belonging to the chunk

      // If neither the size limit nor the gap limit is exceeded
      if (chunkSize + seg.size <= maxChunkSize && isSegmentNearChunk)
        // keep prepending Segments to the current chunk
        (chunkSize + seg.size) -> ((seg :: headChunk) :: commitedChunks)
      else
        // start a new chunk whose first element is this Segment
        seg.size -> ((seg :: Nil) :: headChunk :: commitedChunks)
    }
  }._2.reverse.map(_.reverse) // Two reversals: of the chunk list and of the Segments inside each chunk, since both were built by prepending


  // Read a single Segment directly, without chunking
  def getSegment(i: Int): Array[Byte] = {
    val startOffset = segmentOffsets(i)
    val endOffset = segmentOffsets(i) + segmentByteCounts(i) - 1
    getBytes(startOffset, segmentByteCounts(i))
  }
  
  // Read the Byte data of every Segment in one chunk
  protected def readChunk(segments: List[Segment]): Map[Int, Array[Byte]] = {
    segments
      .map { segment =>
        segment.id -> getBytes(segment.startOffset, segment.endOffset - segment.startOffset + 1)
      }
      .toMap
  }

  // Return an iterator over the Byte data of all chunks
  def getSegments(indices: Traversable[Int]): Iterator[(Int, Array[Byte])] = {
    val chunks = chunkSegments(indices)
    chunks
      .toIterator // convert to an iterator so that chunks are read lazily
      .flatMap(chunk => readChunk(chunk)) // each step reads one chunk
  }

  // The method that actually reads the Byte data
  private[geotrellis] def getBytes(offset: Long, length: Long): Array[Byte] = {
    byteReader.position(offset)
    byteReader.getBytes(length.toInt)
  }

}

object LazySegmentBytes {
  def apply(byteReader: ByteReader, tiffTags: TiffTags): LazySegmentBytes =
    new LazySegmentBytes(byteReader, tiffTags)

  // Logical structure of a Segment
  case class Segment(id: Int, startOffset: Long, endOffset: Long) {
    def size: Long = endOffset - startOffset + 1
  }
}

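As the name suggests, LazySegmentBytes reads with chunks acting as a sliding window, and its getSegments method pulls the binary stream from the file lazily. The steps are:

Record the size and offset of every requested strip/tile in a Segment object.
Sort the Segments by offset, then merge consecutive Segments into one chunk (List[Segment]) as long as two conditions both hold: the recorded strip/tile sizes sum to no more than 32MB, and the gap between the end of the previous Segment and the start of the next is no more than 1024 bytes.
The result is a list of chunks, List[List[Segment]].
When reading, the chunk list is converted to an iterator, and each iteration step yields one chunk.
Because an iterator is lazy, the code attached to each element only runs when that element is requested. Tying the Byte-reading code to each step therefore means data is only read when a given chunk is reached, so the data read into memory at one time stays within the chunk limit (32MB by default). This strikes a balance between efficiency and resource usage.
- Without lazy loading, all data would be read into memory up front, which can double the memory footprint and wastes resources.
- Without grouping Segments into chunks, even more memory would be saved, but the frequent IO context switches could hurt efficiency.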
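
The grouping and the lazy reading described in these steps can be sketched in isolation. This is a simplified variant, not the Geotrellis implementation: ChunkingSketch, chunk and readLazily are hypothetical names, the fold starts from an empty chunk list rather than a list holding one empty chunk, and the "read" is any function supplied by the caller.

```scala
object ChunkingSketch {
  final case class Segment(id: Int, startOffset: Long, endOffset: Long) {
    def size: Long = endOffset - startOffset + 1
  }

  // Sort by offset, then grow the newest chunk while both limits hold.
  def chunk(segments: Seq[Segment], maxChunkSize: Long, maxGap: Long): List[List[Segment]] = {
    val (_, chunks) = segments.sortBy(_.startOffset)
      .foldLeft((0L, List.empty[List[Segment]])) {
        case ((size, head :: rest), seg)
            if size + seg.size <= maxChunkSize && seg.startOffset - head.head.endOffset <= maxGap =>
          // Fits: prepend to the newest chunk.
          (size + seg.size, (seg :: head) :: rest)
        case ((_, all), seg) =>
          // Too big or too far away (or the very first segment): start a new chunk.
          (seg.size, List(seg) :: all)
      }
    // Both levels were built by prepending, so reverse both.
    chunks.reverse.map(_.reverse)
  }

  // Reading happens one chunk at a time, and only when the iterator is advanced.
  def readLazily[A](chunks: List[List[Segment]])(read: Segment => A): Iterator[(Int, A)] =
    chunks.iterator.flatMap(chunk => chunk.map(s => s.id -> read(s)))
}
```

With a 250-byte chunk limit and a 1024-byte gap limit, four 100-byte segments at offsets 0, 100, 200 and 5000 fall into three chunks: the first two together, the third alone because of the size limit, and the fourth alone because of the gap. Nothing is read until the iterator returned by readLazily is first advanced, at which point exactly one chunk is read.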

4.2.2 The ArraySegmentBytes class

ArraySegmentBytes serves data straight from memory and is a further wrapper around LazySegmentBytes:

class ArraySegmentBytes(compressedBytes: Array[Array[Byte]]) extends SegmentBytes {

  def length = compressedBytes.length
  def getSegment(i: Int) = compressedBytes(i)
  def getSegmentByteCount(i: Int): Int = compressedBytes(i).length
  def getSegments(indices: Traversable[Int]): Iterator[(Int, Array[Byte])] =
    indices.toIterator
      .map { i => i -> compressedBytes(i) }
}

object ArraySegmentBytes {

  def apply(byteReader: ByteReader, tiffTags: TiffTags): ArraySegmentBytes = {
    // Use LazySegmentBytes to read all the data of the file into memory at once
    val streaming = LazySegmentBytes(byteReader, tiffTags)
    val compressedBytes = Array.ofDim[Array[Byte]](streaming.length)
    streaming.getSegments(compressedBytes.indices).foreach {
      case (i, bytes) => compressedBytes(i) = bytes
    }
    new ArraySegmentBytes(compressedBytes)
  }
}

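4.3 The Decompressor class

The Decompressor class defines how Byte data is decompressed. It also comes from the GeoTiffInfo object passed to the GeoTiffTile constructor.

Data is usually compressed before being stored in a Tiff file, so it must be decompressed when read. The compression algorithms supported by default appear in the match expression below. We do not need to look at how the compression/decompression methods themselves are implemented, since they are standard, general-purpose algorithms. We only need to see how they interact with the Geotrellis logic.

The Decompressor constructor is as follows: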

object Decompressor {
  def apply(tiffTags: TiffTags, byteOrder: ByteOrder): Decompressor = {
    import geotrellis.raster.io.geotiff.tags.codes.CompressionType._

    // Check the byte order
    def checkEndian(d: Decompressor): Decompressor = {
      // ByteBuffer defaults to big-endian; little-endian data has to be flipped
      if(byteOrder != ByteOrder.BIG_ENDIAN && tiffTags.bitsPerPixel > 8) {
        d.flipEndian(tiffTags.bytesPerPixel / tiffTags.bandCount)
      } else { d }
    }

    // Check the predictor
    def checkPredictor(d: Decompressor): Decompressor = {
      val predictor = Predictor(tiffTags)
      if(predictor.checkEndian)
        checkEndian(d).withPredictor(predictor)
      else { d.withPredictor(predictor) }
    }

    val segmentCount = tiffTags.segmentCount
    val segmentSizes = Array.ofDim[Int](segmentCount)
    val bandCount = tiffTags.bandCount
    if(!tiffTags.hasPixelInterleave || bandCount == 1) {
      cfor(0)(_ < segmentCount, _ + 1) { i =>
        segmentSizes(i) = tiffTags.imageSegmentByteSize(i).toInt
      }
    } else {
      cfor(0)(_ < segmentCount, _ + 1) { i =>
        segmentSizes(i) = tiffTags.imageSegmentByteSize(i).toInt * tiffTags.bandCount
      }
    }

    // Choose the decompressor according to the compression type in the metadata
    tiffTags.compression match {
      case Uncompressed =>
        checkEndian(NoCompression)
      case LZWCoded =>
        checkPredictor(LZWDecompressor(segmentSizes))
      case ZLibCoded | PkZipCoded =>
        checkPredictor(DeflateCompression.createDecompressor(segmentSizes))
      case PackBitsCoded => // PackBits compression does not support a predictor
        checkEndian(PackBitsDecompressor(segmentSizes))
      case JpegCoded => // lossy compression, no notion of a predictor
        checkEndian(JpegDecompressor(tiffTags))

      // Unsupported compression types raise an exception
      case HuffmanCoded =>
        val msg = "compression type CCITTRLE is not supported by this reader."
        throw new GeoTiffReaderLimitationException(msg)
      // ... several other unsupported compression types omitted
    }
  }
}

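A predictor stores differences between neighbouring samples instead of the raw values so that the data compresses better. Predictors, and Tiff compression in general, are discussed in more detail elsewhere.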
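
The byte-order step in checkEndian can be pictured with a standalone sketch. It is not the Geotrellis flipEndian implementation: EndianSketch is a hypothetical name, and the function simply reverses the bytes inside every sample-sized word so that a default (big-endian) ByteBuffer reads little-endian data correctly. Trailing bytes that do not fill a whole word are left as zero.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object EndianSketch {
  // Reverse the bytes inside each `bytesPerSample`-wide word.
  def flipEndian(bytes: Array[Byte], bytesPerSample: Int): Array[Byte] = {
    val out = new Array[Byte](bytes.length)
    var i = 0
    while (i + bytesPerSample <= bytes.length) {
      var j = 0
      while (j < bytesPerSample) {
        out(i + j) = bytes(i + bytesPerSample - 1 - j)
        j += 1
      }
      i += bytesPerSample
    }
    out
  }

  // Demo helper: write Ints in little-endian order, as a little-endian Tiff would.
  def littleEndianInts(values: Array[Int]): Array[Byte] = {
    val bb = ByteBuffer.allocate(values.length * 4).order(ByteOrder.LITTLE_ENDIAN)
    values.foreach(v => bb.putInt(v))
    bb.array()
  }
}
```

After flipping with a 4-byte word size, ByteBuffer.wrap(...).asIntBuffer returns the original values. This is also why the source only flips when bitsPerPixel is greater than 8: single-byte samples have no byte order.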

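4.4 The GeoTiffSegment class and its subclasses

Compressed data is read from SegmentBytes, decompressed by the Decompressor into raw Byte values, and finally converted in GeoTiffSegment into values of the actual data type.

As the data model in the previous article showed, Geotrellis has 7*3+1 actual data types because of how NoData values are defined. A CellTypeGeoTiffSegment is therefore needed for each CellType.

Taking the Float32 type as an example, here is how GeoTiffSegment does its job: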

// The abstract GeoTiffSegment: declares the methods without implementing them
trait GeoTiffSegment {
  def size: Int
  def getInt(i: Int): Int // get the value at the given index
  def getDouble(i: Int): Double

  def bytes: Array[Byte]

  def map(f: Int => Int): Array[Byte] 
  def mapDouble(f: Double => Double): Array[Byte]
  def mapWithIndex(f: (Int, Int) => Int): Array[Byte]
  def mapDoubleWithIndex(f: (Int, Double) => Double): Array[Byte]
}

// For the float32 type, implement some of the declared methods
abstract class Float32GeoTiffSegment(val bytes: Array[Byte]) extends GeoTiffSegment {
  protected val buffer = ByteBuffer.wrap(bytes).asFloatBuffer
  // float32 takes 4 bytes
  val size: Int = bytes.size / 4

  // Get the raw value directly
  def get(i: Int): Float = buffer.get(i)

  def getInt(i: Int): Int
  def getDouble(i: Int): Double
  protected def intToFloatOut(v: Int): Float
  protected def doubleToFloatOut(v: Double): Float

  // Implementation of the map-related methods
  def map(f: Int => Int): Array[Byte] = {
    val arr = Array.ofDim[Float](size)
    // Read every value as an Int and apply f
    cfor(0)(_ < size, _ + 1) { i =>
      arr(i) = intToFloatOut(f(getInt(i)))
    }
    // Store the results back into a Byte array
    val result = new Array[Byte](size * FloatConstantNoDataCellType.bytes)
    val bytebuff = ByteBuffer.wrap(result)
    bytebuff.asFloatBuffer.put(arr)
    result
  }

  def mapWithIndex(f: (Int, Int) => Int): Array[Byte] = {
    val arr = Array.ofDim[Float](size)
    cfor(0)(_ < size, _ + 1) { i =>
      arr(i) = intToFloatOut(f(i, getInt(i)))
    }
    val result = new Array[Byte](size * FloatConstantNoDataCellType.bytes)
    val bytebuff = ByteBuffer.wrap(result)
    bytebuff.asFloatBuffer.put(arr)
    result
  }
  
  // ... the double-related functions are omitted; they mirror the int ones

}

// No NoData value
class Float32RawGeoTiffSegment(bytes: Array[Byte]) extends Float32GeoTiffSegment(bytes) {
  def getInt(i: Int): Int = get(i).toInt
  def getDouble(i: Int): Double = get(i).toDouble

  // A plain numeric conversion is enough
  protected def intToFloatOut(v: Int): Float = v.toFloat
  protected def doubleToFloatOut(v: Double): Float = v.toFloat
}

// Constant NoData value
class Float32ConstantNoDataGeoTiffSegment(bytes: Array[Byte]) extends Float32GeoTiffSegment(bytes) {
  // Use the predefined conversion methods
  // These are all macro methods, covered in a later article
  def getInt(i: Int): Int = f2i(get(i))
  def getDouble(i: Int): Double = f2d(get(i))

  protected def intToFloatOut(v: Int): Float = i2f(v)
  protected def doubleToFloatOut(v: Double): Float = d2f(v)
}

// User-defined NoData value
class Float32UserDefinedNoDataGeoTiffSegment(bytes: Array[Byte], val userDefinedFloatNoDataValue: Float)
    extends Float32GeoTiffSegment(bytes)
       with UserDefinedFloatNoDataConversions {

  // Use the user-defined conversion methods
  def getInt(i: Int): Int = udf2i(get(i))
  def getDouble(i: Int): Double = udf2d(get(i))

  protected def intToFloatOut(v: Int): Float = i2udf(v)
  protected def doubleToFloatOut(v: Double): Float = d2udf(v)
}

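We can see that:

Even for Float32 data, the get/map functions still converge on operations over int/double.
Because NoData conversion is involved, every conversion between Byte data and values of the actual type branches out by CellType.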
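
A stripped-down sketch of the constant-NoData case may help here. The conversions below are assumptions about what the f2i/i2f macros amount to (NaN as the floating-point NoData, Int.MinValue as the integer NoData), written as ordinary functions. FloatSegmentSketch and everything inside it are hypothetical names, not Geotrellis classes.

```scala
import java.nio.ByteBuffer

object FloatSegmentSketch {
  // Assumption: Int.MinValue marks integer NoData, NaN marks float NoData.
  val IntNoData: Int = Int.MinValue

  def f2i(v: Float): Int = if (v.isNaN) IntNoData else v.toInt
  def i2f(v: Int): Float = if (v == IntNoData) Float.NaN else v.toFloat

  final class FloatSegment(val bytes: Array[Byte]) {
    private val buffer = ByteBuffer.wrap(bytes).asFloatBuffer
    val size: Int = bytes.length / 4 // float32 takes 4 bytes

    // Reads go through the NoData-aware conversion.
    def getInt(i: Int): Int = f2i(buffer.get(i))

    // Apply f to every value as an Int and write the results back as float bytes.
    def map(f: Int => Int): Array[Byte] = {
      val out = ByteBuffer.allocate(size * 4)
      var i = 0
      while (i < size) {
        out.putFloat(i2f(f(getInt(i))))
        i += 1
      }
      out.array()
    }
  }

  // Demo helper: encode floats as big-endian bytes.
  def encode(values: Array[Float]): Array[Byte] = {
    val bb = ByteBuffer.allocate(values.length * 4)
    values.foreach(v => bb.putFloat(v))
    bb.array()
  }
}
```

A segment holding 1.9f and NaN reads back as 1 and Int.MinValue through getInt. If the mapping function leaves Int.MinValue untouched, the NoData cell is written back as NaN and survives the round trip.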

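This gives a rough picture of how GeoTiffSegmentCollection reads data of the actual type out of Byte arrays.

For GeoTiffSegmentCollection, the reading granularity is the Segment. A Segment is a logical structure with no physical meaning of its own. Segment indices (SegmentIndex) are enough to traverse all the data, but reading a specific region requires a way to convert between Segments and actual column/row numbers. That is the purpose of GeoTiffSegmentLayoutTransform.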

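5. Reading data from a given position

5.1 The GeoTiffSegmentLayout class

Like the segmentBytes and decompressor objects, segmentLayout also comes from the GeoTiffInfo object passed to the constructor:

// Describes the Segment layout in terms of tiles
// layoutCols/layoutRows: how many Segment tiles the image has across/down
// tileCols/tileRows: how many pixel columns/rows one Segment tile has
case class TileLayout(layoutCols: Int, layoutRows: Int, tileCols: Int, tileRows: Int)

// Invoked through the companion object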
object GeoTiffSegmentLayout {
  def apply(
    totalCols: Int,
    totalRows: Int,
    storageMethod: StorageMethod,
    interleaveMethod: InterleaveMethod,
    bandType: BandType
  ): GeoTiffSegmentLayout = {
    
    val tileLayout =
      storageMethod match {
        // The tiled layout needs little adjustment
        case Tiled(blockCols, blockRows) =>
          // How many tiles fit across and down
          val layoutCols = math.ceil(totalCols.toDouble / blockCols).toInt
          val layoutRows = math.ceil(totalRows.toDouble / blockRows).toInt
          TileLayout(layoutCols, layoutRows, blockCols, blockRows)
        case s: Striped =>
          val rowsPerStrip = math.min(s.rowsPerStrip(totalRows, bandType), totalRows).toInt
          // How many strips are stacked vertically
          val layoutRows = math.ceil(totalRows.toDouble / rowsPerStrip).toInt
          // In the striped layout there is only 1 Segment tile per row
          // A strip spans the full row, so the Segment tile is as wide as the image
          TileLayout(1, layoutRows, totalCols, rowsPerStrip)
      }
    GeoTiffSegmentLayout(totalCols, totalRows, tileLayout, storageMethod, interleaveMethod)
  }
}

// Definition of GeoTiffSegmentLayout
case class GeoTiffSegmentLayout(
  totalCols: Int,
  totalRows: Int,
  tileLayout: TileLayout,
  storageMethod: StorageMethod,
  interleaveMethod: InterleaveMethod
) {
  def isTiled: Boolean =
    storageMethod match {
      case _: Tiled => true
      case _ => false
    }
  def isStriped: Boolean = !isTiled
  def hasPixelInterleave: Boolean = interleaveMethod == PixelInterleave

  // Compute the index of the Segment tile that contains the given column/row
  private [geotiff] def getSegmentIndex(col: Int, row: Int): Int = {
    // Which column of the layout the position falls in
    val layoutCol = col / tileLayout.tileCols
    // Which row of the layout the position falls in
    val layoutRow = row / tileLayout.tileRows
    // Combine them into the index of the Segment tile
    (layoutRow * tileLayout.layoutCols) + layoutCol
  }
  
  // ... other methods omitted
}

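Here a Segment and a Tile are the same thing. The former name stresses its role in reading data, the latter its role in the layout. For ease of understanding, both are referred to as a Segment tile.

GeoTiffSegmentLayout locates the index of a Segment tile from a column/row, which only narrows the position down to a range. Pinpointing the given column/row inside that Segment tile is left to the SegmentTransform trait.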
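
A worked example makes the index arithmetic easier to follow. The sketch below re-implements only the formulas shown above as free functions. SegmentIndexSketch, tiledLayout, stripedLayout and segmentIndex are hypothetical names, and rowsPerStrip is taken as a plain argument instead of being derived from the band type.

```scala
object SegmentIndexSketch {
  final case class TileLayout(layoutCols: Int, layoutRows: Int, tileCols: Int, tileRows: Int)

  // Tiled: count how many blocks are needed across and down.
  def tiledLayout(totalCols: Int, totalRows: Int, blockCols: Int, blockRows: Int): TileLayout =
    TileLayout(
      math.ceil(totalCols.toDouble / blockCols).toInt,
      math.ceil(totalRows.toDouble / blockRows).toInt,
      blockCols,
      blockRows)

  // Striped: one segment per layout row, each as wide as the image.
  def stripedLayout(totalCols: Int, totalRows: Int, rowsPerStrip: Int): TileLayout = {
    val rows = math.min(rowsPerStrip, totalRows)
    TileLayout(1, math.ceil(totalRows.toDouble / rows).toInt, totalCols, rows)
  }

  // Row-major index of the segment that contains pixel (col, row).
  def segmentIndex(layout: TileLayout, col: Int, row: Int): Int =
    (row / layout.tileRows) * layout.layoutCols + col / layout.tileCols
}
```

For a 1000x600 image with 256x256 tiles, the layout is 4 tiles across and 3 down, and pixel (col 300, row 520) falls in segment 2 * 4 + 1 = 9. With 64-row strips the same image has 10 strips, and the same pixel falls in strip 8.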

5.2 The SegmentTransform trait

private [geotiff] trait SegmentTransform {
  // Each Segment tile has its own SegmentTransform
  def segmentIndex: Int
  def segmentLayoutTransform: GeoTiffSegmentLayoutTransform
  protected def segmentLayout = segmentLayoutTransform.segmentLayout

  protected def bandCount = segmentLayoutTransform.bandCount

  protected def layoutCols: Int = segmentLayout.tileLayout.layoutCols
  protected def layoutRows: Int = segmentLayout.tileLayout.layoutRows

  protected def tileCols: Int = segmentLayout.tileLayout.tileCols
  protected def tileRows: Int = segmentLayout.tileLayout.tileRows

  // Which column/row of the whole image's layout this Segment tile occupies
  protected def layoutCol: Int = segmentIndex % layoutCols
  protected def layoutRow: Int = segmentIndex / layoutCols
    
  // ... omitted

}

// Taking the tiled layout as an example
private [geotiff] case class TiledSegmentTransform(segmentIndex: Int, segmentLayoutTransform: GeoTiffSegmentLayoutTransform) extends SegmentTransform {
  // Compute the index of the given column/row within this Segment tile
  def gridToIndex(col: Int, row: Int): Int = {
    val tileCol = col - (layoutCol * tileCols)
    val tileRow = row - (layoutRow * tileRows)
    tileRow * tileCols + tileCol
  }

}
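
The in-tile arithmetic of gridToIndex can be checked with concrete numbers. This sketch flattens the trait members into plain parameters. GridToIndexSketch is a hypothetical name, and reusing the same formula for strips (with layoutCols = 1 and tileCols equal to the image width) is an assumption made here for illustration, not something shown in the source.

```scala
object GridToIndexSketch {
  // Position of pixel (col, row) inside segment `segmentIndex`, in row-major order.
  def gridToIndex(
    segmentIndex: Int,
    layoutCols: Int,
    tileCols: Int,
    tileRows: Int,
    col: Int,
    row: Int
  ): Int = {
    // Where the segment sits in the layout grid.
    val layoutCol = segmentIndex % layoutCols
    val layoutRow = segmentIndex / layoutCols
    // Pixel coordinates relative to the segment's top-left corner.
    val tileCol = col - layoutCol * tileCols
    val tileRow = row - layoutRow * tileRows
    tileRow * tileCols + tileCol
  }
}
```

In a layout that is 4 tiles wide with 256x256 tiles, segment 9 sits at layout column 1, layout row 2. Pixel (col 300, row 520) is therefore at tile column 44, tile row 8, which gives index 8 * 256 + 44 = 2092.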

5.3 GeoTiffSegmentLayoutTransform

The GeoTiffSegmentLayoutTransform trait combines the SegmentTransform trait with the GeoTiffSegmentLayout class:

trait GeoTiffSegmentLayoutTransform {
  private [geotrellis] def segmentLayout: GeoTiffSegmentLayout
  // A lazy val combined with an extractor pattern pulls out the layout parameters once segmentLayout is set
  // (note: the fourth binding, named isTiled here, actually receives storageMethod)
  private lazy val GeoTiffSegmentLayout(totalCols, totalRows, tileLayout, isTiled, interleaveMethod) =
    segmentLayout

  // Get the index of the Segment tile
  private [geotiff] def getSegmentIndex(col: Int, row: Int): Int =
    segmentLayout.getSegmentIndex(col, row)

  // Get the transform for the Segment tile with the given index
  // (bandCount and bandSegmentCount are declared in the omitted part of the trait)
  private [geotiff] def getSegmentTransform(segmentIndex: Int): SegmentTransform = {
    val id = segmentIndex % bandSegmentCount
    if (segmentLayout.isStriped)
      StripedSegmentTransform(id, GeoTiffSegmentLayoutTransform(segmentLayout, bandCount))
    else
      TiledSegmentTransform(id, GeoTiffSegmentLayoutTransform(segmentLayout, bandCount))
  }
}

object GeoTiffSegmentLayoutTransform {
  def apply(_segmentLayout: GeoTiffSegmentLayout, _bandCount: Int): GeoTiffSegmentLayoutTransform =
    new GeoTiffSegmentLayoutTransform {
      val segmentLayout = _segmentLayout
      val bandCount = _bandCount
    }
}

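The class inheritance diagram shows that the GeoTiffTile class implements the GeoTiffSegmentLayoutTransform trait, which gives GeoTiffTile the ability to read a value of the concrete type at a given column/row: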

def get(col: Int, row: Int): Int = {
    // Index of the Segment tile that contains the position (from GeoTiffSegmentLayout)
    val segmentIndex = getSegmentIndex(col, row)
    // Index of the position within that Segment tile (from GeoTiffSegmentLayoutTransform and SegmentTransform)
    val i = getSegmentTransform(segmentIndex).gridToIndex(col, row)
    // Fetch the value at that exact position (from SegmentBytes and GeoTiffSegment)
    getSegment(segmentIndex).getInt(i)
}

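6. Summary

By analyzing the class inheritance diagram, we divided the traits into two groups:

Segment-related
Macro-related

We focused on the Segment-related traits and introduced the Segment model. The Segment model provides two main capabilities:

Locating the exact position of data
Reading raw Byte data and converting it to the actual data type

The core concept is the Segment. What is a Segment? It is a logical concept:

In the tiled layout, one tile is one Segment
In the striped layout, one strip is one Segment
When an exact position is computed from a column/row, the Tile being operated on is also a Segment

The Segment model covers the whole path from reading to access.

Macros also play a big part in reading and converting data. The next article analyzes the role the macro model plays in Geotrellis.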
