GeoSpark Spatial Indexes
- GeoSpark provides two kinds of spatial index: Quad-Tree and R-Tree.
- As in the previous section, we first initialize a SparkContext and then use GeoSpark's ShapefileReader to load our shapefile.
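```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.datasyslab.geospark.formatMapper.shapefileParser.ShapefileReader;
import org.datasyslab.geospark.spatialRDD.SpatialRDD;
import com.vividsolutions.jts.geom.Geometry; // org.locationtech.jts.geom in GeoSpark 1.3+

SparkConf conf = new SparkConf();
conf.setAppName("GeoSpark02");
conf.setMaster("local[*]");
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
conf.set("spark.kryo.registrator", "org.datasyslab.geospark.serde.GeoSparkKryoRegistrator");
JavaSparkContext sc = new JavaSparkContext(conf);
// Get SpatialRDD (Learn02 is the class holding this code; /parks is a shapefile folder on the classpath)
String shapeInputLocation = Learn02.class.getResource("/parks").toString();
SpatialRDD<Geometry> rdd = ShapefileReader.readToGeometryRDD(sc, shapeInputLocation);
```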
Building an index
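```java
import org.datasyslab.geospark.enums.IndexType;

// Build the index
// Set to true only for join queries: the index is then built on the spatially partitioned RDD
boolean buildOnSpatialPartitionedRDD = false;
rdd.buildIndex(IndexType.QUADTREE, buildOnSpatialPartitionedRDD);
```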
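Inside each partition, buildIndex arranges the geometries in a tree so that a query only visits branches whose bounding boxes overlap the search area. The following minimal Quad-Tree sketch in plain Java illustrates the idea; it is not GeoSpark's implementation. It indexes bare points rather than geometry envelopes, and the class name QuadTreeSketch, the leaf capacity of 4 and the depth limit of 16 are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified point quadtree; GeoSpark's own Quad-Tree indexes geometry envelopes
class QuadTreeSketch {
    static final int CAPACITY = 4;   // max points in a leaf before it splits (assumed value)
    static final int MAX_DEPTH = 16; // stop splitting here so duplicate points cannot recurse forever

    final double minX, minY, maxX, maxY; // bounds of this node
    final int depth;
    final List<double[]> points = new ArrayList<>();
    QuadTreeSketch[] children; // null while this node is a leaf

    QuadTreeSketch(double minX, double minY, double maxX, double maxY) {
        this(minX, minY, maxX, maxY, 0);
    }

    private QuadTreeSketch(double minX, double minY, double maxX, double maxY, int depth) {
        this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY; this.depth = depth;
    }

    boolean contains(double x, double y) {
        return x >= minX && x <= maxX && y >= minY && y <= maxY;
    }

    // Insert a point; returns false if it lies outside this node's bounds
    boolean insert(double x, double y) {
        if (!contains(x, y)) return false;
        if (children == null) {
            if (points.size() < CAPACITY || depth >= MAX_DEPTH) {
                points.add(new double[]{x, y});
                return true;
            }
            split();
        }
        for (QuadTreeSketch child : children) {
            if (child.insert(x, y)) return true;
        }
        return false; // not reached: the four children cover this node's bounds
    }

    // Divide the node into four quadrants and move its points into them
    private void split() {
        double midX = (minX + maxX) / 2, midY = (minY + maxY) / 2;
        int d = depth + 1;
        children = new QuadTreeSketch[]{
            new QuadTreeSketch(minX, minY, midX, midY, d), new QuadTreeSketch(midX, minY, maxX, midY, d),
            new QuadTreeSketch(minX, midY, midX, maxY, d), new QuadTreeSketch(midX, midY, maxX, maxY, d)};
        for (double[] p : points) {
            for (QuadTreeSketch child : children) {
                if (child.insert(p[0], p[1])) break;
            }
        }
        points.clear();
    }

    // Collect points inside the query rectangle, skipping quadrants that cannot contain any
    void query(double qMinX, double qMinY, double qMaxX, double qMaxY, List<double[]> out) {
        if (qMinX > maxX || qMaxX < minX || qMinY > maxY || qMaxY < minY) return; // no overlap
        for (double[] p : points) {
            if (p[0] >= qMinX && p[0] <= qMaxX && p[1] >= qMinY && p[1] <= qMaxY) out.add(p);
        }
        if (children != null) {
            for (QuadTreeSketch child : children) child.query(qMinX, qMinY, qMaxX, qMaxY, out);
        }
    }

    public static void main(String[] args) {
        QuadTreeSketch tree = new QuadTreeSketch(0, 0, 100, 100);
        for (int i = 0; i < 100; i++) tree.insert(i, (i * 37) % 100);
        List<double[]> hits = new ArrayList<>();
        tree.query(10, 10, 30, 30, hits);
        System.out.println(hits.size() + " points in [10,30] x [10,30]");
    }
}
```

Each split divides a node into four equal quadrants, so dense areas get deeper subtrees while sparse areas stay shallow, and a range query prunes every quadrant that does not overlap the window.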
Querying with the index
```java
import com.vividsolutions.jts.geom.*; // org.locationtech.jts.geom in GeoSpark 1.3+
import org.apache.spark.api.java.JavaRDD;
import org.datasyslab.geospark.spatialOperator.RangeQuery;

// Range query
GeometryFactory geometryFactory = new GeometryFactory();
Coordinate[] coordinates = new Coordinate[5];
coordinates[0] = new Coordinate(-123.1, 49.2);
coordinates[1] = new Coordinate(-123.1, 49.3);
coordinates[2] = new Coordinate(-123.2, 49.3);
coordinates[3] = new Coordinate(-123.2, 49.2);
coordinates[4] = coordinates[0]; // The last coordinate is the same as the first one in order to close the ring
Polygon polygonObject = geometryFactory.createPolygon(coordinates);
boolean usingIndex = true; // use the index
// Third argument (considerBoundaryIntersection) = false: only return geometries fully covered by the window
JavaRDD<Geometry> queryResult = RangeQuery.SpatialRangeQuery(rdd, polygonObject, false, usingIndex);
System.out.println(String.format("Total number of results: %d", queryResult.count()));
```
Total number of results: 62
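The third argument of SpatialRangeQuery (false above) is considerBoundaryIntersection. According to the GeoSpark documentation, false returns only geometries fully covered by the query window, while true also returns geometries that merely intersect it. The sketch below shows that difference on axis-aligned boxes only; the RangeFilterSketch class and the sample boxes are hypothetical, and GeoSpark evaluates the real geometries with JTS predicates rather than this envelope arithmetic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of considerBoundaryIntersection on axis-aligned boxes (hypothetical class)
class RangeFilterSketch {
    static final class Box {
        final String name;
        final double minX, minY, maxX, maxY;

        Box(String name, double minX, double minY, double maxX, double maxY) {
            this.name = name; this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }

        // Entirely inside the other box (touching its edge still counts as inside)
        boolean coveredBy(Box o) {
            return minX >= o.minX && maxX <= o.maxX && minY >= o.minY && maxY <= o.maxY;
        }

        // Shares at least one point with the other box
        boolean intersects(Box o) {
            return minX <= o.maxX && maxX >= o.minX && minY <= o.maxY && maxY >= o.minY;
        }
    }

    // false: keep boxes fully covered by the window; true: keep every box that touches it
    static List<String> rangeQuery(List<Box> data, Box window, boolean considerBoundaryIntersection) {
        List<String> names = new ArrayList<>();
        for (Box b : data) {
            boolean hit = considerBoundaryIntersection ? b.intersects(window) : b.coveredBy(window);
            if (hit) names.add(b.name);
        }
        return names;
    }

    public static void main(String[] args) {
        Box window = new Box("window", -123.2, 49.2, -123.1, 49.3); // a window like the one used above
        List<Box> parks = Arrays.asList(
            new Box("inside", -123.16, 49.26, -123.15, 49.27),     // fully inside the window
            new Box("straddling", -123.11, 49.25, -123.09, 49.26), // crosses the window's east edge
            new Box("outside", -123.05, 49.25, -123.04, 49.26));   // no overlap at all
        System.out.println(rangeQuery(parks, window, false)); // prints [inside]
        System.out.println(rangeQuery(parks, window, true));  // prints [inside, straddling]
    }
}
```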
3. Printing the query results
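```java
import org.apache.spark.api.java.function.VoidFunction;

// Iterate over the query results
// (println runs on the executors; in local[*] mode the output appears in the console)
queryResult.foreach(new VoidFunction<Geometry>() {
    @Override
    public void call(Geometry geometry) throws Exception {
        System.out.println(geometry);
    }
});
```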
POLYGON ((-123.15566057081632 49.26206733490204, -123.15564728017853 49.26241791476514, -123.15548939905344 49.262415429329856, -123.15550257747702 49.26206484963618, -123.15566057081632 49.26206733490204)) 1 -9999 Kitsilano N
POLYGON ((-123.15760176703519 49.261936547646954, -123.15718706338478 49.2619299178749, -123.15719832396375 49.26162160945501, -123.15761313807661 49.26162814910161, -123.15760218456263 49.26192530535148, -123.15760176703519 49.261936547646954)) 2 208 Rosemary Brown Park Kitsilano W 11th Avenue Vine Street N N N
.................................
POLYGON ((-123.12325003271694 49.290529597005786, -123.12325184999034
POLYGON ((-123.11921795166444 49.288179012132034, -123.11889234917355 49.28806261407178, -123.11905901714364 49.28781953241384, -123.11954592548769 49.28796238352621, -123.11921795166444 49.288179012132034)) 80 27 Portal Park Downtown W Hastings Street Thurlow Street N N N
KNN nearest-neighbor query
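A K-nearest-neighbor (KNN) query returns the K geometries closest to a given point. In this example the query point is (-123.1, 49.2), and we look for the 5 parks closest to it.
Note: the Quad-Tree index does not support KNN queries, so build an R-Tree index when running a KNN query.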
```java
import java.util.List;
import org.datasyslab.geospark.spatialOperator.KNNQuery;

// K nearest neighbor (KNN) query
rdd.buildIndex(IndexType.RTREE, buildOnSpatialPartitionedRDD); // the Quad-Tree index does not support KNN
Point pointObject = geometryFactory.createPoint(new Coordinate(-123.1, 49.2));
int K = 5; // K nearest neighbors
List<Geometry> result = KNNQuery.SpatialKnnQuery(rdd, pointObject, K, usingIndex);
```
The five parks nearest to point (-123.1, 49.2) are:
1: MULTIPOLYGON... 3 141 Tea Swamp Park Mount Pleasant E 15th Avenue Sophia Street N N N
2: POLYGON ... 23 140 Robson Park Mount Pleasant Kingsway St. George Street N Y N
4: POLYGON... 18 138 Major Matthews Park Mount Pleasant W 11th Avenue Manitoba Street N N N
5: POLYGON ... 24 136 Guelph Park Mount Pleasant E 7th Avenue Brunswick Street N N N
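Conceptually, SpatialKnnQuery returns the K geometries with the smallest distance to the query point. The brute-force sketch below shows those semantics with a bounded max-heap over plain points; it is not how GeoSpark evaluates the query (with usingIndex = true it relies on the R-Tree built above), and the Place class and its coordinates are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Brute-force KNN over points (hypothetical sketch, no spatial index)
class KnnSketch {
    static final class Place {
        final String name;
        final double x, y;
        Place(String name, double x, double y) { this.name = name; this.x = x; this.y = y; }
    }

    // Squared planar distance (lon/lat treated as x/y): enough for ranking, no square root needed
    static double dist2(Place p, double qx, double qy) {
        double dx = p.x - qx, dy = p.y - qy;
        return dx * dx + dy * dy;
    }

    // Return the k places nearest to (qx, qy), nearest first
    static List<Place> knn(List<Place> places, double qx, double qy, int k) {
        Comparator<Place> byDistance = Comparator.comparingDouble(p -> dist2(p, qx, qy));
        // Max-heap: the head is the farthest of the current k candidates
        PriorityQueue<Place> heap = new PriorityQueue<>(byDistance.reversed());
        for (Place p : places) {
            heap.offer(p);
            if (heap.size() > k) heap.poll(); // evict the farthest candidate
        }
        List<Place> nearest = new ArrayList<>(heap);
        nearest.sort(byDistance);
        return nearest;
    }

    public static void main(String[] args) {
        List<Place> places = Arrays.asList(
            new Place("A", -123.10, 49.21), new Place("B", -123.15, 49.25),
            new Place("C", -123.095, 49.20), new Place("D", -123.30, 49.30));
        for (Place p : knn(places, -123.1, 49.2, 2)) System.out.println(p.name); // prints C, then A
    }
}
```

Keeping only K candidates in the heap bounds memory at O(K) no matter how many geometries are scanned.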
Spatial Join Query
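Spatial join query: creates a table join (similar to a JOIN in SQL) in which fields from one layer's attribute table are appended to another layer's attribute table based on the relative locations of the features in the two layers.
We now have two layers: parks (polygons) and some points located in the parks (points). We want to attach to each point the attributes of the park it lies in, which is exactly what a spatial join does.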
![](https://img.haomeiwen.com/i18085087/55931c44fdd52e67.png)
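For the point/park case, the join boils down to a point-in-polygon test for each (point, park) pair. The plain-Java sketch below uses the even-odd ray-casting rule inside a nested loop; GeoSpark instead prunes candidate pairs with spatial partitioning (and optionally an index) and evaluates the predicate with JTS, so treat the PointInPolygonJoinSketch class, the square parks and the point IDs as hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Nested-loop spatial join with a ray-casting point-in-polygon test (hypothetical sketch)
class PointInPolygonJoinSketch {
    // Even-odd rule: cast a ray to the right of (x, y) and count edge crossings.
    // Works whether or not the ring repeats its first vertex at the end.
    // Points exactly on an edge may land either way.
    static boolean contains(double[][] ring, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = ring.length - 1; i < ring.length; j = i++) {
            double xi = ring[i][0], yi = ring[i][1], xj = ring[j][0], yj = ring[j][1];
            if ((yi > y) != (yj > y) && x < (xj - xi) * (y - yi) / (yj - yi) + xi) {
                inside = !inside;
            }
        }
        return inside;
    }

    // point ID -> names of the parks containing it; points outside every park are dropped
    static Map<String, Set<String>> join(Map<String, double[][]> parks, Map<String, double[]> points) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, double[]> pt : points.entrySet()) {
            double[] p = pt.getValue();
            for (Map.Entry<String, double[][]> park : parks.entrySet()) {
                if (contains(park.getValue(), p[0], p[1])) {
                    result.computeIfAbsent(pt.getKey(), k -> new TreeSet<>()).add(park.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, double[][]> parks = new LinkedHashMap<>();
        parks.put("ParkA", new double[][]{{0, 0}, {10, 0}, {10, 10}, {0, 10}});
        parks.put("ParkB", new double[][]{{20, 0}, {30, 0}, {30, 10}, {20, 10}});
        Map<String, double[]> points = new LinkedHashMap<>();
        points.put("0", new double[]{5, 5});   // inside ParkA
        points.put("1", new double[]{25, 5});  // inside ParkB
        points.put("2", new double[]{15, 5});  // between the parks
        System.out.println(join(parks, points)); // prints {0=[ParkA], 1=[ParkB]}
    }
}
```

The nested loop costs points × parks tests, which is why the distributed version partitions both layers first so that only nearby pairs are compared.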
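```java
import java.util.HashSet;
import org.apache.spark.api.java.JavaPairRDD;
import org.datasyslab.geospark.enums.GridType;
import org.datasyslab.geospark.spatialOperator.JoinQuery;

// Spatial join query
SpatialRDD<Geometry> parkRdd = rdd; // the parks layer loaded at the beginning
shapeInputLocation = Learn02.class.getResource("/point").toString();
SpatialRDD<Geometry> pointRdd = ShapefileReader.readToGeometryRDD(sc, shapeInputLocation);
// analyze() computes each dataset's boundary envelope (and record count), which spatial partitioning needs
pointRdd.analyze();
parkRdd.analyze();
// Spatial partitioning is mandatory before a spatial join, and both RDDs must share the same partitioner
parkRdd.spatialPartitioning(GridType.KDBTREE);
pointRdd.spatialPartitioning(parkRdd.getPartitioner());
// The query windows are points, and a point can never fully cover a polygon,
// so the intersection test (true) must be used instead of the covering test
boolean considerBoundaryIntersection = true;
usingIndex = false; // to use an index here, build it after partitioning with buildIndex(..., true)
JavaPairRDD<Geometry, HashSet<Geometry>> joinResult = JoinQuery.SpatialJoinQuery(parkRdd, pointRdd, usingIndex, considerBoundaryIntersection);
System.out.println("Spatial join results:");
joinResult.foreach(kv -> {
    // key: a point; value: the set of parks that intersect it
    System.out.println(String.format("{%s}--{%s}", kv._1.getUserData().toString(), kv._2.toString()));
});
```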
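spatialPartitioning is what lets the join run in parallel: both layers are cut with the same partitioner, so a point and the park containing it always meet in a common partition. GeoSpark's KDBTREE partitioner adapts its cells to the data distribution; the sketch below swaps in a much simpler uniform grid to show the idea, and every name in it is hypothetical. An envelope that overlaps several cells is sent to each of them, so a join over such partitions must take care not to report the same pair twice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical uniform-grid partitioner; GeoSpark's KDBTREE builds data-adaptive cells instead
class GridPartitionSketch {
    final double minX, minY, maxX, maxY; // dataset boundary, like the envelope analyze() computes
    final int cols, rows;

    GridPartitionSketch(double minX, double minY, double maxX, double maxY, int cols, int rows) {
        this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        this.cols = cols; this.rows = rows;
    }

    // Map a coordinate to a cell index, clamping values on or beyond the boundary
    private static int cell(double v, double min, double max, int n) {
        int i = (int) Math.floor((v - min) / (max - min) * n);
        return Math.max(0, Math.min(n - 1, i));
    }

    // IDs of every grid cell an envelope overlaps; the geometry is sent to each of them
    List<Integer> partitionsFor(double eMinX, double eMinY, double eMaxX, double eMaxY) {
        int c0 = cell(eMinX, minX, maxX, cols), c1 = cell(eMaxX, minX, maxX, cols);
        int r0 = cell(eMinY, minY, maxY, rows), r1 = cell(eMaxY, minY, maxY, rows);
        List<Integer> ids = new ArrayList<>();
        for (int r = r0; r <= r1; r++) {
            for (int c = c0; c <= c1; c++) ids.add(r * cols + c);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Both layers use the SAME partitioner, just as pointRdd reuses parkRdd.getPartitioner()
        GridPartitionSketch grid = new GridPartitionSketch(0, 0, 100, 100, 4, 4);
        System.out.println(grid.partitionsFor(20, 20, 30, 30)); // a park envelope: prints [0, 1, 4, 5]
        System.out.println(grid.partitionsFor(28, 22, 28, 22)); // a point inside it: prints [1]
    }
}
```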
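SpatialJoinQuery returns a pair RDD that works like a Map. Each key comes from the second RDD passed to SpatialJoinQuery (the query-window RDD, here pointRdd), and each value is a Set whose elements come from the first RDD (here parkRdd). In our case keys and values correspond one-to-one, because every point lies in exactly one park. If we swap the arguments and call JoinQuery.SpatialJoinQuery(pointRdd, parkRdd, usingIndex, considerBoundaryIntersection), the relationship becomes one-to-many: a park can contain many points, while a point belongs to only one park (except points lying on a park boundary).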
{5}--{[POLYGON... 62 201 English Bay Beach Park West End Beach Avenue Bidwell Street N Y Y]}
{10}--{[POLYGON ... 48 110 Hadden Park Kitsilano Ogden Avenue Cypress Street N Y Y]}
{13}--{[POLYGON ... 76 24 Marina Square Downtown Bayshore Drive Denman Street N N Y]}
{14}--{[POLYGON... 77 11 Cardero Park Downtown Bayshore Drive Cardero Street N N Y]}
{3}--{[POLYGON ... 74 206 Stanley Park West End W Georgia Street Chilco Street N Y Y]}
{2}--{[POLYGON... 61 -9999 Sunset Beach Park West End Y ]}
{0}--{[POLYGON ... 73 -9999 Nelson Park West End Y ]}
{1}--{[POLYGON ... 74 206 Stanley Park West End W Georgia Street Chilco Street N Y Y]}
{15}--{[POLYGON ... 75 18 Devonian Harbour Park Downtown W Georgia Street Denman Street N N Y]}
{19}--{[POLYGON... 35 28 CRAB Park at Portside Downtown E Waterfront Road Main Street Y Y N]}
{8}--{[POLYGON ... 73 -9999 Nelson Park West End Y ]}
{6}--{[POLYGON ... 62 201 English Bay Beach Park West End Beach Avenue Bidwell Street N Y Y]}
{12}--{[POLYGON ... 79 -9999 Harbour Green Park Downtown N ]}
{11}--{[POLYGON ... 78 13 Coal Harbour Park Downtown W Hastings Street Broughton Street N N N]}
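The output above is one-to-one: every point ID maps to a single park. Calling SpatialJoinQuery with the two RDDs swapped, as described earlier, yields the one-to-many view instead. The same regrouping can be sketched with plain Java maps on a small, already collected result; the invert helper is hypothetical and only illustrates the shape of the data, since on a real RDD you would swap the arguments rather than collect and invert.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Regroup a point -> parks result into park -> points (hypothetical helper)
class InvertJoinSketch {
    static Map<String, Set<Integer>> invert(Map<Integer, Set<String>> pointToParks) {
        Map<String, Set<Integer>> parkToPoints = new TreeMap<>();
        for (Map.Entry<Integer, Set<String>> e : pointToParks.entrySet()) {
            for (String park : e.getValue()) {
                parkToPoints.computeIfAbsent(park, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return parkToPoints;
    }

    public static void main(String[] args) {
        // A few point -> park pairs copied from the join output above
        Map<Integer, Set<String>> joined = new TreeMap<>();
        joined.put(5, Collections.singleton("English Bay Beach Park"));
        joined.put(6, Collections.singleton("English Bay Beach Park"));
        joined.put(0, Collections.singleton("Nelson Park"));
        joined.put(8, Collections.singleton("Nelson Park"));
        joined.put(1, Collections.singleton("Stanley Park"));
        joined.put(3, Collections.singleton("Stanley Park"));
        joined.put(10, Collections.singleton("Hadden Park"));
        // prints {English Bay Beach Park=[5, 6], Hadden Park=[10], Nelson Park=[0, 8], Stanley Park=[1, 3]}
        System.out.println(invert(joined));
    }
}
```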