Geospark加载PostgreSQL数据库
PostGreSQL以及Mysql均有空间引擎扩展,且GeoSpark是针对大数据的空间分析,而SHP、GeoJson等只是小数据集的,PostGIS就能胜任其分析任务。所以对于大数据集的,我们还是要用空间关系数据库存储,利用Spark SQL从数据库中加载数据,获得DataFrame,然后利用Geospark转为几何弹性数据集RDD。
关于PostgreSQL和PostGIS的介绍,后续会有更新,今天还是以Geospark为核心,来介绍如何用GeoSpark加载我们的数据库中的空间数据。
假设我们已经安装了PostGreSQL以及PostGIS扩展,并将我们这里的公园数据(park)导入到PostGreSQL中。
![](https://img.haomeiwen.com/i18085087/21fc87e0e2496c83.png)
这里我的Postgresq的连接参数为:
url=192.168.10.174:5432
table=prks
user=postgres
password=root
- 首先我们需要加入postgresql的jdbc依赖\
<!-- https://mvnrepository.com/artifact/org.postgresql/postgresql -->
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<version>42.2.5</version>
</dependency>
- 还是和以前一样,首先初始化Spark,并注册GeosparkSQL
SparkSession spark = SparkSession.builder().
config("spark.serializer","org.apache.spark.serializer.KryoSerializer").
config("spark.kryo.registrator", "org.datasyslab.geospark.serde.GeoSparkKryoRegistrator").
master("local[*]").appName("Learn07").getOrCreate();
GeoSparkSQLRegistrator.registerAll(spark);
GeoSparkVizRegistrator.registerAll(spark);
- 首先利用SparkSQL读取我们导入到postgresql中的数据,在postgis中有一个geom字段,他里面存放的是wkb格式的几何信息,对于sparksql来说,geom字段就是一个varchar字段,我们还需要使用geospark的相关函数来将其转为Geometry。
String url = "jdbc:postgresql://192.168.10.174:5432/geospark";
String table = "parks";
Properties connectionProperties = new Properties();
connectionProperties.put("user","postgres");
connectionProperties.put("password","root");
connectionProperties.put("driver","org.postgresql.Driver");
Dataset<Row> df = spark.read().jdbc(url, table, connectionProperties);
df.createOrReplaceTempView("parks");
df.show();
+---+------+---------+--------------------+--------------+---------------+---------------+-------+---------+----------+--------------------+
|gid|parkid|refparkid| parkname| neighborho| ewstreet| nsstreet|dogpark|washrooms|specialfea| geom|
+---+------+---------+--------------------+--------------+---------------+---------------+-------+---------+----------+--------------------+
| 1| 1| -9999| null| Kitsilano| null| null| N| null| null|01060000000100000...|
| 2| 2| 208| Rosemary Brown Park| Kitsilano| W 11th Avenue| Vine Street| N| N| N|01060000000100000...|
| 3| 3| 141| Tea Swamp Park|Mount Pleasant| E 15th Avenue| Sophia Street| N| N| N|01060000000200000...|
| 4| 4| -9999| null| Strathcona| null| null| N| null| null|01060000000200000...|
| 5| 5| 202| Morton Park| West End| Morton Avenue| Denman Street| N| N| N|01060000000200000...|
| 6| 6| -9999| Mcbride Park| Kitsilano| null| null| N| null| null|01060000000200000...|
| 7| 7| -9999| Granville Park| Fairview| null| null| N| null| null|01060000000200000...|
| 8| 8| -9999| null|Mount Pleasant| null| null| N| null| null|01060000000300000...|
| 9| 9| 15| Creekside Park|Mount Pleasant|Terminal Avenue| Quebec Street| N| N| Y|01060000000200000...|
| 10| 10| 134|China Creek South...|Mount Pleasant| E 10th Avenue| Clark Drive| N| N| N|01060000000200000...|
- 最开始说过,从postgresql中读取的geom列为wkb格式的Geometry,因此需要利用Geospark将其转为Geometry。
// 在将DataFrame转为RDD的时候,Geospark默认第一列为Geometry
String sql = "select ST_GeomFromWKB(geom) as geom, parkname, parkid from parks";
df = spark.sql(sql);
df.show();
+--------------------+--------------------+------+
| geom| parkname|parkid|
+--------------------+--------------------+------+
|POLYGON ((-123.15...| null| 1|
|POLYGON ((-123.15...| Rosemary Brown Park| 2|
|MULTIPOLYGON (((-...| Tea Swamp Park| 3|
|MULTIPOLYGON (((-...| null| 4|
|MULTIPOLYGON (((-...| Morton Park| 5|
|MULTIPOLYGON (((-...| Mcbride Park| 6|
|MULTIPOLYGON (((-...| Granville Park| 7|
|MULTIPOLYGON (((-...| null| 8|
|MULTIPOLYGON (((-...| Creekside Park| 9|
|MULTIPOLYGON (((-...|China Creek South...| 10|
- Geospark提供了Adapter静态类,可以在DataFrame和RDD之间进行转换操作。
JavaRDD<Geometry> rdd = Adapter.toJavaRdd(df);
rdd.foreach((geometry -> {
System.out.println(geometry);
}));
POLYGON ((-123.15566057081632 49.26206733490204, -123.15564728017853 49.26241791476514, -123.15548939905344 49.262415429329856, -123.15550257747702 49.26206484963618, -123.15566057081632 49.26206733490204)) null 1
POLYGON ((-123.15760176703519 49.261936547646954, -123.15718706338478 49.2619299178749, -123.15719832396375 49.26162160945501, -123.15761313807661 49.26162814910161, -123.15760218456263 49.26192530535148, -123.15760176703519 49.261936547646954)) Rosemary Brown Park 2
MULTIPOLYGON (((-123.09870507685639 49.25665872626679, -123.09884321830909 49.25666085651373, -123.09898135977342 49.25666298659548, -123.09918885538166 49.25666622613204, -123.09917717616912 49.25700195924662, -123.09897035961777 49.256998988908855, -123.09883221757528 49.25699703868207, -123.0986933956194 49.25699508887092, -123.09855525341915 49.256993048379265, -123.09856693541536 49.25665659585471, -123.09870507685639 49.25665872626679)), ((-123.0988872811734 49.257388923290726, -123.09874913822517 49.257387062899134, -123.09854232044736 49.257384271663746, -123.09855332287414 49.257047909676935, -123.09876081971025 49.257050970113596, -123.09889896208026 49.25705301035933, -123.0988872811734 49.257388923290726))) Tea Swamp Park 3
MULTIPOLYGON (((-123.08789868681455 49.27697178369337, -123.08779279965931 49.276970245390025, -123.08768827290808 49.27696870595726, -123.0875838595297 49.276967166344015, -123.08747933279125 49.27696562672224, -123.08749381859947 49.2766309730507, -123.0875983449497 49.27663269252672, -123.08770287146653 49.27663450184158, -123.08780728446555 49.27663622121462, -123.08791317138568 49.276638029305445, -123.08789868681455 49.27697178369337)), ((-123.08777557719297 49.27735877134432, -123.08767116251035 49.277356962010295, -123.08756663446789 49.27735515266783, -123.08746210643292 49.2773533432308, -123.08747670583442 49.27701868949804, -123.08758111979606 49.27702049900829, -123.08768564713182 49.277022308338154, -123.0877915350338 49.27702420647329, -123.08777557719297 49.27735877134432))) null 4
MULTIPOLYGON (((-123.14231536153594 49.288085334402886, -123.14152321088636 49.28757269767203, -123.14181135642984 49.28738168560991, -123.1422763290905 49.287682390935885, -123.14231536153594 49.288085334402886)), ((-123.14225003608222 49.28741127411166, -123.14200556426914 49.28725275241468, -123.14221142848969 49.287116250139206, -123.1422217546953 49.287118935432616, -123.14225003608222 49.28741127411166))) Morton Park 5
MULTIPOLYGON (((-123.1815623274226 49.26844720942817, -123.17990316962349 49.26842093595351, -123.17992703362962 49.2677831787414, -123.18158548965377 49.26780936303689, -123.1815623274226 49.26844720942817)), ((-123.17948874889649 49.26841438577292, -123.17783027685374 49.26838907073858, -123.17785348555584 49.267750775367986, -123.17951250489656 49.267776628815625, -123.17948874889649 49.26841438577292))) Mcbride Park 6
MULTIPOLYGON (((-123.14133650443698 49.25886332633381, -123.14136437634508 49.25810730827778, -123.14199539077381 49.25811759634792, -123.14262005884645 49.25812770893823, -123.14262638528903 49.25796357183548, -123.14346603622347 49.25797737135294, -123.14345971282438 49.25814159843223, -123.14343107963043 49.25889761794439, -123.14196594195184 49.25887361649547, -123.14133650443698 49.25886332633381)), ((-123.14578559773607 49.25853754368109, -123.14372265753794 49.25850460441758, -123.1437353757422 49.25816230034241, -123.14579830035647 49.25819478972661, -123.14578559773607 49.25853754368109))) Granville Park 7
MULTIPOLYGON (((-123.08896518268944 49.263513027273426, -123.08895956291931 49.263854420100515, -123.08824281220937 49.263848674370315, -123.08824279938008 49.26384147968065, -123.0882483235879 49.26350728166151, -123.08896518268944 49.263513027273426)), ((-123.08945644895253 49.26424084147778, -123.08923033699166 49.26423876797776, -123.08923596790757 49.26390456985079, -123.08946207752874 49.263906193672696, -123.08945644895253 49.26424084147778)), ((-123.0889541299678 49.264236732613874, -123.08876100016299 49.264235172400404, -123.08876652075031 49.26390088444774, -123.08876650784482 49.26389368975824, -123.08895895645973 49.26389534042054, -123.0889541299678 49.264236732613874))) null 8
网友评论