GeoSpark is a distributed geospatial computing engine built on Spark. Compared with traditional ArcGIS, GeoSpark delivers higher-performance spatial analysis and query services.
Prerequisites
- Ubuntu 18.04
- IDEA
- GeoSpark supports both Java and Scala; this walkthrough uses Java.
Installing JDK 8
- Download JDK 8: https://download.oracle.com/otn/java/jdk/8u211-b12/478a62b7d4e34b78b671c754eaaf38ab/jdk-8u211-linux-x64.tar.gz (note: Oracle now requires an account to download)
- After downloading, extract it and copy it to /opt, then add the following environment variables to ~/.bashrc:

```bash
export JAVA_HOME=/opt/jdk1.8.0_172  # change this to your JDK directory name
export PATH=${JAVA_HOME}/bin:$PATH
export CLASSPATH=.:/opt/jdk1.8.0_172/lib:/opt/jdk1.8.0_172/lib/dt.jar:/opt/jdk1.8.0_172/lib/tools.jar  # CLASSPATH should no longer be needed on JDK 8+, but it is added here just in case
```
Configuring Scala
- Download Scala 2.12.8: https://downloads.lightbend.com/scala/2.12.8/scala-2.12.8.tgz
- After downloading, extract it and copy it to /opt, then add the following environment variables to ~/.bashrc:

```bash
export SCALA_HOME=/opt/scala-2.12.8
export PATH=${SCALA_HOME}/bin:$PATH
```
- Then run `source ~/.bashrc`.
- Run `scala -version`; if you see output like the following, the installation succeeded:

```
Scala code runner version 2.12.8 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
```
Standalone Spark Setup
- This sets up standalone Spark on a single machine; no cluster and no Hadoop deployment are required.
- Download Spark 2.4.3: https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.6.tgz
- After downloading, extract it and copy it to your home directory /home/{user}, then add the following environment variables to ~/.bashrc:

```bash
export SPARK_HOME=/home/hwang/spark-2.4.3-bin-hadoop2.6
export SPARK_LOCAL_IP="127.0.0.1"
export PATH=${SPARK_HOME}/bin:$PATH
```
- Then run `spark-shell`; if you see output like the following, the installation succeeded:

```
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1559006613213).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_172)

scala>
```
GeoSpark
- Open IDEA, create a new Maven project, and edit pom.xml:

```xml
<properties>
    <scala.version>2.11</scala.version>
    <geospark.version>1.2.0</geospark.version>
    <spark.compatible.version>2.3</spark.compatible.version>
    <spark.version>2.4.3</spark.version>
    <hadoop.version>2.7.2</hadoop.version>
    <dependency.scope>compile</dependency.scope> <!-- scope for the Spark/Hadoop dependencies -->
</properties>

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.0</version>
    </dependency>
    <dependency>
        <groupId>org.datasyslab</groupId>
        <artifactId>geospark</artifactId>
        <version>${geospark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.datasyslab</groupId>
        <artifactId>geospark-sql_${spark.compatible.version}</artifactId>
        <version>${geospark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.datasyslab</groupId>
        <artifactId>geospark-viz_${spark.compatible.version}</artifactId>
        <version>${geospark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.datasyslab</groupId>
        <artifactId>sernetcdf</artifactId>
        <version>0.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${dependency.scope}</scope>
        <exclusions>
            <exclusion>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>*</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${dependency.scope}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>${hadoop.version}</version>
        <scope>${dependency.scope}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
        <scope>${dependency.scope}</scope>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.8.0</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
    </plugins>
</build>
```
- We create a Spark RDD from a CSV file with the following contents:

```
-88.331492,32.324142,hotel
-88.175933,32.360763,gas
-88.388954,32.357073,bar
-88.221102,32.35078,restaurant
```
Then we initialize a SparkContext and use GeoSpark's PointRDD to load the CSV:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.datasyslab.geospark.enums.FileDataSplitter;
import org.datasyslab.geospark.spatialRDD.PointRDD;

SparkConf conf = new SparkConf();
conf.setAppName("GeoSpark01");
conf.setMaster("local[*]");
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
conf.set("spark.kryo.registrator", "org.datasyslab.geospark.serde.GeoSparkKryoRegistrator");
JavaSparkContext sc = new JavaSparkContext(conf);

String pointRDDInputLocation = Learn01.class.getResource("checkin.csv").toString();
Integer pointRDDOffset = 0;               // the geometry (longitude, latitude) starts at column 0
FileDataSplitter pointRDDSplitter = FileDataSplitter.CSV;
Boolean carryOtherAttributes = true;      // also carry the non-spatial attribute (e.g. "hotel")
PointRDD rdd = new PointRDD(sc, pointRDDInputLocation, pointRDDOffset, pointRDDSplitter, carryOtherAttributes);
```
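With the points loaded, a quick way to exercise the RDD is a spatial range query via GeoSpark's RangeQuery operator. The sketch below is not from the original post: the envelope bounds are illustrative values chosen to cover the four sample points, and `SpatialRangeQuery` throws a checked Exception, so the enclosing method must declare `throws Exception`:

```java
import com.vividsolutions.jts.geom.Envelope;
import com.vividsolutions.jts.geom.Point;
import org.apache.spark.api.java.JavaRDD;
import org.datasyslab.geospark.spatialOperator.RangeQuery;

// Query window (minX, maxX, minY, maxY) covering the sample data; illustrative bounds
Envelope queryWindow = new Envelope(-88.5, -88.0, 32.3, 32.4);
boolean considerBoundaryIntersection = false; // only return points strictly inside the window
boolean useIndex = false;                     // no spatial index has been built
JavaRDD<Point> result = RangeQuery.SpatialRangeQuery(rdd, queryWindow, considerBoundaryIntersection, useIndex);
System.out.println("Points in window: " + result.count()); // should print 4 for the sample CSV
```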
- Coordinate system transformation
- GeoSpark uses the EPSG standard for coordinate reference systems; the available systems are listed on the EPSG site: https://epsg.io/

```java
// Coordinate system transformation
String sourceCrsCode = "epsg:4326";
String targetCrsCode = "epsg:3857";
rdd.CRSTransform(sourceCrsCode, targetCrsCode);
```
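To sanity-check the transform, you can print a few raw points. This is a minimal sketch assuming the `rdd` from above; after converting to epsg:3857 the printed coordinates should be projected values in meters rather than degrees:

```java
// rawSpatialRDD holds the underlying JavaRDD<Point>; after CRSTransform
// the point coordinates are in the target CRS (meters for EPSG:3857).
rdd.rawSpatialRDD.take(3).forEach(System.out::println);
```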