Publication¶
Apache Sedona was formerly called GeoSpark and was initiated by the Data Systems Lab at Arizona State University.
Key publications¶
"Spatial Data Management in Apache Spark: The GeoSpark Perspective and Beyond" is the full research paper that describes the entire GeoSpark ecosystem. Please cite this paper if your work mentions the GeoSpark core system.
"GeoSparkViz: A Scalable Geospatial Data Visualization Framework in the Apache Spark Ecosystem" is the full research paper that describes the map visualization system in GeoSpark. Please cite this paper if your work mentions the GeoSpark visualization system.
"Building A Microscopic Road Network Traffic Simulator in Apache Spark" is the full research paper that describes the traffic simulator in GeoSpark. Please cite this paper if your work mentions the GeoSparkSim traffic simulator.
Third-party evaluation¶
GeoSpark has been evaluated in papers published at top database venues. Note that we have no collaboration with the authors of these papers.
- SIGMOD 2020 paper "Architecting a Query Compiler for Spatial Workloads", Ruby Y. Tahboub, Tiark Rompf (Purdue University).
In Figure 16a, the GeoSpark distance join query runs around 7x to 9x faster than Simba, a spatial extension of Spark, on machines with 1 to 24 cores.
- PVLDB 2018 paper "How Good Are Modern Spatial Analytics Systems?" Varun Pandey, Andreas Kipf, Thomas Neumann, Alfons Kemper (Technical University of Munich), quoted as follows:
"GeoSpark comes close to a complete spatial analytics system. It also exhibits the best performance in most cases."
Full publications¶
GeoSpark Ecosystem¶
"Spatial Data Management in Apache Spark: The GeoSpark Perspective and Beyond" (research paper). Jia Yu, Zongsi Zhang, Mohamed Sarwat. Geoinformatica Journal 2019.
"A Demonstration of GeoSpark: A Cluster Computing Framework for Processing Big Spatial Data" (demo paper). Jia Yu, Jinxuan Wu, Mohamed Sarwat. In Proceedings of the IEEE International Conference on Data Engineering, ICDE 2016, Helsinki, Finland, May 2016.
"GeoSpark: A Cluster Computing Framework for Processing Large-Scale Spatial Data" (short paper). Jia Yu, Jinxuan Wu, Mohamed Sarwat. In Proceedings of the ACM International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2015, Seattle, WA, USA, November 2015.
GeoSparkViz Visualization System¶
"GeoSparkViz in Action: A Data System with built-in support for Geospatial Visualization" (demo paper). Jia Yu, Anique Tahir, Mohamed Sarwat. In Proceedings of the International Conference on Data Engineering, ICDE, 2019.
"GeoSparkViz: A Scalable Geospatial Data Visualization Framework in the Apache Spark Ecosystem" (research paper). Jia Yu, Zongsi Zhang, Mohamed Sarwat. In Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM 2018, Bolzano-Bozen, Italy, July 2018.
GeoSparkSim Traffic Simulator¶
"Dissecting GeoSparkSim: a scalable microscopic road network traffic simulator in Apache Spark" (journal paper). Jia Yu, Zishan Fu, Mohamed Sarwat. Distributed and Parallel Databases 38(4): 963-994 (2020).
"Demonstrating GeoSparkSim: A Scalable Microscopic Road Network Traffic Simulator Based on Apache Spark". Zishan Fu, Jia Yu, Mohamed Sarwat. International Symposium on Spatial and Temporal Databases, SSTD, 2019.
"Building A Microscopic Road Network Traffic Simulator in Apache Spark" (research paper). Zishan Fu, Jia Yu, Mohamed Sarwat. In Proceedings of the International Conference on Mobile Data Management, MDM, 2019.
A Tutorial about Geospatial Data Management in Spark¶
"Geospatial Data Management in Apache Spark: A Tutorial" (tutorial). Jia Yu, Mohamed Sarwat. In Proceedings of the International Conference on Data Engineering, ICDE, 2019.