Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
Understand Overture Map data¶
from sedona.spark import *
import os
import time
import gresearch.spark.parquet
import geopandas as gpd
Wherobots version¶
Wherobots only releases a version for overturemaps-us-west-2/release/2023-07-26-alpha.0/.
The data is in GeoParquet format, and records are clustered by spatial proximity so that spatial filters can be pushed down efficiently.
DATA_LINK = "s3a://wherobots-public-data/overturemaps-us-west-2/release/2023-07-26-alpha.0/"
OMF versions¶
The following files are official OMF releases. They are GeoParquet files generated by Apache Sedona.
However, unlike the Wherobots version, these files may not be laid out by spatial proximity, so spatial queries might be somewhat slower.
# DATA_LINK = "s3a://overturemaps-us-west-2/release/2023-11-14-alpha.0/"
# DATA_LINK = "s3a://overturemaps-us-west-2/release/2023-12-14-alpha.0/"
# DATA_LINK = "s3a://overturemaps-us-west-2/release/2024-01-17-alpha.0/"
Create Sedona Context¶
config = SedonaContext.builder() .\
    config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"). \
    config("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"). \
    config('spark.jars.packages',
           'org.apache.sedona:sedona-spark-3.4_2.12:1.5.1,'
           'org.datasyslab:geotools-wrapper:1.5.1-28.2,'
           'uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.4'). \
    config('spark.jars.repositories', 'https://artifacts.unidata.ucar.edu/repository/unidata-all'). \
    getOrCreate()
sedona = SedonaContext.create(config)
Warning: Ignoring non-Spark config property: fs.s3a.aws.credentials.provider https://artifacts.unidata.ucar.edu/repository/unidata-all added as a remote repository with the name: repo-1 Ivy Default Cache set to: /root/.ivy2/cache The jars for the packages stored in: /root/.ivy2/jars org.apache.sedona#sedona-spark-3.4_2.12 added as a dependency org.datasyslab#geotools-wrapper added as a dependency uk.co.gresearch.spark#spark-extension_2.12 added as a dependency :: resolving dependencies :: org.apache.spark#spark-submit-parent-c4d888cc-a352-48f0-9927-791ecebcce11;1.0 confs: [default]
:: loading settings :: url = jar:file:/opt/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
found org.apache.sedona#sedona-spark-3.4_2.12;1.5.1 in central found org.apache.sedona#sedona-common;1.5.1 in central found org.apache.commons#commons-math3;3.6.1 in central found org.locationtech.jts#jts-core;1.19.0 in central found org.wololo#jts2geojson;0.16.1 in central found org.locationtech.spatial4j#spatial4j;0.8 in central found com.google.geometry#s2-geometry;2.0.0 in central found com.google.guava#guava;25.1-jre in central found com.google.code.findbugs#jsr305;3.0.2 in central found org.checkerframework#checker-qual;2.0.0 in central found com.google.errorprone#error_prone_annotations;2.1.3 in central found com.google.j2objc#j2objc-annotations;1.1 in central found org.codehaus.mojo#animal-sniffer-annotations;1.14 in central found com.uber#h3;4.1.1 in central found net.sf.geographiclib#GeographicLib-Java;1.52 in central found com.github.ben-manes.caffeine#caffeine;2.9.2 in central found org.checkerframework#checker-qual;3.10.0 in central found com.google.errorprone#error_prone_annotations;2.5.1 in central found org.apache.sedona#sedona-spark-common-3.4_2.12;1.5.1 in central found commons-lang#commons-lang;2.6 in central found org.scala-lang.modules#scala-collection-compat_2.12;2.5.0 in central found org.beryx#awt-color-factory;1.0.0 in central found org.datasyslab#geotools-wrapper;1.5.1-28.2 in central found uk.co.gresearch.spark#spark-extension_2.12;2.11.0-3.4 in central found com.github.scopt#scopt_2.12;4.1.0 in central :: resolution report :: resolve 280ms :: artifacts dl 8ms :: modules in use: com.github.ben-manes.caffeine#caffeine;2.9.2 from central in [default] com.github.scopt#scopt_2.12;4.1.0 from central in [default] com.google.code.findbugs#jsr305;3.0.2 from central in [default] com.google.errorprone#error_prone_annotations;2.5.1 from central in [default] com.google.geometry#s2-geometry;2.0.0 from central in [default] com.google.guava#guava;25.1-jre from central in [default] com.google.j2objc#j2objc-annotations;1.1 from central in [default] 
com.uber#h3;4.1.1 from central in [default] commons-lang#commons-lang;2.6 from central in [default] net.sf.geographiclib#GeographicLib-Java;1.52 from central in [default] org.apache.commons#commons-math3;3.6.1 from central in [default] org.apache.sedona#sedona-common;1.5.1 from central in [default] org.apache.sedona#sedona-spark-3.4_2.12;1.5.1 from central in [default] org.apache.sedona#sedona-spark-common-3.4_2.12;1.5.1 from central in [default] org.beryx#awt-color-factory;1.0.0 from central in [default] org.checkerframework#checker-qual;3.10.0 from central in [default] org.codehaus.mojo#animal-sniffer-annotations;1.14 from central in [default] org.datasyslab#geotools-wrapper;1.5.1-28.2 from central in [default] org.locationtech.jts#jts-core;1.19.0 from central in [default] org.locationtech.spatial4j#spatial4j;0.8 from central in [default] org.scala-lang.modules#scala-collection-compat_2.12;2.5.0 from central in [default] org.wololo#jts2geojson;0.16.1 from central in [default] uk.co.gresearch.spark#spark-extension_2.12;2.11.0-3.4 from central in [default] :: evicted modules: org.checkerframework#checker-qual;2.0.0 by [org.checkerframework#checker-qual;3.10.0] in [default] com.google.errorprone#error_prone_annotations;2.1.3 by [com.google.errorprone#error_prone_annotations;2.5.1] in [default] --------------------------------------------------------------------- | | modules || artifacts | | conf | number| search|dwnlded|evicted|| number|dwnlded| --------------------------------------------------------------------- | default | 25 | 0 | 0 | 2 || 23 | 0 | --------------------------------------------------------------------- :: retrieving :: org.apache.spark#spark-submit-parent-c4d888cc-a352-48f0-9927-791ecebcce11 confs: [default] 0 artifacts copied, 23 already retrieved (0kB/5ms) 24/01/20 23:15:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Setting default log level to "WARN". 
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spatial filter by boundary¶
# Washington state boundary
#spatial_filter = "POLYGON((-123.3208 49.0023,-123.0338 49.0027,-122.0650 49.0018,-121.7491 48.9973,-121.5912 48.9991,-119.6082 49.0009,-118.0378 49.0005,-117.0319 48.9996,-117.0415 47.9614,-117.0394 46.5060,-117.0394 46.4274,-117.0621 46.3498,-117.0277 46.3384,-116.9879 46.2848,-116.9577 46.2388,-116.9659 46.2022,-116.9254 46.1722,-116.9357 46.1432,-116.9584 46.1009,-116.9762 46.0785,-116.9433 46.0537,-116.9165 45.9960,-118.0330 46.0008,-118.9867 45.9998,-119.1302 45.9320,-119.1708 45.9278,-119.2559 45.9402,-119.3047 45.9354,-119.3644 45.9220,-119.4386 45.9172,-119.4894 45.9067,-119.5724 45.9249,-119.6013 45.9196,-119.6700 45.8565,-119.8052 45.8479,-119.9096 45.8278,-119.9652 45.8245,-120.0710 45.7852,-120.1705 45.7623,-120.2110 45.7258,-120.3628 45.7057,-120.4829 45.6951,-120.5942 45.7469,-120.6340 45.7460,-120.6924 45.7143,-120.8558 45.6721,-120.9142 45.6409,-120.9471 45.6572,-120.9787 45.6419,-121.0645 45.6529,-121.1469 45.6078,-121.1847 45.6083,-121.2177 45.6721,-121.3392 45.7057,-121.4010 45.6932,-121.5328 45.7263,-121.6145 45.7091,-121.7361 45.6947,-121.8095 45.7067,-121.9338 45.6452,-122.0451 45.6088,-122.1089 45.5833,-122.1426 45.5838,-122.2009 45.5660,-122.2641 45.5439,-122.3321 45.5482,-122.3795 45.5756,-122.4392 45.5636,-122.5676 45.6006,-122.6891 45.6236,-122.7647 45.6582,-122.7750 45.6817,-122.7619 45.7613,-122.7962 45.8106,-122.7839 45.8642,-122.8114 45.9120,-122.8148 45.9612,-122.8587 46.0160,-122.8848 46.0604,-122.9034 46.0832,-122.9597 46.1028,-123.0579 46.1556,-123.1210 46.1865,-123.1664 46.1893,-123.2810 46.1446,-123.3703 46.1470,-123.4314 46.1822,-123.4287 46.2293,-123.4946 46.2691,-123.5557 46.2582,-123.6209 46.2573,-123.6875 46.2497,-123.7404 46.2691,-123.8729 46.2350,-123.9292 46.2383,-123.9711 46.2677,-124.0212 46.2924,-124.0329 46.2653,-124.2444 46.2596,-124.2691 46.4312,-124.3529 46.8386,-124.4380 47.1832,-124.5616 47.4689,-124.7566 47.8012,-124.8679 48.0423,-124.8679 48.2457,-124.8486 48.3727,-124.7539 48.4984,-124.4174 48.4096,-124.2389 
48.3599,-124.0116 48.2964,-123.9141 48.2795,-123.5413 48.2247,-123.3998 48.2539,-123.2501 48.2841,-123.1169 48.4233,-123.1609 48.4533,-123.2220 48.5548,-123.2336 48.5902,-123.2721 48.6901,-123.0084 48.7675,-123.0084 48.8313,-123.3215 49.0023,-123.3208 49.0023))"
# Bellevue city boundary
spatial_filter = "POLYGON ((-122.235128 47.650163, -122.233796 47.65162, -122.231581 47.653287, -122.228514 47.65482, -122.227526 47.655204, -122.226175 47.655729, -122.222039 47.656743999999996, -122.218428 47.657464, -122.217026 47.657506, -122.21437399999999 47.657588, -122.212091 47.657464, -122.212135 47.657320999999996, -122.21092999999999 47.653552, -122.209834 47.650121, -122.209559 47.648976, -122.209642 47.648886, -122.21042 47.648658999999995, -122.210897 47.64864, -122.211005 47.648373, -122.21103099999999 47.648320999999996, -122.211992 47.64644, -122.212457 47.646426, -122.212469 47.646392, -122.212469 47.646088999999996, -122.212471 47.645213, -122.213115 47.645212, -122.213123 47.644576, -122.21352999999999 47.644576, -122.213768 47.644560999999996, -122.21382 47.644560999999996, -122.21382 47.644456999999996, -122.21373299999999 47.644455, -122.213748 47.643102999999996, -122.213751 47.642790999999995, -122.213753 47.642716, -122.213702 47.642697999999996, -122.213679 47.642689999999995, -122.21364 47.642678, -122.213198 47.642541, -122.213065 47.642500000000005, -122.212918 47.642466, -122.21275 47.642441, -122.212656 47.642433, -122.21253899999999 47.642429, -122.212394 47.64243, -122.212182 47.642444999999995, -122.211957 47.642488, -122.211724 47.642551999999995, -122.21143599999999 47.642647, -122.210906 47.642834, -122.210216 47.643099, -122.209858 47.643215, -122.20973000000001 47.643248, -122.20973599999999 47.643105, -122.209267 47.643217, -122.208832 47.643302, -122.208391 47.643347999999996, -122.207797 47.643414, -122.207476 47.643418, -122.20701199999999 47.643397, -122.206795 47.643387999999995, -122.205742 47.643246, -122.20549 47.643201999999995, -122.20500200000001 47.643119, -122.204802 47.643085, -122.204641 47.643066, -122.204145 47.643012, -122.203547 47.643012, -122.203097 47.643107, -122.20275699999999 47.643283, -122.202507 47.643496999999996, -122.202399 47.643653, -122.202111 47.643771, -122.201668 47.643767, -122.201363 
47.643665, -122.20133 47.643648999999996, -122.201096 47.643536, -122.200744 47.64328, -122.200568 47.64309, -122.200391 47.642849, -122.200162 47.642539, -122.199896 47.642500000000005, -122.19980799999999 47.642424, -122.199755 47.642376999999996, -122.199558 47.642227999999996, -122.199439 47.642157, -122.199293 47.642078999999995, -122.199131 47.642004, -122.198928 47.641925, -122.19883 47.641892, -122.19856300000001 47.641811999999994, -122.198203 47.641731, -122.197662 47.641619999999996, -122.196819 47.641436, -122.196294 47.641309, -122.196294 47.642314, -122.19628 47.642855, -122.196282 47.642897999999995, -122.196281 47.643111, -122.196283 47.643415, -122.196283 47.643508999999995, -122.19628399999999 47.643739, -122.196287 47.644203999999995, -122.196287 47.644262999999995, -122.19629 47.644937999999996, -122.19629 47.644954999999996, -122.196292 47.645271, -122.196291 47.645426, -122.19629499999999 47.646315, -122.19629499999999 47.646432, -122.195925 47.646432, -122.195251 47.646432, -122.190853 47.646429999999995, -122.187649 47.646428, -122.187164 47.646426, -122.18683 47.646426, -122.185547 47.646409, -122.185546 47.646316, -122.185537 47.645599, -122.185544 47.644197, -122.185537 47.643294999999995, -122.185544 47.642733, -122.185541 47.641757, -122.185555 47.640681, -122.185561 47.63972, -122.185557 47.638228999999995, -122.185591 47.635419, -122.185611 47.634750999999994, -122.18562299999999 47.634484, -122.18561700000001 47.634375999999996, -122.185592 47.634311, -122.185549 47.634232999999995, -122.185504 47.634181999999996, -122.185426 47.634119, -122.184371 47.633424999999995, -122.18400000000001 47.633198, -122.183896 47.633134, -122.1838 47.633067, -122.18375499999999 47.633019999999995, -122.183724 47.632959, -122.183695 47.632858, -122.183702 47.632675, -122.182757 47.632622999999995, -122.182365 47.63259, -122.18220600000001 47.632562, -122.181984 47.632504999999995, -122.18163799999999 47.632363, -122.18142 47.632262999999995, 
-122.181229 47.632165, -122.181612 47.632172999999995, -122.18271899999999 47.632151, -122.183138 47.632135, -122.18440000000001 47.632081, -122.184743 47.632065999999995, -122.185312 47.63205, -122.185624 47.632047, -122.185625 47.631873999999996, -122.184618 47.63187, -122.184291 47.631878, -122.184278 47.631817999999996, -122.183882 47.629942, -122.182689 47.623548, -122.182594 47.622789999999995, -122.182654 47.622155, -122.183135 47.622372999999996, -122.183471 47.622506, -122.18360200000001 47.622552, -122.183893 47.622637999999995, -122.184244 47.62272, -122.184618 47.622777, -122.184741 47.622727999999995, -122.184605 47.622679, -122.18424 47.622622, -122.183985 47.622569, -122.183717 47.622501, -122.183506 47.622439, -122.18327 47.622357, -122.18305699999999 47.622271999999995, -122.182669 47.622088999999995, -122.182796 47.621545, -122.18347 47.619628999999996, -122.18365 47.619098, -122.183859 47.6184, -122.183922 47.617793999999996, -122.183956 47.617292, -122.183792 47.616388, -122.183261 47.614391999999995, -122.183202 47.613802, -122.183209 47.613155, -122.183436 47.612384999999996, -122.18395100000001 47.610445999999996, -122.184338 47.60924, -122.184657 47.609116, -122.18481 47.609051, -122.18491900000001 47.608987, -122.184974 47.608942, -122.185047 47.608846, -122.185082 47.608743999999994, -122.185109 47.608526999999995, -122.185116 47.608359, -122.18513 47.608315999999995, -122.185157 47.608273999999994, -122.185183 47.608247, -122.185246 47.608214, -122.185354 47.608196, -122.185475 47.608191999999995, -122.185472 47.606697, -122.185472 47.606373999999995, -122.185521 47.606272, -122.185528 47.606210999999995, -122.185506 47.606037, -122.185451 47.605872999999995, -122.185411 47.605781, -122.185358 47.605681999999995, -122.185248 47.605509999999995, -122.185127 47.605365, -122.185058 47.605292, -122.184772 47.605038, -122.184428 47.604834, -122.184122 47.604693999999995, -122.183775 47.604574, -122.183644 47.604546, -122.183708 
47.604400999999996, -122.183749 47.604223999999995, -122.18376 47.604037, -122.183707 47.603778, -122.183619 47.603556999999995, -122.183559 47.603406, -122.183488 47.603303, -122.183824 47.603167, -122.184108 47.603052, -122.184478 47.602902, -122.18543 47.602495, -122.186669 47.601957, -122.186433 47.601220999999995, -122.186341 47.601127999999996, -122.18874199999999 47.593742999999996, -122.188434 47.592338999999996, -122.188479 47.591786, -122.188217 47.591269999999994, -122.18795399999999 47.590871, -122.186822 47.589228, -122.187421 47.589228999999996, -122.18848299999999 47.589228999999996, -122.188433 47.587922999999996, -122.18990000000001 47.588547, -122.191368 47.589169999999996, -122.19158 47.589222, -122.191779 47.589254999999994, -122.192117 47.589289, -122.191569 47.587478999999995, -122.191323 47.586628999999995, -122.191295 47.586554, -122.191268 47.586479, -122.191192 47.586318, -122.191163 47.586268999999994, -122.1911 47.586164, -122.19099 47.586011, -122.19067 47.585668999999996, -122.1905 47.585515, -122.190301 47.58531, -122.190143 47.585152, -122.189573 47.584576999999996, -122.188702 47.583735999999995, -122.188646 47.583679, -122.188239 47.583258, -122.188037 47.583005, -122.187832 47.582657, -122.187726 47.582164999999996, -122.18769499999999 47.581964, -122.18768299999999 47.581781, -122.187678 47.581592, -122.18766099999999 47.581455, -122.187674 47.581311, -122.18768 47.581146, -122.187722 47.580877, -122.187817 47.580569999999994, -122.187932 47.580301999999996, -122.188047 47.580087, -122.188161 47.579933999999994, -122.188399 47.579660999999994, -122.18851699999999 47.579547, -122.188621 47.579454, -122.188042 47.579493, -122.18762 47.579527, -122.187806 47.579358, -122.188009 47.579175, -122.18814499999999 47.579051, -122.188177 47.579021, -122.18842000000001 47.5788, -122.188638 47.578461, -122.188895 47.57806, -122.189791 47.577281, -122.190008 47.577103, -122.190372 47.576805, -122.19119 47.576358, -122.191877 47.576087, 
-122.193025 47.57566, -122.194317 47.575185999999995, -122.196061 47.574664, -122.197239 47.574386999999994, -122.197873 47.574267, -122.198286 47.574189999999994, -122.199091 47.574044, -122.199067 47.574574999999996, -122.199007 47.575921, -122.200335 47.578222, -122.20057299999999 47.578345999999996, -122.2009 47.578517999999995, -122.201095 47.578621999999996, -122.20138399999999 47.578776999999995, -122.201465 47.57882, -122.201516 47.578846999999996, -122.205753 47.581112, -122.209515 47.583124, -122.210634 47.583721, -122.21473399999999 47.587021, -122.21538699999999 47.588254, -122.21580399999999 47.589042, -122.216534 47.590421, -122.220092 47.596261, -122.220434 47.596821, -122.22041899999999 47.597837999999996, -122.220289 47.606455, -122.220234 47.610121, -122.22048 47.615221999999996, -122.220359 47.615379, -122.220283 47.615477999999996, -122.21999 47.615854999999996, -122.219993 47.61597, -122.22023300000001 47.616634, -122.220356 47.616687999999996, -122.220409 47.616712, -122.221401 47.618538, -122.22142 47.618573, -122.221456 47.618635, -122.221791 47.619222, -122.222492 47.619682999999995, -122.222799 47.619886, -122.222083 47.620368, -122.222046 47.620407, -122.222028 47.620449, -122.222025 47.620483, -122.22203999999999 47.620523999999996, -122.222079 47.620557999999996, -122.222156 47.620594999999994, -122.222458 47.620629, -122.222454 47.620673, -122.222454 47.620711, -122.22244599999999 47.621041999999996, -122.223056 47.621041, -122.223129 47.62104, -122.223153 47.62104, -122.223574 47.621041, -122.22377900000001 47.621041, -122.223857 47.621041, -122.22467499999999 47.621041, -122.224712 47.62104, -122.224958 47.62104, -122.225167 47.621049, -122.226882 47.621037, -122.227565 47.621032, -122.228002 47.621029, -122.22797800000001 47.621300999999995, -122.227919 47.626574999999995, -122.227914 47.627085, -122.227901 47.6283, -122.227881 47.630069, -122.227869 47.631177, -122.227879 47.631952999999996, -122.22789 47.633879, -122.227886 
47.63409, -122.227871 47.635534, -122.227918 47.635565, -122.228953 47.635624, -122.22895199999999 47.635571999999996, -122.231018 47.635574999999996, -122.233276 47.635588999999996, -122.233287 47.63617, -122.233273 47.63639, -122.233272 47.636469999999996, -122.23327 47.636578, -122.233266 47.636827, -122.233263 47.636851, -122.233262 47.637014, -122.23322999999999 47.638110999999995, -122.233239 47.638219, -122.233262 47.638279, -122.233313 47.638324999999995, -122.233255 47.638359, -122.233218 47.638380999999995, -122.233153 47.638450999999996, -122.233136 47.638552999999995, -122.233137 47.638692, -122.232715 47.639348999999996, -122.232659 47.640093, -122.232704 47.641375, -122.233821 47.645111, -122.234906 47.648874, -122.234924 47.648938, -122.235128 47.650163))"
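As a quick sanity check on a boundary string, its bounding box can be computed from the WKT text alone. The sketch below is a minimal, pure-Python illustration. The helper name `wkt_polygon_bbox` and the shortened sample ring (three vertices taken from the Bellevue boundary) are assumptions made here, not part of the notebook or of Sedona's API, and the parser only handles plain decimal 2D coordinates.

```python
import re


def wkt_polygon_bbox(wkt):
    """Return (min_x, min_y, max_x, max_y) for a 2D WKT POLYGON string."""
    # Extract every "x y" pair; scientific notation and Z/M values are not handled.
    pairs = re.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", wkt)
    if not pairs:
        raise ValueError("no coordinates found in WKT string")
    xs = [float(x) for x, _ in pairs]
    ys = [float(y) for _, y in pairs]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical shortened ring: a few vertices copied from the Bellevue boundary.
sample = (
    "POLYGON ((-122.235128 47.650163, -122.181229 47.632165, "
    "-122.199091 47.574044, -122.235128 47.650163))"
)
bbox = wkt_polygon_bbox(sample)
print(bbox)
```

Longitude comes first in WKT, so the first and third values of the result are the west and east limits.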
Visualizing Overture Maps¶
The steps are the same for every dataset, so they are explained once here. See the Overture Maps documentation to learn more about the data.
df = sedona.read.format("geoparquet").load(DATA_LINK+"theme=XX/type=YY")
This reads the dataset identified by the theme (XX) and type (YY) placeholders, which is stored in GeoParquet format.
df = df.filter("ST_Contains(ST_GeomFromWKT('"+spatial_filter+"'), geometry) = true")
This keeps only the rows whose geometry lies inside the polygon given by the spatial_filter WKT string defined above. Replace the string to select a different region.
ST_GeomFromWKT() - constructs a geometry from WKT (Well-Known Text)
ST_Contains(A, B) - returns true if A fully contains B
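The filter expression is assembled by plain string concatenation, so it can be convenient to build it in one place. The following is a minimal sketch, not part of the notebook: `contains_predicate` is a hypothetical helper name, and it assumes the geometry column is called `geometry`, as in the datasets used here. Only the two SQL functions described above are used.

```python
def contains_predicate(wkt, geometry_column="geometry"):
    """Build a Spark SQL predicate that keeps rows whose geometry lies inside `wkt`."""
    # WKT never contains quotes; reject them so the SQL string literal stays intact.
    if "'" in wkt:
        raise ValueError("WKT string must not contain single quotes")
    # ST_Contains already returns a boolean, so no "= true" comparison is needed.
    return f"ST_Contains(ST_GeomFromWKT('{wkt}'), {geometry_column})"


# Example with a small hypothetical triangle; in the notebook one would pass
# the boundary WKT string instead, e.g. df.filter(contains_predicate(boundary)).
expr = contains_predicate("POLYGON ((0 0, 1 0, 1 1, 0 0))")
print(expr)
```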
XX_geom = df.selectExpr("geometry")
This selects only the geometry column to pass to SedonaKepler.
map = SedonaKepler.create_map(XX_geom, 'XX')
This creates a map object with SedonaKepler from the geometry DataFrame and a name for the dataset.
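Because the same read-and-filter steps recur for every dataset, they could be wrapped in a small helper. The sketch below is one possible arrangement under stated assumptions: `DATASETS`, `dataset_path`, and `load_filtered` are hypothetical names introduced here, and the `sedona` argument is expected to be the context object created earlier. Only DataFrame calls that already appear in this notebook (`read.format(...).load`, `filter`, `limit`) are used.

```python
# (theme, type) pairs that this notebook reads.
DATASETS = [
    ("places", "place"),
    ("buildings", "building"),
    ("admins", "administrativeBoundary"),
    ("admins", "locality"),
    ("transportation", "connector"),
    ("transportation", "segment"),
]


def dataset_path(data_link, theme, type_):
    """Join the release prefix with the theme/type partition folders."""
    return f"{data_link}theme={theme}/type={type_}"


def load_filtered(sedona, data_link, theme, type_, boundary_wkt, limit=None):
    """Read one dataset and keep only geometries inside the boundary polygon."""
    df = sedona.read.format("geoparquet").load(dataset_path(data_link, theme, type_))
    df = df.filter(f"ST_Contains(ST_GeomFromWKT('{boundary_wkt}'), geometry)")
    # Large datasets can be capped before they are handed to the map widget.
    if limit is not None:
        df = df.limit(limit)
    return df
```

A call such as `load_filtered(sedona, DATA_LINK, "places", "place", boundary_wkt)` would then return the filtered DataFrame, which can be passed to `SedonaKepler.create_map` as shown above.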
Place Dataset¶
Inspect the metadata of GeoParquet files¶
Inspect the Parquet metadata of the place dataset using spark-extension. Detailed usage can be found here: https://github.com/G-Research/spark-extension/blob/master/PARQUET.md
sedona.read.parquet_blocks(DATA_LINK+"theme=places/type=place").show()
24/01/20 23:15:24 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
+--------------------+-----+----------+---------------+-----------------+------+-------+--------+-------+
|            filename|block|blockStart|compressedBytes|uncompressedBytes|  rows|columns|  values|  nulls|
+--------------------+-----+----------+---------------+-----------------+------+-------+--------+-------+
|s3a://wherobots-p...|    1|         4|      132489370|        212295497|675704|     26|27013095|4170699|
|s3a://wherobots-p...|    2| 132489374|       33009562|         51995284|168295|     26| 6727332|1039665|
|s3a://wherobots-p...|    1|         4|      131748222|        207244630|658924|     26|26686490|4121025|
|s3a://wherobots-p...|    2| 131748226|      131665990|        207128595|658924|     26|26681616|4123667|
|s3a://wherobots-p...|    3| 263414216|      131697923|        207173797|658924|     26|26686185|4122635|
|s3a://wherobots-p...|    4| 395112139|      128013214|        201361886|640397|     26|25932308|4002991|
|s3a://wherobots-p...|    1|         4|      131748222|        207244630|658924|     26|26686490|4121025|
|s3a://wherobots-p...|    2| 131748226|      131665990|        207128595|658924|     26|26681616|4123667|
|s3a://wherobots-p...|    3| 263414216|      131697923|        207173797|658924|     26|26686185|4122635|
|s3a://wherobots-p...|    4| 395112139|      128013214|        201361886|640397|     26|25932308|4002991|
|s3a://wherobots-p...|    1|         4|      131748222|        207244630|658924|     26|26686490|4121025|
|s3a://wherobots-p...|    2| 131748226|      131665990|        207128595|658924|     26|26681616|4123667|
|s3a://wherobots-p...|    3| 263414216|      131697923|        207173797|658924|     26|26686185|4122635|
|s3a://wherobots-p...|    4| 395112139|      128013214|        201361886|640397|     26|25932308|4002991|
|s3a://wherobots-p...|    1|         4|      131738563|        217599597|677442|     26|27329024|4088909|
|s3a://wherobots-p...|    2| 131738567|      132112948|        218210876|679198|     26|27405447|4099796|
|s3a://wherobots-p...|    3| 263851515|       72665451|        119480256|373256|     26|15060504|2251382|
|s3a://wherobots-p...|    1|         4|      131738563|        217599597|677442|     26|27329024|4088909|
|s3a://wherobots-p...|    2| 131738567|      132112948|        218210876|679198|     26|27405447|4099796|
|s3a://wherobots-p...|    3| 263851515|       72665451|        119480256|373256|     26|15060504|2251382|
+--------------------+-----+----------+---------------+-----------------+------+-------+--------+-------+
only showing top 20 rows
Inspect the GeoParquet metadata using Sedona geoparquet.metadata
sedona.read.format("geoparquet.metadata").load(DATA_LINK+"theme=places/type=place").drop("path").printSchema()
root
 |-- version: string (nullable = true)
 |-- primary_column: string (nullable = true)
 |-- columns: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- encoding: string (nullable = true)
 |    |    |-- geometry_types: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |    |-- bbox: array (nullable = true)
 |    |    |    |-- element: double (containsNull = true)
 |    |    |-- crs: string (nullable = true)
 |-- geohash: string (nullable = true)
sedona.read.format("geoparquet.metadata").load(DATA_LINK+"theme=places/type=place").drop("path").show(truncate = False)
+------------+--------------+------------------------------------------------------------------------------------------+-------+
|version     |primary_column|columns                                                                                   |geohash|
+------------+--------------+------------------------------------------------------------------------------------------+-------+
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [-45.0, 45.0038534, -33.75027, 50.5347176], null}}            |g0     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [-45.0, 33.8141064, -33.7678528, 39.3735873], null}}          |en     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [135.378685, -78.2065631, 145.546875, -73.1334471], null}}    |p4     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [101.4470901, -50.2472049, 112.3773706, -45.0773997], null}}  |nr     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [112.7343607, 11.25, 123.7499978, 16.8749], null}}            |wd     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [78.75, -33.6608961, 89.6400654, -28.1843664], null}}         |mf     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [80.15625, 78.9564349, 88.9382744, 84.1249732], null}}        |vy     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [-146.25, 56.2921567, -135.0004853, 61.8691013], null}}       |bf     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [-11.25, 39.37504, -2.12E-5, 44.9999908], null}}              |ez     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [48.51614, 84.6080623, 48.51614, 84.6080623], null}}          |vp     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [146.3126564, 0.0542566, 156.09375, 5.6159858], null}}        |x2     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [67.5, 61.8817, 78.7172, 67.488], null}}                      |ve     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [-22.4986671, -22.4719545, -11.2554932, -16.9467365], null}}  |7s     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [-134.8484052, 17.0052467, -123.7679812, 22.4397813], null}}  |95     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [-146.25, -11.1931059, -135.0013733, -6.0916292], null}}      |2y     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [-135.0, -83.8333, -124.4478565, -78.7677918], null}}         |11     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [-44.8187256, -56.1774766, -34.1630816, -50.8584471], null}}  |5n     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [-44.619453, -44.9958826, -33.7843399, -39.4124306], null}}   |70     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [-112.2099566, -33.7316387, -101.2729317, -28.2008956], null}}|3d     |
|1.0.0-beta.1|geometry      |{geometry -> {WKB, [Point], [-33.2666016, -72.9196355, -22.6803555, -67.5007358], null}}  |57     |
+------------+--------------+------------------------------------------------------------------------------------------+-------+
only showing top 20 rows
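Each file's metadata carries a bounding box, which is what makes spatially clustered files cheap to filter: a file whose box does not overlap the query region never needs to be read. The sketch below illustrates that idea in plain Python. It is not Sedona's implementation. The first three boxes are copied from the output above in [min x, min y, max x, max y] order, while the `hypothetical` entry and the approximate Bellevue query box are assumed values added for illustration.

```python
def bbox_intersects(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Per-file bounding boxes copied from the metadata output, keyed by geohash.
file_bboxes = {
    "g0": (-45.0, 45.0038534, -33.75027, 50.5347176),
    "bf": (-146.25, 56.2921567, -135.0004853, 61.8691013),
    "95": (-134.8484052, 17.0052467, -123.7679812, 22.4397813),
}
# Hypothetical entry, for illustration only: a file covering the Seattle area.
file_bboxes["hypothetical"] = (-123.75, 45.0, -112.5, 50.625)

# Approximate bounding box of the Bellevue boundary (assumed, rounded values).
query_bbox = (-122.236, 47.574, -122.181, 47.658)

# Keep only the files whose bounding box overlaps the query region.
candidates = [
    name for name, box in file_bboxes.items() if bbox_intersects(box, query_bbox)
]
print(candidates)
```

None of the three real boxes overlaps the Bellevue area, so only the hypothetical file would be read.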
Inspect the schema of GeoParquet files
sedona.read.format("geoparquet").load(DATA_LINK+"theme=places/type=place").printSchema()
root
 |-- id: string (nullable = true)
 |-- updatetime: string (nullable = true)
 |-- version: integer (nullable = true)
 |-- names: map (nullable = true)
 |    |-- key: string
 |    |-- value: array (valueContainsNull = true)
 |    |    |-- element: map (containsNull = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |-- categories: struct (nullable = true)
 |    |-- main: string (nullable = true)
 |    |-- alternate: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |-- confidence: double (nullable = true)
 |-- websites: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- socials: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- emails: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- phones: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- brand: struct (nullable = true)
 |    |-- names: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |-- element: map (containsNull = true)
 |    |    |    |    |-- key: string
 |    |    |    |    |-- value: string (valueContainsNull = true)
 |    |-- wikidata: string (nullable = true)
 |-- addresses: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |-- sources: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
 |-- bbox: struct (nullable = true)
 |    |-- minx: double (nullable = true)
 |    |-- maxx: double (nullable = true)
 |    |-- miny: double (nullable = true)
 |    |-- maxy: double (nullable = true)
 |-- geometry: geometry (nullable = true)
 |-- geohash: string (nullable = true)
Run a spatial range query¶
%%time
df_place = sedona.read.format("geoparquet").load(DATA_LINK+"theme=places/type=place")
df_place = df_place.filter("ST_Contains(ST_GeomFromWKT('"+spatial_filter+"'), geometry) = true").cache()
CPU times: user 32 ms, sys: 6.48 ms, total: 38.5 ms Wall time: 10.6 s
%%time
map_place = SedonaKepler.create_map(df_place, "Place")
map_place
User Guide: https://docs.kepler.gl/docs/keplergl-jupyter
CPU times: user 602 ms, sys: 96.3 ms, total: 698 ms Wall time: 1min 23s
KeplerGl(data={'Place': {'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 2…
Write the result back to a small GeoParquet file¶
df_place.select("id", "geometry", "categories.main").limit(1000).repartition(1) \
.write.format("geoparquet").option("geoparquet.version", "1.0.0").option("geoparquet.crs", "").mode('overwrite').save("places.parquet")
Convert a part of the result to GeoPandas and save to different file formats¶
gdf = gpd.GeoDataFrame(df_place.select("id", "geometry", "categories.main").limit(1000).toPandas(), geometry="geometry")
gdf.to_file('places.geojson', driver='GeoJSON')
gdf.to_file('places.shp')
gdf
| | id | geometry | main |
---|---|---|---|
0 | tmp_1C7D119020F10C096A27DBAD487578E3 | POINT (-122.22225 47.63574) | cafe |
1 | tmp_CABF2AA2D28C364087CA75322925EF3B | POINT (-122.19133 47.63664) | dentist |
2 | tmp_4DF5EDF25453A5B26C67DB7DCF4AADEA | POINT (-122.18555 47.63417) | park |
3 | tmp_2F03B1439285DF7ED6B85BDB817B38F5 | POINT (-122.18565 47.61602) | None |
4 | tmp_A07F048103476EE43B98CD623E0939C7 | POINT (-122.20394 47.61775) | fast_food_restaurant |
... | ... | ... | ... |
995 | tmp_0F88FA5D31AADC237ECC31C029737EB0 | POINT (-122.20033 47.61743) | None |
996 | tmp_341F7756223EFDC7FCB60CB22276F67B | POINT (-122.19403 47.61779) | mortgage_broker |
997 | tmp_26900E583436FD0BA86722A6A78309B3 | POINT (-122.19101 47.61840) | bakery |
998 | tmp_FF38B028F3C5ED0C3919C357BAF09315 | POINT (-122.19332 47.62042) | bubble_tea |
999 | tmp_683827FB15AFF93F5530B3096DD86828 | POINT (-122.18405 47.62007) | lighting_store |
1000 rows × 3 columns
Building Dataset¶
%%time
df_building = sedona.read.format("geoparquet").load(DATA_LINK+"theme=buildings/type=building")
df_building = df_building.filter("ST_Contains(ST_GeomFromWKT('"+spatial_filter+"'), geometry) = true")
df_building = df_building.limit(200_000)
CPU times: user 12.4 ms, sys: 2.26 ms, total: 14.6 ms Wall time: 6.07 s
%%time
map_building = SedonaKepler.create_map(df_building, 'Building')
map_building
User Guide: https://docs.kepler.gl/docs/keplergl-jupyter
CPU times: user 1.82 s, sys: 228 ms, total: 2.04 s Wall time: 5min 6s
/usr/local/lib/python3.10/dist-packages/jupyter_client/session.py:718: UserWarning: Message serialization failed with: Out of range float values are not JSON compliant Supporting this message is deprecated in jupyter-client 7, please make sure your message is JSON-compliant content = self.pack(content)
KeplerGl(data={'Building': {'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20…
Admins Theme Datasets¶
Administrative Boundary Dataset¶
%%time
df_admin = sedona.read.format("geoparquet").load(DATA_LINK+"theme=admins/type=administrativeBoundary")
df_admin = df_admin.filter("ST_Contains(ST_GeomFromWKT('"+spatial_filter+"'), geometry) = true")
CPU times: user 10.9 ms, sys: 4.18 ms, total: 15.1 ms Wall time: 4.12 s
%%time
map_admin = SedonaKepler.create_map(df_admin, "Admin")
map_admin
User Guide: https://docs.kepler.gl/docs/keplergl-jupyter
CPU times: user 62.9 ms, sys: 8.02 ms, total: 70.9 ms Wall time: 15 s
KeplerGl(data={'Admin': {'index': [], 'columns': ['id', 'updatetime', 'version', 'names', 'adminlevel', 'marit…
Locality Dataset¶
%%time
df_locality = sedona.read.format("geoparquet").load(DATA_LINK+"theme=admins/type=locality")
df_locality = df_locality.filter("ST_Contains(ST_GeomFromWKT('"+spatial_filter+"'), geometry) = true")
CPU times: user 11 ms, sys: 917 µs, total: 11.9 ms Wall time: 4.05 s
%%time
map_locality = SedonaKepler.create_map(df_locality, 'Locality')
map_locality
User Guide: https://docs.kepler.gl/docs/keplergl-jupyter
CPU times: user 70.8 ms, sys: 14.6 ms, total: 85.4 ms Wall time: 22 s
KeplerGl(data={'Locality': {'index': [], 'columns': ['id', 'updatetime', 'version', 'names', 'adminlevel', 'ma…
Transportation Theme Datasets¶
Connector Dataset¶
%%time
df_connector = sedona.read.format("geoparquet").load(DATA_LINK+"theme=transportation/type=connector")
df_connector = df_connector.filter("ST_Contains(ST_GeomFromWKT('"+spatial_filter+"'), geometry) = true")
CPU times: user 14.8 ms, sys: 8.02 ms, total: 22.8 ms Wall time: 5.92 s
%%time
map_connector = SedonaKepler.create_map(df_connector, "Connector")
map_connector
User Guide: https://docs.kepler.gl/docs/keplergl-jupyter
CPU times: user 1.64 s, sys: 146 ms, total: 1.79 s Wall time: 2min 42s
/usr/local/lib/python3.10/dist-packages/jupyter_client/session.py:718: UserWarning: Message serialization failed with: Out of range float values are not JSON compliant Supporting this message is deprecated in jupyter-client 7, please make sure your message is JSON-compliant content = self.pack(content)
KeplerGl(data={'Connector': {'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2…
Segment Dataset¶
%%time
df_segment = sedona.read.format("geoparquet").load(DATA_LINK+"theme=transportation/type=segment")
df_segment = df_segment.filter("ST_Contains(ST_GeomFromWKT('"+spatial_filter+"'), geometry) = true")
df_segment = df_segment.limit(200000)
CPU times: user 19 ms, sys: 3.48 ms, total: 22.5 ms Wall time: 6.13 s
%%time
map_segment = SedonaKepler.create_map(df_segment, "Segment")
map_segment
User Guide: https://docs.kepler.gl/docs/keplergl-jupyter
CPU times: user 1.74 s, sys: 124 ms, total: 1.86 s Wall time: 4min 53s
/usr/local/lib/python3.10/dist-packages/jupyter_client/session.py:718: UserWarning: Message serialization failed with: Out of range float values are not JSON compliant Supporting this message is deprecated in jupyter-client 7, please make sure your message is JSON-compliant content = self.pack(content)
KeplerGl(data={'Segment': {'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,…