Understand Consumer Behavior With Verified Foot Traffic Data¶
Background Visitor and demographic aggregation data provide essential context on population behavior. How often is a place of interest visited? How long do visitors stay, where do they come from, and where do they go next? The answers are invaluable across numerous industries: building financial indicators, city and urban planning, public health monitoring, and identifying your primary business competitors all require accurate, high-quality population and POI data.
Objective Our workshop’s objective is to give professionals, researchers, and practitioners a hands-on introduction to deriving human movement patterns from location data. We use a sample of our Weekly and Monthly Patterns and Core Places products to perform market research on a potential new coffee shop location, addressing the questions below (and more) while building a market analysis proposal in real time.
Questions to Answer
- How far are customers willing to travel for coffee?
- What location will receive the most visibility?
- Where do most of the coffee customers come from?
Notebook Setup¶
from pyspark.sql import SparkSession
from sedona.register import SedonaRegistrator
from sedona.utils import SedonaKryoRegistrator, KryoSerializer
spark = (
    SparkSession.builder.appName("sigspatial2021")
    .master("spark://data-ocean-lab-1:7077")
    .config("spark.serializer", KryoSerializer.getName)
    .config("spark.kryo.registrator", SedonaKryoRegistrator.getName)
    .config(
        "spark.jars.packages",
        "org.apache.sedona:sedona-python-adapter-3.0_2.12:1.2.0-incubating,"
        "org.datasyslab:geotools-wrapper:1.1.0-25.2",
    )
    .getOrCreate()
)
import pyspark.sql.functions as f
from pyspark.sql.types import MapType, StringType, IntegerType
import pandas as pd
Load in SafeGraph sample data from s3¶
The sample covers all coffee shops (category_tag.contains("Coffee Shop")) in the Seattle area (county_fips.isin(["53033", "53053", "53061"])), with multiple rows per POI corresponding to monthly foot traffic since the beginning of 2018 (one month per row per POI).
Columns are SafeGraph Core, Geo, and Patterns pre-joined together, with placekey as the join key.
sample_csv_path = "file:///media/hdd1/code/sigspatial-2021-cafe-analysis/data/seattle_coffee_monthly_patterns/"
sample = (
    spark.read.option("header", "true")
    .option("escape", '"')
    .csv(sample_csv_path)
    .withColumn("date_range_start", f.to_date(f.col("date_range_start")))
    .withColumn("date_range_end", f.to_date(f.col("date_range_end")))
    .withColumn(
        "visitor_home_cbgs",
        f.from_json("visitor_home_cbgs", schema=MapType(StringType(), IntegerType())),
    )
    # distance_from_home is read from CSV as a string; cast it to an integer
    .withColumn("distance_from_home", f.col("distance_from_home").cast(IntegerType()))
)
print("Number of coffee shop patterns: ", sample.count())
print("Number of coffee shops: ", sample.select("placekey").distinct().count())
sample.limit(10).toPandas().head()
Number of coffee shop patterns: 66607
Number of coffee shops: 1679
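The visitor_home_cbgs column arrives as a JSON-encoded map from census block group (CBG) ID to visitor count, which is why the load step parses it with from_json and a MapType schema. A minimal pure-Python sketch of the same parse, using a made-up raw value (the CBG IDs and counts are hypothetical):

```python
import json

# a hypothetical raw visitor_home_cbgs cell, as it appears in the CSV
raw = '{"530330001001": 14, "530330001002": 9, "530530601003": 4}'

home_cbgs = json.loads(raw)  # dict: CBG id -> visitor count

# total visitors whose home CBG was observed
total = sum(home_cbgs.values())

# the single largest source of visitors
top_cbg = max(home_cbgs, key=home_cbgs.get)
print(total, top_cbg)  # 27 530330001001
```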
placekey | safegraph_place_id | parent_placekey | parent_safegraph_place_id | safegraph_brand_ids | location_name | brands | store_id | top_category | sub_category | ... | distance_from_home | median_dwell | bucketed_dwell_times | related_same_day_brand | related_same_month_brand | popularity_by_hour | popularity_by_day | device_type | carrier_name | county_fips | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 226-222@5x3-t3j-ch5 | sg:290a6c59dd3f4b28a1c086a510e40944 | None | None | None | Black Gold Coffee Company | None | None | Restaurants and Other Eating Places | Snack and Nonalcoholic Beverage Bars | ... | 11416 | 20.0 | {"<5":3,"5-10":30,"11-20":12,"21-60":14,"61-12... | {"CENEX":10,"Starbucks":9,"Safeway Pharmacy":9... | {"McDonald's":44,"Walmart":37,"Starbucks":37,"... | [9,9,9,9,9,12,11,9,15,20,15,22,16,12,11,11,8,7... | {"Monday":13,"Tuesday":20,"Wednesday":8,"Thurs... | {"android":25,"ios":14} | {"AT&T":5,"Sprint":4,"T-Mobile":9,"Verizon":14} | 53033 |
1 | 224-222@5x4-49y-8gk | sg:33d6c187ce6e419cbcb09f0b47ed6236 | None | None | None | Bite Box | None | None | Restaurants and Other Eating Places | Snack and Nonalcoholic Beverage Bars | ... | 2737 | 41.0 | {"<5":4,"5-10":28,"11-20":12,"21-60":30,"61-12... | {"Safeway Pharmacy":6,"Ace Hardware":5,"ARCO":... | {"Starbucks":41,"Shell Oil":33,"Safeway Pharma... | [8,10,10,9,8,12,13,19,22,25,30,34,39,32,21,20,... | {"Monday":17,"Tuesday":25,"Wednesday":15,"Thur... | {"android":20,"ios":51} | {"AT&T":14,"Sprint":1,"T-Mobile":14,"Verizon":35} | 53033 |
2 | 228-222@5x2-t6c-qfz | sg:442c1606a0d54d2aa017ea5a89ea8d78 | zzw-223@5x2-t6c-qfz | sg:a6e078c59b154cd59d8ac9e1f6f6311c | SG_BRAND_f116acfe9147494063e58da666d1d57e | Starbucks | Starbucks | 26209-243989 | Restaurants and Other Eating Places | Snack and Nonalcoholic Beverage Bars | ... | 5074 | 9.0 | {"<5":22,"5-10":165,"11-20":47,"21-60":50,"61-... | {"Walmart":7,"Safeway Fuel Station":4,"McDonal... | {"Walmart":55,"McDonald's":45,"Costco":35,"She... | [9,7,6,6,6,10,17,22,26,47,61,51,39,35,39,48,26... | {"Monday":52,"Tuesday":60,"Wednesday":38,"Thur... | {"android":100,"ios":160} | {"AT&T":64,"Sprint":9,"T-Mobile":62,"Verizon":98} | 53061 |
3 | zzw-22b@5x4-4yr-gtv | sg:5b5977ecc8814452bb5630d0b1b56ae1 | None | None | None | Cozy Bubble Tea | None | None | Restaurants and Other Eating Places | Snack and Nonalcoholic Beverage Bars | ... | None | 90.0 | {"<5":0,"5-10":0,"11-20":0,"21-60":1,"61-120":... | {"Hallmark Cards":20,"76":20} | {"Costco Gasoline":60,"Costco":60,"Starbucks":... | [1,1,1,1,1,1,1,1,1,1,1,0,1,0,0,0,0,0,0,1,2,2,2,2] | {"Monday":0,"Tuesday":1,"Wednesday":1,"Thursda... | {"android":4,"ios":4} | {"AT&T":1,"T-Mobile":2,"Verizon":1} | 53033 |
4 | zzw-224@5x4-4bc-d5f | sg:6af014f98e9f471b85b8e3eacb58b2f7 | None | None | SG_BRAND_f116acfe9147494063e58da666d1d57e | Starbucks | Starbucks | 3278-4859 | Restaurants and Other Eating Places | Snack and Nonalcoholic Beverage Bars | ... | 9319 | 45.0 | {"<5":36,"5-10":201,"11-20":119,"21-60":175,"6... | {"Potbelly Sandwich Works":1,"Kung Fu Tea":1,"... | {"Chevron":18,"Costco":15,"Shell Oil":13,"McDo... | [202,186,187,192,187,196,235,254,272,267,259,2... | {"Monday":181,"Tuesday":171,"Wednesday":127,"T... | {"android":136,"ios":289} | {"AT&T":41,"Sprint":2,"T-Mobile":69,"Verizon":91} | 53033 |
5 rows × 51 columns
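Columns such as popularity_by_day and popularity_by_hour in the preview above are also JSON-encoded maps; once parsed, questions like "which day is this shop busiest?" become one-liners. A hedged sketch with made-up counts:

```python
import json

# hypothetical popularity_by_day cell for one POI-month
popularity_by_day = json.loads(
    '{"Monday": 13, "Tuesday": 20, "Wednesday": 8, '
    '"Thursday": 11, "Friday": 17, "Saturday": 25, "Sunday": 19}'
)

# the day with the highest visit count
busiest_day = max(popularity_by_day, key=popularity_by_day.get)
print(busiest_day)  # Saturday
```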
Exploratory Data Analysis and Visualization¶
Visualize the coffee shops¶
from pyspark.sql.window import Window
import geopandas as gpd
import folium
w = Window().partitionBy("placekey").orderBy(f.col("date_range_start").desc())
cafes_latest = (
    sample
    # as our data improves, addresses or geocodes for a given location may change over time;
    # use a window function to keep only the most recent appearance of the given cafe
    .withColumn("row_num", f.row_number().over(w))
    .filter(f.col("row_num") == 1)
    # select the columns we need for mapping
    .select(
        "placekey",
        "location_name",
        "brands",
        "street_address",
        "city",
        "region",
        "postal_code",
        "latitude",
        "longitude",
        "open_hours",
    )
)
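The window function keeps exactly one row per placekey: the one with the latest date_range_start. For readers less familiar with Spark windows, the same keep-latest logic in plain Python over hypothetical toy rows:

```python
# toy rows: (placekey, date_range_start, street_address) -- values are made up
rows = [
    ("226-222@5x3-t3j-ch5", "2018-01-01", "old address"),
    ("226-222@5x3-t3j-ch5", "2021-06-01", "new address"),
    ("224-222@5x4-49y-8gk", "2020-03-01", "only address"),
]

latest = {}
for placekey, start, address in rows:
    # keep the row with the greatest date per placekey (ISO dates sort lexically)
    if placekey not in latest or start > latest[placekey][0]:
        latest[placekey] = (start, address)

print(latest["226-222@5x3-t3j-ch5"][1])  # new address
```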
# create a geopandas geodataframe
cafes_gdf = cafes_latest.toPandas()
cafes_gdf = gpd.GeoDataFrame(
    cafes_gdf,
    geometry=gpd.points_from_xy(cafes_gdf["longitude"], cafes_gdf["latitude"]),
    crs="EPSG:4326",
)
def map_cafes(gdf):
    # map bounds
    sw = [gdf.unary_union.bounds[1], gdf.unary_union.bounds[0]]
    ne = [gdf.unary_union.bounds[3], gdf.unary_union.bounds[2]]
    folium_bounds = [sw, ne]

    # map center: the geometries are points, so read their coordinates directly
    # (avoids the geographic-CRS centroid warning)
    x = gdf.geometry.x.iloc[0]
    y = gdf.geometry.y.iloc[0]
    map_ = folium.Map(location=[y, x], tiles="OpenStreetMap")

    for i, point in gdf.iterrows():
        tooltip = (
            f"placekey: {point['placekey']}<br>"
            f"location_name: {point['location_name']}<br>"
            f"brands: {point['brands']}<br>"
            f"street_address: {point['street_address']}<br>"
            f"city: {point['city']}<br>"
            f"region: {point['region']}<br>"
            f"postal_code: {point['postal_code']}<br>"
            f"open_hours: {point['open_hours']}"
        )
        folium.Circle(
            [point["geometry"].y, point["geometry"].x],
            radius=40,
            fill_color="blue",
            color="blue",
            fill_opacity=1,
            tooltip=tooltip,
        ).add_to(map_)

    map_.fit_bounds(folium_bounds)
    return map_
map_ = map_cafes(cafes_gdf)
map_
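Looking ahead to the first question, "how far are customers willing to travel for coffee?", the distance between a visitor's home CBG centroid and a cafe can be estimated with a haversine great-circle distance. A sketch with hypothetical Seattle-area coordinates (the two points and the comparison against distance_from_home are illustrative, not from the dataset):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

# hypothetical home-CBG centroid (downtown Seattle) and cafe location
d = haversine_m(47.6062, -122.3321, 47.6205, -122.3493)
print(round(d), "meters")  # roughly 2 km
```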