Spatial Queries: k-NN & Point-in-Radius¶
This guide covers two spatial query operations that use exact geodesic (Vincenty) distances on the WGS-84 ellipsoid:
geodesic_knn: Find the k closest points to a query locationpoint_in_radius: Find all points within a given distance of a centre
These fill an important gap: popular tools like scikit-learn's BallTree only support the haversine metric (spherical approximation), which can introduce errors of up to ~0.3% compared to true ellipsoidal distances.
k-Nearest Neighbours (geodesic_knn)¶
Why Geodesic k-NN?¶
Standard k-NN implementations use Euclidean distance or haversine (spherical) approximations. On the WGS-84 ellipsoid, these can produce incorrect nearest-neighbour rankings, especially for:
- Points near the poles (where longitude lines converge)
- Long-distance queries (where ellipsoidal effects are significant)
- High-precision applications (logistics, surveying, telecommunications)
Geodistpy's geodesic_knn uses Vincenty's inverse formula for every distance computation, ensuring sub-millimetre accuracy in the distance ranking.
Basic Usage¶
from geodistpy import geodesic_knn
# Candidate cities
cities = [
(48.8566, 2.3522), # 0: Paris
(40.7128, -74.0060), # 1: New York
(51.5074, -0.1278), # 2: London
(41.9028, 12.4964), # 3: Rome
(59.3293, 18.0686), # 4: Stockholm
]
# Find the 3 nearest cities to Berlin
query = (52.5200, 13.4050)
idx, dists = geodesic_knn(query, cities, k=3, metric='km')
print("3 nearest cities to Berlin:")
names = ['Paris', 'New York', 'London', 'Rome', 'Stockholm']
for i, (ci, d) in enumerate(zip(idx, dists)):
print(f" {i+1}. {names[ci]:>10} — {d:.1f} km")
Understanding the Output¶
geodesic_knn returns a tuple of two NumPy arrays:
indices: Shape(k,)— indices into thecandidatesarray, ordered nearest-firstdistances: Shape(k,)— corresponding distances in the specified metric
idx, dists = geodesic_knn(query, candidates, k=3, metric='km')
# idx[0] is the index of the nearest candidate
# dists[0] is the distance to the nearest candidate
# idx[1] is the second nearest, etc.
Different Metrics¶
from geodistpy import geodesic_knn
query = (52.5200, 13.4050)
cities = [(48.8566, 2.3522), (51.5074, -0.1278)]
# Same query, different units
_, d_m = geodesic_knn(query, cities, k=1, metric='meter')
_, d_km = geodesic_knn(query, cities, k=1, metric='km')
_, d_mi = geodesic_knn(query, cities, k=1, metric='mile')
_, d_nm = geodesic_knn(query, cities, k=1, metric='nmi')
print(f"Nearest: {d_m[0]:.0f} m = {d_km[0]:.1f} km = {d_mi[0]:.1f} mi = {d_nm[0]:.1f} nmi")
Error Handling¶
from geodistpy import geodesic_knn
# k must be ≥ 1
geodesic_knn((0, 0), [(1, 1)], k=0) # → ValueError
# k must not exceed number of candidates
geodesic_knn((0, 0), [(1, 1)], k=5) # → ValueError
# Invalid coordinates
geodesic_knn((95, 0), [(1, 1)], k=1) # → ValueError (lat > 90)
Point-in-Radius¶
Why Point-in-Radius?¶
The point_in_radius function finds all candidate points within a specified geodesic distance of a centre point. This is the fundamental operation behind:
- Geofencing: Trigger actions when devices enter/leave a zone
- Store locators: "Find stores within 10 km of me"
- Proximity alerts: Warn when approaching a restricted area
- Service coverage: Determine which users are within coverage radius
- Spatial filtering: Pre-filter candidates before detailed analysis
Basic Usage¶
from geodistpy import point_in_radius
# Candidate locations
locations = [
(48.8566, 2.3522), # 0: Paris (~880 km from Berlin)
(40.7128, -74.0060), # 1: New York (~6400 km from Berlin)
(51.5074, -0.1278), # 2: London (~930 km from Berlin)
(41.9028, 12.4964), # 3: Rome (~1180 km from Berlin)
(59.3293, 18.0686), # 4: Stockholm (~810 km from Berlin)
]
# Find locations within 1000 km of Berlin
centre = (52.5200, 13.4050)
idx, dists = point_in_radius(centre, locations, 1000, metric='km')
names = ['Paris', 'New York', 'London', 'Rome', 'Stockholm']
print(f"Cities within 1000 km of Berlin:")
for i, (ci, d) in enumerate(zip(idx, dists)):
print(f" {names[ci]:>10} — {d:.1f} km")
Understanding the Output¶
point_in_radius returns a tuple of two NumPy arrays:
indices: Indices intocandidatesof points within the radiusdistances: Corresponding distances to each point within the radius
idx, dists = point_in_radius(centre, candidates, radius, metric='km')
# len(idx) is the number of points found
# idx contains the original indices into candidates
# dists[i] is the distance to candidates[idx[i]]
Boundary Behaviour¶
Points exactly on the boundary (distance = radius) are included:
from geodistpy import point_in_radius, geodist
centre = (0.0, 0.0)
pt = (0.0, 1.0)
exact_dist = geodist(centre, pt, metric='km')
idx, _ = point_in_radius(centre, [pt], exact_dist, metric='km')
print(f"Included: {0 in idx}") # True (boundary is inclusive)
Empty Results¶
When no points fall within the radius, empty arrays are returned:
from geodistpy import point_in_radius
idx, dists = point_in_radius((0, 0), [(48.8566, 2.3522)], 10, metric='km')
print(f"Found: {len(idx)} points") # 0
print(f"idx: {idx}") # []
print(f"dists: {dists}") # []
Real-World Examples¶
Store Locator¶
from geodistpy import point_in_radius
# Store coordinates (lat, lon)
stores = [
(48.8534, 2.3488), # Store A — central Paris
(48.8738, 2.2950), # Store B — near Eiffel Tower
(48.8606, 2.3376), # Store C — Louvre area
(48.8156, 2.3631), # Store D — south Paris
(49.0097, 2.5479), # Store E — CDG airport area
]
# User location
user = (48.8566, 2.3522) # near Notre-Dame
# Find stores within 3 km
idx, dists = point_in_radius(user, stores, 3, metric='km')
print(f"Stores within 3 km:")
for ci, d in zip(idx, dists):
print(f" Store {chr(65 + ci)} — {d * 1000:.0f} m away")
Geofencing Alert System¶
from geodistpy import point_in_radius
# Restricted zones (centre points)
restricted_zones = [
(51.4775, -0.4614), # Heathrow Airport
(51.1537, -0.1821), # Gatwick Airport
]
zone_radius_km = 5 # 5 km exclusion zone
# Check if a drone position violates any zone
drone_pos = (51.4700, -0.4500)
for i, zone in enumerate(restricted_zones):
idx, dists = point_in_radius(zone, [drone_pos], zone_radius_km, metric='km')
if len(idx) > 0:
print(f"⚠️ ALERT: Within {dists[0]:.2f} km of restricted zone {i}")
else:
print(f"✅ Clear of restricted zone {i}")
Combining k-NN with Point-in-Radius¶
A common pattern: first filter by radius, then rank the results:
from geodistpy import point_in_radius, geodesic_knn
import numpy as np
# Many candidate locations
candidates = [
(48.8566, 2.3522), # Paris
(40.7128, -74.0060), # New York
(51.5074, -0.1278), # London
(41.9028, 12.4964), # Rome
(59.3293, 18.0686), # Stockholm
(50.0755, 14.4378), # Prague
(52.2297, 21.0122), # Warsaw
(47.4979, 19.0402), # Budapest
]
query = (52.5200, 13.4050) # Berlin
# Step 1: Filter to within 1500 km
idx_in_range, _ = point_in_radius(query, candidates, 1500, metric='km')
nearby = [candidates[i] for i in idx_in_range]
# Step 2: Find the 3 nearest among those
if len(nearby) >= 3:
knn_idx, knn_dists = geodesic_knn(query, nearby, k=3, metric='km')
print("Top 3 nearest within 1500 km:")
for ki, kd in zip(knn_idx, knn_dists):
original_idx = idx_in_range[ki]
print(f" Index {original_idx}: {kd:.1f} km")
Haversine vs Vincenty: Why It Matters¶
| Aspect | Haversine (spherical) | Vincenty (ellipsoidal) |
|---|---|---|
| Earth model | Perfect sphere (R=6371 km) | WGS-84 ellipsoid |
| Mean error | ~0.3% (~3 km per 1000 km) | ~0.000001% (~9 µm) |
| Max error | ~0.5% at equator | Sub-millimetre |
| Ranking errors | Possible for close candidates | None |
| Speed (geodistpy) | ~0.3 µs/call | ~0.5 µs/call |
For most k-NN and radius queries, the ranking is what matters. In cases where two candidates are close in distance, haversine's ~0.3% error can swap their order, returning the wrong "nearest" point. Vincenty eliminates this risk entirely.
Performance Considerations¶
| Operation | 10 candidates | 100 candidates | 1,000 candidates |
|---|---|---|---|
geodesic_knn |
~5 µs | ~50 µs | ~500 µs |
point_in_radius |
~5 µs | ~50 µs | ~500 µs |
Both functions compute distances sequentially (one Vincenty call per candidate). For very large candidate sets (>100,000 points), consider:
- Pre-filtering with a bounding box on lat/lon before calling these functions
- Using
geodist_matrixfor batch computation when you need all pairwise distances
Using Different Ellipsoids¶
Both geodesic_knn() and point_in_radius() accept an optional ellipsoid parameter. By default, WGS-84 is used. You can choose from six built-in ellipsoids or pass a custom (a, f) tuple:
from geodistpy import geodesic_knn, point_in_radius, ELLIPSOIDS
query = (52.5200, 13.4050) # Berlin
cities = [
(48.8566, 2.3522), # Paris
(51.5074, -0.1278), # London
(41.9028, 12.4964), # Rome
]
# k-NN on the GRS-80 ellipsoid
idx, dists = geodesic_knn(query, cities, k=2, metric='km', ellipsoid='GRS-80')
print(f"GRS-80 nearest: indices {idx}, distances {dists.round(1)} km")
# Point-in-radius on the Clarke 1880 ellipsoid
idx, dists = point_in_radius(query, cities, 1000, metric='km', ellipsoid='Clarke (1880)')
print(f"Clarke within 1000 km: indices {idx}")
# Custom ellipsoid
idx, dists = geodesic_knn(query, cities, k=1, metric='km', ellipsoid=(6378160.0, 1/298.25))
print(f"Custom nearest: index {idx[0]}, distance {dists[0]:.1f} km")
# See all available ellipsoids
print(ELLIPSOIDS.keys())
# dict_keys(['WGS-84', 'GRS-80', 'Airy (1830)', 'Intl 1924', 'Clarke (1880)', 'GRS-67'])
Supported named ellipsoids: WGS-84 (default), GRS-80, Airy (1830), Intl 1924, Clarke (1880), GRS-67.
Edge Cases¶
| Scenario | Behavior |
|---|---|
| k = n (all candidates) | Returns all candidates sorted by distance |
| Coincident query and candidate | Distance = 0, included in results |
| Near-antipodal points | GeographicLib fallback ensures correct distance |
| Points at the poles | Handled correctly |
| Empty radius result | Returns empty arrays (array([]), array([])) |