Setting Epsilon Values for Spatial Heatmap Generation

For most spatial heatmaps, an epsilon (ε) of 0.1–1.0 per layer delivers strong differential privacy guarantees while preserving recognisable spatial patterns; the exact value depends on your grid resolution, noise mechanism, total privacy budget, and downstream regulatory requirements.

Core Calculation

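In differential privacy, ε bounds how much one individual's presence can change the output distribution: for two adjacent datasets (differing in a single individual), the probability of any output differs by a factor of at most $e^{\varepsilon}$. For a spatial heatmap built by binning point records into a grid and counting them, the global sensitivity is: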

$$\Delta = 1$$

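because adding or removing one individual shifts exactly one cell’s count by at most one, provided each individual contributes at most one record. If a person can appear several times (for example, repeated GPS pings), cap contributions at m records per person and use Δ = m. Given Δ and a chosen ε, the noise scale for the two standard mechanisms is: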

Laplace mechanism (strict ε-DP):

$$b = \frac{\Delta}{\varepsilon}$$

where $b$ is the scale parameter of the zero-mean Laplace distribution added to each cell count.

Gaussian mechanism ((ε, δ)-DP):

$$\sigma = \frac{\Delta \sqrt{2 \ln(1.25/\delta)}}{\varepsilon}$$

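where δ is the relaxed failure probability (typically $10^{-5}$ to $10^{-7}$). This classical calibration is proved only for 0 < ε < 1; for ε ≥ 1, use the analytic Gaussian mechanism or a zCDP-based calibration instead.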
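As a rough sketch, the two calibrations above can be computed directly. The helper names `laplace_scale` and `gaussian_sigma` are illustrative, not part of any library, and the Gaussian helper assumes the classical calibration, which is only proved for 0 < ε < 1.

```python
import math


def laplace_scale(epsilon: float, sensitivity: float = 1.0) -> float:
    """Scale b of the zero-mean Laplace noise for pure epsilon-DP."""
    if epsilon <= 0:
        raise ValueError("epsilon must be > 0")
    return sensitivity / epsilon


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float = 1.0) -> float:
    """Standard deviation of the Gaussian noise (classical calibration)."""
    if not 0 < epsilon < 1:
        raise ValueError("classical calibration assumes 0 < epsilon < 1")
    if not 0 < delta < 1:
        raise ValueError("delta must be in (0, 1)")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


print("Laplace b at eps=0.5:", laplace_scale(0.5))
print("Gaussian sigma at eps=0.3, delta=1e-6:", round(gaussian_sigma(0.3, 1e-6), 2))
```

At comparable ε, the Gaussian standard deviation is noticeably larger than the Laplace scale for a single count query; its advantage shows up mainly when many releases are composed.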

Parameter Reference Table

Parameter	Typical Range	Effect on Heatmap
ε (per layer)	0.01 – 2.0	Lower ε = more noise; higher ε = sharper spatial patterns
δ (Gaussian only)	10⁻⁷ – 10⁻⁵	Smaller δ = tighter approximation to pure DP
Δ (sensitivity)	1 (unweighted)	Equals maximum per-individual weight for weighted records
Grid cell size	50 m – 1 km	Coarser cells accumulate more counts, reducing relative noise impact

Worked Numeric Example

Suppose you are generating a 100 m × 100 m grid of pedestrian counts from points recorded in WGS 84 (EPSG:4326) and reprojected to UTM Zone 33N (EPSG:32633) for correct metre-based cell sizing. A busy intersection cell contains 120 raw counts; a peripheral cell contains 3.

With ε = 0.5 and the Laplace mechanism:

$$b = \frac{1}{0.5} = 2.0$$

The busy cell receives noise drawn from Laplace(0, 2): 95% of draws fall within ±b ln 20 ≈ ±6.0 counts, a ~5% relative error.
The peripheral cell receives noise from the same distribution, a potential ±200% relative error.

This illustrates why sparse cells are the hardest to publish usefully at any protective ε: a released value of 3 ± 6 cannot distinguish a true count near zero from one several times larger. The mitigation is cell suppression (zero out noisy counts below a noise-adjusted threshold) or coarser binning.

With ε = 0.1:

$$b = \frac{1}{0.1} = 10.0$$

The busy cell now has a 95% noise window of ±30 counts (~25% relative error). For a public health or demographic overlay, this level of noise may be required to prevent re-identification of individuals from spatial density signatures.
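The noise windows in this example follow from the Laplace tail identity P(|X| > t) = exp(−t/b), which gives t = b · ln(1/(1 − p)) for a confidence level p. The sketch below recomputes them for both ε values; `laplace_error_bound` is an illustrative helper name, and Δ = 1 is assumed as in the example.

```python
import math


def laplace_error_bound(epsilon: float, confidence: float = 0.95,
                        sensitivity: float = 1.0) -> float:
    """Half-width t such that P(|noise| <= t) = confidence for Laplace(0, b)."""
    b = sensitivity / epsilon
    # For Laplace(0, b): P(|X| > t) = exp(-t / b), so t = b * ln(1 / (1 - p)).
    return b * math.log(1.0 / (1.0 - confidence))


for eps in (0.5, 0.1):
    t = laplace_error_bound(eps)
    for true_count in (120, 3):
        print(f"eps={eps}: count {true_count} +/- {t:.1f} "
              f"({100 * t / true_count:.0f}% relative)")
```

The same helper can be used to pick a suppression threshold, for instance by zeroing cells whose noisy value is below the 95% half-width.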

Python Implementation

The function below handles both the Laplace and Gaussian mechanisms for a pre-binned spatial grid, with CRS-aware usage notes. Input counts are assumed to have been computed in a projected CRS (e.g., EPSG:32633 / UTM Zone 33N) so that cell areas are uniform.

import numpy as np
from typing import Literal


def private_spatial_heatmap(
    counts: np.ndarray,
    epsilon: float,
    delta: float = 0.0,
    mechanism: Literal["laplace", "gaussian"] = "laplace",
    sensitivity: float = 1.0,
    suppress_threshold: float | None = None,
    seed: int | None = None,
) -> np.ndarray:
    """
    Apply differential privacy noise to a pre-binned spatial count grid.

    The input `counts` should be a 2-D array whose cells represent equal-area
    spatial bins computed in a projected CRS (e.g., a local UTM zone, or an
    equal-area CRS such as EPSG:3035 for Europe). Avoid binning directly on a
    geographic (WGS 84 / EPSG:4326) grid: cell areas vary with latitude, which
    distorts density comparisons even though the sensitivity is unchanged.

    Args:
        counts:           2-D array of non-negative integer bin counts.
        epsilon:          Privacy budget for this single query (ε > 0).
        delta:            (ε, δ)-DP relaxation; ignored for Laplace, required
                          for Gaussian. Typical values: 1e-5 to 1e-7.
        mechanism:        "laplace" for strict ε-DP; "gaussian" for (ε, δ)-DP.
        sensitivity:      Global sensitivity Δ. Use 1.0 for unweighted point
                          counts; set to max individual weight for weighted data.
        suppress_threshold: If set, cells whose noisy value falls below this
                          threshold are zeroed out (post-processing; no extra
                          budget consumed). Helps mask near-zero cells that
                          could leak sparse population locations.
        seed:             RNG seed for reproducibility. Omit in production so
                          each release draws fresh randomness.

    Returns:
        Noisy count array (float). Values are clamped to ≥ 0 as valid
        post-processing that does not consume additional privacy budget.

    Raises:
        ValueError: On invalid epsilon, delta, or mechanism values.
    """
    if epsilon <= 0:
        raise ValueError(f"epsilon must be > 0, got {epsilon}")

    rng = np.random.default_rng(seed)

    if mechanism == "laplace":
        # b = Δ / ε; smaller ε → larger scale → more noise
        scale = sensitivity / epsilon
        noise = rng.laplace(loc=0.0, scale=scale, size=counts.shape)

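    elif mechanism == "gaussian":
        if not 0.0 < delta < 1.0:
            raise ValueError(
                f"delta must be in (0, 1) for Gaussian mechanism, got {delta}"
            )
        if epsilon >= 1.0:
            # The classical calibration below is only proved for 0 < ε < 1.
            raise ValueError(
                f"classical Gaussian calibration requires epsilon < 1, got {epsilon}"
            )
        # σ = Δ * sqrt(2 * ln(1.25/δ)) / ε
        # This is the calibration from Dwork & Roth (2014) Theorem A.1.
        sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
        noise = rng.normal(loc=0.0, scale=sigma, size=counts.shape)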

    else:
        raise ValueError(f"mechanism must be 'laplace' or 'gaussian', got {mechanism!r}")

    noisy = np.maximum(counts.astype(float) + noise, 0.0)

    # Optional suppression: zero out cells whose noisy count is below threshold.
    # This is deterministic post-processing and does not violate DP guarantees.
    if suppress_threshold is not None:
        noisy[noisy < suppress_threshold] = 0.0

    return noisy


# ---------------------------------------------------------------------------
# Example: 100 m × 100 m pedestrian count grid, UTM Zone 33N (EPSG:32633)
# ---------------------------------------------------------------------------
if __name__ == "__main__":
    raw_grid = np.array([
        [120,  85,  3],
        [  4,  47, 22],
        [  1,   9, 61],
    ])

    # Laplace at ε = 0.5 — suitable for aggregated mobility data
    private_grid_lap = private_spatial_heatmap(
        raw_grid, epsilon=0.5, mechanism="laplace", seed=42
    )

    # Gaussian at ε = 0.3, δ = 1e-6 — better utility for multi-layer exports
    private_grid_gau = private_spatial_heatmap(
        raw_grid, epsilon=0.3, delta=1e-6,
        mechanism="gaussian", suppress_threshold=2.0, seed=42
    )
    print("Laplace noisy grid:\n", private_grid_lap.round(1))
    print("Gaussian noisy grid (suppressed):\n", private_grid_gau.round(1))

Verification Snippet

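Run these checks after generating the private grid and before publishing. They compare the noisy grid against the raw counts, so they test utility and basic noise behaviour rather than the privacy guarantee itself, and their output should stay internal: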

import numpy as np

def verify_heatmap_privacy(
    raw: np.ndarray,
    noisy: np.ndarray,
    epsilon: float,
    high_density_rel_error_threshold: float = 0.20,
    low_density_cutoff: int = 10,
) -> dict:
    """
    Spot-check that the noisy heatmap meets basic utility and noise expectations.

    Returns a dict with keys: mean_abs_error, high_density_relative_error,
    high_density_ok, noise_symmetric. Both boolean flags should be True and
    the error figures within tolerance before publishing.
    """
    abs_error = np.abs(noisy - raw)
    mean_abs_error = float(abs_error.mean())

    # High-density cells should have low relative error
    high_mask = raw >= low_density_cutoff
    if high_mask.any():
        rel_err = (abs_error[high_mask] / raw[high_mask]).mean()
        high_density_ok = bool(rel_err <= high_density_rel_error_threshold)
    else:
        rel_err = float("nan")
        high_density_ok = True  # no high-density cells to check

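    # Noise should be centred near zero (no systematic bias). The 2/ε bound is a
    # loose tolerance of two Laplace scales at Δ = 1; clamping to ≥ 0 adds a
    # small positive bias in sparse cells, so expect a mean slightly above zero.
    noise = noisy - raw
    noise_symmetric = bool(abs(noise.mean()) < 2.0 / epsilon)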

    return {
        "mean_abs_error": round(mean_abs_error, 3),
        "high_density_relative_error": round(rel_err, 3) if not np.isnan(rel_err) else "n/a",
        "high_density_ok": high_density_ok,
        "noise_symmetric": noise_symmetric,
    }


# Usage
raw_grid = np.array([[120, 85, 3], [4, 47, 22], [1, 9, 61]])
noisy_grid = private_spatial_heatmap(raw_grid, epsilon=0.5, seed=42)  # from above
report = verify_heatmap_privacy(raw_grid, noisy_grid, epsilon=0.5)
print(report)
# Typically high_density_ok=True and noise_symmetric=True for ε ≥ 0.3 on this
# grid; the exact figures depend on the random draw.

Additionally, compute Global Moran’s I on both the raw and noisy grids using esda (part of the PySAL ecosystem). If the Moran’s I drops by more than 30%, the spatial autocorrelation structure that makes the heatmap meaningful has been damaged — consider raising ε or coarsening the grid.
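If adding esda as a dependency is inconvenient, the comparison can be approximated with a small NumPy-only sketch. It assumes binary rook contiguity (cells sharing an edge are neighbours) on a regular grid; `morans_i_rook` and `morans_i_drop` are illustrative names, and the values may differ slightly from a PySAL result that uses a different weight standardisation.

```python
import numpy as np


def morans_i_rook(grid: np.ndarray) -> float:
    """Global Moran's I on a regular grid with binary rook-contiguity weights."""
    z = grid.astype(float) - grid.mean()
    denom = (z ** 2).sum()
    if denom == 0:
        return float("nan")  # constant grid: Moran's I is undefined
    # Cross-products over horizontally and vertically adjacent cell pairs.
    pair_sum = (z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum()
    n_pairs = z.shape[0] * (z.shape[1] - 1) + (z.shape[0] - 1) * z.shape[1]
    # Symmetric weights count each pair twice in both the numerator and the
    # total weight, so the factor of 2 cancels.
    return float((z.size / n_pairs) * (pair_sum / denom))


def morans_i_drop(raw: np.ndarray, noisy: np.ndarray) -> float:
    """Relative drop in Moran's I from raw to noisy (positive = structure lost)."""
    i_raw, i_noisy = morans_i_rook(raw), morans_i_rook(noisy)
    if i_raw == 0 or np.isnan(i_raw):
        return float("nan")
    return (i_raw - i_noisy) / abs(i_raw)


raw = np.array([[120, 85, 3], [4, 47, 22], [1, 9, 61]])
rng = np.random.default_rng(0)
noisy = np.maximum(raw + rng.laplace(scale=2.0, size=raw.shape), 0.0)
print("Moran's I raw:  ", round(morans_i_rook(raw), 3))
print("Moran's I noisy:", round(morans_i_rook(noisy), 3))
print("relative drop:  ", round(morans_i_drop(raw, noisy), 3))
```

A 3 × 3 grid is far too small for a stable Moran's I; it is used here only to keep the sketch aligned with the example grid. Like the other checks, this one reads the raw counts and should stay internal.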

Edge Cases and Adjustments

Sparse cells in rural or peripheral zones. When most cells have counts below 5, even a modest b = 2 (ε = 0.5) can produce relative errors exceeding 100%. Either aggregate to a coarser grid before injecting noise, or suppress low cells afterwards with suppress_threshold. Alternatively, jitter the raw points directly, as described in Laplace and Gaussian Noise for Coordinate Data, and bin the jittered coordinates instead of adding noise to counts.
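Non-uniform density zones (urban core vs. rural fringe). A single ε adds the same absolute noise to every cell, so dense urban cells keep more accuracy than they need while sparse rural cells are swamped; the formal guarantee is identical everywhere, but the utility is not. Consider adaptive binning: use finer cells in high-density zones and coarser cells where counts are naturally low, then apply the same ε. Derive the bin layout from public information (for example land use or census population), because a layout chosen from the private counts is itself a data-dependent release that consumes budget. The sensitivity remains Δ = 1 regardless of cell size for unweighted counts.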
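Temporal windowing and repeated exports. Publishing weekly or monthly snapshots of the same grid exhausts the budget rapidly under naive sequential composition. Switch to zero-concentrated DP (zCDP) accounting or the moments accountant to track cumulative loss tightly. Under the advanced composition theorem, k snapshots that are each (ε, δ)-DP together satisfy $(\varepsilon_{\text{total}}, k\delta + \delta')$-DP with $\varepsilon_{\text{total}} = \varepsilon\sqrt{2k\ln(1/\delta')} + k\varepsilon(e^{\varepsilon} - 1) \approx \varepsilon\sqrt{2k\ln(1/\delta')} + k\varepsilon^2$ for small ε. This beats the naive $k\varepsilon$ only when k is large and ε is small; for a handful of releases the naive bound is tighter, so report the smaller of the two.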
CRS and projection gotchas. Sensitivity Δ = 1 holds for unweighted point counts regardless of CRS, but cell area matters for interpreting noise magnitude. Always reproject to a local equal-area or UTM projection (e.g., EPSG:32633 for central Europe) before binning. Feeding geographic coordinates (EPSG:4326) into a rectilinear grid produces cells with wildly varying real-world areas at different latitudes, making utility comparisons across the grid meaningless.
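For the temporal-windowing case above, it helps to compute the naive and advanced composition bounds side by side before committing to a release schedule. The sketch below uses the advanced composition theorem in its exact form, with e^ε − 1 in place of the small-ε approximation ε; the function names and the choice δ′ = 1e-5 are illustrative assumptions.

```python
import math


def naive_composition(epsilon: float, k: int) -> float:
    """Total epsilon for k releases under basic sequential composition."""
    return k * epsilon


def advanced_composition(epsilon: float, k: int, delta_prime: float) -> float:
    """Total epsilon for k releases under the advanced composition theorem.

    The result holds with an additional failure probability delta_prime.
    """
    return (epsilon * math.sqrt(2.0 * k * math.log(1.0 / delta_prime))
            + k * epsilon * math.expm1(epsilon))


for eps, k in ((0.5, 5), (0.01, 1000)):
    naive = naive_composition(eps, k)
    adv = advanced_composition(eps, k, delta_prime=1e-5)
    print(f"eps={eps}, k={k}: naive={naive:.2f}, advanced={adv:.2f}, "
          f"use {min(naive, adv):.2f}")
```

With a few releases at moderate ε the naive sum is the smaller figure; the advanced bound only wins for long series of small-ε releases, which is where zCDP-style accounting also pays off.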

Frequently Asked Questions

What epsilon value satisfies GDPR for a public spatial heatmap?

GDPR does not prescribe a specific ε. Regulators expect documented risk assessments and empirical utility-privacy tradeoff analyses. Most public-sector deployments committed to GDPR and CCPA compliance for location data target ε ≤ 1.0 per query and maintain an auditable privacy budget ledger covering all published layers. For heatmaps containing health, demographic, or vulnerable-population data, ε ≤ 0.3 is common.

How does ε change at coarser grid resolutions?

Coarser cells accumulate more genuine counts per bin, so the signal-to-noise ratio improves at the same ε. You can lower ε (strengthen privacy) at coarser resolutions while keeping hotspot topology intact, or hold ε fixed and take the gain as extra accuracy. This is the key lever for balancing privacy against visual resolution in public dashboards.
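A small sketch of the effect, assuming unweighted counts (Δ = 1), Laplace noise at ε = 0.5, and a grid whose dimensions divide evenly by the aggregation factor. `coarsen` is an illustrative helper, and the expected absolute Laplace error (equal to b) is used as a simple stand-in for noise magnitude.

```python
import numpy as np


def coarsen(counts: np.ndarray, factor: int) -> np.ndarray:
    """Sum non-overlapping factor x factor blocks of a 2-D count grid."""
    rows, cols = counts.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    blocks = counts.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.sum(axis=(1, 3))


fine = np.array([
    [3, 1, 0, 2],
    [4, 2, 1, 1],
    [0, 5, 7, 9],
    [2, 3, 8, 6],
])
coarse = coarsen(fine, 2)

b = 1.0 / 0.5  # Laplace scale at epsilon = 0.5 with sensitivity 1
print("coarse grid:\n", coarse)
# Expected |noise| is b for every cell, so relative noise is b / count.
print("median relative noise, fine:  ", np.median(b / np.maximum(fine, 1)))
print("median relative noise, coarse:", np.median(b / np.maximum(coarse, 1)))
```

The noise scale is identical at both resolutions; only the counts it is compared against grow, which is why the relative error shrinks.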

Can I reuse the same epsilon across multiple heatmap exports?

No. Each export consumes budget from the same underlying dataset. Under naive sequential composition, k exports at ε each yield total loss ε_total = kε, so five weekly exports at ε = 0.5 accumulate to ε_total = 2.5, a weak guarantee. For long series of small-ε releases, advanced composition or zCDP accounting gives tighter bounds; for a handful of releases the naive sum is often already the tighter figure. Either way, enforce a hard per-year budget cap with a centralised privacy budget ledger.
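A budget ledger can be as simple as a list of charges checked against a cap. The following is a minimal sketch under naive sequential composition; the class name `PrivacyBudgetLedger`, its fields, and the cap of 2.0 are hypothetical, and a production ledger would also need persistence and access control.

```python
from dataclasses import dataclass, field


@dataclass
class PrivacyBudgetLedger:
    """Tracks cumulative epsilon under naive sequential composition."""
    cap: float
    entries: list[tuple[str, float]] = field(default_factory=list)

    @property
    def spent(self) -> float:
        return sum(eps for _, eps in self.entries)

    def charge(self, label: str, epsilon: float) -> float:
        """Record a release and return the remaining budget.

        Refuses the release if it would push total spend past the cap.
        """
        if epsilon <= 0:
            raise ValueError("epsilon must be > 0")
        if self.spent + epsilon > self.cap + 1e-12:
            raise RuntimeError(
                f"release {label!r} needs {epsilon}, only "
                f"{self.cap - self.spent:.3f} of {self.cap} remains"
            )
        self.entries.append((label, epsilon))
        return self.cap - self.spent


ledger = PrivacyBudgetLedger(cap=2.0)
for week in range(1, 5):
    remaining = ledger.charge(f"week-{week}", 0.5)
    print(f"week {week}: spent {ledger.spent:.1f}, remaining {remaining:.1f}")
# A fifth export at epsilon = 0.5 would exceed the cap and raise RuntimeError.
```

Charging before the noisy grid is generated, rather than after, avoids publishing a release that the ledger would have refused.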

Should I apply kernel density smoothing before or after noise injection?

Always before. Smoothing raw counts before noise injection does not violate DP, provided the noise is calibrated to the sensitivity of the smoothed surface rather than of the raw counts: one record now touches every cell under the kernel, but with a fixed, non-negative kernel normalised to sum to 1 the L1 sensitivity stays at most Δ. Smoothing after noise injection is valid post-processing but can misrepresent spatial uncertainty: the smoothed surface obscures where the noise is large relative to the signal (sparse cells) versus small (dense cells), misleading downstream analysts about data reliability. If post-injection smoothing is operationally unavoidable, use a wider bandwidth and document the decision in your privacy audit log.
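When smoothing comes first, the noise has to be calibrated to how much one record can change the smoothed surface. The sketch below checks this empirically for a fixed, normalised 3 × 3 kernel with zero padding, standing in for a full KDE; the kernel weights, the helper name `smooth3x3`, and the random test grid are all illustrative assumptions.

```python
import numpy as np


def smooth3x3(counts: np.ndarray) -> np.ndarray:
    """Smooth with a fixed, normalised 3x3 kernel, zero-padded at the edges."""
    kernel = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float)
    kernel /= kernel.sum()  # weights sum to 1: one record spreads mass <= 1
    padded = np.pad(counts.astype(float), 1)
    rows, cols = counts.shape
    out = np.zeros((rows, cols), dtype=float)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + rows, dj:dj + cols]
    return out


rng = np.random.default_rng(0)
counts = rng.integers(0, 50, size=(6, 6))

# Empirical L1 sensitivity: add one record to each cell in turn.
worst = 0.0
for i in range(counts.shape[0]):
    for j in range(counts.shape[1]):
        neighbour = counts.copy()
        neighbour[i, j] += 1
        change = np.abs(smooth3x3(neighbour) - smooth3x3(counts)).sum()
        worst = max(worst, change)
print("largest L1 change from one record:", round(worst, 6))

# With L1 sensitivity at most 1, Laplace noise at scale 1/epsilon still applies.
epsilon = 0.5
private_surface = smooth3x3(counts) + rng.laplace(scale=1.0 / epsilon,
                                                  size=counts.shape)
```

The bound relies on the kernel and its bandwidth being fixed in advance; a bandwidth tuned on the private data would be a data-dependent choice and would need its own share of the budget.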

Privacy Budget Allocation for Spatial Queries — composition strategies for multi-layer and time-series map releases
Accuracy vs. Utility Tradeoffs in Geospatial DP — measuring and optimising the privacy-utility balance for spatial analytics
Laplace and Gaussian Noise for Coordinate Data — applying noise directly to raw coordinates before binning
Re-identification Risk Assessment for Geospatial Datasets — quantifying how spatial density signatures enable individual re-identification
GDPR and CCPA Compliance Mapping for Location Data — regulatory requirements that shape ε selection and audit documentation
