Mapping GDPR Article 22 to Location Tracking Systems

Article 22 applies when three conditions hold simultaneously in a spatial pipeline: automated processing, profiling intent, and a legal or similarly significant effect on a data subject. When all three hold, the system must obtain explicit consent, rely on contractual necessity, or implement architectural safeguards that route high-risk spatial decisions through human review before execution.

Core Specification: The Three-Trigger Test

Three independent Boolean conditions must all evaluate to true for Article 22 to activate. Think of them as hard gates in the compliance architecture rather than a post-hoc policy checklist:

| Trigger | Spatial manifestation | Gate value |
|---|---|---|
| Automated processing | GPS stream → rule engine / model → outcome, no human at decision point | `auto = True` |
| Profiling intent | Coordinates transformed into behavioral or demographic inference (e.g., dwell time, commute classification, sensitive-venue proximity) | `profile = True` |
| Significant effect | Output alters access, pricing tier, eligibility, employment screening, or legal standing | `effect = True` |

$$\text{Art22} = \text{auto} \wedge \text{profile} \wedge \text{effect}$$

When Art22 = True, one of three responses must be architecturally enforced:

$$\text{Safeguard} \in \{\text{explicit\_consent},\ \text{contract\_necessity},\ \text{HITL\_override}\}$$
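
The gate and its three permitted responses can be sketched as a small pure function. This is a minimal illustration, not part of the processor later in this article: the names `GateInputs`, `Safeguard`, `art22_applies`, and `check_decision` are hypothetical, and raising `PermissionError` when no safeguard is configured is an assumed design choice.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Safeguard(Enum):
    """The three responses listed above (hypothetical enum)."""
    EXPLICIT_CONSENT = "explicit_consent"
    CONTRACT_NECESSITY = "contract_necessity"
    HITL_OVERRIDE = "HITL_override"


@dataclass(frozen=True)
class GateInputs:
    auto: bool     # no human at the decision point
    profile: bool  # coordinates turned into behavioural inference
    effect: bool   # legal or similarly significant effect


def art22_applies(g: GateInputs) -> bool:
    """Art22 = auto AND profile AND effect."""
    return g.auto and g.profile and g.effect


def check_decision(g: GateInputs, safeguard: Optional[Safeguard]) -> str:
    """Return the processing route, refusing to proceed without a safeguard."""
    if not art22_applies(g):
        return "standard_processing"
    if safeguard is None:
        # Assumed policy: fail closed rather than fall through to automation.
        raise PermissionError("Article 22 applies but no safeguard is in place")
    return safeguard.value
```

Failing closed keeps the gate architectural: a missing safeguard stops the decision instead of silently allowing it.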

Worked numeric example: insurance risk scoring

Consider a fleet telematics pipeline that ingests 10 Hz GPS pings, derives a hard-braking frequency score over 30-day rolling windows, and feeds that score to an insurer’s premium-adjustment engine. Evaluation:

auto: The rule engine updates premiums nightly in a batch job — no underwriter reviews individual scores. ✓
profile: Raw coordinates are aggregated into driving-behaviour features (harsh-braking rate, high-speed-zone frequency, night-driving fraction). ✓
effect: A score above the 75th percentile triggers a premium increase of 12–18 %. This constitutes a similarly significant financial effect. ✓

All three gates pass → Art22 = True. The pipeline must route score decisions above the significance threshold to a human underwriter before the premium change executes.
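
To make the numbers concrete, the sketch below computes a 75th-percentile cutoff over a score population and the premium range implied by a 12–18 % increase. It is illustrative only: the function names, the choice of the `inclusive` quantile method, and the base premium used in the usage note are assumptions, not details of any real insurer's engine.

```python
import statistics


def percentile_cutoff(scores, pct=75):
    """Return the pct-th percentile of the score population.

    statistics.quantiles with n=100 returns 99 cut points, so index
    pct - 1 is the pct-th percentile.
    """
    return statistics.quantiles(scores, n=100, method="inclusive")[pct - 1]


def needs_underwriter(score, cutoff):
    """A score above the cutoff would trigger a premium increase, so it is
    routed to a human underwriter before the change executes."""
    return score > cutoff


def premium_range(base_premium, low=0.12, high=0.18):
    """Premium after the minimum and maximum increase from the example."""
    return base_premium * (1 + low), base_premium * (1 + high)
```

For a hypothetical base premium of 1 000, `premium_range(1000)` gives a range of roughly 1 120 to 1 180, which is the financial effect that makes the third gate pass.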

Article 22 Decision Gate

All three gates — automated processing, profiling intent, and significant effect — must be true simultaneously for Article 22 to require HITL safeguards. A "No" at any gate routes to standard processing.

Python Implementation

The processor below intercepts a location stream, generalizes coordinates by adding Gaussian noise in a projected CRS, evaluates each risk score against a calibrated significance threshold, and routes high-risk automated decisions to a human review queue. It is a focused implementation that applies the significance gate and the HITL safeguard in a single processing pass; it assumes the caller has already established that the processing is automated and involves profiling.

"""
article22_location_processor.py

Production implementation of GDPR Article 22 safeguards for spatial pipelines.
Requires: geopandas>=0.14, numpy>=1.26, pyproj>=3.6
CRS: input must be EPSG:4326 (WGS84); metric operations use EPSG:3857 (Web Mercator).
"""

from __future__ import annotations

import logging
from dataclasses import dataclass, field
from typing import Any

import geopandas as gpd
import numpy as np

logger = logging.getLogger(__name__)


@dataclass
class Article22Config:
    """Calibration parameters for an Article 22 compliant spatial pipeline.

    fuzz_radius_meters: Gaussian sigma for coordinate noise, in EPSG:3857 units
        (these exceed ground metres by a factor of 1/cos(latitude)).
        500 m is a defensible baseline; use 1 000 m for sensitive-venue proximity.
    significance_threshold: Risk-score cutoff at or above which a human review
        is mandatory. Calibrate per decision domain (e.g. 0.75 for insurance,
        0.65 for employment screening).
    k_anonymity_min: Minimum group size required before any spatial aggregate
        may be released without HITL. The processor does not read this value;
        callers must enforce it as a secondary guard.
    """
    fuzz_radius_meters: float = 500.0
    significance_threshold: float = 0.75
    k_anonymity_min: int = 5


@dataclass
class DecisionOutcome:
    decision_id: str
    status: str                          # "automated" | "pending_human_review"
    score: float
    spatial_summary: dict[str, Any] = field(default_factory=dict)
    audit_note: str = ""


class Article22LocationProcessor:
    """Apply Article 22 safeguards to a batch of scored location records.

    The three Article 22 triggers — automated processing, profiling intent,
    and significant effect — are treated as architectural gates, not policy
    documents. This class enforces the gate at the point of feature extraction,
    before any model output reaches a downstream actuator.
    """

    def __init__(self, config: Article22Config | None = None) -> None:
        self.cfg = config or Article22Config()
        self._rng = np.random.default_rng()

    # ------------------------------------------------------------------
    # Step 1 — Spatial generalisation (must run before feature extraction)
    # ------------------------------------------------------------------

    def spatial_generalize(self, gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
        """Fuzz coordinates with Gaussian noise at the configured radius.

        Projects from WGS84 (EPSG:4326) to Web Mercator (EPSG:3857) so that
        noise magnitude is in projected metres rather than fractional degrees,
        then reprojects back. Never apply degree-space noise: 0.005° of
        longitude is ~556 m at the equator but shrinks to ~390 m at 45° latitude.
        Web Mercator itself stretches distances by 1/cos(latitude), so the
        on-the-ground sigma is smaller than the configured value away from
        the equator.
        """
        gdf_m = gdf.to_crs(epsg=3857).copy()

        sigma = self.cfg.fuzz_radius_meters
        noise_x = self._rng.normal(0, sigma, len(gdf_m))
        noise_y = self._rng.normal(0, sigma, len(gdf_m))

        gdf_m["geometry"] = gpd.points_from_xy(
            gdf_m.geometry.x + noise_x,
            gdf_m.geometry.y + noise_y,
        )
        gdf_m = gdf_m.set_crs("EPSG:3857", allow_override=True)
        return gdf_m.to_crs(epsg=4326)

    # ------------------------------------------------------------------
    # Step 2 — Significance evaluation (the Article 22 "effect" gate)
    # ------------------------------------------------------------------

    def is_significant(self, score: float) -> bool:
        """Return True when the risk score crosses the significant-effect threshold.

        The threshold represents the boundary between minor personalisation
        (below) and decisions with a legal or similarly significant effect
        (above). Domain examples:
          - Credit / insurance pricing tier: 0.75
          - Employment screening shortlist:  0.65
          - Law enforcement routing:         0.60
        Re-calibrate after any model retraining or decision-type change.
        """
        return float(score) >= self.cfg.significance_threshold

    # ------------------------------------------------------------------
    # Step 3 — HITL routing for high-risk automated decisions
    # ------------------------------------------------------------------

    def route_to_hitl(
        self,
        gdf: gpd.GeoDataFrame,
        decision_id: str,
        score: float,
    ) -> DecisionOutcome:
        """Queue a high-risk spatial decision for mandatory human review.

        Persists only the generalized bounding box, never raw coordinates,
        so the audit trail itself cannot become a re-identification vector.
        """
        logger.warning(
            "Article 22 HITL gate triggered: decision_id=%s score=%.4f",
            decision_id,
            score,
        )
        return DecisionOutcome(
            decision_id=decision_id,
            status="pending_human_review",
            score=score,
            spatial_summary={
                "record_count": len(gdf),
                # total_bounds returns [minx, miny, maxx, maxy] in EPSG:4326
                "generalized_bounds_wgs84": gdf.total_bounds.tolist(),
            },
            audit_note=(
                f"Score {score:.4f} ≥ threshold {self.cfg.significance_threshold}. "
                "Awaiting human decision before downstream actuator fires."
            ),
        )

    # ------------------------------------------------------------------
    # Step 4 — End-to-end pipeline
    # ------------------------------------------------------------------

    def process_batch(
        self,
        raw_gdf: gpd.GeoDataFrame,
        model_scores: np.ndarray,
        decision_ids: list[str],
    ) -> list[DecisionOutcome]:
        """Run the full Article 22 pipeline over a batch of location records.

        Pipeline order:
          1. Generalise all coordinates before ANY feature leaves this class.
          2. For each record evaluate the significance gate.
          3. Route to HITL or mark automated accordingly.
        """
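        # A chained `a != b != c` misses the case a == b != c, so compare for
        # equality and negate instead.
        if not (len(raw_gdf) == len(model_scores) == len(decision_ids)):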
            raise ValueError(
                "raw_gdf, model_scores, and decision_ids must have equal length."
            )

        # Generalisation is unconditional — raw trajectories never leave
        # this method. Extract only aggregated, privacy-preserving features
        # (zone-level visit counts, temporal windows) after this step.
        generalized = self.spatial_generalize(raw_gdf)
        outcomes: list[DecisionOutcome] = []

        for idx, (score, did) in enumerate(zip(model_scores, decision_ids)):
            record = generalized.iloc[[idx]]
            if self.is_significant(float(score)):
                outcomes.append(self.route_to_hitl(record, did, float(score)))
            else:
                outcomes.append(
                    DecisionOutcome(
                        decision_id=did,
                        status="automated",
                        score=float(score),
                        audit_note="Score below significance threshold; automated path permitted.",
                    )
                )

        return outcomes

Key architectural notes:

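CRS transformation first: Project to EPSG:3857 before injecting noise. Adding Gaussian noise in WGS84 degree-space produces non-uniform metre distances across latitudes: a degree of longitude spans about 30 % less ground distance at 45° N than at the equator. Web Mercator is not distortion-free either, since its units exceed ground metres by a factor of 1/cos(latitude); where exact ground distances matter, scale σ accordingly or project to a local UTM zone instead.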
HITL queue SLA: The queue must enforce a maximum review time. A queue that grows without bound effectively reinstates automated decision-making and nullifies the safeguard.
Feature isolation: Only generalized, aggregated features — zone-level visit counts, temporal windows, grid-cell dwell times — may be passed to downstream ML pipelines. Raw trajectories must not reach the model-training store. This aligns with the data minimization requirement also addressed in re-identification risk assessment for geospatial datasets.
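
The latitude effects mentioned in the CRS note can be checked with a few lines of arithmetic. The sketch below uses a spherical approximation with the WGS84 semi-major axis as the radius; the function names are hypothetical and the results are approximate, not a substitute for a proper projection library.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS84 semi-major axis, used as sphere radius


def metres_per_degree_lon(lat_deg):
    """Ground length of one degree of longitude at the given latitude."""
    return math.radians(1.0) * EARTH_RADIUS_M * math.cos(math.radians(lat_deg))


def mercator_scale_factor(lat_deg):
    """Ratio of Web Mercator map distance to ground distance (spherical)."""
    return 1.0 / math.cos(math.radians(lat_deg))


def projected_sigma_for_ground_sigma(ground_sigma_m, lat_deg):
    """Sigma to apply in EPSG:3857 units to obtain ground_sigma_m on the ground."""
    return ground_sigma_m * mercator_scale_factor(lat_deg)
```

Under these assumptions, a 500 m ground-level σ at the latitude of London (about 51.5° N) corresponds to roughly 800 m in EPSG:3857 units, so applying σ = 500 directly in Web Mercator yields noticeably less ground displacement than the configured value suggests.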

Verification Snippet

After implementing the pipeline, run this script to confirm that the Article 22 safeguards are functional:

"""
Verification: confirm Article 22 implementation correctness.
Run as part of CI before any deployment that touches location data.
"""

import numpy as np
import geopandas as gpd
from shapely.geometry import Point

from article22_location_processor import Article22Config, Article22LocationProcessor


def verify_article22_pipeline() -> None:
    cfg = Article22Config(
        fuzz_radius_meters=500.0,
        significance_threshold=0.75,
        k_anonymity_min=5,
    )
    proc = Article22LocationProcessor(cfg)

    # Build a minimal test GeoDataFrame (EPSG:4326, London area)
    gdf = gpd.GeoDataFrame(
        {"id": ["r1", "r2", "r3"]},
        geometry=[Point(-0.118, 51.509), Point(-0.102, 51.515), Point(-0.130, 51.503)],
        crs="EPSG:4326",
    )
    scores = np.array([0.80, 0.50, 0.90])   # r1 and r3 should route to HITL
    ids = ["d1", "d2", "d3"]

    outcomes = proc.process_batch(gdf, scores, ids)

    # Gate 1: high-risk records must route to HITL
    hitl_ids = {o.decision_id for o in outcomes if o.status == "pending_human_review"}
    assert hitl_ids == {"d1", "d3"}, f"Expected HITL for d1,d3 — got {hitl_ids}"

    # Gate 2: automated records must be below the threshold
    for o in outcomes:
        if o.status == "automated":
            assert o.score < cfg.significance_threshold, (
                f"Automated decision {o.decision_id} has score {o.score} above threshold"
            )

    # Gate 3: verify coordinates were generalised (output ≠ input)
    gen = proc.spatial_generalize(gdf)
    raw_xs = gdf.geometry.x.values
    gen_xs = gen.geometry.x.values
    assert not np.allclose(raw_xs, gen_xs, atol=1e-6), (
        "Generalisation had no effect — noise was not applied"
    )

    # Gate 4: k-anonymity secondary guard. The processor never reads
    # k_anonymity_min, so check the batch size here. This three-record demo
    # batch is below k = 5, so report it rather than fail the run.
    if len(gdf) < cfg.k_anonymity_min:
        print(
            "WARNING: batch smaller than k_anonymity_min; "
            "aggregate before releasing spatial stats"
        )

    print("All Article 22 verification gates passed.")


if __name__ == "__main__":
    verify_article22_pipeline()

The first three verification gates confirm that (1) high-risk records route to HITL, (2) automated records are genuinely below the threshold, and (3) coordinate generalization actually moved the points. Gate 4 reports, but does not fail on, batches smaller than the minimum group size required for spatial release.
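
Because the processor itself never reads `k_anonymity_min`, a release check has to live elsewhere. One possible shape, assuming each record has already been assigned to a grid cell (the cell identifiers and function names here are hypothetical), is:

```python
from collections import Counter


def cells_below_k(cell_ids, k_min=5):
    """Return the cells whose record count is below k_min, with their counts."""
    counts = Counter(cell_ids)
    return {cell: n for cell, n in counts.items() if n < k_min}


def releasable(cell_ids, k_min=5):
    """True only if the batch is non-empty and every cell holds at least k_min records."""
    return len(cell_ids) > 0 and not cells_below_k(cell_ids, k_min)
```

A strict variant of Gate 4 could assert `releasable(...)` on a batch built with at least `k_min` records per cell, which would make the check fail the CI run instead of only printing a warning.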

Edge Cases and Adjustments

Sparse data in rural areas. When a pipeline processes trajectories from low-population zones, a 500 m fuzz radius may be insufficient — a single point of interest in a rural area can remain uniquely identifiable after noise injection. Increase σ to 1 000 m or switch to spatial fuzzing with buffer zone implementation for sensitive-venue proximity detection.
Non-uniform density zones. Urban/rural density gradients cause the same fuzz radius to provide very different privacy guarantees across a single dataset. Apply density-adaptive noise: compute local point density per H3 hexagon and scale σ inversely — denser zones require less noise to achieve indistinguishability while sparser zones require more.
Temporal windowing and re-identification. Timestamps are as identifying as coordinates. A 30-day rolling window that retains exact event times can still reconstruct home and work locations even after coordinate fuzzing. Apply temporal binning (e.g., 1-hour buckets) before feature extraction. For a worked approach, see k-anonymity grouping for location traces, which covers grouping strategies that generalize both spatial and temporal dimensions simultaneously.
CRS mismatch at pipeline boundaries. Upstream Kafka or Pub/Sub producers often emit GeoJSON in EPSG:4326. If an intermediate service reprojects to a local UTM before publishing, the processor will receive coordinates in an unexpected CRS. Always assert gdf.crs.to_epsg() == 4326 at the pipeline entry point and raise an explicit error rather than silently misapplying noise.
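
Two of the adjustments above, temporal binning and density-adaptive noise, reduce to short helper functions. The sketch below is one possible reading: the reference count of 50 points per cell, the clamping of σ between 500 m and 1 000 m, and the function names are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def bin_timestamp(ts: datetime, bucket_minutes: int = 60) -> datetime:
    """Floor a timestamp to the start of its bucket within the same day."""
    bucket = timedelta(minutes=bucket_minutes)
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = ts - day_start
    # timedelta // timedelta yields an int: the number of whole buckets elapsed.
    return day_start + (elapsed // bucket) * bucket


def adaptive_sigma(points_in_cell: int,
                   base_sigma_m: float = 500.0,
                   reference_count: int = 50,
                   max_sigma_m: float = 1000.0) -> float:
    """Scale sigma inversely with local density, clamped to [base, max].

    Cells with fewer points than reference_count get more noise; denser
    cells never drop below the baseline (an assumed floor).
    """
    count = max(points_in_cell, 1)  # avoid division by zero for empty cells
    sigma = base_sigma_m * reference_count / count
    return min(max(sigma, base_sigma_m), max_sigma_m)
```

Keeping the baseline as a floor is a conservative choice: it trades some utility in dense zones for a guarantee that no record receives less than the documented minimum noise.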

FAQ

Does Article 22 apply to geofenced access control systems?

Yes, if the geofence decision is fully automated and denies or restricts access, which is a significant effect. Automated gate-opening or area-restriction based on real-time GPS position can meet all three triggers. The system must then obtain explicit consent from every data subject, rely on contractual necessity, or route access denials through a human review queue before the restriction takes effect.

What fuzz radius satisfies Article 22 for sensitive-venue proximity?

A Gaussian σ of 500 m (in EPSG:3857) is a defensible baseline for general location profiling. For sensitive venues such as medical clinics, places of worship, and mental health providers, raise σ to 1 000 m and combine it with k-anonymity grouping (minimum k = 5) so that proximity cannot be inferred even if the noise is averaged out over repeated observations.

Is explicit consent the only lawful basis for automated location profiling?

No. Article 22(2) permits automated decisions that are necessary for a contract between the controller and data subject, or that are authorised by EU or member-state law with appropriate safeguards. Explicit consent under Article 22(2)(c), together with Article 9(2)(a) where special-category data is involved, is one valid basis but not the only one; the contractual necessity basis is commonly used in insurance telematics.

How often should the significance threshold be re-calibrated?

Re-calibrate whenever the downstream decision type changes (for example, adding insurance pricing to an existing navigation product), after any model retraining that shifts score distributions, or at minimum annually as part of the Data Protection Impact Assessment review cycle. Threshold drift — where a previously safe score value now crosses into significant-effect territory due to model updates — is a common audit finding.
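
Threshold drift can be monitored with a simple comparison of how many scores reach the threshold before and after a retraining run. This is a rough sketch under assumed names and an assumed tolerance of five percentage points; a real review would also look at the full score distribution.

```python
def fraction_at_or_above(scores, threshold):
    """Share of scores that would be routed to human review."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(s >= threshold for s in scores) / len(scores)


def drift_detected(old_scores, new_scores, threshold=0.75, tolerance=0.05):
    """Flag a re-calibration when the HITL share shifts by more than tolerance."""
    old_share = fraction_at_or_above(old_scores, threshold)
    new_share = fraction_at_or_above(new_scores, threshold)
    return abs(new_share - old_share) > tolerance
```

Running a check like this after every retraining gives an early signal that the cutoff no longer separates the same population it was calibrated on.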
