Building a Privacy Risk Matrix for Municipal GIS

A privacy risk matrix for municipal GIS assigns each spatial dataset a composite score derived from re-identification likelihood and harm impact, then maps that score to one of four actionable tiers that dictate anonymization requirements, access controls, and publishing workflows.

Core Calculation: Likelihood × Impact

The matrix operates on two orthogonal axes. Both are scored 1–5 by a cross-functional review team using the criteria below. The composite risk score is their product, optionally scaled by a sensitivity multiplier when the data intersects regulated categories.

$$\text{Risk Score} = L \times (I \times m)$$

where:

Symbol	Meaning	Range
$$L$$	Likelihood of re-identification	1 (near-impossible) – 5 (trivial)
$$I$$	Impact of harm if re-identified	1 (negligible) – 5 (critical)
$$m$$	Sensitivity multiplier	1.0 (default) – 1.5 (health / safety / financial data at precise coordinates)

Scoring the Likelihood axis. Spatial re-identification risk is driven by coordinate precision, attribute uniqueness, linkage potential, and temporal granularity. Score L by adding the increment for each factor, keeping the total within 1–5:

Coordinate precision: Block-group centroids → L+0; parcel centroids → L+1; sub-metre GPS → L+2
Attribute uniqueness: Common codes (L+0) vs. rare permit types or sub-5-count incident categories (L+2)
Linkage potential: No overlap with public registries (L+0); joinable to voter rolls, property-tax rolls, or building-permit databases (L+1 to +2)
Temporal granularity: Annual static snapshots (L+0); real-time sensor feeds (L+2)
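The increments above do not state a starting value or a cap. A minimal sketch of the summation, assuming a baseline of 1 and clamping the total to the 1–5 range; the function name `score_likelihood` and its parameter names are hypothetical, and the review team would still choose each increment by judgment.

```python
def score_likelihood(
    precision_bump: int,
    uniqueness_bump: int,
    linkage_bump: int,
    temporal_bump: int,
    baseline: int = 1,  # assumption: the source does not state a baseline
) -> int:
    """Sum the per-factor increments and clamp the result to 1-5."""
    bumps = (precision_bump, uniqueness_bump, linkage_bump, temporal_bump)
    for bump in bumps:
        if not 0 <= bump <= 2:
            raise ValueError(f"each increment must be 0-2, got {bump}")
    return max(1, min(5, baseline + sum(bumps)))


# Parcel centroids (+1), common codes (+0), joinable to tax rolls (+2),
# annual snapshot (+0): 1 + 3 = 4.
parcel_layer_l = score_likelihood(1, 0, 2, 0)
```

Clamping means a layer that maxes out several factors still scores 5, which keeps L compatible with the 1–5 range the composite formula expects.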

Scoring the Impact axis. Impact reflects the severity of harm if an individual or sensitive facility is identified from the data:

1 (Negligible): Street centrelines, public park boundaries, topographic contours
2–3 (Moderate): Utility outage zones, aggregated demographic summaries, traffic-count stations
4 (High): Property-tax delinquency, code-enforcement complaints, business-licensing locations
5 (Critical): Public-health incident locations, domestic-violence shelter proximity, juvenile-facility service areas, critical-infrastructure vulnerability points

Applying the sensitivity multiplier. When a dataset carries health, safety, or financial attributes and is spatially resolved to parcel or sub-parcel precision, apply $$m = 1.5$$. This prevents a moderate-looking baseline score from masking regulatory-grade risk. For example, a code-enforcement complaint layer scoring $$L=4, I=3$$ yields a baseline of 12 (Medium); with the multiplier: $$4 \times (3 \times 1.5) = 18$$ (High), triggering mandatory legal sign-off.
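The multiplier rule can be written as a small predicate. This is a sketch under stated assumptions: `select_multiplier` and its two boolean flags are hypothetical names, and deciding whether a layer counts as regulated or parcel-precision remains a reviewer's judgment.

```python
def select_multiplier(has_regulated_attributes: bool, parcel_or_finer: bool) -> float:
    """Return 1.5 only when both conditions of the rule hold, else 1.0."""
    return 1.5 if (has_regulated_attributes and parcel_or_finer) else 1.0


# The code-enforcement example from the text: L=4, I=3.
baseline = 4 * (3 * select_multiplier(False, True))   # multiplier stays 1.0
elevated = 4 * (3 * select_multiplier(True, True))    # multiplier becomes 1.5
```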

Risk tier table

Tier	Score Range	Required Controls	Publishing Workflow
Low	1–4	Standard metadata, open access	Direct publication
Medium	5–12	Attribute suppression, coordinate rounding	Internal review required
High	13–19	Aggregation to census blocks, role-based access	Legal / privacy sign-off
Critical	20–25	Full anonymization or restricted internal use only	Executive approval + audit log
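In a pipeline, the tier table can be carried as a lookup so the required controls travel with the score. A sketch only: `TierPolicy`, `TIER_POLICIES`, and `policy_for_score` are illustrative names, and placing fractional scores (possible once the multiplier is applied) in the next tier up is an assumption that mirrors the `<=` comparisons in the implementation further down this page.

```python
from typing import NamedTuple


class TierPolicy(NamedTuple):
    name: str
    upper_bound: float  # inclusive upper score bound for the tier
    controls: str
    workflow: str


TIER_POLICIES = (
    TierPolicy("Low", 4, "Standard metadata, open access", "Direct publication"),
    TierPolicy("Medium", 12, "Attribute suppression, coordinate rounding",
               "Internal review required"),
    TierPolicy("High", 19, "Aggregation to census blocks, role-based access",
               "Legal / privacy sign-off"),
    TierPolicy("Critical", float("inf"),
               "Full anonymization or restricted internal use only",
               "Executive approval + audit log"),
)


def policy_for_score(score: float) -> TierPolicy:
    """Return the first tier whose inclusive upper bound covers the score."""
    if not score >= 1:  # also rejects NaN
        raise ValueError(f"score must be >= 1, got {score}")
    for policy in TIER_POLICIES:
        if score <= policy.upper_bound:
            return policy
    raise AssertionError("unreachable: the last bound is infinite")
```

The infinite bound on the last tier is what absorbs multiplier-elevated scores above 25.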

Worked numeric example

A dataset of juvenile-probation service-area polygons in UTM Zone 18N (EPSG:32618) presents as:

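$$L = 4$$ (parcel-precision boundaries joinable to address registries)
$$I = 5$$ (juvenile-facility service areas fall in the Critical impact band)
$$m = 1.5$$ (health and safety data at precise coordinates)

$$\text{Risk Score} = 4 \times (5 \times 1.5) = 30$$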

The score exceeds 25 (the theoretical maximum at $$m=1.0$$) because the multiplier can push high-severity cases past the ceiling. Clamp to Critical and apply the full restricted-access workflow. Document the multiplier rationale explicitly in the audit record.

Python Implementation

The function below integrates into any geopandas/pandas ETL pipeline. It accepts per-row metadata from a GIS layer inventory and returns the assigned tier. Layers should already be in a projected metric CRS (e.g., EPSG:32618 for UTM Zone 18N) when coordinate precision is evaluated in the upstream Likelihood scoring step.

from __future__ import annotations

import pandas as pd


def calculate_risk_tier(
    likelihood: int,
    impact: int,
    impact_multiplier: float = 1.0,
) -> str:
    """
    Assign a privacy risk tier for a municipal GIS dataset.

    The composite score is: likelihood × (impact × impact_multiplier).
    Scores above 25 (possible when multiplier > 1.0) are clamped to Critical.

    Args:
        likelihood: Integer 1–5 measuring re-identification ease.
            1 = near-impossible (aggregate, no linkage potential).
            5 = trivial (sub-metre GPS joinable to public registry).
        impact: Integer 1–5 measuring severity of harm if re-identified.
            1 = negligible public infrastructure.
            5 = critical: health, safety, or juvenile records.
        impact_multiplier: Sensitivity uplift when health/safety/financial
            attributes intersect parcel-precision coordinates. Typical
            values: 1.0 (default) or 1.5 (regulated categories).

    Returns:
        One of "Low", "Medium", "High", or "Critical".

    Raises:
        ValueError: If likelihood or impact is outside 1–5, or if
            impact_multiplier is below 1.0.
    """
    if not (1 <= likelihood <= 5):
        raise ValueError(f"likelihood must be 1–5, got {likelihood}")
    if not (1 <= impact <= 5):
        raise ValueError(f"impact must be 1–5, got {impact}")
    if impact_multiplier < 1.0:
        raise ValueError("impact_multiplier must be ≥ 1.0")

    adjusted_impact = impact * impact_multiplier
    score = likelihood * adjusted_impact

    if score <= 4:
        return "Low"
    elif score <= 12:
        return "Medium"
    elif score <= 19:
        return "High"
    else:
        return "Critical"


def score_gis_inventory(inventory: pd.DataFrame) -> pd.DataFrame:
    """
    Apply risk scoring to a GIS layer inventory DataFrame.

    Expected columns:
        layer_name (str): Human-readable dataset identifier.
        crs_epsg (int): EPSG code confirming metric projection for
            coordinate-precision assessment (e.g., 32618 for UTM 18N).
        likelihood_score (int): Pre-assigned L value 1–5.
        impact_score (int): Pre-assigned I value 1–5.
        impact_multiplier (float): Sensitivity multiplier; defaults to 1.0.

    Returns:
        DataFrame with added columns: composite_score (float) and
        risk_tier (str).
    """
    # Validate CRS is metric — WGS84 (EPSG:4326) coordinates in decimal
    # degrees produce misleadingly small numeric values that distort
    # precision assessments; require a projected CRS.
    geographic_crs = {4326, 4269, 4258}
    flagged = inventory[inventory["crs_epsg"].isin(geographic_crs)]
    if not flagged.empty:
        raise ValueError(
            f"Layers {flagged['layer_name'].tolist()} use a geographic CRS. "
            "Reproject to a metric projected CRS (e.g., UTM) before scoring."
        )

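    df = inventory.copy()
    # Resolve the multiplier once (missing column or null cell -> 1.0) so
    # composite_score and risk_tier are always computed from the same value.
    if "impact_multiplier" in df.columns:
        multiplier = df["impact_multiplier"].fillna(1.0)
    else:
        multiplier = pd.Series(1.0, index=df.index)
    df["composite_score"] = df["likelihood_score"] * (
        df["impact_score"] * multiplier
    )
    # Row-wise tiering reuses calculate_risk_tier so range validation applies.
    df["risk_tier"] = [
        calculate_risk_tier(lik, imp, mult)
        for lik, imp, mult in zip(
            df["likelihood_score"], df["impact_score"], multiplier
        )
    ]
    return df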


# ---------------------------------------------------------------------------
# Example usage
# ---------------------------------------------------------------------------
# sample_inventory = pd.DataFrame([
#     {
#         "layer_name": "Street_Centrelines",
#         "crs_epsg": 32618,          # UTM Zone 18N — metric
#         "likelihood_score": 1,
#         "impact_score": 1,
#         "impact_multiplier": 1.0,
#     },
#     {
#         "layer_name": "CodeEnforcement_Complaints",
#         "crs_epsg": 32618,
#         "likelihood_score": 4,
#         "impact_score": 3,
#         "impact_multiplier": 1.5,   # parcel-precision + sensitive attributes
#     },
#     {
#         "layer_name": "JuvenileProbation_ServiceAreas",
#         "crs_epsg": 32618,
#         "likelihood_score": 4,
#         "impact_score": 5,
#         "impact_multiplier": 1.5,
#     },
# ])
# result = score_gis_inventory(sample_inventory)
# print(result[["layer_name", "composite_score", "risk_tier"]])
# #   layer_name                  composite_score  risk_tier
# #   Street_Centrelines                      1.0  Low
# #   CodeEnforcement_Complaints             18.0  High
# #   JuvenileProbation_ServiceAreas         30.0  Critical

Verification Checklist

After running score_gis_inventory, confirm the output meets expectations before the layer proceeds to publication:

def verify_risk_scores(scored: pd.DataFrame) -> list[str]:
    """
    Return a list of audit failures for a scored GIS inventory.

    Checks:
    - High or Critical layers with impact_multiplier < 1.5 are flagged for review.
    - No layer with impact_score == 5 is assigned Low or Medium.
    - Every row has a non-null risk_tier.
    """
    failures: list[str] = []

    null_tiers = scored[scored["risk_tier"].isna()]
    if not null_tiers.empty:
        failures.append(
            f"NULL risk_tier on: {null_tiers['layer_name'].tolist()}"
        )

    # Critical-impact layers must never score below High
    high_impact = scored[scored["impact_score"] == 5]
    under_scored = high_impact[high_impact["risk_tier"].isin(["Low", "Medium"])]
    if not under_scored.empty:
        failures.append(
            f"Impact=5 layers scored below High: "
            f"{under_scored['layer_name'].tolist()}"
        )

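    # High/Critical layers without the 1.5 multiplier are flagged for review.
    # Align the fallback to the frame's index so a missing column (or a null
    # cell) is treated as 1.0 instead of raising on boolean indexing.
    regulated_tiers = scored[scored["risk_tier"].isin(["High", "Critical"])]
    if "impact_multiplier" in regulated_tiers.columns:
        multiplier = regulated_tiers["impact_multiplier"].fillna(1.0)
    else:
        multiplier = pd.Series(1.0, index=regulated_tiers.index)
    missing_multiplier = regulated_tiers[multiplier < 1.5]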
    if not missing_multiplier.empty:
        failures.append(
            f"High/Critical tiers without sensitivity multiplier "
            f"(review needed): {missing_multiplier['layer_name'].tolist()}"
        )

    return failures

Run verify_risk_scores(result) and assert an empty list in CI. Any returned strings must be resolved — either by adding the multiplier or by documenting a signed justification for why the lower multiplier was intentional.

Edge Cases and Adjustments

Legacy schema drift. Older shapefiles or CAD exports often contain unredacted PII fields (owner names, contact phone numbers) that enterprise GIS strips automatically. Score these layers’ Likelihood at least one step higher until a manual field audit confirms clean attributes. These exports are also frequently stored in EPSG:4326 (geographic) and must be reprojected to a metric system before precision scoring.
Sparse-data zones. Rural datasets may have fewer than five records per spatial unit, defeating k-anonymity grouping at any practical aggregation level. In these cases, shift the anonymization control from aggregation to coordinate jittering with bounded noise or synthetic record substitution.
FOIA / public-records overlap. A multiplier-elevated High or Critical score does not exempt a dataset from sunshine-law disclosure obligations. The correct response is aggregation (hexagonal binning, census-block rollup) rather than suppression; document the aggregation method and original score to show proportional treatment during an audit.
Cross-agency sharing. Data moving from a planning department to a public-health agency inherits new regulatory thresholds. Re-score the layer under the receiving agency’s sensitivity criteria; a utility-outage zone (Medium in planning) can become High when linked to health-vulnerability registers in a public-health workflow.
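The sparse-data adjustment above can be sketched in two steps: flag spatial units with fewer than five records, then jitter only those points by a bounded offset. This is an illustration, not a vetted anonymization method: the function names, the `(unit_id, x, y)` record layout, and the 100 m default radius are assumptions, and coordinates are assumed to be in a metric projected CRS.

```python
from __future__ import annotations

import math
import random
from collections import Counter


def sparse_units(unit_ids: list[str], k: int = 5) -> set[str]:
    """Return spatial units holding fewer than k records."""
    counts = Counter(unit_ids)
    return {unit for unit, n in counts.items() if n < k}


def jitter_point(
    x: float, y: float, max_offset_m: float, rng: random.Random
) -> tuple[float, float]:
    """Move a point by a random offset of at most max_offset_m metres."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # sqrt spreads displaced points uniformly over the disc instead of
    # clustering them near the original location.
    radius = max_offset_m * math.sqrt(rng.random())
    return x + radius * math.cos(angle), y + radius * math.sin(angle)


def jitter_sparse_records(records, max_offset_m=100.0, k=5, seed=None):
    """Jitter (unit_id, x, y) records that fall in units with < k records."""
    rng = random.Random(seed)
    sparse = sparse_units([unit for unit, _, _ in records], k)
    return [
        (unit, *jitter_point(x, y, max_offset_m, rng)) if unit in sparse
        else (unit, x, y)
        for unit, x, y in records
    ]
```

The `seed` argument exists for reproducible tests; a production run would normally leave it unset so the offsets cannot be regenerated and reversed.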

Frequently Asked Questions

What score threshold triggers mandatory legal sign-off in a municipal GIS context?

A composite score ≥ 13 (High tier) requires documented legal or privacy officer sign-off before publication. Scores ≥ 20 (Critical tier) require executive approval and full audit logging; these datasets must either undergo differential-privacy anonymization or remain restricted to authenticated internal sessions.

How does the sensitivity multiplier interact with FOIA public-records obligations?

A multiplier-elevated score does not grant exemption from sunshine-law disclosure; it signals that the dataset must be aggregated before release rather than suppressed entirely. Document the aggregation method and the original pre-aggregation score in the compliance record to demonstrate proportional treatment to auditors.
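One way to aggregate before release is to snap points to grid cells and publish only per-cell counts. The sketch below uses a square grid as a simpler stand-in for the hexagonal binning mentioned earlier; the function name, the 500 m default cell size, and the choice to key cells by their lower-left corner are assumptions, and coordinates are assumed to be metric.

```python
import math
from collections import Counter


def aggregate_to_grid(points, cell_size_m=500.0):
    """Count points per square cell, keyed by the cell's lower-left corner."""
    if cell_size_m <= 0:
        raise ValueError("cell_size_m must be positive")
    counts = Counter()
    for x, y in points:
        # floor() rather than int() so negative coordinates bin correctly.
        col = math.floor(x / cell_size_m)
        row = math.floor(y / cell_size_m)
        counts[(col * cell_size_m, row * cell_size_m)] += 1
    return dict(counts)
```

Recording the cell size alongside the original score gives auditors the aggregation method in a reproducible form.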

Should the Likelihood axis use WGS84 or a projected CRS when evaluating coordinate precision?

Evaluate coordinate precision in a projected CRS (a UTM zone or a local state-plane system) where distances are in metres. Decimal degrees hide the ground resolution: six decimal places in WGS84 (EPSG:4326) is roughly 0.1 m, finer than a 1 m UTM coordinate, and the east-west distance covered by one degree shrinks with latitude. Your Likelihood scoring criteria should reflect real-world ground resolution, not angular precision; the score_gis_inventory function enforces this by rejecting geographic CRS inputs.
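To see why angular precision is hard to reason about, the ground distance implied by a given number of decimal places can be approximated. A rough sketch using a spherical-Earth figure of about 111.32 km per degree of latitude; the function name is hypothetical and the results are order-of-magnitude only.

```python
import math

METRES_PER_DEGREE = 111_320.0  # approximate length of one degree of latitude


def decimal_degree_resolution_m(decimal_places: int, latitude_deg: float = 0.0):
    """Return the approximate (north-south, east-west) ground size in metres
    of the last decimal place of a coordinate in decimal degrees."""
    step_deg = 10.0 ** -decimal_places
    north_south = step_deg * METRES_PER_DEGREE
    # East-west degrees shrink with the cosine of latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west
```

Under this approximation, six decimal places is about 0.11 m north-south and four decimal places is about 11 m, so the same rounding rule yields different ground precision depending on how many places are kept and where the city sits.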

Can the same Python function handle both point and polygon datasets?

Yes. The risk-scoring function operates on attribute-level metadata, not geometry type. A polygon layer for code-enforcement complaint zones scores identically to point-feature data with the same attribute sensitivity. Geometry type becomes relevant only when selecting the anonymization control: polygons may require boundary generalisation or spatial fuzzing via buffer zones rather than coordinate jittering.
