Carpool Insight - Methodology

Overview

Carpool Insight identifies business clusters in the Oslo/Akershus region where public transit is inadequate compared to driving, making them ideal targets for carpool initiatives.

The analysis uses a sequential filtering approach:

Phase 1: Find all business clusters (destination-first mapping)
Phase 2: Assess public transit quality for each cluster
Phase 3: Analyze residential catchment areas for top targets

1 Business Clustering

The first phase identifies geographic clusters of businesses in the Oslo/Akershus region.

Data Filtering

Geography: Oslo (0301) and Akershus municipalities (32xx series)
Employee count: 20-200 employees per business unit

Entity Type Handling

The Norwegian business registry (BRREG) records two types of entities:

Underenheter (sub-units): Physical operating locations (stores, offices, factories) for companies with multiple sites. These have beliggenhetsadresse (location address) - the actual work location.
Hovedenheter (parent entities): Legal/registered company records. May only have forretningsadresse (business address) which isn't necessarily an operational site.

To accurately map work locations and avoid duplicates:

Include all underenheter - actual physical locations
Include hovedenheter without sub-units - single-location companies
Exclude hovedenheter that have sub-units - prevents duplicates
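The three rules above can be sketched as a small filter. This is an illustration only: the record layout and the keys "orgnr" and "parent_orgnr" are hypothetical names chosen for the example, not the registry's actual column names.

```python
def select_work_locations(hovedenheter, underenheter):
    """Return the records to map, following the three inclusion rules.

    Both inputs are lists of dicts. The keys "orgnr" and "parent_orgnr"
    are hypothetical names used only in this sketch.
    """
    # Parent entities that own at least one sub-unit.
    parents_with_subunits = {u["parent_orgnr"] for u in underenheter}

    selected = list(underenheter)  # rule 1: keep every sub-unit
    for h in hovedenheter:
        # rules 2 and 3: keep a parent only when it has no sub-units
        if h["orgnr"] not in parents_with_subunits:
            selected.append(h)
    return selected


hovedenheter = [{"orgnr": "A"}, {"orgnr": "B"}]
underenheter = [
    {"orgnr": "A1", "parent_orgnr": "A"},
    {"orgnr": "A2", "parent_orgnr": "A"},
]
locations = select_work_locations(hovedenheter, underenheter)
selected_ids = [r["orgnr"] for r in locations]
print(selected_ids)
```

Parent "A" is dropped because its two sub-units already represent its work locations, while "B" is kept as a single-location company.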

Geocoding

Business addresses are geocoded using the Kartverket API. Results are cached locally to avoid redundant API calls.
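A local cache of this kind might look like the sketch below. It assumes a JSON file as the cache store and takes the geocoding call as an injected function, so no real API request is made here; the class name, file name, and the stand-in fetch function are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class GeocodeCache:
    """Disk-backed cache in front of a geocoding function (illustrative)."""

    def __init__(self, path, fetch):
        self.path = Path(path)
        self.fetch = fetch  # callable: address -> [lat, lon] or None
        self.entries = json.loads(self.path.read_text()) if self.path.exists() else {}

    def lookup(self, address):
        key = " ".join(address.lower().split())  # normalise case and spacing
        if key not in self.entries:
            self.entries[key] = self.fetch(address)  # only cache misses reach the API
            self.path.write_text(json.dumps(self.entries))
        return self.entries[key]


api_calls = []


def fake_fetch(address):
    """Stand-in for the real geocoder; returns placeholder coordinates."""
    api_calls.append(address)
    return [59.91, 10.75]


cache_file = Path(tempfile.mkdtemp()) / "geocode_cache.json"
cache = GeocodeCache(cache_file, fake_fetch)
first = cache.lookup("Storgata 1, Oslo")
second = cache.lookup("storgata 1,  oslo")  # same address, different spelling
print(len(api_calls))
```

Only the first lookup reaches the fetch function; the second is served from the cache, and a new GeocodeCache pointed at the same file would reuse the stored result.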

Spatial Clustering (DBSCAN)

Businesses are grouped into clusters using the DBSCAN algorithm:

DBSCAN Parameters

eps: 500 meters (maximum distance between two businesses for them to count as neighbors; a cluster can span more than this through chains of neighbors)
min_samples: 2 (minimum businesses to form a cluster)
metric: Haversine (accounts for Earth's curvature)
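To make the parameters concrete, here is a dependency-free sketch. It does not call a DBSCAN library. It relies on the observation that, under the convention where min_samples counts the point itself (as in scikit-learn), min_samples = 2 makes every point with at least one neighbor within eps a core point, so the clusters are the connected groups of the eps-neighbor graph. The function names and sample coordinates are illustrative.

```python
import math
from collections import deque

EARTH_RADIUS_M = 6_371_000


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def cluster_min2(points, eps_m=500):
    """Label points as DBSCAN would with min_samples=2; -1 marks noise."""
    n = len(points)
    neighbours = [
        [j for j in range(n) if j != i and haversine_m(points[i], points[j]) <= eps_m]
        for i in range(n)
    ]
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1 or not neighbours[start]:
            continue  # already labelled, or isolated (stays noise)
        labels[start] = next_label
        queue = deque([start])
        while queue:  # breadth-first flood fill over eps-neighbours
            i = queue.popleft()
            for j in neighbours[i]:
                if labels[j] == -1:
                    labels[j] = next_label
                    queue.append(j)
        next_label += 1
    return labels


points = [
    (59.9139, 10.7522),
    (59.9150, 10.7550),  # roughly 200 m from the first point
    (59.9500, 10.9000),  # several km away
]
labels = cluster_min2(points)
print(labels)
```

The first two points fall within 500 m of each other and form cluster 0; the third has no neighbor and is labelled as noise. The pairwise distance computation is quadratic, which is fine for a sketch but not for a full registry extract.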

Super-Clusters

Nearby clusters are grouped into super-clusters representing shared commuter sheds. People traveling to the same general area can share corridors regardless of specific employer.

Super-Cluster Parameters

radius: 1500 meters (maximum distance between cluster centroids)
method: Single-linkage clustering on centroids
naming: Auto-generated as "[Place]-omradet" based on largest cluster
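A minimal sketch of the grouping and naming, under two assumptions that the text does not confirm: centroids are already projected to a metric coordinate system (so plain Euclidean distance in meters applies), and "largest cluster" means the member with the most employees. The dict keys and the sample values are hypothetical.

```python
import math


def super_clusters(clusters, radius_m=1500):
    """Single-linkage grouping: link centroids within radius_m, then merge chains."""
    parent = list(range(len(clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if math.dist(clusters[i]["xy"], clusters[j]["xy"]) <= radius_m:
                parent[find(i)] = find(j)  # union the two groups

    groups = {}
    for i, c in enumerate(clusters):
        groups.setdefault(find(i), []).append(c)

    result = []
    for members in groups.values():
        largest = max(members, key=lambda c: c["employees"])
        result.append({
            "name": f'{largest["place"]}-omradet',
            "members": sorted(c["place"] for c in members),
        })
    return sorted(result, key=lambda g: g["name"])


# Illustrative clusters on a line; coordinates are metres in a projected CRS.
clusters = [
    {"place": "A", "xy": (0, 0), "employees": 900},
    {"place": "B", "xy": (1200, 0), "employees": 2500},
    {"place": "C", "xy": (2400, 0), "employees": 700},
    {"place": "D", "xy": (20000, 0), "employees": 400},
]
grouped = super_clusters(clusters)
print(grouped)
```

A and C are 2400 m apart, beyond the radius, but both are within 1500 m of B, so single-linkage chains all three into one super-cluster named after B. D stays on its own.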

Output

Each cluster contains:

Centroid point (geographic center)
Boundary polygon (convex hull of member businesses)
Total employee count and business count
Auto-generated name based on street/location
super_cluster_id and super_cluster_name for grouping
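For illustration, the boundary polygon and centroid could be computed as below. The convex hull uses Andrew's monotone chain algorithm. Taking the centroid as the plain mean of member coordinates is an assumption, since the text only says "geographic center".

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop the duplicated end points


def mean_point(points):
    """Arithmetic mean of the coordinates (assumed centroid definition)."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Illustrative member positions as (x, y) pairs.
members = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
hull = convex_hull(members)
centroid = mean_point(members)
print(hull, centroid)
```

The interior point (1, 1) is excluded from the hull, leaving the four corners as the boundary.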

2 PT Disadvantage Assessment

Each cluster is scored on how poorly served it is by public transit. Higher score = worse PT = better carpool target.

Four Metrics

Metric	Weight	What is measured
Competitiveness	50%	PT time / driving time ratio
Access	30%	Meters to nearest frequent stop
Transfer	10%	Average number of transfers (12 origins)
Frequency	10%	Departures/hour (4-hour window)

Reference time: Tuesday 07:30

1. Competitiveness Score (50% weight)

How does PT travel time compare to driving?

Tests 12 representative origins across all corridors:

North: Jessheim, Eidsvoll (rail)
East: Lillestrøm (rail)
South: Ski, Vestby (rail), Son, Drøbak (bus)
West: Sandvika (rail), Røyken, Klokkarstua (bus)
Oslo: Grefsen, Røa (metro)

Compares PT journey time vs driving time for each
Score 100 = PT takes the same time as driving (ratio 1.0)
Score 0 = PT takes at least 3x as long as driving (ratio >= 3.0)

time_ratio = avg_pt_time / avg_drive_time
competitiveness_score = max(0, 100 - (time_ratio - 1.0) * 50)
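The formula translates directly to code. The sketch below assumes travel times are given in minutes, one value per origin. The function name and sample times are illustrative.

```python
def competitiveness_score(pt_minutes, drive_minutes):
    """Score PT against driving using average travel times over all origins."""
    avg_pt_time = sum(pt_minutes) / len(pt_minutes)
    avg_drive_time = sum(drive_minutes) / len(drive_minutes)
    time_ratio = avg_pt_time / avg_drive_time
    # Each 0.1 of extra ratio costs 5 points; the floor at 0 is reached at ratio 3.0.
    return max(0, 100 - (time_ratio - 1.0) * 50)


# Two illustrative origins: PT takes twice as long as driving on average.
score = competitiveness_score(pt_minutes=[60, 40], drive_minutes=[30, 20])
print(score)
```

With a ratio of 2.0 the score is 50. As written, the formula has no upper clamp, so a ratio below 1.0 would yield a score above 100.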

2. Access Score (30% weight)

How easy is it to reach a transit stop from the cluster?

Measures distance to nearest stop with >= 4 departures/hour
Score 100 = stop within 200m
Score 0 = no frequent stop within 1500m

access_score = max(0, 100 - (distance_meters / 15))
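A direct sketch of the formula follows. Taken literally, it gives about 87 at 200 m and reaches 100 only at 0 m, so the "Score 100 = stop within 200m" anchor may involve a rule not shown here. The code follows the formula as written.

```python
def access_score(distance_meters):
    """Linear decay from 100 at the stop itself to 0 at 1500 m and beyond."""
    return max(0, 100 - (distance_meters / 15))


for d in (0, 200, 750, 1500):
    print(d, round(access_score(d), 1))
```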

3. Transfer Score (10% weight)

How many transfers are needed on the PT journey?

Calculated from average transfers across all 12 origin journeys
Score 100 = 0 transfers (direct connection)
Score 70 = 1 transfer
Score 40 = 2 transfers
Score 0 = 3+ transfers
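Because the input is an average over 12 journeys, it will often be fractional. The text gives scores only for whole numbers of transfers, so the sketch below assumes linear interpolation between those anchor points. That interpolation is an assumption, not a documented rule.

```python
def transfer_score(avg_transfers):
    """Map an average transfer count to 0-100 via the published anchor points."""
    anchors = [(0, 100), (1, 70), (2, 40), (3, 0)]
    if avg_transfers >= 3:
        return 0.0
    for (x0, y0), (x1, y1) in zip(anchors, anchors[1:]):
        if x0 <= avg_transfers <= x1:
            # Assumed: straight line between neighbouring anchors.
            return y0 + (y1 - y0) * (avg_transfers - x0) / (x1 - x0)
    raise ValueError("avg_transfers must be non-negative")


for t in (0, 1, 1.5, 2.5, 3):
    print(t, transfer_score(t))
```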

4. Frequency Score (10% weight)

How often do departures occur from nearby stops?

Uses 4-hour window (05:30-09:30) for robust average
Score 100 = >= 8 departures/hour (metro/train level)
Score 50-100 = 4-8 departures/hour (good bus service)
Score 0-50 = 1-4 departures/hour (sparse service)
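The bands above can be read as a piecewise-linear curve. The sketch assumes linear interpolation within each band and a score of 0 below 1 departure/hour. Both are assumptions, as the text only gives the band limits.

```python
def departures_per_hour(total_departures, window_hours=4):
    """Average hourly rate over the 05:30-09:30 window."""
    return total_departures / window_hours


def frequency_score(rate):
    """Piecewise-linear score from departures/hour (assumed interpolation)."""
    if rate >= 8:
        return 100.0
    if rate >= 4:
        return 50 + (rate - 4) / 4 * 50  # 4-8 per hour maps to 50-100
    if rate >= 1:
        return (rate - 1) / 3 * 50       # 1-4 per hour maps to 0-50
    return 0.0


rate = departures_per_hour(24)  # illustrative: 24 departures in the window
freq = frequency_score(rate)
print(rate, freq)
```

Averaging over the four-hour window smooths out a single busy or quiet hour, which is presumably what "robust average" refers to.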

Combined PT Disadvantage Score

pt_quality = (competitiveness * 0.50) + (access * 0.30) + (transfer * 0.10) + (frequency * 0.10)
pt_disadvantage_score = 100 - pt_quality
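As a worked example of the weighting, the sketch below combines four metric scores into the final value. The dict layout and the sample scores are illustrative.

```python
WEIGHTS = {
    "competitiveness": 0.50,
    "access": 0.30,
    "transfer": 0.10,
    "frequency": 0.10,
}


def pt_disadvantage_score(scores):
    """Weighted PT quality, inverted so that a higher value means worse PT."""
    pt_quality = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return 100 - pt_quality


example = {"competitiveness": 50, "access": 40, "transfer": 70, "frequency": 25}
disadvantage = pt_disadvantage_score(example)
print(round(disadvantage, 1))
```

Here pt_quality is 25 + 12 + 7 + 2.5 = 46.5, giving a PT disadvantage of 53.5.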

Target Selection Threshold

Only clusters with PT disadvantage >= 40 are shown in the Top 100 Targets list. This threshold filters out well-served downtown areas that would otherwise rank highly due to employee count alone.

Carpool Potential Ranking

Clusters are ranked by carpool potential, which balances PT disadvantage with cluster size:

carpool_potential = log(employees) * pt_disadvantage

Taking the logarithm compresses differences in employee count, so PT disadvantage remains the primary ranking factor while cluster size still carries some weight.
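The threshold and ranking can be sketched together. The natural logarithm is assumed here; the text does not state the base, and a different base would only rescale every score by the same constant without changing the order. Cluster names and figures are illustrative.

```python
import math


def rank_targets(clusters, min_disadvantage=40, limit=100):
    """Filter by PT disadvantage, then rank by log(employees) * disadvantage."""
    ranked = [
        {**c, "carpool_potential": math.log(c["employees"]) * c["pt_disadvantage"]}
        for c in clusters
        if c["pt_disadvantage"] >= min_disadvantage
    ]
    ranked.sort(key=lambda c: c["carpool_potential"], reverse=True)
    return ranked[:limit]


clusters = [
    {"name": "X", "employees": 5000, "pt_disadvantage": 35},  # below threshold
    {"name": "Y", "employees": 200, "pt_disadvantage": 70},
    {"name": "Z", "employees": 2000, "pt_disadvantage": 45},
]
targets = rank_targets(clusters)
order = [c["name"] for c in targets]
print(order)
```

Y outranks Z despite having a tenth of the employees, because the logarithm narrows the size gap (about 5.3 versus 7.6) while the disadvantage scores enter at full weight. X is excluded by the threshold regardless of its size.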

3 Catchment Analysis

For super-clusters with high PT disadvantage, the system analyzes residential catchment areas to understand where potential carpoolers live.

Four Views

Targets, Clusters, Catchment, Parkering

Targets Tab

Individual business clusters ranked by carpool potential. Shows employee count, PT disadvantage score, and super-cluster membership.

Clusters Tab

Super-cluster groupings showing combined boundary polygons. Only displays super-clusters with:

2+ member clusters - single clusters are already in Targets
All members have PT disadvantage >= 50 - focuses on high-value targets

Catchment Tab

Driving isochrones and residential population analysis for each super-cluster:

Isochrone rings: 15, 30, and 45 minute driving contours
Wedge corridors: Population breakdown by road direction
Visual display: Darker blue = closer (15 min), lighter = further (45 min)

Isochrone Generation

Driving isochrones are fetched from the Mapbox Isochrone API for each super-cluster centroid:

15 minutes: Immediate catchment (darkest blue)
30 minutes: Primary catchment
45 minutes: Extended catchment (lightest blue)

Isochrones are pre-computed and cached to avoid API calls on each page load.
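A request for the three contours might be built as follows. The endpoint path and the contours_minutes and polygons parameters reflect the public Mapbox Isochrone API as best understood here and should be checked against the current Mapbox documentation. The function name and the token placeholder are illustrative, and no request is sent.

```python
from urllib.parse import urlencode


def isochrone_url(lon, lat, token, minutes=(15, 30, 45)):
    """Build a driving-isochrone request URL for one super-cluster centroid."""
    query = urlencode(
        {
            "contours_minutes": ",".join(str(m) for m in minutes),
            "polygons": "true",  # ask for polygons rather than line contours
            "access_token": token,
        },
        safe=",",  # keep the commas in the contour list readable
    )
    # Mapbox expects coordinates in longitude,latitude order.
    return f"https://api.mapbox.com/isochrone/v1/mapbox/driving/{lon},{lat}?{query}"


url = isochrone_url(10.75, 59.91, token="YOUR_TOKEN")
print(url)
```

Since centroids change only when the clustering is re-run, the response for each centroid can be stored once and served from disk afterwards.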

Wedge Corridor Analysis

The 30-minute isochrone is divided into directional wedges (N, NE, E, SE, S, SW, W, NW) to identify which roads carry the most potential carpoolers:

Divide isochrone into 8 compass direction wedges
Intersect each wedge with SSB population grid data
Sum residential population within each wedge
Identify the primary road in each direction (E6, E18, Rv4, etc.)
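The first three steps can be sketched with a bearing calculation. The example assumes each wedge is 45 degrees wide and centered on its compass direction, and that the population grid cells have already been clipped to the 30-minute isochrone. Both are assumptions, and the cell values are illustrative. Road identification is not covered.

```python
import math

DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def bearing_deg(origin, target):
    """Initial great-circle bearing, in degrees clockwise from north."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*origin, *target))
    dlon = lon2 - lon1
    x = math.sin(dlon) * math.cos(lat2)
    y = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360


def wedge_of(origin, target):
    """Compass wedge containing target; N spans 337.5 to 22.5 degrees."""
    return DIRECTIONS[int((bearing_deg(origin, target) + 22.5) // 45) % 8]


def population_by_wedge(origin, cells):
    """Sum population per wedge and express each wedge as a share of the total."""
    totals = dict.fromkeys(DIRECTIONS, 0)
    for lat, lon, population in cells:
        totals[wedge_of(origin, (lat, lon))] += population
    grand_total = sum(totals.values())
    shares = {d: (100 * p / grand_total if grand_total else 0.0)
              for d, p in totals.items()}
    return totals, shares


centroid = (59.90, 10.75)
cells = [  # (lat, lon, residents) for illustrative grid cells
    (60.00, 10.75, 300),
    (59.90, 11.00, 100),
    (59.80, 10.75, 100),
]
totals, shares = population_by_wedge(centroid, cells)
print(totals["N"], shares["N"])
```

The shares correspond to the per-corridor percentages shown in the population bars.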

Catchment Output

For each super-cluster, the Catchment tab displays:

Total catchment population within 30-minute drive
Population bars showing breakdown by road/direction
Percentage of total population for each corridor

Clicking a catchment item shows the isochrone rings on the map and highlights only the member cluster markers (hiding others for clarity).

P Parking Data Layer

An independent parking data layer provides context on parking availability across the region. Parking data is displayed as a standalone map layer and sidebar tab. It does not affect PT disadvantage scoring or carpool potential rankings.

Why Parking Matters

Parking availability is a key contextual factor for carpool analysis:

Ample free parking: People already drive, so carpool potential is high
Expensive/limited parking: People already use PT, so carpool potential is lower
Park & Ride (innfartsparkering): Facilities that both compete with and complement carpooling

Data Source

Parking data comes from Statens vegvesens Parkeringsregister, an open API providing all registered public parking areas in Norway. Areas are filtered to the Oslo/Akershus bounding box.

Data Limitations

Only public/regulated parking is covered; private workplace parking is not included
No exact pricing; only the number of paid and free spaces is available
Some areas lack type classification (shown as "Annet")

Parking Categories

Category	Description
Innfartsparkering (P+R)	Park & Ride facilities near transit stations
Parkeringshus	Multi-story parking garages
Avgrenset område	Dedicated surface parking lots
Langs kjørebane	On-street parking

Map Display

Parking areas are shown as a GeoJSON layer with clustering at low zoom levels. Markers are color-coded by type, with Park & Ride facilities given a larger, distinct marker. A toggle button (P) in the map controls allows showing/hiding the parking layer independently of the active sidebar tab.

Parkering Tab

The sidebar tab displays a summary of all parking areas grouped by category, with P+R facilities listed first (most relevant for carpool analysis). Each entry shows name, capacity, and type. Clicking an item flies the map to that location.

Capacity details include: paid spaces, free spaces, EV charging spaces, and disabled-accessible spaces.

Data Sources

🏢

Brønnøysundregistrene (BRREG)

Norwegian business registry - provides company data including employee counts, addresses, and industry codes (NACE).

Local CSV: enheter.csv.gz, underenheter.csv.gz

📍

Kartverket

Norwegian mapping authority - geocoding service converts addresses to coordinates.

API: ws.geonorge.no/adresser/v1/sok

🚌

Entur

National journey planner - provides PT routing, stop locations, and departure frequencies.

GraphQL API: api.entur.io/journey-planner/v3

🚗

Mapbox

Mapping platform - provides driving time calculations and isochrone generation.

API: Directions API, Isochrone API

👥

SSB (Statistics Norway)

Population statistics - provides residential population grid data for catchment analysis.

API: data.ssb.no

🗺️

Ruter GTFS

Transit feed specification - bus, tram, metro, and ferry stop locations, routes, and schedules.

GTFS ZIP: rb_rut-aggregated-gtfs.zip

🚆

Vy GTFS

Regional and intercity rail schedules - train stop locations and departure frequencies.

GTFS ZIP: rb_vy-aggregated-gtfs.zip

🅿️

Statens vegvesen Parkeringsregister

National parking registry - public parking areas including capacity, type, pricing (paid/free), and Park & Ride classification.

API: parkreg-open.atlas.vegvesen.no