Everything we've built so far runs on hand-crafted data — station positions and line definitions we typed
ourselves. But thousands of transit agencies worldwide publish their data in a standardized format called
GTFS (General Transit Feed Specification). If you can parse GTFS, you can draw a map of any
city that publishes a feed.
GTFS is surprisingly simple. It's a zip file containing CSV text files. No binary formats, no proprietary
schemas, no API authentication. Just comma-separated values describing stops, routes, trips, and schedules.
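Because every GTFS file is plain CSV, the first building block is a function that turns file text into row objects. The sketch below is one possible way to do that, not the parser used in this chapter's demo; the name parseCsv and its return shape are assumptions made for illustration. It handles the details real feeds tend to include: a leading byte-order mark, Windows line endings, and quoted fields that contain commas.

```javascript
// Minimal CSV reader for GTFS text files (illustrative sketch; parseCsv is a hypothetical name).
// Returns one object per data row, keyed by the column names in the header row.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  const src = text.replace(/^\uFEFF/, ""); // strip a byte-order mark if present

  for (let i = 0; i < src.length; i++) {
    const ch = src[i];
    if (inQuotes) {
      if (ch === '"' && src[i + 1] === '"') { field += '"'; i++; } // "" is an escaped quote
      else if (ch === '"') inQuotes = false;                       // closing quote
      else field += ch;
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field); field = "";
    } else if (ch === "\n") {
      row.push(field); field = "";
      rows.push(row); row = [];
    } else if (ch !== "\r") {
      field += ch; // "\r" from CRLF line endings is dropped
    }
  }
  // Flush the last line when the file does not end with a newline.
  if (field !== "" || row.length > 0) { row.push(field); rows.push(row); }

  const [header, ...records] = rows;
  return records
    .filter((r) => r.some((v) => v !== "")) // skip blank lines
    .map((r) => Object.fromEntries(header.map((h, j) => [h.trim(), r[j] ?? ""])));
}

// Usage with a made-up one-row stops.txt:
const stopsTxt = "stop_id,stop_name,stop_lat,stop_lon\nS1,Central,41.3870,2.1700\n";
const stopRows = parseCsv(stopsTxt);
// Every value is still a string here; convert coordinates with Number() before projecting.
```

Keeping all values as strings is a deliberate choice in this sketch: identifiers such as stop_id can look numeric but must not lose leading zeros.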
Our rendering engine expects two things: a station map ({ id → {name, x, y} }) and an array of
lines ({ id, color, stops[] }). The job of a GTFS parser is to extract exactly that.
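To make the target concrete, here is a tiny hand-written instance of those two structures, together with a check that is worth running on any parser output. The station names, coordinates, and the helper findMissingStops are invented for illustration, and the color is written with a leading # on the assumption that the renderer expects a CSS color.

```javascript
// Hypothetical example of the two structures the renderer consumes.
const S = {
  A: { name: "Airport", x: 40, y: 200 },
  B: { name: "Central", x: 220, y: 120 },
  C: { name: "Harbour", x: 400, y: 60 },
};
const lines = [
  { id: "L1", color: "#d62828", stops: ["A", "B", "C"] },
];

// Every stop a line references must exist in the station map.
// Returns a list of { line, stop } pairs for references that do not resolve.
function findMissingStops(stations, lineList) {
  return lineList.flatMap((line) =>
    line.stops
      .filter((id) => !(id in stations))
      .map((id) => ({ line: line.id, stop: id }))
  );
}
```

An empty result means every line can be drawn; anything else points at a stop_id that the parser dropped or never saw.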
The Files That Matter
A GTFS zip typically contains 6-12 CSV files. We only need four:
stops.txt — Where stations are
Each row is a stop (station, platform, or entrance). The columns we need are stop_id, stop_name, stop_lat, and stop_lon.
That's the first half of the puzzle — station positions. The projectStations() function converts
lat/lng coordinates into SVG pixel space using a simple linear projection. It's not cartographically perfect (it
ignores Earth's curvature), but for a city-sized metro network the distortion is negligible.
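A linear projection of this kind can be sketched in a few lines. The function below is not the chapter's projectStations(); it is a hypothetical stand-in called projectLinear, with an assumed signature (station map, canvas width and height, padding). It fits the bounding box of the stations into the canvas, scaling each axis independently, and flips latitude because SVG's y axis grows downward.

```javascript
// Illustrative linear projection (projectLinear and its parameters are assumptions).
// stops: { id -> { name, lat, lng } } with numeric lat/lng
// returns: { id -> { name, x, y } } in pixel space
function projectLinear(stops, width, height, padding = 20) {
  let minLat = Infinity, maxLat = -Infinity;
  let minLng = Infinity, maxLng = -Infinity;
  for (const s of Object.values(stops)) {
    minLat = Math.min(minLat, s.lat); maxLat = Math.max(maxLat, s.lat);
    minLng = Math.min(minLng, s.lng); maxLng = Math.max(maxLng, s.lng);
  }
  const spanLat = maxLat - minLat || 1; // avoid dividing by zero for a single stop
  const spanLng = maxLng - minLng || 1;

  const out = {};
  for (const [id, s] of Object.entries(stops)) {
    out[id] = {
      name: s.name,
      x: padding + ((s.lng - minLng) / spanLng) * (width - 2 * padding),
      y: padding + ((maxLat - s.lat) / spanLat) * (height - 2 * padding), // north is up
    };
  }
  return out;
}
```

Because each axis is scaled on its own, this sketch stretches the network to fill the canvas rather than preserving its aspect ratio. Using one shared scale factor for both axes is the usual alternative when proportions matter.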
Geographic vs Schematic
This projection gives a geographic layout — stations are placed where they actually are in
the real world. The result is often messy: stations in the city center cluster too tightly, suburban stations
spread too far. That's why most transit maps use schematic layouts.
For a dashboard or real-time tracker, geographic accuracy matters. For a wayfinding map, schematic clarity
wins. The rendering engine doesn't care — it draws whatever positions you give it.
routes.txt — What lines exist
Each row is a transit line. The key columns:
route_id — unique identifier
route_short_name — display name ("L1", "Blue Line")
route_color — hex color (without #)
route_type — 0=tram, 1=metro, 2=rail, 3=bus
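Turning those rows into a lookup table is mostly a matter of normalizing the color. The sketch below assumes the rows have already been parsed into objects keyed by column name with string values; buildRoutes, its allowedTypes parameter, and the gray fallback color are all choices made for this example. GTFS treats a missing route_color as white, which would be invisible on a light background, hence the fallback.

```javascript
// Illustrative routes.txt reducer (buildRoutes is a hypothetical name).
// rows: objects keyed by routes.txt column names, all values strings
// returns: { route_id -> { name, color } } with color as a CSS hex string
function buildRoutes(rows, allowedTypes = ["0", "1", "2"]) {
  const routes = {};
  for (const r of rows) {
    // Assumption: a rail map skips buses (route_type 3) and anything not listed.
    if (!allowedTypes.includes(r.route_type)) continue;
    const hex = (r.route_color || "").trim();
    routes[r.route_id] = {
      // Some feeds leave route_short_name empty and only fill route_long_name.
      name: r.route_short_name || r.route_long_name || r.route_id,
      // route_color has no leading "#"; fall back to gray when it is missing or malformed.
      color: /^[0-9a-fA-F]{6}$/.test(hex) ? `#${hex}` : "#888888",
    };
  }
  return routes;
}
```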
trips.txt — Which trips belong to which line
A trip is a single journey of a vehicle along a route. Many trips exist per route (every departure is a trip).
We need trips to connect routes to their stop sequences. Key columns:
trip_id — unique trip identifier
route_id — links to routes.txt
direction_id — 0 or 1 (outbound/inbound)
stop_times.txt — Which stops are on which trip, in what order
This is the big file — every stop on every trip. For a metro system it can have hundreds of thousands of rows.
Key columns:
trip_id — links to trips.txt
stop_id — links to stops.txt
stop_sequence — order within the trip (1, 2, 3...)
Putting It Together
The chain: stop_times → trips → routes. Pick one representative trip per route (most trips visit the same stops
in the same order), extract its stop sequence, and you have the line's station list.
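The chain can be sketched as two small functions. Both names (stopsByTrip, stopsByRoute) are hypothetical, and the inputs are assumed to be arrays of row objects with string values, as a CSV reader would produce. Two details matter in practice: stop_sequence must be compared as a number, since a string sort would put "10" before "2", and GTFS only requires the values to increase along a trip, so gaps are normal. For simplicity this version keeps the first trip listed for each route.

```javascript
// Group stop_times rows by trip and order each group by stop_sequence.
// Returns Map(trip_id -> [stop_id, ...]).
function stopsByTrip(stopTimes) {
  const grouped = new Map();
  for (const st of stopTimes) {
    if (!grouped.has(st.trip_id)) grouped.set(st.trip_id, []);
    grouped.get(st.trip_id).push(st);
  }
  const ordered = new Map();
  for (const [tripId, rows] of grouped) {
    rows.sort((a, b) => Number(a.stop_sequence) - Number(b.stop_sequence)); // numeric sort
    ordered.set(tripId, rows.map((r) => r.stop_id));
  }
  return ordered;
}

// Follow trip_id -> route_id and keep the first trip seen for each route.
// Returns { route_id -> [stop_id, ...] }.
function stopsByRoute(trips, stopTimes) {
  const ordered = stopsByTrip(stopTimes);
  const result = {};
  for (const t of trips) {
    if (t.route_id in result || !ordered.has(t.trip_id)) continue;
    result[t.route_id] = ordered.get(t.trip_id);
  }
  return result;
}
```

Grouping once and looking trips up afterwards keeps the work linear in the size of stop_times.txt, which matters when the file has hundreds of thousands of rows.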
From CSV text to rendered map. The gtfsToMapData() function does the entire conversion: parse four
CSV files, project coordinates, extract stop sequences, and output the exact { S, lines } structure
our rendering engine expects.
The rendering code below the parser is character-for-character identical to Chapter 7. The
engine doesn't know its data came from GTFS. It just sees stations and lines.
The Parser Is 30 Lines
The entire gtfsToMapData() function — from raw CSV to render-ready data — is about 30 lines.
That's it. The GTFS format is simple enough that a production parser isn't much more complex than this demo.
The hard work was done by the people who designed the GTFS standard.
The Pipeline
Here's the complete data flow:
stops.txt → { stop_id → { name, lat, lng } } → project to pixel space →
{ stop_id → { name, x, y } }
routes.txt → { route_id → { name, color } }
trips.txt → pick one representative trip per route/direction
stop_times.txt → for each representative trip, extract ordered stop IDs → stops[]
The output is the same two-object structure we've been using since Chapter 7: stations with positions, and lines
with stop sequences and colors. The rendering engine is a pure function of this data. It makes
the same decisions regardless of whether the data was hand-crafted or parsed from GTFS.
Real-World Considerations
Picking a Representative Trip
A route can have hundreds of trips (every departure is a trip). Most trips visit the same stations in the same
order — the route's pattern. Some trips are shortened (peak-hour expresses, early terminations). We pick the
trip with the most stops as the representative for each route and direction. That gives us the full
line, not a shortened variant.
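That selection rule can be sketched as follows. The function name pickRepresentativeTrips and its inputs are assumptions: trips is an array of trips.txt row objects, and stopsByTripId is a Map from trip_id to that trip's ordered stop IDs, built beforehand from stop_times.txt. Since direction_id may be absent from a feed, a missing value is treated as "0" here.

```javascript
// For each (route_id, direction_id) pair, keep the trip that visits the most stops.
// trips: rows from trips.txt; stopsByTripId: Map(trip_id -> [stop_id, ...])
// Returns an array of { route_id, direction_id, trip_id, stops }.
function pickRepresentativeTrips(trips, stopsByTripId) {
  const best = new Map();
  for (const t of trips) {
    const stops = stopsByTripId.get(t.trip_id);
    if (!stops) continue; // trip has no stop_times rows
    const direction = t.direction_id || "0"; // assumption: missing direction counts as 0
    const key = `${t.route_id}\u0000${direction}`;
    const current = best.get(key);
    // Strictly greater, so ties keep the first trip encountered.
    if (!current || stops.length > current.stops.length) {
      best.set(key, { route_id: t.route_id, direction_id: direction, trip_id: t.trip_id, stops });
    }
  }
  return [...best.values()];
}
```

Counting stops is a heuristic. On a line with branches, the longest trip covers only one branch, so a network like that needs a merge of several patterns rather than a single representative.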
Direction Matters
GTFS has direction_id (0 or 1) in trips.txt. For a simple map, we only need one direction: the
stop sequence is usually the same, just reversed. For a real-time tracker showing train direction, you'd want both.
Geographic Projection
Our linear projection (lat → y, lng → x) works for metro-scale networks (a few tens of kilometers).
For regional rail spanning hundreds of kilometers, you'd want a proper Mercator or UTM projection. Libraries
like proj4js handle this if needed.
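For reference, the spherical Mercator formula used by most web maps is short enough to write by hand. The sketch below is an illustration of that formula, not a replacement for proj4js; the function name toWebMercator is an assumption. It returns coordinates in meters, which still need the same fit-to-canvas step as before to become pixels.

```javascript
// Spherical (Web) Mercator, as used by EPSG:3857.
const EARTH_RADIUS_M = 6378137; // WGS84 semi-major axis in meters

// lat/lng in degrees -> { x, y } in meters; y grows northward.
function toWebMercator(lat, lng) {
  const phi = (lat * Math.PI) / 180;    // latitude in radians
  const lambda = (lng * Math.PI) / 180; // longitude in radians
  return {
    x: EARTH_RADIUS_M * lambda,
    y: EARTH_RADIUS_M * Math.log(Math.tan(Math.PI / 4 + phi / 2)),
  };
}
```

Since y increases toward the north here while SVG's y axis increases downward, the value has to be flipped when mapping into pixel space.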
Finding GTFS Feeds
Over 2,500 transit agencies worldwide publish GTFS data. Resources to find them:
Mobility Database (mobilitydatabase.org) — the most comprehensive catalog, 6000+ feeds across
75 countries.
Transitland (transit.land) — community-curated feeds with an API for querying stops, routes,
and schedules.
National Access Points — many countries operate national portals. The EU requires member states
to publish transit data through National Access Points.
Any City, Any Map
Download a GTFS zip. Unzip it. Feed the CSVs to the parser. Feed the output to the renderer. You have a
transit map.
Tokyo. New York. Berlin. São Paulo. Melbourne. The GTFS standard means the same parser works for all of them.
The visual result depends on the network's shape, not on the code. The rendering code is done; it has been done since
Chapter 7.
Our rendering engine now has a complete data pipeline: GTFS files in, transit map out. In the next chapter, we
make it beautiful.