Chapter Eight

From GTFS to Map

Real cities publish their transit data in a standard format. Here's how to turn it into a map.

Everything we've built so far runs on hand-crafted data: station positions and line definitions I typed in myself. But thousands of transit agencies worldwide publish their data in a standardized format called GTFS (General Transit Feed Specification). If you can parse GTFS, you can draw a map of any city whose agency publishes a feed.

GTFS is surprisingly simple. It's a zip file containing CSV text files. No binary formats, no proprietary schemas, no API authentication. Just comma-separated values describing stops, routes, trips, and schedules.
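Simple, though not quite split-on-commas simple. The spec requires any field that contains a comma or a quotation mark to be wrapped in quotation marks, so a stop named "Main St, North" is a single field. It also allows files to begin with a UTF-8 byte-order mark. Here is a minimal TypeScript sketch of a reader that copes with both. The name parseCsv() and the row-object output are illustrative choices, not the chapter's code.

```typescript
// Sketch: a minimal CSV reader for GTFS .txt files (illustrative, not the
// chapter's parser). Handles quoted fields containing commas, quotes, or
// newlines, CRLF line endings, and a leading UTF-8 byte-order mark.
type Row = Record<string, string>;

function parseCsv(input: string): Row[] {
  // The GTFS spec allows a BOM at the start of a file; drop it.
  const text = input.charCodeAt(0) === 0xfeff ? input.slice(1) : input;

  const records: string[][] = [];
  let record: string[] = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (inQuotes) {
      if (ch === '"' && text.charAt(i + 1) === '"') {
        field += '"'; // "" inside quotes is a literal quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // commas and newlines are literal inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      record.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text.charAt(i + 1) === "\n") i++; // CRLF is one break
      record.push(field);
      records.push(record);
      record = [];
      field = "";
    } else {
      field += ch;
    }
  }
  if (field !== "" || record.length > 0) {
    // The last line had no trailing newline.
    record.push(field);
    records.push(record);
  }

  const header = records[0];
  if (header === undefined) return [];
  const columns = header.map((h) => h.trim());
  const rows: Row[] = [];
  for (const values of records.slice(1)) {
    if (values.length === 1 && values[0] === "") continue; // blank line
    const row: Row = {};
    columns.forEach((col, j) => {
      row[col] = (values[j] ?? "").trim(); // missing trailing fields become ""
    });
    rows.push(row);
  }
  return rows;
}
```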

The rendering engine expects two things: a station map ({ id → {name, x, y} }) and an array of lines ({ id, color, stops[] }). The job of a GTFS parser is to extract exactly that.
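Written as TypeScript types, that contract might look like the following. Only the S / lines split and the field names come from the chapter. The type names, the "#RRGGBB" color format, and the missingStations() helper are assumptions for illustration. The helper lists every stop id that a line references but the station map lacks. That is the first thing worth checking when parsed data renders with gaps.

```typescript
// Sketch of the renderer's input contract as TypeScript types. Field names
// follow the chapter; the type names and "#RRGGBB" colors are assumptions.
interface Station {
  name: string;
  x: number; // SVG pixels
  y: number;
}

interface Line {
  id: string;
  color: string; // assumed CSS color, e.g. "#E4002B"
  stops: string[]; // station ids, in travel order
}

interface MapData {
  S: Record<string, Station>; // station id -> station
  lines: Line[];
}

// Hypothetical helper: ids that lines reference but S doesn't define.
function missingStations(data: MapData): string[] {
  const missing = new Set<string>();
  for (const line of data.lines) {
    for (const id of line.stops) {
      if (!Object.prototype.hasOwnProperty.call(data.S, id)) missing.add(id);
    }
  }
  return Array.from(missing);
}

const example: MapData = {
  S: {
    a: { name: "Alpha", x: 100, y: 200 },
    b: { name: "Beta", x: 300, y: 200 },
  },
  lines: [{ id: "L1", color: "#E4002B", stops: ["a", "b"] }],
};
```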

The Files That Matter

A GTFS zip typically contains 6-12 CSV files. We only need four:

stops.txt — Where stations are

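Each row is a stop (station, platform, or entrance). The columns we need are stop_id (unique identifier), stop_name (display name), and stop_lat / stop_lon (latitude and longitude in decimal degrees).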

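That's the first half of the puzzle: station positions. The projectStations() function converts lat/lng coordinates into SVG pixel space using a simple linear projection. It's not cartographically perfect, because it treats the Earth's curved surface as a flat plane, but for a city-sized metro network that error is negligible. The one correction a flat projection should not skip is latitude-dependent scale. A degree of longitude shrinks with distance from the equator (at 52°N it spans only about 0.62 of a degree of latitude), so scale longitude by cos(latitude), or east-west distances come out stretched.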
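Here is a minimal sketch of such a projection, with an assumed signature: a stop map in, a station map out, plus the SVG width, height, and padding. It is not the chapter's projectStations(). It scales longitude by cos(latitude), flips the y axis because SVG y grows downward, and applies one scale factor to both axes so the network keeps its shape.

```typescript
// Sketch of a linear lat/lng -> SVG projection. The signature is assumed;
// the chapter's projectStations() may differ.
interface GeoStop { name: string; lat: number; lng: number; }
interface Station { name: string; x: number; y: number; }

function projectStations(
  stops: Record<string, GeoStop>,
  width: number,
  height: number,
  padding = 40,
): Record<string, Station> {
  const all = Object.values(stops);
  if (all.length === 0) return {};

  // Bounding box of the network in degrees.
  let minLat = Infinity, maxLat = -Infinity, minLng = Infinity, maxLng = -Infinity;
  for (const s of all) {
    minLat = Math.min(minLat, s.lat);
    maxLat = Math.max(maxLat, s.lat);
    minLng = Math.min(minLng, s.lng);
    maxLng = Math.max(maxLng, s.lng);
  }

  // A degree of longitude is cos(latitude) times as long as a degree of latitude.
  const k = Math.cos(((minLat + maxLat) / 2) * (Math.PI / 180));
  const spanX = (maxLng - minLng) * k || 1; // "|| 1" guards a zero-width span
  const spanY = maxLat - minLat || 1;
  // One scale for both axes, so the geography is not squashed.
  const scale = Math.min((width - 2 * padding) / spanX, (height - 2 * padding) / spanY);

  const out: Record<string, Station> = {};
  for (const [id, s] of Object.entries(stops)) {
    out[id] = {
      name: s.name,
      x: padding + (s.lng - minLng) * k * scale,
      y: padding + (maxLat - s.lat) * scale, // north at the top
    };
  }
  return out;
}
```

With a single scale factor, the map fills the box along its longer axis and leaves margin along the other. Fitting each axis separately would fill the box but distort the geography.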

Geographic vs Schematic

This projection gives a geographic layout — stations are placed where they actually are in the real world. The result is often messy: stations in the city center cluster too tightly, suburban stations spread too far. That's why most transit maps use schematic layouts.

For a dashboard or real-time tracker, geographic accuracy matters. For a wayfinding map, schematic clarity wins. The rendering engine doesn't care — it draws whatever positions you give it.

routes.txt — What lines exist

Each row is a transit line. The key columns:

route_id — unique identifier
route_short_name — display name ("L1", "Blue Line")
route_color — hex color (without #)
route_type — 0=tram, 1=metro, 2=rail, 3=bus
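The routes step is mostly a lookup table, with two wrinkles worth handling. First, route_color is optional, and the spec's default when it is missing is white (FFFFFF), which disappears on a white background. Second, a feed usually mixes buses with rail, so a metro map should filter by route_type. Here is a sketch under those assumptions. The function parseRoutes(), the gray fallback, and the "#RRGGBB" output are illustrative choices, and the rows are assumed to be already parsed into objects keyed by column name.

```typescript
// Sketch: routes.txt rows -> Map(route_id -> { name, color }).
type Row = Record<string, string>;
interface RouteInfo { name: string; color: string; }

// Basic GTFS route_type codes for rail-like modes: 0 tram, 1 metro, 2 rail.
const RAIL_TYPES = new Set(["0", "1", "2"]);

function parseRoutes(rows: Row[], keepTypes: Set<string> = RAIL_TYPES): Map<string, RouteInfo> {
  const routes = new Map<string, RouteInfo>();
  for (const r of rows) {
    const id = r["route_id"] ?? "";
    if (!id || !keepTypes.has(r["route_type"] ?? "")) continue;
    // route_color is six hex digits without "#"; fall back to gray when it
    // is missing or malformed instead of the spec's invisible-on-white default.
    const hex = r["route_color"] ?? "";
    const color = /^[0-9a-fA-F]{6}$/.test(hex) ? `#${hex.toUpperCase()}` : "#808080";
    // Prefer the short name, then the long name, then the id itself.
    const name = r["route_short_name"] || r["route_long_name"] || id;
    routes.set(id, { name, color });
  }
  return routes;
}
```

Some feeds use extended three-digit route_type values, and those would need a wider filter than RAIL_TYPES.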

trips.txt — Which trips belong to which line

A trip is a single journey of a vehicle along a route. Many trips exist per route (every departure is a trip). We need trips to connect routes to their stop sequences. Key columns:

trip_id — unique trip identifier
route_id — links to routes.txt
direction_id — 0 or 1 (outbound/inbound)

stop_times.txt — Which stops are on which trip, in what order

This is the big file — every stop on every trip. For a metro system it can have hundreds of thousands of rows. Key columns:

trip_id — links to trips.txt
stop_id — links to stops.txt
stop_sequence — order within the trip (1, 2, 3...)
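One detail in that list is easy to get wrong. The spec only requires stop_sequence to increase along a trip, not to be consecutive, so a feed may number stops 10, 20, 30. The sketch below groups rows by trip and sorts numerically. The function name and the optional trip filter, which helps with a file this size, are assumptions.

```typescript
// Sketch: ordered stop ids per trip from stop_times.txt rows (rows assumed
// already parsed into objects keyed by column name).
type Row = Record<string, string>;

function stopSequences(stopTimes: Row[], onlyTrips?: Set<string>): Map<string, string[]> {
  const byTrip = new Map<string, { seq: number; stopId: string }[]>();
  for (const r of stopTimes) {
    const tripId = r["trip_id"] ?? "";
    // Skipping unneeded trips early keeps memory down on large feeds.
    if (onlyTrips && !onlyTrips.has(tripId)) continue;
    let entries = byTrip.get(tripId);
    if (entries === undefined) {
      entries = [];
      byTrip.set(tripId, entries);
    }
    entries.push({ seq: Number(r["stop_sequence"]), stopId: r["stop_id"] ?? "" });
  }

  const result = new Map<string, string[]>();
  byTrip.forEach((entries, tripId) => {
    // Numeric sort: compared as strings, "10" would come before "9".
    entries.sort((a, b) => a.seq - b.seq);
    result.set(tripId, entries.map((e) => e.stopId));
  });
  return result;
}
```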

Putting It Together

The chain: stop_times → trips → routes. Pick one representative trip per route (most trips follow the same stops in the same order), extract its stop sequence, and you have the line's station list.

From CSV text to rendered map: the gtfsToMapData() function does the entire conversion. It parses four CSV files, projects coordinates, extracts stop sequences, and outputs the exact { S, lines } structure the rendering engine expects.
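Below is a self-contained sketch of the same conversion. It is not the chapter's own code, and it makes three assumptions. Each file is already parsed into row objects keyed by column name. Colors come out as "#RRGGBB". One line is kept per route, using its longest trip in either direction. It also projects only the stops that some line uses, so unrelated bus stops don't stretch the bounding box.

```typescript
// Sketch of a gtfsToMapData()-style converter (not the chapter's code).
type Row = Record<string, string>;
interface Station { name: string; x: number; y: number; }
interface Line { id: string; color: string; stops: string[]; }
interface GtfsRows { stops: Row[]; routes: Row[]; trips: Row[]; stopTimes: Row[]; }

function gtfsToMapData(gtfs: GtfsRows, width = 800, height = 600, pad = 40) {
  // stop_times: collect [stop_sequence, stop_id] pairs per trip.
  const seqs = new Map<string, [number, string][]>();
  for (const r of gtfs.stopTimes) {
    const tripId = r["trip_id"] ?? "";
    let list = seqs.get(tripId);
    if (list === undefined) {
      list = [];
      seqs.set(tripId, list);
    }
    list.push([Number(r["stop_sequence"]), r["stop_id"] ?? ""]);
  }
  const count = (tripId: string) => seqs.get(tripId)?.length ?? 0;

  // trips: the representative trip per route is the one with the most stops.
  const best = new Map<string, string>(); // route_id -> trip_id
  for (const t of gtfs.trips) {
    const routeId = t["route_id"] ?? "";
    const tripId = t["trip_id"] ?? "";
    const current = best.get(routeId);
    if (count(tripId) > (current === undefined ? 0 : count(current))) best.set(routeId, tripId);
  }

  // routes: one render-ready line per route that has a representative trip.
  const lines: Line[] = [];
  const used = new Set<string>();
  for (const route of gtfs.routes) {
    const id = route["route_id"] ?? "";
    const tripId = best.get(id);
    const pairs = tripId === undefined ? undefined : seqs.get(tripId);
    if (pairs === undefined) continue;
    const stops = pairs.sort((a, b) => a[0] - b[0]).map((p) => p[1]);
    stops.forEach((s) => used.add(s));
    const hex = route["route_color"] ?? "";
    const color = /^[0-9a-fA-F]{6}$/.test(hex) ? `#${hex.toUpperCase()}` : "#808080";
    lines.push({ id, color, stops });
  }

  // stops: project only stations that some line uses.
  const geo = gtfs.stops.filter((s) => used.has(s["stop_id"] ?? ""));
  const lat = (s: Row) => Number(s["stop_lat"]);
  const lng = (s: Row) => Number(s["stop_lon"]);
  let minLat = Infinity, maxLat = -Infinity, minLng = Infinity, maxLng = -Infinity;
  for (const s of geo) {
    minLat = Math.min(minLat, lat(s));
    maxLat = Math.max(maxLat, lat(s));
    minLng = Math.min(minLng, lng(s));
    maxLng = Math.max(maxLng, lng(s));
  }
  const k = Math.cos(((minLat + maxLat) / 2) * (Math.PI / 180)); // longitude shrink
  const scale = Math.min(
    (width - 2 * pad) / ((maxLng - minLng) * k || 1),
    (height - 2 * pad) / (maxLat - minLat || 1),
  );
  const S: Record<string, Station> = {};
  for (const s of geo) {
    S[s["stop_id"] ?? ""] = {
      name: s["stop_name"] ?? "",
      x: pad + (lng(s) - minLng) * k * scale,
      y: pad + (maxLat - lat(s)) * scale, // SVG y grows downward
    };
  }
  return { S, lines };
}
```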

The rendering code below the parser is character-for-character identical to Chapter 7. The engine doesn't know its data came from GTFS. It just sees stations and lines.

The Parser Is 30 Lines

The entire gtfsToMapData() function, from raw CSV to render-ready data, is about 30 lines. A production parser needs more defensive code (quoted fields, missing optional columns such as route_color, stop_times files too large to load comfortably in one go), but the core joins stay the same. The hard work was done by the people who designed the GTFS standard.

The Pipeline

Here's the complete data flow:

stops.txt → { stop_id → { name, lat, lng } } → project to pixel space → { stop_id → { name, x, y } }

routes.txt → { route_id → { name, color } }

trips.txt → pick one representative trip per route/direction

stop_times.txt → for each representative trip, extract ordered stop IDs → stops[]

The output is the same two-object structure we've been using since Chapter 7: stations with positions, and lines with stop sequences and colors. The rendering engine is a pure function of this data. It makes the same decisions regardless of whether the data was hand-crafted or parsed from GTFS.

Real-World Considerations

Picking a Representative Trip

A route can have hundreds of trips (every departure is a trip). Most trips visit the same stations in the same order: the route's pattern. Some visit fewer, because expresses skip stations and short-turn trips terminate early. We pick the trip with the most stops as the representative for each route and direction. That gives us the full line, not a shortened variant.
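The selection rule fits in a few lines. This sketch assumes the stop count per trip has already been tallied from stop_times.txt. The function name and the "route_id|direction_id" key format are hypothetical.

```typescript
// Sketch of "most stops wins" per route and direction.
type Row = Record<string, string>;

function pickRepresentativeTrips(trips: Row[], stopCount: Map<string, number>): Map<string, string> {
  const best = new Map<string, string>(); // "route_id|direction_id" -> trip_id
  for (const trip of trips) {
    const tripId = trip["trip_id"] ?? "";
    // direction_id is optional in GTFS; a missing value groups under "".
    const key = `${trip["route_id"] ?? ""}|${trip["direction_id"] ?? ""}`;
    const n = stopCount.get(tripId) ?? 0;
    const current = best.get(key);
    const currentN = current === undefined ? 0 : (stopCount.get(current) ?? 0);
    // Strictly greater: on a tie, the trip that appears first is kept.
    if (n > currentN) best.set(key, tripId);
  }
  return best;
}
```

Stop count is a heuristic. On a branching route, the longest trip covers only one branch, and the other branch's stations never appear. Handling that means keeping every distinct stop pattern rather than a single winner.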

Direction Matters

GTFS has direction_id (0 or 1) in trips.txt. For a simple map, one direction is usually enough, because the opposite direction typically serves the same stations in reverse order. For a real-time tracker showing train direction, you'd want both.

Geographic Projection

This linear projection (lat → y, lng → x) works for metro-scale networks (a few tens of kilometers). For regional rail spanning hundreds of kilometers, you'd want a proper Mercator or UTM projection. Libraries like proj4js handle this if needed.
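For reference, the spherical Web Mercator projection (EPSG:3857, the one web map tiles use) is short enough to write by hand. Here is a minimal sketch. The output is in meters with y pointing north, so it still needs flipping and fitting to the SVG box like the linear version. For UTM or any other coordinate system, a library is the safer route.

```typescript
// Spherical Web Mercator (EPSG:3857) forward projection. Conformal, so
// shapes are preserved locally, but scale grows toward the poles.
const EARTH_RADIUS_M = 6378137; // sphere radius used by EPSG:3857 (WGS84 semi-major axis)

function webMercator(lat: number, lng: number): { x: number; y: number } {
  const rad = Math.PI / 180;
  return {
    x: EARTH_RADIUS_M * lng * rad,
    // y grows northward here; flip it before mapping into SVG space.
    y: EARTH_RADIUS_M * Math.log(Math.tan(Math.PI / 4 + (lat * rad) / 2)),
  };
}
```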

Finding GTFS Feeds

Over 2,500 transit agencies worldwide publish GTFS data. Resources to find them:

Mobility Database (mobilitydatabase.org) — the most comprehensive catalog, 6000+ feeds across 75 countries.

Transitland (transit.land) — community-curated feeds with an API for querying stops, routes, and schedules.

National Access Points — many countries operate national portals. The EU requires member states to publish transit data through National Access Points.

Any City, Any Map

Download a GTFS zip. Unzip it. Feed the CSVs to the parser. Feed the output to the renderer. You have a transit map.

Tokyo. New York. Berlin. São Paulo. Melbourne. The GTFS standard means the same parser works for all of them. The visual result depends on the network's shape, not on the code. The code is done — it's been done since Chapter 5.

The rendering engine now has a complete data pipeline: GTFS files in, transit map out. In the next chapter, we make it beautiful.

luisnomad.com

Luis Serrano 2026