Python: Basic usage of r5py#

Credits:

This tutorial was written by Henrikki Tenkanen, Christoph Fink & Willem Klumpenhouwer (i.e. the r5py developer team). If you plan to use the library for research purposes, see the full r5py documentation, which includes a detailed user manual and much more information.

Getting started#

There are three options for running the code in this tutorial:

  1. Copy and paste the code from the website and run it line by line on your own computer with your preferred IDE (Jupyter Lab, Spyder, PyCharm etc.).

  2. Download this Notebook (see below) and run it using Jupyter Lab, which you should have installed by following the installation instructions.

  3. Run the code using Binder (see below), which is the easiest way but has very limited computational resources (i.e. it can be very slow).

Download the Notebook#

You can download this tutorial Notebook to your own computer by clicking the Download button from the Menu on the top-right section of the website.

  • Right-click the option that says .ipynb and choose “Save link as ..”

Download tutorial Notebook

Run the codes on your own computer#

Before you can run this Notebook, and/or do any programming, you need to launch the JupyterLab programming environment. JupyterLab comes with the environment that you installed earlier (if you have not done this yet, follow the installation instructions). To run JupyterLab:

  1. Using terminal/command prompt, navigate to the folder where you have downloaded the Jupyter Notebook tutorial: $ cd /mydirectory/

  2. Activate the programming environment: $ conda activate geo

  3. Launch the JupyterLab: $ jupyter lab

After these steps, the JupyterLab interface should open, and you can start executing cells (see hints below at “Working with Jupyter Notebooks”).

Alternatively: Run codes in Binder (with limited resources)#

Alternatively (not recommended due to limited computational resources), you can run this Notebook by launching a Binder instance. You can find buttons for activating the python environment at the top-right of this page which look like this:

Launch Binder

Working with Jupyter Notebooks#

Jupyter Notebooks are documents that contain computer code and rich text elements (such as text, figures, tables and links), and that can be used and run inside the JupyterLab programming environment.

A couple of hints:

  • You can execute a cell by clicking a given cell that you want to run and pressing Shift + Enter (or by clicking the “Play” button on top)

  • You can change the cell-type between Markdown (for writing text) and Code (for writing/executing code) from the dropdown menu above.

See further details and help for using Notebooks and JupyterLab from here.

Introduction#

R5py is a Python library for routing and calculating travel time matrices on multimodal transport networks (walk, bike, public transport and car). It provides a simple and friendly interface to R5 (the Rapid Realistic Routing on Real-world and Reimagined networks) which is a routing engine developed by Conveyal. R5py is designed to interact with GeoPandas GeoDataFrames, and it is inspired by r5r which is a similar wrapper developed for R. R5py exposes some of R5’s functionality via its Python API, in a syntax similar to r5r’s. At the time of this writing, only the computation of travel time matrices has been fully implemented. Over time, r5py will be expanded to incorporate other functionalities from R5.

Data requirements#

Data for creating a routable network#

When calculating travel times with r5py, you typically need a couple of datasets:

  • A road network dataset from OpenStreetMap (OSM) in Protocolbuffer Binary Format (.pbf):

    • This data is used for finding the fastest routes and calculating the travel times based on walking, cycling and driving. In addition, this data is used for walking/cycling legs between stops when routing with transit.

    • Hint: Sometimes you might need to modify the OSM data beforehand, e.g. by cropping the data or adding special costs for travelling (e.g. for considering slope when cycling/walking). When doing this, you should follow the instructions on the Conveyal website. For adding customized costs for pedestrian and cycling analyses, see this repository.

  • A transit schedule dataset in General Transit Feed Specification (GTFS) format, stored as a .zip file (optional):

    • This data contains all the necessary information for calculating travel times based on public transport, such as stops, routes, trips and the schedules of when vehicles pass a specific stop. You can read more about the GTFS standard here.

    • Hint: r5py can also combine multiple GTFS files, as sometimes you might have different GTFS feeds representing e.g. the bus and metro connections.
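Before building a network, it can be useful to confirm that a GTFS archive actually contains the tables mentioned above. The following is a minimal sketch using only the Python standard library. The helper name `missing_gtfs_tables` and the in-memory demo feed are made up for this illustration, and the list of expected files covers only the tables named in this section, not the complete GTFS specification.

```python
import io
import zipfile

# Tables mentioned in this section (not the complete GTFS specification)
EXPECTED_TABLES = ("stops.txt", "routes.txt", "trips.txt", "stop_times.txt")


def missing_gtfs_tables(gtfs_zip):
    """Return the expected GTFS tables that are absent from a feed.

    `gtfs_zip` can be a file path or a file-like object.
    """
    with zipfile.ZipFile(gtfs_zip) as feed:
        present = set(feed.namelist())
    return [name for name in EXPECTED_TABLES if name not in present]


# Build a tiny, deliberately incomplete feed in memory for demonstration
demo_feed = io.BytesIO()
with zipfile.ZipFile(demo_feed, "w") as feed:
    feed.writestr("stops.txt", "stop_id,stop_name,stop_lat,stop_lon\n")
    feed.writestr("routes.txt", "route_id,route_short_name,route_type\n")
demo_feed.seek(0)

missing = missing_gtfs_tables(demo_feed)
print(missing)
```

With a real feed, you would pass the path of the downloaded .zip file instead of the in-memory object.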

Data for origin and destination locations#

In addition to OSM and GTFS datasets, you need data that represents the origin and destination locations (OD-data) for routing. This data is typically stored in one of the geospatial data formats, such as Shapefile, GeoJSON or GeoPackage. As r5py is built on top of geopandas, it is easy to read OD-data from various data formats.

Where to get these datasets?#

Here are a few places from where you can download the datasets for creating the routable network:

  • OpenStreetMap data in PBF-format:

    • pyrosm -library. Allows downloading data directly from Python (based on GeoFabrik and BBBike).

    • pydriosm -library. Allows downloading data directly from Python (based on GeoFabrik and BBBike).

    • GeoFabrik -website. Has data extracts for many pre-defined areas (countries, regions, etc).

    • BBBike -website. Has data extracts readily available for many cities across the world. Also supports downloading data by specifying your own area of interest.

    • Protomaps -website. Allows downloading data with a custom extent by specifying your own area of interest.

  • GTFS data:

    • Transitfeeds -website. Easy to navigate and find GTFS data for different countries and cities. Includes current and historical GTFS data. Notice: The site will be deprecated in the future.

    • Mobility Database -website. Will eventually replace TransitFeeds -website.

    • Transitland -website. Find data based on country, operator or feed name. Includes current and historical GTFS data.

Sample datasets#

In the following tutorial, we use various open source datasets:

  • The point dataset for Helsinki has been obtained from Helsinki Region Environmental Services (HSY) licensed under a Creative Commons By Attribution 4.0.

  • The street network for Helsinki is a cropped and filtered extract of OpenStreetMap (© OpenStreetMap contributors, ODbL license)

  • The GTFS transport schedule dataset for Helsinki is a cropped and minimised copy of Helsingin seudun liikenne’s (HSL) open dataset Creative Commons BY 4.0.

Installation#

Before you can start using r5py, you need to install it and a few other libraries. Check the installation instructions for more details.

Getting started with r5py#

In this tutorial, we will learn how to calculate travel times with r5py between locations spread around the city center area of Helsinki, Finland.

Load and prepare the origin and destination data#

Let’s start by downloading a sample dataset into a geopandas GeoDataFrame that we can use as our destination locations. To make testing the library easier, we have prepared a helper r5py.sampledata.helsinki which can be used to easily download the sample data sets for Helsinki (including population grid, GTFS data and OSM data). The population grid data covers the city center area of Helsinki and contains information about residents of each 250 meter cell:

import geopandas as gpd
import osmnx as ox
import r5py.sampledata.helsinki

pop_grid_fp = r5py.sampledata.helsinki.population_grid
pop_grid = gpd.read_file(pop_grid_fp)
pop_grid.head()
id population geometry
0 0 389 POLYGON ((24.90545 60.16086, 24.90545 60.16311...
1 1 296 POLYGON ((24.90546 60.15862, 24.90545 60.16086...
2 2 636 POLYGON ((24.90547 60.15638, 24.90546 60.15862...
3 3 1476 POLYGON ((24.90547 60.15413, 24.90547 60.15638...
4 4 23 POLYGON ((24.90994 60.16535, 24.90994 60.16760...

The pop_grid GeoDataFrame contains a few columns, namely id, population and geometry. The id column with unique values and geometry columns are required for r5py to work. If your input dataset does not have an id column with unique values, r5py will throw an error.
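The uniqueness requirement can be checked with plain pandas before passing data to r5py. The sketch below uses a small hypothetical table named `locations` (a real input would be a GeoDataFrame) and, as one possible remedy, replaces a non-unique id column with a running number.

```python
import pandas as pd

# Hypothetical attribute table; a real input would be a GeoDataFrame
locations = pd.DataFrame({"name": ["a", "b", "c"], "id": [1, 1, 2]})

if "id" not in locations.columns or not locations["id"].is_unique:
    # Replace the id column with a simple running number
    locations = locations.reset_index(drop=True)
    locations["id"] = locations.index

print(locations["id"].tolist())
```

Overwriting the ids discards the original identifiers, so keep a copy of the old column if you need to link the results back to other data.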

To get a better sense of the data, let’s create a map that shows the locations of the polygons and visualise the number of people living in each cell:

pop_grid.explore("population", cmap="Reds")

Convert polygon layer to points#

Lastly, we need to convert the Polygons into points because r5py expects that the input data is represented as points. We can do this by making a copy of our grid and calculating the centroid of the Polygons.

Note: You can ignore the UserWarning raised by geopandas about the geographic CRS. The location of the centroid is accurate enough for most purposes.

# Convert polygons into points
points = pop_grid.copy()
points["geometry"] = points.centroid
points.explore(max_zoom=13, color="red")
/tmp/ipykernel_38782/415363510.py:3: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  points["geometry"] = points.centroid

Retrieve the origin location by geocoding an address#

Let’s geocode an address for Helsinki Railway Station into a GeoDataFrame using osmnx and use that as our origin location:

from shapely.geometry import Point 

address = "Railway station, Helsinki, Finland"
lat, lon = ox.geocode(address)

# Create a GeoDataFrame out of the coordinates
origin = gpd.GeoDataFrame({"geometry": [Point(lon, lat)], "name": "Helsinki Railway station", "id": [0]}, index=[0], crs="epsg:4326")
origin.explore(max_zoom=13, color="red", marker_kwds={"radius": 12})

Load transport network#

Virtually all operations of r5py require a transport network. In this example, we use data from Helsinki metropolitan area, which you can easily obtain from the r5py.sampledata.helsinki library. The files will be downloaded automatically to a temporary folder on your computer when you call the variables *.osm_pbf and *.gtfs:

# Download OSM data
r5py.sampledata.helsinki.osm_pbf
DataSet('/home/hentenka/.cache/r5py/kantakaupunki.osm.pbf')
# Download GTFS data
r5py.sampledata.helsinki.gtfs
DataSet('/home/hentenka/.cache/r5py/helsinki_gtfs.zip')

To import the street and public transport networks, instantiate an r5py.TransportNetwork with the file paths to the OSM extract and the GTFS files:

from r5py import TransportNetwork

# Get the filepaths to sample data (OSM and GTFS)
helsinki_osm = r5py.sampledata.helsinki.osm_pbf
helsinki_gtfs = r5py.sampledata.helsinki.gtfs

transport_network = TransportNetwork(
    # OSM data
    helsinki_osm,
    
    # A list of GTFS file(s)
    [
        helsinki_gtfs
    ]
)
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.mapdb.Volume$ByteBufferVol (file:/home/hentenka/.cache/r5py/r5-v6.9-all.jar) to method java.nio.DirectByteBuffer.cleaner()
WARNING: Please consider reporting this to the maintainers of org.mapdb.Volume$ByteBufferVol
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

At this stage, r5py has created the routable transport network and it is stored in the transport_network variable. We can now start using this network for doing the travel time calculations.

Info regarding “An illegal reflective access operation has occurred”

If you receive a warning when running the cell above (“WARNING: An illegal reflective access operation has occurred”), it is because an older version of OpenJDK (version 11) is in use. r5r only supports version 11 of OpenJDK at this stage, which is why we also use it here. If you plan to use only r5py and want to get rid of the warning, you can update OpenJDK to its latest version with mamba:

$ conda activate geo

$ mamba install -c conda-forge openjdk=20

Compute travel time matrix from one to all locations#

A travel time matrix is a dataset detailing the travel costs (e.g., time) between given locations (origins and destinations) in a study area. To compute a travel time matrix with r5py based on public transportation, we first need to initialize an r5py.TravelTimeMatrixComputer object. As inputs, we pass the following arguments to the TravelTimeMatrixComputer:

  • transport_network, which we created in the previous step representing the routable transport network.

  • origins, which is a GeoDataFrame with one location that we created earlier (however, you can also use multiple locations as origins).

  • destinations, which is a GeoDataFrame representing the destinations (in our case, the points GeoDataFrame).

  • departure, which should be Python’s datetime -object (in our case standing for “22nd of February 2022 at 08:30”) to tell r5py that the schedules of this specific time and day should be used for doing the calculations.

    • Note: By default, r5py calculates the median travel time from all possible connections within 10 minutes of the given departure time (sampled at 1-minute intervals). You can adjust this time window with the departure_time_window parameter (see details here). For robust spatial accessibility assessments (e.g. in scientific work), we recommend a departure_time_window of 60 minutes.

  • transport_modes, which determines the travel modes that will be used in the calculations. These can be passed using the options from the r5py.TransportMode -class.

    • Hint: To see all available options, run help(r5py.TransportMode).

Note

In addition to these ones, the constructor also accepts many other parameters listed here, such as walking and cycling speed, maximum trip duration, maximum number of transit connections used during the trip, etc.
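To make the departure time window described in the list above more concrete, here is a purely conceptual sketch using only the standard library. It is not how R5 works internally: the travel times are made up, and whether the end of the window is included is an assumption. It only illustrates the idea of sampling one departure per minute and summarising the results with a median.

```python
import datetime
import statistics

departure = datetime.datetime(2022, 2, 22, 8, 30)
departure_time_window = datetime.timedelta(minutes=10)
step = datetime.timedelta(minutes=1)

# One departure per minute within the window (window end excluded: an assumption)
n_steps = departure_time_window // step
departures = [departure + i * step for i in range(n_steps)]

# Made-up travel times in minutes, one per sampled departure
travel_times = [21, 19, 18, 25, 24, 23, 22, 21, 20, 19]

median_travel_time = statistics.median(travel_times)
print(departures[0].time(), departures[-1].time(), median_travel_time)
```

A longer window, such as the 60 minutes recommended above, samples more departures and makes the median less sensitive to whether a single connection happens to be caught or missed.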

Now, we will first create a travel_time_matrix_computer instance as described above:

import datetime
from r5py import TravelTimeMatrixComputer, TransportMode

# Initialize the tool
travel_time_matrix_computer = TravelTimeMatrixComputer(
    transport_network,
    origins=origin,
    destinations=points,
    departure=datetime.datetime(2022,2,22,8,30),
    transport_modes=[TransportMode.TRANSIT, TransportMode.WALK]
)
# To see all available transport modes, uncomment following
# help(TransportMode)

Running this initializes the TravelTimeMatrixComputer, but no calculations have been done yet. To actually run the computations, we need to call .compute_travel_times() on the instance, which will calculate the travel times between all points:

travel_time_matrix = travel_time_matrix_computer.compute_travel_times()
travel_time_matrix.head()
from_id to_id travel_time
0 0 0 19
1 0 1 21
2 0 2 22
3 0 3 25
4 0 4 21

As a result, this returns a pandas.DataFrame which we stored in the travel_time_matrix -variable. The values in the travel_time column are travel times in minutes between the points identified by from_id and to_id. As you can see, the id value in the from_id column is the same for all rows because we only used one origin location as input.
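Because the result is an ordinary pandas.DataFrame, it can be filtered like any other table. The sketch below retypes the first rows of the output above by hand into a small demo table and selects the destinations reachable within a threshold. The threshold of 21 minutes and the variable names are arbitrary choices for illustration.

```python
import pandas as pd

# First rows of the result shown above, retyped by hand
travel_time_matrix_demo = pd.DataFrame(
    {
        "from_id": [0, 0, 0, 0, 0],
        "to_id": [0, 1, 2, 3, 4],
        "travel_time": [19, 21, 22, 25, 21],
    }
)

threshold = 21  # minutes; arbitrary value for illustration
within_threshold = travel_time_matrix_demo["travel_time"] <= threshold
reachable = travel_time_matrix_demo[within_threshold]
share_reachable = len(reachable) / len(travel_time_matrix_demo)

print(reachable["to_id"].tolist(), share_reachable)
```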

To get a better sense of the results, let’s create a travel time map based on our results. We can do this easily by making a table join between the pop_grid GeoDataFrame and the travel_time_matrix. The key in the travel_time_matrix table is the column to_id and the corresponding key in the pop_grid GeoDataFrame is the column id. Notice that here we do the table join with the original Polygons layer (for visualization purposes). However, the join could also be done in a similar manner with the points GeoDataFrame.

join = pop_grid.merge(travel_time_matrix, left_on="id", right_on="to_id")
join.head()
id population geometry from_id to_id travel_time
0 0 389 POLYGON ((24.90545 60.16086, 24.90545 60.16311... 0 0 19
1 1 296 POLYGON ((24.90546 60.15862, 24.90545 60.16086... 0 1 21
2 2 636 POLYGON ((24.90547 60.15638, 24.90546 60.15862... 0 2 22
3 3 1476 POLYGON ((24.90547 60.15413, 24.90547 60.15638... 0 3 25
4 4 23 POLYGON ((24.90994 60.16535, 24.90994 60.16760... 0 4 21

Now we have the travel times attached to each grid cell, and we can easily visualize them on a map:

m = join.explore("travel_time", cmap="Greens", max_zoom=13)
m = origin.explore(m=m, color="red", marker_kwds={"radius": 10})
m

Compute travel time matrix from all to all locations#

Running the calculations between all points in our sample dataset can be done in a similar manner as calculating the travel times from one origin to all destinations. Since calculating this kind of all-to-all travel time matrix is quite typical in accessibility analyses, it is possible to calculate a cross-product between all points by using only the origins parameter (i.e. without specifying a separate set of destinations). r5py will then use the same points as destinations and produce a full set of origins and destinations:

travel_time_matrix_computer = TravelTimeMatrixComputer(
    transport_network,
    origins=points,
    departure=datetime.datetime(2022,2,22,8,30),
    transport_modes=[TransportMode.TRANSIT, TransportMode.WALK]
)
travel_time_matrix_all = travel_time_matrix_computer.compute_travel_times()
travel_time_matrix_all.head()
from_id to_id travel_time
0 0 0 0
1 0 1 7
2 0 2 10
3 0 3 18
4 0 4 13
travel_time_matrix_all.tail()
from_id to_id travel_time
8459 91 87 27
8460 91 88 23
8461 91 89 10
8462 91 90 6
8463 91 91 0
len(travel_time_matrix_all)
8464

As we can see from the outputs above, now we have calculated travel times between all points (n=92) in the study area. Hence, the resulting DataFrame has almost 8500 rows (92x92=8464). Based on these results, we can for example calculate the median travel time to or from a certain point, which gives a good estimate of the overall accessibility of the location in relation to other points:

median_times = travel_time_matrix_all.groupby("from_id")["travel_time"].median()
median_times
from_id
0     22.0
1     25.0
2     27.0
3     28.0
4     26.0
      ... 
87    25.0
88    25.0
89    23.0
90    26.0
91    28.0
Name: travel_time, Length: 92, dtype: float64
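The example above groups by from_id, which gives the median travel time from each point. The same pattern works in the other direction, and the long table can also be reshaped into a wide origin-by-destination matrix. The following sketch uses a small made-up matrix between three points, so the numbers are illustrative only.

```python
import pandas as pd

# Small made-up all-to-all matrix between three points (minutes)
matrix = pd.DataFrame(
    {
        "from_id": [0, 0, 0, 1, 1, 1, 2, 2, 2],
        "to_id": [0, 1, 2, 0, 1, 2, 0, 1, 2],
        "travel_time": [0, 7, 10, 8, 0, 5, 12, 6, 0],
    }
)

# Median travel time *to* each point (grouping by destination)
median_times_to = matrix.groupby("to_id")["travel_time"].median()

# Wide table with origins as rows and destinations as columns
wide = matrix.pivot(index="from_id", columns="to_id", values="travel_time")

print(median_times_to.tolist())
print(wide.loc[0, 2])
```

Travel times are not necessarily symmetric (compare 0 to 2 with 2 to 0 in the demo data), so the "to" and "from" medians can differ.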

To estimate how long it generally takes to travel between locations in our study area (i.e. what the baseline accessibility in the area is), we can calculate the mean (or median) of the median travel times, which shows that it is approximately 22 minutes:

median_times.mean()
21.918478260869566

Naturally, we can also visualize these values on a map:

overall_access = pop_grid.merge(median_times.reset_index(), left_on="id", right_on="from_id")
overall_access.head()
id population geometry from_id travel_time
0 0 389 POLYGON ((24.90545 60.16086, 24.90545 60.16311... 0 22.0
1 1 296 POLYGON ((24.90546 60.15862, 24.90545 60.16086... 1 25.0
2 2 636 POLYGON ((24.90547 60.15638, 24.90546 60.15862... 2 27.0
3 3 1476 POLYGON ((24.90547 60.15413, 24.90547 60.15638... 3 28.0
4 4 23 POLYGON ((24.90994 60.16535, 24.90994 60.16760... 4 26.0
overall_access.explore("travel_time", cmap="Blues", scheme="natural_breaks", k=4)

In our study area, accessibility seems to be somewhat poorer in the southern areas and on the edges of the region (i.e. we witness a classic edge effect here).

Advanced usage#

Compute travel times with detailed information about the routing results#

In case you are interested in more detailed routing results, it is possible to use DetailedItinerariesComputer. It provides not only the same information as in the previous examples, but also much more detail about the routes. When using this functionality, r5py produces information about the routes used for each origin-destination pair (possibly with multiple alternative routes), as well as individual trip segments and information about the modes used, public transport route ids (e.g. bus line number), distances, waiting times and the actual geometry used.

Important

Computing detailed itineraries is significantly more time-consuming than calculating simple travel times. As such, think twice whether you actually need the detailed information output from this function, and how you might be able to limit the number of origins and destinations you need to compute.
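A quick way to judge the size of a routing task is to count the origin-destination pairs involved. The helper `count_od_pairs` below is a made-up name for this illustration. The figures come from this tutorial: 92 grid points in the all-to-all example, and one origin with a sample of three destinations in this section.

```python
def count_od_pairs(n_origins, n_destinations):
    """Number of origin-destination pairs that have to be routed."""
    return n_origins * n_destinations


# Figures from this tutorial: 92 grid points, 1 origin, 3 sampled destinations
all_to_all = count_od_pairs(92, 92)
sampled = count_od_pairs(1, 3)

print(all_to_all, sampled)
```

Each pair can yield several options with several segments, so the detailed output grows considerably faster than the simple travel time matrix.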

from r5py import DetailedItinerariesComputer

# Take a small sample of destinations for demo purposes
points_sample = points.sample(3)

travel_time_matrix_computer = DetailedItinerariesComputer(
    transport_network,
    origins=origin,
    destinations=points_sample,
    departure=datetime.datetime(2022,2,22,8,30),
    transport_modes=[TransportMode.TRANSIT, TransportMode.WALK],
    
    # Attempt to snap all origin and destination points to the transport network before routing
    snap_to_network=True,
)
travel_details = travel_time_matrix_computer.compute_travel_details()
travel_details.head()
/home/hentenka/.conda/envs/mamba/envs/geo/lib/python3.11/site-packages/r5py/r5/detailed_itineraries_computer.py:135: RuntimeWarning: R5 has been compiled with `TransitLayer.SAVE_SHAPES = false` (the default). The geometries of public transport routes are inaccurate (straight lines between stops), and distances can not be computed.
  warnings.warn(
from_id to_id option segment transport_mode departure_time distance travel_time wait_time route geometry
0 0 54 0 0 TransportMode.WALK NaT 1096.987 0 days 00:18:47 NaT None LINESTRING (24.94128 60.17285, 24.94143 60.171...
1 0 54 1 0 TransportMode.WALK 2022-02-22 08:32:12 368.816 0 days 00:06:21 0 days 00:00:00 None LINESTRING (24.94128 60.17285, 24.94143 60.171...
2 0 54 1 1 TransportMode.TRAM 2022-02-22 08:48:00 NaN 0 days 00:02:00 0 days 00:02:55 10 LINESTRING (24.93798 60.17023, 24.94159 60.16776)
3 0 54 1 2 TransportMode.WALK 2022-02-22 08:43:00 518.831 0 days 00:08:55 0 days 00:00:00 None LINESTRING (24.93952 60.16410, 24.93951 60.164...
4 0 54 2 0 TransportMode.WALK 2022-02-22 08:35:06 197.389 0 days 00:03:25 0 days 00:00:00 None LINESTRING (24.94128 60.17285, 24.94143 60.171...

As you can see, the result contains much more information than earlier. See the following table for explanations:

| Column | Description | Data type |
| --- | --- | --- |
| from_id | the origin of the trip this segment belongs to | any, user defined |
| to_id | the destination of the trip this segment belongs to | any, user defined |
| option | sequential number for different trip options found | int |
| segment | sequential number for segments of the current trip options | int |
| transport_mode | the transport mode used on the current segment | r5py.TransportMode |
| departure_time | the transit departure date and time used for current segment | datetime.datetime |
| distance | the travel distance in metres for the current segment | float |
| travel_time | The travel time for the current segment | datetime.timedelta |
| wait_time | The wait time between connections when using public transport | datetime.timedelta |
| route | The route number or id for public transport route used on a segment | str |
| geometry | The path travelled on a current segment (with transit, stops connected with straight lines by default) | shapely.LineString |
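Since each row describes a single segment, a common follow-up step is to aggregate the segments into one total per trip option. The sketch below retypes the first four rows of the output above by hand and assumes that the duration of an option can be approximated as the sum of its segment travel times and wait times. This is an assumption made for illustration, not something r5py computes for you here, and the column name `total_seconds` is likewise made up.

```python
import pandas as pd

# Segments of the first two options for trip 0 -> 54, retyped from the output above
segments = pd.DataFrame(
    {
        "from_id": [0, 0, 0, 0],
        "to_id": [54, 54, 54, 54],
        "option": [0, 1, 1, 1],
        "segment": [0, 0, 1, 2],
        "travel_time": pd.to_timedelta(["00:18:47", "00:06:21", "00:02:00", "00:08:55"]),
        "wait_time": pd.to_timedelta([None, "00:00:00", "00:02:55", "00:00:00"]),
    }
)

# Treat missing wait times (NaT) as zero before adding them up
wait_time = segments["wait_time"].fillna(pd.Timedelta(0))
segments["total_seconds"] = (segments["travel_time"] + wait_time).dt.total_seconds()

# One row per trip option with its summed duration
option_totals = (
    segments.groupby(["from_id", "to_id", "option"])["total_seconds"].sum().reset_index()
)

# Fastest option per origin-destination pair
fastest_rows = option_totals.groupby(["from_id", "to_id"])["total_seconds"].idxmin()
fastest = option_totals.loc[fastest_rows]

print(option_totals)
print(fastest)
```

For this pair, the walking-only option (option 0) comes out faster than the tram option under the stated assumption.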

Visualize the routes on a map#

In the following, we will make a nice interactive visualization out of the results, that shows the fastest routes and the mode of transport between the given origin-destination pairs (with multiple alternative trips/routes):

import folium 
import folium.plugins

# Convert travel mode to string (from r5py.TransportMode object)
travel_details["mode"] = travel_details["transport_mode"].astype(str)

# Calculate travel time in minutes (from timedelta)
travel_details["travel time (min)"] = (travel_details["travel_time"].dt.total_seconds() / 60).round(2)

# Generate text for given trip ("origin" to "destination")
travel_details["trip"] = travel_details["from_id"].astype(str) + " to " + travel_details["to_id"].astype(str)

# Choose columns for visualization
selected_cols = ["geometry", "distance", "mode", "route", "travel time (min)", "trip", "from_id", "to_id", "option", "segment"  ]

# Generate the map
m = travel_details[selected_cols].explore(
    tooltip=["trip", "option", "segment", "mode", "route", "travel time (min)", "distance"],
    column="mode",
    tiles="CartoDB.Positron",
    )

# Add marker for the origin
m = origin.explore(m=m, marker_type="marker", marker_kwds=dict(icon=folium.Icon(color="green", icon="train", prefix="fa", )))

# Add customized markers for destinations
points_sample.apply(lambda row: (
        # Marker with destination ID number attached to the icon
        folium.Marker(
            (row["geometry"].y, row["geometry"].x),
            icon=folium.plugins.BeautifyIcon(
                icon_shape="marker",
                number=row["id"],
                border_color="#728224",
                text_color="#728224",
            )
        # Add the marker to existing map    
        ).add_to(m)), axis=1,
)

m

As a result, we now have a nice map that shows alternative routes between the Railway station and the given destinations in the study area. If you hover over the lines, you can see details about the selected routes, such as travel time, distance and route id (line number). So if you’re feeling nerdy (and happen to have Python installed on your phone 😛), you could replace your Google Maps navigator or other journey planners with r5py! 🤓😉

Geometries of public transport routes, and distances travelled

The default version of R⁵ is configured, for performance reasons, in a way that it does not read the geometries included in GTFS data sets.

As a consequence, the geometries reported by DetailedItinerariesComputer are straight lines between the stops of a public transport line, and do not reflect the actual path travelled in public transport modes.

With this in mind, r5py does not attempt to compute the distance of public transport segments if SAVE_SHAPES = false, as distances would only be very crude approximations. Instead, it reports NaN/None.

The Digital Geography Lab maintains a patched version of R⁵ in its GitHub repositories. If you want to refrain from compiling your own R⁵ jar, but still would like to use detailed geometries of public transport routes, follow the instructions in the Advanced use of r5py documentation.
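When working with the default configuration, the missing transit distances need some care in downstream calculations. The sketch below retypes the segments of option 1 from the earlier output by hand and shows one way to sum the known distances while flagging options whose total is incomplete. The variable names are made up for this illustration.

```python
import pandas as pd

# Segments of option 1 for trip 0 -> 54, retyped from the earlier output
segments = pd.DataFrame(
    {
        "option": [1, 1, 1],
        "transport_mode": ["TransportMode.WALK", "TransportMode.TRAM", "TransportMode.WALK"],
        "distance": [368.816, float("nan"), 518.831],
    }
)

# Sum of the known distances (here: the walking legs); NaN is skipped by default
known_distance = segments["distance"].sum()

# Flag options in which at least one segment has no distance
has_missing_distance = segments.groupby("option")["distance"].apply(
    lambda distances: distances.isna().any()
)

print(round(known_distance, 3), has_missing_distance.to_dict())
```

Reporting the flag alongside the sum avoids mistaking a walking-only subtotal for the full trip length.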

Where to go next?#

In case you want to learn more, we recommend reading: