Using Sparse MAZ Skims

import numpy as np
import pandas as pd

import sharrow as sh

This notebook walks through using sparse MAZ-to-MAZ skims with sharrow. The example data we’ll use to demonstrate this feature starts with regular TAZ-based skims.

skims = sh.example_data.get_skims()
skims
<xarray.Dataset> Size: 2MB
Dimensions:               (otaz: 25, dtaz: 25, time_period: 5)
Coordinates:
  * dtaz                  (dtaz) int64 200B 1 2 3 4 5 6 7 ... 20 21 22 23 24 25
  * otaz                  (otaz) int64 200B 1 2 3 4 5 6 7 ... 20 21 22 23 24 25
  * time_period           (time_period) <U2 40B 'EA' 'AM' 'MD' 'PM' 'EV'
Data variables: (12/170)
    DIST                  (otaz, dtaz) float32 2kB dask.array<chunksize=(25, 25), meta=np.ndarray>
    DISTBIKE              (otaz, dtaz) float32 2kB dask.array<chunksize=(25, 25), meta=np.ndarray>
    DISTWALK              (otaz, dtaz) float32 2kB dask.array<chunksize=(25, 25), meta=np.ndarray>
    DRV_COM_WLK_BOARDS    (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    DRV_COM_WLK_DDIST     (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    DRV_COM_WLK_DTIM      (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    ...                    ...
    WLK_TRN_WLK_IVT       (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    WLK_TRN_WLK_IWAIT     (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    WLK_TRN_WLK_WACC      (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    WLK_TRN_WLK_WAUX      (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    WLK_TRN_WLK_WEGR      (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    WLK_TRN_WLK_XWAIT     (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>

We’ll also load a MAZ-to-TAZ mapping file, which lists the MAZs and gives the TAZ used for each one.

maz_taz = sh.example_data.get_maz_to_taz()
maz_taz
     TAZ
MAZ
100    1
101    1
102    1
103    1
104    1
...  ...
270   25
271   25
272   25
273   25
274   25

175 rows × 1 columns

Lastly, we’ll load a sparse MAZ-to-MAZ skim table. This table lists origin and destination MAZs and the walk distance between them. The data is “sparse” in that only a limited number of OMAZ-DMAZ pairs are included. Unlike in traditional sparse arrays, the missing elements are not assumed to be zero; instead, for those zone pairs we implicitly use the walk distance between the matching TAZs in the TAZ-based skims.

maz_to_maz_walk = sh.example_data.get_maz_to_maz_walk()
maz_to_maz_walk
   OMAZ  DMAZ  DISTWALK
0   100   100      0.01
1   100   101      0.20
2   100   102      0.30
3   102   100      0.40
4   102   101      0.50
5   108   118      0.60
6   108   120      0.70
7   200   200      3.10
8   200   201      3.20
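
The fallback rule described above can be sketched with plain pandas. This is only an illustration of the idea, not sharrow’s implementation: it ignores the blending that is applied later in this notebook, it copies the first five rows of the sparse table, and it takes the TAZ 1 to TAZ 1 walk distance of 0.12 from the dense lookups shown later. The names `sparse`, `pairs`, `taz_walk`, and `looked_up` are made up for this sketch.

```python
import pandas as pd

# First five rows of the sparse MAZ-to-MAZ table shown above.
sparse = pd.DataFrame(
    {
        "OMAZ": [100, 100, 100, 102, 102],
        "DMAZ": [100, 101, 102, 100, 101],
        "DISTWALK": [0.01, 0.20, 0.30, 0.40, 0.50],
    }
)

# MAZs 100-104 all map to TAZ 1 in the mapping table above.
maz_to_taz = pd.Series(1, index=pd.Index(range(100, 105), name="MAZ"), name="TAZ")

# Dense TAZ-level walk distance keyed by (otaz, dtaz); only the one
# value needed here is included.
taz_walk = {(1, 1): 0.12}

# Hypothetical zone pairs to look up: one present in the sparse table, two absent.
pairs = pd.DataFrame({"OMAZ": [100, 100, 103], "DMAZ": [101, 103, 104]})

# A left merge keeps every requested pair; absent pairs get NaN.
looked_up = pairs.merge(sparse, on=["OMAZ", "DMAZ"], how="left")
looked_up["from_sparse"] = looked_up["DISTWALK"].notna()

# Fill the gaps with the TAZ-to-TAZ value instead of zero.
fallback = pd.Series(
    [taz_walk[(maz_to_taz[o], maz_to_taz[d])] for o, d in zip(pairs.OMAZ, pairs.DMAZ)],
    index=looked_up.index,
)
looked_up["DISTWALK"] = looked_up["DISTWALK"].fillna(fallback)
print(looked_up)
```

Only the pair (100, 101) is found in the sparse table; the other two pairs receive the TAZ-level value.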

To integrate these data sources, we will set a redirection on the skims. This adds the MAZ dimensions to the skims, with the MAZ IDs as additional coordinates, and sets attribute flags that tell sharrow which dimensions have been redirected.

skims.redirection.set(
    maz_taz,
    map_to="otaz",
    name="omaz",
    map_also={"dtaz": "dmaz"},
)
skims
<xarray.Dataset> Size: 2MB
Dimensions:                  (otaz: 25, dtaz: 25, time_period: 5, omaz: 175,
                              dmaz: 175)
Coordinates:
  * dtaz                     (dtaz) int64 200B 1 2 3 4 5 6 ... 20 21 22 23 24 25
  * otaz                     (otaz) int64 200B 1 2 3 4 5 6 ... 20 21 22 23 24 25
  * time_period              (time_period) <U2 40B 'EA' 'AM' 'MD' 'PM' 'EV'
  * omaz                     (omaz) int64 1kB 100 101 102 103 ... 272 273 274
  * dmaz                     (dmaz) int64 1kB 100 101 102 103 ... 272 273 274
Data variables: (12/172)
    DIST                     (otaz, dtaz) float32 2kB dask.array<chunksize=(25, 25), meta=np.ndarray>
    DISTBIKE                 (otaz, dtaz) float32 2kB dask.array<chunksize=(25, 25), meta=np.ndarray>
    DISTWALK                 (otaz, dtaz) float32 2kB dask.array<chunksize=(25, 25), meta=np.ndarray>
    DRV_COM_WLK_BOARDS       (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    DRV_COM_WLK_DDIST        (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    DRV_COM_WLK_DTIM         (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    ...                       ...
    WLK_TRN_WLK_WACC         (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    WLK_TRN_WLK_WAUX         (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    WLK_TRN_WLK_WEGR         (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    WLK_TRN_WLK_XWAIT        (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    _digitized_otaz_of_omaz  (omaz) int64 1kB 0 0 0 0 0 0 ... 24 24 24 24 24 24
    _digitized_dtaz_of_dmaz  (dmaz) int64 1kB 0 0 0 0 0 0 ... 24 24 24 24 24 24
Attributes:
    dim_redirection_omaz:  otaz
    dim_redirection_dmaz:  dtaz
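
The `_digitized_otaz_of_omaz` and `_digitized_dtaz_of_dmaz` variables hold zero-based values (0 through 24 for TAZs 1 through 25), which suggests they store the position of each MAZ’s TAZ along the dense dimension. The sketch below shows how such a positional mapping could be built and used with pandas and NumPy. It uses small hypothetical data (three TAZs, five MAZs, invented distances) and is an assumption about the mechanism, not a copy of sharrow’s code.

```python
import numpy as np
import pandas as pd

# Hypothetical dense coordinate and MAZ-to-TAZ mapping.
taz_ids = pd.Index([1, 2, 3], name="otaz")
maz_taz = pd.DataFrame(
    {"TAZ": [1, 1, 2, 3, 3]},
    index=pd.Index([100, 101, 102, 103, 104], name="MAZ"),
)

# "Digitize" the mapping: convert each MAZ's TAZ label to its
# zero-based position along the TAZ dimension.
digitized = taz_ids.get_indexer(maz_taz["TAZ"])

# Hypothetical dense TAZ-to-TAZ matrix.
dense = np.array(
    [
        [0.1, 0.5, 0.9],
        [0.5, 0.2, 0.7],
        [0.9, 0.7, 0.3],
    ]
)


def dense_by_maz(omaz, dmaz):
    """Look up dense values using MAZ labels by redirecting to TAZ positions."""
    o = digitized[maz_taz.index.get_indexer(omaz)]
    d = digitized[maz_taz.index.get_indexer(dmaz)]
    return dense[o, d]


print(digitized)
print(dense_by_maz([100, 102, 104], [101, 104, 100]))
```

Precomputing positions once means each later lookup is simple integer indexing into the dense array.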

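Next, we can attach the sparse skims using redirection.sparse_blender. This converts the sparse skim table into compressed sparse row format and attaches the resulting arrays to the Dataset. The max_blend_distance argument (1.0 here) sets the blending threshold, which is recorded in the blend_DISTWALK_max attribute.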

skims.redirection.sparse_blender(
    "DISTWALK",
    maz_to_maz_walk.OMAZ,
    maz_to_maz_walk.DMAZ,
    maz_to_maz_walk.DISTWALK,
    max_blend_distance=1.0,
    index=maz_taz.index,
)
skims
<xarray.Dataset> Size: 2MB
Dimensions:                  (otaz: 25, dtaz: 25, time_period: 5, omaz: 175,
                              dmaz: 175)
Coordinates:
  * dtaz                     (dtaz) int64 200B 1 2 3 4 5 6 ... 20 21 22 23 24 25
  * otaz                     (otaz) int64 200B 1 2 3 4 5 6 ... 20 21 22 23 24 25
  * time_period              (time_period) <U2 40B 'EA' 'AM' 'MD' 'PM' 'EV'
  * omaz                     (omaz) int64 1kB 100 101 102 103 ... 272 273 274
  * dmaz                     (dmaz) int64 1kB 100 101 102 103 ... 272 273 274
Data variables: (12/173)
    DIST                     (otaz, dtaz) float32 2kB dask.array<chunksize=(25, 25), meta=np.ndarray>
    DISTBIKE                 (otaz, dtaz) float32 2kB dask.array<chunksize=(25, 25), meta=np.ndarray>
    DISTWALK                 (otaz, dtaz) float32 2kB dask.array<chunksize=(25, 25), meta=np.ndarray>
    DRV_COM_WLK_BOARDS       (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    DRV_COM_WLK_DDIST        (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    DRV_COM_WLK_DTIM         (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    ...                       ...
    WLK_TRN_WLK_WAUX         (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    WLK_TRN_WLK_WEGR         (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    WLK_TRN_WLK_XWAIT        (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    _digitized_otaz_of_omaz  (omaz) int64 1kB 0 0 0 0 0 0 ... 24 24 24 24 24 24
    _digitized_dtaz_of_dmaz  (dmaz) int64 1kB 0 0 0 0 0 0 ... 24 24 24 24 24 24
    _s_DISTWALK              (omaz, dmaz) float64 2kB <GCXS: nnz=9, fill_value=0.0>
Attributes:
    dim_redirection_omaz:  otaz
    dim_redirection_dmaz:  dtaz
    blend_DISTWALK_max:    1.0
    blend_DISTWALK_dist:   None
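
For readers unfamiliar with compressed sparse row storage, the layout can be built by hand from the nine rows of the sparse table. This sketch uses only NumPy and is not how sharrow builds `_s_DISTWALK` (the repr above shows a GCXS array). It assumes the MAZ coordinate is the contiguous range 100 to 274, so that a MAZ’s position is its ID minus 100. The helper `csr_get` is a hypothetical name.

```python
import numpy as np

# The nine sparse entries from the MAZ-to-MAZ table.
omaz = np.array([100, 100, 100, 102, 102, 108, 108, 200, 200])
dmaz = np.array([100, 101, 102, 100, 101, 118, 120, 200, 201])
dist = np.array([0.01, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 3.10, 3.20])

n_maz = 175  # length of the omaz and dmaz dimensions
rows = omaz - 100  # assumed positions: MAZ IDs run 100..274 without gaps
cols = dmaz - 100

# Sort entries by row, then by column within each row.
order = np.lexsort((cols, rows))
rows, cols, data = rows[order], cols[order], dist[order]

# indptr[r]:indptr[r + 1] is the slice of `cols` and `data` holding row r.
indptr = np.zeros(n_maz + 1, dtype=np.int64)
np.add.at(indptr, rows + 1, 1)
indptr = np.cumsum(indptr)


def csr_get(r, c):
    """Return the stored value at (r, c), or NaN when the pair is absent."""
    start, stop = indptr[r], indptr[r + 1]
    k = start + np.searchsorted(cols[start:stop], c)
    if k < stop and cols[k] == c:
        return data[k]
    return np.nan


print(csr_get(0, 1))      # MAZ 100 -> MAZ 101, stored
print(csr_get(0, 3))      # MAZ 100 -> MAZ 103, not stored
print(csr_get(100, 101))  # MAZ 200 -> MAZ 201, stored
```

Returning NaN rather than zero for absent pairs matches the convention in this notebook, where a missing entry means that the dense TAZ value should be used.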

Now the skims are ready to use!

For demonstration purposes, let’s construct a trips dataframe with just a few origin-destination pairs. Note that we’re using the zone IDs from the more detailed MAZ system.

trips = pd.DataFrame(
    {
        "orig_maz": [100, 100, 100, 200, 200],
        "dest_maz": [100, 101, 103, 201, 202],
    }
)
trips
   orig_maz  dest_maz
0       100       100
1       100       101
2       100       103
3       200       201
4       200       202

We’ll then put the trips together with the skims into a DataTree, as usual for sharrow.

tree = sh.DataTree(
    base=trips,
    skims=skims,
    relationships=(
        "base.orig_maz @ skims.omaz",
        "base.dest_maz @ skims.dmaz",
    ),
)

Now we can set up flows on this tree.

flow = tree.setup_flow(
    {
        "plain_distance": "DISTWALK",
    },
    boundscheck=True,
)
flow.load()
array([[0.0111],
       [0.184 ],
       [0.12  ],
       [0.17  ],
       [0.17  ]], dtype=float32)

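Where the sparse (MAZ) value is missing or exceeds the max blending distance, the dense (TAZ) value is returned. Otherwise, the output is taken strictly from neither the sparse nor the dense skims; it is a blend of the two. In the output above, rows 2 and 4 have no sparse entry and row 3 has a sparse distance of 3.20, which exceeds the limit of 1.0, so these rows return the TAZ values 0.12 and 0.17.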
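
The text does not state the blending formula, but the five values returned above are consistent with a simple linear blend in which the weight on the dense value is the sparse distance divided by the max blending distance. The NumPy sketch below reproduces those numbers under that assumption. The formula is inferred from the outputs and should be checked against sharrow’s source before relying on it. The function name `blend` is hypothetical.

```python
import numpy as np


def blend(sparse_value, dense_value, max_blend_distance):
    """Linear blend of sparse and dense values (inferred, not confirmed).

    Missing (NaN) or over-limit sparse values fall back to the dense value.
    """
    sparse_value = np.asarray(sparse_value, dtype=float)
    dense_value = np.asarray(dense_value, dtype=float)
    use_dense = np.isnan(sparse_value) | (sparse_value > max_blend_distance)
    # Replace unusable sparse values with 0 so the arithmetic stays finite.
    safe = np.where(use_dense, 0.0, sparse_value)
    weight = safe / max_blend_distance  # share given to the dense value
    blended = weight * dense_value + (1.0 - weight) * safe
    return np.where(use_dense, dense_value, blended)


# Sparse values for the five trips (NaN where the pair is not in the table)
# and the matching TAZ-to-TAZ values.
sparse = np.array([0.01, 0.20, np.nan, 3.20, np.nan])
dense = np.array([0.12, 0.12, 0.12, 0.17, 0.17])

result = blend(sparse, dense, max_blend_distance=1.0)
print(result)
```

Under this rule, very short sparse distances are used almost unchanged, and the result moves toward the dense value as the sparse distance approaches the limit.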

We can apply any transformations we like, as usual.

flow2 = tree.setup_flow(
    {
        "plain_distance": "DISTWALK",
        "clip_distance": "DISTWALK.clip(upper=0.15)",
        "square_distance": "DISTWALK**2",
    }
)
flow2.load_dataframe()
   plain_distance  clip_distance  square_distance
0          0.0111         0.0111         0.000123
1          0.1840         0.1500         0.033856
2          0.1200         0.1200         0.014400
3          0.1700         0.1500         0.028900
4          0.1700         0.1500         0.028900
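
As a quick sanity check, the same transformations can be applied to the blended distances after the fact with pandas. This is only a cross-check of the arithmetic, assuming the `plain_distance` values shown above. It does not reflect how the flow evaluates its expressions.

```python
import pandas as pd

# Blended distances copied from the flow output above.
plain = pd.Series([0.0111, 0.184, 0.12, 0.17, 0.17], name="plain_distance")

check = pd.DataFrame(
    {
        "plain_distance": plain,
        "clip_distance": plain.clip(upper=0.15),  # cap values at 0.15
        "square_distance": plain**2,
    }
)
print(check.round(6))
```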

Using at and iat

The at and iat accessors work even when sparse matrix tables are attached to a Dataset, with a few caveats. First, only two-dimensional sparse tables are supported at this time. Second, these accessors rely on being able to reference the sparse data, and that reference is lost if the dataset is naively filtered by variable name. Filtering should instead be done with the _names argument, which filters the output of the accessor rather than its input and avoids building the entire filtered dataset first. For example:

skims.at(
    omaz=trips.orig_maz,
    dmaz=trips.dest_maz,
    _names=["DIST", "DISTWALK"],
)
<xarray.Dataset> Size: 40B
Dimensions:   (index: 5)
Dimensions without coordinates: index
Data variables:
    DIST      (index) float32 20B dask.array<chunksize=(5,), meta=np.ndarray>
    DISTWALK  (index) float32 20B 0.0111 0.184 0.12 0.17 0.17
Attributes:
    dim_redirection_omaz:  otaz
    dim_redirection_dmaz:  dtaz
    blend_DISTWALK_max:    1.0
    blend_DISTWALK_dist:   None

The iat accessor performs the same lookup by zero-based position rather than by label:

skims.iat(
    omaz=[0, 0, 0, 100, 100],
    dmaz=[0, 1, 3, 101, 102],
    _names=["DIST", "DISTWALK"],
)
<xarray.Dataset> Size: 40B
Dimensions:   (index: 5)
Dimensions without coordinates: index
Data variables:
    DIST      (index) float32 20B 0.12 0.12 0.12 0.17 0.17
    DISTWALK  (index) float32 20B 0.0111 0.184 0.12 0.17 0.17
Attributes:
    dim_redirection_omaz:  otaz
    dim_redirection_dmaz:  dtaz
    blend_DISTWALK_max:    1.0
    blend_DISTWALK_dist:   None
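
The two calls above return the same DISTWALK values because the positions passed to iat correspond to the labels passed to at. The sketch below shows that correspondence with a pandas Index. It assumes the omaz and dmaz coordinates are the contiguous IDs 100 through 274, which matches the 175-element coordinate shown in the Dataset repr.

```python
import numpy as np
import pandas as pd

# Assumed MAZ coordinate: 175 contiguous IDs, 100 through 274.
maz_coord = pd.Index(np.arange(100, 275), name="omaz")

# Labels used in the at() call (the trips table).
orig_labels = [100, 100, 100, 200, 200]
dest_labels = [100, 101, 103, 201, 202]

# Convert labels to zero-based positions, as used in the iat() call.
orig_pos = maz_coord.get_indexer(orig_labels)
dest_pos = maz_coord.get_indexer(dest_labels)

print(orig_pos)
print(dest_pos)
```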

To bypass the redirection, along with the sparse lookup and blending, simply point the accessor lookups at the dense dimensions:

skims.at(
    otaz=[1, 1, 1, 16, 16],
    dtaz=[1, 1, 1, 16, 16],
    _names=["DIST", "DISTWALK"],
    _load=True,
)
<xarray.Dataset> Size: 120B
Dimensions:   (index: 5)
Coordinates:
    dtaz      (index) int64 40B 1 1 1 16 16
    otaz      (index) int64 40B 1 1 1 16 16
Dimensions without coordinates: index
Data variables:
    DIST      (index) float32 20B 0.12 0.12 0.12 0.17 0.17
    DISTWALK  (index) float32 20B 0.12 0.12 0.12 0.17 0.17
Attributes:
    dim_redirection_omaz:  otaz
    dim_redirection_dmaz:  dtaz
    blend_DISTWALK_max:    1.0
    blend_DISTWALK_dist:   None

Passing a single variable name as _name returns a DataArray instead of a Dataset:

skims.at(
    otaz=[1, 1, 1, 16, 16],
    dtaz=[1, 1, 1, 16, 16],
    _name="DISTWALK",
)
<xarray.DataArray 'DISTWALK' (index: 5)> Size: 20B
array([0.12, 0.12, 0.12, 0.17, 0.17], dtype=float32)
Coordinates:
    dtaz     (index) int64 40B 1 1 1 16 16
    otaz     (index) int64 40B 1 1 1 16 16
Dimensions without coordinates: index