Multi-Dimensional Analysis#

This notebook provides a walkthrough of some of the multi-dimensional analysis capabilities of the sharrow library.

import numpy as np
import xarray as xr

import sharrow as sh

sh.__version__
'2.8.2'

Example Data#

We’ll begin by again importing some example data to work with. We’ll be using some test data taken from the MTC example in the ActivitySim project, including tables of data for households and persons, as well as a set of skims containing transportation level of service information for travel around a tiny slice of San Francisco.

The households and persons are typical tabular data, and each can be read in and stored as a pandas.DataFrame.

households = sh.example_data.get_households()
households.head()
TAZ SERIALNO PUMA5 income PERSONS HHT UNITTYPE NOC BLDGSZ TENURE ... hschpred hschdriv htypdwel hownrent hadnwst hadwpst hadkids bucketBin originalPUMA hmultiunit
HHID
2717868 25 2715386 2202 361000 2 1 0 0 9 1 ... 0 0 2 1 0 0 0 3 2202 1
763899 6 5360279 2203 59220 1 4 0 0 9 3 ... 0 0 2 2 0 0 0 4 2203 1
2222791 9 77132 2203 197000 2 2 0 0 9 1 ... 0 0 2 1 0 0 1 5 2203 1
112477 17 3286812 2203 2200 1 6 0 0 8 3 ... 0 0 2 2 0 0 0 7 2203 1
370491 21 6887183 2203 16500 3 1 0 1 8 3 ... 1 0 2 2 0 0 0 7 2203 1

5 rows × 46 columns

persons = sh.example_data.get_persons()
persons.head()
household_id age RELATE ESR GRADE PNUM PAUG DDP sex WEEKS HOURS MSP POVERTY EARNS pagecat pemploy pstudent ptype padkid
PERID
25671 25671 47 1 6 0 1 0 0 1 0 0 6 39 0 6 3 3 4 2
25675 25675 27 1 6 7 1 0 0 2 52 40 2 84 7200 5 3 2 3 2
25678 25678 30 1 6 0 1 0 0 2 0 0 6 84 0 5 3 3 4 2
25683 25683 23 1 6 0 1 0 0 1 0 0 6 1 0 4 3 3 4 2
25684 25684 52 1 6 0 1 0 0 1 0 0 6 94 0 7 3 3 4 2

The skims, on the other hand, are not just simple tabular data, but rather a multi-dimensional representation of the transportation system, indexed by origin, destination, and time of day. Rather than using a single DataFrame for this data, we store it as a multi-dimensional xarray.Dataset.

skims = sh.example_data.get_skims()
skims
<xarray.Dataset> Size: 2MB
Dimensions:               (otaz: 25, dtaz: 25, time_period: 5)
Coordinates:
  * dtaz                  (dtaz) int64 200B 1 2 3 4 5 6 7 ... 20 21 22 23 24 25
  * otaz                  (otaz) int64 200B 1 2 3 4 5 6 7 ... 20 21 22 23 24 25
  * time_period           (time_period) <U2 40B 'EA' 'AM' 'MD' 'PM' 'EV'
Data variables: (12/170)
    DIST                  (otaz, dtaz) float32 2kB dask.array<chunksize=(25, 25), meta=np.ndarray>
    DISTBIKE              (otaz, dtaz) float32 2kB dask.array<chunksize=(25, 25), meta=np.ndarray>
    DISTWALK              (otaz, dtaz) float32 2kB dask.array<chunksize=(25, 25), meta=np.ndarray>
    DRV_COM_WLK_BOARDS    (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    DRV_COM_WLK_DDIST     (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    DRV_COM_WLK_DTIM      (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    ...                    ...
    WLK_TRN_WLK_IVT       (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    WLK_TRN_WLK_IWAIT     (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    WLK_TRN_WLK_WACC      (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    WLK_TRN_WLK_WAUX      (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    WLK_TRN_WLK_WEGR      (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>
    WLK_TRN_WLK_XWAIT     (otaz, dtaz, time_period) float32 12kB dask.array<chunksize=(25, 25, 5), meta=np.ndarray>

For this example, we’ll also load a land use table, which contains some attributes of the alternatives.

landuse = sh.example_data.get_land_use()
landuse.head()
DISTRICT SD COUNTY TOTHH HHPOP TOTPOP EMPRES SFDU MFDU HHINCQ1 ... area_type HSENROLL COLLFTE COLLPTE TOPOLOGY TERMINAL ZERO hhlds sftaz gqpop
TAZ
1 1 1 1 46 74 82 37 1 60 15 ... 0 0.0 0.00000 0.0 3 5.89564 0 46 1 8
2 1 1 1 134 214 240 107 5 147 57 ... 0 0.0 0.00000 0.0 1 5.84871 0 134 2 26
3 1 1 1 267 427 476 214 9 285 101 ... 0 0.0 0.00000 0.0 1 5.53231 0 267 3 49
4 1 1 1 151 239 253 117 6 210 52 ... 0 0.0 0.00000 0.0 2 5.64330 0 151 4 14
5 1 1 1 611 974 1069 476 22 671 223 ... 0 0.0 72.14684 0.0 1 5.52555 0 611 5 95

5 rows × 41 columns

Multi-Dimensional Analysis#

Now that we’ve loaded our inputs, let’s take a look at preparing some data for a workplace location choice simulation model. This is a different kind of model, and it will use differently shaped data: the decision makers (or “choosers”) in this model will be the workers, and the alternatives will be the various zones included in the land use table.

The workers are only a subset of the persons data we looked at before. We can identify workers from values 1 and 2 (full-time employed and part-time employed) in the 'pemploy' attribute of the persons table.

workers = persons.query("pemploy in [1,2]").rename_axis(index="WORKERID")
workers
household_id age RELATE ESR GRADE PNUM PAUG DDP sex WEEKS HOURS MSP POVERTY EARNS pagecat pemploy pstudent ptype padkid
WORKERID
72220 72220 35 1 1 0 1 0 0 2 40 30 6 139 6000 6 2 3 2 2
72229 72229 62 1 1 0 1 0 0 1 35 40 2 153 13200 7 1 3 1 2
72235 72235 35 1 1 0 1 0 0 2 40 30 6 139 6000 6 2 3 2 2
107594 107594 55 1 1 0 1 0 0 1 48 16 6 144 5600 7 2 3 2 2
107597 107597 43 1 1 0 1 0 0 1 0 0 6 51 0 6 1 3 1 2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7513986 2822651 42 23 1 0 1 0 0 1 47 20 4 54 4600 6 2 3 2 2
7513995 2822660 28 23 1 0 1 0 0 1 3 40 6 289 25000 5 2 3 2 2
7514003 2822668 19 23 4 0 1 0 0 1 47 50 6 0 13000 4 1 3 1 2
7514037 2822702 33 23 1 0 1 0 0 2 49 48 2 277 24000 5 1 3 1 2
7514060 2822725 35 23 1 0 1 0 0 1 0 0 6 55 0 6 1 3 1 2

4361 rows × 19 columns

As we filter the persons table down to just the workers, we also rename the index from “PERID” to “WORKERID”. This renaming is important for sharrow, as it expects dimensions that share a name to match, but the filtered workers no longer align directly with the persons.
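The effect of that filter-and-rename step can be illustrated with a tiny stand-in table (hypothetical data, not the example data above):

```python
import pandas as pd

# Toy stand-in for the persons table, indexed by PERID.
persons_toy = pd.DataFrame(
    {"pemploy": [1, 3, 2]},
    index=pd.Index([101, 102, 103], name="PERID"),
)

# Filter to workers and rename the index so it no longer
# shares a dimension name with the persons table.
workers_toy = persons_toy.query("pemploy in [1,2]").rename_axis(index="WORKERID")
print(workers_toy.index.name)   # WORKERID
print(list(workers_toy.index))  # [101, 103]
```

The labels are unchanged; only the dimension name differs, which is what tells sharrow the worker dimension is distinct from the person dimension.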

For our workplace location choice model, we will want to link in data from our skims, which can tell us about travel times and costs. Since we have not yet determined a time of day for each worker’s work tours, we’ll just use the 'AM' skims for the outbound leg of a hypothetical work tour, and the 'PM' skims for the return leg. Instead of trying to select constant skims using the dynamic lookups that sharrow can compile, we can just filter the skims down in a static manner before placing them into the data tree.

skims_am = skims.sel(time_period="AM")
skims_pm = skims.sel(time_period="PM")

Creating a DataTree Iteratively#

The last step in getting ready for this model is building out the relationships between all this data we’ve prepared. We’ll again use the DataTree class to do that, but this time we’ll demonstrate building the tree in stages. First, we’ll create a base Dataset to be the root data for the tree. We can start by creating an otherwise empty Dataset indexed on the two dimensions we want to end up with for analysis, workers and zones.

base = sh.dataset.from_named_objects(
    workers.index,
    landuse.index,
)
base
<xarray.Dataset> Size: 35kB
Dimensions:   (WORKERID: 4361, TAZ: 25)
Coordinates:
  * WORKERID  (WORKERID) int64 35kB 72220 72229 72235 ... 7514037 7514060
  * TAZ       (TAZ) int64 200B 1 2 3 4 5 6 7 8 9 ... 17 18 19 20 21 22 23 24 25
Data variables:
    *empty*

Since our base dataset has two dimensions, we can specify a dimension order when writing into a DataTree (the default is alphabetical order). This ordering will be applied to outputs from the flows later.

tree = sh.DataTree(base=base, dim_order=("WORKERID", "TAZ"))

Then, we can progressively build our DataTree by adding additional data. Each new branch of the tree we want to add using the add_dataset command should have a unique name, a dataset being attached, and one or more relationship declarations that describe how the new data attaches. For example, we can attach the persons data like this:

tree.add_dataset("person", persons, "base.WORKERID @ person.PERID")

The relationship definition here starts with a dotted name of a dimension already in the tree (base.WORKERID), followed by the @ operator, which indicates matching by label in that dimension, and finally a dotted name of the matching dimension in the newly attached dataset (person.PERID).

tree.add_dataset("landuse", landuse, "base.TAZ @ landuse.TAZ")
tree.add_dataset("hh", households, "person.household_id @ hh.HHID")

Unlike in the mode choice work in the previous example, we’ve already filtered the time period dimensions of the skims to be morning and afternoon peak, so we simply attach the two different parts, linking relationships only for the remaining dimensions.

tree.add_dataset(
    "odskims",
    skims_am,
    relationships=(
        "hh.TAZ @ odskims.otaz",
        "base.TAZ @ odskims.dtaz",
    ),
)

tree.add_dataset(
    "doskims",
    skims_pm,
    relationships=(
        "base.TAZ @ doskims.otaz",
        "hh.TAZ @ doskims.dtaz",
    ),
)

Dynamically Defined Flows#

Although it is convenient to write expressions into a separately configured “spec” file, especially when working with ActivitySim, it’s not strictly necessary to use such a file in CSV format; a simple Python dictionary can also be used to set up a flow.

definition = {
    "round_trip_dist": "odskims.DIST + doskims.DIST",
    "round_trip_dist_first_mile": "clip(odskims.DIST, 0, 1) + clip(doskims.DIST, 0, 1)",
    "round_trip_dist_addl_miles": "clip(odskims.DIST-1, 0, None) + clip(doskims.DIST-1, 0, None)",
    "size_term": "log(TOTPOP + 0.5*EMPRES)",
}
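The clip and log functions in these expressions behave like their numpy equivalents. A quick numpy check of the distance-splitting logic (with made-up distances) shows that the first-mile and additional-miles terms always sum back to the total:

```python
import numpy as np

dist = np.array([0.3, 1.0, 2.6])          # hypothetical one-way distances
first_mile = np.clip(dist, 0, 1)          # portion of each trip within the first mile
addl_miles = np.clip(dist - 1, 0, None)   # portion beyond the first mile
print(first_mile + addl_miles)            # recovers the original distances
```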

flow = tree.setup_flow(definition)

Loading from this flow is done the same as before.

arr = flow.load()
arr
array([[[0.61      , 0.61      , 0.        , 4.6101575 ],
        [0.28      , 0.28      , 0.        , 5.6818776 ],
        [0.56      , 0.56      , 0.        , 6.368187  ],
        ...,
        [1.9200001 , 1.9200001 , 0.        , 7.071149  ],
        [0.77      , 0.77      , 0.        , 7.0531535 ],
        [1.04      , 1.04      , 0.        , 8.302885  ]],

       [[1.19      , 1.19      , 0.        , 4.6101575 ],
        [1.49      , 1.49      , 0.        , 5.6818776 ],
        [1.88      , 1.85      , 0.02999997, 6.368187  ],
        ...,
        [2.58      , 2.        , 0.58000004, 7.071149  ],
        [2.04      , 1.9       , 0.13999999, 7.0531535 ],
        [2.47      , 2.        , 0.47000003, 8.302885  ]],

       [[1.19      , 1.19      , 0.        , 4.6101575 ],
        [1.49      , 1.49      , 0.        , 5.6818776 ],
        [1.88      , 1.85      , 0.02999997, 6.368187  ],
        ...,
        [2.58      , 2.        , 0.58000004, 7.071149  ],
        [2.04      , 1.9       , 0.13999999, 7.0531535 ],
        [2.47      , 2.        , 0.47000003, 8.302885  ]],

       ...,

       [[2.31      , 2.        , 0.30999994, 4.6101575 ],
        [2.15      , 2.        , 0.1500001 , 5.6818776 ],
        [2.06      , 2.        , 0.05999994, 6.368187  ],
        ...,
        [3.79      , 2.        , 1.79      , 7.071149  ],
        [2.7       , 2.        , 0.70000005, 7.0531535 ],
        [2.66      , 2.        , 0.6600001 , 8.302885  ]],

       [[2.31      , 2.        , 0.30999994, 4.6101575 ],
        [2.15      , 2.        , 0.1500001 , 5.6818776 ],
        [2.06      , 2.        , 0.05999994, 6.368187  ],
        ...,
        [3.79      , 2.        , 1.79      , 7.071149  ],
        [2.7       , 2.        , 0.70000005, 7.0531535 ],
        [2.66      , 2.        , 0.6600001 , 8.302885  ]],

       [[2.31      , 2.        , 0.30999994, 4.6101575 ],
        [2.15      , 2.        , 0.1500001 , 5.6818776 ],
        [2.06      , 2.        , 0.05999994, 6.368187  ],
        ...,
        [3.79      , 2.        , 1.79      , 7.071149  ],
        [2.7       , 2.        , 0.70000005, 7.0531535 ],
        [2.66      , 2.        , 0.6600001 , 8.302885  ]]], dtype=float32)

In the tour mode choice example from the previous notebook, the tours dataset had only one dimension (TOURIDX), and so the output of the load function had two dimensions (TOURIDX and expressions). In this example, the base dataset in the tree has two dimensions (workers and zones), and so the result from the basic load function has three dimensions (workers, zones, and expressions).

arr.shape
(4361, 25, 4)

Just as the two-dimensional output could be neatly formatted as a pandas.DataFrame, so too can we neatly format this three-dimensional output, as an xarray.DataArray.

arr_pretty = flow.load_dataarray()
arr_pretty
<xarray.DataArray (WORKERID: 4361, TAZ: 25, expressions: 4)> Size: 2MB
array([[[0.61      , 0.61      , 0.        , 4.6101575 ],
        [0.28      , 0.28      , 0.        , 5.6818776 ],
        [0.56      , 0.56      , 0.        , 6.368187  ],
        ...,
        [1.9200001 , 1.9200001 , 0.        , 7.071149  ],
        [0.77      , 0.77      , 0.        , 7.0531535 ],
        [1.04      , 1.04      , 0.        , 8.302885  ]],

       [[1.19      , 1.19      , 0.        , 4.6101575 ],
        [1.49      , 1.49      , 0.        , 5.6818776 ],
        [1.88      , 1.85      , 0.02999997, 6.368187  ],
        ...,
        [2.58      , 2.        , 0.58000004, 7.071149  ],
        [2.04      , 1.9       , 0.13999999, 7.0531535 ],
        [2.47      , 2.        , 0.47000003, 8.302885  ]],

       [[1.19      , 1.19      , 0.        , 4.6101575 ],
        [1.49      , 1.49      , 0.        , 5.6818776 ],
        [1.88      , 1.85      , 0.02999997, 6.368187  ],
        ...,
...
        ...,
        [3.79      , 2.        , 1.79      , 7.071149  ],
        [2.7       , 2.        , 0.70000005, 7.0531535 ],
        [2.66      , 2.        , 0.6600001 , 8.302885  ]],

       [[2.31      , 2.        , 0.30999994, 4.6101575 ],
        [2.15      , 2.        , 0.1500001 , 5.6818776 ],
        [2.06      , 2.        , 0.05999994, 6.368187  ],
        ...,
        [3.79      , 2.        , 1.79      , 7.071149  ],
        [2.7       , 2.        , 0.70000005, 7.0531535 ],
        [2.66      , 2.        , 0.6600001 , 8.302885  ]],

       [[2.31      , 2.        , 0.30999994, 4.6101575 ],
        [2.15      , 2.        , 0.1500001 , 5.6818776 ],
        [2.06      , 2.        , 0.05999994, 6.368187  ],
        ...,
        [3.79      , 2.        , 1.79      , 7.071149  ],
        [2.7       , 2.        , 0.70000005, 7.0531535 ],
        [2.66      , 2.        , 0.6600001 , 8.302885  ]]], dtype=float32)
Coordinates:
  * WORKERID     (WORKERID) int64 35kB 72220 72229 72235 ... 7514037 7514060
  * TAZ          (TAZ) int64 200B 1 2 3 4 5 6 7 8 9 ... 18 19 20 21 22 23 24 25
  * expressions  (expressions) <U26 416B 'round_trip_dist' ... 'size_term'

Linear-in-Parameters Functions#

We can also use the dot method here with the two-dimensional base. We’ll apply a one-dimensional coefficients array, with length four to match the four expressions in the spec.

coefs = np.asarray([1.0, 0.1, 0.01, 0.001])
flow.dot(coefs)
array([[0.67561017, 0.31368188, 0.62236819, ..., 2.11907123, 0.85405313,
        1.15230284],
       [1.31361022, 1.64468189, 2.07166818, ..., 2.79287107, 2.23845311,
        2.68300291],
       [1.31361022, 1.64468189, 2.07166818, ..., 2.79287107, 2.23845311,
        2.68300291],
       ...,
       [2.5177101 , 2.35718197, 2.26696813, ..., 4.01497111, 2.9140532 ,
        2.87490297],
       [2.5177101 , 2.35718197, 2.26696813, ..., 4.01497111, 2.9140532 ,
        2.87490297],
       [2.5177101 , 2.35718197, 2.26696813, ..., 4.01497111, 2.9140532 ,
        2.87490297]])
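The result is, up to floating point precision, the same as loading the full array and contracting it with the coefficients over the expressions axis yourself; a minimal numpy sketch with random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(42)
arr = rng.random((6, 5, 4))               # (choosers, alternatives, expressions)
coefs = np.asarray([1.0, 0.1, 0.01, 0.001])

util = arr @ coefs                        # contract over the last (expressions) axis
print(util.shape)                         # (6, 5)
```

The advantage of flow.dot is that it fuses this contraction into the compiled expression evaluation, rather than materializing the full three-dimensional array first.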

The dot_dataarray method does the same underlying computational work, but yields a well-formatted DataArray instead of just a plain numpy array.

flow.dot_dataarray(coefs)
<xarray.DataArray (WORKERID: 4361, TAZ: 25)> Size: 872kB
array([[0.67561017, 0.31368188, 0.62236819, ..., 2.11907123, 0.85405313,
        1.15230284],
       [1.31361022, 1.64468189, 2.07166818, ..., 2.79287107, 2.23845311,
        2.68300291],
       [1.31361022, 1.64468189, 2.07166818, ..., 2.79287107, 2.23845311,
        2.68300291],
       ...,
       [2.5177101 , 2.35718197, 2.26696813, ..., 4.01497111, 2.9140532 ,
        2.87490297],
       [2.5177101 , 2.35718197, 2.26696813, ..., 4.01497111, 2.9140532 ,
        2.87490297],
       [2.5177101 , 2.35718197, 2.26696813, ..., 4.01497111, 2.9140532 ,
        2.87490297]])
Coordinates:
  * WORKERID  (WORKERID) int64 35kB 72220 72229 72235 ... 7514037 7514060
  * TAZ       (TAZ) int64 200B 1 2 3 4 5 6 7 8 9 ... 17 18 19 20 21 22 23 24 25

Multinomial Logit Simulation#

And we can build and simulate an MNL model directly using the logit_draws method.
To do so, we need to provide the “random” draws exogenously. Here, we’ll make 10 draws (with replacement) for each worker from the set of alternative zones.

draws = np.random.default_rng(123).random(size=[4361, 10])
choices, choice_probs = flow.logit_draws(
    coefficients=coefs,
    draws=draws,
)
choices
array([[ 5,  8,  8, ..., 19, 19, 20],
       [ 7,  7,  7, ..., 19, 20, 23],
       [ 1,  6,  6, ..., 17, 18, 22],
       ...,
       [ 5,  6,  7, ..., 22, 23, 23],
       [ 0,  2,  3, ..., 22, 24, 24],
       [ 7,  8,  8, ..., 21, 22, 22]], dtype=int8)
choice_probs
array([[0.02175669, 0.08209198, 0.08209198, ..., 0.13050658, 0.13050658,
        0.03846856],
       [0.06363645, 0.06363645, 0.06363645, ..., 0.06487329, 0.02116722,
        0.0313424 ],
       [0.01730855, 0.05250299, 0.05250299, ..., 0.035372  , 0.10315963,
        0.05456484],
       ...,
       [0.03972635, 0.02711131, 0.01971475, ..., 0.207882  , 0.06913442,
        0.06913442],
       [0.04651197, 0.03619669, 0.02502164, ..., 0.207882  , 0.06648009,
        0.06648009],
       [0.01971475, 0.03896772, 0.03896772, ..., 0.08727995, 0.207882  ,
        0.207882  ]])
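Conceptually, each uniform draw selects the first alternative whose cumulative MNL probability exceeds the draw. A simplified sketch of that mechanism with made-up utilities (an illustration only, not sharrow’s compiled implementation):

```python
import numpy as np

utility = np.array([1.0, 0.5, 0.2])             # utilities for three alternatives
prob = np.exp(utility) / np.exp(utility).sum()  # MNL choice probabilities
cdf = np.cumsum(prob)

draw = 0.9                                      # a uniform random draw in [0, 1)
choice = int(np.searchsorted(cdf, draw))        # first alternative with cdf > draw
print(choice)                                   # 2
```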

For destination choice type models, it’s common to make many repeated draws per chooser (e.g. to sample alternative destinations), so there’s also a “pick count” feature that can summarize the simulation results efficiently.

choices_, choice_probs_, pick_count = flow.logit_draws(
    coefficients=coefs,
    draws=draws,
    pick_counted=True,
)

If you compare against the non-pick-counted results above, you’ll see that we get exactly the same choices out, but when choices are repeated they are aggregated in the resulting arrays.

choices_
array([[ 5,  8,  9, ..., -1, -1, -1],
       [ 7,  9, 10, ..., 23, -1, -1],
       [ 1,  6,  7, ..., 22, -1, -1],
       ...,
       [ 5,  6,  7, ..., -1, -1, -1],
       [ 0,  2,  3, ..., -1, -1, -1],
       [ 7,  8, 11, ..., -1, -1, -1]], dtype=int8)
pick_count
array([[1, 2, 2, ..., 0, 0, 0],
       [3, 1, 1, ..., 1, 0, 0],
       [1, 2, 1, ..., 1, 0, 0],
       ...,
       [1, 1, 1, ..., 0, 0, 0],
       [1, 1, 1, ..., 0, 0, 0],
       [1, 2, 1, ..., 0, 0, 0]], dtype=int32)
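The aggregation is equivalent to counting repeats within each row of the non-pick-counted choices; for example, with one hypothetical row of repeated draws:

```python
import numpy as np

row = np.array([5, 8, 8, 9, 19, 19, 19])   # repeated choices from one chooser
uniq, counts = np.unique(row, return_counts=True)
print(uniq)    # [ 5  8  9 19]
print(counts)  # [1 2 1 3]
```

The -1 values in the pick-counted choices array are padding, since after aggregation each row may have fewer distinct choices than draws.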

Accessing Logsums#

If you want to also access the MNL logsum values from the choice model, adding logsums=True will return those values in the fourth position of the returned tuple (the logsum array is always the fourth value, even when pick counting is disabled):

choices, choice_probs, _, logsums = flow.logit_draws(
    coefficients=coefs,
    draws=draws,
    logsums=True,
)
logsums
array([5.61834913, 5.70123664, 5.70123664, ..., 5.58575577, 5.58575577,
       5.58575577])
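The logsum for each chooser is simply the log of the summed exponentiated utilities across the alternatives; a minimal numpy check with made-up utilities, which also shows how the MNL probabilities relate back to the logsum:

```python
import numpy as np

utility = np.array([1.0, 0.5, 0.2])
logsum = np.log(np.exp(utility).sum())

prob = np.exp(utility - logsum)       # MNL probabilities recovered from the logsum
print(np.isclose(prob.sum(), 1.0))    # True
```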

Gotchas#

When working with multi-dimensional outputs, if you don’t specify the dimension ordering explicitly (as done above) then the output dimensions will be in lexicographic order according to the unicode binary representations of the dimension names. This is similar to alphabetical ordering, except all uppercase letters come before lowercase letters.
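This ordering is the same as Python’s default string sort:

```python
print(sorted(["WORKERID", "TAZ"]))  # ['TAZ', 'WORKERID']
print(sorted(["apple", "Banana"]))  # ['Banana', 'apple'] -- uppercase sorts first
```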

tree_unordered = sh.DataTree(
    base=base,
    person=persons,
    landuse=landuse,
    hh=households,
    odskims=skims_am,
    doskims=skims_pm,
    relationships=(
        "base.WORKERID @ person.PERID",
        "base.TAZ @ landuse.TAZ",
        "person.household_id @ hh.HHID",
        "hh.TAZ @ odskims.otaz",
        "base.TAZ @ odskims.dtaz",
        "base.TAZ @ doskims.otaz",
        "hh.TAZ @ doskims.dtaz",
    ),
)
flow_unordered = tree_unordered.setup_flow(definition)
arr_unordered = flow_unordered.load_dataarray()
arr_unordered.dims
('TAZ', 'WORKERID', 'expressions')