Core Components

ActivitySim’s core components provide data management, utility expression evaluation, choice models, person time window management, and helper functions. They include the skim matrix manager, the data pipeline manager, the random number manager, the tracer, sampling methods, simulation methods, model specification readers and expression evaluators, choice models, the timetable, and helper functions.

Data Management

Skim

Skim matrix data access

API

class activitysim.core.skim.DataFrameMatrix(df)

Utility class to allow a pandas dataframe to be treated like a 2-D array, indexed by rowid, colname

For use in vectorized expressions where the desired values depend on both a row and a column selector, e.g. size_terms.get(df.dest_taz, df.purpose)

df = pd.DataFrame({'a': [1,2,3,4,5], 'b': [10,20,30,40,50]}, index=[100,101,102,103,104])

dfm = DataFrameMatrix(df)

dfm.get(row_ids=[100,100,103], col_ids=['a', 'b', 'a'])

returns [1, 10, 4]

get(row_ids, col_ids)
Parameters:
row_ids - list of row_ids (df index values)
col_ids - list of column names, one per row_id,

specifying column from which the value for that row should be retrieved

Returns:
series with one row per row_id, with the value from the column specified in col_ids
class activitysim.core.skim.SkimDict

A SkimDict object is a wrapper around a dict of multiple skim objects, where each object is identified by a key. It operates like a dictionary - i.e. use brackets to add and get skim objects.

Note that keys are either strings or tuples of two strings (to support stacking of skims.)
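
For illustration, a minimal usage sketch; dist_skim and sov_am_skim stand in for skim objects built elsewhere:

skim_dict = SkimDict()
skim_dict.set('DIST', dist_skim)                  # simple string key
skim_dict.set(('SOV_TIME', 'AM'), sov_am_skim)    # tuple key, supporting stacked skims
skim = skim_dict.get('DIST')                      # returns the skim object itself, not a lookup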

get(key)

Get an available skim object (not the lookup)

Parameters:
key : hashable

The key (identifier) for this skim object

Returns:
skim: Skim

The skim object

set(key, skim_data)

Set skim data for key

Parameters:
key : hashable

The key (identifier) for this skim object

skim_data : Skim

The skim object

Returns:
Nothing
wrap(left_key, right_key)

return a SkimDictWrapper for self

class activitysim.core.skim.SkimDictWrapper(skim_dict, left_key, right_key)

A SkimDictWrapper object is an access wrapper around a SkimDict of multiple skim objects, where each object is identified by a key. It operates like a dictionary - i.e. use brackets to add and get skim objects - but also has information on how to lookup against the skim objects. Specifically, this object has a dataframe, a left_key and right_key. It is assumed that left_key and right_key identify columns in df. The parameter df is usually set by the simulation itself as it’s a result of interacting choosers and alternatives.

When the user calls skims[key], key is an identifier for which skim to use, and the object automatically looks up impedances of that skim using the specified left_key column in df as the origin and the right_key column in df as the destination. In this way, the user does not do the O-D lookup by hand and only specifies which skim to use for this lookup. This is the only purpose of this object: to abstract away the O-D lookup and use skims by specifying which skim to use in the expressions.

Note that keys are either strings or tuples of two strings (to support stacking of skims.)
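
A hedged sketch of the intended usage pattern (the 'origin' and 'destination' column names are illustrative):

skims = skim_dict.wrap('origin', 'destination')   # left_key and right_key are columns in df
skims.set_df(df)                                   # df is the interacted choosers/alternatives table
dist = skims['DIST']                               # pd.Series of O-D values with the same index as df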

lookup(key, reverse=False)

Generally not called by the user - use __getitem__ instead

Parameters:
key : hashable

The key (identifier) for this skim object

reverse : bool (optional)

reverse=False (the default) looks up the standard origin-destination skim value; reverse=True looks up the destination-origin value

Returns:
impedances: pd.Series

A Series of impedances which are elements of the Skim object and with the same index as df

max(key)

return max skim value in either o-d or d-o direction

reverse(key)

return skim value in reverse (d-o) direction

set_df(df)

Set the dataframe

Parameters:
df : DataFrame

The dataframe which contains the origin and destination ids

Returns:
Nothing
class activitysim.core.skim.SkimStackWrapper(stack, left_key, right_key, skim_key)

A SkimStackWrapper object wraps a skims object to add an additional wrinkle of lookup functionality. Upon init the separate skims objects are processed into a 3D matrix so that lookup of the different skims can be performed quickly for each row in the dataframe. In this very particular formulation, the keys are assumed to be tuples with two elements - the second element of which will be taken from the different rows in the dataframe. The first element can then be dereferenced like an array. This is useful, for instance, to have a certain skim vary by time of day - the skims are set with keys of ('SOV', 'AM'), ('SOV', 'PM') etc. The time of day is then taken to be different for every row in the tours table, and the 'SOV' portion of the key can be used in __getitem__.

To be more explicit, the input is a dictionary of Skims objects, each of which contains a 2D matrix. These are stacked into a 3D matrix with a mapping of keys to indexes which is applied using pandas .map to a third column in the object dataframe. The three columns - left_key and right_key from the Skims object and skim_key from this one, are then used to dereference the 3D matrix. The tricky part comes in defining the key which matches the 3rd dimension of the matrix, and the key which is passed into __getitem__ below (i.e. the one used in the specs). By convention, every key in the Skims object that is passed in MUST be a tuple with 2 items. The second item in the tuple maps to the items in the dataframe referred to by the skim_key column and the first item in the tuple is then available to pass directly to __getitem__.

The upshot is that in the specs, you can write something like out_skim['SOV'] and it will automatically dereference the 3D matrix using origin, destination, and time of day.
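
For example, a hedged sketch in which the stack holds skims keyed by ('SOV_TIME', 'AM'), ('SOV_TIME', 'PM'), etc., and out_period is a hypothetical time-of-day column in the tours table:

odt_skims = SkimStackWrapper(stack, 'origin', 'destination', 'out_period')
odt_skims.set_df(tours)             # tours.out_period supplies the second item of each key tuple
sov_time = odt_skims['SOV_TIME']    # dereferences the 3D matrix by origin, destination and time period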

Parameters:
stack

This is the stacked skims object to wrap

skim_key : str

This identifies the column in the dataframe which is used to select among Skim object using the SECOND item in each tuple (see above for a more complete description)

set_df(df)

Set the dataframe

Parameters:
df : DataFrame

The dataframe which contains the origin and destination ids

Returns:
Nothing
class activitysim.core.skim.SkimWrapper(data, offset_mapper=None)

Container for skim arrays.

Parameters:
data : 2D array
offset_mapper : optional

An optional offset mapper that converts origin/destination zone IDs into array indices. For example, if zone IDs are 1-based, an offset of -1 turns them into 0-based array indices.

get(orig, dest)

Get impedance values for a set of origin, destination pairs.

Parameters:
orig : 1D array
dest : 1D array
Returns:
values : 1D array
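
A self-contained sketch with a tiny hypothetical two-zone matrix and 0-based zone ids (so the default offset mapper is assumed to leave the ids unchanged):

import numpy as np
from activitysim.core.skim import SkimWrapper

dist = np.array([[0.0, 1.5],
                 [1.5, 0.0]])     # hypothetical 2-zone distance skim
skim = SkimWrapper(dist)
skim.get(orig=np.array([0, 1]), dest=np.array([1, 1]))
# returns array([1.5, 0.0])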

Pipeline

Data pipeline manager, which manages the list of model steps, runs them via orca, reads and writes data tables from/to the pipeline datastore, and supports restarting of the pipeline at any model step.
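
A hedged sketch of a typical driver script; the model step names below are hypothetical registered orca steps:

from activitysim.core import pipeline

pipeline.run(models=['school_location_simulate', 'auto_ownership_simulate'], resume_after=None)
households = pipeline.get_table('households')   # current version of a checkpointed table
pipeline.close_pipeline()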

API

activitysim.core.pipeline.add_checkpoint(checkpoint_name)

Create a new checkpoint with specified name, write all data required to restore the simulation to its current state.

Detect any changed tables, re-wrap them, and write the current version to the pipeline store. Write the current state of the random number generator.

Parameters:
checkpoint_name : str
activitysim.core.pipeline.checkpointed_tables()

Return a list of the names of all checkpointed tables

activitysim.core.pipeline.close_pipeline()

Close any known open files

activitysim.core.pipeline.extend_table(table_name, df)

add new table or extend (add rows) to an existing table

Parameters:
table_name : str

orca/inject table name

df : pandas DataFrame
activitysim.core.pipeline.get_checkpoints()

Get pandas dataframe of info about all checkpoints stored in pipeline

Returns:
checkpoints_df : pandas.DataFrame
activitysim.core.pipeline.get_pipeline_store()

Return the open pipeline hdf5 checkpoint store, or None if it has not been opened

activitysim.core.pipeline.get_rn_generator()

Return the singleton random number object

Returns:
activitysim.random.Random
activitysim.core.pipeline.get_table(table_name, checkpoint_name=None)

Return pandas dataframe corresponding to table_name

if checkpoint_name is None, return the current (most recent) version of the table. The table can be a checkpointed table or any registered orca table (e.g. function table)

if checkpoint_name is specified, return table as it was at that checkpoint (the most recently checkpointed version of the table at or before checkpoint_name)

Parameters:
table_name : str
checkpoint_name : str or None
Returns:
df : pandas.DataFrame
activitysim.core.pipeline.load_checkpoint(checkpoint_name)

Load dataframes and restore random number channel state from pipeline hdf5 file. This restores the pipeline state that existed at the specified checkpoint in a prior simulation. This allows us to resume the simulation after the specified checkpoint

Parameters:
checkpoint_name : str

model_name of checkpoint to load (resume_after argument to open_pipeline)

activitysim.core.pipeline.open_pipeline(resume_after=None)

Start pipeline, either for a new run or, if resume_after, loading checkpoint from pipeline.

If resume_after, then we expect the pipeline hdf5 file to exist and contain checkpoints from a previous run, including a checkpoint with name specified in resume_after

Parameters:
resume_after : str or None

name of checkpoint to load from pipeline store

activitysim.core.pipeline.open_pipeline_store(overwrite=False)

Open the pipeline checkpoint store

Parameters:
overwrite : bool

delete file before opening (unless resuming)

activitysim.core.pipeline.orca_dataframe_tables()

Return a list of the names of all currently registered dataframe tables

activitysim.core.pipeline.read_df(table_name, checkpoint_name=None)

Read a pandas dataframe from the pipeline store.

We store multiple versions of all simulation tables, for every checkpoint in which they change, so we need to know both the table_name and the checkpoint_name of the desired table.

The only exception is the checkpoints dataframe, which just has a table_name

An error will be raised by HDFStore if the table is not found

Parameters:
table_name : str
checkpoint_name : str
Returns:
df : pandas.DataFrame

the dataframe read from the store

activitysim.core.pipeline.replace_table(table_name, df)

Add or replace a orca table, removing any existing added orca columns

The use case for this function is a method that calls to_frame on an orca table, modifies it, and then saves the modified version.

orca.to_frame returns a copy, so no changes are saved, and adding multiple columns with add_column adds them in an indeterminate order.

Simply replacing an existing table “behind the pipeline’s back” by calling orca.add_table risks the pipeline failing to detect that it has changed, and thus not checkpointing the changes.

Parameters:
table_name : str

orca/pipeline table name

df : pandas DataFrame
activitysim.core.pipeline.rewrap(table_name, df=None)

Add or replace an orca registered table as a unitary DataFrame-backed DataFrameWrapper table

if df is None, then get the dataframe from orca (table_name should be registered, or an error will be thrown) which may involve evaluating added columns, etc.

If the orca table already exists, deregister it along with any associated columns before re-registering it.

The net result is that the dataframe is a registered orca DataFrameWrapper table with no computed or added columns.

Parameters:
table_name
df
Returns:
the underlying df of the rewrapped table
activitysim.core.pipeline.run(models, resume_after=None)

run the specified list of models, optionally loading checkpoint and resuming after specified checkpoint.

Since we use model_name as checkpoint name, the same model may not be run more than once.

If resume_after checkpoint is specified and a model with that name appears in the models list, then we only run the models after that point in the list. This allows the user always to pass the same list of models, but specify a resume_after point if desired.

Parameters:
models : [str]

list of model_names

resume_after : str or None

model_name of checkpoint to load checkpoint and AFTER WHICH to resume model run

activitysim.core.pipeline.run_model(model_name)

Run the specified model and add checkpoint for model_name

Since we use model_name as checkpoint name, the same model may not be run more than once.

Parameters:
model_name : str

model_name is assumed to be the name of a registered orca step

activitysim.core.pipeline.set_rn_generator_base_seed(seed)

Like seed for numpy.random.RandomState, but generalized for use with all random streams.

Provide a base seed that will be added to the seeds of all random streams. The default base seed value is 0, so set_base_seed(0) is a NOP

set_rn_generator_base_seed(1) will (e.g.) provide a different set of random streams than the default, but will provide repeatable results re-running or resuming the simulation

set_rn_generator_base_seed(None) will set the base seed to a random and unpredictable integer and so provides “fully pseudo random” non-repeatable streams with different results every time

Must be called before open_pipeline() or pipeline.run()

Parameters:
seed : int or None
activitysim.core.pipeline.split_arg(s, sep, default='')

split str s in two at first sep, returning empty string as second result if no sep

activitysim.core.pipeline.write_df(df, table_name, checkpoint_name=None)

Write a pandas dataframe to the pipeline store.

We store multiple versions of all simulation tables, for every checkpoint in which they change, so we need to know both the table_name and the checkpoint_name to label the saved table

The only exception is the checkpoints dataframe, which just has a table_name

Parameters:
df : pandas.DataFrame

dataframe to store

table_name : str

also conventionally the orca table name

checkpoint_name : str

the checkpoint at which the table was created/modified

Random

ActivitySim’s random number generation has a number of important features unique to AB modeling:

  • Regression testing, debugging - run the exact model with the same inputs and get exactly the same results.
  • Debugging models - run the exact model with the same inputs but with changes to expression files and get the same results except where the equations differ.
  • Since runs can take a while, the above cases need to work with a restartable pipeline.
  • Debugging Multithreading - run the exact model with different multithreading configurations and get the same results.
  • Repeatable household-level choices - results for a household are repeatable when run with different sample sizes
  • Repeatable household level results with different scenarios - results for a household are repeatable with different scenario configurations sequentially up to the point at which those differences emerge, and in alternate submodels in which those differences do not apply.

Random number generation is done using the numpy Mersenne Twister PRNG. ActivitySim seeds on-the-fly and uses a stream of random numbers seeded by the household id, person id, tour id, trip id, the model step offset, and the global seed. The logic for calculating the seed is something along the lines of:

chooser_table.index * number_of_models_for_chooser + chooser_model_offset + global_seed_offset

for example
  1425 * 2 + 0 + 1
where:
  1425 = household table index - households.id
  2 = number of household level models - auto ownership and cdap
  0 = first household model - auto ownership
  1 = global seed offset for testing the same model under different random global seeds

ActivitySim generates a separate, distinct, and stable random number stream for each tour type and tour number in order to maintain as much stability as is possible across alternative scenarios. This is done for trips as well, by direction (inbound versus outbound).

Note

The Random module contains max model steps constants by chooser type - household, person, tour, trip - which need to be equal to the number of chooser sub-models.

API

class activitysim.core.random.SimpleChannel(channel_name, base_seed, domain_df, step_num)

We need to ensure that we generate the same random streams (when re-run or even across different simulations.) We do this by generating a random seed for each domain_df row that is based on the domain_df index (which implies that generated tables like tours and trips are also created with stable, predictable, repeatable row indexes.)

Because we need to generate a distinct stream for each step, we can’t just use the domain_df index - we need a strategy for handling multiple steps without generating collisions between streams (i.e. choosing the same seed for more than one stream.)

The easiest way to do this would be to use an array of integers to seed the generator, with a global seed, a channel seed, a row seed, and a step seed. Unfortunately, seeding numpy RandomState with arrays is a LOT slower than with a single integer seed, and speed matters because we reseed on-the-fly for every call because creating a different RandomState object for each row uses too much memory (5K per RandomState object)

So instead, we multiply the domain_df index by the number of steps required for the channel and add the step_num to the row_seed to get a unique seed for each (domain_df index, step_num) tuple.

Currently, it is possible that random streams for rows in different tables may coincide. This would be easy to avoid with either seed arrays or fast jump/offset.

numpy random seeds are unsigned int32 so there are 4,294,967,295 available seeds. That is probably just about enough to distribute evenly, for most cities, depending on the number of households, persons, tours, trips, and steps.

We do read in the whole households and persons tables at start time, so we could note the max index values. But we might then want a way to ensure stability between the test, example, and full datasets. I am punting on this for now.

begin_step(step_num)

Reset channel state for a new step

Parameters:
step_num : int

step number for this step

choice_for_df(df, step_name, a, size, replace)

Apply numpy.random.choice once for each row in df using the appropriate random channel for each row.

Concatenate the choice arrays for every row into a single 1-D ndarray. The resulting array will be of length size * len(df.index). This method is designed to support creation of an interaction_dataset.

The columns in df are ignored; the index name and values are used to determine which random number sequence to use.

Parameters:
df : pandas.DataFrame

df with index name and values corresponding to a registered channel

step_name : str

current step name so we can update row_states seed info

The remaining parameters are passed through as arguments to numpy.random.choice
a : 1-D array-like or int

If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if a was np.arange(n)

size : int or tuple of ints

Output shape

replace : boolean

Whether the sample is with or without replacement

Returns:
choices : 1-D ndarray of length: size * len(df.index)

The generated random samples for each row concatenated into a single (flat) array

create_row_states_for_domain(domain_df)

Create a dataframe with same index as domain_df and a single column with stable, predictable, repeatable row_seeds for that domain_df index value

See notes on the seed generation strategy in class comment above.

Parameters:
domain_df : pandas.dataframe

domain dataframe with index values for which random streams are to be generated

Returns:
row_states : pandas.DataFrame
extend_domain(domain_df)

Extend existing row_state df by adding seed info for each row in domain_df

It is assumed that the index values of the component tables are disjoint and there will be no ambiguity/collisions between them

Parameters:
domain_df : pandas.DataFrame

domain dataframe with index values for which random streams are to be generated and well-known index name corresponding to the channel

step_name : str or None

provided when reloading so we can restore step_name and step_num

step_num : int or None
random_for_df(df, step_name, n=1)

Return n floating point random numbers in range [0, 1) for each row in df using the appropriate random channel for each row.

Subsequent calls (in the same step) will return the next rand for each df row

The resulting array will be the same length (and order) as df. This method is designed to support alternative selection from a probability array.

The columns in df are ignored; the index name and values are used to determine which random number sequence to use.

If “true pseudo random” behavior is desired (i.e. NOT repeatable) the set_base_seed method (q.v.) may be used to globally reseed all random streams.

Parameters:
df : pandas.DataFrame

df with index name and values corresponding to a registered channel

n : int

number of rands desired per df row

Returns:
rands : 2-D ndarray

array the same length as df, with n floats in range [0, 1) for each df row

Tracing

Household tracer. If a household trace ID is specified, then ActivitySim will output a comprehensive set of trace files for all calculations for all household members:

  • hhtrace.log - household trace log file, which specifies the CSV files traced. The order of output files is consistent with the model sequence.
  • various CSV files - every input, intermediate, and output data table - chooser, expressions/utilities, probabilities, choices, etc. - for the trace household for every sub-model

With the set of output CSV files, the user can trace ActivitySim’s calculations in order to ensure they are correct and/or to help debug data and/or logic errors.

API

activitysim.core.tracing.config_logger(custom_config_file=None, basic=False)

Configure logger

if custom_config_file is not supplied then look for conf file in configs_dir

if not found use basicConfig

Parameters:
custom_config_file: str

custom config filename

basic: boolean

basic setup

Returns:
Nothing
activitysim.core.tracing.delete_csv_files(output_dir)

Delete CSV files

Parameters:
output_dir: str

Directory of trace output CSVs

Returns:
Nothing
activitysim.core.tracing.get_trace_target(df, slicer)

get target ids and column or index to identify target trace rows in df

Parameters:
df: pandas.DataFrame

dataframe to slice

slicer: str

name of column or index to use for slicing

Returns:
(target, column) tuple
target : int or list of ints

id or ids that identify tracer target rows

column : str

name of column to search for targets or None to search index

activitysim.core.tracing.hh_id_for_chooser(id, choosers)
Parameters:
id - scalar id (or list of ids) from chooser index
choosers - pandas dataframe whose index contains ids
Returns:
scalar household_id or series of household_ids
activitysim.core.tracing.interaction_trace_rows(interaction_df, choosers, sample_size=None)

Trace model design for interaction_simulate

Parameters:
interaction_df: pandas.DataFrame

traced model_design dataframe

choosers: pandas.DataFrame

interaction_simulate choosers (needed to filter the model_design dataframe by traced hh or person id)

sample_size : int or None

int for constant sample size, or None if choosers have different numbers of alternatives

Returns:
trace_rows : numpy.ndarray

array of booleans to flag which rows in interaction_df to trace

trace_ids : tuple (str, numpy.ndarray)

column name and array of trace_ids mapping trace_rows to their target_id for use by trace_interaction_eval_results which needs to know target_id so it can create separate tables for each distinct target for readability

activitysim.core.tracing.log_file_path(name)

For use in logging.yaml tag to inject log file path

filename: !!python/object/apply:activitysim.defaults.tracing.log_file_path ['asim.log']

Parameters:
name: str

log file name

Returns:
f: str

path to the named file in the output folder

activitysim.core.tracing.no_results(trace_label)

standard no-op to write tracing when a model produces no results

activitysim.core.tracing.print_summary(label, df, describe=False, value_counts=False)

Print summary

Parameters:
label: str

tracer name

df: pandas.DataFrame

traced dataframe

describe: boolean

print describe?

value_counts: boolean

print value counts?

Returns:
Nothing
activitysim.core.tracing.register_households(df, trace_hh_id)

Register with orca households for tracing

Parameters:
df: pandas.DataFrame

traced dataframe

trace_hh_id: int

household id we are tracing

Returns:
Nothing
activitysim.core.tracing.register_participants(df, trace_hh_id)

Register with inject for tracing

create an injectable ‘trace_participant_ids’ with a list of participant_ids in household we are tracing. This allows us to slice by participant_ids without requiring presence of household_id column

Parameters:
df: pandas.DataFrame

traced dataframe

trace_hh_id: int

household id we are tracing

Returns:
Nothing
activitysim.core.tracing.register_persons(df, trace_hh_id)

Register with orca persons for tracing

Parameters:
df: pandas.DataFrame

traced dataframe

trace_hh_id: int

household id we are tracing

Returns:
Nothing
activitysim.core.tracing.register_tours(df, trace_hh_id)

Register with inject for tracing

create an injectable ‘trace_tour_ids’ with a list of tour_ids in household we are tracing. This allows us to slice by tour_id without requiring presence of person_id column

Parameters:
df: pandas.DataFrame

traced dataframe

trace_hh_id: int

household id we are tracing

Returns:
Nothing
activitysim.core.tracing.register_traceable_table(table_name, df)

Register traceable table

Parameters:
df: pandas.DataFrame

traced dataframe

Returns:
Nothing
activitysim.core.tracing.register_trips(df, trace_hh_id)

Register with inject for tracing

create an injectable ‘trace_trip_ids’ with a list of trip_ids in household we are tracing. This allows us to slice by trip_id without requiring presence of person_id column

Parameters:
df: pandas.DataFrame

traced dataframe

trace_hh_id: int

household id we are tracing

Returns:
Nothing
activitysim.core.tracing.slice_canonically(df, slicer, label, warn_if_empty=False)

Slice dataframe by traced household or person id dataframe and write to CSV

Parameters:
df: pandas.DataFrame

dataframe to slice

slicer: str

name of column or index to use for slicing

label: str

tracer name - only used to report bad slicer

Returns:
sliced subset of dataframe
activitysim.core.tracing.slice_ids(df, ids, column=None)

slice a dataframe to select only records with the specified ids

Parameters:
df: pandas.DataFrame

traced dataframe

ids: int or list of ints

slice ids

column: str

column to slice (slice using index if None)

Returns:
df: pandas.DataFrame

sliced dataframe

activitysim.core.tracing.trace_df(df, label, slicer=None, columns=None, index_label=None, column_labels=None, transpose=True, warn_if_empty=False)

Slice dataframe by traced household or person id dataframe and write to CSV

Parameters:
df: pandas.DataFrame

traced dataframe

label: str

tracer name

slicer: Object

slicer for subsetting

columns: list

columns to write

index_label: str

index name

column_labels: [str, str]

labels for columns in csv

transpose: boolean

whether to transpose file for legibility

warn_if_empty: boolean

write warning if sliced df is empty

Returns:
Nothing
activitysim.core.tracing.trace_interaction_eval_results(trace_results, trace_ids, label)

Trace model design eval results for interaction_simulate

Parameters:
trace_results: pandas.DataFrame

traced model_design dataframe

trace_ids : tuple (str, numpy.ndarray)

column name and array of trace_ids from interaction_trace_rows() used to filter the trace_results dataframe by traced hh or person id

label: str

tracer name

Returns:
Nothing
activitysim.core.tracing.write_csv(df, file_name, index_label=None, columns=None, column_labels=None, transpose=True)

Write a dataframe or series to a CSV file

Parameters:
df: pandas.DataFrame or pandas.Series

traced dataframe

file_name: str

output file name

index_label: str

index name

columns: list

columns to write

transpose: bool

whether to transpose dataframe (ignored for series)

Returns:
Nothing

Utility Expressions

Much of the power of ActivitySim comes from being able to specify Python, pandas, and numpy expressions for calculations. Refer to the pandas help for a general introduction to expressions. ActivitySim provides two ways to evaluate expressions:

  • Simple table expressions are evaluated using DataFrame.eval(). pandas’ eval operates on the current table.
  • Python expressions, denoted by beginning with @, are evaluated with Python’s eval().

Simple table expressions can only refer to columns in the current DataFrame. Python expressions can refer to any Python objects currently in memory.
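
The two evaluation modes can be illustrated with a small sketch (the column names are hypothetical):

import pandas as pd

df = pd.DataFrame({'drivers': [1, 2, 2], 'workers': [0, 3, 5]})

# simple table expression - evaluated against the current table with DataFrame.eval()
df.eval('drivers == 2')

# Python expression (written with a leading @ in a spec file) - evaluated with Python's eval()
eval("df.workers.clip(upper=3)", {}, {'df': df})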

Conventions

There are a few conventions for writing expressions in ActivitySim:

  • each expression is applied to all rows in the table being operated on
  • expressions must be vectorized expressions and can use most numpy and pandas expressions
  • global constants are specified in the settings file
  • comments are specified with #
  • you can refer to the current table being operated on as df
  • often an object called skims, skims_od, or similar is available and is used to lookup the relevant skim information. See Skim for more information.
  • when editing the CSV files in Excel, use single quote ‘ or space at the start of a cell to get Excel to accept the expression

Example Expressions File

An expressions file has the following basic form:

Description                      Expression                   cars0     cars1
2 Adults (age 16+)               drivers==2                   0         3.0773
Persons age 35-34                num_young_adults             0         -0.4849
Number of workers, capped at 3   @df.workers.clip(upper=3)    0         0.2936
Distance, from 0 to 1 miles      @skims['DIST'].clip(1)       -3.2451   -0.9523
  • Rows are vectorized expressions that will be calculated for every record in the current table being operated on
  • The Description column describes the expression
  • The Expression column contains a valid vectorized Python/pandas/numpy expression. In the example above, drivers is a column in the current table. Use @ to refer to data outside the current table
  • There is a column for each alternative and its relevant coefficient

There are some variations on this setup, but the functionality is similar. For example, in the example destination choice model, the size terms expressions file has market segments as rows and employment type coefficients as columns. Broadly speaking, there are currently four types of model expression configurations:

  • Simple Simulate choice model - select from a fixed set of choices defined in the specification file, such as the example above.
  • Simulate with Interaction choice model - combine the choice expressions with the choice alternatives files since the alternatives are not listed in the expressions file. The Non-Mandatory Tour Destination Choice model implements this approach.
  • Complex choice model - an expressions file, a coefficients file, and a YAML settings file with model structural definition. The Tour Mode Choice models are examples of this and are illustrated below.
  • Combinatorial choice model - first generate a set of alternatives based on a combination of alternatives across choosers, and then make choices. The Coordinated Daily Activity Pattern model implements this approach.

The Tour Mode Choice model is a complex choice model since the expressions file is structured a little bit differently, as shown below. Each row is an expression for one of the alternatives, and each column contains either -999, 1, or blank. The coefficients for each expression are in a separate file, with a separate column for each alternative. In the example below, the @c_ivt*(odt_skims['SOV_TIME'] + dot_skims['SOV_TIME']) expression is travel time for the tour origin to destination at the tour start time plus the tour destination to tour origin at the tour end time. The odt_skims and dot_skims objects are set up ahead of time to refer to the relevant skims for this model. The @c_ivt comes from the tour mode choice coefficient file. The tour mode choice model is a nested logit (NL) model and the nesting structure (including nesting coefficients) is specified in the YAML settings file.

Description                              Expression                                                DRIVEALONEFREE   DRIVEALONEPAY
DA - Unavailable                         sov_available == False                                    -999
DA - In-vehicle time                     @c_ivt*(odt_skims['SOV_TIME'] + dot_skims['SOV_TIME'])    1
DAP - Unavailable for age less than 16   age < 16                                                                   -999
DAP - Unavailable for joint tours        is_joint == True                                                           -999

Sampling with Interaction

Methods for expression handling, solving, and sampling (i.e. making multiple choices), with interaction with the chooser table.

Sampling is done with replacement and a sample correction factor is calculated. The factor is calculated as follows:

freq = how often an alternative is sampled (i.e. the pick_count)
prob = probability of the alternative
correction_factor = log(freq/prob)

#for example:

freq              1.00        2.00    3.00    4.00    5.00
prob              0.30        0.30    0.30    0.30    0.30
correction factor 1.20        1.90    2.30    2.59    2.81

As the alternative is oversampled, its utility goes up for final selection. The unique set of alternatives is passed to the final choice model and the correction factor is included in the utility.
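
The correction factors in the table above can be reproduced with a short numpy calculation:

import numpy as np

freq = np.array([1, 2, 3, 4, 5])         # pick_count: how often the alternative was sampled
prob = np.repeat(0.3, 5)                 # probability of the alternative
correction_factor = np.log(freq / prob)  # array([1.20, 1.90, 2.30, 2.59, 2.81])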

API

activitysim.core.interaction_sample.interaction_sample(choosers, alternatives, spec, sample_size, alt_col_name=None, allow_zero_probs=False, skims=None, locals_d=None, chunk_size=0, trace_label=None)

Run a simulation in the situation in which alternatives must be merged with choosers because there are interaction terms or because alternatives are being sampled.

optionally (if chunk_size > 0) iterates over choosers in chunk_size chunks

Parameters:
choosers : pandas.DataFrame

DataFrame of choosers

alternatives : pandas.DataFrame

DataFrame of alternatives - will be merged with choosers and sampled

spec : pandas.DataFrame

A Pandas DataFrame that gives the specification of the variables to compute and the coefficients for each variable. Variable specifications must be in the table index and the table should have only one column of coefficients.

sample_size : int, optional

Sample alternatives with sample of given size. By default is None, which does not sample alternatives.

alt_col_name: str or None

name to give the sampled_alternative column

skims : Skims object

The skims object is used to contain multiple matrices of origin-destination impedances. Make sure to also add it to the locals_d below in order to access it in expressions. The only job of this method in regards to skims is to call set_df with the dataframe that comes back from interacting choosers with alternatives. See the skims module for more documentation on how the skims object is intended to be used.

locals_d : Dict

This is a dictionary of local variables that will be the environment for an evaluation of an expression that begins with @

chunk_size : int

if chunk_size > 0 iterates over choosers in chunk_size chunks

trace_label: str

This is the label to be used for trace log file entries and dump file names when household tracing enabled. No tracing occurs if label is empty or None.

Returns:
choices_df : pandas.DataFrame

A DataFrame where index should match the index of the choosers DataFrame (except with sample_size rows for each chooser row, one row for each alt sample) and columns alt_col_name, prob, rand, pick_count

<alt_col_name>:

alt identifier from alternatives[<alt_col_name>]

prob: float

the probability of the chosen alternative

pick_count : int

number of duplicate picks for chooser, alt

activitysim.core.interaction_sample.make_sample_choices(choosers, probs, alternatives, sample_size, alternative_count, alt_col_name, allow_zero_probs, trace_label)
Parameters:
choosers
probs : pandas DataFrame

one row per chooser and one column per alternative

alternatives

dataframe with index containing alt ids

sample_size : int

number of samples/choices to make

alternative_count
alt_col_name : str
trace_label

Simulate

Methods for expression handling, solving, choosing (i.e. making choices) from a fixed set of choices defined in the specification file.

API

activitysim.core.simulate.compute_base_probabilities(nested_probabilities, nests, spec)

Compute base probabilities for nest leaves. Base probabilities will be the nest-adjusted probabilities of all leaves. This flattens or normalizes all the nested probabilities so that they have the proper global relative values (the leaf probabilities sum to 1 for each row).

Parameters:
nested_probabilities : pandas.DataFrame

dataframe with the nested probabilities for nest leaves and nodes

nest_spec : dict

Nest tree dict from the model spec yaml file

spec : pandas.Dataframe

simple simulate spec so we can return columns in appropriate order

Returns:
base_probabilities : pandas.DataFrame

Will have the index of nested_probabilities and columns for leaf base probabilities

activitysim.core.simulate.compute_nested_exp_utilities(raw_utilities, nest_spec)

compute exponentiated nest utilities based on nesting coefficients

For nest nodes this is the exponentiated logsum of alternatives adjusted by nesting coefficient

leaf <- exp( raw_utility )
nest <- exp( ln(sum of exponentiated raw_utility of leaves) * nest_coefficient )
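
For example, a hand calculation for a single nest with two leaves (the utilities and nesting coefficient are illustrative):

import numpy as np

raw_util = np.array([1.2, 0.7])    # raw utilities of the two leaves in one nest
nest_coefficient = 0.5

exp_util_leaves = np.exp(raw_util)                                         # leaf <- exp(raw_utility)
exp_util_nest = np.exp(np.log(exp_util_leaves.sum()) * nest_coefficient)   # nest node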

Parameters:
raw_utilities : pandas.DataFrame

dataframe with the raw alternative utilities of all leaves (what in non-nested logit would be the utilities of all the alternatives)

nest_spec : dict

Nest tree dict from the model spec yaml file

Returns:
nested_utilities : pandas.DataFrame

Will have the index of raw_utilities and columns for exponentiated leaf and node utilities

activitysim.core.simulate.compute_nested_probabilities(nested_exp_utilities, nest_spec, trace_label)

Compute nested probabilities for nest leaves and nodes. The probability for a nest alternative is simply the alternative's local (to the nest) probability, computed in the same way as the probability of non-nested alternatives in multinomial logit, i.e. the fractional share of the sum of the exponentiated utilities of itself and its siblings, except that in nested logit the sibling group is restricted to the nest.

Parameters:
nested_exp_utilities : pandas.DataFrame

dataframe with the exponentiated nested utilities of all leaves and nodes

nest_spec : dict

Nest tree dict from the model spec yaml file

Returns:
nested_probabilities : pandas.DataFrame

Will have the index of nested_exp_utilities and columns for leaf and node probabilities

activitysim.core.simulate.eval_mnl(choosers, spec, locals_d, custom_chooser, trace_label=None, trace_choice_name=None)

Run a simulation for when the model spec does not involve alternative specific data, e.g. there are no interactions with alternative properties and no need to sample from alternatives.

Each row in spec computes a partial utility for each alternative, by providing a spec expression (often a boolean 0-1 trigger) and a column of utility coefficients for each alternative.

We compute the utility of each alternative by matrix-multiplication of eval results with the utility coefficients in the spec alternative columns yielding one row per chooser and one column per alternative
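
A hedged sketch of that matrix multiplication, reusing the auto-ownership style spec from the example expressions file above (the eval results are made up):

import pandas as pd

# eval results: one row per chooser, one column per spec expression
expression_values = pd.DataFrame({'drivers == 2': [1, 0],
                                  '@df.workers.clip(upper=3)': [2, 3]})

# spec: one row per expression, one coefficient column per alternative
spec = pd.DataFrame({'cars0': [0.0, 0.0], 'cars1': [3.0773, 0.2936]},
                    index=['drivers == 2', '@df.workers.clip(upper=3)'])

utilities = expression_values.dot(spec)   # one row per chooser, one column per alternative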

Parameters:
choosers : pandas.DataFrame
spec : pandas.DataFrame

A table of variable specifications and coefficient values. Variable expressions should be in the table index and the table should have a column for each alternative.

locals_d : Dict or None

This is a dictionary of local variables that will be the environment for an evaluation of an expression that begins with @

custom_chooser : function(probs, choosers, spec, trace_label) returns choices, rands

custom alternative to logit.make_choices

trace_label: str

This is the label to be used for trace log file entries and dump file names when household tracing enabled. No tracing occurs if label is empty or None.

trace_choice_name: str

This is the column label to be used in trace file csv dump of choices

Returns:
choices : pandas.Series

Index will be that of choosers, values will match the columns of spec.

activitysim.core.simulate.eval_mnl_logsums(choosers, spec, locals_d, trace_label=None)

like eval_mnl except return logsums instead of making choices

Returns:
logsums : pandas.Series

Index will be that of choosers, values will be logsum across spec column values

activitysim.core.simulate.eval_nl(choosers, spec, nest_spec, locals_d, custom_chooser, trace_label=None, trace_choice_name=None)

Run a nested-logit simulation for when the model spec does not involve alternative specific data, e.g. there are no interactions with alternative properties and no need to sample from alternatives.

Parameters:
choosers : pandas.DataFrame
spec : pandas.DataFrame

A table of variable specifications and coefficient values. Variable expressions should be in the table index and the table should have a column for each alternative.

nest_spec:

dictionary specifying nesting structure and nesting coefficients (from the model spec yaml file)

locals_d : Dict or None

This is a dictionary of local variables that will be the environment for an evaluation of an expression that begins with @

custom_chooser : function(probs, choosers, spec, trace_label) returns choices, rands

custom alternative to logit.make_choices

trace_label: str

This is the label to be used for trace log file entries and dump file names when household tracing enabled. No tracing occurs if label is empty or None.

trace_choice_name: str

This is the column label to be used in trace file csv dump of choices

Returns:
choices : pandas.Series

Index will be that of choosers, values will match the columns of spec.

activitysim.core.simulate.eval_nl_logsums(choosers, spec, nest_spec, locals_d, trace_label=None)

like eval_nl except return logsums instead of making choices

Returns:
logsums : pandas.Series

Index will be that of choosers, values will be nest logsum based on spec column values

activitysim.core.simulate.eval_variables(exprs, df, locals_d=None, target_type=numpy.float64)

Evaluate a set of variable expressions from a spec in the context of a given data table.

There are two kinds of supported expressions: “simple” expressions are evaluated in the context of the DataFrame using DataFrame.eval. This is the default type of expression.

Python expressions are evaluated in the context of this function using Python’s eval function. Because we use Python’s eval this type of expression supports more complex operations than a simple expression. Python expressions are denoted by beginning with the @ character. Users should take care that these expressions must result in a Pandas Series.

Parameters:
exprs : sequence of str
df : pandas.DataFrame
locals_d : Dict

This is a dictionary of local variables that will be the environment for an evaluation of an expression that begins with @

target_type: dtype or None

type to coerce results or None if no coercion desired

Returns:
variables : pandas.DataFrame

Will have the index of df and columns of eval results of exprs.

activitysim.core.simulate.read_model_spec(fpath, fname, description_name='Description', expression_name='Expression')

Read a CSV model specification into a Pandas DataFrame or Series.

The CSV is expected to have columns for component descriptions and expressions, plus one or more alternatives.

The CSV is required to have a header with column names. For example:

Description,Expression,alt0,alt1,alt2
Parameters:
fpath : str

path to directory containing file.

fname : str

Name of a CSV spec file

description_name : str, optional

Name of the column in fname that contains the component description.

expression_name : str, optional

Name of the column in fname that contains the component expression.

Returns:
spec : pandas.DataFrame

The description column is dropped from the returned data and the expression values are set as the table index.
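
A hedged usage sketch; the directory and file name below are hypothetical:

from activitysim.core import simulate

spec = simulate.read_model_spec(fpath='configs', fname='auto_ownership.csv')
# spec.index holds the expression strings; the remaining columns are the alternative coefficients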

activitysim.core.simulate.set_skim_wrapper_targets(df, skims)

Add the dataframe to the SkimDictWrapper object so that it can be dereferenced using the parameters of the skims object.

Parameters:
df : pandas.DataFrame

Table to which to add skim data as new columns. df is modified in-place.

skims : SkimDictWrapper or SkimStackWrapper object, or a list or dict of skims

The skims object is used to contain multiple matrices of origin-destination impedances. Make sure to also add it to the locals_d below in order to access it in expressions. The only job of this method in regards to skims is to call set_df with the dataframe that comes back from interacting choosers with alternatives. See the skims module for more documentation on how the skims object is intended to be used.

activitysim.core.simulate.simple_simulate(choosers, spec, nest_spec, skims=None, locals_d=None, chunk_size=0, custom_chooser=None, trace_label=None, trace_choice_name=None)

Run an MNL or NL simulation for when the model spec does not involve alternative specific data, e.g. there are no interactions with alternative properties and no need to sample from alternatives.

activitysim.core.simulate.simple_simulate_logsums(choosers, spec, nest_spec, skims=None, locals_d=None, chunk_size=0, trace_label=None)

like simple_simulate except return logsums instead of making choices

Returns:
logsums : pandas.Series

Index will be that of choosers, values will be nest logsum based on spec column values

activitysim.core.simulate.simple_simulate_logsums_rpc(chunk_size, choosers, spec, nest_spec, trace_label)

calculate rows_per_chunk for simple_simulate_logsums

activitysim.core.simulate.simple_simulate_rpc(chunk_size, choosers, spec, nest_spec, trace_label)

rows_per_chunk calculator for simple_simulate

Simulate with Interaction

Methods for expression handling, solving, choosing (i.e. making choices), with interaction with the chooser table.

API

activitysim.core.interaction_simulate.eval_interaction_utilities(spec, df, locals_d, trace_label, trace_rows)

Compute the utilities for a single-alternative spec evaluated in the context of df

We could compute the utilities for interaction datasets just as we do for simple_simulate specs with multiple alternative columns, by calling eval_variables and then computing the utilities by matrix-multiplication of eval results with the utility coefficients in the spec alternative columns.

But interaction simulate computes the utilities of each alternative in the context of a separate row in interaction dataset df, and so there is only one alternative in spec. This turns out to be quite a bit faster (in this special case) than the pandas dot function.

For efficiency, we combine eval_variables and multiplication of coefficients into a single step, so we don’t have to create a separate column for each partial utility. Instead, we simply multiply the eval result by a single alternative coefficient and sum the partial utilities.

Parameters:
spec : dataframe

one row per spec expression and one col with utility coefficient

df : dataframe

cross join (cartesian product) of choosers with alternatives; combines columns of choosers and alternatives; len(df) == len(choosers) * len(alternatives); index values (non-unique) are index values from alternatives df

interaction_utilities : dataframe

the utility of each alternative is the sum of the partial utilities determined by the various spec expressions and their corresponding coefficients, yielding a dataframe with len(interaction_df) rows and one utility column, having the same index as interaction_df (non-unique values from alternatives df)
Returns:
utilities : pandas.DataFrame

Will have the index of df and a single column of utilities

activitysim.core.interaction_simulate.interaction_simulate(choosers, alternatives, spec, skims=None, locals_d=None, sample_size=None, chunk_size=0, trace_label=None, trace_choice_name=None)

Run a simulation in the situation in which alternatives must be merged with choosers because there are interaction terms or because alternatives are being sampled.

optionally (if chunk_size > 0) iterates over choosers in chunk_size chunks

Parameters:
choosers : pandas.DataFrame

DataFrame of choosers

alternatives : pandas.DataFrame

DataFrame of alternatives - will be merged with choosers, currently without sampling

spec : pandas.DataFrame

A Pandas DataFrame that gives the specification of the variables to compute and the coefficients for each variable. Variable specifications must be in the table index and the table should have only one column of coefficients.

skims : Skims object

The skims object is used to contain multiple matrices of origin-destination impedances. Make sure to also add it to the locals_d below in order to access it in expressions. The only job of this method in regards to skims is to call set_df with the dataframe that comes back from interacting choosers with alternatives. See the skims module for more documentation on how the skims object is intended to be used.

locals_d : Dict

This is a dictionary of local variables that will be the environment for an evaluation of an expression that begins with @

sample_size : int, optional

Sample alternatives with sample of given size. By default is None, which does not sample alternatives.

chunk_size : int

if chunk_size > 0 iterates over choosers in chunk_size chunks

trace_label: str

This is the label to be used for trace log file entries and dump file names when household tracing enabled. No tracing occurs if label is empty or None.

trace_choice_name: str

This is the column label to be used in trace file csv dump of choices

Returns:
choices : pandas.Series

A series where index should match the index of the choosers DataFrame and values will match the index of the alternatives DataFrame - choices are simulated in the standard Monte Carlo fashion

Simulate with Sampling and Interaction

Methods for expression handling, solving, sampling (i.e. making multiple choices), and choosing (i.e. making choices), with interaction with the chooser table.

API

activitysim.core.interaction_sample_simulate.interaction_sample_simulate(choosers, alternatives, spec, choice_column, allow_zero_probs=False, zero_prob_choice_val=None, skims=None, locals_d=None, chunk_size=0, trace_label=None, trace_choice_name=None)

Run a simulation in the situation in which alternatives must be merged with choosers because there are interaction terms or because alternatives are being sampled.

optionally (if chunk_size > 0) iterates over choosers in chunk_size chunks

Parameters:
choosers : pandas.DataFrame

DataFrame of choosers

alternatives : pandas.DataFrame

DataFrame of alternatives - will be merged with choosers; index domain same as choosers, but repeated for each alternative

spec : pandas.DataFrame

A Pandas DataFrame that gives the specification of the variables to compute and the coefficients for each variable. Variable specifications must be in the table index and the table should have only one column of coefficients.

skims : Skims object

The skims object is used to contain multiple matrices of origin-destination impedances. Make sure to also add it to the locals_d below in order to access it in expressions. The only job of this method in regards to skims is to call set_df with the dataframe that comes back from interacting choosers with alternatives. See the skims module for more documentation on how the skims object is intended to be used.

locals_d : Dict

This is a dictionary of local variables that will be the environment for an evaluation of an expression that begins with @

chunk_size : int

if chunk_size > 0 iterates over choosers in chunk_size chunks

trace_label: str

This is the label to be used for trace log file entries and dump file names when household tracing enabled. No tracing occurs if label is empty or None.

trace_choice_name: str

This is the column label to be used in trace file csv dump of choices

Returns:
choices : pandas.Series

A series where index should match the index of the choosers DataFrame and values will match the index of the alternatives DataFrame - choices are simulated in the standard Monte Carlo fashion

Assign

Alternative version of the expression evaluators in activitysim.core.simulate that supports temporary variable assignment. Temporary variables are identified in the expressions as starting with “_”, such as “_hh_density_bin”. These fields are not saved to the data pipeline store. This feature is used by the Accessibility model.

API

activitysim.core.assign.assign_variables(assignment_expressions, df, locals_dict, df_alias=None, trace_rows=None)

Evaluate a set of variable expressions from a spec in the context of a given data table.

Expressions are evaluated using Python’s eval function. Python expressions have access to variables in locals_d (with the data table accessible as the variable df). They also have access to previously assigned targets by the assigned target name.

lowercase variables starting with underscore are temp variables (e.g. _local_var) and not returned except in trace_results

uppercase variables starting with underscore are temp scalar variables (e.g. _LOCAL_SCALAR) and not returned except in trace_assigned_locals. This is useful for defining general purpose local constants in the expression file

Users should take care that expressions (other than temp scalar variables) should result in a Pandas Series (scalars will be automatically promoted to series.)

Parameters:
assignment_expressions : pandas.DataFrame of target assignment expressions

target: target column name
expression: pandas or python expression to evaluate

df : pandas.DataFrame
locals_d : Dict

This is a dictionary of local variables that will be the environment for an evaluation of “python” expression.

trace_rows: series or array of bools to use as mask to select target rows to trace
Returns:
variables : pandas.DataFrame

Will have the index of df and columns named by target and containing the result of evaluating expression

trace_df : pandas.DataFrame or None

a dataframe containing the eval result values for each assignment expression

activitysim.core.assign.evaluate_constants(expressions, constants)

Evaluate a list of constant expressions - each one can depend on the one before it. These are usually used for the coefficients which have relationships to each other. So ivt=.7 and then ivt_lr=ivt*.9.

Parameters:
expressions : Series

the index are the names of the expressions which are used in subsequent evals - thus naming the expressions is required.

constants : dict

will be passed as the scope of eval - usually a separate set of constants are passed in here

Returns:
d : dict
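Following the ivt example above, a minimal sketch:

import pandas as pd
from activitysim.core import assign

expressions = pd.Series(['0.7', 'ivt * 0.9'], index=['ivt', 'ivt_lr'])
assign.evaluate_constants(expressions, constants={})
# returns {'ivt': 0.7, 'ivt_lr': 0.63}
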
activitysim.core.assign.local_utilities()

Dict of useful modules and functions to provides as locals for use in eval of expressions

Returns:
utility_dict : dict

name, entity pairs of locals

activitysim.core.assign.read_assignment_spec(fname, description_name='Description', target_name='Target', expression_name='Expression')

Read a CSV model specification into a Pandas DataFrame or Series.

The CSV is expected to have columns for component descriptions, targets, and expressions.

The CSV is required to have a header with column names. For example:

Description,Target,Expression
Parameters:
fname : str

Name of a CSV spec file.

description_name : str, optional

Name of the column in fname that contains the component description.

target_name : str, optional

Name of the column in fname that contains the component target.

expression_name : str, optional

Name of the column in fname that contains the component expression.

Returns:
spec : pandas.DataFrame

dataframe with three columns: ['description', 'target', 'expression']

activitysim.core.assign.undupe_column_names(df, template='{} ({})')

rename df column names so there are no duplicates (in place)

e.g. if there are two columns named “dog”, the second column will be reformatted to “dog (2)”
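
A minimal sketch of the behavior described above:

import pandas as pd
from activitysim.core import assign

df = pd.DataFrame([[1, 2]], columns=['dog', 'dog'])
assign.undupe_column_names(df)
# df.columns is now ['dog', 'dog (2)']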

Parameters:
df : pandas.DataFrame

dataframe whose column names should be de-duplicated

template : template taking two arguments (old_name, int) to use to rename columns
Returns:
df : pandas.DataFrame

dataframe that was renamed in place, for convenience in chaining

Choice Models

Logit

Multinomial logit (MNL) or Nested logit (NL) choice model. These choice models depend on the foundational components of ActivitySim, such as the expressions and data handling described in the Execution Flow section.

To specify and solve an MNL model:

  • either specify LOGIT_TYPE: MNL in the model configuration YAML file or omit the setting
  • call either simulate.simple_simulate() or simulate.interaction_simulate(), depending on whether the alternatives are interacted with the choosers or are sampled

To specify and solve an NL model:

  • specify LOGIT_TYPE: NL in the model configuration YAML file
  • specify the nesting structure via the NESTS setting in the model configuration YAML file. An example nested logit NESTS entry can be found in example/configs/tour_mode_choice.yaml
  • call simulate.simple_simulate(). The simulate.interaction_simulate() functionality is not yet supported for NL.

API

class activitysim.core.logit.Nest(name=None, level=0)

Data for a nest-logit node or leaf

This object is yielded when iterating over nest nodes (branch or leaf). The nested logit design is stored in a yaml file as a tree of dict objects, but using an object to pass the nest data makes the code a little more readable.

An example nest specification is in the example tour mode choice model yaml configuration file - example/configs/tour_mode_choice.yaml.

activitysim.core.logit.count_nests(nest_spec, type=None)

Count the nests of the specified type (or all nests if type is None). Returns 0 if nest_spec is None.

activitysim.core.logit.each_nest(nest_spec, type=None, post_order=False)

Iterate over each nest or leaf node in the tree (or subtree)

Parameters:
nest_spec : dict

Nest tree dict from the model spec yaml file

type : str

Nest class type to yield: None yields all nests, ‘leaf’ yields only leaf nodes, ‘branch’ yields only branch nodes

post_order : Bool

Should we iterate over the nodes of the tree in post-order or pre-order? (post-order means we yield the alternatives sub-tree before the current node.)

Yields:
nest : Nest

Nest object with info about the current node (nest or leaf)
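
A rough sketch of iterating over a nest spec; the dict structure used here (name, coefficient, and alternatives keys, with leaf names as plain strings) is an assumption modeled on the example tour mode choice configuration:

from activitysim.core import logit

nest_spec = {
    'name': 'root',
    'coefficient': 1.0,
    'alternatives': [
        {'name': 'AUTO', 'coefficient': 0.72,
         'alternatives': ['DRIVEALONE', 'SHARED2']},
        'WALK',
    ],
}

for nest in logit.each_nest(nest_spec, post_order=True):
    # with post_order=True, leaves are yielded before their parent nests
    print(nest.name, nest.level)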

activitysim.core.logit.interaction_dataset(choosers, alternatives, sample_size=None)

Combine choosers and alternatives into one table for the purposes of creating interaction variables and/or sampling alternatives.

Any duplicate column names in alternatives table will be renamed with an ‘_r’ suffix. (e.g. TAZ field in alternatives will appear as TAZ_r so that it can be targeted in a skim)

Parameters:
choosers : pandas.DataFrame
alternatives : pandas.DataFrame
sample_size : int, optional

If sampling from alternatives for each chooser, this is how many to sample.

Returns:
alts_sample : pandas.DataFrame

Merged choosers and alternatives with data repeated either len(alternatives) or sample_size times.
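
A minimal sketch with hypothetical choosers and alternatives (no sampling, so each chooser is paired with every alternative):

import pandas as pd
from activitysim.core import logit

choosers = pd.DataFrame({'income': [40, 90], 'TAZ': [5, 7]}, index=[1, 2])
alternatives = pd.DataFrame({'size_term': [1.0, 2.0], 'TAZ': [10, 11]},
                            index=[10, 11])

interaction_df = logit.interaction_dataset(choosers, alternatives)
# 4 rows: each chooser repeated len(alternatives) times; per the renaming
# described above, the duplicate TAZ column appears as TAZ_r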

activitysim.core.logit.make_choices(probs, trace_label=None, trace_choosers=None)

Make choices for each chooser from among a set of alternatives.

Parameters:
probs : pandas.DataFrame

Rows for choosers and columns for the alternatives from which they are choosing. Values are expected to be valid probabilities across each row, e.g. they should sum to 1.

trace_choosers : pandas.dataframe

the choosers df (for interaction_simulate) to facilitate the reporting of hh_id by report_bad_choices because it can’t deduce hh_id from the interaction_dataset which is indexed on index values from alternatives df

Returns:
choices : pandas.Series

Maps chooser IDs (from probs index) to a choice, where the choice is an index into the columns of probs.

rands : pandas.Series

The random numbers used to make the choices (for debugging, tracing)

activitysim.core.logit.report_bad_choices(bad_row_map, df, trace_label, msg, trace_choosers=None, raise_error=True)
Parameters:
bad_row_map
df : pandas.DataFrame

utils or probs dataframe

msg : str

message describing the type of bad choice that necessitates the error being raised

trace_choosers : pandas.dataframe

the choosers df (for interaction_simulate) to facilitate the reporting of hh_id because we can’t deduce hh_id from the interaction_dataset which is indexed on index values from alternatives df

Returns:
raises RuntimeError
activitysim.core.logit.utils_to_probs(utils, trace_label=None, exponentiated=False, allow_zero_probs=False, trace_choosers=None)

Convert a table of utilities to probabilities.

Parameters:
utils : pandas.DataFrame

Rows should be choosers and columns should be alternatives.

trace_label : str

label for tracing bad utility or probability values

exponentiated : bool

True if utilities have already been exponentiated

allow_zero_probs : bool

if True, rows in which all alternative utilities are EXP_UTIL_MIN will result in rows in probs that have all zero probabilities (and do not sum to 1.0). This is for the benefit of calculating probabilities of nested logit nests.

trace_choosers : pandas.dataframe

the choosers df (for interaction_simulate) to facilitate the reporting of hh_id by report_bad_choices because it can’t deduce hh_id from the interaction_dataset which is indexed on index values from alternatives df

Returns:
probs : pandas.DataFrame

Will have the same index and columns as utils.
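
A small sketch converting hypothetical utilities to probabilities; the follow-on make_choices call is left commented out because it draws random numbers that, in a normal model run, come from ActivitySim's random number manager:

import pandas as pd
from activitysim.core import logit

utils = pd.DataFrame(
    {'drive': [1.2, 0.4], 'walk': [0.1, 0.9], 'bike': [-0.5, 0.3]},
    index=pd.Index([101, 102], name='chooser_id'))

probs = logit.utils_to_probs(utils, trace_label='demo')
# probs has the same index and columns as utils and each row sums to 1.0

# choices, rands = logit.make_choices(probs, trace_label='demo')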

Person Time Windows

The departure time and duration models require person time windows. Time windows are adjacent time periods that are available for travel. Time windows are stored in a timetable table in which each row is a person and each time period (in the case of MTC TM1, 5 am to midnight in 1-hour increments) is a column. Each column is coded as follows:

  • 0 - unscheduled, available
  • 2 - scheduled, start of a tour, is available as the last period of another tour
  • 4 - scheduled, end of a tour, is available as the first period of another tour
  • 6 - scheduled, end or start of a tour, available for this period only
  • 7 - scheduled, unavailable, middle of a tour

A good example of a time window expression is @tt.previous_tour_ends(df.person_id, df.start). This uses the person id and the tour start period to check if a previous tour ends in the same time period.

API

class activitysim.core.timetable.TimeTable(windows_df, tdd_alts_df, table_name=None)
tdd_alts_df      tdd_footprints_df
start  end      '0' '1' '2' '3' '4'...
5      5    ==>  0   6   0   0   0 ...
5      6    ==>  0   2   4   0   0 ...
5      7    ==>  0   2   7   4   0 ...
adjacent_window_after(window_row_ids, periods)

Return number of adjacent periods after specified period that are available (not in the middle of another tour.)

Implements the MTC TM1 macro @@adjWindowAfterThisPeriodAlt. The function name is kind of a misnomer, but parallels that used in the MTC TM1 UECs.

Parameters:
window_row_ids : pandas Series int

series of window_row_ids indexed by tour_id

periods : pandas series int

series of tdd_alt ids, index irrelevant

Returns:
pandas Series int

Number of adjacent windows indexed by window_row_ids.index

adjacent_window_before(window_row_ids, periods)

Return number of adjacent periods before specified period that are available (not in the middle of another tour.)

Implements the MTC TM1 macro @@getAdjWindowBeforeThisPeriodAlt. The function name is kind of a misnomer, but parallels that used in the MTC TM1 UECs.

Parameters:
window_row_ids : pandas Series int

series of window_row_ids indexed by tour_id

periods : pandas series int

series of tdd_alt ids, index irrelevant

Returns:
pandas Series int

Number of adjacent windows indexed by window_row_ids.index

adjacent_window_run_length(window_row_ids, periods, before)

Return the number of adjacent periods before or after specified period that are available (not in the middle of another tour.)

Internal DRY method to implement adjacent_window_before and adjacent_window_after

Parameters:
window_row_ids : pandas Series int

series of window_row_ids indexed by tour_id

periods : pandas series int

series of tdd_alt ids, index irrelevant

before : bool

Specify whether the desired run length is of the adjacent window before (True) or after (False)

assign(window_row_ids, tdds)

Assign tours (represented by tdd alt ids) to persons

Updates the self.windows numpy array. Assignments will not ‘take’ outside this object until/unless replace_table is called or the updated timetable is retrieved by get_windows_df.

Parameters:
window_row_ids : pandas Series

series of window_row_ids indexed by tour_id

tdds : pandas series

series of tdd_alt ids, index irrelevant

assign_footprints(window_row_ids, footprints)

assign footprints for specified window_row_ids

This method is used for initialization of joint_tour timetables based on the combined availability of the joint tour participants

Parameters:
window_row_ids : pandas Series

series of window_row_ids index irrelevant, but we want to use map()

footprints : numpy array

with one row per window_row_id and one column per time period

assign_subtour_mask(window_row_ids, tdds)
index     window_row_ids   tdds
20973389  20973389           26
44612864  44612864            3
48954854  48954854            7

tour footprints
[[0 0 2 7 7 7 7 7 7 4 0 0 0 0 0 0 0 0 0 0 0]
[0 2 7 7 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 2 7 7 7 7 7 7 4 0 0 0 0 0 0 0 0 0 0 0 0]]

subtour_mask
[[7 7 0 0 0 0 0 0 0 0 7 7 7 7 7 7 7 7 7 7 7]
[7 0 0 0 0 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7]
[7 0 0 0 0 0 0 0 0 7 7 7 7 7 7 7 7 7 7 7 7]]
previous_tour_begins(window_row_ids, periods)

Does a previously scheduled tour begin in the specified period?

Implements MTC TM1 @@prevTourBeginsThisArrivalPeriodAlt

Parameters:
window_row_ids : pandas Series int

series of window_row_ids indexed by tour_id

periods : pandas series int

series of tdd_alt ids, index irrelevant

Returns:
pandas Series boolean

indexed by window_row_ids.index

previous_tour_ends(window_row_ids, periods)

Does a previously scheduled tour end in the specified period?

Implements MTC TM1 @@prevTourEndsThisDeparturePeriodAlt

Parameters:
window_row_ids : pandas Series int

series of window_row_ids indexed by tour_id

periods : pandas series int

series of tdd_alt ids, index irrelevant (one period per window_row_id)

Returns:
pandas Series boolean

indexed by window_row_ids.index

remaining_periods_available(window_row_ids, starts, ends)

Determine number of periods remaining available after the time window from starts to ends is hypothetically scheduled

Implements MTC TM1 @@remainingPeriodsAvailableAlt

The start and end periods will always be available after scheduling, so ignore them. The periods between start and end must be currently unscheduled, so assume they will become unavailable after scheduling this window.

Parameters:
window_row_ids : pandas Series int

series of window_row_ids indexed by tour_id

starts : pandas series int

series of tdd_alt ids, index irrelevant (one per window_row_id)

ends : pandas series int

series of tdd_alt ids, index irrelevant (one per window_row_id)

Returns:
available : pandas Series int

number of periods available, indexed by window_row_ids.index

replace_table()

Save or replace windows_df DataFrame to pipeline with saved table name (specified when object instantiated.)

This is a convenience function in case caller instantiates object in one context (e.g. dependency injection) where it knows the pipeline table name, but wants to checkpoint the table in another context where it does not know that name.

slice_windows_by_row_id(window_row_ids)

return windows array slice containing rows for specified window_row_ids (in window_row_ids order)

tour_available(window_row_ids, tdds)

Test whether the time window allows a tour with the specified tdd alt's time window

Parameters:
window_row_ids : pandas Series

series of window_row_ids indexed by tour_id

tdds : pandas series

series of tdd_alt ids, index irrelevant

Returns:
available : pandas Series of bool

with same index as window_row_ids.index (presumably tour_id, but we don’t care)

window_periods_in_states(window_row_ids, periods, states)

Return boolean array indicating whether specified window periods are in list of states.

Internal DRY method to implement previous_tour_ends and previous_tour_begins

Parameters:
window_row_ids : pandas Series int

series of window_row_ids indexed by tour_id

periods : pandas series int

series of tdd_alt ids, index irrelevant (one period per window_row_id)

states : list of int

presumably (e.g. I_EMPTY, I_START…)

Returns:
pandas Series boolean

indexed by window_row_ids.index

activitysim.core.timetable.create_timetable_windows(rows, tdd_alts)

create an empty (all available) timetable with one window row per rows.index

Parameters:
rows - pd.DataFrame or Series or orca.DataFrameWrapper

all we care about is the index

tdd_alts - pd.DataFrame

We expect start and end columns, and create a timetable to accommodate all alts (with one window of padding at each end)

so if start is 5 and end is 23, we return something like this:

4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

PERID
30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
109 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Returns:
pd.DataFrame indexed by rows.index, and one column of int8 for each time window (plus padding)
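
A rough sketch of building a timetable for two persons (hypothetical data; exact column naming may differ by version):

import pandas as pd
from activitysim.core import timetable

persons = pd.DataFrame(index=pd.Index([30, 109], name='PERID'))
tdd_alts = pd.DataFrame({'start': [5, 5, 6], 'end': [5, 6, 7]})

windows_df = timetable.create_timetable_windows(persons, tdd_alts)
# one int8 column per period from 4 through 8 (one padding window at each
# end), all coded 0 (unscheduled, available)

tt = timetable.TimeTable(windows_df, tdd_alts)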

Helpers

Chunk

Chunking management

API

activitysim.core.chunk.chunked_choosers_and_alts(choosers, alternatives, rows_per_chunk)

generator to iterate over choosers and alternatives in chunk_size chunks

like chunked_choosers, but also chunks alternatives for use with sampled alternatives, where each chooser may have different alternatives (and numbers of alternatives)

There may be up to sample_size (or as few as one) alternatives for each chooser because alternatives may have been sampled more than once, but pick_count for those alternatives will always sum to sample_size.

When we chunk the choosers, we need to take care in chunking the alternatives, as there are varying numbers of them for each chooser. Since alternatives appear in the same order as choosers, we can use cumulative pick_counts to identify the boundaries of the sets of alternatives.

Parameters:
choosers
alternatives : pandas DataFrame

sample alternatives including pick_count column in same order as choosers

rows_per_chunk : int
Yields:
i : int

one-based index of current chunk

num_chunks : int

total number of chunks that will be yielded

choosers : pandas DataFrame slice

chunk of choosers

alternatives : pandas DataFrame slice

chunk of alternatives for chooser chunk
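
A minimal sketch with hypothetical sampled alternatives; pick_count sums to the sample size (3) for each chooser and the alternatives appear in the same order as the choosers:

import pandas as pd
from activitysim.core import chunk

choosers = pd.DataFrame({'income': [40, 90, 25]}, index=[1, 2, 3])
alternatives = pd.DataFrame(
    {'zone': [10, 11, 12, 13, 14], 'pick_count': [2, 1, 3, 1, 2]},
    index=[1, 1, 2, 3, 3])

for i, num_chunks, chooser_chunk, alt_chunk in chunk.chunked_choosers_and_alts(
        choosers, alternatives, rows_per_chunk=2):
    # alt_chunk holds all sampled alternatives for the choosers in chooser_chunk
    print(i, num_chunks, len(chooser_chunk), len(alt_chunk))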

Utilities

Vectorized helper functions

API

activitysim.core.util.assign_in_place(df, df2)

update existing row values in df from df2, adding columns to df if they are not there

Parameters:
df : pd.DataFrame

assignment left-hand-side (dest)

df2: pd.DataFrame

assignment right-hand-side (source)

Returns:
Nothing - df is modified in place
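
A small sketch (hypothetical data):

import pandas as pd
from activitysim.core import util

df = pd.DataFrame({'a': [1, 2, 3]}, index=[0, 1, 2])
df2 = pd.DataFrame({'a': [10, 20], 'b': ['x', 'y']}, index=[0, 2])

util.assign_in_place(df, df2)
# rows 0 and 2 of column 'a' are updated from df2, and a new column 'b'
# is added (missing where df2 has no row)
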
activitysim.core.util.left_merge_on_index_and_col(left_df, right_df, join_col, target_col)

like pandas left merge, but join on both index and a specified join_col

FIXME - for now, returns a series of values from the specified right_df target_col

Parameters:
left_df : pandas DataFrame

index name assumed to be same as that of right_df

right_df : pandas DataFrame

index name assumed to be same as that of left_df

join_col : str

name of column to join on (in addition to index values); should have the same name in both dataframes

target_col : str

name of column from right_df whose joined values should be returned as series

Returns:
target_series : pandas Series

series of target_col values with the same index as left_df, i.e. values joined to left_df from right_df, indexed like left_df

activitysim.core.util.other_than(groups, bools)

Construct a Series that has booleans indicating the presence of something- or someone-else with a certain property within a group.

Parameters:
groups : pandas.Series

A column with the same index as bools that defines the grouping of bools. The bools Series will be used to index groups and then the grouped values will be counted.

bools : pandas.Series

A boolean Series indicating where the property of interest is present. Should have the same index as groups.

Returns:
others : pandas.Series

A boolean Series with the same index as groups and bools indicating whether there is something- or something-else within a group with some property (as indicated by bools).
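
For example, with hypothetical persons grouped by household id and bools marking some property (say, full-time worker):

import pandas as pd
from activitysim.core import util

# household id for each of five persons
groups = pd.Series([1, 1, 1, 2, 2], index=[10, 11, 12, 13, 14])
# whether each person has the property of interest
bools = pd.Series([True, False, False, False, True], index=[10, 11, 12, 13, 14])

others = util.other_than(groups, bools)
# person 10 is the only True in household 1, so others[10] is False while
# others[11] and others[12] are True; similarly others[13] is True and
# others[14] is False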

activitysim.core.util.quick_loc_df(loc_list, target_df, attribute=None)

faster replacement for target_df.loc[loc_list] or target_df.loc[loc_list][attribute]

pandas DataFrame.loc[] indexing doesn’t scale for large arrays (e.g. > 1,000,000 elements)

Parameters:
loc_list : list-like (numpy.ndarray, pandas.Int64Index, or pandas.Series)
target_df : pandas.DataFrame containing column named attribute
attribute : name of column from target_df to return (or None for all columns)
Returns:
pandas.DataFrame or, if attribute specified, pandas.Series
activitysim.core.util.quick_loc_series(loc_list, target_series)

faster replacement for target_series.loc[loc_list]

pandas Series.loc[] indexing doesn’t scale for large arrays (e.g. > 1,000,000 elements)

Parameters:
loc_list : list-like (numpy.ndarray, pandas.Int64Index, or pandas.Series)
target_series : pandas.Series
Returns:
pandas.Series
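
For example (hypothetical data; the result matches target_series.loc[loc_list] but scales better for very large loc_list arrays):

import numpy as np
import pandas as pd
from activitysim.core import util

target = pd.Series([0.1, 0.2, 0.3], index=[100, 101, 102])
locs = np.array([102, 100, 102])

values = util.quick_loc_series(locs, target)
# same values as target.loc[locs]: 0.3, 0.1, 0.3
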
activitysim.core.util.reindex(series1, series2)

This reindexes the first series by the second series. This is an extremely common operation that does not appear to be in Pandas at this time. If anyone knows of an easier way to do this in Pandas, please inform the UrbanSim developers.

The canonical example would be a parcel series which has an index which is parcel_ids and a value which you want to fetch, let’s say it’s land_area. Another dataset, let’s say of buildings has a series which indicate the parcel_ids that the buildings are located on, but which does not have land_area. If you pass parcels.land_area as the first series and buildings.parcel_id as the second series, this function returns a series which is indexed by buildings and has land_area as values and can be added to the buildings dataset.

In short, this is a join on to a different table using a foreign key stored in the current table, but with only one attribute rather than for a full dataset.

This is very similar to the pandas “loc” function or “reindex” function, but neither of those functions return the series indexed on the current table. In both of those cases, the series would be indexed on the foreign table and would require a second step to change the index.

Parameters:
series1, series2 : pandas.Series
Returns:
reindexed : pandas.Series
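
The parcel/building example above, as a minimal sketch with hypothetical values:

import pandas as pd
from activitysim.core import util

# land_area indexed by parcel_id
parcels_land_area = pd.Series([2.5, 4.0, 1.2], index=[100, 101, 102])
# parcel_id of each building, indexed by building_id
buildings_parcel_id = pd.Series([101, 100, 101], index=[1, 2, 3])

land_area = util.reindex(parcels_land_area, buildings_parcel_id)
# land_area is indexed by building_id: 1 -> 4.0, 2 -> 2.5, 3 -> 4.0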

Config

Helper functions for configuring a model run

API

activitysim.core.config.get_logit_model_settings(model_settings)

Read nest spec (for nested logit) from model settings file

Returns:
nests : dict

dictionary specifying nesting structure and nesting coefficients

constants : dict

dictionary of constants to add to locals for use by expressions in model spec

activitysim.core.config.get_model_constants(model_settings)

Read constants from model settings file

Returns:
constants : dict

dictionary of constants to add to locals for use by expressions in model spec
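
A rough sketch; it assumes the constants live under a CONSTANTS key in the model settings (as in the example configurations) and uses hypothetical values:

from activitysim.core import config

# model_settings as read from a model's yaml configuration file
model_settings = {'CONSTANTS': {'c_ivt': -0.028, 'c_cost': -0.005}}

constants = config.get_model_constants(model_settings)
# constants is the dict of values added to the locals used when
# evaluating model spec expressions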

activitysim.core.config.handle_standard_args(parser=None)
Adds ‘standard’ activitysim arguments:
--config : specify path to config_dir
--output : specify path to output_dir
--data : specify path to data_dir
Parameters:
parser : argparse.ArgumentParser or None

To perform custom argument handling, pass in a parser with arguments already added and handle them based on the returned args. This method will handle the args it adds.

Returns:
args : parser.parse_args() result

Inject

Wrap orca class to make it easier to track and manage interaction with the data pipeline.

API

activitysim.core.inject.reinject_decorated_tables()

reinject the decorated tables (and columns)

Inject_Defaults

Default file and folder settings are injected into the orca model runner if needed.

API

activitysim.core.inject_defaults.pipeline_path(output_dir, settings)

Orca injectable to return the path to the pipeline hdf5 file based on output_dir and settings

Output

Write output files.

API

activitysim.core.steps.output.write_data_dictionary(output_dir)

Write table_name, number of rows, columns, and bytes for each checkpointed table

Parameters:
output_dir: str
activitysim.core.steps.output.write_tables(output_dir)

Write pipeline tables as csv files (in output directory) as specified by output_tables list in settings file.

‘output_tables’ can specify either a list of output tables to include or a list to skip. If no output_tables list is specified, then no checkpointed tables will be written.

To write all output tables EXCEPT the households and persons tables:

output_tables:
  action: skip
  tables:
    - households
    - persons

To write ONLY the households table:

output_tables:
  action: include
  tables:
     - households
Parameters:
output_dir: str

Tests

See activitysim.core.test