# Core Components

ActivitySim's core components provide data management, utility expression evaluation, choice models, person time window management, and helper functions. They include the skim matrix manager, the data pipeline manager, the random number manager, the tracer, sampling and simulation methods, model specification readers and expression evaluators, the timetable, and assorted helper functions.

## Data Management

### Skim

Skim matrix data access

#### API

class activitysim.core.skim.DataFrameMatrix(df)

Utility class to allow a pandas DataFrame to be treated like a 2-D array, indexed by row id and column name.

For use in vectorized expressions where the desired values depend on both a row and a column selector, e.g. size_terms.get(df.dest_taz, df.purpose)

```python
df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [10, 20, 30, 40, 50]},
                  index=[100, 101, 102, 103, 104])

dfm = DataFrameMatrix(df)

dfm.get(row_ids=[100, 100, 103], col_ids=['a', 'b', 'a'])
# returns [1, 10, 4]
```

get(row_ids, col_ids)

Parameters:
• row_ids - list of row ids (df index values)
• col_ids - list of column names, one per row_id, specifying the column from which the value for that row should be retrieved
Returns:
• series with one row per row_id, containing the value from the column specified in col_ids
class activitysim.core.skim.SkimDict

A SkimDict object is a wrapper around a dict of multiple skim objects, where each object is identified by a key. It operates like a dictionary - i.e. use brackets to add and get skim objects.

Note that keys are either strings or tuples of two strings (to support stacking of skims.)

get(key)

Get an available skim object (not the lookup)

Parameters:
• key : hashable - the key (identifier) for this skim object
Returns:
• skim : Skim - the skim object
set(key, skim_data)

Set skim data for key

Parameters:
• key : hashable - the key (identifier) for this skim object
• skim_data : Skim - the skim object
Returns:
• Nothing
wrap(left_key, right_key)

return a SkimDictWrapper for self

class activitysim.core.skim.SkimDictWrapper(skim_dict, left_key, right_key)

A SkimDictWrapper object is an access wrapper around a SkimDict of multiple skim objects, where each object is identified by a key. It operates like a dictionary - i.e. use brackets to add and get skim objects - but also has information on how to lookup against the skim objects. Specifically, this object has a dataframe, a left_key and right_key. It is assumed that left_key and right_key identify columns in df. The parameter df is usually set by the simulation itself as it’s a result of interacting choosers and alternatives.

When the user calls skims[key], key is an identifier for which skim to use, and the object automatically looks up impedances of that skim using the specified left_key column in df as the origin and the right_key column in df as the destination. In this way, the user does not do the O-D lookup by hand and only specifies which skim to use for this lookup. This is the only purpose of this object: to abstract away the O-D lookup and use skims by specifying which skim to use in the expressions.
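As an illustrative sketch (not the actual ActivitySim implementation, and using made-up zone data), the O-D lookup the wrapper performs amounts to indexing a 2-D skim array with the left_key and right_key columns of df:

```python
import numpy as np
import pandas as pd

# Hypothetical 5-zone distance skim (0-based zone ids for simplicity)
dist = np.arange(25, dtype=float).reshape(5, 5)
skims = {'DIST': dist}

# Chooser dataframe with origin/destination columns (the left_key/right_key)
df = pd.DataFrame({'orig': [0, 1, 2], 'dest': [4, 3, 2]})

# What skims['DIST'] effectively does once set_df(df) has been called:
key = 'DIST'
impedances = pd.Series(skims[key][df['orig'].values, df['dest'].values],
                       index=df.index)
```

The returned Series has the same index as df, so it can be used directly in vectorized expressions.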

Note that keys are either strings or tuples of two strings (to support stacking of skims.)

lookup(key, reverse=False)

Generally not called by the user - use __getitem__ instead

Parameters:
• key : hashable - the key (identifier) for this skim object
• reverse : bool, optional - if False (default), look up the standard origin-destination skim value; if True, look up the destination-origin value
Returns:
• impedances : pd.Series - a Series of impedances which are elements of the Skim object, with the same index as df
max(key)

return max skim value in either o-d or d-o direction

reverse(key)

return skim value in reverse (d-o) direction

set_df(df)

Set the dataframe

Parameters:
• df : pandas.DataFrame - the dataframe which contains the origin and destination ids
Returns:
• Nothing
class activitysim.core.skim.SkimStackWrapper(stack, left_key, right_key, skim_key)

A SkimStackWrapper object wraps a skims object to add an additional wrinkle of lookup functionality. Upon init the separate skims objects are processed into a 3D matrix so that lookup of the different skims can be performed quickly for each row in the dataframe. In this very particular formulation, the keys are assumed to be tuples with two elements - the second element of which will be taken from the different rows in the dataframe. The first element can then be dereferenced like an array. This is useful, for instance, to have a certain skim vary by time of day - the skims are set with keys of ('SOV', 'AM'), ('SOV', 'PM') etc. The time of day is then taken to be different for every row in the tours table, and the 'SOV' portion of the key can be used in __getitem__.

To be more explicit, the input is a dictionary of Skims objects, each of which contains a 2D matrix. These are stacked into a 3D matrix with a mapping of keys to indexes which is applied using pandas .map to a third column in the object dataframe. The three columns - left_key and right_key from the Skims object and skim_key from this one, are then used to dereference the 3D matrix. The tricky part comes in defining the key which matches the 3rd dimension of the matrix, and the key which is passed into __getitem__ below (i.e. the one used in the specs). By convention, every key in the Skims object that is passed in MUST be a tuple with 2 items. The second item in the tuple maps to the items in the dataframe referred to by the skim_key column and the first item in the tuple is then available to pass directly to __getitem__.

The net result is that in the specs, you can write something like out_skim['SOV'] and it will automatically dereference the 3D matrix using origin, destination, and time of day.
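A minimal sketch of this stacking and dereferencing, using made-up data and names rather than the actual SkimStackWrapper internals:

```python
import numpy as np
import pandas as pd

# Hypothetical time-of-day skims, keyed by ('SOV', <time period>)
skims = {('SOV', 'AM'): np.full((3, 3), 10.0),
         ('SOV', 'PM'): np.full((3, 3), 20.0)}

# Stack the 2D skims into a 3D matrix and record which slice
# each second key element maps to
keys = sorted(skims)                                 # [('SOV','AM'), ('SOV','PM')]
stacked = np.stack([skims[k] for k in keys])
slice_index = {k[1]: i for i, k in enumerate(keys)}  # {'AM': 0, 'PM': 1}

# Tours with origin, destination, and a time-of-day column (the skim_key)
df = pd.DataFrame({'orig': [0, 1], 'dest': [2, 2], 'tod': ['AM', 'PM']})

# skims['SOV'] then dereferences the 3D matrix by (tod, orig, dest)
tod_idx = df['tod'].map(slice_index).values
values = stacked[tod_idx, df['orig'].values, df['dest'].values]
```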

Parameters:
• skims : Skims - the Skims object to wrap
• skim_key : str - identifies the column in the dataframe which is used to select among Skim objects using the SECOND item in each tuple (see above for a more complete description)
set_df(df)

Set the dataframe

Parameters:
• df : pandas.DataFrame - the dataframe which contains the origin and destination ids
Returns:
• Nothing
class activitysim.core.skim.SkimWrapper(data, offset_mapper=None)

Container for skim arrays.

Parameters:
• data : 2-D array
• offset_mapper : optional - an optional offset that will be added to origin/destination values to turn them into array indices; for example, if zone IDs are 1-based, an offset of -1 turns them into 0-based array indices
get(orig, dest)

Get impedance values for a set of origin, destination pairs.

Parameters:
• orig : 1-D array
• dest : 1-D array
Returns:
• values : 1-D array
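A sketch of how the offset described above turns 1-based zone ids into 0-based array indices (illustrative only, not the actual SkimWrapper implementation):

```python
import numpy as np

# 3-zone skim with 1-based zone ids; offset of -1 maps ids to array indices
data = np.array([[0., 1., 2.],
                 [3., 4., 5.],
                 [6., 7., 8.]])
offset = -1

def get(orig, dest):
    # add the offset to both id arrays, then do a vectorized 2-D lookup
    o = np.asarray(orig) + offset
    d = np.asarray(dest) + offset
    return data[o, d]

values = get([1, 2, 3], [3, 1, 2])
```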

### Pipeline

Data pipeline manager, which manages the list of model steps, runs them via orca, reads and writes data tables from/to the pipeline datastore, and supports restarting of the pipeline at any model step.

#### API

activitysim.core.pipeline.add_checkpoint(checkpoint_name)

Create a new checkpoint with specified name, write all data required to restore the simulation to its current state.

Detect any changed tables, re-wrap them and write the current version to the pipeline store. Write the current state of the random number generator.

Parameters: checkpoint_name : str
activitysim.core.pipeline.checkpointed_tables()

Return a list of the names of all checkpointed tables

activitysim.core.pipeline.close_pipeline()

Close any known open files

activitysim.core.pipeline.extend_table(table_name, df)

Parameters:
• table_name : str - orca/inject table name
• df : pandas.DataFrame
activitysim.core.pipeline.get_checkpoints()

Get pandas dataframe of info about all checkpoints stored in pipeline

Returns: checkpoints_df : pandas.DataFrame
activitysim.core.pipeline.get_pipeline_store()

Return the open pipeline hdf5 checkpoint store, or None if it has not been opened

activitysim.core.pipeline.get_rn_generator()

Return the singleton random number object

Returns: activitysim.random.Random
activitysim.core.pipeline.get_table(table_name, checkpoint_name=None)

Return pandas dataframe corresponding to table_name

if checkpoint_name is None, return the current (most recent) version of the table. The table can be a checkpointed table or any registered orca table (e.g. function table)

if checkpoint_name is specified, return table as it was at that checkpoint (the most recently checkpointed version of the table at or before checkpoint_name)

Parameters:
• table_name : str
• checkpoint_name : str or None
Returns:
• df : pandas.DataFrame
activitysim.core.pipeline.load_checkpoint(checkpoint_name)

Load dataframes and restore random number channel state from pipeline hdf5 file. This restores the pipeline state that existed at the specified checkpoint in a prior simulation. This allows us to resume the simulation after the specified checkpoint

Parameters:
• checkpoint_name : str - model_name of checkpoint to load (resume_after argument to open_pipeline)
activitysim.core.pipeline.open_pipeline(resume_after=None)

Start pipeline, either for a new run or, if resume_after, loading checkpoint from pipeline.

If resume_after, then we expect the pipeline hdf5 file to exist and contain checkpoints from a previous run, including a checkpoint with name specified in resume_after

Parameters:
• resume_after : str or None - name of checkpoint to load from pipeline store
activitysim.core.pipeline.open_pipeline_store(overwrite=False)

Open the pipeline checkpoint store

Parameters:
• overwrite : bool - delete file before opening (unless resuming)
activitysim.core.pipeline.orca_dataframe_tables()

Return a list of the names of all currently registered dataframe tables

activitysim.core.pipeline.read_df(table_name, checkpoint_name=None)

Read a pandas dataframe from the pipeline store.

We store multiple versions of all simulation tables, for every checkpoint in which they change, so we need to know both the table_name and the checkpoint_name of the desired table.

The only exception is the checkpoints dataframe, which just has a table_name

An error will be raised by HDFStore if the table is not found

Parameters:
• table_name : str
• checkpoint_name : str
Returns:
• df : pandas.DataFrame - the dataframe read from the store
activitysim.core.pipeline.replace_table(table_name, df)

Add or replace an orca table, removing any existing added orca columns

The use case for this function is a method that calls to_frame on an orca table, modifies the dataframe, and then saves the modified version.

orca.to_frame returns a copy, so no changes are saved, and adding multiple columns with add_column adds them in an indeterminate order.

Simply replacing an existing table "behind the pipeline's back" by calling orca.add_table risks the pipeline failing to detect that the table has changed, and thus not checkpointing the changes.

Parameters:
• table_name : str - orca/pipeline table name
• df : pandas.DataFrame
activitysim.core.pipeline.rewrap(table_name, df=None)

Add or replace an orca registered table as a unitary DataFrame-backed DataFrameWrapper table

if df is None, then get the dataframe from orca (table_name should be registered, or an error will be thrown) which may involve evaluating added columns, etc.

If the orca table already exists, deregister it along with any associated columns before re-registering it.

The net result is that the dataframe is a registered orca DataFrameWrapper table with no computed or added columns.

Parameters:
• table_name : str
• df : pandas.DataFrame - the underlying df of the rewrapped table
activitysim.core.pipeline.run(models, resume_after=None)

Run the specified list of models, optionally loading a checkpoint and resuming after the specified checkpoint.

Since we use model_name as checkpoint name, the same model may not be run more than once.

If resume_after checkpoint is specified and a model with that name appears in the models list, then we only run the models after that point in the list. This allows the user always to pass the same list of models, but specify a resume_after point if desired.

Parameters:
• models : [str] - list of model_names
• resume_after : str or None - model_name of checkpoint to load, and AFTER WHICH to resume the model run
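The resume_after behavior described above can be sketched as follows (illustrative, not the actual pipeline source; the model names are hypothetical):

```python
def models_to_run(models, resume_after=None):
    """Return the part of the model list that still needs to run."""
    if resume_after and resume_after in models:
        # skip everything up to and including the resume_after checkpoint
        return models[models.index(resume_after) + 1:]
    return models

models = ['school_location', 'workplace_location', 'auto_ownership', 'cdap']
remaining = models_to_run(models, resume_after='workplace_location')
# the run proceeds with ['auto_ownership', 'cdap']
```

This is why the user can always pass the same full model list and only vary resume_after.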
activitysim.core.pipeline.run_model(model_name)

Run the specified model and add checkpoint for model_name

Since we use model_name as checkpoint name, the same model may not be run more than once.

Parameters:
• model_name : str - assumed to be the name of a registered orca step
activitysim.core.pipeline.set_rn_generator_base_seed(seed)

Like seed for numpy.random.RandomState, but generalized for use with all random streams.

Provide a base seed that will be added to the seeds of all random streams. The default base seed value is 0, so set_base_seed(0) is a NOP

set_rn_generator_base_seed(1) will (e.g.) provide a different set of random streams than the default, but will provide repeatable results re-running or resuming the simulation

set_rn_generator_base_seed(None) will set the base seed to a random and unpredictable integer and so provides “fully pseudo random” non-repeatable streams with different results every time

Must be called before open_pipeline() or pipeline.run()

Parameters: seed : int or None
activitysim.core.pipeline.split_arg(s, sep, default='')

Split str s in two at the first sep, returning the default (an empty string) as the second result if sep does not occur in s
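A sketch of the described behavior (an assumption based on the docstring, not the actual source):

```python
def split_arg(s, sep, default=''):
    # partition at the first occurrence of sep; fall back to default
    head, found, tail = s.partition(sep)
    return (head, tail) if found else (s, default)

split_arg('households.sample_size', '.')  # ('households', 'sample_size')
split_arg('households', '.')              # ('households', '')
```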

activitysim.core.pipeline.write_df(df, table_name, checkpoint_name=None)

Write a pandas dataframe to the pipeline store.

We store multiple versions of all simulation tables, for every checkpoint in which they change, so we need to know both the table_name and the checkpoint_name to label the saved table

The only exception is the checkpoints dataframe, which just has a table_name

Parameters:
• df : pandas.DataFrame - dataframe to store
• table_name : str - also conventionally the orca table name
• checkpoint_name : str - the checkpoint at which the table was created/modified

### Random

ActivitySim’s random number generation has a number of important features unique to AB modeling:

• Regression testing, debugging - run the exact model with the same inputs and get exactly the same results.
• Debugging models - run the exact model with the same inputs but with changes to expression files and get the same results except where the equations differ.
• Since runs can take a while, the above cases need to work with a restartable pipeline.
• Debugging Multithreading - run the exact model with different multithreading configurations and get the same results.
• Repeatable household-level choices - results for a household are repeatable when run with different sample sizes
• Repeatable household level results with different scenarios - results for a household are repeatable with different scenario configurations sequentially up to the point at which those differences emerge, and in alternate submodels in which those differences do not apply.

Random number generation is done using the numpy Mersenne Twister PRNG. ActivitySim seeds on-the-fly and uses a stream of random numbers seeded by the household id, person id, tour id, trip id, the model step offset, and the global seed. The logic for calculating the seed is something along the lines of:

```
chooser_table.index * number_of_models_for_chooser + chooser_model_offset + global_seed_offset
```

for example:

```
1425 * 2 + 0 + 1

where:
  1425 = household table index (households.id)
  2    = number of household-level models (auto ownership and cdap)
  0    = offset of the first household model (auto ownership)
  1    = global seed offset for testing the same model under different random global seeds
```
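The arithmetic above, written out (the variable names are illustrative, not ActivitySim's own):

```python
import numpy as np

household_id = 1425        # households.id, the chooser table index
n_household_models = 2     # auto ownership and cdap
model_offset = 0           # auto ownership is the first household-level model
global_seed_offset = 1     # for testing under a different global seed

seed = household_id * n_household_models + model_offset + global_seed_offset
# seed == 2851, used to seed the Mersenne Twister stream for this
# household/model pair
prng = np.random.RandomState(seed)
```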


ActivitySim generates a separate, distinct, and stable random number stream for each tour type and tour number in order to maintain as much stability as is possible across alternative scenarios. This is done for trips as well, by direction (inbound versus outbound).

Note

The Random module contains a max model steps constant for each chooser type - household, person, tour, trip - which needs to equal the number of sub-models for that chooser.

#### API

class activitysim.core.random.SimpleChannel(channel_name, base_seed, domain_df, step_num)

We need to ensure that we generate the same random streams (when re-run or even across different simulations). We do this by generating a random seed for each domain_df row that is based on the domain_df index (which implies that generated tables like tours and trips are also created with stable, predictable, repeatable row indexes).

Because we need to generate a distinct stream for each step, we can’t just use the domain_df index - we need a strategy for handling multiple steps without generating collisions between streams (i.e. choosing the same seed for more than one stream.)

The easiest way to do this would be to use an array of integers to seed the generator, with a global seed, a channel seed, a row seed, and a step seed. Unfortunately, seeding numpy RandomState with arrays is a LOT slower than with a single integer seed, and speed matters because we reseed on-the-fly for every call because creating a different RandomState object for each row uses too much memory (5K per RandomState object)

So instead, we multiply the domain_df index by the number of steps required for the channel, and add the step_num to the row_seed to get a unique seed for each (domain_df index, step_num) tuple.

Currently, it is possible that random streams for rows in different tables may coincide. This would be easy to avoid with either seed arrays or fast jump/offset.

numpy random seeds are unsigned int32 so there are 4,294,967,295 available seeds. That is probably just about enough to distribute evenly, for most cities, depending on the number of households, persons, tours, trips, and steps.

We do read in the whole households and persons tables at start time, so we could note the max index values. But we might then want a way to ensure stability between the test, example, and full datasets. I am punting on this for now.
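A sketch of the seed strategy described above, with made-up index values and an assumed max step count (not the actual SimpleChannel code):

```python
import numpy as np
import pandas as pd

max_steps = 5                       # assumed max model steps for this channel
domain_df = pd.DataFrame(index=pd.Index([100, 101, 102], name='tour_id'))

# one base seed per row, spaced max_steps apart so steps never collide
row_seed = domain_df.index.values * max_steps

step_num = 2
seeds = row_seed + step_num         # unique per (row, step) pair

# reseed on the fly for each row rather than keeping one RandomState
# object per row (which would use too much memory)
rands = [np.random.RandomState(s).rand() for s in seeds]
```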

begin_step(step_num)

Reset channel state for a new step

Parameters:
• step_num : int - pipeline step number for this step
choice_for_df(df, step_name, a, size, replace)

Apply numpy.random.choice once for each row in df using the appropriate random channel for each row.

Concatenate the choice arrays for every row into a single 1-D ndarray. The resulting array will be of length size * len(df.index). This method is designed to support creation of an interaction_dataset.

The columns in df are ignored; the index name and values are used to determine which random number sequence to use.

Parameters:
• df : pandas.DataFrame - df with index name and values corresponding to a registered channel
• step_name : str - current step name so we can update row_states seed info

The remaining parameters are passed through as arguments to numpy.random.choice:
• a : 1-D array-like or int - if an ndarray, a random sample is generated from its elements; if an int, the random sample is generated as if a were np.arange(a)
• size : int or tuple of ints - output shape
• replace : boolean - whether the sample is with or without replacement

Returns:
• choices : 1-D ndarray of length size * len(df.index) - the generated random samples for each row, concatenated into a single (flat) array
create_row_states_for_domain(domain_df)

Create a dataframe with same index as domain_df and a single column with stable, predictable, repeatable row_seeds for that domain_df index value

See notes on the seed generation strategy in class comment above.

Parameters:
• domain_df : pandas.DataFrame - domain dataframe with index values for which random streams are to be generated
Returns:
• row_states : pandas.DataFrame
extend_domain(domain_df)

Extend existing row_state df by adding seed info for each row in domain_df

It is assumed that the index values of the component tables are disjoint and there will be no ambiguity/collisions between them

Parameters:
• domain_df : pandas.DataFrame - domain dataframe with index values for which random streams are to be generated, and a well-known index name corresponding to the channel
• step_name : str or None - provided when reloading so we can restore step_name and step_num
• step_num : int or None
random_for_df(df, step_name, n=1)

Return n floating point random numbers in range [0, 1) for each row in df using the appropriate random channel for each row.

Subsequent calls (in the same step) will return the next rand for each df row

The resulting array will be the same length (and order) as df This method is designed to support alternative selection from a probability array

The columns in df are ignored; the index name and values are used to determine which random number sequence to use.

If “true pseudo random” behavior is desired (i.e. NOT repeatable) the set_base_seed method (q.v.) may be used to globally reseed all random streams.

Parameters:
• df : pandas.DataFrame - df with index name and values corresponding to a registered channel
• n : int - number of rands desired per df row
Returns:
• rands : 2-D ndarray - array the same length as df, with n floats in range [0, 1) for each df row

### Tracing

Household tracer. If a household trace ID is specified, then ActivitySim will output a comprehensive set of trace files for all calculations for all household members:

• hhtrace.log - household trace log file, which specifies the CSV files traced. The order of output files is consistent with the model sequence.
• various CSV files - every input, intermediate, and output data table - chooser, expressions/utilities, probabilities, choices, etc. - for the trace household for every sub-model

With the set of output CSV files, the user can trace ActivitySim’s calculations in order to ensure they are correct and/or to help debug data and/or logic errors.

#### API

activitysim.core.tracing.config_logger(custom_config_file=None, basic=False)

Configure logger

If custom_config_file is not supplied, look for a conf file in configs_dir

Parameters:
• custom_config_file : str - custom config filename
• basic : boolean - basic setup
Returns:
• Nothing
activitysim.core.tracing.delete_csv_files(output_dir)

Delete CSV files

Parameters:
• output_dir : str - directory of trace output CSVs
Returns:
• Nothing
activitysim.core.tracing.get_trace_target(df, slicer)

Get target ids and the column or index to identify target trace rows in df

Parameters:
• df : pandas.DataFrame - dataframe to slice
• slicer : str - name of column or index to use for slicing
Returns (as a (target, column) tuple):
• target : int or list of ints - id or ids that identify tracer target rows
• column : str - name of column to search for targets, or None to search the index
activitysim.core.tracing.hh_id_for_chooser(id, choosers)
Parameters:
• id - scalar id (or list of ids) from chooser index
• choosers - pandas dataframe whose index contains ids
Returns:
• scalar household_id or series of household_ids
activitysim.core.tracing.interaction_trace_rows(interaction_df, choosers, sample_size=None)

Trace model design for interaction_simulate

Parameters:
• interaction_df : pandas.DataFrame - traced model_design dataframe
• choosers : pandas.DataFrame - interaction_simulate choosers (needed to filter the model_design dataframe by traced hh or person id)
• sample_size : int or None - int for constant sample size, or None if choosers have different numbers of alternatives
Returns:
• trace_rows : numpy.ndarray - array of booleans flagging which rows in interaction_df to trace
• trace_ids : tuple (str, numpy.ndarray) - column name and array of trace_ids mapping trace_rows to their target_id, for use by trace_interaction_eval_results, which needs target_id so it can create separate tables for each distinct target for readability
activitysim.core.tracing.log_file_path(name)

For use in logging.yaml tag to inject log file path

filename: !!python/object/apply:activitysim.defaults.tracing.log_file_path ['asim.log']

Parameters:
• name : str - output file name
Returns:
• f : str - output file path
activitysim.core.tracing.no_results(trace_label)

activitysim.core.tracing.print_summary(label, df, describe=False, value_counts=False)

Print summary

Parameters:
• label : str - tracer name
• df : pandas.DataFrame - traced dataframe
• describe : boolean - print describe?
• value_counts : boolean - print value counts?
Returns:
• Nothing
activitysim.core.tracing.register_households(df, trace_hh_id)

Register with orca households for tracing

Parameters:
• df : pandas.DataFrame - traced dataframe
• trace_hh_id : int - household id we are tracing
Returns:
• Nothing
activitysim.core.tracing.register_participants(df, trace_hh_id)

Register with inject for tracing

create an injectable ‘trace_participant_ids’ with a list of participant_ids in household we are tracing. This allows us to slice by participant_ids without requiring presence of household_id column

Parameters:
• df : pandas.DataFrame - traced dataframe
• trace_hh_id : int - household id we are tracing
Returns:
• Nothing
activitysim.core.tracing.register_persons(df, trace_hh_id)

Register with orca persons for tracing

Parameters:
• df : pandas.DataFrame - traced dataframe
• trace_hh_id : int - household id we are tracing
Returns:
• Nothing
activitysim.core.tracing.register_tours(df, trace_hh_id)

Register with inject for tracing

create an injectable ‘trace_tour_ids’ with a list of tour_ids in household we are tracing. This allows us to slice by tour_id without requiring presence of person_id column

Parameters:
• df : pandas.DataFrame - traced dataframe
• trace_hh_id : int - household id we are tracing
Returns:
• Nothing
activitysim.core.tracing.register_traceable_table(table_name, df)

Register traceable table

Parameters:
• table_name : str
• df : pandas.DataFrame - traced dataframe
Returns:
• Nothing
activitysim.core.tracing.register_trips(df, trace_hh_id)

Register with inject for tracing

create an injectable 'trace_trip_ids' with a list of trip_ids in the household we are tracing. This allows us to slice by trip_id without requiring presence of a person_id column

Parameters:
• df : pandas.DataFrame - traced dataframe
• trace_hh_id : int - household id we are tracing
Returns:
• Nothing
activitysim.core.tracing.slice_canonically(df, slicer, label, warn_if_empty=False)

Slice dataframe by traced household or person id dataframe and write to CSV

Parameters:
• df : pandas.DataFrame - dataframe to slice
• slicer : str - name of column or index to use for slicing
• label : str - tracer name (only used to report a bad slicer)
Returns:
• sliced subset of dataframe
activitysim.core.tracing.slice_ids(df, ids, column=None)

slice a dataframe to select only records with the specified ids

Parameters:
• df : pandas.DataFrame - traced dataframe
• ids : int or list of ints - slice ids
• column : str - column to slice (slice using index if None)
Returns:
• df : pandas.DataFrame - sliced dataframe
activitysim.core.tracing.trace_df(df, label, slicer=None, columns=None, index_label=None, column_labels=None, transpose=True, warn_if_empty=False)

Slice dataframe by traced household or person id dataframe and write to CSV

Parameters:
• df : pandas.DataFrame - traced dataframe
• label : str - tracer name
• slicer : Object - slicer for subsetting
• columns : list - columns to write
• index_label : str - index name
• column_labels : [str, str] - labels for columns in csv
• transpose : boolean - whether to transpose file for legibility
• warn_if_empty : boolean - write warning if sliced df is empty
Returns:
• Nothing
activitysim.core.tracing.trace_interaction_eval_results(trace_results, trace_ids, label)

Trace model design eval results for interaction_simulate

Parameters:
• trace_results : pandas.DataFrame - traced model_design dataframe
• trace_ids : tuple (str, numpy.ndarray) - column name and array of trace_ids from interaction_trace_rows(), used to filter the trace_results dataframe by traced hh or person id
• label : str - tracer name
Returns:
• Nothing
activitysim.core.tracing.write_csv(df, file_name, index_label=None, columns=None, column_labels=None, transpose=True)

Write a dataframe or series to a CSV file

Parameters:
• df : pandas.DataFrame or pandas.Series - traced dataframe
• file_name : str - output file name
• index_label : str - index name
• columns : list - columns to write
• transpose : bool - whether to transpose dataframe (ignored for series)
Returns:
• Nothing

## Utility Expressions

Much of the power of ActivitySim comes from being able to specify Python, pandas, and numpy expressions for calculations. Refer to the pandas help for a general introduction to expressions. ActivitySim provides two ways to evaluate expressions:

• Simple table expressions are evaluated using DataFrame.eval(). pandas’ eval operates on the current table.
• Python expressions, denoted by beginning with @, are evaluated with Python’s eval().

Simple table expressions can only refer to columns in the current DataFrame. Python expressions can refer to any Python objects currently in memory.
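For example, a minimal illustration of the two evaluation modes (using plain pandas outside ActivitySim; the column names are made up):

```python
import pandas as pd

df = pd.DataFrame({'drivers': [1, 2, 2], 'workers': [0, 2, 5]})

# Simple table expression: evaluated with DataFrame.eval against df's columns
simple = df.eval('drivers == 2')           # boolean Series

# Python expression (would start with @ in a spec file): evaluated with
# Python's eval, so it can reference df and other objects in scope
python_expr = eval("df.workers.clip(upper=3)")
```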

### Conventions

There are a few conventions for writing expressions in ActivitySim:

• each expression is applied to all rows in the table being operated on
• expressions must be vectorized expressions and can use most numpy and pandas expressions
• global constants are specified in the settings file
• comments are specified with #
• you can refer to the current table being operated on as df
• often an object called skims, skims_od, or similar is available and is used to lookup the relevant skim information. See Skim for more information.
• when editing the CSV files in Excel, use single quote ‘ or space at the start of a cell to get Excel to accept the expression

### Example Expressions File

An expressions file has the following basic form:

| Description | Expression | cars0 | cars1 |
|---|---|---|---|
| 2 Adults (age 16+) | drivers==2 | 0 | 3.0773 |
| Persons age 35-34 | num_young_adults | 0 | -0.4849 |
| Number of workers, capped at 3 | @df.workers.clip(upper=3) | 0 | 0.2936 |
| Distance, from 0 to 1 miles | @skims['DIST'].clip(1) | -3.2451 | -0.9523 |
• Rows are vectorized expressions that will be calculated for every record in the current table being operated on
• The Description column describes the expression
• The Expression column contains a valid vectorized Python/pandas/numpy expression. In the example above, drivers is a column in the current table. Use @ to refer to data outside the current table
• There is a column for each alternative and its relevant coefficient

There are some variations on this setup, but the functionality is similar. For example, in the example destination choice model, the size terms expressions file has market segments as rows and employment type coefficients as columns. Broadly speaking, there are currently four types of model expression configurations:

• Simple Simulate choice model - select from a fixed set of choices defined in the specification file, such as the example above.
• Simulate with Interaction choice model - combine the choice expressions with the choice alternatives files since the alternatives are not listed in the expressions file. The Non-Mandatory Tour Destination Choice model implements this approach.
• Complex choice model - an expressions file, a coefficients file, and a YAML settings file with model structural definition. The Tour Mode Choice models are examples of this and are illustrated below.
• Combinatorial choice model - first generate a set of alternatives based on a combination of alternatives across choosers, and then make choices. The Coordinated Daily Activity Pattern model implements this approach.

The Tour Mode Choice model is a complex choice model since the expressions file is structured a little bit differently, as shown below. Each row is an expression for one of the alternatives, and each column contains either -999, 1, or blank. The coefficients for each expression are in a separate file, with a separate column for each alternative. In the example below, the @c_ivt*(odt_skims['SOV_TIME'] + dot_skims['SOV_TIME']) expression is travel time for the tour origin to destination at the tour start time plus the tour destination to tour origin at the tour end time. The odt_skims and dot_skims objects are set up ahead of time to refer to the relevant skims for this model. The @c_ivt coefficient comes from the tour mode choice coefficient file. The tour mode choice model is a nested logit (NL) model and the nesting structure (including nesting coefficients) is specified in the YAML settings file.

| Description | Expression | DRIVEALONEFREE | DRIVEALONEPAY |
|---|---|---|---|
| DA - Unavailable | sov_available == False | -999 | |
| DA - In-vehicle time | @c_ivt*(odt_skims['SOV_TIME'] + dot_skims['SOV_TIME']) | 1 | |
| DAP - Unavailable for age less than 16 | age < 16 | | -999 |
| DAP - Unavailable for joint tours | is_joint == True | | -999 |

### Sampling with Interaction¶

Methods for expression handling, solving, and sampling (i.e. making multiple choices), with interaction with the chooser table.

Sampling is done with replacement and a sample correction factor is calculated. The factor is calculated as follows:

freq = how often an alternative is sampled (i.e. the pick_count)
prob = probability of the alternative
correction_factor = log(freq/prob)

For example:

freq              1.00        2.00    3.00    4.00    5.00
prob              0.30        0.30    0.30    0.30    0.30
correction factor 1.20        1.90    2.30    2.59    2.81


When an alternative is oversampled, the correction factor increases its utility in the final selection. The unique set of alternatives is passed to the final choice model, and the correction factor is included in the utility.
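The correction factor table above can be reproduced directly:

```python
import numpy as np

freq = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # how often each alternative was picked
prob = np.full(5, 0.3)                       # sampling probability of each alternative

# factor added to the alternative's utility in the final choice model
correction_factor = np.log(freq / prob)
# rounds to [1.20, 1.90, 2.30, 2.59, 2.81], matching the table above
```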

#### API¶

activitysim.core.interaction_sample.interaction_sample(choosers, alternatives, spec, sample_size, alt_col_name=None, allow_zero_probs=False, skims=None, locals_d=None, chunk_size=0, trace_label=None)

Run a simulation in the situation in which alternatives must be merged with choosers because there are interaction terms or because alternatives are being sampled.

optionally (if chunk_size > 0) iterates over choosers in chunk_size chunks

Parameters: choosers : pandas.DataFrame DataFrame of choosers alternatives : pandas.DataFrame DataFrame of alternatives - will be merged with choosers and sampled spec : pandas.DataFrame A Pandas DataFrame that gives the specification of the variables to compute and the coefficients for each variable. Variable specifications must be in the table index and the table should have only one column of coefficients. sample_size : int, optional Sample alternatives with sample of given size. By default is None, which does not sample alternatives. alt_col_name: str or None name to give the sampled_alternative column skims : Skims object The skims object is used to contain multiple matrices of origin-destination impedances. Make sure to also add it to the locals_d below in order to access it in expressions. The only job of this method in regards to skims is to call set_df with the dataframe that comes back from interacting choosers with alternatives. See the skims module for more documentation on how the skims object is intended to be used. locals_d : Dict This is a dictionary of local variables that will be the environment for an evaluation of an expression that begins with @ chunk_size : int if chunk_size > 0 iterates over choosers in chunk_size chunks trace_label: str This is the label to be used for trace log file entries and dump file names when household tracing enabled. No tracing occurs if label is empty or None. Returns: choices_df : pandas.DataFrame A DataFrame whose index matches the index of the choosers DataFrame (except with up to sample_size rows for each chooser row, one row for each sampled alt) and columns alt_col_name, prob, rand, and pick_count. alt_col_name : alt identifier from alternatives prob : float the probability of the chosen alternative pick_count : int number of duplicate picks for chooser, alt
activitysim.core.interaction_sample.make_sample_choices(choosers, probs, alternatives, sample_size, alternative_count, alt_col_name, allow_zero_probs, trace_label)
Parameters: choosers probs : pandas DataFrame one row per chooser and one column per alternative alternatives dataframe with index containing alt ids sample_size : int number of samples/choices to make alternative_count alt_col_name : str trace_label

### Simulate¶

Methods for expression handling, solving, choosing (i.e. making choices) from a fixed set of choices defined in the specification file.

#### API¶

activitysim.core.simulate.compute_base_probabilities(nested_probabilities, nests, spec)

Compute base probabilities for nest leaves. Base probabilities are the nest-adjusted probabilities of all leaves. This flattens or normalizes all the nested probabilities so that they have the proper global relative values (the leaf probabilities sum to 1 for each row).

Parameters: nested_probabilities : pandas.DataFrame dataframe with the nested probabilities for nest leaves and nodes nest_spec : dict Nest tree dict from the model spec yaml file spec : pandas.DataFrame simple simulate spec so we can return columns in appropriate order Returns: base_probabilities : pandas.DataFrame Will have the index of nested_probabilities and columns for leaf base probabilities
activitysim.core.simulate.compute_nested_exp_utilities(raw_utilities, nest_spec)

compute exponentiated nest utilities based on nesting coefficients

For nest nodes this is the exponentiated logsum of alternatives adjusted by nesting coefficient

leaf <- exp(raw_utility)
nest <- exp(ln(sum of exponentiated raw_utility of leaves) * nest_coefficient)

Parameters: raw_utilities : pandas.DataFrame dataframe with the raw alternative utilities of all leaves (what in non-nested logit would be the utilities of all the alternatives) nest_spec : dict Nest tree dict from the model spec yaml file nested_utilities : pandas.DataFrame Will have the index of raw_utilities and columns for exponentiated leaf and node utilities
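The leaf and nest formulas above can be sketched with illustrative numbers (the utilities and nesting coefficient are invented; the actual function operates on DataFrames and a nest tree):

```python
import math

raw_utility = {'leaf1': 1.0, 'leaf2': 2.0}  # hypothetical raw leaf utilities
nest_coefficient = 0.5                      # hypothetical nesting coefficient

# leaf <- exp(raw_utility)
exp_leaf = {name: math.exp(u) for name, u in raw_utility.items()}

# nest <- exp(ln(sum of exponentiated raw_utility of leaves) * nest_coefficient)
exp_nest = math.exp(math.log(sum(exp_leaf.values())) * nest_coefficient)
```

Note that exp(ln(s) * c) equals s**c, so the nest's exponentiated utility is the sum of the leaves' exponentiated utilities raised to the nesting coefficient.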
activitysim.core.simulate.compute_nested_probabilities(nested_exp_utilities, nest_spec, trace_label)

Compute nested probabilities for nest leaves and nodes. The probability for a nest alternative is simply the alternative's local (to the nest) probability, computed in the same way as the probability of a non-nested alternative in multinomial logit, i.e. the fractional share of the sum of the exponentiated utilities of itself and its siblings, except that in nested logit the sibling group is restricted to the nest.

Parameters: nested_exp_utilities : pandas.DataFrame dataframe with the exponentiated nested utilities of all leaves and nodes nest_spec : dict Nest tree dict from the model spec yaml file Returns: nested_probabilities : pandas.DataFrame Will have the index of nested_exp_utilities and columns for leaf and node probabilities
activitysim.core.simulate.eval_mnl(choosers, spec, locals_d, custom_chooser, trace_label=None, trace_choice_name=None)

Run a simulation for when the model spec does not involve alternative specific data, e.g. there are no interactions with alternative properties and no need to sample from alternatives.

Each row in spec computes a partial utility for each alternative, by providing a spec expression (often a boolean 0-1 trigger) and a column of utility coefficients for each alternative.

We compute the utility of each alternative by matrix-multiplication of the eval results with the utility coefficients in the spec alternative columns, yielding one row per chooser and one column per alternative.

Parameters: choosers : pandas.DataFrame spec : pandas.DataFrame A table of variable specifications and coefficient values. Variable expressions should be in the table index and the table should have a column for each alternative. locals_d : Dict or None This is a dictionary of local variables that will be the environment for an evaluation of an expression that begins with @ custom_chooser : function(probs, choosers, spec, trace_label) returns choices, rands custom alternative to logit.make_choices trace_label: str This is the label to be used for trace log file entries and dump file names when household tracing enabled. No tracing occurs if label is empty or None. trace_choice_name: str This is the column label to be used in trace file csv dump of choices choices : pandas.Series Index will be that of choosers, values will match the columns of spec.
activitysim.core.simulate.eval_mnl_logsums(choosers, spec, locals_d, trace_label=None)

like eval_mnl except return logsums instead of making choices

Returns: logsums : pandas.Series Index will be that of choosers, values will be logsum across spec column values
activitysim.core.simulate.eval_nl(choosers, spec, nest_spec, locals_d, custom_chooser, trace_label=None, trace_choice_name=None)

Run a nested-logit simulation for when the model spec does not involve alternative specific data, e.g. there are no interactions with alternative properties and no need to sample from alternatives.

Parameters: choosers : pandas.DataFrame spec : pandas.DataFrame A table of variable specifications and coefficient values. Variable expressions should be in the table index and the table should have a column for each alternative. nest_spec: dictionary specifying nesting structure and nesting coefficients (from the model spec yaml file) locals_d : Dict or None This is a dictionary of local variables that will be the environment for an evaluation of an expression that begins with @ custom_chooser : function(probs, choosers, spec, trace_label) returns choices, rands custom alternative to logit.make_choices trace_label: str This is the label to be used for trace log file entries and dump file names when household tracing enabled. No tracing occurs if label is empty or None. trace_choice_name: str This is the column label to be used in trace file csv dump of choices choices : pandas.Series Index will be that of choosers, values will match the columns of spec.
activitysim.core.simulate.eval_nl_logsums(choosers, spec, nest_spec, locals_d, trace_label=None)

like eval_nl except return logsums instead of making choices

Returns: logsums : pandas.Series Index will be that of choosers, values will be nest logsum based on spec column values
activitysim.core.simulate.eval_variables(exprs, df, locals_d=None, target_type=numpy.float64)

Evaluate a set of variable expressions from a spec in the context of a given data table.

There are two kinds of supported expressions: “simple” expressions are evaluated in the context of the DataFrame using DataFrame.eval. This is the default type of expression.

Python expressions are evaluated in the context of this function using Python’s eval function. Because we use Python’s eval this type of expression supports more complex operations than a simple expression. Python expressions are denoted by beginning with the @ character. Users should take care that these expressions must result in a Pandas Series.

Parameters: exprs : sequence of str df : pandas.DataFrame locals_d : Dict This is a dictionary of local variables that will be the environment for an evaluation of an expression that begins with @ target_type: dtype or None type to coerce results or None if no coercion desired variables : pandas.DataFrame Will have the index of df and columns of eval results of exprs.
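A simplified stand-in for the two expression kinds (the column names, MEDIAN constant, and expressions are invented for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'income': [30000, 90000]})
locals_d = {'np': np, 'df': df, 'MEDIAN': 60000}

results = {}
for expr in ['income / 1000', '@np.where(df.income > MEDIAN, 1, 0)']:
    if expr.startswith('@'):
        # python expression: evaluated with eval in the locals_d environment
        results[expr] = pd.Series(eval(expr[1:], globals(), locals_d),
                                  index=df.index)
    else:
        # simple expression: evaluated in the context of the DataFrame
        results[expr] = df.eval(expr)

variables = pd.DataFrame(results)  # index of df, one column per expression
```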
activitysim.core.simulate.read_model_spec(fpath, fname, description_name='Description', expression_name='Expression')

Read a CSV model specification into a Pandas DataFrame or Series.

The CSV is expected to have columns for component descriptions and expressions, plus one or more alternatives.

The CSV is required to have a header with column names. For example:

Description,Expression,alt0,alt1,alt2
Parameters: fpath : str path to directory containing file. fname : str Name of a CSV spec file description_name : str, optional Name of the column in fname that contains the component description. expression_name : str, optional Name of the column in fname that contains the component expression. spec : pandas.DataFrame The description column is dropped from the returned data and the expression values are set as the table index.
activitysim.core.simulate.set_skim_wrapper_targets(df, skims)

Add the dataframe to the SkimDictWrapper object so that it can be dereferenced using the parameters of the skims object.

Parameters: df : pandas.DataFrame Table to which to add skim data as new columns. df is modified in-place. skims : SkimDictWrapper or SkimStackWrapper object, or a list or dict of skims The skims object is used to contain multiple matrices of origin-destination impedances. Make sure to also add it to the locals_d below in order to access it in expressions. The only job of this method in regards to skims is to call set_df with the dataframe that comes back from interacting choosers with alternatives. See the skims module for more documentation on how the skims object is intended to be used.
activitysim.core.simulate.simple_simulate(choosers, spec, nest_spec, skims=None, locals_d=None, chunk_size=0, custom_chooser=None, trace_label=None, trace_choice_name=None)

Run an MNL or NL simulation for when the model spec does not involve alternative specific data, e.g. there are no interactions with alternative properties and no need to sample from alternatives.

activitysim.core.simulate.simple_simulate_logsums(choosers, spec, nest_spec, skims=None, locals_d=None, chunk_size=0, trace_label=None)

like simple_simulate except return logsums instead of making choices

Returns: logsums : pandas.Series Index will be that of choosers, values will be nest logsum based on spec column values
activitysim.core.simulate.simple_simulate_logsums_rpc(chunk_size, choosers, spec, nest_spec, trace_label)

calculate rows_per_chunk for simple_simulate_logsums

activitysim.core.simulate.simple_simulate_rpc(chunk_size, choosers, spec, nest_spec, trace_label)

rows_per_chunk calculator for simple_simulate

### Simulate with Interaction¶

Methods for expression handling, solving, choosing (i.e. making choices), with interaction with the chooser table.

#### API¶

activitysim.core.interaction_simulate.eval_interaction_utilities(spec, df, locals_d, trace_label, trace_rows)

Compute the utilities for a single-alternative spec evaluated in the context of df

We could compute the utilities for interaction datasets just as we do for simple_simulate specs with multiple alternative columns, by calling eval_variables and then computing the utilities by matrix-multiplication of the eval results with the utility coefficients in the spec alternative columns.

But interaction simulate computes the utilities of each alternative in the context of a separate row in interaction dataset df, and so there is only one alternative in spec. This turns out to be quite a bit faster (in this special case) than the pandas dot function.

For efficiency, we combine eval_variables and multiplication of coefficients into a single step, so we don’t have to create a separate column for each partial utility. Instead, we simply multiply the eval result by a single alternative coefficient and sum the partial utilities.

spec : dataframe
one row per spec expression and one col with utility coefficient
df : dataframe
cross join (cartesian product) of choosers with alternatives; combines columns of choosers and alternatives; len(df) == len(choosers) * len(alternatives); index values (non-unique) are index values from alternatives df
interaction_utilities : dataframe
the utility of each alternative is the sum of the partial utilities determined by the various spec expressions and their corresponding coefficients, yielding a dataframe with len(interaction_df) rows and one utility column, having the same index as interaction_df (non-unique values from alternatives df)
Returns: utilities : pandas.DataFrame Will have the index of df and a single column of utilities
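The combined evaluate-and-multiply step can be sketched as follows (an illustration of the approach with invented data, not the actual implementation):

```python
import numpy as np
import pandas as pd

# hypothetical cross join of 2 choosers x 2 alternatives (4 rows); the
# (non-unique) index values come from the alternatives table
interaction_df = pd.DataFrame(
    {'dist': [1.0, 3.0, 2.0, 4.0], 'income': [50, 50, 80, 80]},
    index=[0, 1, 0, 1])

# single-alternative spec: one expression per row, one coefficient each
spec = pd.Series({'dist': -0.1, 'income / 100': 0.5})

# evaluate each expression, scale by its single coefficient, and accumulate,
# instead of building a separate column per partial utility
partial_sum = np.zeros(len(interaction_df))
for expression, coefficient in spec.items():
    partial_sum += interaction_df.eval(expression).to_numpy() * coefficient

utilities = pd.Series(partial_sum, index=interaction_df.index)
```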
activitysim.core.interaction_simulate.interaction_simulate(choosers, alternatives, spec, skims=None, locals_d=None, sample_size=None, chunk_size=0, trace_label=None, trace_choice_name=None)

Run a simulation in the situation in which alternatives must be merged with choosers because there are interaction terms or because alternatives are being sampled.

optionally (if chunk_size > 0) iterates over choosers in chunk_size chunks

Parameters: choosers : pandas.DataFrame DataFrame of choosers alternatives : pandas.DataFrame DataFrame of alternatives - will be merged with choosers, currently without sampling spec : pandas.DataFrame A Pandas DataFrame that gives the specification of the variables to compute and the coefficients for each variable. Variable specifications must be in the table index and the table should have only one column of coefficients. skims : Skims object The skims object is used to contain multiple matrices of origin-destination impedances. Make sure to also add it to the locals_d below in order to access it in expressions. The only job of this method in regards to skims is to call set_df with the dataframe that comes back from interacting choosers with alternatives. See the skims module for more documentation on how the skims object is intended to be used. locals_d : Dict This is a dictionary of local variables that will be the environment for an evaluation of an expression that begins with @ sample_size : int, optional Sample alternatives with sample of given size. By default is None, which does not sample alternatives. chunk_size : int if chunk_size > 0 iterates over choosers in chunk_size chunks trace_label: str This is the label to be used for trace log file entries and dump file names when household tracing enabled. No tracing occurs if label is empty or None. trace_choice_name: str This is the column label to be used in trace file csv dump of choices choices : pandas.Series A series where index should match the index of the choosers DataFrame and values will match the index of the alternatives DataFrame - choices are simulated in the standard Monte Carlo fashion

### Simulate with Sampling and Interaction¶

Methods for expression handling, solving, sampling (i.e. making multiple choices), and choosing (i.e. making choices), with interaction with the chooser table.

#### API¶

activitysim.core.interaction_sample_simulate.interaction_sample_simulate(choosers, alternatives, spec, choice_column, allow_zero_probs=False, zero_prob_choice_val=None, skims=None, locals_d=None, chunk_size=0, trace_label=None, trace_choice_name=None)

Run a simulation in the situation in which alternatives must be merged with choosers because there are interaction terms or because alternatives are being sampled.

optionally (if chunk_size > 0) iterates over choosers in chunk_size chunks

Parameters: choosers : pandas.DataFrame DataFrame of choosers alternatives : pandas.DataFrame DataFrame of alternatives - will be merged with choosers index domain same as choosers, but repeated for each alternative spec : pandas.DataFrame A Pandas DataFrame that gives the specification of the variables to compute and the coefficients for each variable. Variable specifications must be in the table index and the table should have only one column of coefficients. skims : Skims object The skims object is used to contain multiple matrices of origin-destination impedances. Make sure to also add it to the locals_d below in order to access it in expressions. The only job of this method in regards to skims is to call set_df with the dataframe that comes back from interacting choosers with alternatives. See the skims module for more documentation on how the skims object is intended to be used. locals_d : Dict This is a dictionary of local variables that will be the environment for an evaluation of an expression that begins with @ chunk_size : int if chunk_size > 0 iterates over choosers in chunk_size chunks trace_label: str This is the label to be used for trace log file entries and dump file names when household tracing enabled. No tracing occurs if label is empty or None. trace_choice_name: str This is the column label to be used in trace file csv dump of choices choices : pandas.Series A series where index should match the index of the choosers DataFrame and values will match the index of the alternatives DataFrame - choices are simulated in the standard Monte Carlo fashion

### Assign¶

Alternative version of the expression evaluators in activitysim.core.simulate that supports temporary variable assignment. Temporary variables are identified in the expressions as starting with “_”, such as “_hh_density_bin”. These fields are not saved to the data pipeline store. This feature is used by the Accessibility model.

#### API¶

activitysim.core.assign.assign_variables(assignment_expressions, df, locals_dict, df_alias=None, trace_rows=None)

Evaluate a set of variable expressions from a spec in the context of a given data table.

Expressions are evaluated using Python’s eval function. Python expressions have access to variables in locals_d (and df being accessible as variable df.) They also have access to previously assigned targets as the assigned target name.

lowercase variables starting with underscore are temp variables (e.g. _local_var) and not returned except in trace_results

uppercase variables starting with underscore are temp scalar variables (e.g. _LOCAL_SCALAR) and not returned except in trace_assigned_locals This is useful for defining general purpose local constants in expression file

Users should take care that expressions (other than temp scalar variables) should result in a Pandas Series (scalars will be automatically promoted to series.)

Parameters: assignment_expressions : pandas.DataFrame of target assignment expressions target: target column names expression: pandas or python expression to evaluate df : pandas.DataFrame locals_d : Dict This is a dictionary of local variables that will be the environment for an evaluation of “python” expression. trace_rows: series or array of bools to use as mask to select target rows to trace variables : pandas.DataFrame Will have the index of df and columns named by target and containing the result of evaluating expression trace_df : pandas.DataFrame or None a dataframe containing the eval result values for each assignment expression
activitysim.core.assign.evaluate_constants(expressions, constants)

Evaluate a list of constant expressions - each one can depend on the one before it. These are usually used for the coefficients which have relationships to each other. So ivt=.7 and then ivt_lr=ivt*.9.

Parameters: expressions : Series the index are the names of the expressions which are used in subsequent evals - thus naming the expressions is required. constants : dict will be passed as the scope of eval - usually a separate set of constants are passed in here d : dict
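A minimal sketch of this sequential evaluation, using the ivt example from above (the loop body is an illustration of the idea, not the actual implementation):

```python
import pandas as pd

# each expression may refer to the ones evaluated before it
expressions = pd.Series({'ivt': '.7', 'ivt_lr': 'ivt * .9'})
constants = {}  # outer constants passed as the eval scope

d = {}
for name, expr in expressions.items():
    # previously assigned names (like ivt) are visible via d
    d[name] = eval(str(expr), constants.copy(), d)
```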
activitysim.core.assign.local_utilities()

Dict of useful modules and functions to provide as locals for use in eval of expressions

Returns: utility_dict : dict name, entity pairs of locals
activitysim.core.assign.read_assignment_spec(fname, description_name='Description', target_name='Target', expression_name='Expression')

Read a CSV model specification into a Pandas DataFrame or Series.

The CSV is expected to have columns for component descriptions, targets, and expressions.

The CSV is required to have a header with column names. For example:

Description,Target,Expression
Parameters: fname : str Name of a CSV spec file. description_name : str, optional Name of the column in fname that contains the component description. target_name : str, optional Name of the column in fname that contains the component target. expression_name : str, optional Name of the column in fname that contains the component expression. spec : pandas.DataFrame dataframe with three columns: [‘description’ ‘target’ ‘expression’]
activitysim.core.assign.undupe_column_names(df, template='{} ({})')

rename df column names so there are no duplicates (in place)

e.g. if there are two columns named “dog”, the second column will be reformatted to “dog (2)”

Parameters: df : pandas.DataFrame dataframe whose column names should be de-duplicated template : template taking two arguments (old_name, int) to use to rename columns df : pandas.DataFrame dataframe that was renamed in place, for convenience in chaining
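The renaming rule can be sketched as a standalone function (an illustration only; the actual function renames a DataFrame's columns in place):

```python
def undupe(names, template='{} ({})'):
    # rename duplicates: the second 'dog' becomes 'dog (2)', the third 'dog (3)'
    seen = {}
    out = []
    for name in names:
        seen[name] = seen.get(name, 0) + 1
        out.append(name if seen[name] == 1 else template.format(name, seen[name]))
    return out
```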

## Choice Models¶

### Logit¶

Multinomial logit (MNL) and nested logit (NL) choice models. These choice models depend on the foundational components of ActivitySim, such as the expressions and data handling described in the Execution Flow section.

To specify and solve an MNL model:

• either specify LOGIT_TYPE: MNL in the model configuration YAML file or omit the setting
• call either simulate.simple_simulate() or simulate.interaction_simulate(), depending on whether the alternatives are interacted with the choosers and/or sampled

To specify and solve an NL model:

• specify LOGIT_TYPE: NL in the model configuration YAML file
• specify the nesting structure via the NESTS setting in the model configuration YAML file. An example nested logit NESTS entry can be found in example/configs/tour_mode_choice.yaml
• call simulate.simple_simulate(). The simulate.interaction_simulate() functionality is not yet supported for NL.

#### API¶

class activitysim.core.logit.Nest(name=None, level=0)

Data for a nest-logit node or leaf

This object is yielded when iterating over nest nodes (branch or leaf). The nested logit design is stored in a yaml file as a tree of dict objects, but using an object to pass the nest data makes the code a little more readable.

An example nest specification is in the example tour mode choice model yaml configuration file - example/configs/tour_mode_choice.yaml.

activitysim.core.logit.count_nests(nest_spec, type=None)

Count the nests of the specified type (or all nests if type is None). Returns 0 if nest_spec is None.

activitysim.core.logit.each_nest(nest_spec, type=None, post_order=False)

Iterate over each nest or leaf node in the tree (or subtree)

Parameters: nest_spec : dict Nest tree dict from the model spec yaml file type : str Nest class type to yield None yields all nests ‘leaf’ yields only leaf nodes ‘branch’ yields only branch nodes post_order : Bool Should we iterate over the nodes of the tree in post-order or pre-order? (post-order means we yield the alternatives sub-tree before current node.) nest : Nest Nest object with info about the current node (nest or leaf)
activitysim.core.logit.interaction_dataset(choosers, alternatives, sample_size=None)

Combine choosers and alternatives into one table for the purposes of creating interaction variables and/or sampling alternatives.

Any duplicate column names in alternatives table will be renamed with an ‘_r’ suffix. (e.g. TAZ field in alternatives will appear as TAZ_r so that it can be targeted in a skim)

Parameters: choosers : pandas.DataFrame alternatives : pandas.DataFrame sample_size : int, optional If sampling from alternatives for each chooser, this is how many to sample. alts_sample : pandas.DataFrame Merged choosers and alternatives with data repeated either len(alternatives) or sample_size times.
activitysim.core.logit.make_choices(probs, trace_label=None, trace_choosers=None)

Make choices for each chooser from among a set of alternatives.

Parameters: probs : pandas.DataFrame Rows for choosers and columns for the alternatives from which they are choosing. Values are expected to be valid probabilities across each row, e.g. they should sum to 1. trace_choosers : pandas.dataframe the choosers df (for interaction_simulate) to facilitate the reporting of hh_id by report_bad_choices because it can’t deduce hh_id from the interaction_dataset which is indexed on index values from alternatives df choices : pandas.Series Maps chooser IDs (from probs index) to a choice, where the choice is an index into the columns of probs. rands : pandas.Series The random numbers used to make the choices (for debugging, tracing)
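The standard Monte Carlo selection can be sketched with fixed draws (invented probabilities; the actual function obtains and also returns the random numbers used):

```python
import numpy as np
import pandas as pd

# one row per chooser, one column per alternative; rows sum to 1
probs = pd.DataFrame({'alt0': [0.2, 0.5], 'alt1': [0.8, 0.5]}, index=[101, 102])
rands = pd.Series([0.9, 0.3], index=probs.index)  # fixed draws for reproducibility

# pick the first alternative whose cumulative probability exceeds the draw
cumulative = probs.to_numpy().cumsum(axis=1)
choice_idx = (cumulative < rands.to_numpy()[:, None]).sum(axis=1)
choices = pd.Series(choice_idx, index=probs.index)  # positions into probs columns
```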
activitysim.core.logit.report_bad_choices(bad_row_map, df, trace_label, msg, trace_choosers=None, raise_error=True)
Parameters: bad_row_map df : pandas.DataFrame utils or probs dataframe msg : str message describing the type of bad choice that necessitates error being thrown trace_choosers : pandas.dataframe the choosers df (for interaction_simulate) to facilitate the reporting of hh_id because we can’t deduce hh_id from the interaction_dataset which is indexed on index values from alternatives df raises RuntimeError
activitysim.core.logit.utils_to_probs(utils, trace_label=None, exponentiated=False, allow_zero_probs=False, trace_choosers=None)

Convert a table of utilities to probabilities.

Parameters: utils : pandas.DataFrame Rows should be choosers and columns should be alternatives. trace_label : str label for tracing bad utility or probability values exponentiated : bool True if utilities have already been exponentiated allow_zero_probs : bool if True value rows in which all utility alts are EXP_UTIL_MIN will result in rows in probs to have all zero probability (and not sum to 1.0) This is for the benefit of calculating probabilities of nested logit nests trace_choosers : pandas.dataframe the choosers df (for interaction_simulate) to facilitate the reporting of hh_id by report_bad_choices because it can’t deduce hh_id from the interaction_dataset which is indexed on index values from alternatives df probs : pandas.DataFrame Will have the same index and columns as utils.
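For the non-exponentiated case, the conversion is a row-wise softmax; a sketch with invented utilities (the actual function additionally supports already-exponentiated utilities and zero-probability rows):

```python
import numpy as np
import pandas as pd

utils = pd.DataFrame({'alt0': [0.0, 1.0], 'alt1': [1.0, 1.0]})

# exponentiate and normalize so each row sums to 1
exp_utils = np.exp(utils)
probs = exp_utils.div(exp_utils.sum(axis=1), axis=0)
```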

## Person Time Windows¶

The departure time and duration models require person time windows. Time windows are adjacent time periods that are available for travel. Time windows are stored in a timetable table in which each row is a person and each time period (in the case of MTC TM1, 5 am to midnight in 1 hour increments) is a column. Each column is coded as follows:

• 0 - unscheduled, available
• 2 - scheduled, start of a tour, is available as the last period of another tour
• 4 - scheduled, end of a tour, is available as the first period of another tour
• 6 - scheduled, end or start of a tour, available for this period only
• 7 - scheduled, unavailable, middle of a tour

A good example of a time window expression is @tt.previous_tour_ends(df.person_id, df.start). This uses the person id and the tour start period to check if a previous tour ends in the same time period.
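Based only on the code list above, availability for scheduling can be read as a small lookup (a sketch; the actual TimeTable operates on a numpy array of windows):

```python
# window codes from the list above
UNSCHEDULED, START, END, START_OR_END, MIDDLE = 0, 2, 4, 6, 7

def available_as_first_period(code):
    # a new tour may begin in an unscheduled period, in the period where a
    # previous tour ends, or in a period available "for this period only"
    return code in (UNSCHEDULED, END, START_OR_END)

def available_as_last_period(code):
    # a new tour may end in an unscheduled period, in the period where
    # another tour starts, or in a period available "for this period only"
    return code in (UNSCHEDULED, START, START_OR_END)
```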

### API¶

class activitysim.core.timetable.TimeTable(windows_df, tdd_alts_df, table_name=None)
tdd_alts_df      tdd_footprints_df
start  end      '0' '1' '2' '3' '4'...
5      5    ==>  0   6   0   0   0 ...
5      6    ==>  0   2   4   0   0 ...
5      7    ==>  0   2   7   4   0 ...

adjacent_window_after(window_row_ids, periods)

Return number of adjacent periods after specified period that are available (not in the middle of another tour.)

Implements MTC TM1 macro @@adjWindowAfterThisPeriodAlt Function name is kind of a misnomer, but parallels that used in MTC TM1 UECs

Parameters: window_row_ids : pandas Series int series of window_row_ids indexed by tour_id periods : pandas series int series of tdd_alt ids, index irrelevant pandas Series int Number of adjacent windows indexed by window_row_ids.index
adjacent_window_before(window_row_ids, periods)

Return number of adjacent periods before specified period that are available (not in the middle of another tour.)

Implements MTC TM1 macro @@getAdjWindowBeforeThisPeriodAlt Function name is kind of a misnomer, but parallels that used in MTC TM1 UECs

Parameters: window_row_ids : pandas Series int series of window_row_ids indexed by tour_id periods : pandas series int series of tdd_alt ids, index irrelevant pandas Series int Number of adjacent windows indexed by window_row_ids.index
adjacent_window_run_length(window_row_ids, periods, before)

Return the number of adjacent periods before or after specified period that are available (not in the middle of another tour.)

Parameters: window_row_ids : pandas Series int series of window_row_ids indexed by tour_id periods : pandas series int series of tdd_alt ids, index irrelevant before : bool Specify desired run length is of adjacent window before (True) or after (False)
assign(window_row_ids, tdds)

Assign tours (represented by tdd alt ids) to persons

Updates self.windows numpy array. Assignments will not ‘take’ outside this object until/unless replace_table called or updated timetable retrieved by get_windows_df

Parameters: window_row_ids : pandas Series series of window_row_ids indexed by tour_id tdds : pandas series series of tdd_alt ids, index irrelevant
assign_footprints(window_row_ids, footprints)

assign footprints for specified window_row_ids

This method is used for initialization of joint_tour timetables based on the combined availability of the joint tour participants

Parameters: window_row_ids : pandas Series series of window_row_ids index irrelevant, but we want to use map() footprints : numpy array with one row per window_row_id and one column per time period
assign_subtour_mask(window_row_ids, tdds)

Set the subtour mask for the parent tours of subtours. For example, given these tour assignments:

index     window_row_ids   tdds
20973389  20973389           26
44612864  44612864            3
48954854  48954854            7

the tour footprints are:

[[0 0 2 7 7 7 7 7 7 4 0 0 0 0 0 0 0 0 0 0 0]
[0 2 7 7 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 2 7 7 7 7 7 7 4 0 0 0 0 0 0 0 0 0 0 0 0]]

and the corresponding subtour masks, which leave only the periods inside each parent tour's footprint available:

[[7 7 0 0 0 0 0 0 0 0 7 7 7 7 7 7 7 7 7 7 7]
[7 0 0 0 0 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7]
[7 0 0 0 0 0 0 0 0 7 7 7 7 7 7 7 7 7 7 7 7]]

previous_tour_begins(window_row_ids, periods)

Does a previously scheduled tour begin in the specified period?

Implements MTC TM1 @@prevTourBeginsThisArrivalPeriodAlt

Parameters:
- window_row_ids : pandas Series of int; window_row_ids indexed by tour_id
- periods : pandas Series of int; tdd_alt ids, index irrelevant

Returns: pandas Series of bool, indexed by window_row_ids.index
previous_tour_ends(window_row_ids, periods)

Does a previously scheduled tour end in the specified period?

Implements MTC TM1 @@prevTourEndsThisDeparturePeriodAlt

Parameters:
- window_row_ids : pandas Series of int; window_row_ids indexed by tour_id
- periods : pandas Series of int; tdd_alt ids, index irrelevant (one period per window_row_id)

Returns: pandas Series of bool, indexed by window_row_ids.index
remaining_periods_available(window_row_ids, starts, ends)

Determine number of periods remaining available after the time window from starts to ends is hypothetically scheduled

Implements MTC TM1 @@remainingPeriodsAvailableAlt

The start and end periods will always be available after scheduling, so ignore them. The periods between start and end must be currently unscheduled, so assume they will become unavailable after scheduling this window.

Parameters:
- window_row_ids : pandas Series of int; window_row_ids indexed by tour_id
- starts : pandas Series of int; tdd_alt ids, index irrelevant (one per window_row_id)
- ends : pandas Series of int; tdd_alt ids, index irrelevant (one per window_row_id)

Returns: available : pandas Series of int; number of periods available, indexed by window_row_ids.index
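A minimal sketch of the accounting described above (illustrative only; `remaining_after_scheduling` is a hypothetical stand-in, not the library code):

```python
import numpy as np

# Hypothetical sketch: periods strictly between start and end become
# unavailable once the window is scheduled; the start and end periods
# themselves remain available (shareable as tour endpoints).
def remaining_after_scheduling(window_row, start, end):
    currently_available = int((window_row == 0).sum())
    interior_periods = max(0, end - start - 1)
    return currently_available - interior_periods

row = np.array([0, 0, 0, 7, 0, 0, 0])   # 6 periods currently available
print(remaining_after_scheduling(row, start=4, end=6))  # 6 - 1 interior = 5
```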
replace_table()

Save or replace the windows_df DataFrame to the pipeline, under the saved table name specified when the object was instantiated.

This is a convenience function in case caller instantiates object in one context (e.g. dependency injection) where it knows the pipeline table name, but wants to checkpoint the table in another context where it does not know that name.

slice_windows_by_row_id(window_row_ids)

return windows array slice containing rows for specified window_row_ids (in window_row_ids order)

tour_available(window_row_ids, tdds)

test whether time window allows tour with specific tdd alt’s time window

Parameters:
- window_row_ids : pandas Series; window_row_ids indexed by tour_id
- tdds : pandas Series; tdd_alt ids, index irrelevant

Returns: available : pandas Series of bool, with the same index as window_row_ids (presumably tour_id, but we don’t care)
window_periods_in_states(window_row_ids, periods, states)

Return boolean array indicating whether specified window periods are in list of states.

Internal DRY method to implement previous_tour_ends and previous_tour_begins

Parameters:
- window_row_ids : pandas Series of int; window_row_ids indexed by tour_id
- periods : pandas Series of int; tdd_alt ids, index irrelevant (one period per window_row_id)
- states : list of int, presumably (e.g. I_EMPTY, I_START, ...)

Returns: pandas Series of bool, indexed by window_row_ids.index
activitysim.core.timetable.create_timetable_windows(rows, tdd_alts)

create an empty (all available) timetable with one window row per rows.index

Parameters:
- rows : pd.DataFrame, pd.Series, or orca.DataFrameWrapper; all we care about is the index
- tdd_alts : pd.DataFrame; expected to have a start and an end column

The timetable is created to accommodate all alts, with one window of padding at each end, so if start is 5 and end is 23 we return something like this:

       4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
PERID
30     0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
109    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Returns: pd.DataFrame indexed by rows.index, with one int8 column per time window (plus padding)
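The construction can be sketched in plain pandas (illustrative; `make_windows` is a hypothetical stand-in for create_timetable_windows):

```python
import numpy as np
import pandas as pd

# Hypothetical sketch: an all-available (all-zero) timetable with one
# period of padding at each end of the start..end range.
def make_windows(index, start, end):
    periods = list(range(start - 1, end + 2))   # e.g. 5..23 -> columns 4..24
    data = np.zeros((len(index), len(periods)), dtype=np.int8)
    return pd.DataFrame(data, index=index, columns=periods)

persons = pd.DataFrame(index=pd.Index([30, 109], name='PERID'))
windows = make_windows(persons.index, start=5, end=23)
print(windows.shape)  # (2, 21)
```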

## Helpers¶

### Chunk¶

Chunking management

#### API¶

activitysim.core.chunk.chunked_choosers_and_alts(choosers, alternatives, rows_per_chunk)

generator to iterate over choosers and alternatives in chunk_size chunks

like chunked_choosers, but also chunks alternatives for use with sampled alternatives which will have different alternatives (and numbers of alts)

There may be up to sample_size (or as few as one) alternatives for each chooser because alternatives may have been sampled more than once, but pick_count for those alternatives will always sum to sample_size.

When we chunk the choosers, we need to take care when chunking the alternatives, as there are varying numbers of them for each chooser. Since alternatives appear in the same order as choosers, we can use cumulative pick_counts to identify the boundaries of each chooser's set of alternatives.

Parameters:
- choosers : pandas DataFrame
- alternatives : pandas DataFrame; sampled alternatives, including a pick_count column, in the same order as choosers
- rows_per_chunk : int

Yields:
- i : int; one-based index of the current chunk
- num_chunks : int; total number of chunks that will be yielded
- choosers : pandas DataFrame slice; chunk of choosers
- alternatives : pandas DataFrame slice; chunk of alternatives for the chooser chunk
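The idea can be sketched as follows (an illustrative re-implementation, not the library source; it assumes `alternatives` is indexed by chooser id, in chooser order):

```python
import numpy as np
import pandas as pd

# Hypothetical sketch of chunking choosers together with their sampled
# alternatives; alternatives are assumed indexed by chooser id.
def chunked_choosers_and_alts(choosers, alternatives, rows_per_chunk):
    num_chunks = int(np.ceil(len(choosers) / rows_per_chunk))
    for i in range(num_chunks):
        chooser_chunk = choosers.iloc[i * rows_per_chunk:(i + 1) * rows_per_chunk]
        alt_chunk = alternatives[alternatives.index.isin(chooser_chunk.index)]
        yield i + 1, num_chunks, chooser_chunk, alt_chunk

choosers = pd.DataFrame({'x': [1, 2, 3]}, index=[10, 20, 30])
alts = pd.DataFrame({'alt': [5, 6, 6, 7], 'pick_count': [2, 1, 1, 2]},
                    index=[10, 10, 20, 30])
for i, n, c, a in chunked_choosers_and_alts(choosers, alts, rows_per_chunk=2):
    print(i, n, len(c), len(a))  # 1 2 2 3, then 2 2 1 1
```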

### Utilities¶

Vectorized helper functions

#### API¶

activitysim.core.util.assign_in_place(df, df2)

update existing row values in df from df2, adding columns to df if they are not there

Parameters:
- df : pd.DataFrame; assignment left-hand side (dest)
- df2 : pd.DataFrame; assignment right-hand side (source)
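The behavior can be sketched like this (illustrative, not the library source; the real function may handle dtypes more carefully):

```python
import pandas as pd

# Hypothetical sketch: update overlapping cells from df2 and append any
# columns df lacks.
def assign_in_place(df, df2):
    for col in df2.columns:
        if col in df.columns:
            df.loc[df2.index, col] = df2[col]
        else:
            df[col] = df2[col]   # aligns on index; other rows become NaN

df = pd.DataFrame({'a': [1, 2, 3]}, index=[0, 1, 2])
df2 = pd.DataFrame({'a': [9], 'b': [8]}, index=[1])
assign_in_place(df, df2)
print(df.loc[1, 'a'], df.loc[1, 'b'])  # 9 8.0
```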
activitysim.core.util.left_merge_on_index_and_col(left_df, right_df, join_col, target_col)

like pandas left merge, but join on both index and a specified join_col

FIXME - for now, return a series of values from the specified right_df target_col

Parameters:
- left_df : pandas DataFrame; index name assumed to be the same as that of right_df
- right_df : pandas DataFrame; index name assumed to be the same as that of left_df
- join_col : str; name of the column to join on (in addition to the index values); should have the same name in both dataframes
- target_col : str; name of the column from right_df whose joined values should be returned as a series

Returns: target_series : pandas Series; target_col values with the same index as left_df (i.e. values joined to left_df from right_df, but with the index of left_df)
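One way to express this join in plain pandas (a sketch under the stated assumptions, not the library code):

```python
import pandas as pd

# Hypothetical sketch: merge on both the (shared-name) index and join_col,
# then restore left_df's index so the result aligns with left_df.
def left_merge_on_index_and_col(left_df, right_df, join_col, target_col):
    idx = left_df.index.name               # assumed same in both frames
    merged = pd.merge(left_df.reset_index(),
                      right_df[[join_col, target_col]].reset_index(),
                      on=[idx, join_col], how='left')
    return merged.set_index(idx)[target_col]

left = pd.DataFrame({'k': [1, 2]}, index=pd.Index([100, 100], name='hh'))
right = pd.DataFrame({'k': [1, 2], 'v': [10, 20]},
                     index=pd.Index([100, 100], name='hh'))
print(left_merge_on_index_and_col(left, right, 'k', 'v').tolist())  # [10, 20]
```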
activitysim.core.util.other_than(groups, bools)

Construct a Series that has booleans indicating the presence of something- or someone-else with a certain property within a group.

Parameters:
- groups : pandas.Series; a column with the same index as bools that defines the grouping of bools; the bools Series will be used to index groups and the grouped values will be counted
- bools : pandas.Series; a boolean Series indicating where the property of interest is present; should have the same index as groups

Returns: others : pandas.Series; a boolean Series with the same index as groups and bools, indicating whether there is something- or someone-else within the group with the property (as indicated by bools)
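The described semantics can be sketched with a groupby (an illustrative re-implementation, not the library source):

```python
import pandas as pd

# Hypothetical sketch: someone ELSE in the group has the property iff the
# group's count of True rows, excluding this row, is positive.
def other_than(groups, bools):
    group_counts = bools.groupby(groups).transform('sum')
    return (group_counts - bools.astype(int)) > 0

hh_id = pd.Series([1, 1, 1, 2, 2], index=list('abcde'))
is_worker = pd.Series([True, True, False, False, True], index=list('abcde'))
print(other_than(hh_id, is_worker).tolist())
# [True, True, True, True, False]: 'e' is the only worker in household 2
```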
activitysim.core.util.quick_loc_df(loc_list, target_df, attribute=None)

faster replacement for target_df.loc[loc_list] or target_df.loc[loc_list][attribute]

pandas DataFrame.loc[] indexing doesn’t scale for large arrays (e.g. > 1,000,000 elements)

Parameters:
- loc_list : list-like (numpy.ndarray, pandas.Int64Index, or pandas.Series)
- target_df : pandas.DataFrame; contains a column named attribute
- attribute : name of the column from target_df to return (or None for all columns)

Returns: pandas.DataFrame or, if attribute is specified, pandas.Series
activitysim.core.util.quick_loc_series(loc_list, target_series)

faster replacement for target_series.loc[loc_list]

pandas Series.loc[] indexing doesn’t scale for large arrays (e.g. > 1,000,000 elements)

Parameters:
- loc_list : list-like (numpy.ndarray, pandas.Int64Index, or pandas.Series)
- target_series : pandas.Series

Returns: pandas.Series
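The speedup comes from swapping label-based `.loc` lookups for a single merge; a sketch of that idea (illustrative, not necessarily the library's exact approach):

```python
import pandas as pd

# Hypothetical sketch: build a frame keyed on the lookup values and
# left-merge the target series against its own index.
def quick_loc_series(loc_list, target_series):
    left = pd.DataFrame({'left': list(loc_list)})
    right = pd.DataFrame({'right': target_series})
    merged = left.merge(right, left_on='left', right_index=True, how='left')
    return merged['right']

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(quick_loc_series(['c', 'a', 'c'], s).tolist())  # [30, 10, 30]
```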
activitysim.core.util.reindex(series1, series2)

This reindexes the first series by the second series. This is an extremely common operation that does not appear to be in Pandas at this time. If anyone knows of an easier way to do this in Pandas, please inform the UrbanSim developers.

The canonical example would be a parcel series which has an index which is parcel_ids and a value which you want to fetch, let’s say it’s land_area. Another dataset, let’s say of buildings has a series which indicate the parcel_ids that the buildings are located on, but which does not have land_area. If you pass parcels.land_area as the first series and buildings.parcel_id as the second series, this function returns a series which is indexed by buildings and has land_area as values and can be added to the buildings dataset.

In short, this is a join on to a different table using a foreign key stored in the current table, but with only one attribute rather than for a full dataset.

This is very similar to the pandas “loc” function or “reindex” function, but neither of those functions return the series indexed on the current table. In both of those cases, the series would be indexed on the foreign table and would require a second step to change the index.

Parameters: series1, series2 : pandas.Series

Returns: reindexed : pandas.Series
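The parcels/buildings example from the description, with a one-line equivalent (a sketch; pandas `Series.reindex` does the lookup, and the values are re-wrapped with the caller's index):

```python
import pandas as pd

# land_area indexed by parcel_id; parcel_id foreign key indexed by building_id
land_area = pd.Series([100.0, 250.0], index=[1, 2])
parcel_id = pd.Series([2, 1, 2], index=[10, 11, 12])

# equivalent of reindex(land_area, parcel_id): look up by foreign key,
# then index the result by the buildings (current) table
result = pd.Series(land_area.reindex(parcel_id.values).values,
                   index=parcel_id.index)
print(result.tolist())  # [250.0, 100.0, 250.0], indexed by building_id
```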

### Config¶

Helper functions for configuring a model run

#### API¶

activitysim.core.config.get_logit_model_settings(model_settings)

Read nest spec (for nested logit) from model settings file

Returns:
- nests : dict; specifies the nesting structure and nesting coefficients
- constants : dict; constants to add to locals for use by expressions in the model spec
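For illustration, a nest spec in a model settings file might look like the fragment below; the key names follow typical ActivitySim example configs, but treat the specific alternative names and coefficient values as assumptions:

```yaml
NESTS:
  name: root
  coefficient: 1.0
  alternatives:
    - name: AUTO
      coefficient: 0.72
      alternatives:
        - DRIVEALONEFREE
        - SHARED2FREE
    - name: NONMOTORIZED
      coefficient: 0.72
      alternatives:
        - WALK
        - BIKE

CONSTANTS:
  valueOfTime: 8.00
```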
activitysim.core.config.get_model_constants(model_settings)

Read constants from model settings file

Returns: constants : dict; constants to add to locals for use by expressions in the model spec
activitysim.core.config.handle_standard_args(parser=None)

Adds and handles standard command-line arguments:

- --config : specify path to config_dir
- --output : specify path to output_dir
- --data : specify path to data_dir

Parameters:
- parser : argparse.ArgumentParser or None; for custom argument handling, pass in a parser with your arguments already added, and handle them based on the returned args. This method will handle the args it adds.

Returns: args : the parser.parse_args() result

### Inject¶

Wrap orca class to make it easier to track and manage interaction with the data pipeline.

#### API¶

activitysim.core.inject.reinject_decorated_tables()

reinject the decorated tables (and columns)

### Inject_Defaults¶

Default file and folder settings are injected into the orca model runner if needed.

#### API¶

activitysim.core.inject_defaults.pipeline_path(output_dir, settings)

Orca injectable to return the path to the pipeline hdf5 file based on output_dir and settings

### Output¶

Write output files.

#### API¶

activitysim.core.steps.output.write_data_dictionary(output_dir)

Write table_name, number of rows, columns, and bytes for each checkpointed table

Parameters: output_dir: str
activitysim.core.steps.output.write_tables(output_dir)

Write pipeline tables as csv files (in output directory) as specified by output_tables list in settings file.

‘output_tables’ can specify either a list of output tables to include or a list to skip. If no output_tables list is specified, no checkpointed tables will be written.

To write all output tables EXCEPT the households and persons tables:

output_tables:
  action: skip
  tables:
    - households
    - persons


To write ONLY the households table:

output_tables:
  action: include
  tables:
    - households

Parameters: output_dir: str

### Tests¶

See activitysim.core.test