Models#
The currently implemented example ActivitySim AB models are described below. See the example model Sub-Model Specification Files, Example ARC Sub-Model Specification Files, and Example SEMCOG Sub-Model Specification Files for more information.
Initialize#
The initialize model isn’t really a model, but rather a few data processing steps in the data pipeline. The initialize data processing steps code variables used in downstream models, such as household and person value-of-time. This step also pre-loads the land_use, households, persons, and person_windows tables because random seeds are set differently for each step and therefore the sampling of households depends on which step they are initially loaded in.
The main interface to the initialize land use step is the initialize_landuse()
function. The main interface to the initialize household step is the initialize_households()
function. The main interface to the initialize tours step is the initialize_tours()
function. These functions are registered as Inject steps in the example Pipeline.
- activitysim.abm.models.initialize.preload_injectables()#
preload bulky injectables up front - stuff that isn’t inserted into the pipeline
Initialize LOS#
The initialize LOS model isn’t really a model, but rather a series of data processing steps in the data pipeline. The initialize LOS model does two things:
Loads skims and, if desired, caches them for later use
Loads network LOS inputs for transit virtual path building (see Transit Virtual Path Builder) and pre-computes tap-to-tap total utilities, caching them for later use if desired
The main interface to the initialize LOS step is the initialize_los()
function. The main interface to the initialize TVPB step is the initialize_tvpb()
function. These functions are registered as Inject steps in the example Pipeline.
- activitysim.abm.models.initialize_los.initialize_los(network_los)#
Currently, this step is only needed for THREE_ZONE systems in which the tap_tap_utilities are precomputed in the (presumably subsequent) initialize_tvpb step.
Adds the attribute_combinations_df table to the pipeline so that it can be used as the slicer for multiprocessing the initialize_tvpb step
FIXME - this step is only strictly necessary when multiprocessing, but initialize_tvpb would need to be tweaked to instantiate attribute_combinations_df if the pipeline table version were not available.
- activitysim.abm.models.initialize_los.initialize_tvpb(network_los, attribute_combinations, chunk_size)#
Initialize STATIC tap_tap_utility cache and write mmap to disk.
uses pipeline attribute_combinations table created in initialize_los to determine which attribute tuples to compute utilities for.
if we are single-processing, this will be the entire set of attribute tuples required to fully populate cache
if we are multiprocessing, then the attribute_combinations will have been sliced and we compute only a subset of the tuples (the other processes compute the rest). All processes wait until the cache is fully populated before returning, and the locutor process writes the results.
FIXME - if we did not close this, we could avoid having to reload it from mmap when single-process?
Accessibility#
The accessibilities model is an aggregate model that calculates multiple origin-based accessibility measures by origin zone to all destination zones.
The accessibility measure first multiplies an employment variable by a mode-specific decay function. The product reflects the difficulty of accessing the activities the farther (in terms of round-trip travel time) the jobs are from the location in question. The products to each destination zone are next summed over each origin zone, and the logarithm of the product mutes large differences. The decay function on the walk accessibility measure is steeper than automobile or transit. The minimum accessibility is zero.
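A minimal sketch of one such measure, using a hypothetical negative-exponential decay and illustrative names (the actual decay functions and coefficients come from the accessibility spec):

```python
import numpy as np

def accessibility(emp, time_od, dispersion=0.05):
    """One origin-based accessibility measure (illustrative sketch).

    emp     : employment per destination zone, shape (D,)
    time_od : round-trip travel time, origin x destination, shape (O, D)
    """
    decay = np.exp(-dispersion * time_od)  # harder-to-reach jobs count less
    raw = decay @ emp                      # sum the products over destinations
    return np.log1p(raw)                   # log mutes large differences
```

Using log1p keeps the minimum at zero when no employment is reachable, matching the behavior described above.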
Level-of-service variables from three time periods are used, specifically the AM peak period (6 am to 10 am), the midday period (10 am to 3 pm), and the PM peak period (3 pm to 7 pm).
Inputs
Highway skims for the three periods. Each skim is expected to include a table named “TOLLTIMEDA”, which is the drive alone in-vehicle travel time for automobiles willing to pay a “value” (time-savings) toll.
Transit skims for the three periods. Each skim is expected to include the following tables: (i) “IVT”, in-vehicle time; (ii) “IWAIT”, initial wait time; (iii) “XWAIT”, transfer wait time; (iv) “WACC”, walk access time; (v) “WAUX”, auxiliary walk time; and, (vi) “WEGR”, walk egress time.
Zonal data with the following fields: (i) “TOTEMP”, total employment; (ii) “RETEMPN”, retail trade employment per the NAICS classification.
Outputs
taz, travel analysis zone number
autoPeakRetail, the accessibility by automobile during peak conditions to retail employment for this TAZ
autoPeakTotal, the accessibility by automobile during peak conditions to all employment
autoOffPeakRetail, the accessibility by automobile during off-peak conditions to retail employment
autoOffPeakTotal, the accessibility by automobile during off-peak conditions to all employment
transitPeakRetail, the accessibility by transit during peak conditions to retail employment
transitPeakTotal, the accessibility by transit during peak conditions to all employment
transitOffPeakRetail, the accessibility by transit during off-peak conditions to retail employment
transitOffPeakTotal, the accessibility by transit during off-peak conditions to all employment
nonMotorizedRetail, the accessibility by walking during all time periods to retail employment
nonMotorizedTotal, the accessibility by walking during all time periods to all employment
The main interface to the accessibility model is the
compute_accessibility()
function. This function is registered as an Inject step in the example Pipeline.
Core Table: skims
| Result Table: accessibility
| Skims Keys: O-D, D-O
- activitysim.abm.models.accessibility.compute_accessibility(land_use, accessibility, network_los, chunk_size, trace_od)#
Compute accessibility for each zone in land use file using expressions from accessibility_spec
The actual results depend on the expressions in accessibility_spec, but this is initially intended to permit implementation of the mtc accessibility calculation as implemented by Accessibility.job
Compute measures of accessibility used by the automobile ownership model. The accessibility measure first multiplies an employment variable by a mode-specific decay function. The product reflects the difficulty of accessing the activities the farther (in terms of round-trip travel time) the jobs are from the location in question. The products to each destination zone are next summed over each origin zone, and the logarithm of the product mutes large differences. The decay function on the walk accessibility measure is steeper than automobile or transit. The minimum accessibility is zero.
Disaggregate Accessibility#
The disaggregate accessibility model is an extension of the base accessibility model. While the base accessibility model is based on a mode-specific decay function and uses fixed market segments in the population (i.e., income), the disaggregate accessibility model extracts the actual destination choice logsums by purpose (i.e., mandatory fixed school/work location and non-mandatory tour destinations by purpose) from the actual model calculations using a user-defined proto-population. This enables users to include features that may be more critical to destination choice than just income (e.g., automobile ownership).
- Inputs:
disaggregate_accessibility.yaml - Configuration settings for disaggregate accessibility model.
annotate.csv [optional] - Users can specify additional annotations specific to disaggregate accessibility. For example, annotating the proto-population tables.
- Outputs:
final_disaggregate_accessibility.csv [optional]
final_non_mandatory_tour_destination_accesibility.csv [optional]
final_workplace_location_accessibility.csv [optional]
final_school_location_accessibility.csv [optional]
final_proto_persons.csv [optional]
final_proto_households.csv [optional]
final_proto_tours.csv [optional]
The above tables are created in the model pipeline, but the model will not save any outputs unless specified in settings.yaml - output_tables. Users can return the proto-population tables for inspection, as well as the raw logsum accessibilities for mandatory school/work and non-mandatory destinations. The logsums are then merged at the household level in final_disaggregate_accessibility.csv, with each tour purpose’s logsums shown as separate columns.
Usage
The disaggregate accessibility model is run as a model step in the model list. There are two necessary steps:
- initialize_proto_population
| - compute_disaggregate_accessibility
The steps must be separate to enable multiprocessing: the proto-population must be fully generated and initialized before ActivitySim slices the tables across sub-processes. These steps must also occur before initialize_households in order to avoid conflict with the shadow_pricing model.
The model steps can be run either as part of the ActivitySim model run, or set up as a standalone run to pre-compute the accessibility values. For standalone implementations, the final_disaggregate_accessibility.csv is read into the pipeline and initialized with the initialize_household model step.
- Configuration of disaggregate_accessibility.yaml:
CREATE_TABLES - Users define the variables to be generated for PROTO_HOUSEHOLDS, PROTO_PERSONS, and PROTO_TOURS tables. These tables must include all basic fields necessary for running the actual model. Additional fields can be annotated in pre-processing using the annotation settings of this file. The base variables in each table are defined using the following parameters:
VARIABLES - The base variables; each must be a value or a list. Results in the Cartesian product (all non-repeating combinations) of the fields.
mapped_fields [optional] - For non-combinatorial fields, users can map a variable to the fields generated in VARIABLES (e.g., income category bins mapped to median dollar values).
filter_rows [optional] - Users can also filter rows using pandas expressions if specific variable combinations are not desired.
JOIN_ON [required only for PROTO_TOURS] - specify the persons variable to join the tours to (e.g., person_number).
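The table-generation settings above can be illustrated with a small sketch; the field names and values here are hypothetical, not the actual configuration schema:

```python
import itertools
import pandas as pd

# Hypothetical VARIABLES block: each field is a value or a list, and the
# proto-population is the Cartesian product of all the listed values.
variables = {
    "hhsize": [1, 2, 3],
    "income_segment": [1, 2, 3, 4],
    "auto_ownership": [0, 1, 2],
}
proto_households = pd.DataFrame(
    itertools.product(*variables.values()), columns=list(variables.keys())
)
# filter_rows-style pandas expression: drop undesired combinations,
# e.g. households with more cars than members
proto_households = proto_households.query("auto_ownership <= hhsize")
```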
MERGE_ON - User-specified fields to merge the proto-population logsums onto the full synthetic population. The proto-population should be designed such that the logsums can be joined exactly on these specified variables to the full population. Users specify the fields to join on using:
by: An exact merge will be attempted using these discrete variables.
asof [optional]: The model can perform an “asof” join for continuous variables, which finds the nearest value. This method should not be necessary since synthetic populations are all discrete.
method [optional]: Optional join method; can be “soft”, default is None. For cases where a full inner join is not possible, a Naive Bayes clustering method offers a fast but discretely constrained alternative. The proto-population is treated as the “training data” to match each synthetic population value to the best possible proto-population candidate. Some refinement may be necessary to make this procedure work.
annotate_proto_tables [optional] - Annotation configurations if users wish to modify the proto-population beyond basic generation in the YAML.
DESTINATION_SAMPLE_SIZE - The destination sample size (0 = all zones), e.g., the number of destination zone alternatives sampled for calculating the destination logsum. Decimal values < 1 will be interpreted as a percentage, e.g., 0.5 = 50% sample.
ORIGIN_SAMPLE_SIZE - The origin sample size (0 = all zones), e.g., the number of origins where logsum is calculated. Origins without a logsum will draw from the nearest zone with a logsum. This parameter is useful for systems with a large number of zones with similar accessibility. Decimal values < 1 will be interpreted as a percentage, e.g., 0.5 = 50% sample.
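The sample-size convention above (0 = all zones, a decimal < 1 = a share of zones) can be sketched with a hypothetical helper:

```python
def resolve_sample_size(setting, num_zones):
    """Interpret a *_SAMPLE_SIZE setting (hypothetical helper, not the
    ActivitySim API): 0 means all zones, a decimal below 1 is a share
    of zones, and anything else is an absolute count."""
    if setting == 0:
        return num_zones
    if 0 < setting < 1:
        return max(1, round(setting * num_zones))
    return int(setting)
```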
ORIGIN_SAMPLE_METHOD - The method by which origins are sampled. Population-weighted sampling can be TAZ-based or “TAZ-agnostic” using KMeans clustering. The potential advantage of KMeans is a more geographically even spread of sampled MAZs that does not rely on TAZ hierarchies. Unweighted sampling is also possible using ‘uniform’ and ‘uniform-taz’.
None [Default] - Sample zones weighted by population, ensuring at least one MAZ is sampled per TAZ. If n-samples > n-tazs, then sample 1 MAZ from each TAZ until n-remaining-samples < n-tazs, then sample n-remaining-samples TAZs and sample an MAZ within each of those TAZs. If n-samples < n-tazs, sampling proceeds directly to that final step.
“kmeans” - K-Means clustering is performed on the zone centroids (which must be provided as maz_centroids.csv), weighted by population. The clustering yields k XY coordinates weighted by zone population for the n-samples = k-clusters specified. Once the k new cluster centroids are found, each is snapped to the nearest available zone centroid, and accessibilities are calculated for those zones. By default, the k-means method is run on 10 different initial cluster seeds (n_init) using the “k-means++” seeding algorithm (https://en.wikipedia.org/wiki/K-means%2B%2B), and runs for max_iter iterations (default=300).
“uniform” - Unweighted sample of N zones independent of each other.
“uniform-taz” - Unweighted sample of 1 zone per taz up to the N samples specified.
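The population-weighted default can be sketched as follows; this simplified version omits the TAZ-coverage guarantee described above, and the function and argument names are illustrative:

```python
import numpy as np

def sample_origins(maz_pop, n_samples, seed=0):
    """Sample origin MAZs with probability proportional to population,
    without replacement. The real default method additionally ensures
    TAZ coverage, which this sketch omits."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(maz_pop, dtype=float)
    return rng.choice(len(weights), size=n_samples, replace=False,
                      p=weights / weights.sum())
```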
Work From Home#
Telecommuting is defined as working from home instead of traveling to a workplace; it only applies to workers with a regular workplace outside the home. The telecommute model consists of two submodels - this work from home model and a person Telecommute Frequency model. This model predicts for all workers whether they usually work from home.
The work from home model includes the ability to adjust a work from home alternative constant in order to realize a target work from home percent for what-if type analysis. This iterative single-process procedure takes as input a number of iterations, a filter on the choosers to use for the calculation, a target work from home percent, a tolerance percent for convergence, and the name of the coefficient to adjust. An example setup is provided, and the coefficient adjustment at each iteration is:
new_coefficient = log( target_percent / current_percent ) + current_coefficient
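The iterative procedure can be sketched as below; `simulate` stands in for re-running the work from home model with the adjusted coefficient, and all names are illustrative:

```python
import math

def calibrate(simulate, coef, target_pct, tolerance_pct, max_iters=10):
    """Adjust a work-from-home constant until the simulated share is
    within tolerance of the target (illustrative sketch)."""
    for _ in range(max_iters):
        current_pct = simulate(coef)
        if abs(current_pct - target_pct) / target_pct * 100 <= tolerance_pct:
            break
        # new_coefficient = log(target_percent / current_percent) + current_coefficient
        coef = math.log(target_pct / current_pct) + coef
    return coef
```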
The main interface to the work from home model is the
work_from_home()
function. This
function is registered as an Inject step in the example Pipeline.
Core Table: persons
| Result Field: work_from_home
| Skims Keys: NA
- activitysim.abm.models.work_from_home.work_from_home(persons_merged, persons, chunk_size, trace_hh_id)#
This model predicts whether a person (worker) works from home. The output from this model is TRUE (if works from home) or FALSE (works away from home). The workplace location choice is overridden for workers who work from home and set to -1.
School Location#
The usual school location choice models assign a usual school location for the primary mandatory activity of each child and university student in the synthetic population. The models are composed of a set of accessibility-based parameters (including one-way distance between home and primary destination and the tour mode choice logsum - the expected maximum utility in the mode choice model which is given by the logarithm of the sum of exponentials in the denominator of the logit formula) and size terms, which describe the quantity of grade-school or university opportunities in each possible destination.
- The school location model is made up of four steps:
sampling - selects a sample of alternative school locations for the next model step. This selects X locations from the full set of model zones using a simple utility.
logsums - starts with the table created above and calculates and adds the mode choice logsum expression for each alternative school location.
simulate - starts with the table created above and chooses a final school location, this time with the mode choice logsum included.
shadow prices - compare modeled zonal destinations to target zonal size terms and calculate updated shadow prices.
These steps are repeated until shadow pricing convergence criteria are satisfied or a max number of iterations is reached. See Shadow Pricing.
School location choice for placeholder_multiple_zone models uses Presampling by default.
The main interface to the model is the school_location()
function.
This function is registered as an Inject step in the example Pipeline. See Writing Logsums for how to write logsums for estimation.
Core Table: persons
| Result Field: school_taz
| Skims Keys: TAZ, alt_dest, AM time period, MD time period
Work Location#
The usual work location choice models assign a usual work location for the primary mandatory activity of each employed person in the synthetic population. The models are composed of a set of accessibility-based parameters (including one-way distance between home and primary destination and the tour mode choice logsum - the expected maximum utility in the mode choice model which is given by the logarithm of the sum of exponentials in the denominator of the logit formula) and size terms, which describe the quantity of work opportunities in each possible destination.
- The work location model is made up of four steps:
sample - selects a sample of alternative work locations for the next model step. This selects X locations from the full set of model zones using a simple utility.
logsums - starts with the table created above and calculates and adds the mode choice logsum expression for each alternative work location.
simulate - starts with the table created above and chooses a final work location, this time with the mode choice logsum included.
shadow prices - compare modeled zonal destinations to target zonal size terms and calculate updated shadow prices.
These steps are repeated until shadow pricing convergence criteria are satisfied or a max number of iterations is reached. See Shadow Pricing.
Work location choice for placeholder_multiple_zone models uses Presampling by default.
The main interface to the model is the workplace_location()
function.
This function is registered as an Inject step in the example Pipeline. See Writing Logsums for how to write logsums for estimation.
Core Table: persons
| Result Field: workplace_taz
| Skims Keys: TAZ, alt_dest, AM time period, PM time period
- activitysim.abm.models.location_choice.iterate_location_choice(model_settings, persons_merged, persons, households, network_los, estimator, chunk_size, trace_hh_id, locutor, trace_label)#
iterate run_location_choice updating shadow pricing until convergence criteria satisfied or max_iterations reached.
(If use_shadow_pricing not enabled, then just iterate once)
- Parameters
- model_settingsdict
- persons_mergedinjected table
- personsinjected table
- network_loslos.Network_LOS
- chunk_sizeint
- trace_hh_idint
- locutorbool
whether this process is the privileged logger of shadow_pricing when multiprocessing
- trace_labelstr
- Returns
- adds choice column model_settings[‘DEST_CHOICE_COLUMN_NAME’]
- adds logsum column model_settings[‘DEST_CHOICE_LOGSUM_COLUMN_NAME’]- if provided
- adds annotations to persons table
- activitysim.abm.models.location_choice.run_location_choice(persons_merged_df, network_los, shadow_price_calculator, want_logsums, want_sample_table, estimator, model_settings, chunk_size, chunk_tag, trace_hh_id, trace_label, skip_choice=False)#
Run the three-part location choice algorithm to generate a location choice for each chooser
Handle the various segments separately and in turn for simplicity of expression files
- Parameters
- persons_merged_dfpandas.DataFrame
persons table merged with households and land_use
- network_loslos.Network_LOS
- shadow_price_calculatorShadowPriceCalculator
to get size terms
- want_logsumsboolean
- want_sample_tableboolean
- estimator: Estimator object
- model_settingsdict
- chunk_sizeint
- trace_hh_idint
- trace_labelstr
- Returns
- choicespandas.DataFrame indexed by persons_merged_df.index
‘choice’ : location choices (zone ids) ‘logsum’ : float logsum of choice utilities across alternatives
- logsums optional & only returned if DEST_CHOICE_LOGSUM_COLUMN_NAME specified in model_settings
- activitysim.abm.models.location_choice.run_location_logsums(segment_name, persons_merged_df, network_los, location_sample_df, model_settings, chunk_size, chunk_tag, trace_label)#
add logsum column to existing location_sample table
logsum is calculated by running the mode_choice model for each sample (person, dest_zone_id) pair in location_sample, and computing the logsum of all the utilities
| PERID | dest_zone_id | rand | pick_count | logsum (added) |
|-------|--------------|----------------|------------|----------------|
| 23750 | 14 | 0.565502716034 | 4 | 1.85659498857 |
| 23750 | 16 | 0.711135838871 | 6 | 1.92315598631 |
| … | … | … | … | … |
| 23751 | 12 | 0.408038878552 | 1 | 2.40612135416 |
| 23751 | 14 | 0.972732479292 | 2 | 1.44009018355 |
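The logsum in the last column is the log of the sum of exponentiated mode utilities; a numerically stable sketch:

```python
import numpy as np

def mode_choice_logsum(utilities):
    """Expected maximum utility across modes for one (person, destination)
    pair: log of the sum of exponentials of the mode utilities."""
    u = np.asarray(utilities, dtype=float)
    m = u.max()  # subtract the max to avoid overflow in exp
    return m + np.log(np.exp(u - m).sum())
```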
- activitysim.abm.models.location_choice.run_location_sample(segment_name, persons_merged, network_los, dest_size_terms, estimator, model_settings, chunk_size, chunk_tag, trace_label)#
select a sample of alternative locations.
Logsum calculations are expensive, so we build a table of persons * all zones and then select a sample subset of potential locations
The sample subset is generated by making multiple choices (<sample_size> number of choices), which results in a sample containing up to <sample_size> alternatives for each chooser (e.g. person) and a pick_count indicating how many times each alternative was selected for that chooser.
| person_id | dest_zone_id | rand | pick_count |
|-----------|--------------|----------------|------------|
| 23750 | 14 | 0.565502716034 | 4 |
| 23750 | 16 | 0.711135838871 | 6 |
| … | … | … | … |
| 23751 | 12 | 0.408038878552 | 1 |
| 23751 | 14 | 0.972732479292 | 2 |
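The pick_count bookkeeping can be sketched with pandas; the data here are hypothetical:

```python
import pandas as pd

# <sample_size> repeated choices per chooser collapse into unique
# alternatives, with pick_count recording the repetitions
choices = pd.DataFrame(
    {"person_id": [23750, 23750, 23750, 23750],
     "dest_zone_id": [14, 14, 16, 14]}
)
sample = (
    choices.groupby(["person_id", "dest_zone_id"])
    .size()
    .rename("pick_count")
    .reset_index()
)
```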
- activitysim.abm.models.location_choice.run_location_simulate(segment_name, persons_merged, location_sample_df, network_los, dest_size_terms, want_logsums, estimator, model_settings, chunk_size, chunk_tag, trace_label, skip_choice=False)#
run location model on location_sample annotated with mode_choice logsum to select a dest zone from sample alternatives
- Returns
- choicespandas.DataFrame indexed by persons_merged_df.index
choice : location choices (zone ids) logsum : float logsum of choice utilities across alternatives
- logsums optional & only returned if DEST_CHOICE_LOGSUM_COLUMN_NAME specified in model_settings
- activitysim.abm.models.location_choice.school_location(persons_merged, persons, households, network_los, chunk_size, trace_hh_id, locutor)#
School location choice model
iterate_location_choice adds location choice column and annotations to persons table
- activitysim.abm.models.location_choice.workplace_location(persons_merged, persons, households, network_los, chunk_size, trace_hh_id, locutor)#
workplace location choice model
iterate_location_choice adds location choice column and annotations to persons table
- activitysim.abm.models.location_choice.write_estimation_specs(estimator, model_settings, settings_file)#
write sample_spec, spec, and coefficients to estimation data bundle
- Parameters
- model_settings
- settings_file
Shadow Pricing#
The shadow pricing calculator used by work and school location choice.
Turning on and saving shadow prices
Shadow pricing is activated by setting use_shadow_pricing to True in the settings.yaml file.
Once this setting has been activated, ActivitySim will search for shadow pricing configuration in
the shadow_pricing.yaml file. When shadow pricing is activated, the shadow pricing outputs will be exported by the tracing engine. As a result, the shadow pricing output files are prefixed with trace followed by the iteration number the results represent. For example, the shadow pricing outputs for iteration 3 of the school location model will be called trace.shadow_price_school_shadow_prices_3.csv.
In total, ActivitySim generates three types of output files for each model with shadow pricing:
trace.shadow_price_<model>_desired_size.csv
The size terms by zone that the ctramp and daysim methods are attempting to target. These equal the size term columns in the land use data multiplied by the size term coefficients.
trace.shadow_price_<model>_modeled_size_<iteration>.csv
The modeled size terms after the iteration of shadow pricing identified by the <iteration> number; in other words, the predicted choices by zone and segment for the model after the iteration completes. (Not applicable for the simulation option.)
trace.shadow_price_<model>_shadow_prices_<iteration>.csv
The actual shadow price for each zone and segment after the <iteration> of shadow pricing. This is the file that can be used to warm start the shadow pricing mechanism in ActivitySim. (Not applicable for the simulation option.)
There are three shadow pricing methods in activitysim: ctramp, daysim, and simulation.
The first two methods try to match model output with workplace/school location model size terms,
while the last method matches model output with actual employment/enrollment data.
The simulation approach proceeds as follows. First, every worker/student is assigned a location without shadow prices applied. The modeled share and the target share for each zone are then compared. If a zone is over-assigned, a sample of people from that zone is selected for re-simulation. Shadow prices for over-assigned zones are set to -999 for the next iteration, which removes those zones from the set of alternatives. The sampled people are then forced to choose from one of the under-assigned zones that still have the initial shadow price of 0. (In this approach, the shadow price variable is really just a switch turning each zone on or off for selection in subsequent iterations; for this reason, warm-start functionality is not applicable to this method.) This process repeats until the overall convergence criterion is met or the maximum number of allowed iterations is reached.
Because the simulation approach only re-simulates workers/students who were over-assigned in the previous iteration, run time is significantly lower (by roughly 90%) than with the CTRAMP or DaySim approaches, which re-simulate all workers and students at each iteration.
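One iteration of the simulation method can be sketched as follows (a toy sketch with illustrative names; the real implementation samples specific people and re-runs the location choice model):

```python
import numpy as np

rng = np.random.default_rng(0)

def resimulate(counts, targets, shadow_price):
    """counts, targets: modeled and target choosers per zone;
    shadow_price: 0 = zone available, -999 = zone closed."""
    over = counts > targets
    shadow_price[over] = -999                # close over-assigned zones
    surplus = int((counts - targets)[over].sum())
    counts[over] = targets[over]             # keep choosers up to the target
    open_zones = np.flatnonzero(shadow_price == 0)
    moved = rng.choice(open_zones, size=surplus)  # re-simulate the surplus
    np.add.at(counts, moved, 1)
    return counts, shadow_price
```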
shadow_pricing.yaml Attributes
shadow_pricing_models
List of model_selectors and model_names of models that use shadow pricing. This list identifies which size_terms to preload (which must be done in single-process mode) so predicted_size tables can be scaled to the population.
LOAD_SAVED_SHADOW_PRICES
Global switch to enable/disable loading of saved shadow prices. From the above example, this would be trace.shadow_price_<model>_shadow_prices_<iteration>.csv renamed and stored in the data_dir.
MAX_ITERATIONS
If no shadow prices are loaded, the maximum number of times shadow pricing can be run on each model before proceeding to the next model.
MAX_ITERATIONS_SAVED
If shadow prices are loaded, the maximum number of times shadow pricing can be run.
SIZE_THRESHOLD
Ignore zones in the failure calculation (ctramp or daysim method) with a size term value smaller than size_threshold.
TARGET_THRESHOLD
Ignore zones in the failure calculation (simulation method) with employment/enrollment smaller than target_threshold.
PERCENT_TOLERANCE
Maximum percent difference between modeled and desired size terms.
FAIL_THRESHOLD
Percentage of zones exceeding the PERCENT_TOLERANCE considered a failure.
SHADOW_PRICE_METHOD
[ctramp | daysim | simulation]
workplace_segmentation_targets
Dict matching worker segment to a land use employment column target. Only used with the simulation option. If multiple segments list the same target column, the segments are added together for comparison. (The same applies to the school option below.)
school_segmentation_targets
Dict matching school segment to a land use enrollment column target. Only used with the simulation option.
DAMPING_FACTOR
On each iteration, ActivitySim attempts to adjust the model to match desired size terms. The adjustment is multiplied by this factor to dampen or amplify the calculation. (ctramp only)
DAYSIM_ABSOLUTE_TOLERANCE
Absolute tolerance for the daysim option.
DAYSIM_PERCENT_TOLERANCE
Relative tolerance for the daysim option.
WRITE_ITERATION_CHOICES
[True | False] Writes the choices of each person to the trace folder. Used for debugging or checking iteration convergence. WARNING: every person is written for each sub-process, so disk space usage can get large.
- activitysim.abm.tables.shadow_pricing.block_name(model_selector)#
return canonical block name for model_selector
Ordinarily and ideally this would just be model_selector, but since mp_tasks saves all shared data blocks in a common dict to pass to sub-tasks, we want to be able to override the block naming convention to handle any collisions between model_selector names and skim names. Until and unless that happens, we just use the model_selector name.
- Parameters
- model_selector
- Returns
- block_namestr
canonical block name
- activitysim.abm.tables.shadow_pricing.buffers_for_shadow_pricing(shadow_pricing_info)#
Allocate shared_data buffers for multiprocess shadow pricing
Allocates one buffer per model_selector. Buffer datatype and shape specified by shadow_pricing_info
Buffers are multiprocessing.Array (a RawArray protected by a multiprocessing.Lock wrapper). We don’t actually use the wrapped version, as it slows access down and doesn’t provide protection for numpy-wrapped arrays, but it does provide a convenient way to bundle a RawArray and an associated lock. (ShadowPriceCalculator uses the lock to coordinate access to the numpy-wrapped RawArray.)
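The RawArray-plus-lock bundling described above looks roughly like this:

```python
import multiprocessing
import numpy as np

# multiprocessing.Array bundles a RawArray with a Lock; we take the lock
# out explicitly and index through an unsynchronized numpy view
shared = multiprocessing.Array("d", 12)                 # e.g. 4 zones x 3 columns
lock = shared.get_lock()
data = np.frombuffer(shared.get_obj(), dtype=np.float64).reshape(4, 3)

with lock:  # coordinate access across sub-processes
    data[:, 0] += 1.0
```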
- Parameters
- shadow_pricing_infodict
- Returns
- data_buffersdict {<model_selector><shared_data_buffer>}
- dict of multiprocessing.Array keyed by model_selector
- activitysim.abm.tables.shadow_pricing.buffers_for_shadow_pricing_choice(shadow_pricing_choice_info)#
Same as buffers_for_shadow_pricing above, except that we also need to store the actual choices for the simulation-based shadow pricing method.
This allocates a multiprocessing.Array that can store the choice for each person and then wraps a dataframe around it, so the dataframe can be shared and accessed across all sub-processes.
- Parameters
- shadow_pricing_choice_infodict
- Returns
- data_buffersdict {<model_selector> : <shared_data_buffer>}
dict of multiprocessing.Array keyed by model_selector and wrapped in a pandas dataframe
- activitysim.abm.tables.shadow_pricing.get_shadow_pricing_choice_info()#
return dict with info about dtype and shapes of desired and modeled size tables
block shape is (num_zones, num_segments + 1)
- Returns
- shadow_pricing_info: dict
dtype: <sp_dtype>, block_shapes: dict {<model_selector>: <block_shape>}
- activitysim.abm.tables.shadow_pricing.get_shadow_pricing_info()#
return dict with info about dtype and shapes of desired and modeled size tables
block shape is (num_zones, num_segments + 1)
- Returns
- shadow_pricing_info: dict
dtype: <sp_dtype>, block_shapes: dict {<model_selector>: <block_shape>}
- activitysim.abm.tables.shadow_pricing.load_shadow_price_calculator(model_settings)#
Initialize ShadowPriceCalculator for model_selector (e.g. school or workplace)
If multiprocessing, get the shared_data buffer to coordinate global_desired_size calculation across sub-processes
- Parameters
- model_settingsdict
- Returns
- spcShadowPriceCalculator
- activitysim.abm.tables.shadow_pricing.logger = <Logger activitysim.abm.tables.shadow_pricing (WARNING)>#
ShadowPriceCalculator and associated utility methods
See docstrings for documentation on:
update_shadow_prices how shadow_price coefficients are calculated synchronize_modeled_size interprocess communication to compute aggregate modeled_size check_fit convergence criteria for shadow_pric iteration
Import concepts and variables:
- model_selector: str
Identifies a specific location choice model (e.g. ‘school’, ‘workplace’) The various models work similarly, but use different expression files, model settings, etc.
- segment: str
Identifies a specific demographic segment of a model (e.g. ‘elementary’ segment of ‘school’). Models can have different size term coefficients (in the destination_choice_size_terms file) and different utility coefficients in the model’s location and location_sample csv expression files.
size_table: pandas.DataFrame
- activitysim.abm.tables.shadow_pricing.shadow_price_data_from_buffers(data_buffers, shadow_pricing_info, model_selector)#
- Parameters
- data_buffers: dict of {<model_selector>: <multiprocessing.Array>}
multiprocessing.Array is simply a convenient way to bundle an Array and a Lock. We extract the lock and wrap the RawArray in a numpy array for convenience in indexing. The shared data buffer has shape (<num_zones>, <num_segments> + 1); the extra column is for reverse semaphores with TALLY_CHECKIN and TALLY_CHECKOUT.
- shadow_pricing_info: dict
dict of useful info: dtype: sp_dtype, block_shapes : OrderedDict({<model_selector>: <shape tuple>}) mapping model_selector to block shape (including the extra column for semaphores), e.g. {‘school’: (num_zones, num_segments + 1)}
- model_selector: str
location type model_selector (e.g. school or workplace)
- Returns
- shared_data, shared_data_lock
shared_data : numpy array wrapping multiprocessing.RawArray, or None (if single process)
shared_data_lock : multiprocessing.Lock, or None (if single process)
- activitysim.abm.tables.shadow_pricing.shadow_price_data_from_buffers_choice(data_buffers, shadow_pricing_info, model_selector)#
- Parameters
- data_buffers: dict of {<model_selector>: <multiprocessing.Array>}
multiprocessing.Array is simply a convenient way to bundle an Array and a Lock. We extract the lock and wrap the RawArray in a numpy array for convenience in indexing. The shared data buffer has shape (<num_zones>, <num_segments> + 1); the extra column is for reverse semaphores with TALLY_CHECKIN and TALLY_CHECKOUT.
- shadow_pricing_info: dict
dict of useful info: dtype: sp_dtype, block_shapes : OrderedDict({<model_selector>: <shape tuple>}) mapping model_selector to block shape (including the extra column for semaphores), e.g. {‘school’: (num_zones, num_segments + 1)}
- model_selector: str
location type model_selector (e.g. school or workplace)
- Returns
- shared_data, shared_data_lock
shared_data : numpy array wrapping multiprocessing.RawArray, or None (if single process)
shared_data_lock : multiprocessing.Lock, or None (if single process)
- activitysim.abm.tables.shadow_pricing.size_table_name(model_selector)#
Returns canonical name of injected destination desired_size table
- Parameters
- model_selector: str
e.g. school or workplace
- Returns
- table_name: str
Transit Pass Subsidy#
The transit fare discount applies to persons who purchase or are provided a transit pass. It consists of two submodels: this transit pass subsidy model and a person Transit Pass Ownership model. The result of this model can be used to condition downstream models, such as the person Transit Pass Ownership model and the tour and trip mode choice models, via fare discount adjustments.
The main interface to the transit pass subsidy model is the
transit_pass_subsidy()
function. This
function is registered as an Inject step in the example Pipeline.
Core Table: persons
| Result Field: transit_pass_subsidy
| Skims Keys: NA
- activitysim.abm.models.transit_pass_subsidy.transit_pass_subsidy(persons_merged, persons, chunk_size, trace_hh_id)#
Transit pass subsidy model.
Transit Pass Ownership#
The transit fare discount applies to persons who purchase or are provided a transit pass. It consists of two submodels: this transit pass ownership model and a person Transit Pass Subsidy model. The result of this model can be used to condition downstream models, such as the tour and trip mode choice models, via fare discount adjustments.
The main interface to the transit pass ownership model is the
transit_pass_ownership()
function. This
function is registered as an Inject step in the example Pipeline.
Core Table: persons
| Result Field: transit_pass_ownership
| Skims Keys: NA
- activitysim.abm.models.transit_pass_ownership.transit_pass_ownership(persons_merged, persons, chunk_size, trace_hh_id)#
Transit pass ownership model.
Auto Ownership#
The auto ownership model selects a number of autos for each household in the simulation. The primary model components are household demographics, zonal density, and accessibility.
The main interface to the auto ownership model is the
auto_ownership_simulate()
function. This function is registered as an Inject step in the example Pipeline.
Core Table: households
| Result Field: auto_ownership
| Skims Keys: NA
- activitysim.abm.models.auto_ownership.auto_ownership_simulate(households, households_merged, chunk_size, trace_hh_id)#
Auto ownership is a standard model which predicts how many cars a household with given characteristics owns
Vehicle Type Choice#
The vehicle type choice model selects a vehicle type for each household vehicle. A vehicle type is a combination of the vehicle’s body type, age, and fuel type. For example, a 13 year old gas powered van would have a vehicle type of van_13_gas.
There are two vehicle type choice model structures implemented:
Simultaneous choice of body type, age, and fuel type.
Simultaneous choice of body type and age, with fuel type assigned from a probability distribution.
The vehicle_type_choice.yaml file contains the following model specific options:
- SPEC: Filename for input utility expressions.
- COEFS: Filename for input utility expression coefficients.
- LOGIT_TYPE: Specifies whether you are using a nested or multinomial logit structure.
- combinatorial_alts: Specifies the alternatives for the choice model. Has sub-categories of body_type, age, and fuel_type.
- PROBS_SPEC: Filename for input fuel type probabilities. Supplying probabilities corresponds to implementation structure 2 above; not supplying probabilities corresponds to implementation structure 1. If provided, the fuel_type category in combinatorial_alts will be excluded from the model alternatives such that only body type and age are selected. The input PROBS_SPEC table will have an index column named vehicle_type which is a combination of body type and age in the form {body type}_{age}. Subsequent column names specify the fuel type that will be added, and the column values are the probabilities of that fuel type. The vehicle type model will select a fuel type for each vehicle based on the provided probabilities.
- VEHICLE_TYPE_DATA_FILE: Filename for input vehicle type data. Must have columns body_type, fuel_type, and vehicle_year. Vehicle age is computed using the FLEET_YEAR option. Data for every alternative specified in the combinatorial_alts option must be included in the file. The vehicle type data file will be joined to the alternatives and can be used in the utility expressions if PROBS_SPEC is not provided. If PROBS_SPEC is provided, the vehicle type data will be joined after a vehicle type is decided so the data can be used in downstream models.
- COLS_TO_INCLUDE_IN_VEHICLE_TABLE: List of columns from the vehicle type data file to include in the vehicle table for use in downstream models. Examples of data that might be needed are vehicle range for the Vehicle Allocation model, auto operating costs for tour and trip mode choice, and emissions data for post-model-run analysis.
- FLEET_YEAR: Integer specifying the fleet year to be used in the model run. This is used to compute age in the vehicle type data table, where age = (1 + FLEET_YEAR - vehicle_year). Computing age on the fly with the FLEET_YEAR variable gives the user the flexibility to compile and share a single vehicle type data file containing all years and simply change the FLEET_YEAR to run different scenario years.
Optional additional settings that work the same as in other models are constants, expression preprocessor, and annotate tables.
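The FLEET_YEAR age computation and the {body_type}_{age}_{fuel_type} naming described above can be illustrated with a short pandas sketch; the data values below are made up for the example.

```python
import pandas as pd

FLEET_YEAR = 2017  # illustrative scenario year

vehicle_type_data = pd.DataFrame({
    "body_type": ["van", "car"],
    "fuel_type": ["gas", "electric"],
    "vehicle_year": [2005, 2015],
})

# age = (1 + FLEET_YEAR - vehicle_year): a 2005 vehicle is 13 years old in 2017
vehicle_type_data["age"] = 1 + FLEET_YEAR - vehicle_type_data["vehicle_year"]

# vehicle type labels combine body type, age, and fuel type,
# e.g. the van_13_gas example from the text
vehicle_type_data["vehicle_type"] = (
    vehicle_type_data["body_type"]
    + "_" + vehicle_type_data["age"].astype(str)
    + "_" + vehicle_type_data["fuel_type"]
)
```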
Input vehicle type data included in prototype_mtc_extended came from a variety of sources. The number of vehicle makes and models, MPG, and electric vehicle range were sourced from the Environmental Protection Agency (EPA). Additional data on vehicle costs were derived from the National Household Travel Survey. Auto operating costs in the vehicle type data file are the sum of fuel costs and maintenance costs. Fuel costs were calculated from MPG assuming a $3.00 cost per gallon of gas. When MPG was not available to calculate fuel costs, the closest year, vehicle type, or body type available was used. Maintenance costs were taken from AAA’s 2017 driving cost study. Size categories within body types were averaged, e.g. car is an average of AAA’s small, medium, and large sedan categories. Motorcycles were assigned the small sedan maintenance costs since they were not included in AAA’s report. Maintenance costs were not varied by vehicle year. (According to data from the U.S. Bureau of Labor Statistics, there was no consistent relationship between vehicle age and maintenance costs.)
Using the above methodology, the average auto operating costs of vehicles output from prototype_mtc_extended was 18.4 cents. This value is very close to the auto operating cost of 18.3 cents used in prototype_mtc. Non-household vehicles in prototype_mtc_extended use the auto operating cost of 18.3 cents used in prototype_mtc. Users are encouraged to make their own assumptions and calculate auto operating costs as they see fit.
The distribution of fuel type probabilities included in prototype_mtc_extended are computed directly from the National Household Travel Survey data and include the entire US. Therefore, there is “lumpiness” in probabilities due to poor statistics in the data for some vehicle types. The user is encouraged to adjust the probabilities to their modeling region and “smooth” them for more consistent results.
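A minimal sketch of how fuel types might be drawn from a PROBS_SPEC-style table. The table contents and the sampling helper are illustrative assumptions, not the actual ActivitySim implementation.

```python
import numpy as np
import pandas as pd

# Illustrative PROBS_SPEC-style table: index is {body type}_{age}, columns are
# fuel types, and each row's probabilities sum to 1.
probs = pd.DataFrame(
    {"gas": [0.85, 0.60], "hybrid": [0.10, 0.30], "electric": [0.05, 0.10]},
    index=pd.Index(["car_3", "car_10"], name="vehicle_type"),
)

rng = np.random.default_rng(seed=42)

def sample_fuel_type(body_age, rng):
    row = probs.loc[body_age]
    # inverse-CDF draw: first fuel type whose cumulative probability
    # exceeds a uniform random number
    return row.index[np.searchsorted(row.cumsum().to_numpy(), rng.random())]

choice = sample_fuel_type("car_3", rng)  # one of "gas", "hybrid", "electric"
```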
Further discussion of output results and model sensitivities can be found here.
- activitysim.abm.models.vehicle_type_choice.annotate_vehicle_type_choice_households(model_settings, trace_label)#
Add columns to the households table in the pipeline according to spec.
- Parameters
- model_settings: dict
- trace_label: str
- activitysim.abm.models.vehicle_type_choice.annotate_vehicle_type_choice_persons(model_settings, trace_label)#
Add columns to the persons table in the pipeline according to spec.
- Parameters
- model_settings: dict
- trace_label: str
- activitysim.abm.models.vehicle_type_choice.annotate_vehicle_type_choice_vehicles(model_settings, trace_label)#
Add columns to the vehicles table in the pipeline according to spec.
- Parameters
- model_settings: dict
- trace_label: str
- activitysim.abm.models.vehicle_type_choice.append_probabilistic_vehtype_type_choices(choices, model_settings, trace_label)#
Select a fuel type for the provided body type and age of the vehicle.
Make probabilistic choices based on the PROBS_SPEC file.
- Parameters
- choices: pandas.DataFrame
selection of {body_type}_{age} to append vehicle type to
- model_settings: dict
- trace_label: str
- Returns
- choices: pandas.DataFrame
table of chosen vehicle types
- activitysim.abm.models.vehicle_type_choice.construct_model_alternatives(model_settings, alts_cats_dict, vehicle_type_data)#
Construct the table of vehicle type alternatives.
Vehicle type data is joined to the alternatives table for use in utility expressions.
- Parameters
- model_settings: dict
- alts_cats_dict: dict
nested dictionary of vehicle body, age, and fuel options
- vehicle_type_data: pandas.DataFrame
- Returns
- alts_wide: pd.DataFrame
includes column indicators and data for each alternative
- alts_long: pd.DataFrame
rows just list the alternatives
- activitysim.abm.models.vehicle_type_choice.get_combinatorial_vehicle_alternatives(alts_cats_dict)#
Build a pandas dataframe containing columns for each vehicle alternative.
Rows will correspond to the alternative number and will be 0 except for the 1 in the column corresponding to that alternative.
- Parameters
- alts_cats_dict: dict
- model_settings: dict
- Returns
- alts_wide: pd.DataFrame
wide format, expanded using the pandas get_dummies function
- alts_long: pd.DataFrame
long format
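The combinatorial expansion this function describes can be sketched with itertools.product for the long format and pandas get_dummies for the wide indicators. Variable names follow the docstring; the option values are made up for the example.

```python
from itertools import product

import pandas as pd

# illustrative combinatorial_alts categories
alts_cats_dict = {
    "body_type": ["car", "van"],
    "age": ["1", "2"],
    "fuel_type": ["gas", "electric"],
}

# long format: one row per alternative, one column per category
alts_long = pd.DataFrame(
    list(product(*alts_cats_dict.values())), columns=list(alts_cats_dict)
)

# wide format: 0/1 indicator column for every category value
alts_wide = pd.get_dummies(alts_long)
```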
- activitysim.abm.models.vehicle_type_choice.get_vehicle_type_data(model_settings, vehicle_type_data_file)#
Read in the vehicle type data and compute the vehicle age.
- Parameters
- model_settings: dict
- vehicle_type_data_file: str
name of vehicle type data file found in config folder
- Returns
- vehicle_type_data: pandas.DataFrame
table of vehicle type data with required body_type, age, and fuel_type columns
- activitysim.abm.models.vehicle_type_choice.iterate_vehicle_type_choice(vehicles_merged, model_settings, model_spec, locals_dict, estimator, chunk_size, trace_label)#
Select vehicle type for each household vehicle sequentially.
Iterate through household vehicle numbers and select a vehicle type of the form {body_type}_{age}_{fuel_type}. The preprocessor is run for each iteration on the entire chooser table, not just the one for the current vehicle number. This allows for computation of terms involving the presence of other household vehicles.
Vehicle type data is read in according to the specification and joined to the alternatives. It can optionally be included in the output vehicles table by specifying the COLS_TO_INCLUDE_IN_VEHICLE_TABLE option in the model yaml.
- Parameters
- vehicles_merged: orca.DataFrameWrapper
vehicle list owned by each household merged with households table
- model_settings: dict
yaml model settings file as dict
- model_spec: pandas.DataFrame
omnibus spec file with expressions in index and one column per segment
- locals_dict: dict
additional variables available when writing expressions
- estimator: Estimator object
- chunk_size: orca.injectable
- trace_label: str
- Returns
- all_choices: pandas.DataFrame
single table of selected vehicle types and associated data
- all_choosers: pandas.DataFrame
single table of chooser data with preprocessor variables included
- activitysim.abm.models.vehicle_type_choice.vehicle_type_choice(persons, households, vehicles, vehicles_merged, chunk_size, trace_hh_id)#
Assign a vehicle type to each vehicle in the vehicles table.
If “SIMULATION_TYPE” is set to simple_simulate in the vehicle_type_choice.yaml config file, then the model specification .csv file should contain one column of coefficients for each distinct alternative. This format corresponds to ActivitySim’s
activitysim.core.simulate.simple_simulate()
format. Otherwise, this model will construct a table of alternatives at run time, based on all possible combinations of values of the categorical variables enumerated as “combinatorial_alts” in the .yaml config. In this case, the model leverages ActivitySim’s activitysim.core.interaction_simulate()
model design, in which the model specification .csv has only one column of coefficients, and the utility expressions can turn coefficients on or off based on attributes of either the chooser or the alternative. As an optional second step, the user may also specify a “PROBS_SPEC” .csv file in the main .yaml config, corresponding to a lookup table of additional vehicle attributes and probabilities to be sampled and assigned to vehicles after the logit choices have been made. The rows of the “PROBS_SPEC” file must include all body type and vehicle age choices assigned in the logit model. These additional attributes are concatenated with the selected alternative from the logit model to form a single vehicle type name, stored in the vehicles table as the vehicle_type column.
Only one household vehicle is selected at a time to allow for the introduction of owned vehicle related attributes. For example, a household may be less likely to own a second van if they already own one. The model is run sequentially through household vehicle numbers. The preprocessor is run for each iteration on the entire vehicles table to allow for computation of terms involving the presence of other household vehicles.
The user may also augment the households or persons tables with new vehicle type-based fields specified via expressions in “annotate_households_vehicle_type.csv” and “annotate_persons_vehicle_type.csv”, respectively.
- Parameters
- persons: orca.DataFrameWrapper
- households: orca.DataFrameWrapper
- vehicles: orca.DataFrameWrapper
- vehicles_merged: orca.DataFrameWrapper
- chunk_size: orca.injectable
- trace_hh_id: orca.injectable
Telecommute Frequency#
Telecommuting is defined as workers who work from home instead of going to work. It only applies to workers with a regular workplace outside of home. The telecommute model consists of two submodels - a person Work From Home model and this person telecommute frequency model.
For all workers that work out of the home, the telecommute frequency model predicts the level of telecommuting. The model alternatives are the frequency of telecommuting in days per week (0 days, 1 day, 2 to 3 days, 4+ days).
The main interface to the telecommute frequency model is the
telecommute_frequency()
function. This
function is registered as an Inject step in the example Pipeline.
Core Table: persons
| Result Field: telecommute_frequency
| Skims Keys: NA
- activitysim.abm.models.telecommute_frequency.telecommute_frequency(persons_merged, persons, chunk_size, trace_hh_id)#
This model predicts the frequency of telecommuting for a person (worker) who does not work from home. The alternatives of this model are ‘No Telecommute’, ‘1 day per week’, ‘2 to 3 days per week’, and ‘4 days per week’. This model reflects the choices of people who prefer a combination of working from home and at the office during a week.
Free Parking Eligibility#
The Free Parking Eligibility model predicts the availability of free parking at a person’s workplace. It is applied for people who work in zones that have parking charges, which are generally located in the Central Business Districts. The purpose of the model is to adequately reflect the cost of driving to work in subsequent models, particularly in mode choice.
The main interface to the free parking eligibility model is the
free_parking()
function. This function is registered
as an Inject step in the example Pipeline.
Core Table: persons
| Result Field: free_parking_at_work
| Skims Keys: NA
Coordinated Daily Activity Pattern#
The Coordinated Daily Activity Pattern (CDAP) model predicts the choice of daily activity pattern (DAP) for each member in the household, simultaneously. The DAP is categorized into three types as follows:
Mandatory: the person engages in travel to at least one out-of-home mandatory activity - work, university, or school. The mandatory pattern may also include non-mandatory activities such as separate home-based tours or intermediate stops on mandatory tours.
Non-mandatory: the person engages in only maintenance and discretionary tours, which, by definition, do not contain mandatory activities.
Home: the person does not travel outside the home.
The CDAP model is a sequence of vectorized table operations:
create a person level table and rank each person in the household for inclusion in the CDAP model. Priority is given to full time workers (up to two), then to part time workers (up to two workers, of any type), then to children (youngest to oldest, up to three). Additional members up to five are randomly included for the CDAP calculation.
solve individual M/N/H utilities for each person
take as input an interaction coefficients table and then programmatically produce and write out the expression files for households size 1, 2, 3, 4, and 5 models independent of one another
select households of size 1, join all required person attributes, and then read and solve the automatically generated expressions
repeat for households size 2, 3, 4, and 5. Each model is independent of one another.
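The person-ranking step above can be roughly sketched as follows. This is a simplified, deterministic illustration (the real model fills the remaining slots randomly), and the column names and person types are assumptions for the example.

```python
import pandas as pd

# hypothetical household: three full time workers, one part time worker,
# and two children
persons = pd.DataFrame({
    "person_id": [1, 2, 3, 4, 5, 6],
    "ptype": ["full", "full", "full", "part", "child", "child"],
    "age": [45, 43, 22, 40, 6, 9],
})

def cdap_candidates(hh_persons, max_hhsize=5):
    # priority: up to two full time workers, then up to two part time
    # workers, then up to three children (youngest first)
    ranked = pd.concat([
        hh_persons[hh_persons.ptype == "full"].head(2),
        hh_persons[hh_persons.ptype == "part"].head(2),
        hh_persons[hh_persons.ptype == "child"].sort_values("age").head(3),
    ])
    # fill any remaining slots (up to max_hhsize) from the leftover persons;
    # the real model draws these randomly
    leftover = hh_persons[~hh_persons.person_id.isin(ranked.person_id)]
    return pd.concat([ranked, leftover]).head(max_hhsize)

chosen = cdap_candidates(persons)
```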
The main interface to the CDAP model is the run_cdap()
function. This function is called by the Inject step cdap_simulate
which is
registered as an Inject step in the example Pipeline. There are two cdap class definitions in ActivitySim: the first, cdap(), contains the Inject wrapper for running it as part of the model pipeline; the second, cdap(), contains the CDAP model logic.
Core Table: persons
| Result Field: cdap_activity
| Skims Keys: NA
- activitysim.abm.models.cdap.cdap_simulate(persons_merged, persons, households, chunk_size, trace_hh_id)#
CDAP stands for Coordinated Daily Activity Pattern, which is a choice of high-level activity pattern for each person, in a coordinated way with other members of a person’s household.
Because Python requires vectorization of computation, there are some specialized routines in the cdap directory of activitysim for this purpose. This module simply applies those utilities using the simulation framework.
Mandatory Tour Frequency#
The individual mandatory tour frequency model predicts the number of work and school tours taken by each person with a mandatory DAP. The primary drivers of mandatory tour frequency are demographics, accessibility-based parameters such as drive time to work, and household automobile ownership. It also creates mandatory tours in the data pipeline.
The main interface to the mandatory tour purpose frequency model is the
mandatory_tour_frequency()
function. This function is registered as an Inject step in the example Pipeline.
Core Table: persons
| Result Fields: mandatory_tour_frequency
| Skims Keys: NA
- activitysim.abm.models.mandatory_tour_frequency.mandatory_tour_frequency(persons_merged, chunk_size, trace_hh_id)#
This model predicts the frequency of making mandatory trips (see the alternatives above) - these trips include work and school in some combination.
Mandatory Tour Scheduling#
The mandatory tour scheduling model selects a tour departure and duration period (and therefore a start and end period as well) for each mandatory tour. The primary drivers in the model are accessibility-based parameters such as the mode choice logsum for the departure/arrival hour combination, demographics, and time pattern characteristics such as the time windows available from previously scheduled tours. This model uses person Person Time Windows.
Note
For prototype_mtc
, the modeled time periods for all submodels are hourly from 3 am to 3 am the next day, and any times before 5 am are shifted to time period 5, and any times after 11 pm are shifted to time period 23.
If tour_departure_and_duration_segments.csv
is included in the configs, then the model
will use these representative start and end time periods when calculating mode choice logsums
instead of the specific start and end combinations for each alternative to reduce runtime. This
feature, known as representative logsums
, takes advantage of the fact that the mode choice logsum,
say, from 6 am to 2 pm is very similar to the logsum from 6 am to 3 pm, and 6 am to 4 pm, and so using
just 6 am to 3 pm (with the idea that 3 pm is the “representative time period”) for these alternatives is
sufficient for tour scheduling. By reusing the 6 am to 3 pm mode choice logsum, ActivitySim saves
significant runtime.
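The representative-logsum idea can be sketched with a small lookup table. The column names below are illustrative, not the actual tour_departure_and_duration_segments.csv schema.

```python
import pandas as pd

# hypothetical segments table: several (start, end) alternatives map to one
# representative (start, end) pair, following the 6 am / 3 pm example above
segments = pd.DataFrame({
    "start": [6, 6, 6],
    "end": [14, 15, 16],
    "rep_start": [6, 6, 6],
    "rep_end": [15, 15, 15],   # 3 pm stands in for the 2-4 pm arrivals
})

alts = pd.DataFrame({"start": [6, 6, 6], "end": [14, 15, 16]})
alts = alts.merge(segments, on=["start", "end"], how="left")

# mode choice logsums now only need to be computed for the unique
# representative pairs instead of every (start, end) combination
unique_pairs = alts[["rep_start", "rep_end"]].drop_duplicates()
```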
The main interface to the mandatory tour purpose scheduling model is the
mandatory_tour_scheduling()
function. This function is registered as an Inject step in the example Pipeline.
Core Table: tours
| Result Field: start, end, duration
| Skims Keys: TAZ, workplace_taz, school_taz, start, end
- activitysim.abm.models.mandatory_scheduling.mandatory_tour_scheduling(tours, persons_merged, tdd_alts, chunk_size, trace_hh_id)#
This model predicts the departure time and duration of each activity for mandatory tours
School Escorting#
The school escort model determines whether children are dropped-off at or picked-up from school, simultaneously with the chaperone responsible for chauffeuring the children, which children are bundled together on half-tours, and the type of tour (pure escort versus rideshare). The model is run after work and school locations have been chosen for all household members, and after work and school tours have been generated and scheduled. The model labels household members of driving age as potential ‘chauffeurs’ and children with school tours as potential ‘escortees’. The model then attempts to match potential chauffeurs with potential escortees in a choice model whose alternatives consist of ‘bundles’ of escortees with a chauffeur for each half tour.
School escorting is a household level decision – each household will choose an alternative from the school_escorting_alts.csv
file,
with the first alternative being no escorting. This file contains the following columns:
| Column Name | Column Description |
|---|---|
| Alt | Alternative number |
| bundle[1,2,3] | Bundle number for children 1, 2, and 3 |
| chauf[1,2,3] | Chauffeur number for children 1, 2, and 3: 0 = child not escorted, 1 = chauffeur 1 as ride share, 2 = chauffeur 1 as pure escort, 3 = chauffeur 2 as ride share, 4 = chauffeur 2 as pure escort |
| nbund[1,2] | Number of escorting bundles for chauffeurs 1 and 2 |
| nbundles | Total number of escorting bundles in the alternative |
| nrs1 | Number of ride share bundles for chauffeur 1 |
| npe1 | Number of pure escort bundles for chauffeur 1 |
| nrs2 | Number of ride share bundles for chauffeur 2 |
| npe2 | Number of pure escort bundles for chauffeur 2 |
| Description | Text description of the alternative |
The model as currently implemented contains three escortees and two chauffeurs. Escortees are students under age 16 with a mandatory tour whereas chaperones are all persons in the household over the age of 18. For households that have more than three possible escortees, the three youngest children are selected for the model. The two chaperones are selected as the adults of the household with the highest weight according to the following calculation: \(Weight = 100*personType + 10*gender + 1*age(0,1)\) Where personType is the person type number from 1 to 5, gender is 1 for male and 2 for female, and age is a binary indicator equal to 1 if age is over 25 else 0.
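The chaperone weight calculation above can be written directly in pandas; the person data below is made up for the example.

```python
import pandas as pd

# hypothetical adult household members eligible to be chaperones
adults = pd.DataFrame({
    "person_id": [1, 2, 3],
    "ptype": [1, 2, 4],     # person type number, 1 to 5
    "gender": [1, 2, 1],    # 1 = male, 2 = female
    "age": [45, 40, 22],
})

# Weight = 100*personType + 10*gender + 1*(age > 25), per the text
adults["weight"] = 100 * adults.ptype + 10 * adults.gender + (adults.age > 25)

# the two highest-weighted adults become the chaperones
chauffeurs = adults.nlargest(2, "weight")
```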
The model is run sequentially three times, once in the outbound direction, once in the inbound direction, and again in the outbound direction with additional conditions on what happened in the inbound direction. There are therefore three sets of utility specifications, coefficients, and pre-processor files. Each of these files is specified in the school_escorting.yaml file along with the number of escortees and number of chaperones.
There is also a constants section in the school_escorting.yaml file which contains two constants: one sets the maximum time bin difference to match school and work tours for ride sharing, and the other sets the number of minutes per time bin. In the prototype_mtc_extended example, these are set to 1 and 60, respectively.
After a school escorting alternative is chosen for the inbound and outbound direction, the model will create the tours and trips associated with the decision. Pure escort tours are created, and the mandatory tour start and end times are changed to match the school escort bundle start and end times. (Outbound tours have their start times matched and inbound tours have their end times matched.) Escortee drop-off / pick-up order is determined by the distance from home to the school locations. They are ordered from smallest to largest in the outbound direction, and largest to smallest in the inbound direction. Trips are created for each half-tour that includes school escorting according to the provided order.
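The distance-based pick-up/drop-off ordering described above can be sketched as follows; the distances are illustrative.

```python
import pandas as pd

# hypothetical escortees with home-to-school distances (miles)
escortees = pd.DataFrame({
    "child_id": [1, 2, 3],
    "home_to_school_dist": [2.5, 0.8, 4.1],
})

# outbound (drop-offs): closest school first
outbound_order = escortees.sort_values("home_to_school_dist").child_id.tolist()

# inbound (pick-ups): farthest school first
inbound_order = escortees.sort_values(
    "home_to_school_dist", ascending=False
).child_id.tolist()
```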
The created pure escort tours are joined to the already created mandatory tour table in the pipeline and are also saved separately to the pipeline under the table name “school_escort_tours”. Created school escorting trips are saved to the pipeline under the table name “school_escort_trips”. By saving these to the pipeline, their data can be queried in downstream models to set correct purposes, destinations, and schedules to satisfy the school escorting model choice.
There are a host of downstream model changes that are involved when including the school escorting model. The following list contains the models that are changed in some way when school escorting is included:
Joint tour scheduling: Joint tours are not allowed to be scheduled over school escort tours. This happens automatically by updating the timetable object with the updated mandatory tour times and created pure escort tour times after the school escorting model is run. There were no code or config changes in this model, but it is still affected by school escorting.
Non-Mandatory tour frequency: Pure school escort tours are joined to the tours created in the non-mandatory tour frequency model and tour statistics (such as tour_count and tour_num) are re-calculated.
Non-Mandatory tour destination: Since the primary destination of pure school escort tours is known, they are removed from the choosers table and have their destination set according to the destination in the school_escort_tours table. They are also excluded from the estimation data bundle.
Non-Mandatory tour scheduling: Pure escort tours need to have the non-escorting portion of their tour scheduled. This is done by inserting availability conditions in the model specification that ensures the alternative chosen for the start of the tour is equal to the alternative start time for outbound tours and the end time is equal to the alternative end time for the inbound tours. There are additional terms that ensure the tour does not overlap with subsequent school escorting tours as well. Beware – If the availability conditions in the school escorting model are not set correctly, the tours created may not be consistent with each other and this model will fail.
Tour mode choice: Availability conditions are set in tour mode choice to prohibit the drive alone mode if the tour contains an escortee and the shared-ride 2 mode if the tour contains more than one escortee.
Stop Frequency: No stops are allowed on half-tours that include school escorting. This is enforced by adding availability conditions in the stop frequency model. After the stop frequency model is run, the school escorting trips are merged from the trips created by the stop frequency model and a new stop frequency is computed along with updated trip numbers.
Trip purpose, destination, and scheduling: Trip purpose, destination, and departure times are known for school escorting trips. As such they are removed from their respective chooser tables and the estimation data bundles, and set according to the values in the school_escort_trips table residing in the pipeline.
Trip mode choice: As in tour mode choice, availability conditions are set to prohibit the drive alone mode for trips containing an escortee and the shared-ride 2 mode for trips with more than one escortee.
Many of the changes discussed in the above list are handled in the code, and the user is not required to make any changes when implementing the school escorting model. However, it is the user’s responsibility to make the corresponding changes in the configuration files of the models downstream of the school escorting model, such as the stop frequency settings that disallow stops on half-tours that include school escorting.
When not including the school escorting model, all of the escort trips to and from school are counted implicitly in escort tours determined in the non-mandatory tour frequency model. Thus, when including the school escort model and accounting for these tours explicitly, extra care should be taken not to double count them in the non-mandatory tour frequency model. The non-mandatory tour frequency model should be re-evaluated and likely changed to decrease the number of escort tours generated by that model. This was not implemented in the prototype_mtc_extended implementation due to a lack of data surrounding the number of escort tours in the region.
- activitysim.abm.models.school_escorting.check_alts_consistency(alts)#
Check that the alternatives file is consistent with the number of chaperones and escortees set in the model settings.
- activitysim.abm.models.school_escorting.create_bundle_attributes(row)#
Parse a bundle to determine escortee numbers and tour info.
- activitysim.abm.models.school_escorting.create_school_escorting_bundles_table(choosers, tours, stage)#
Creates a table that has one row for every school escorting bundle. Additional calculations are performed to help facilitate tour and trip creation including escortee order, times, etc.
- Parameters
- chooserspd.DataFrame
households pre-processed for the school escorting model
- tourspd.DataFrame
mandatory tours
- stagestr
inbound or outbound_cond
- Returns
- bundlespd.DataFrame
one school escorting bundle per row
- activitysim.abm.models.school_escorting.determine_escorting_participants(choosers, persons, model_settings)#
Determining which persons correspond to chauffeur 1..n and escortee 1..n. Chauffeurs are those with the highest weight given by: weight = 100 * person type + 10 * gender + 1 * (age > 25), and escortees are selected youngest to oldest.
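The weighting rule above can be sketched with pandas. The person-type, gender, and age codings below are illustrative, and restricting chauffeur candidates to adults is an assumption for the sketch, not part of the quoted rule:

```python
import pandas as pd

# Hypothetical person table; person_type, gender, and age codings are
# illustrative, not the actual prototype values.
persons = pd.DataFrame({
    "person_id": [1, 2, 3],
    "person_type": [1, 2, 7],   # e.g. 1=full-time worker, 7=school child
    "gender": [1, 2, 1],
    "age": [40, 35, 10],
})

# Chauffeur priority weight exactly as described in the docstring above.
persons["weight"] = (
    100 * persons["person_type"]
    + 10 * persons["gender"]
    + 1 * (persons["age"] > 25)
)

# Assumed eligibility: adults are chauffeur candidates, ranked by weight.
chauffeurs = persons[persons["age"] > 18].sort_values("weight", ascending=False)

# Escortees are selected youngest to oldest.
escortees = persons[persons["age"] <= 18].sort_values("age")
```

With these made-up inputs, person 2 gets weight 100*2 + 10*2 + 1 = 221 and is ranked as the first chauffeur candidate.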
- activitysim.abm.models.school_escorting.school_escorting(households, households_merged, persons, tours, chunk_size, trace_hh_id)#
school escorting model
The school escorting model determines whether children are dropped-off at or picked-up from school, simultaneously with the driver responsible for chauffeuring the children, which children are bundled together on half-tours, and the type of tour (pure escort versus rideshare).
Run iteratively for an outbound choice, an inbound choice, and an outbound choice conditional on the inbound choice. The choices for inbound and outbound conditional are used to create school escort tours and trips.
Updates / adds the following tables to the pipeline:
- households with school escorting choice
- tours including pure school escorting
- school_escort_tours which contains only pure school escort tours
- school_escort_trips
- timetable to avoid joint tours scheduled over school escort tours
Joint Tour Frequency#
The joint tour generation models are divided into three sub-models: the joint tour frequency model, the party composition model, and the person participation model. In the joint tour frequency model, the household chooses the purposes and number (up to two) of its fully joint travel tours. It also creates joint tours in the data pipeline.
The main interface to the joint tour frequency model is the
joint_tour_frequency()
function. This function is registered as an Inject step in the example Pipeline.
Core Table: households
| Result Fields: num_hh_joint_tours
| Skims Keys: NA
- activitysim.abm.models.joint_tour_frequency.joint_tour_frequency(households, persons, chunk_size, trace_hh_id)#
This model predicts the frequency of making fully joint trips (see the alternatives above).
Joint Tour Composition#
In the joint tour party composition model, the makeup of the travel party (adults, children, or mixed - adults and children) is determined for each joint tour. The party composition determines the general makeup of the party of participants in each joint tour in order to allow the micro-simulation to faithfully represent the prevalence of adult-only, children-only, and mixed joint travel tours for each purpose while permitting simplicity in the subsequent person participation model.
The main interface to the joint tour composition model is the
joint_tour_composition()
function. This function is registered as an Inject step in the example Pipeline.
Core Table: tours
| Result Fields: composition
| Skims Keys: NA
- activitysim.abm.models.joint_tour_composition.joint_tour_composition(tours, households, persons, chunk_size, trace_hh_id)#
This model predicts the makeup of the travel party (adults, children, or mixed).
Joint Tour Participation#
In the joint tour person participation model, each eligible person sequentially makes a choice to participate or not participate in each joint tour. Since the party composition model determines what types of people are eligible to join a given tour, the person participation model can operate in an iterative fashion, with each household member choosing to join or not to join a travel party independent of the decisions of other household members. In the event that the constraints posed by the result of the party composition model are not met, the person participation model cycles through the household members multiple times until the required types of people have joined the travel party.
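The cycling logic described above can be sketched as follows; `choose_participants`, its uniform participation probability, and the composition labels are illustrative stand-ins for the model's actual utility-based choices:

```python
import random

def choose_participants(eligible, composition,
                        participate_prob=0.5, max_iterations=100, seed=0):
    """Illustrative re-choosing loop (not the ActivitySim implementation).

    eligible: list of (person_id, is_adult) tuples for one joint tour.
    composition: 'adults', 'children', or 'mixed'. In the real model the
    party composition step also controls who is eligible at all.
    """
    rng = random.Random(seed)
    for _ in range(max_iterations):
        # Each member chooses independently; a flat probability stands in
        # for the person participation utilities.
        chosen = [(pid, adult) for pid, adult in eligible
                  if rng.random() < participate_prob]
        n_adults = sum(1 for _, adult in chosen if adult)
        n_children = len(chosen) - n_adults
        # A 'mixed' tour needs at least one adult and one child; single-type
        # tours need at least one participant of that type.
        if composition == "mixed" and n_adults >= 1 and n_children >= 1:
            return [pid for pid, _ in chosen]
        if composition == "adults" and n_adults >= 1 and n_children == 0:
            return [pid for pid, _ in chosen]
        if composition == "children" and n_children >= 1 and n_adults == 0:
            return [pid for pid, _ in chosen]
    raise RuntimeError("no feasible participant set after max_iterations")
```

The loop re-draws whole tours that fail the composition constraint, mirroring the description of cycling through household members until the required types of people have joined.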
This step also creates the joint_tour_participants
table in the pipeline, which stores the
person ids for each person on the tour.
The main interface to the joint tour participation model is the
joint_tour_participation()
function. This function is registered as an Inject step in the example Pipeline.
Core Table: tours
| Result Fields: number_of_participants, person_id (for the point person)
| Skims Keys: NA
- activitysim.abm.models.joint_tour_participation.joint_tour_participation(tours, persons_merged, chunk_size, trace_hh_id)#
Predicts for each eligible person to participate or not participate in each joint tour.
- activitysim.abm.models.joint_tour_participation.participants_chooser(probs, choosers, spec, trace_label)#
custom alternative to logit.make_choices for simulate.simple_simulate
Choosing participants for mixed tours is trickier than for adult or child tours because we need at least one adult and one child participant in a mixed tour. We call logit.make_choices and then check to see if the tour satisfies this requirement, and re-choose for any that fail until all are satisfied.
In principle, this should always occur eventually, but we fail after MAX_ITERATIONS, just in case there is some failure in program logic (we haven't seen this occur).
The return values are the same as logit.make_choices
- Parameters
- probspandas.DataFrame
Rows for choosers and columns for the alternatives from which they are choosing. Values are expected to be valid probabilities across each row, e.g. they should sum to 1.
- chooserspandas.DataFrame
simple_simulate choosers df
- specpandas.DataFrame
simple_simulate spec df. We only need spec so we can know the column index of the 'participate' alternative indicating that the participant has been chosen to participate in the tour
- trace_labelstr
- Returns
- choices, rands
choices, rands as returned by logit.make_choices (in same order as probs)
Joint Tour Destination Choice#
The joint tour destination choice model operates similarly to the usual work and school location choice model, selecting the primary destination for travel tours. The only procedural difference between the models is that the usual work and school location choice model selects the usual location of an activity whether or not the activity is undertaken during the travel day, while the joint tour destination choice model selects the location for an activity which has already been generated.
The tour’s primary destination is the location of the activity that is assumed to provide the greatest impetus for engaging in the travel tour. In the household survey, the primary destination was not asked, but rather inferred from the pattern of stops in a closed loop in the respondents’ travel diaries. The inference was made by weighing multiple criteria including a defined hierarchy of purposes, the duration of activities, and the distance from the tour origin. The model operates in the reverse direction, designating the primary purpose and destination and then adding intermediate stops based on spatial, temporal, and modal characteristics of the inbound and outbound journeys to the primary destination.
- The joint tour destination choice model is made up of three model steps:
sample - selects a sample of alternative locations for the next model step. This selects X locations from the full set of model zones using a simple utility.
logsums - starts with the table created above and calculates and adds the mode choice logsum expression for each alternative location.
simulate - starts with the table created above and chooses a final location, this time with the mode choice logsum included.
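The three steps above can be sketched end-to-end; the zone sizes, sample size, and logsum values here are made up, standing in for the real utility specifications and the mode choice model:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Illustrative zone data; size_term values are invented.
zones = pd.DataFrame({"zone_id": range(1, 101),
                      "size_term": rng.uniform(1, 100, 100)})

# 1. sample: draw a subset of zones with probability from a simple utility
#    (here just the log of the size term).
util = np.log(zones["size_term"])
probs = np.exp(util) / np.exp(util).sum()
sample = zones.sample(n=30, weights=probs, random_state=1).copy()

# 2. logsums: attach a mode choice logsum to each sampled alternative
#    (a real model runs tour mode choice here; random values stand in).
sample["mode_choice_logsum"] = rng.normal(0, 1, len(sample))

# 3. simulate: final choice over the sample, now with the logsum in the
#    utility (placeholder coefficient 0.5).
final_util = np.log(sample["size_term"]) + 0.5 * sample["mode_choice_logsum"]
final_probs = np.exp(final_util) / np.exp(final_util).sum()
destination = sample["zone_id"].sample(n=1, weights=final_probs,
                                       random_state=2).iloc[0]
```

Sampling first keeps the expensive logsum calculation to a subset of zones rather than the full zone system.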
Joint tour location choice for placeholder_multiple_zone models uses Presampling by default.
The main interface to the model is the joint_tour_destination()
function. This function is registered as an Inject step in the example Pipeline. See Writing Logsums for how
to write logsums for estimation.
Core Table: tours
| Result Fields: destination
| Skims Keys: TAZ, alt_dest, MD time period
- activitysim.abm.models.joint_tour_destination.joint_tour_destination(tours, persons_merged, households_merged, network_los, chunk_size, trace_hh_id)#
Given the tour generation from the above, each tour needs to have a destination, so in this case tours are the choosers (with the associated person that’s making the tour)
Joint Tour Scheduling#
The joint tour scheduling model selects a tour departure and duration period (and therefore a start and end period as well) for each joint tour. This model uses Person Time Windows. The primary drivers in the model are accessibility-based parameters such as the auto travel time for the departure/arrival hour combination, demographics, and time pattern characteristics such as the time windows available from previously scheduled tours. The joint tour scheduling model does not use mode choice logsums.
The main interface to the joint tour scheduling model is the
joint_tour_scheduling()
function. This function is registered as an Inject step in the example Pipeline.
Core Table: tours
| Result Field: start, end, duration
| Skims Keys: TAZ, destination, MD time period, MD time period
- activitysim.abm.models.joint_tour_scheduling.joint_tour_scheduling(tours, persons_merged, tdd_alts, chunk_size, trace_hh_id)#
This model predicts the departure time and duration of each joint tour
Non-Mandatory Tour Frequency#
The non-mandatory tour frequency model selects the number of non-mandatory tours made by each person on the simulation day. It also adds non-mandatory tours to the tours in the data pipeline. The individual non-mandatory tour frequency model operates in two stages:
A choice is made using a random utility model between combinations of tours containing zero, one, and two or more escort tours, and between zero and one or more tours of each other purpose.
Up to two additional tours of each purpose are added according to fixed extension probabilities.
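The extension stage can be sketched as below; the function name and the probability arguments are hypothetical, since the real model reads its extension probabilities from a configuration table:

```python
import numpy as np

def extend_tour_count(tour_type, count, p_plus1, p_plus2, rng):
    """Illustrative sketch of the fixed-extension-probability stage.

    Adds 0, 1, or 2 extra tours of a purpose with probabilities
    (1 - p_plus1 - p_plus2, p_plus1, p_plus2). Probability values and the
    helper itself are hypothetical.
    """
    # Tours can only be extended if their count is at the max possible
    # from stage one (e.g. 2 for escort, 1 otherwise).
    max_count = 2 if tour_type == "escort" else 1
    if count < max_count:
        return count
    extra = rng.choice([0, 1, 2], p=[1 - p_plus1 - p_plus2, p_plus1, p_plus2])
    return count + extra
```

So an escort count of 2 might be extended to 3 or 4, while a shopping count of 0 is never extended.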
The main interface to the non-mandatory tour frequency model is the
non_mandatory_tour_frequency()
function. This function is registered as an Inject step in the example Pipeline.
Core Table: persons
| Result Fields: non_mandatory_tour_frequency
| Skims Keys: NA
- activitysim.abm.models.non_mandatory_tour_frequency.extend_tour_counts(persons, tour_counts, alternatives, trace_hh_id, trace_label)#
extend tour counts based on a probability table
Counts can only be extended if the original count is between 1 and 4, and tours can only be extended if their count is at the max possible (e.g. 2 for escort, 1 otherwise), so escort might be increased to 3 or 4 and other tour types might be increased to 2 or 3
- Parameters
- persons: pandas dataframe
(need this for join columns)
- tour_counts: pandas dataframe
one row per person, one column per tour_type
- alternatives
alternatives from nmtv interaction_simulate; only needed to know the max possible frequency for a tour type
- trace_hh_id
- trace_label
- Returns
- extended tour_counts
tour_counts looks like this:

               escort  shopping  othmaint  othdiscr  eatout  social
    parent_id
    2588676         2         0         0         1       1       0
    2588677         0         1         0         1       0       0
- activitysim.abm.models.non_mandatory_tour_frequency.non_mandatory_tour_frequency(persons, persons_merged, chunk_size, trace_hh_id)#
This model predicts the frequency of making non-mandatory trips (alternatives for this model come from a separate csv file which is configured by the user) - these trips include escort, shopping, othmaint, othdiscr, eatout, and social trips in various combinations.
Non-Mandatory Tour Destination Choice#
The non-mandatory tour destination choice model chooses a destination zone for non-mandatory tours. The three step (sample, logsums, final choice) process also used for mandatory tour destination choice is used for non-mandatory tour destination choice.
Non-mandatory tour location choice for placeholder_multiple_zone models uses Presampling by default.
The main interface to the non-mandatory tour destination choice model is the
non_mandatory_tour_destination()
function. This function is registered as an Inject step in the example Pipeline. See Writing Logsums
for how to write logsums for estimation.
Core Table: tours
| Result Field: destination
| Skims Keys: TAZ, alt_dest, MD time period, MD time period
- activitysim.abm.models.non_mandatory_destination.non_mandatory_tour_destination(tours, persons_merged, network_los, chunk_size, trace_hh_id)#
Given the tour generation from the above, each tour needs to have a destination, so in this case tours are the choosers (with the associated person that’s making the tour)
Non-Mandatory Tour Scheduling#
The non-mandatory tour scheduling model selects a tour departure and duration period (and therefore a start and end period as well) for each non-mandatory tour. This model uses Person Time Windows and includes support for Mandatory Tour Scheduling.
The main interface to the non-mandatory tour scheduling model is the
non_mandatory_tour_scheduling()
function. This function is registered as an Inject step in the example Pipeline.
Core Table: tours
| Result Field: start, end, duration
| Skims Keys: TAZ, destination, MD time period, MD time period
- activitysim.abm.models.non_mandatory_scheduling.non_mandatory_tour_scheduling(tours, persons_merged, tdd_alts, chunk_size, trace_hh_id)#
This model predicts the departure time and duration of each activity for non-mandatory tours
Vehicle Allocation#
The vehicle allocation model selects which vehicle would be used for a tour of given occupancy. The alternatives for the vehicle
allocation model consist of the vehicles owned by the household and an additional non household vehicle option. (Zero-auto
households would be assigned the non-household vehicle option since there are no owned vehicles in the household).
A vehicle is selected for each occupancy level set by the user such that different tour modes that have different occupancies could see different operating
characteristics. The output of the vehicle allocation model is appended to the tour table with column names vehicle_occup_{occupancy}
and the values are
the vehicle type selected.
In prototype_mtc_extended, three occupancy levels are used: 1, 2, and 3.5. The auto operating cost
for occupancy level 1 is used in the drive alone mode and drive to transit modes. Occupancy levels 2 and 3.5 are used for shared
ride 2 and shared ride 3+ auto operating costs, respectively. Auto operating costs are selected in the mode choice pre-processors by selecting the allocated
vehicle type data from the vehicles table. If the allocated vehicle type was the non-household vehicle, the auto operating cost uses
the previous default value from prototype_mtc. All trips and atwork subtours use the auto operating cost of the parent tour. Functionality
was added in tour and atwork subtour mode choice to annotate the tour table and create a selected_vehicle
which denotes the actual vehicle used.
If the tour mode does not include a vehicle, then the selected_vehicle
entry is left blank.
The current implementation does not account for possible use of the household vehicles by other household members. Thus, it is possible for a selected vehicle to be used in two separate tours at the same time.
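The per-occupancy allocation and the resulting vehicle_occup_{occupancy} columns can be sketched as below; the household fleet, vehicle type labels, and the stand-in choice rule are illustrative, not the model's actual logit choice:

```python
import pandas as pd

# Hypothetical household fleets; vehicle type labels are illustrative.
household_vehicles = {1: ["car_gas_2015", "suv_gas_2012"], 2: []}
tours = pd.DataFrame({"tour_id": [10, 11], "household_id": [1, 2]})

def allocate(hh_id, occupancy):
    """Pick a vehicle for one occupancy level. Zero-auto households always
    get the non-household vehicle option."""
    fleet = household_vehicles.get(hh_id, [])
    if not fleet:
        return "non_hh_veh"
    # Stand-in choice rule: first vehicle for occupancy 1, last otherwise.
    return fleet[0] if occupancy == 1 else fleet[-1]

# One column per occupancy level, as in prototype_mtc_extended (1, 2, 3.5).
for occupancy in [1, 2, 3.5]:
    tours[f"vehicle_occup_{occupancy}"] = [
        allocate(hh, occupancy) for hh in tours["household_id"]
    ]
```

Downstream mode choice can then look up the operating cost of, say, `vehicle_occup_1` for drive alone or `vehicle_occup_3.5` for shared ride 3+.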
- activitysim.abm.models.vehicle_allocation.annotate_vehicle_allocation(model_settings, trace_label)#
Add columns to the tours table in the pipeline according to spec.
- Parameters
- model_settingsdict
- trace_labelstr
- activitysim.abm.models.vehicle_allocation.get_skim_dict(network_los, choosers)#
Returns a dictionary of skim wrappers to use in expression writing.
Skims have origin as home_zone_id and destination as the tour destination.
- Parameters
- network_losactivitysim.core.los.Network_LOS object
- chooserspd.DataFrame
- Returns
- skimsdict
index is skim wrapper name, value is the skim wrapper
- activitysim.abm.models.vehicle_allocation.vehicle_allocation(persons, households, vehicles, tours, tours_merged, network_los, chunk_size, trace_hh_id)#
Selects a vehicle for each occupancy level for each tour.
Alternatives consist of up to the number of household vehicles, plus one option for non-household vehicles.
The model is run once for each tour occupancy defined in the model yaml. Columns are added to the output tour table for each occupancy level.
The user may also augment the tours tables with new vehicle type-based fields specified via the annotate_tours option.
- Parameters
- personsorca.DataFrameWrapper
- householdsorca.DataFrameWrapper
- vehiclesorca.DataFrameWrapper
- toursorca.DataFrameWrapper
- tours_mergedorca.DataFrameWrapper
- network_losactivitysim.core.los.Network_LOS object
- chunk_sizeorca.injectable
- trace_hh_idorca.injectable
Tour Mode Choice#
The mandatory, non-mandatory, and joint tour mode choice model assigns to each tour the “primary” mode that is used to get from the origin to the primary destination. The tour-based modeling approach requires a reconsideration of the conventional mode choice structure. Instead of a single mode choice model used in a four-step structure, there are two different levels where the mode choice decision is modeled: (a) the tour mode level (upper-level choice); and, (b) the trip mode level (lower-level choice conditional upon the upper-level choice).
The mandatory, non-mandatory, and joint tour mode level represents the decisions that apply to the entire tour, and that will affect the alternatives available for each individual trip or joint trip. These decisions include the choice to use a private car versus using public transit, walking, or biking; whether carpooling will be considered; and whether transit will be accessed by car or by foot. Trip-level decisions correspond to details of the exact mode used for each trip, which may or may not change over the trips in the tour.
The mandatory, non-mandatory, and joint tour mode choice structure is a nested logit model which separates similar modes into different nests to more accurately model the cross-elasticities between the alternatives. The eighteen modes are incorporated into the nesting structure specified in the model settings file. The first level of nesting represents the use of a private car, non-motorized means, or transit. In the second level of nesting, the auto nest is divided into vehicle occupancy categories, and transit is divided into walk access and drive access nests. The final level splits the auto nests into free or pay alternatives and the transit nests into the specific line-haul modes.
The primary variables are in-vehicle time, other travel times, cost (the influence of which is derived from the automobile in-vehicle time coefficient and the persons’ modeled value of time), characteristics of the destination zone, demographics, and the household’s level of auto ownership.
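The nesting described above might be represented as a tree like the following; the nest names, coefficients, and the reduced mode list are placeholders rather than the actual eighteen-mode settings file:

```python
# Illustrative nesting structure mirroring the description above; names
# and nesting coefficients are placeholders, not the model settings.
nests = {
    "name": "root", "coefficient": 1.0,
    "alternatives": [
        {"name": "AUTO", "coefficient": 0.72, "alternatives": [
            {"name": "DRIVEALONE", "coefficient": 0.35,
             "alternatives": ["DRIVEALONEFREE", "DRIVEALONEPAY"]},
            {"name": "SHAREDRIDE2", "coefficient": 0.35,
             "alternatives": ["SHARED2FREE", "SHARED2PAY"]},
        ]},
        {"name": "NONMOTORIZED", "coefficient": 0.72,
         "alternatives": ["WALK", "BIKE"]},
        {"name": "TRANSIT", "coefficient": 0.72, "alternatives": [
            {"name": "WALKACCESS", "coefficient": 0.5,
             "alternatives": ["WALK_LOC", "WALK_HVY"]},
            {"name": "DRIVEACCESS", "coefficient": 0.5,
             "alternatives": ["DRIVE_LOC", "DRIVE_HVY"]},
        ]},
    ],
}

def leaf_modes(nest):
    """Collect the elemental modes at the bottom of the nest tree."""
    if isinstance(nest, str):
        return [nest]
    return [m for alt in nest["alternatives"] for m in leaf_modes(alt)]
```

Each interior node carries a nesting coefficient governing the cross-elasticity among its children, while only the leaves are choosable modes.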
The main interface to the mandatory, non-mandatory, and joint tour mode model is the
tour_mode_choice_simulate()
function. This function is
called in the Inject step tour_mode_choice_simulate
and is registered as an Inject step in the example Pipeline.
See Writing Logsums for how to write logsums for estimation.
Core Table: tours
| Result Field: mode
| Skims Keys: TAZ, destination, start, end
- activitysim.abm.models.tour_mode_choice.append_tour_leg_trip_mode_choice_logsums(tours)#
Creates trip mode choice logsum column in tours table for each tour mode and leg
- Parameters
- tourspd.DataFrame
- Returns
- tourspd.DataFrame
Adds two * n_modes logsum columns to each tour row, e.g. “logsum_DRIVE_outbound”
- activitysim.abm.models.tour_mode_choice.create_logsum_trips(tours, segment_column_name, model_settings, trace_label)#
Construct table of trips from half-tours (1 inbound, 1 outbound) for each tour-mode.
- Parameters
- tourspandas.DataFrame
- segment_column_namestr
column in tours table used for segmenting model spec
- model_settingsdict
- trace_labelstr
- Returns
- pandas.DataFrame
Table of trips: 2 per tour, with O/D and purpose inherited from tour
- activitysim.abm.models.tour_mode_choice.get_alts_from_segmented_nested_logit(model_settings, segment_name, trace_label)#
Infer alts from logit spec
- Parameters
- model_settingsdict
- segment_namestr
- trace_labelstr
- Returns
- list
- activitysim.abm.models.tour_mode_choice.get_trip_mc_logsums_for_all_modes(tours, segment_column_name, model_settings, trace_label)#
Creates pseudo-trips from tours and runs trip mode choice to get logsums
- Parameters
- tourspandas.DataFrame
- segment_column_namestr
column in tours table used for segmenting model spec
- model_settingsdict
- trace_labelstr
- Returns
- tourspd.DataFrame
Adds two * n_modes logsum columns to each tour row, e.g. “logsum_DRIVE_outbound”
- activitysim.abm.models.tour_mode_choice.logger = <Logger activitysim.abm.models.tour_mode_choice (WARNING)>#
Tour mode choice is run for all tours to determine the transportation mode that will be used for the tour
- activitysim.abm.models.tour_mode_choice.tour_mode_choice_simulate(tours, persons_merged, network_los, chunk_size, trace_hh_id)#
Tour mode choice simulate
At-work Subtours Frequency#
The at-work subtour frequency model selects the number of at-work subtours made for each work tour. It also creates at-work subtours by adding them to the tours table in the data pipeline. These at-work sub-tours are travel tours taken during the workday with their origin at the work location, rather than from home. Explanatory variables include employment status, income, auto ownership, the frequency of other tours, characteristics of the parent work tour, and characteristics of the workplace zone.
Choosers: work tours
Alternatives: none, 1 eating out tour, 1 business tour, 1 maintenance tour, 2 business tours, 1 eating out tour + 1 business tour
Dependent tables: household, person, accessibility
Outputs: work tour subtour frequency choice, at-work tours table (with only tour origin zone at this point)
The main interface to the at-work subtours frequency model is the
atwork_subtour_frequency()
function. This function is registered as an Inject step in the example Pipeline.
Core Table: tours
| Result Field: atwork_subtour_frequency
| Skims Keys: NA
- activitysim.abm.models.atwork_subtour_frequency.atwork_subtour_frequency(tours, persons_merged, chunk_size, trace_hh_id)#
This model predicts the frequency of making at-work subtour tours (alternatives for this model come from a separate csv file which is configured by the user).
At-work Subtours Destination Choice#
The at-work subtours destination choice model is made up of three model steps:
sample - selects a sample of alternative locations for the next model step. This selects X locations from the full set of model zones using a simple utility.
logsums - starts with the table created above and calculates and adds the mode choice logsum expression for each alternative location.
simulate - starts with the table created above and chooses a final location, this time with the mode choice logsum included.
At-work subtour location choice for placeholder_multiple_zone models uses Presampling by default.
Core Table: tours
| Result Table: destination
| Skims Keys: workplace_taz, alt_dest, MD time period
The main interface to the at-work subtour destination model is the
atwork_subtour_destination()
function. This function is registered as an Inject step in the example Pipeline.
See Writing Logsums for how to write logsums for estimation.
At-work Subtour Scheduling#
The at-work subtours scheduling model selects a tour departure and duration period (and therefore a start and end period as well) for each at-work subtour. This model uses Person Time Windows.
This model is the same as the mandatory tour scheduling model except it operates on the at-work tours and constrains the alternative set to available Person Time Windows. The at-work subtour scheduling model does not use mode choice logsums. The at-work subtour frequency model can choose multiple tours, so this model must process all first tours and then second tours since isFirstAtWorkTour is an explanatory variable.
Choosers: at-work tours
Alternatives: alternative departure time and arrival back at origin time pairs WITHIN the work tour departure time and arrival time back at origin AND the person time window. If no time window is available for the tour, make the first and last time periods within the work tour available, make the choice, and log the number of times this occurs.
Dependent tables: skims, person, land use, work tour
Outputs: at-work tour departure time and arrival back at origin time, updated person time windows
The main interface to the at-work subtours scheduling model is the
atwork_subtour_scheduling()
function. This function is registered as an Inject step in the example Pipeline.
Core Table: tours
| Result Field: start, end, duration
| Skims Keys: workplace_taz, alt_dest, MD time period, MD time period
- activitysim.abm.models.atwork_subtour_scheduling.atwork_subtour_scheduling(tours, persons_merged, tdd_alts, skim_dict, chunk_size, trace_hh_id)#
This model predicts the departure time and duration of each activity for at-work subtours
At-work Subtour Mode#
The at-work subtour mode choice model assigns a travel mode to each at-work subtour using the Tour Mode Choice model.
The main interface to the at-work subtour mode choice model is the
atwork_subtour_mode_choice()
function. This function is called in the Inject step atwork_subtour_mode_choice
and
is registered as an Inject step in the example Pipeline.
See Writing Logsums for how to write logsums for estimation.
Core Table: tour
| Result Field: tour_mode
| Skims Keys: workplace_taz, destination, start, end
- activitysim.abm.models.atwork_subtour_mode_choice.atwork_subtour_mode_choice(tours, persons_merged, network_los, chunk_size, trace_hh_id)#
At-work subtour mode choice simulate
Intermediate Stop Frequency#
The stop frequency model assigns to each tour the number of intermediate destinations a person will travel to on each leg of the tour from the origin to tour primary destination and back. The model incorporates the ability for more than one stop in each direction, up to a maximum of 3, for a total of 8 trips per tour (four on each tour leg).
Intermediate stops are not modeled for drive-transit tours because doing so can have unintended consequences because of the difficulty of tracking the location of the vehicle. For example, consider someone who used a park and ride for work and then took transit to an intermediate shopping stop on the way home. Without knowing the vehicle location, it cannot be determined if it is reasonable to allow the person to drive home. Even if the tour were constrained to allow driving only on the first and final trip, the trip home from an intermediate stop may not use the same park and ride where the car was dropped off on the outbound leg, which is usually as close as possible to home because of the impracticality of coding drive access links from every park and ride lot to every zone.
This model also creates a trips table in the pipeline for later models.
The main interface to the intermediate stop frequency model is the
stop_frequency()
function. This function is registered as an Inject step in the example Pipeline.
Core Table: tours
| Result Field: stop_frequency
| Skims Keys: NA
- activitysim.abm.models.stop_frequency.stop_frequency(tours, tours_merged, stop_frequency_alts, network_los, chunk_size, trace_hh_id)#
stop frequency model
For each tour, choose a number of intermediate inbound stops and outbound stops. Create a trip table with inbound and outbound trips.
Thus, a tour with stop_frequency ‘2out_0in’ will have two outbound and zero inbound stops, and four corresponding trips: three outbound, and one inbound.
Adds a stop_frequency str column to the tours table and creates a trips table with the following columns:
- person_id
- household_id
- tour_id
- primary_purpose
- atwork
- trip_num
- outbound
- trip_count
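The stop-to-trip arithmetic implied by labels like '2out_0in' can be sketched as below; `trips_from_stop_frequency` is a hypothetical helper, not an ActivitySim function:

```python
import re

def trips_from_stop_frequency(stop_frequency):
    """Parse a stop_frequency label like '2out_0in' into trip counts.

    n intermediate stops on a leg produce n + 1 trips on that leg, since
    each stop splits a trip in two.
    """
    m = re.fullmatch(r"(\d+)out_(\d+)in", stop_frequency)
    out_stops, in_stops = int(m.group(1)), int(m.group(2))
    return {"outbound_trips": out_stops + 1, "inbound_trips": in_stops + 1}
```

This reproduces the example above: '2out_0in' yields three outbound trips and one inbound trip, four in total.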
Trip Purpose#
For trips other than the last trip outbound or inbound, assign a purpose based on an observed frequency distribution. The distribution is segmented by tour purpose, tour direction, and person type. Work tours are also segmented by departure or arrival time period.
The main interface to the trip purpose model is the
trip_purpose()
function. This function is registered as an Inject step in the example Pipeline.
Core Table: trips
| Result Field: purpose
| Skims Keys: NA
Note
Trip purpose and trip destination choice can be run iteratively together via Trip Purpose and Destination.
- activitysim.abm.models.trip_purpose.choose_intermediate_trip_purpose(trips, probs_spec, estimator, probs_join_cols, use_depart_time, trace_hh_id, trace_label)#
choose purpose for intermediate trips based on probs_spec, which assigns relative weights (summing to 1) to the possible purpose choices
- Returns
- purpose: pandas.Series of purpose (str) indexed by trip_id
- activitysim.abm.models.trip_purpose.run_trip_purpose(trips_df, estimator, chunk_size, trace_hh_id, trace_label)#
trip purpose - main functionality separated from model step so it can be called iteratively
For each intermediate stop on a tour (i.e. trip other than the last trip outbound or inbound) each trip is assigned a purpose based on an observed frequency distribution
The distribution should always be segmented by tour purpose and tour direction. By default it is also segmented by person type. The join columns can be overwritten using the “probs_join_cols” parameter in the model settings. The model will attempt to segment by trip depart time as well if necessary and depart time ranges are specified in the probability lookup table.
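The segmented lookup described above can be sketched as follows; the column names, segment values, and probabilities here are invented, not the configured probs_spec format:

```python
import numpy as np
import pandas as pd

# Illustrative probability lookup, segmented by tour purpose, direction,
# and person type; the p_* columns sum to 1 across each row. Real column
# names and values come from the configured probs_spec file.
probs_spec = pd.DataFrame({
    "primary_purpose": ["work", "work"],
    "outbound": [True, False],
    "ptype": [1, 1],
    "p_shopping": [0.6, 0.3],
    "p_eatout": [0.4, 0.7],
})

def choose_purpose(trip, rng):
    """Draw an intermediate-trip purpose from the matching spec row."""
    row = probs_spec[
        (probs_spec["primary_purpose"] == trip["primary_purpose"])
        & (probs_spec["outbound"] == trip["outbound"])
        & (probs_spec["ptype"] == trip["ptype"])
    ].iloc[0]
    purposes = ["shopping", "eatout"]
    probs = [row["p_shopping"], row["p_eatout"]]
    return rng.choice(purposes, p=probs)
```

A depart-time segment would simply add another join column to the row-matching filter, as the paragraph above notes.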
- Returns
- purpose: pandas.Series of purpose (str) indexed by trip_id
- activitysim.abm.models.trip_purpose.trip_purpose(trips, chunk_size, trace_hh_id)#
trip purpose model step - calls run_trip_purpose to run the actual model
adds purpose column to trips
Trip Destination Choice#
See Trip Destination.
Trip Purpose and Destination#
After running trip purpose and trip destination separately, the two models can be run together in an iterative fashion on the remaining failed trips (i.e. trips that cannot be assigned a destination). Each iteration uses new random numbers.
The main interface to the trip purpose and destination model is the
trip_purpose_and_destination()
function. This function is registered as an Inject step in the example Pipeline.
Core Table: trips
| Result Field: purpose, destination
| Skims Keys: origin, (tour primary) destination, dest_taz, trip_period
Trip Scheduling (Probabilistic)#
For each trip, assign a departure hour based on an input lookup table of percents by tour purpose, direction (inbound/outbound), tour hour, and trip index.
The tour hour is the tour start hour for outbound trips and the tour end hour for inbound trips. The trip index is the trip sequence on the tour, with up to four trips per half tour.
For outbound trips, the trip depart hour must be greater than or equal to the previously selected trip depart hour
For inbound trips, trips are handled in reverse order from the next-to-last trip in the leg back to the first. The tour end hour serves as the anchor time point from which to start assigning trip time periods.
Outbound trips on at-work subtours are assigned the tour depart hour and inbound trips on at-work subtours are assigned the tour end hour.
The assignment of trip depart time is run iteratively up to a max number of iterations since it is possible that the time period selected for an earlier trip in a half-tour makes selection of a later trip time period impossible (or very low probability). Thus, the sampling is re-run until a feasible set of trip time periods is found. If a trip can’t be scheduled after the max iterations, then the trip is assigned the previous trip’s choice (i.e. assumed to happen right after the previous trip) or dropped, as configured by the user. The trip scheduling model does not use mode choice logsums.
Alternatives: Available time periods in the tour window (i.e. tour start and end period). When processing stops on work tours, the available time periods are constrained by the at-work subtour start and end period as well.
The main interface to the trip scheduling model is the
trip_scheduling()
function.
This function is registered as an Inject step in the example Pipeline.
Core Table: trips
| Result Field: depart
| Skims Keys: NA
- activitysim.abm.models.trip_scheduling.logger = <Logger activitysim.abm.models.trip_scheduling (WARNING)>#
StopDepartArrivePeriodModel
StopDepartArriveProportions.csv tourpurp,isInbound,interval,trip,p1,p2,p3,p4,p5…p40
- activitysim.abm.models.trip_scheduling.schedule_trips_in_leg(outbound, trips, probs_spec, model_settings, is_last_iteration, trace_hh_id, trace_label)#
- Parameters
- outbound
- trips
- probs_spec
- depart_alt_base
- is_last_iteration
- trace_hh_id
- trace_label
- Returns
- choices: pd.Series
depart choice for trips, indexed by trip_id
- activitysim.abm.models.trip_scheduling.set_stop_num(trips)#
Convert trip_num to stop_num in order to work with duration-based probs that are keyed on stop num. For outbound trips, trip n chooses the duration of stop n-1 (the trip origin). For inbound trips, trip n chooses the duration of stop n (the trip destination). This means outbound trips technically choose a departure time while inbound trips choose an arrival time.
- activitysim.abm.models.trip_scheduling.set_tour_hour(trips, tours)#
add columns ‘tour_hour’, ‘earliest’, ‘latest’ to trips
- Parameters
- trips: pd.DataFrame
- tours: pd.DataFrame
- Returns
- modifies trips in place
- activitysim.abm.models.trip_scheduling.trip_scheduling(trips, tours, chunk_size, trace_hh_id)#
Trip scheduling assigns depart times for trips within the start, end limits of the tour.
The algorithm is simplistic:
The first outbound trip starts at the tour start time, and subsequent outbound trips are processed in trip_num order to ensure that subsequent trips do not depart before the trip that precedes them.
Inbound trips are handled similarly, except in reverse order, starting with the last trip, and working backwards to ensure that inbound trips do not depart after the trip that succeeds them.
The probability spec assigns probabilities for depart times, but those possible departs must be clipped to disallow depart times outside the tour limits, the departs of prior trips, and in the case of work tours, the start/end times of any atwork subtours.
Scheduling can fail if the probability table assigns zero probabilities to all the available depart times in a trip’s depart window. (This could be avoided by giving every window a small probability, rather than zero, but the existing mtctm1 prob spec does not do this. I believe this is due to its having been generated from a small household travel survey sample that lacked any departs for some time periods.)
Rescheduling the trips that fail (along with their inbound or outbound leg-mates) can sometimes fix this problem, if it was caused by an earlier trip’s depart choice blocking a subsequent trip’s ability to schedule a depart within the resulting window. But it can also happen if a tour is very short (e.g. one time period) and the prob spec assigns a zero probability to that tour hour.
Therefore we need to handle trips that could not be scheduled. There are two ways (at least) to solve this problem:
1) choose_most_initial simply assigns a depart time to the trip, even if it has a zero probability. It makes most sense, in this case, to assign the ‘most initial’ depart time, so that subsequent trips are minimally impacted. This can be done in the final iteration, thus affecting only the trips that could not be scheduled by the standard approach.
2) drop_and_cleanup drops trips that could not be scheduled, and adjusts their leg mates, as is done for failed trips in trip_destination.
Which option is applied is determined by the FAILFIX model setting.
- activitysim.abm.models.trip_scheduling.update_tour_earliest(trips, outbound_choices)#
Updates “earliest” column for inbound trips based on the maximum outbound trip departure time of the tour. This is done to ensure inbound trips do not depart before the last outbound trip of a tour.
- Parameters
- trips: pd.DataFrame
- outbound_choices: pd.Series
time periods depart choices, one per trip (except for trips with zero probs)
- Returns
- modifies trips in place
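A minimal pandas sketch of this update. Column and argument names follow the docstring above, but the implementation is illustrative, not the ActivitySim source:

```python
import pandas as pd

def update_tour_earliest(trips, outbound_departs):
    """Sketch: the latest outbound depart of each tour becomes the 'earliest'
    bound for that tour's inbound trips, so inbound trips cannot depart before
    the last outbound trip. Modifies trips in place."""
    tour_ids = trips.loc[outbound_departs.index, "tour_id"]
    last_outbound = outbound_departs.groupby(tour_ids).max()
    inbound = ~trips["outbound"]
    trips.loc[inbound, "earliest"] = trips.loc[inbound, "tour_id"].map(last_outbound)
```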
Trip Scheduling Choice (Logit Choice)#
This model uses a logit-based formulation to determine potential trip windows for the three main components of a tour.
Outbound Leg: The time window from leaving the origin location to the second-to-last outbound stop.
Main Leg: The time window from the last outbound stop through the main tour destination to the first inbound stop.
Inbound Leg: The time window from the first inbound stop to the tour origin location.
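The three components above amount to simple period arithmetic; the sketch below uses hypothetical period arguments for illustration, not actual ActivitySim columns:

```python
def leg_durations(tour_start, last_outbound_stop, first_inbound_stop, tour_end):
    """Durations (in time periods) of the outbound, main, and inbound tour
    components, given hypothetical boundary periods."""
    outbound = last_outbound_stop - tour_start
    main = first_inbound_stop - last_outbound_stop
    inbound = tour_end - first_inbound_stop
    return outbound, main, inbound
```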
Core Table: tours
| Result Field: outbound_duration
, main_leg_duration
, inbound_duration
| Skims Keys: NA
Required YAML attributes:
SPECIFICATION
This file defines the logit specification for each chooser segment.
COEFFICIENTS
Specification coefficients
PREPROCESSOR
Preprocessor definitions to run on the chooser dataframe (trips) before the model is run
Trip Departure Choice (Logit Choice)#
Used in conjunction with Trip Scheduling Choice (Logit Choice), this model chooses departure time periods consistent with the time windows for the appropriate leg of the trip.
Core Table: trips
| Result Field: depart
| Skims Keys: NA
Required YAML attributes:
SPECIFICATION
This file defines the logit specification for each chooser segment.
COEFFICIENTS
Specification coefficients
PREPROCESSOR
Preprocessor definitions to run on the chooser dataframe (trips) before the model is run
Trip Mode Choice#
The trip mode choice model assigns a travel mode for each trip on a given tour. It operates similarly to the tour mode choice model, but only certain trip modes are available for each tour mode. The correspondence rules are defined according to the following principles:
Pay trip modes are only available for pay tour modes (for example, drive-alone pay is only available at the trip mode level if drive-alone pay is selected as a tour mode).
The auto occupancy of the tour mode is determined by the maximum occupancy across all auto trips that make up the tour; the tour mode therefore reflects the highest auto occupancy of any trip on the tour.
Transit tours can include auto shared-ride trips for particular legs. Therefore, ‘casual carpool’, wherein travelers share a ride to work and take transit back to the tour origin, is explicitly allowed in the tour/trip mode choice model structure.
The walk mode is allowed for any trip.
The availability of transit line-haul submodes on transit tours depends on the skimming and tour mode choice hierarchy. Free shared-ride modes are also available in walk-transit tours, albeit with a low probability. Paid shared-ride modes are not allowed on transit tours because no stated preference data is available on the sensitivity of transit riders to automobile value tolls, and no observed data is available to verify the number of people shifting into paid shared-ride trips on transit tours.
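The correspondence rules can be pictured as a lookup from tour mode to the set of allowed trip modes. The table below is a toy illustration (the mode names are typical ActivitySim labels, but the mapping is an assumption, not the actual correspondence file):

```python
# toy tour-mode -> allowed-trip-modes correspondence (illustrative only):
# pay trip modes are only available on pay tour modes, walk is always allowed,
# and transit tours may include free shared-ride trips ('casual carpool')
TOY_CORRESPONDENCE = {
    "DRIVEALONEFREE": {"DRIVEALONEFREE", "WALK"},
    "DRIVEALONEPAY": {"DRIVEALONEFREE", "DRIVEALONEPAY", "WALK"},
    "WALK_TRANSIT": {"WALK_TRANSIT", "SHARED2FREE", "WALK"},
}

def trip_mode_available(trip_mode, tour_mode, correspondence=TOY_CORRESPONDENCE):
    """Check a candidate trip mode against the tour mode's allowed set."""
    return trip_mode in correspondence.get(tour_mode, set())
```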
The trip mode choice models explanatory variables include household and person variables, level-of-service between the trip origin and destination according to the time period for the tour leg, urban form variables, and alternative-specific constants segmented by tour mode.
The main interface to the trip mode choice model is the
trip_mode_choice()
function. This function
is registered as an Inject step in the example Pipeline. See Writing Logsums for how to write logsums for estimation.
Core Table: trips
| Result Field: trip_mode
| Skims Keys: origin, destination, trip_period
- activitysim.abm.models.trip_mode_choice.trip_mode_choice(trips, network_los, chunk_size, trace_hh_id)#
Trip mode choice - compute trip_mode (same values as for tour_mode) for each trip.
Modes for each primary tour purpose are calculated separately because they have different coefficient values (stored in the trip_mode_choice_coefficients.csv coefficient file).
Adds trip_mode column to trip table
Parking Location Choice#
The parking location choice model selects a parking location for specified trips. While the model does not require that parking location be applied to any specific set of trips, it is usually applied to drive trips to specific zones (e.g., the CBD) in the model.
The model provides a filter for both the eligible choosers and the eligible parking location zones. The trips dataframe is the chooser table for this model. The zone selection filter is applied to the land use zones dataframe.
If this model is specified in the pipeline, the Write Trip Matrices step will use the parking location choice results to build trip tables in lieu of the trip destination.
The main interface to the parking location choice model is the
parking_location_choice()
function. This function
is registered as an Inject step, and it is available from the pipeline. See Writing Logsums for how to write
logsums for estimation.
Skims
odt_skims: Origin to Destination by Time of Day
dot_skims: Destination to Origin by Time of Day
opt_skims: Origin to Parking Zone by Time of Day
pdt_skims: Parking Zone to Destination by Time of Day
od_skims: Origin to Destination
do_skims: Destination to Origin
op_skims: Origin to Parking Zone
pd_skims: Parking Zone to Destination
Core Table: trips
Required YAML attributes:
SPECIFICATION
This file defines the logit specification for each chooser segment.
COEFFICIENTS
Specification coefficients
PREPROCESSOR
Preprocessor definitions to run on the chooser dataframe (trips) before the model is run
CHOOSER_FILTER_COLUMN_NAME
Boolean field on the chooser table defining which choosers are eligible for the parking location choice model. If no filter is specified, all choosers (trips) are eligible for the model.
CHOOSER_SEGMENT_COLUMN_NAME
Column on the chooser table defining the parking segment for the logit model
SEGMENTS
List of eligible chooser segments in the logit specification
ALTERNATIVE_FILTER_COLUMN_NAME
Boolean field used to filter land use zones as eligible parking location choices. If no filter is specified, then all land use zones are considered as viable choices.
ALT_DEST_COL_NAME
The column name to append with the parking location choice results. For choosers (trips) ineligible for this model, a -1 value will be placed in the column.
TRIP_ORIGIN
Origin field on the chooser trip table
TRIP_DESTINATION
Destination field on the chooser trip table
- activitysim.abm.models.parking_location_choice.parking_destination_simulate(segment_name, trips, destination_sample, model_settings, skims, chunk_size, trace_hh_id, trace_label)#
Choose destination from destination_sample (with od_logsum and dp_logsum columns added)
- Returns
- choices - pandas.Series
destination alt chosen
- activitysim.abm.models.parking_location_choice.parking_location(trips, trips_merged, land_use, network_los, chunk_size, trace_hh_id)#
Given a set of trips, each trip needs to have a parking location if it is eligible for remote parking.
- activitysim.abm.models.parking_location_choice.wrap_skims(model_settings)#
wrap skims of trip destination using origin, dest column names from model settings. Several of these are used by destination_sample, compute_logsums, and destination_simulate, so we create them all here with canonical names.
Note that compute_logsums aliases their names so it can use the same equations to compute logsums from origin to alt_dest, and from alt_dest to primary destination.
odt_skims - SkimStackWrapper: trip origin, trip alt_dest, time_of_day
dot_skims - SkimStackWrapper: trip alt_dest, trip origin, time_of_day
dpt_skims - SkimStackWrapper: trip alt_dest, trip primary_dest, time_of_day
pdt_skims - SkimStackWrapper: trip primary_dest, trip alt_dest, time_of_day
od_skims - SkimDictWrapper: trip origin, trip alt_dest
dp_skims - SkimDictWrapper: trip alt_dest, trip primary_dest
- Parameters
- model_settings
- Returns
- dict containing skims, keyed by canonical names relative to tour orientation
Write Trip Matrices#
Write open matrix (OMX) trip matrices for assignment. Reads the trips table (post preprocessor) and runs expressions to code additional data fields, one data field for each matrix specified. The matrices are scaled by a household-level expansion factor, which by default is the household sample rate, calculated when households are read in at the beginning of a model run. The main interface to write trip matrices is the write_trip_matrices()
function. This function is registered as an Inject step in the example Pipeline.
If the Parking Location Choice model is defined in the pipeline, the parking location zone will be used in lieu of the destination zone.
Core Table: trips
| Result: omx trip matrices
| Skims Keys: origin, destination
- activitysim.abm.models.trip_matrices.annotate_trips(trips, network_los, model_settings)#
Add columns to local trips table. The annotator has access to the origin/destination skims and everything defined in the model settings CONSTANTS.
Pipeline tables can also be accessed by listing them under TABLES in the preprocessor settings.
- activitysim.abm.models.trip_matrices.write_matrices(aggregate_trips, zone_index, orig_index, dest_index, model_settings, is_tap=False)#
Write aggregated trips to OMX format.
The MATRICES setting lists the new OMX files to write. Each file can contain any number of ‘tables’, each specified by a table key (‘name’) and a trips table column (‘data_field’) to use for aggregated counts.
Any data type may be used for columns added in the annotation phase, but the table ‘data_field’s must be summable types: ints, floats, bools.
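The aggregation step amounts to summing each summable ‘data_field’ over trips sharing an origin-destination pair, scaled by the expansion factor. A minimal numpy sketch, assuming 0-based zone ids for simplicity (the function name and signature are illustrative, not the ActivitySim API):

```python
import numpy as np

def aggregate_od_matrix(origins, dests, values, n_zones, expansion=1.0):
    """Sketch: sum per-trip values (scaled by a household expansion factor)
    into a dense origin-destination matrix, the shape written to each OMX
    table. Zone ids are assumed 0-based here."""
    m = np.zeros((n_zones, n_zones))
    # unbuffered add so repeated OD pairs accumulate correctly
    np.add.at(m, (np.asarray(origins), np.asarray(dests)),
              np.asarray(values, dtype=float) * expansion)
    return m
```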
- activitysim.abm.models.trip_matrices.write_trip_matrices(network_los)#
Write trip matrices step.
Adds boolean columns to local trips table via annotation expressions, then aggregates trip counts and writes OD matrices to OMX. Save annotated trips table to pipeline if desired.
Writes taz trip tables for one and two zone systems. Writes taz and tap trip tables for a three zone system. Add
is_tap: True
to the settings file to identify an output matrix as tap level trips as opposed to taz level trips.
For a one zone system, uses the land use table for the set of possible tazs. For a two zone system, uses the taz skim zone names for the set of possible tazs. For a three zone system, uses the taz skim zone names for the set of possible tazs and the tap skim zone names for the set of possible taps.
Util#
Additional helper classes
CDAP#
- activitysim.abm.models.util.cdap.add_interaction_column(choosers, p_tup)#
Add an interaction column in place to choosers, listing the ptypes of the persons in p_tup
The name of the interaction column will be determined by the cdap_ranks from p_tup, and the rows in the column contain the ptypes of those persons in that household row.
For instance, for p_tup = (1,3) the choosers interaction column name will be ‘p1_p3’.
For a household where person 1 is a part-time worker (ptype=2) and person 3 is an infant (ptype=8), the corresponding row value interaction code will be 28.
We take advantage of the fact that interactions are symmetrical to simplify spec expressions: We name the interaction_column in increasing pnum (cdap_rank) order (p1_p2 and not p3_p1) And we format row values in increasing ptype order (28 and not 82) This simplifies the spec expressions as we don’t have to test for p1_p3 == 28 | p1_p3 == 82
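The symmetric convention above can be sketched in a few lines. This helper is illustrative, not the ActivitySim implementation:

```python
def interaction_column(p_tup, ptypes):
    """Sketch: name the interaction column by increasing cdap_rank and code
    the row value by increasing ptype, exploiting the symmetry of
    interactions so p1_p3 == 28 covers both (2, 8) and (8, 2)."""
    name = "_".join(f"p{rank}" for rank in sorted(p_tup))
    code = int("".join(str(pt) for pt in sorted(ptypes)))
    return name, code
```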
- Parameters
- chooserspandas.DataFrame
household choosers, indexed on _hh_index_ choosers should contain columns ptype_p1, ptype_p2 for each cdap_rank person in hh
- p_tupint tuple
tuple specifying the cdap_ranks for the interaction column p_tup = (1,3) means persons with cdap_rank 1 and 3
- Returns
- activitysim.abm.models.util.cdap.add_pn(col, pnum)#
return the canonical column name for the indiv_util column or columns in merged hh_chooser df for individual with cdap_rank pnum
e.g. M_p1, ptype_p2 but leave _hh_id_ column unchanged
- activitysim.abm.models.util.cdap.assign_cdap_rank(persons, person_type_map, trace_hh_id=None, trace_label=None)#
Assign an integer index, cdap_rank, to each household member. (Starting with 1, not 0)
Modifies persons df in place
The cdap_rank order is important, because cdap only assigns activities to the first MAX_HHSIZE persons in each household.
This will preferentially be two working adults and the three youngest children.
Rank is assigned starting at 1. This necessitates some care in indexing, but is preferred as it follows the convention of 1-based pnums in expression files.
According to the documentation of reOrderPersonsForCdap in mtctm2.abm.ctramp HouseholdCoordinatedDailyActivityPatternModel:
“Method reorders the persons in the household for use with the CDAP model, which only explicitly models the interaction of five persons in a HH. Priority in the reordering is first given to full time workers (up to two), then to part time workers (up to two workers, of any type), then to children (youngest to oldest, up to three). If the method is called for a household with less than 5 people, the cdapPersonArray is the same as the person array.”
We diverge from the above description in that a cdap_rank is assigned to all persons, including ‘extra’ household members, whose activity is assigned subsequently. The pair _hh_id_, cdap_rank will uniquely identify each household member.
- Parameters
- personspandas.DataFrame
Table of persons data. Must contain columns _hh_size_, _hh_id_, _ptype_, _age_
- Returns
- cdap_rankpandas.Series
integer cdap_rank of every person, indexed on _persons_index_
- activitysim.abm.models.util.cdap.build_cdap_spec(interaction_coefficients, hhsize, trace_spec=False, trace_label=None, cache=True)#
Build a spec file for computing utilities of alternative household member interaction patterns for households of specified size.
We generate this spec automatically from a table of rules and coefficients because the interaction rules are fairly simple and can be expressed compactly, whereas there is a lot of redundancy between the spec files for different household sizes, as well as in the vectorized expression of the interaction alternatives within the spec file itself.
- interaction_coefficients has five columns:
- activity
A single character activity type name (M, N, or H)
- interaction_ptypes
List of ptypes in the interaction (in order of increasing ptype) or empty for wildcards (meaning that the interaction applies to all ptypes in that size hh)
- cardinality
the number of persons in the interaction (e.g. 3 for a 3-way interaction)
- slug
a human friendly efficient name so we can dump a readable spec trace file for debugging this slug is replaced with the numerical coefficient value after we dump the trace file
- coefficient
The coefficient to apply for all hh interactions for this activity and set of ptypes
The generated spec will have the eval expression in the index, and a utility column for each alternative (e.g. [‘HH’, ‘HM’, ‘HN’, ‘MH’, ‘MM’, ‘MN’, ‘NH’, ‘NM’, ‘NN’] for hhsize 2)
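The alternative set is simply every activity pattern string of length hhsize. A short sketch of that enumeration (an illustrative helper, not the ActivitySim spec builder):

```python
from itertools import product

def cdap_alternatives(hhsize, activities="HMN"):
    """Sketch: enumerate activity-pattern alternatives for a household of a
    given size, one letter (H, M, or N) per ranked person."""
    return ["".join(p) for p in product(sorted(activities), repeat=hhsize)]
```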
In order to be able to dump the spec in a human-friendly fashion to facilitate debugging the cdap_interaction_coefficients table, we first populate utility columns in the spec file with the coefficient slugs, dump the spec file, and then replace the slugs with coefficients.
- Parameters
- interaction_coefficientspandas.DataFrame
Rules and coefficients for generating interaction specs for different household sizes
- hhsizeint
household size for which the spec should be built.
- Returns
- spec: pandas.DataFrame
- activitysim.abm.models.util.cdap.extra_hh_member_choices(persons, cdap_fixed_relative_proportions, locals_d, trace_hh_id, trace_label)#
Generate the activity choices for the ‘extra’ household members who weren’t handled by cdap
Following the CTRAMP HouseholdCoordinatedDailyActivityPatternModel, “a separate, simple cross-sectional distribution is looked up for the remaining household members”
The cdap_fixed_relative_proportions spec is handled like an activitysim logit utility spec, EXCEPT that the values computed are relative proportions, not utilities (i.e. values are not exponentiated before being normalized to probabilities summing to 1.0)
- Parameters
- personspandas.DataFrame
- Table of persons data indexed on _persons_index_
We expect, at least, columns [_hh_id_, _ptype_]
- cdap_fixed_relative_proportions
spec to compute/specify the relative proportions of each activity (M, N, H) that should be used to choose activities for additional household members not handled by CDAP.
- locals_dDict
dictionary of local variables that eval_variables adds to the environment for an evaluation of an expression that begins with @
- Returns
- choicespandas.Series
list of alternatives chosen for all extra members, indexed by _persons_index_
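The distinction between the two normalizations can be shown side by side. This is a minimal sketch of the general idea, not the ActivitySim code:

```python
import numpy as np

def logit_probs(utils):
    """Standard logit: exponentiate utilities, then normalize."""
    e = np.exp(np.asarray(utils, dtype=float))
    return e / e.sum()

def relative_proportion_probs(values):
    """cdap_fixed_relative_proportions-style: the computed values are already
    relative proportions, so they are normalized without exponentiation."""
    v = np.asarray(values, dtype=float)
    return v / v.sum()
```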
- activitysim.abm.models.util.cdap.hh_choosers(indiv_utils, hhsize)#
Build a chooser table for calculating house utilities for all households of specified hhsize
The choosers table will have one row per household with columns containing the indiv_utils for all non-extra (i.e. cdap_rank <= MAX_HHSIZE) persons. That makes 3 columns for each individual, e.g. the utilities of the person with cdap_rank 1 will be included as M_p1, N_p1, H_p1
The chooser table will also contain interaction columns for all possible interactions involving from 2 to 3 persons (actually MAX_INTERACTION_CARDINALITY, which is currently 3).
The interaction columns list the ptypes of the persons in the interaction set, sorted by ptype. For instance, the interaction between persons with cdap_rank 1 and 3 will be listed in a column named ‘p1_p3’, and for a household where the ptypes of p1 and p3 are 2 and 4, that column will have a row value of 24.
- Parameters
- indiv_utilspandas.DataFrame
CDAP utilities for each individual, ignoring interactions. ind_utils has index of _persons_index_ and a column for each alternative i.e. three columns ‘M’ (Mandatory), ‘N’ (NonMandatory), ‘H’ (Home)
- hhsizeint
household size for which the choosers table should be built. Households with more than MAX_HHSIZE members will be included with MAX_HHSIZE choosers since they are handled the same, and the activities of the extra members are assigned afterwards
- Returns
- chooserspandas.DataFrame
choosers households of hhsize with activity utility columns interaction columns for all (non-extra) household members
- activitysim.abm.models.util.cdap.household_activity_choices(indiv_utils, interaction_coefficients, hhsize, trace_hh_id=None, trace_label=None)#
Calculate household utilities for each activity pattern alternative for households of hhsize. The resulting activity pattern for each household will be coded as a string of activity codes, e.g. ‘MNHH’ for a 4 person household with activities Mandatory, NonMandatory, Home, Home
- Parameters
- indiv_utilspandas.DataFrame
CDAP utilities for each individual, ignoring interactions ind_utils has index of _persons_index_ and a column for each alternative i.e. three columns ‘M’ (Mandatory), ‘N’ (NonMandatory), ‘H’ (Home)
- interaction_coefficientspandas.DataFrame
Rules and coefficients for generating interaction specs for different household sizes
- hhsizeint
the size of household for which the activity pattern should be calculated (1..MAX_HHSIZE)
- Returns
- choicespandas.Series
the chosen cdap activity pattern for each household represented as a string (e.g. ‘MNH’) with same index (_hh_index_) as utils
- activitysim.abm.models.util.cdap.individual_utilities(persons, cdap_indiv_spec, locals_d, trace_hh_id=None, trace_label=None)#
Calculate CDAP utilities for all individuals.
- Parameters
- personspandas.DataFrame
DataFrame of individual persons data.
- cdap_indiv_specpandas.DataFrame
CDAP spec applied to individuals.
- Returns
- utilitiespandas.DataFrame
Will have index of persons and columns for each of the alternatives. plus some ‘useful columns’ [_hh_id_, _ptype_, ‘cdap_rank’, _hh_size_]
- activitysim.abm.models.util.cdap.preprocess_interaction_coefficients(interaction_coefficients)#
The input cdap_interaction_coefficients.csv file has three columns:
- activity
A single character activity type name (M, N, or H)
- interaction_ptypes
List of ptypes in the interaction (in order of increasing ptype) Stars (***) instead of ptypes means the interaction applies to all ptypes in that size hh.
- coefficient
The coefficient to apply for all hh interactions for this activity and set of ptypes
To facilitate building the spec for a given hh size, we add two additional columns:
- cardinality
the number of persons in the interaction (e.g. 3 for a 3-way interaction)
- slug
a human friendly efficient name so we can dump a readable spec trace file for debugging this slug is then replaced with the numerical coefficient value prior to evaluation
- activitysim.abm.models.util.cdap.run_cdap(persons, person_type_map, cdap_indiv_spec, cdap_interaction_coefficients, cdap_fixed_relative_proportions, locals_d, chunk_size=0, trace_hh_id=None, trace_label=None)#
Choose individual activity patterns for persons.
- Parameters
- personspandas.DataFrame
Table of persons data. Must contain at least a household ID, household size, person type category, and age, plus any columns used in cdap_indiv_spec
- cdap_indiv_specpandas.DataFrame
CDAP spec for individuals without taking any interactions into account.
- cdap_interaction_coefficientspandas.DataFrame
Rules and coefficients for generating interaction specs for different household sizes
- cdap_fixed_relative_proportionspandas.DataFrame
Spec for the relative proportions of each activity (M, N, H) used to choose activities for additional household members not handled by CDAP
- locals_dDict
This is a dictionary of local variables that will be the environment for an evaluation of an expression that begins with @ in either the cdap_indiv_spec or cdap_fixed_relative_proportions expression files
- chunk_size: int
Chunk size or 0 for no chunking
- trace_hh_idint
hh_id to trace or None if no hh tracing
- trace_labelstr
label for tracing or None if no tracing
- Returns
- choicespandas.DataFrame
dataframe is indexed on _persons_index_ and has two columns:
- cdap_activitystr
activity for that person expressed as ‘M’, ‘N’, ‘H’
- activitysim.abm.models.util.cdap.unpack_cdap_indiv_activity_choices(persons, hh_choices, trace_hh_id, trace_label)#
Unpack the household activity choice list into choices for each (non-extra) household member
- Parameters
- personspandas.DataFrame
Table of persons data indexed on _persons_index_ We expect, at least, columns [_hh_id_, ‘cdap_rank’]
- hh_choicespandas.Series
household activity pattern is encoded as a string (of length hhsize) of activity codes e.g. ‘MNHH’ for a 4 person household with activities Mandatory, NonMandatory, Home, Home
- Returns
- cdap_indiv_activity_choicespandas.Series
series contains one activity per individual hh member, indexed on _persons_index_
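Unpacking the pattern string is a matter of matching each 1-based cdap_rank to its position in the string. A minimal sketch (illustrative helper, not the ActivitySim implementation):

```python
def unpack_hh_pattern(pattern, cdap_ranks):
    """Sketch: split a household pattern string like 'MNHH' into one activity
    per member, keyed by cdap_rank (1-based)."""
    return {rank: pattern[rank - 1] for rank in cdap_ranks}
```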
Estimation#
See Estimation for more information.
Logsums#
- activitysim.abm.models.util.logsums.compute_logsums(choosers, tour_purpose, logsum_settings, model_settings, network_los, chunk_size, chunk_tag, trace_label, in_period_col=None, out_period_col=None, duration_col=None)#
- Parameters
- choosers
- tour_purpose
- logsum_settings
- model_settings
- network_los
- chunk_size
- trace_hh_id
- trace_label
- Returns
- logsums: pandas series
computed logsums with same index as choosers
Mode#
- activitysim.abm.models.util.mode.mode_choice_simulate(choosers, spec, nest_spec, skims, locals_d, chunk_size, mode_column_name, logsum_column_name, trace_label, trace_choice_name, trace_column_names=None, estimator=None)#
common method for both tour_mode_choice and trip_mode_choice
- Parameters
- choosers
- spec
- nest_spec
- skims
- locals_d
- chunk_size
- mode_column_name
- logsum_column_name
- trace_label
- trace_choice_name
- estimator
- Returns
- activitysim.abm.models.util.mode.run_tour_mode_choice_simulate(choosers, tour_purpose, model_settings, mode_column_name, logsum_column_name, network_los, skims, constants, estimator, chunk_size, trace_label=None, trace_choice_name=None)#
This is a utility to run a mode choice model for each segment (usually segments are tour/trip purposes). Pass in the tours/trips that need a mode, the Skim object, the spec to evaluate with, and any additional expressions you want to use in the evaluation of variables.
Overlap#
- activitysim.abm.models.util.overlap.p2p_time_window_overlap(p1_ids, p2_ids)#
- Parameters
- p1_ids
- p2_ids
- Returns
- activitysim.abm.models.util.overlap.rle(a)#
Compute run lengths of values in rows of a two dimensional ndarray of ints.
We assume the first and last columns are buffer columns (because this is the case for time windows) and so don’t include them in results.
Return arrays giving row_id, start_pos, run_length, and value of each run of any length.
- Parameters
- anumpy.ndarray of int shape(n, <num_time_periods_in_a_day>)
The input array would normally only have values of 0 or 1 to detect overlapping time period availability but we don’t assume this, and will detect and report runs of any values. (Might prove useful in future?…)
- Returns
- row_idnumpy.ndarray int shape(<num_runs>)
- start_posnumpy.ndarray int shape(<num_runs>)
- run_lengthnumpy.ndarray int shape(<num_runs>)
- run_valnumpy.ndarray int shape(<num_runs>)
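The run-length encoding described above can be sketched with numpy as follows. This is an illustrative reimplementation of the idea, not the ActivitySim source (which is vectorized across rows):

```python
import numpy as np

def rle_rows(a):
    """Sketch: run-length encode the rows of a 2-D int array, dropping the
    first and last (buffer) columns. Returns parallel arrays row_id,
    start_pos, run_length, run_val, one entry per run."""
    a = np.asarray(a)[:, 1:-1]  # strip buffer columns
    row_ids, starts, lengths, vals = [], [], [], []
    for r, row in enumerate(a):
        # positions where the value changes mark run boundaries
        change = np.flatnonzero(np.diff(row)) + 1
        bounds = np.concatenate(([0], change, [row.size]))
        for s, e in zip(bounds[:-1], bounds[1:]):
            row_ids.append(r)
            starts.append(s)
            lengths.append(e - s)
            vals.append(row[s])
    return (np.array(row_ids), np.array(starts),
            np.array(lengths), np.array(vals))
```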
Tour Destination#
- class activitysim.abm.models.util.tour_destination.SizeTermCalculator(size_term_selector)#
convenience object to provide size_terms for a selector (e.g. non_mandatory) for various segments (e.g. tour_type or purpose); returns size terms for the specified segment in df or series form
- activitysim.abm.models.util.tour_destination.choose_MAZ_for_TAZ(taz_sample, MAZ_size_terms, trace_label)#
Convert taz_sample table with TAZ zone sample choices to a table with a MAZ zone chosen for each TAZ choose MAZ probabilistically (proportionally by size_term) from set of MAZ zones in parent TAZ
- Parameters
- taz_sample: dataframe with duplicated index <chooser_id_col> and columns: <DEST_TAZ>, prob, pick_count
- MAZ_size_terms: dataframe with duplicated index <chooser_id_col> and columns: zone_id, dest_TAZ, size_term
- Returns
- dataframe with duplicated index <chooser_id_col> and columns: <DEST_MAZ>, prob, pick_count
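The probabilistic MAZ draw within a chosen TAZ can be sketched as weighted sampling by size_term. The data layout below (a taz -> {maz: size_term} mapping) is a simplification for illustration, not the actual table structure:

```python
import numpy as np

def choose_maz_for_taz(dest_taz, maz_size, rng=None):
    """Sketch: for a chosen TAZ, pick one of its nested MAZs with probability
    proportional to size_term. maz_size maps taz -> {maz: size_term}."""
    if rng is None:
        rng = np.random.default_rng(42)
    candidates = maz_size[dest_taz]
    mazs = sorted(candidates)
    sizes = np.array([candidates[m] for m in mazs], dtype=float)
    return int(rng.choice(mazs, p=sizes / sizes.sum()))
```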
- activitysim.abm.models.util.tour_destination.run_destination_logsums(tour_purpose, persons_merged, destination_sample, model_settings, network_los, chunk_size, trace_label)#
add logsum column to existing tour_destination_sample table
logsum is calculated by running the mode_choice model for each sample (person, dest_zone_id) pair in destination_sample, and computing the logsum of all the utilities
person_id   dest_zone_id   rand             pick_count   logsum (added)
23750       14             0.565502716034   4            1.85659498857
23750       16             0.711135838871   6            1.92315598631
…
23751       12             0.408038878552   1            2.40612135416
23751       14             0.972732479292   2            1.44009018355
- activitysim.abm.models.util.tour_destination.run_destination_simulate(spec_segment_name, tours, persons_merged, destination_sample, want_logsums, model_settings, network_los, destination_size_terms, estimator, chunk_size, trace_label, skip_choice=False)#
run destination_simulate on tour_destination_sample annotated with mode_choice logsum to select a destination from sample alternatives
Tour Frequency#
- activitysim.abm.models.util.tour_frequency.create_tours(tour_counts, tour_category, parent_col='person_id')#
This method processes the tour_frequency column that comes out of the model of the same name and turns it into a DataFrame that represents the tours that were generated
- Parameters
- tour_counts: DataFrame
table specifying how many tours of each type to create one row per person (or parent_tour for atwork subtours) one (int) column per tour_type, with number of tours to create
- tour_categorystr
one of ‘mandatory’, ‘non_mandatory’, ‘atwork’, or ‘joint’
- Returns
- tourspandas.DataFrame
An example of a tours DataFrame is supplied as a comment in the source code - it has an index which is a unique tour identifier, a person_id column, and a tour type column which comes from the column names of the alternatives DataFrame supplied above.
tours.tour_type - tour type (e.g. school, work, shopping, eat)
tours.tour_type_num - if there are two ‘school’ type tours, they will be numbered 1 and 2
tours.tour_type_count - number of tours of tour_type the parent has (the parent’s max tour_type_num)
tours.tour_num - index of the tour (of any type) for the parent
tours.tour_count - number of tours of any type for the parent (the parent’s max tour_num)
tours.tour_category - one of ‘mandatory’, ‘non_mandatory’, ‘atwork’, or ‘joint’
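The expansion from tour_counts to one row per tour can be sketched as follows (an illustrative re-implementation, not the actual create_tours code; the person ids and tour types are made up):

```python
import pandas as pd

def expand_tour_counts(tour_counts, tour_category, parent_col="person_id"):
    # one row per person in, one row per generated tour out, with
    # tour_type_num numbering repeated tours of the same type
    rows = []
    for parent_id, counts in tour_counts.iterrows():
        for tour_type, n in counts.items():
            for tour_type_num in range(1, int(n) + 1):
                rows.append({parent_col: parent_id,
                             "tour_type": tour_type,
                             "tour_type_num": tour_type_num,
                             "tour_category": tour_category})
    return pd.DataFrame(rows)

# hypothetical tour_counts: one (int) column per tour_type
tour_counts = pd.DataFrame({"work": [1, 0], "shopping": [2, 1]},
                           index=pd.Index([100, 101], name="person_id"))
tours = expand_tour_counts(tour_counts, "non_mandatory")
```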
- activitysim.abm.models.util.tour_frequency.process_atwork_subtours(work_tours, atwork_subtour_frequency_alts)#
This method processes the atwork_subtour_frequency column that comes out of the model of the same name and turns it into a DataFrame that represents the subtours that were generated
- Parameters
- work_tours: DataFrame
A DataFrame which has the parent work tour id as the index and columns for person_id and atwork_subtour_frequency.
- atwork_subtour_frequency_alts: DataFrame
A DataFrame with a unique index of atwork_subtour_frequency values and frequency counts for the subtours to be generated for that choice
- Returns
- tours: DataFrame
An example of a tours DataFrame is supplied as a comment in the source code - it has an index which is a unique tour identifier, a person_id column, and a tour type column which comes from the column names of the alternatives DataFrame supplied above.
- activitysim.abm.models.util.tour_frequency.process_joint_tours(joint_tour_frequency, joint_tour_frequency_alts, point_persons)#
This method processes the joint_tour_frequency column that comes out of the model of the same name and turns it into a DataFrame that represents the joint tours that were generated
- Parameters
- joint_tour_frequency: pandas.Series
household joint_tour_frequency (which came out of the joint tour frequency model) indexed by household_id
- joint_tour_frequency_alts: DataFrame
A DataFrame with a unique index of joint_tour_frequency values and frequency counts for the tours to be generated for that choice
- point_persons: pandas.DataFrame
table with columns for (at least) person_ids and home_zone_id indexed by household_id
- Returns
- tours: DataFrame
An example of a tours DataFrame is supplied as a comment in the source code - it has an index which is a tour identifier, a household_id column, a tour_type column, and tour_type_num and tour_num columns which are set to 1 or 2 depending on whether it is the first or second joint tour made by the household.
- activitysim.abm.models.util.tour_frequency.process_mandatory_tours(persons, mandatory_tour_frequency_alts)#
This method processes the mandatory_tour_frequency column that comes out of the model of the same name and turns it into a DataFrame that represents the mandatory tours that were generated
- Parameters
- persons: DataFrame
Persons is a DataFrame which has a column called mandatory_tour_frequency (which came out of the mandatory tour frequency model) and a column is_worker which indicates the person’s worker status. The only valid values of the mandatory_tour_frequency column are “work1”, “work2”, “school1”, “school2” and “work_and_school”.
- Returns
- tours: DataFrame
An example of a tours DataFrame is supplied as a comment in the source code - it has an index which is a tour identifier, a person_id column, a tour_type column which is “work” or “school” and a tour_num column which is set to 1 or 2 depending on whether it is the first or second mandatory tour made by the person. The logic for whether the work or school tour comes first given a “work_and_school” choice depends on the is_worker column: work tours come first for workers, second for non-workers
- activitysim.abm.models.util.tour_frequency.process_non_mandatory_tours(persons, tour_counts)#
This method processes the non_mandatory_tour_frequency column that comes out of the model of the same name and turns it into a DataFrame that represents the non-mandatory tours that were generated
- Parameters
- persons: pandas.DataFrame
persons table containing a non_mandatory_tour_frequency column which has the index of the chosen alternative as the value
- non_mandatory_tour_frequency_alts: DataFrame
A DataFrame with a unique index that relates to the values in the series above. It typically includes columns named for trip purposes, with values that are counts for that trip purpose. Example trip purposes include escort, shopping, othmaint, othdiscr, eatout, social, etc. A row is an alternative, which might be, for example, one shopping trip and zero trips of other purposes.
- Returns
- tours: DataFrame
An example of a tours DataFrame is supplied as a comment in the source code - it has an index which is a unique tour identifier, a person_id column, and a tour type column which comes from the column names of the alternatives DataFrame supplied above.
- activitysim.abm.models.util.tour_frequency.process_tours(tour_frequency, tour_frequency_alts, tour_category, parent_col='person_id')#
This method processes the tour_frequency column that comes out of the model of the same name and turns it into a DataFrame that represents the tours that were generated
- Parameters
- tour_frequency: Series
A series which has <parent_col> as the index and the chosen alternative index as the value
- tour_frequency_alts: DataFrame
A DataFrame with a unique index that relates to the values in the series above. It typically includes columns named for trip purposes, with values that are counts for that trip purpose. Example trip purposes include escort, shopping, othmaint, othdiscr, eatout, social, etc. A row is an alternative, which might be, for example, one shopping trip and zero trips of other purposes.
- tour_category: str
one of ‘mandatory’, ‘non_mandatory’, ‘atwork’, or ‘joint’
- parent_col: str
the name of the index (parent_tour_id for atwork subtours, otherwise person_id)
- Returns
- tours: pandas.DataFrame
An example of a tours DataFrame is supplied as a comment in the source code - it has an index which is a unique tour identifier, a person_id column, and a tour type column which comes from the column names of the alternatives DataFrame supplied above.
tours.tour_type - tour type (e.g. school, work, shopping, eat)
tours.tour_type_num - if there are two ‘school’ type tours, they will be numbered 1 and 2
tours.tour_type_count - number of tours of tour_type the parent has (the parent’s max tour_type_num)
tours.tour_num - index of the tour (of any type) for the parent
tours.tour_count - number of tours of any type for the parent (the parent’s max tour_num)
tours.tour_category - one of ‘mandatory’, ‘non_mandatory’, ‘atwork’, or ‘joint’
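The lookup from a chosen alternative index to per-purpose tour counts can be sketched like this (the alternatives table, purposes, and ids are hypothetical):

```python
import pandas as pd

# hypothetical alternatives table: index is the alternative id,
# columns are per-purpose tour counts for that alternative
alts = pd.DataFrame({"shopping": [0, 1, 2], "eatout": [1, 0, 1]},
                    index=pd.Index([0, 1, 2], name="alt_id"))

# chosen alternative index per person (output of a frequency model)
tour_frequency = pd.Series([1, 2],
                           index=pd.Index([100, 101], name="person_id"))

# look up the per-purpose counts for each person's chosen alternative
tour_counts = alts.loc[tour_frequency].set_index(tour_frequency.index)
```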
Trip#
- activitysim.abm.models.util.trip.cleanup_failed_trips(trips)#
drop failed trips and cleanup fields in leg_mates:
trip_num - assign new ordinal trip num after failed trips are dropped
trip_count - assign new count of trips in leg, sans failed trips
first - update the first flag, as we may have dropped the first trip (the last trip can’t fail)
next_trip_id - assign the id of the next trip in the leg after failed trips are dropped
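The renumbering described above can be sketched with pandas groupby operations (an illustrative stand-in, not the actual cleanup_failed_trips implementation; the leg_id/failed columns are assumed):

```python
import pandas as pd

# hypothetical trips on one leg; the second trip failed and must be dropped
trips = pd.DataFrame({
    "leg_id": [1, 1, 1],
    "trip_num": [1, 2, 3],
    "failed": [False, True, False],
})

# drop failed trips, then renumber trip_num / trip_count within each leg
ok = trips[~trips.failed].copy()
ok["trip_num"] = ok.groupby("leg_id").cumcount() + 1
ok["trip_count"] = ok.groupby("leg_id")["trip_num"].transform("max")
ok["first"] = ok.trip_num == 1
```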
- activitysim.abm.models.util.trip.flag_failed_trip_leg_mates(trips_df, col_name)#
set boolean flag column of specified name to identify failed trip leg_mates in place
- activitysim.abm.models.util.trip.generate_alternative_sizes(max_duration, max_trips)#
Builds a lookup NumPy array of pattern sizes based on the number of trips in the leg and the duration available to the leg.
- activitysim.abm.models.util.trip.get_time_windows(residual, level)#
- Parameters
residual –
level –
- Returns
- activitysim.abm.models.util.trip.initialize_from_tours(tours, stop_frequency_alts, addtl_tour_cols_to_preserve=None)#
Instantiates a trips table based on tour-level attributes: stop frequency, tour origin, tour destination.
Vectorize Tour Scheduling#
- activitysim.abm.models.util.vectorize_tour_scheduling.compute_logsums(alt_tdd, tours_merged, tour_purpose, model_settings, skims, trace_label)#
Compute logsums for the tour alt_tdds, which will differ based on their different start and stop times of day, which translate to different odt_skim out_period and in_period values.
In mtctm1, tdds are hourly, but there are only 5 skim time periods, so some of the tdd_alts will be the same once converted to skim time periods. With 5 skim time periods there are 15 unique out-in period pairs but 190 tdd alternatives.
For efficiency, rather than compute a lot of redundant logsums, we compute logsums for the unique (out-period, in-period) pairs and then join them back to the alt_tdds.
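The deduplication strategy can be sketched as follows (the period labels and the stand-in logsum values are hypothetical; the real computation runs the mode choice model for each unique pair):

```python
import pandas as pd
import numpy as np

# hypothetical tdd alternatives with hourly times already converted
# to skim time periods
alt_tdd = pd.DataFrame({
    "out_period": ["AM", "AM", "MD", "AM"],
    "in_period":  ["MD", "MD", "PM", "PM"],
})

# compute logsums only for the unique (out_period, in_period) pairs ...
unique_pairs = alt_tdd.drop_duplicates(["out_period", "in_period"]).copy()
# stand-in for the (expensive) mode choice logsum computation
unique_pairs["logsum"] = np.arange(len(unique_pairs), dtype=float)

# ... then join them back to every alternative
alt_tdd = alt_tdd.merge(unique_pairs, on=["out_period", "in_period"],
                        how="left")
```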
- activitysim.abm.models.util.vectorize_tour_scheduling.get_previous_tour_by_tourid(current_tour_window_ids, previous_tour_by_window_id, alts)#
Matches current tours with attributes of previous tours for the same person. See the return value below for more information.
- Parameters
- current_tour_window_ids: Series
A Series of parent ids for the tours we’re about to make the choice for - index should match the tours DataFrame.
- previous_tour_by_window_id: Series
A Series where the index is the parent (window) id and the value is the index of the alternatives of the scheduling.
- alts: DataFrame
The alternatives of the scheduling.
- Returns
- prev_alts: DataFrame
A DataFrame with an index matching the CURRENT tours we’re making a decision for, but with columns from the PREVIOUS tour of the person associated with each of the CURRENT tours. Columns listed in PREV_TOUR_COLUMNS from the alternatives will have “_previous” added as a suffix to keep them differentiated from the current alternatives that will be part of the interaction.
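A sketch of the previous-tour lookup and “_previous” suffixing (illustrative only; the ids and alternatives are made up):

```python
import pandas as pd

# hypothetical scheduling alternatives (index is the tdd alt id)
alts = pd.DataFrame({"start": [7, 9], "end": [17, 12]},
                    index=pd.Index([0, 1], name="alt_id"))

# previous chosen alt per person, and the person for each current tour
previous_tour_by_window_id = pd.Series({100: 0, 101: 1})
current_tour_window_ids = pd.Series([100, 101],
                                    index=pd.Index([5, 6], name="tour_id"))

# look up each current tour's person's previous alternative,
# then suffix the columns and reindex to the current tours
prev_alt_ids = previous_tour_by_window_id.loc[current_tour_window_ids].values
prev_alts = alts.loc[prev_alt_ids].add_suffix("_previous")
prev_alts.index = current_tour_window_ids.index
```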
- activitysim.abm.models.util.vectorize_tour_scheduling.run_alts_preprocessor(model_settings, alts, segment, locals_dict, trace_label)#
run preprocessor on alts, as specified by ALTS_PREPROCESSOR in model_settings
we are agnostic on whether alts are merged or not
- Parameters
- model_settings: dict
yaml model settings file as dict
- alts: pandas.DataFrame
tdd_alts or tdd_alts merged with choosers (we are agnostic)
- segment: string
segment selector as understood by caller (e.g. logsum_tour_purpose)
- locals_dict: dict
we let the caller decide what needs to be in it, depending on the modeler’s needs
- trace_label: string
- Returns
- alts: pandas.DataFrame
annotated copy of alts
- activitysim.abm.models.util.vectorize_tour_scheduling.schedule_tours(tours, persons_merged, alts, spec, logsum_tour_purpose, model_settings, timetable, timetable_window_id_col, previous_tour, tour_owner_id_col, estimator, chunk_size, tour_trace_label, tour_chunk_tag, sharrow_skip=False)#
chunking wrapper for _schedule_tours
While interaction_sample_simulate provides chunking support, the merged tours and persons dataframe and the tdd_interaction_dataset are very big, so we want to create them inside the chunking loop to minimize the memory footprint. So we implement the chunking loop here, and pass a chunk_size of 0 to interaction_sample_simulate to disable its chunking support.
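The chunking pattern described above can be sketched generically (schedule_in_chunks and score_fn are hypothetical stand-ins, not ActivitySim APIs):

```python
import pandas as pd

def schedule_in_chunks(tours, alts, chunk_size, score_fn):
    # build the (big) tours-x-alts interaction table inside the loop,
    # so only one chunk's worth of it exists in memory at a time
    results = []
    for start in range(0, len(tours), chunk_size):
        chunk = tours.iloc[start:start + chunk_size]
        # cross-join this chunk with all alternatives (the memory-heavy step)
        interaction = chunk.merge(alts, how="cross")
        results.append(score_fn(interaction))
    return pd.concat(results, ignore_index=True)

# hypothetical tours and alternatives
tours = pd.DataFrame({"tour_id": range(5)})
alts = pd.DataFrame({"tdd": [0, 1]})
out = schedule_in_chunks(tours, alts, chunk_size=2, score_fn=lambda df: df)
```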
- activitysim.abm.models.util.vectorize_tour_scheduling.tdd_interaction_dataset(tours, alts, timetable, choice_column, window_id_col, trace_label)#
interaction_sample_simulate expects the alts index to match the choosers index (e.g. tour_id).
- Parameters
- tours: pandas.DataFrame
must have person_id column and index on tour_id
- alts: pandas.DataFrame
alts index must be timetable tdd id
- timetable: TimeTable object
- choice_column: str
name of the column to store the alt index in the alt_tdd DataFrame (since alt_tdd has a duplicated index on person_id but is unique on (person_id, alt_id))
- Returns
- alt_tdd: pandas.DataFrame
columns: start, end, duration, <choice_column>; index: tour_id
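Building an interaction table with a duplicated chooser index, as described above, can be sketched like this (illustrative data; the real function also applies timetable availability):

```python
import pandas as pd

# hypothetical tours and tdd alternatives
tours = pd.DataFrame(index=pd.Index([5, 6], name="tour_id"))
alts = pd.DataFrame({"start": [7, 9], "end": [17, 12], "duration": [10, 3]},
                    index=pd.Index([0, 1], name="tdd"))

# replicate alts once per tour: the result has a duplicated tour_id index,
# with the original alt id preserved in a choice column
alt_tdd = pd.concat([alts] * len(tours), keys=tours.index,
                    names=["tour_id", "tdd"])
alt_tdd = alt_tdd.reset_index("tdd").rename(columns={"tdd": "tdd_choice"})
```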
- activitysim.abm.models.util.vectorize_tour_scheduling.vectorize_joint_tour_scheduling(joint_tours, joint_tour_participants, persons_merged, alts, persons_timetable, spec, model_settings, estimator, chunk_size=0, trace_label=None, sharrow_skip=False)#
Like vectorize_tour_scheduling but specifically for joint tours
joint tours have a few peculiarities necessitating separate treatment:
Timetable has to be initialized to set all timeperiods…
- Parameters
- tours: DataFrame
DataFrame of tours containing tour attributes, as well as a person_id column to define the nth tour for each person.
- persons_merged: DataFrame
DataFrame of persons containing attributes referenced by expressions in spec
- alts: DataFrame
DataFrame of alternatives which represent time slots. Will be passed to interaction_simulate in batches for each nth tour.
- spec: DataFrame
The spec which will be passed to interaction_simulate. (or dict of specs keyed on tour_type if tour_types is not None)
- model_settings: dict
- Returns
- choices: Series
A Series of choices where the index is the index of the tours DataFrame and the values are the index of the alts DataFrame.
- persons_timetable: TimeTable
timetable updated with joint tours (caller should replace_table for it to persist)
- activitysim.abm.models.util.vectorize_tour_scheduling.vectorize_subtour_scheduling(parent_tours, subtours, persons_merged, alts, spec, model_settings, estimator, chunk_size=0, trace_label=None, sharrow_skip=False)#
Like vectorize_tour_scheduling but specifically for atwork subtours
subtours have a few peculiarities necessitating separate treatment:
Timetable has to be initialized to set all timeperiods outside the parent tour footprint as unavailable, so atwork subtour timewindows are limited to the footprint of the parent work tour. The parent_tour_id column of tours is used instead of parent_id as the timetable row_id.
- Parameters
- parent_tours: DataFrame
parent tours of the subtours (because we need to know the tdd of the parent tour to assign_subtour_mask of the timetable), indexed by parent_tour_id
- subtours: DataFrame
atwork subtours to schedule
- persons_merged: DataFrame
DataFrame of persons containing attributes referenced by expressions in spec
- alts: DataFrame
DataFrame of alternatives which represent time slots. Will be passed to interaction_simulate in batches for each nth tour.
- spec: DataFrame
The spec which will be passed to interaction_simulate. (all subtours share same spec regardless of subtour type)
- model_settings: dict
- chunk_size
- trace_label
- Returns
- choices: Series
A Series of choices where the index is the index of the subtours DataFrame and the values are the index of the alts DataFrame.
- activitysim.abm.models.util.vectorize_tour_scheduling.vectorize_tour_scheduling(tours, persons_merged, alts, timetable, tour_segments, tour_segment_col, model_settings, chunk_size=0, trace_label=None)#
The purpose of this method is fairly straightforward - it takes tours and schedules them into time slots. Alternatives should be specified so as to define those time slots (usually with start and end times).
schedule_tours adds variables that can be used in the spec which have to do with the previous tours per person. Every column in the alternatives table is appended with the suffix “_previous” and made available. So if your alternatives table has columns for start and end, then start_previous and end_previous will be set to the start and end of the most recent tour for a person. The first time through, start_previous and end_previous are undefined, so make sure to protect with a tour_num >= 2 in the variable computation.
FIXME - fix docstring: tour_segments, tour_segment_col
- Parameters
- tours: DataFrame
DataFrame of tours containing tour attributes, as well as a person_id column to define the nth tour for each person.
- persons_merged: DataFrame
DataFrame of persons containing attributes referenced by expressions in spec
- alts: DataFrame
DataFrame of alternatives which represent time slots. Will be passed to interaction_simulate in batches for each nth tour.
- spec: DataFrame
The spec which will be passed to interaction_simulate. (or dict of specs keyed on tour_type if tour_types is not None)
- model_settings: dict
- Returns
- choices: Series
A Series of choices where the index is the index of the tours DataFrame and the values are the index of the alts DataFrame.
- timetable: TimeTable
persons timetable updated with tours (caller should replace_table for it to persist)
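The tour_num >= 2 guard for the “_previous” columns described in vectorize_tour_scheduling can be sketched as a spec-style expression (the column values and the 6-hour threshold are hypothetical):

```python
import pandas as pd
import numpy as np

# hypothetical choosers merged with alternatives: the first tour per person
# has no previous tour, so start_previous is NaN there
df = pd.DataFrame({
    "tour_num": [1, 2],
    "start": [9, 14],
    "start_previous": [np.nan, 9.0],
})

# guard the expression with tour_num >= 2 so the first tour never
# depends on the undefined *_previous columns
df["adjacent_to_previous"] = np.where(
    df.tour_num >= 2, (df.start - df.start_previous) <= 6, False)
```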
Tests#
See activitysim.abm.test
and activitysim.abm.models.util.test