Settings#
- settings activitysim.core.configuration.Settings#
The overall settings for the ActivitySim model system.
The input for these settings is typically stored in one main YAML file, usually called settings.yaml.
Note that this implementation is presently used only for generating documentation, but future work may migrate the settings implementation to actually use this pydantic code to validate the settings before running the model.
- Config:
extra: str = allow
validate_assignment: bool = True
- Fields:
benchmarking (bool)
check_for_variability (bool)
checkpoint_format (Literal['hdf', 'parquet'])
checkpoints (bool | list)
chunk_method (Literal['bytes', 'uss', 'hybrid_uss', 'rss', 'hybrid_rss'])
chunk_size (int)
chunk_training_mode (Literal['disabled', 'training', 'production', 'adaptive', 'explicit'])
cleanup_pipeline_after_run (bool)
cleanup_trace_files_on_resume (bool)
create_input_store (bool)
default_initial_rows_per_chunk (int)
disable_destination_sampling (bool)
disable_zarr (bool)
downcast_float (bool)
downcast_int (bool)
duplicate_step_execution (Literal['error', 'allow'])
fail_fast (bool)
hh_ids (pathlib.Path)
households_sample_size (int)
inherit_settings (bool | pathlib.Path)
input_store (str)
input_table_list (list[activitysim.core.configuration.top.InputTable])
instrument (bool)
keep_chunk_logs (bool)
keep_mem_logs (bool)
log_alt_losers (bool)
log_settings (tuple[str])
memory_profile (bool)
min_available_chunk_ratio (float)
models (list[str])
multiprocess (bool)
multiprocess_steps (list[activitysim.core.configuration.top.MultiprocessStep])
num_processes (int)
offset_preprocessing (bool)
omx_ignore_patterns (list[str])
other_settings (dict[str, Any])
output_tables (activitysim.core.configuration.top.OutputTables)
pipeline_complib (str)
recode_pipeline_columns (bool)
resume_after (str | None)
rng_base_seed (int | None)
rotate_logs (bool)
sharrow (bool | str)
source_file_paths (list[pathlib.Path])
store_skims_in_shm (bool)
testing_fail_trip_destination (bool)
trace_hh_id (int | None)
trace_od (tuple[int, int] | None)
treat_warnings_as_errors (bool)
use_shadow_pricing (bool)
want_dest_choice_presampling (bool)
want_dest_choice_sample_tables (bool)
write_raw_tables (bool)
- Validators:
_check_store_skims_in_shm
»all fields
- field benchmarking: bool = False#
Flag this model run as a benchmarking run.
Added in version 1.1.
This is generally a developer-only feature and not needed for regular usage of ActivitySim.
By flagging a model run as a benchmark, certain operations of the model are altered, to ensure valid benchmark readings. For example, in regular operation, data such as skims are loaded on-demand within the first model component that needs them. With benchmarking enabled, all data are always pre-loaded before any component is run, to ensure that recorded times are the runtime of the component itself, and not data I/O operations that are neither integral to that component nor necessarily stable over replication.
- Validated by:
_check_store_skims_in_shm
- field check_for_variability: bool = False#
Debugging feature to find broken model specifications.
Enabling this check does not alter valid results but slows down model runs.
- Validated by:
_check_store_skims_in_shm
- field checkpoint_format: Literal['hdf', 'parquet'] = 'parquet'#
Storage format to use when saving checkpoint files.
- Validated by:
_check_store_skims_in_shm
- field checkpoints: bool | list = True#
When to write checkpoints (intermediate table states) to disk.
If True, checkpoints are written at each step. If False, no intermediate checkpoints will be written before the end of run. Or, provide an explicit list of models to checkpoint.
- Validated by:
_check_store_skims_in_shm
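The three accepted forms of checkpoints (True, False, or a list of model names) can be read as a per-step decision. The following is a minimal sketch of that reading, not ActivitySim's implementation: the helper name `should_checkpoint` and its `is_final_step` argument are hypothetical, and it assumes that the end-of-run state is always written.

```python
def should_checkpoint(step_name, checkpoints, is_final_step=False):
    """Hypothetical helper: decide whether to write a checkpoint after a step."""
    if is_final_step:
        # Assumption: the state at the end of the run is always written.
        return True
    if isinstance(checkpoints, bool):
        # True: checkpoint every step. False: no intermediate checkpoints.
        return checkpoints
    # Otherwise, checkpoints is an explicit list of model names.
    return step_name in checkpoints
```

For example, `should_checkpoint("auto_ownership", ["auto_ownership"])` is True, while the same call with `checkpoints=False` is False unless the step is the final one.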
- field chunk_method: Literal['bytes', 'uss', 'hybrid_uss', 'rss', 'hybrid_rss'] = 'hybrid_uss'#
Memory use measure to use for chunking.
The following methods are supported to calculate memory overhead when chunking is enabled:
- “bytes”
Expected rowsize based on actual size (as reported by numpy and pandas) of explicitly allocated data. This can underestimate overhead due to transient data requirements of operations (e.g. merge, sort, transpose).
- “uss”
Expected rowsize based on change in unique set size (uss), both as a result of explicit data allocation and from readings by the MemMonitor sniffer thread that measures transient uss during time-consuming numpy and pandas operations.
- “hybrid_uss”
hybrid_uss avoids problems with pure uss, especially with small chunk sizes (e.g. initial training chunks) as numpy may recycle cached blocks and show no increase in uss even though data was allocated and logged.
- “rss”
Like uss, but for resident set size (rss), which is the portion of memory occupied by a process that is held in RAM.
- “hybrid_rss”
Like hybrid_uss, but for rss.
RSS is reported by psutil.Process.memory_info() and USS is reported by psutil.Process.memory_full_info(). USS is the memory which is private to a process and which would be freed if the process were terminated. This is the metric that most closely matches the rather vague notion of memory “in use” (the meaning of which is difficult to pin down in operating systems with virtual memory, where memory can sometimes, but not always, be swapped or mapped to disk). Previous testing found that hybrid_uss performs best and is most reliable, and it is therefore the default.
For more, see Chunk.
- Validated by:
_check_store_skims_in_shm
- field chunk_size: int = 0#
Approximate amount of RAM to allocate to ActivitySim for batch processing.
See Chunk for more details.
- Validated by:
_check_store_skims_in_shm
- field chunk_training_mode: Literal['disabled', 'training', 'production', 'adaptive', 'explicit'] = 'disabled'#
The method to use for chunk training.
- “disabled”
All chunking is disabled. If you have enough RAM, this is the fastest mode, but it can require a lot of RAM.
- “training”
The model is run in training mode, which tracks the amount of memory used by each table by submodel and writes the results to a cache file that is then re-used for production runs. This mode is significantly slower than production mode since it does significantly more memory inspection.
- “production”
The model is run in production mode, using the cache file created in training mode. If no such file is found, the model falls back to training mode. This mode is significantly faster than training mode, as it uses the cached memory inspection results to determine chunk sizes.
- “adaptive”
Like production mode, any existing cache file is used to determine the starting chunk settings, but the model also updates the cache settings based on additional memory inspection. This may additionally improve the cache settings to reduce runtimes when run in production mode, but at the cost of some slowdown during the run to accommodate extra memory inspection.
- “explicit”
The model is run without memory inspection, and the chunk cache file is not used, even if it exists. Instead, the chunk size settings are explicitly set in the settings file of each compatible model step. Only those steps that have an “explicit_chunk” setting are chunkable with this mode; all other steps are run without chunking.
See Chunk for more details.
- Validated by:
_check_store_skims_in_shm
- field cleanup_pipeline_after_run: bool = False#
Cleans up pipeline after successful run.
This will clean up pipeline only after successful runs, by creating a single-checkpoint pipeline file, and deleting any subprocess pipelines.
- Validated by:
_check_store_skims_in_shm
- field cleanup_trace_files_on_resume: bool = False#
Clean all trace files when restarting a model from a checkpoint.
- Validated by:
_check_store_skims_in_shm
- field create_input_store: bool = False#
Write the inputs as read in back to an HDF5 store.
If enabled, this writes the store to the outputs folder to use for subsequent model runs, as reading HDF5 can be faster than reading CSV files.
- Validated by:
_check_store_skims_in_shm
- field default_initial_rows_per_chunk: int = 100#
Default number of rows to use in initial chunking.
- Validated by:
_check_store_skims_in_shm
- field disable_zarr: bool = False#
Disable the use of zarr format skims.
Added in version 1.2.
By default, if sharrow is enabled (any setting other than false), ActivitySim currently loads data from zarr format skims if a zarr location is provided, and data is found there. If no data is found there, then original OMX skim data is loaded, any transformations or encodings are applied, and then this data is written out to a zarr file at that location. Setting this option to True will disable the use of zarr.
- Validated by:
_check_store_skims_in_shm
- field downcast_float: bool = False#
Automatically downcast float variables.
Use of this setting should be tested by the region to confirm result consistency.
Added in version 1.3.
- Validated by:
_check_store_skims_in_shm
- field downcast_int: bool = False#
Automatically downcast integer variables.
Use of this setting should be tested by the region to confirm result consistency.
Added in version 1.3.
- Validated by:
_check_store_skims_in_shm
- field duplicate_step_execution: Literal['error', 'allow'] = 'error'#
How ActivitySim should handle attempts to re-run a step with the same name.
Added in version 1.3.
- “error”
Attempts to re-run a step that has already been run and checkpointed will raise a RuntimeError, halting model execution. This is the default if no value is given.
- “allow”
Attempts to re-run a step are allowed, potentially overwriting the results from the previous time that step was run.
- Validated by:
_check_store_skims_in_shm
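The difference between the two values can be illustrated with a small bookkeeping sketch. This is only an illustration of the behavior described above: the function `register_step` and the `completed_steps` set are hypothetical names, not ActivitySim internals.

```python
def register_step(step_name, completed_steps, duplicate_step_execution="error"):
    """Hypothetical helper: record that a step ran, honoring the duplicate policy."""
    if step_name in completed_steps and duplicate_step_execution == "error":
        # "error": a repeated step name halts execution.
        raise RuntimeError(f"step {step_name!r} has already been run")
    # "allow": the step is recorded again, and in a real run its results
    # could overwrite those from the previous execution.
    completed_steps.add(step_name)
    return completed_steps
```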
- field hh_ids: Path = None#
Load only the household ids given in this file.
The file need only contain the desired household ids, nothing else. If given as a relative path (or just a file name), both the data and config directories are searched, in that order, for the matching file.
- Validated by:
_check_store_skims_in_shm
- field households_sample_size: int = None#
Number of households to sample and simulate.
If omitted or set to 0, ActivitySim will simulate all households.
- Validated by:
_check_store_skims_in_shm
- field inherit_settings: bool | Path = None#
Instruction on if and how to find other files that can provide settings.
When this value is True, all config directories are searched in order for additional files with the same filename. If other files are found they are also loaded, but only settings values that are not already explicitly set are applied. Alternatively, set this to a different file name, in which case settings from that other file are loaded (again, backfilling unset values only). Once the settings files are loaded, this value does not have any other effect on the operation of the model(s).
- Validated by:
_check_store_skims_in_shm
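The "backfilling" behavior described for inherit_settings can be sketched with plain dictionaries. This is an illustrative sketch under the assumption that each settings file has already been parsed into a dict; the function name `backfill_settings` is hypothetical and does not come from ActivitySim.

```python
def backfill_settings(primary, *fallbacks):
    """Hypothetical helper: merge settings so earlier files take precedence."""
    merged = dict(primary)
    for fallback in fallbacks:
        for key, value in fallback.items():
            # Only fill in values that are not already explicitly set.
            merged.setdefault(key, value)
    return merged


first = {"households_sample_size": 100, "inherit_settings": True}
second = {"households_sample_size": 0, "chunk_size": 0}
merged = backfill_settings(first, second)
```

Here `merged` keeps `households_sample_size` from the first file and picks up `chunk_size` from the second.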
- field input_table_list: list[InputTable] = None#
List of table names, indices, and column re-maps for each table in input_store.
- Validated by:
_check_store_skims_in_shm
- field instrument: bool = False#
Use pyinstrument to profile component performance.
Added in version 1.2.
This is generally a developer-only feature and not needed for regular usage of ActivitySim.
Use this setting to enable statistical profiling of ActivitySim code, using the pyinstrument library (an optional dependency which must also be installed). A separate profiling session is triggered for each model component. See the pyinstrument documentation for a description of how this tool works.
When activated, a “profiling–*” directory is created in the output directory of the model, tagged with the date and time of the profiling run. Profile output is always tagged like this and never overwrites previous profiling outputs, facilitating serial comparisons of runtimes in response to code or configuration changes.
- Validated by:
_check_store_skims_in_shm
- field keep_chunk_logs: bool = True#
Whether to keep chunk logs when deleting other files.
- Validated by:
_check_store_skims_in_shm
- field log_alt_losers: bool = False#
Write out expressions when all alternatives are unavailable.
This can be useful for model development to catch errors in specifications. Enabling this check does not alter valid results but slows down model runs.
- Validated by:
_check_store_skims_in_shm
- field log_settings: tuple[str] = ('households_sample_size', 'chunk_size', 'chunk_method', 'chunk_training_mode', 'multiprocess', 'num_processes', 'resume_after', 'trace_hh_id', 'memory_profile', 'instrument', 'sharrow')#
Settings to log on startup.
- Validated by:
_check_store_skims_in_shm
- field memory_profile: bool = False#
Generate a memory profile by sampling memory usage from a secondary process.
Added in version 1.2.
This is generally a developer-only feature and not needed for regular usage of ActivitySim.
Using this feature will open a secondary process, whose only job is to poll memory usage for the main ActivitySim process. The usage is logged to a file with time stamps, so it can be cross-referenced against ActivitySim logs to identify what parts of the code are using RAM. The profiling is done from a separate process to prevent the profiler itself from significantly slowing the main model core, or (more importantly) generating memory usage on its own that pollutes the collected data.
- Validated by:
_check_store_skims_in_shm
- field min_available_chunk_ratio: float = 0.05#
Minimum fraction of total chunk_size to reserve for adaptive chunking.
- Validated by:
_check_store_skims_in_shm
- field models: list[str] = None#
List of model steps to run (auto ownership, tour frequency, etc.).
See model_steps for more details about each step.
- Validated by:
_check_store_skims_in_shm
- field multiprocess: bool = False#
Enable multiprocessing for this model.
- Validated by:
_check_store_skims_in_shm
- field multiprocess_steps: list[MultiprocessStep] = None#
A list of multiprocess steps.
- Validated by:
_check_store_skims_in_shm
- field num_processes: int = None#
If running in multiprocessing mode, use this number of processes by default.
If not given or set to 0, the number of processes to use is set to half the number of available CPU cores, plus 1.
- Validated by:
_check_store_skims_in_shm
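The default described above can be written out as arithmetic. The sketch below is an assumption-laden illustration: the helper name `default_num_processes` is hypothetical, and it assumes "half" means integer division, which the text does not specify.

```python
import os


def default_num_processes(num_processes=None, cpu_count=None):
    """Hypothetical helper: resolve the number of processes to use."""
    if num_processes:
        # An explicit, non-zero setting is used as given.
        return num_processes
    if cpu_count is None:
        # os.cpu_count() may return None, so fall back to 1.
        cpu_count = os.cpu_count() or 1
    # Assumption: half the cores (rounded down), plus 1.
    return cpu_count // 2 + 1
```

Under these assumptions, a machine with 8 available cores and no explicit setting would use 5 processes.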
- field offset_preprocessing: bool = False#
Flag to indicate whether offset preprocessing has already been done.
Added in version 1.2.
This flag is generally set automatically within ActivitySim during a run, and not by a user ahead of time. The ability to do so is provided as a developer-only feature for testing and development.
- Validated by:
_check_store_skims_in_shm
- field omx_ignore_patterns: list[str] = []#
List of regex patterns to ignore when reading OMX files.
This is useful if you have tables in your OMX file that you don’t want to read in. For example, if you have both time-of-day values and time-independent values (e.g., “BIKE_TIME” and “BIKE_TIME__AM”), you can ignore the time-of-day values by setting this to ["BIKE_TIME__.+"].
Added in version 1.3.
- Validated by:
_check_store_skims_in_shm
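The effect of the example pattern for omx_ignore_patterns can be checked with Python's `re` module. This is a sketch only: the helper `filter_omx_names` is hypothetical, the table names other than those in the example are made up, and matching from the start of the name with `re.match` is an assumption about how the patterns are applied.

```python
import re


def filter_omx_names(names, ignore_patterns):
    """Hypothetical helper: drop table names matching any ignore pattern."""
    compiled = [re.compile(pattern) for pattern in ignore_patterns]
    # Assumption: a name is ignored if a pattern matches from its start.
    return [n for n in names if not any(c.match(n) for c in compiled)]


names = ["BIKE_TIME", "BIKE_TIME__AM", "BIKE_TIME__PM", "DIST"]
kept = filter_omx_names(names, ["BIKE_TIME__.+"])
```

With this pattern, the time-of-day tables are dropped and `kept` contains only the time-independent names.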
- field output_tables: OutputTables = None#
List of output tables to write to CSV or HDF5.
- Validated by:
_check_store_skims_in_shm
- field pipeline_complib: str = 'NOTSET'#
Compression library to use when storing pipeline tables in an HDF5 file.
Added in version 1.3.
- Validated by:
_check_store_skims_in_shm
- field recode_pipeline_columns: bool = False#
Apply recoding instructions on input and final output for pipeline tables.
Added in version 1.2.
Recoding instructions can be provided in individual InputTable.recode_columns and OutputTable.decode_columns settings. This global setting permits disabling all recoding processes simultaneously.
Warning
Disabling recoding is fine in legacy mode but it is generally not compatible with using Settings.sharrow.
- Validated by:
_check_store_skims_in_shm
- field resume_after: str | None = None#
Resume running the data pipeline after the last successful checkpoint.
- Validated by:
_check_store_skims_in_shm
- field rng_base_seed: int | None = 0#
Base seed for pseudo-random number generator.
- Validated by:
_check_store_skims_in_shm
- field sharrow: bool | str = False#
Set the sharrow operating mode.
Added in version 1.2.
false - Do not use sharrow. This is the default if no value is given.
true - Use sharrow optimizations when possible, but fall back to legacy pandas.eval systems when any error is encountered. This is the preferred mode for running with sharrow if reliability is more important than performance.
require - Use sharrow optimizations, and raise an error if they fail unexpectedly. This is the preferred mode for running with sharrow if performance is a concern.
test - Run every relevant calculation using both sharrow and legacy systems, and compare them to ensure the results match. This is the slowest mode of operation, but useful for development and debugging.
- Validated by:
_check_store_skims_in_shm
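The four sharrow modes differ mainly in how failures and fallbacks are handled. The sketch below illustrates that control flow only; it does not call sharrow, and `evaluate_with_mode`, `sharrow_fn`, and `legacy_fn` are hypothetical stand-ins for the two evaluation paths.

```python
def evaluate_with_mode(sharrow_mode, sharrow_fn, legacy_fn):
    """Hypothetical dispatcher mirroring the false/true/require/test modes."""
    if sharrow_mode is False:
        # false: never use sharrow.
        return legacy_fn()
    if sharrow_mode is True:
        # true: try sharrow, fall back to the legacy path on any error.
        try:
            return sharrow_fn()
        except Exception:
            return legacy_fn()
    if sharrow_mode == "require":
        # require: use sharrow and let any error propagate.
        return sharrow_fn()
    if sharrow_mode == "test":
        # test: run both paths and check that the results match.
        fast, slow = sharrow_fn(), legacy_fn()
        if fast != slow:
            raise AssertionError("sharrow and legacy results differ")
        return fast
    raise ValueError(f"unknown sharrow mode: {sharrow_mode!r}")
```

In a YAML settings file the values `false` and `true` parse to booleans, which is why the sketch compares against `False` and `True` rather than strings.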
- field source_file_paths: list[Path] = None#
A list of source files from which these settings were loaded.
This value should not be set by the user within the YAML settings files, instead it is populated as those files are loaded. It is primarily provided for debugging purposes, and does not actually affect the operation of the model.
- Validated by:
_check_store_skims_in_shm
- field store_skims_in_shm: bool = True#
Store skim dataset in shared memory.
Added in version 1.3.
By default, if sharrow is enabled (any setting other than false), ActivitySim stores the skim dataset in shared memory. This can be changed by setting this option to False, in which case skims are stored in “typical” process-local memory. Note that storing skims in shared memory is pretty much required for multiprocessing, unless you have a very small model or an absurdly large amount of RAM.
- Validated by:
_check_store_skims_in_shm
- field trace_hh_id: int | None = None#
Trace this household id.
If omitted, no tracing is written out.
- Validated by:
_check_store_skims_in_shm
- field trace_od: tuple[int, int] | None = None#
Trace origin, destination pair in accessibility calculation.
If omitted, no tracing is written out.
- Validated by:
_check_store_skims_in_shm
- field treat_warnings_as_errors: bool = False#
Treat most warnings as errors.
Use of this setting is not recommended outside of rigorous testing regimes.
Added in version 1.3.
- Validated by:
_check_store_skims_in_shm
- field use_shadow_pricing: bool = False#
Turn shadow_pricing on and off for work and school location.
- Validated by:
_check_store_skims_in_shm
- field want_dest_choice_sample_tables: bool = False#
Turn writing of sample_tables on and off for all models.
- Validated by:
_check_store_skims_in_shm
- field write_raw_tables: bool = False#
Dump input tables back to disk immediately after loading them.
This is generally a developer-only feature and not needed for regular usage of ActivitySim.
The data tables are written out to <output_dir>/raw_tables before any annotation steps, but after initial processing (renaming, filtering columns, recoding).
- Validated by:
_check_store_skims_in_shm