Settings#

settings activitysim.core.configuration.Settings#

The overall settings for the ActivitySim model system.

The input for these settings is typically stored in one main YAML file, usually called settings.yaml.

Note that this implementation is presently used only for generating documentation, but future work may migrate the settings implementation to actually use this pydantic code to validate the settings before running the model.

Config:
  • extra: str = allow

  • validate_assignment: bool = True

Fields:
  • benchmarking (bool)

  • check_for_variability (bool)

  • checkpoint_format (Literal['hdf', 'parquet'])

  • checkpoints (bool | list)

  • chunk_method (Literal['bytes', 'uss', 'hybrid_uss', 'rss', 'hybrid_rss'])

  • chunk_size (int)

  • chunk_training_mode (Literal['disabled', 'training', 'production', 'adaptive', 'explicit'])

  • cleanup_pipeline_after_run (bool)

  • cleanup_trace_files_on_resume (bool)

  • create_input_store (bool)

  • default_initial_rows_per_chunk (int)

  • disable_destination_sampling (bool)

  • disable_zarr (bool)

  • downcast_float (bool)

  • downcast_int (bool)

  • duplicate_step_execution (Literal['error', 'allow'])

  • fail_fast (bool)

  • hh_ids (pathlib.Path)

  • households_sample_size (int)

  • inherit_settings (bool | pathlib.Path)

  • input_store (str)

  • input_table_list (list[activitysim.core.configuration.top.InputTable])

  • instrument (bool)

  • keep_chunk_logs (bool)

  • keep_mem_logs (bool)

  • log_alt_losers (bool)

  • log_settings (tuple[str])

  • memory_profile (bool)

  • min_available_chunk_ratio (float)

  • models (list[str])

  • multiprocess (bool)

  • multiprocess_steps (list[activitysim.core.configuration.top.MultiprocessStep])

  • num_processes (int)

  • offset_preprocessing (bool)

  • omx_ignore_patterns (list[str])

  • other_settings (dict[str, Any])

  • output_tables (activitysim.core.configuration.top.OutputTables)

  • pipeline_complib (str)

  • recode_pipeline_columns (bool)

  • resume_after (str | None)

  • rng_base_seed (int | None)

  • rotate_logs (bool)

  • sharrow (bool | str)

  • source_file_paths (list[pathlib.Path])

  • store_skims_in_shm (bool)

  • testing_fail_trip_destination (bool)

  • trace_hh_id (int | None)

  • trace_od (tuple[int, int] | None)

  • treat_warnings_as_errors (bool)

  • use_shadow_pricing (bool)

  • want_dest_choice_presampling (bool)

  • want_dest_choice_sample_tables (bool)

  • write_raw_tables (bool)

Validators:
  • _check_store_skims_in_shm » all fields

field benchmarking: bool = False#

Flag this model run as a benchmarking run.

Added in version 1.1.

This is generally a developer-only feature and not needed for regular usage of ActivitySim.

By flagging a model run as a benchmark, certain operations of the model are altered, to ensure valid benchmark readings. For example, in regular operation, data such as skims are loaded on-demand within the first model component that needs them. With benchmarking enabled, all data are always pre-loaded before any component is run, to ensure that recorded times are the runtime of the component itself, and not data I/O operations that are neither integral to that component nor necessarily stable over replication.

Validated by:
  • _check_store_skims_in_shm

field check_for_variability: bool = False#

Debugging feature to find broken model specifications.

Enabling this check does not alter valid results but slows down model runs.

Validated by:
  • _check_store_skims_in_shm

field checkpoint_format: Literal['hdf', 'parquet'] = 'parquet'#

Storage format to use when saving checkpoint files.

Validated by:
  • _check_store_skims_in_shm

field checkpoints: bool | list = True#

When to write checkpoints (intermediate table states) to disk.

If True, checkpoints are written at each step. If False, no intermediate checkpoints will be written before the end of run. Or, provide an explicit list of models to checkpoint.
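
As a rough illustration of the three accepted forms, the sketch below decides whether a given step should be checkpointed. The helper `should_checkpoint`, its `is_last_step` argument, and the step names are hypothetical and not part of ActivitySim; the assumption that the final state is always written follows from the "before the end of run" wording above.

```python
def should_checkpoint(step, checkpoints, is_last_step=False):
    """Hypothetical helper: decide whether to checkpoint after `step`."""
    if is_last_step:
        # Assumption: the final table states are always written at end of run.
        return True
    if isinstance(checkpoints, bool):
        # True: checkpoint every step. False: no intermediate checkpoints.
        return checkpoints
    # Otherwise treat the setting as an explicit list of step names.
    return step in checkpoints


print(should_checkpoint("auto_ownership", ["auto_ownership"]))    # True
print(should_checkpoint("trip_mode_choice", ["auto_ownership"]))  # False
```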

Validated by:
  • _check_store_skims_in_shm

field chunk_method: Literal['bytes', 'uss', 'hybrid_uss', 'rss', 'hybrid_rss'] = 'hybrid_uss'#

Memory use measure to use for chunking.

The following methods are supported to calculate memory overhead when chunking is enabled:

  • “bytes”

    expected rowsize based on actual size (as reported by numpy and pandas) of explicitly allocated data; this can underestimate overhead due to transient data requirements of operations (e.g. merge, sort, transpose).

  • “uss”

    expected rowsize based on change in unique set size (uss), both as a result of explicit data allocation and from readings by the MemMonitor sniffer thread, which measures transient uss during time-consuming numpy and pandas operations.

  • “hybrid_uss”

    hybrid_uss avoids problems with pure uss, especially with small chunk sizes (e.g. initial training chunks) as numpy may recycle cached blocks and show no increase in uss even though data was allocated and logged.

  • “rss”

    like uss, but for resident set size (rss), which is the portion of memory occupied by a process that is held in RAM.

  • “hybrid_rss”

    like hybrid_uss, but for rss

RSS is reported by psutil.Process.memory_info() and USS is reported by psutil.Process.memory_full_info(). USS is the memory which is private to a process and which would be freed if the process were terminated. This is the metric that most closely matches the rather vague notion of memory “in use” (the meaning of which is difficult to pin down in operating systems with virtual memory, where memory can (but sometimes can’t) be swapped or mapped to disk). Previous testing found that hybrid_uss performs best and is most reliable, and it is therefore the default.

For more, see Chunk.

Validated by:
  • _check_store_skims_in_shm

field chunk_size: int = 0#

Approximate amount of RAM to allocate to ActivitySim for batch processing.

See Chunk for more details.

Validated by:
  • _check_store_skims_in_shm

field chunk_training_mode: Literal['disabled', 'training', 'production', 'adaptive', 'explicit'] = 'disabled'#

The method to use for chunk training.

  • “disabled”

    All chunking is disabled. If you have enough RAM, this is the fastest mode, but it can require a lot of RAM.

  • “training”

    The model is run in training mode, which tracks the amount of memory used by each table by submodel and writes the results to a cache file that is then re-used for production runs. This mode is significantly slower than production mode since it does significantly more memory inspection.

  • “production”

    The model is run in production mode, using the cache file created in training mode. If no such file is found, the model falls back to training mode. This mode is significantly faster than training mode, as it uses the cached memory inspection results to determine chunk sizes.

  • “adaptive”

    Like production mode, any existing cache file is used to determine the starting chunk settings, but the model also updates the cache settings based on additional memory inspection. This may additionally improve the cache settings to reduce runtimes when run in production mode, but at the cost of some slowdown during the run to accommodate extra memory inspection.

  • “explicit”

    The model is run without memory inspection, and the chunk cache file is not used, even if it exists. Instead, the chunk size settings are explicitly set in the settings file of each compatible model step. Only those steps that have an “explicit_chunk” setting are chunkable with this mode; all other steps are run without chunking.

See Chunk for more details.
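
The fallback rule described for production mode can be sketched as follows. The function `effective_chunk_mode` and its `cache_exists` argument are hypothetical names used only for illustration; they do not reflect how ActivitySim resolves the mode internally.

```python
VALID_MODES = {"disabled", "training", "production", "adaptive", "explicit"}


def effective_chunk_mode(mode, cache_exists):
    """Hypothetical helper returning the mode actually used for a run."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown chunk_training_mode: {mode!r}")
    if mode == "production" and not cache_exists:
        # Per the description above, production mode falls back to
        # training mode when no chunk cache file is found.
        return "training"
    return mode


print(effective_chunk_mode("production", cache_exists=False))  # training
print(effective_chunk_mode("production", cache_exists=True))   # production
```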

Validated by:
  • _check_store_skims_in_shm

field cleanup_pipeline_after_run: bool = False#

Cleans up pipeline after successful run.

This will clean up pipeline only after successful runs, by creating a single-checkpoint pipeline file, and deleting any subprocess pipelines.

Validated by:
  • _check_store_skims_in_shm

field cleanup_trace_files_on_resume: bool = False#

Clean all trace files when restarting a model from a checkpoint.

Validated by:
  • _check_store_skims_in_shm

field create_input_store: bool = False#

Write the inputs as read in back to an HDF5 store.

If enabled, this writes the store to the outputs folder to use for subsequent model runs, as reading HDF5 can be faster than reading CSV files.

Validated by:
  • _check_store_skims_in_shm

field default_initial_rows_per_chunk: int = 100#

Default number of rows to use in initial chunking.

Validated by:
  • _check_store_skims_in_shm

field disable_destination_sampling: bool = False#
Validated by:
  • _check_store_skims_in_shm

field disable_zarr: bool = False#

Disable the use of zarr format skims.

Added in version 1.2.

By default, if sharrow is enabled (any setting other than false), ActivitySim currently loads data from zarr format skims if a zarr location is provided, and data is found there. If no data is found there, then original OMX skim data is loaded, any transformations or encodings are applied, and then this data is written out to a zarr file at that location. Setting this option to True will disable the use of zarr.

Validated by:
  • _check_store_skims_in_shm

field downcast_float: bool = False#

Automatically downcast float variables.

Use of this setting should be tested by the region to confirm result consistency.

Added in version 1.3.

Validated by:
  • _check_store_skims_in_shm

field downcast_int: bool = False#

Automatically downcast integer variables.

Use of this setting should be tested by the region to confirm result consistency.

Added in version 1.3.

Validated by:
  • _check_store_skims_in_shm

field duplicate_step_execution: Literal['error', 'allow'] = 'error'#

How ActivitySim should handle attempts to re-run a step with the same name.

Added in version 1.3.

  • “error”

    Attempts to re-run a step that has already been run and checkpointed will raise a RuntimeError, halting model execution. This is the default if no value is given.

  • “allow”

    Attempts to re-run a step are allowed, potentially overwriting the results from the previous time that step was run.
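
The two behaviours can be sketched with a toy step runner. Everything here (`run_step`, the `completed` set, the step name) is hypothetical and only mirrors the description above; it is not ActivitySim's pipeline code.

```python
def run_step(name, completed, duplicate_step_execution="error"):
    """Hypothetical runner; `completed` holds steps already checkpointed."""
    if name in completed and duplicate_step_execution == "error":
        raise RuntimeError(f"step {name!r} has already been run")
    # With "allow", a repeated step simply runs again, and its earlier
    # results may be overwritten.
    completed.add(name)
    return name


done = set()
run_step("auto_ownership", done)
run_step("auto_ownership", done, duplicate_step_execution="allow")  # permitted
```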

Validated by:
  • _check_store_skims_in_shm

field fail_fast: bool = False#
Validated by:
  • _check_store_skims_in_shm

field hh_ids: Path = None#

Load only the household ids given in this file.

The file need only contain the desired household ids, nothing else. If given as a relative path (or just a file name), both the data and config directories are searched, in that order, for the matching file.
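
The search order for a relative path might look like the sketch below. The function `find_hh_ids_file` and its arguments are hypothetical, and the handling of absolute paths is an assumption, since the text above only describes the relative case.

```python
from pathlib import Path


def find_hh_ids_file(hh_ids, data_dirs, config_dirs):
    """Hypothetical lookup: return the first existing match, or None."""
    path = Path(hh_ids)
    if path.is_absolute():
        # Assumption: absolute paths are used as given, with no searching.
        return path if path.exists() else None
    # Relative paths: data directories are searched first, then config ones.
    for directory in [*data_dirs, *config_dirs]:
        candidate = Path(directory) / path
        if candidate.exists():
            return candidate
    return None
```

If a file with the same name exists in both a data directory and a config directory, this sketch returns the one from the data directory.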

Validated by:
  • _check_store_skims_in_shm

field households_sample_size: int = None#

Number of households to sample and simulate.

If omitted or set to 0, ActivitySim will simulate all households.
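
A minimal sketch of this rule, using only the standard library: the function `sample_households` and its `seed` argument are hypothetical, and capping the sample at the population size is an assumption rather than documented behaviour.

```python
import random


def sample_households(household_ids, households_sample_size=None, seed=0):
    """Hypothetical sampler illustrating the 'None or 0 means all' rule."""
    ids = list(household_ids)
    if not households_sample_size:
        # Omitted (None) or 0: simulate every household.
        return ids
    # Assumption: never request more households than are available.
    k = min(households_sample_size, len(ids))
    return random.Random(seed).sample(ids, k)


print(len(sample_households(range(1000), 100)))  # 100
print(len(sample_households(range(1000), 0)))    # 1000
```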

Validated by:
  • _check_store_skims_in_shm

field inherit_settings: bool | Path = None#

Instruction on if and how to find other files that can provide settings.

When this value is True, all config directories are searched in order for additional files with the same filename. If other files are found they are also loaded, but only settings values that are not already explicitly set are applied. Alternatively, set this to a different file name, in which case settings from that other file are loaded (again, backfilling unset values only). Once the settings files are loaded, this value does not have any other effect on the operation of the model(s).
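
The backfilling behaviour (only values that are not already set are taken from inherited files) can be sketched with plain dictionaries. The function `backfill_settings` is a hypothetical stand-in; the real loader works on YAML files found in the config directories.

```python
def backfill_settings(explicit, *inherited):
    """Merge settings dicts so that earlier sources take precedence."""
    merged = dict(explicit)
    for source in inherited:
        for key, value in source.items():
            # Only fill in values that have not already been set.
            merged.setdefault(key, value)
    return merged


local = {"chunk_size": 0, "inherit_settings": True}
base = {"chunk_size": 1_000_000, "multiprocess": False}
print(backfill_settings(local, base))
# {'chunk_size': 0, 'inherit_settings': True, 'multiprocess': False}
```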

Validated by:
  • _check_store_skims_in_shm

field input_store: str = None#

HDF5 inputs file

Validated by:
  • _check_store_skims_in_shm

field input_table_list: list[InputTable] = None#

List of table names, indices, and column re-maps for each table in input_store.

Validated by:
  • _check_store_skims_in_shm

field instrument: bool = False#

Use pyinstrument to profile component performance.

Added in version 1.2.

This is generally a developer-only feature and not needed for regular usage of ActivitySim.

Use this setting to enable statistical profiling of ActivitySim code, using the pyinstrument library (an optional dependency which must also be installed). A separate profiling session is triggered for each model component. See the pyinstrument documentation for a description of how this tool works.

When activated, a “profiling–*” directory is created in the output directory of the model, tagged with the date and time of the profiling run. Profile output is always tagged like this and never overwrites previous profiling outputs, facilitating serial comparisons of runtimes in response to code or configuration changes.

Validated by:
  • _check_store_skims_in_shm

field keep_chunk_logs: bool = True#

Whether to keep chunk logs when deleting other files.

Validated by:
  • _check_store_skims_in_shm

field keep_mem_logs: bool = False#
Validated by:
  • _check_store_skims_in_shm

field log_alt_losers: bool = False#

Write out expressions when all alternatives are unavailable.

This can be useful for model development to catch errors in specifications. Enabling this check does not alter valid results but slows down model runs.

Validated by:
  • _check_store_skims_in_shm

field log_settings: tuple[str] = ('households_sample_size', 'chunk_size', 'chunk_method', 'chunk_training_mode', 'multiprocess', 'num_processes', 'resume_after', 'trace_hh_id', 'memory_profile', 'instrument', 'sharrow')#

Settings to log on startup.

Validated by:
  • _check_store_skims_in_shm

field memory_profile: bool = False#

Generate a memory profile by sampling memory usage from a secondary process.

Added in version 1.2.

This is generally a developer-only feature and not needed for regular usage of ActivitySim.

Using this feature will open a secondary process, whose only job is to poll memory usage for the main ActivitySim process. The usage is logged to a file with time stamps, so it can be cross-referenced against ActivitySim logs to identify what parts of the code are using RAM. The profiling is done from a separate process to prevent the profiler itself from significantly slowing the main model core, or (more importantly) from generating memory usage of its own that pollutes the collected data.

Validated by:
  • _check_store_skims_in_shm

field min_available_chunk_ratio: float = 0.05#

Minimum fraction of total chunk_size to reserve for adaptive chunking.

Validated by:
  • _check_store_skims_in_shm

field models: list[str] = None#

List of model steps to run (auto ownership, tour frequency, etc.).

See model_steps for more details about each step.

Validated by:
  • _check_store_skims_in_shm

field multiprocess: bool = False#

Enable multiprocessing for this model.

Validated by:
  • _check_store_skims_in_shm

field multiprocess_steps: list[MultiprocessStep] = None#

A list of multiprocess steps.

Validated by:
  • _check_store_skims_in_shm

field num_processes: int = None#

If running in multiprocessing mode, use this number of processes by default.

If not given or set to 0, the number of processes to use is set to half the number of available CPU cores, plus 1.
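
The default can be written out as simple arithmetic. This is a sketch only: `default_num_processes` is a hypothetical name, and rounding "half" down with integer division is an assumption about how odd core counts are handled.

```python
import os


def default_num_processes(num_processes=None, cpu_count=None):
    """Hypothetical helper for the 'half the cores, plus 1' default."""
    if num_processes:
        # An explicit, non-zero value is used as given.
        return num_processes
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # Assumption: "half" is rounded down with integer division.
    return cpu_count // 2 + 1


print(default_num_processes(cpu_count=8))  # 5
```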

Validated by:
  • _check_store_skims_in_shm

field offset_preprocessing: bool = False#

Flag to indicate whether offset preprocessing has already been done.

Added in version 1.2.

This flag is generally set automatically within ActivitySim during a run, and not by a user ahead of time. The ability to do so is provided as a developer-only feature for testing and development.

Validated by:
  • _check_store_skims_in_shm

field omx_ignore_patterns: list[str] = []#

List of regex patterns to ignore when reading OMX files.

This is useful if you have tables in your OMX file that you don’t want to read in. For example, if you have both time-of-day values and time-independent values (e.g., “BIKE_TIME” and “BIKE_TIME__AM”), you can ignore the time-of-day values by setting this to ["BIKE_TIME__.+"].
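
To see which table names such a pattern would exclude, the example can be reproduced with Python's `re` module. The function `filter_omx_tables` is hypothetical, and anchoring the pattern at the start of the name (via `re.match`) is an assumption about how ActivitySim applies it.

```python
import re


def filter_omx_tables(table_names, omx_ignore_patterns):
    """Hypothetical filter: drop names matching any ignore pattern."""
    compiled = [re.compile(pattern) for pattern in omx_ignore_patterns]
    # Assumption: a pattern must match from the start of the table name.
    return [
        name for name in table_names
        if not any(pattern.match(name) for pattern in compiled)
    ]


tables = ["BIKE_TIME", "BIKE_TIME__AM", "BIKE_TIME__PM"]
kept = filter_omx_tables(tables, ["BIKE_TIME__.+"])
print(kept)  # ['BIKE_TIME']
```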

Added in version 1.3.

Validated by:
  • _check_store_skims_in_shm

field other_settings: dict[str, Any] = None#
Validated by:
  • _check_store_skims_in_shm

field output_tables: OutputTables = None#

List of output tables to write to CSV or HDF5.

Validated by:
  • _check_store_skims_in_shm

field pipeline_complib: str = 'NOTSET'#

Compression library to use when storing pipeline tables in an HDF5 file.

Added in version 1.3.

Validated by:
  • _check_store_skims_in_shm

field recode_pipeline_columns: bool = False#

Apply recoding instructions on input and final output for pipeline tables.

Added in version 1.2.

Recoding instructions can be provided in individual InputTable.recode_columns and OutputTable.decode_columns settings. This global setting permits disabling all recoding processes simultaneously.

Warning

Disabling recoding is fine in legacy mode but it is generally not compatible with using Settings.sharrow.

Validated by:
  • _check_store_skims_in_shm

field resume_after: str | None = None#

Resume running the data pipeline after the last successful checkpoint.

Validated by:
  • _check_store_skims_in_shm

field rng_base_seed: int | None = 0#

Base seed for pseudo-random number generator.

Validated by:
  • _check_store_skims_in_shm

field rotate_logs: bool = False#
Validated by:
  • _check_store_skims_in_shm

field sharrow: bool | str = False#

Set the sharrow operating mode.

Added in version 1.2.

  • false - Do not use sharrow. This is the default if no value is given.

  • true - Use sharrow optimizations when possible, but fall back to legacy pandas.eval systems when any error is encountered. This is the preferred mode for running with sharrow if reliability is more important than performance.

  • require - Use sharrow optimizations, and raise an error if they fail unexpectedly. This is the preferred mode for running with sharrow if performance is a concern.

  • test - Run every relevant calculation using both sharrow and legacy systems, and compare them to ensure the results match. This is the slowest mode of operation, but useful for development and debugging.
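
The difference between the four modes can be sketched as a small dispatcher. This is not how ActivitySim is implemented: `evaluate`, `sharrow_eval`, and `legacy_eval` are hypothetical names, with the two callables standing in for the sharrow and legacy code paths.

```python
def evaluate(expr, sharrow_mode, sharrow_eval, legacy_eval):
    """Hypothetical dispatcher over the four sharrow operating modes."""
    if sharrow_mode is False:
        return legacy_eval(expr)
    if sharrow_mode is True:
        try:
            return sharrow_eval(expr)
        except Exception:
            # "true": fall back to the legacy path when any error occurs.
            return legacy_eval(expr)
    if sharrow_mode == "require":
        # "require": no fallback, so errors propagate to the caller.
        return sharrow_eval(expr)
    if sharrow_mode == "test":
        # "test": run both paths and insist that the results agree.
        fast, slow = sharrow_eval(expr), legacy_eval(expr)
        if fast != slow:
            raise ValueError(f"sharrow and legacy results differ for {expr!r}")
        return fast
    raise ValueError(f"unknown sharrow mode: {sharrow_mode!r}")
```

The trade-off in the list above shows up directly: only the `true` branch hides a failure in the sharrow path, which favours reliability, while `require` surfaces it immediately.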

Validated by:
  • _check_store_skims_in_shm

field source_file_paths: list[Path] = None#

A list of source files from which these settings were loaded.

This value should not be set by the user within the YAML settings files; instead, it is populated as those files are loaded. It is primarily provided for debugging purposes, and does not actually affect the operation of the model.

Validated by:
  • _check_store_skims_in_shm

field store_skims_in_shm: bool = True#

Store skim dataset in shared memory.

Added in version 1.3.

By default, if sharrow is enabled (any setting other than false), ActivitySim stores the skim dataset in shared memory. This can be changed by setting this option to False, in which case skims are stored in “typical” process-local memory. Note that storing skims in shared memory is pretty much required for multiprocessing, unless you have a very small model or an absurdly large amount of RAM.

Validated by:
  • _check_store_skims_in_shm

field testing_fail_trip_destination: bool = False#
Validated by:
  • _check_store_skims_in_shm

field trace_hh_id: int | None = None#

Trace this household id.

If omitted, no tracing is written out.

Validated by:
  • _check_store_skims_in_shm

field trace_od: tuple[int, int] | None = None#

Trace origin, destination pair in accessibility calculation.

If omitted, no tracing is written out.

Validated by:
  • _check_store_skims_in_shm

field treat_warnings_as_errors: bool = False#

Treat most warnings as errors.

Use of this setting is not recommended outside of rigorous testing regimes.

Added in version 1.3.

Validated by:
  • _check_store_skims_in_shm

field use_shadow_pricing: bool = False#

Turn shadow_pricing on or off for work and school location.

Validated by:
  • _check_store_skims_in_shm

field want_dest_choice_presampling: bool = True#
Validated by:
  • _check_store_skims_in_shm

field want_dest_choice_sample_tables: bool = False#

Turn writing of sample_tables on or off for all models.

Validated by:
  • _check_store_skims_in_shm

field write_raw_tables: bool = False#

Dump input tables back to disk immediately after loading them.

This is generally a developer-only feature and not needed for regular usage of ActivitySim.

The data tables are written out to <output_dir>/raw_tables before any annotation steps, but after initial processing (renaming, filtering columns, recoding).

Validated by:
  • _check_store_skims_in_shm