This page documents the ActivitySim software design and how to contribute to the project.
The core software components of ActivitySim are described below. ActivitySim is implemented in Python and makes heavy use of the vectorized C/C++ backend libraries in pandas and numpy to achieve good performance. The core design principle of the system is replacing for loops with vectorized operations, and this principle is applied throughout the system wherever reasonable. As a result, the Python portions of the software act mainly as an orchestrator and data processor that integrates a series of C/C++ vectorized data table and matrix operations. The model system formulates each simulation as a series of vectorized table operations, and the Python layer is responsible for setting up and providing expressions to operate on these large data tables.
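To illustrate the vectorization principle, here is a minimal sketch that compares a per-row Python loop with a single vectorized pandas call. The person table and its column names are hypothetical and only for illustration; both functions produce the same result, but the vectorized one runs in compiled pandas/numpy code instead of the Python interpreter.

```python
import pandas as pd

# Hypothetical person table; column names are illustrative only.
persons = pd.DataFrame({
    "age": [34, 7, 68, 22],
    "income": [52000, 0, 31000, 18000],
})


def is_working_age_loop(df):
    """Loop version: one Python-level operation per row (slow for millions of rows)."""
    flags = []
    for age in df["age"]:
        flags.append(18 <= age < 65)
    return pd.Series(flags, index=df.index)


def is_working_age_vectorized(df):
    """Vectorized version: one call into pandas' compiled code for the whole column."""
    return df["age"].between(18, 64)  # inclusive on both ends


loop_result = is_working_age_loop(persons)
vec_result = is_working_age_vectorized(persons)
```

Both results are the same boolean Series; with millions of rows, the vectorized form is typically much faster.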
In developing this software platform, we strive to adhere to a best practices approach to scientific computing, as summarized in this article.
An ActivitySim model is a sequence of model / data processing steps, commonly known as a data pipeline. A well-defined data pipeline can resume jobs at a known point, which facilitates debugging of problems with data and/or calculations. It also allows for checkpointing model resources, such as the state of each person at a point in the model simulation. Checkpointing also allows for regression testing of results at specified points in the overall model run.
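The checkpoint-and-resume idea can be sketched in a few lines. This is a hypothetical, in-memory illustration and not ActivitySim's Pipeline API: the step functions, the `run_pipeline` helper, and the dict of checkpoints are all assumptions (ActivitySim itself stores its pipeline state in an HDF5 data store). Each step takes and returns a dict of tables, and a copy of every table is saved after each step so a run can restart from any saved step.

```python
import pandas as pd


def add_adult_flag(tables):
    """Hypothetical step 1: flag adults in the persons table."""
    persons = tables["persons"].copy()
    persons["is_adult"] = persons["age"] >= 18
    return {**tables, "persons": persons}


def count_adults(tables):
    """Hypothetical step 2: count adults per household."""
    persons = tables["persons"]
    households = tables["households"].copy()
    households["num_adults"] = (
        persons.groupby("household_id")["is_adult"].sum()
        .reindex(households.index, fill_value=0)
        .astype(int)
    )
    return {**tables, "households": households}


STEPS = [("add_adult_flag", add_adult_flag), ("count_adults", count_adults)]


def run_pipeline(tables, checkpoints, resume_after=None):
    """Run STEPS in order, checkpointing all tables after each step.

    If resume_after names a saved checkpoint, restart from that state
    and run only the steps that follow it.
    """
    names = [name for name, _ in STEPS]
    start = 0
    if resume_after is not None:
        tables = {k: v.copy() for k, v in checkpoints[resume_after].items()}
        start = names.index(resume_after) + 1
    for name, step in STEPS[start:]:
        tables = step(tables)
        # Store copies so later steps cannot mutate a saved checkpoint.
        checkpoints[name] = {k: v.copy() for k, v in tables.items()}
    return tables


households = pd.DataFrame(index=pd.Index([1, 2, 3], name="household_id"))
persons = pd.DataFrame({"household_id": [1, 1, 2], "age": [40, 10, 70]})
checkpoints = {}
result = run_pipeline({"households": households, "persons": persons}, checkpoints)

# Restart after the first step, e.g. after fixing a bug in count_adults.
resumed = run_pipeline(None, checkpoints, resume_after="add_adult_flag")
```

Comparing a checkpointed table against a stored target at a given step is also the basis for the regression testing mentioned above.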
Earlier versions of ActivitySim depended on ORCA, an orchestration/pipeline tool that defines model steps and dynamic data sources and connects them to processing functions. ORCA defined dynamic data tables based on pandas DataFrames, columns based on pandas Series, and injectables (functions). Model steps were executed as steps registered with the ORCA engine. Over time, ActivitySim extended ORCA’s functionality by adding a Pipeline that runs a series of model steps, manages the state of the data tables throughout the model run, allows for restarting at any model step, and integrates with the random number generation procedures (see Random). As a result, ORCA is no longer a dependency of the system. See activitysim.core.inject for more information.
ActivitySim works with three open data formats: HDF5, Open Matrix (OMX), and CSV. The HDF5 binary data container is used for the Pipeline data store. OMX, which is based on HDF5, is used for input and output matrices (skims and demand matrices). CSV files are used for various inputs and outputs as well.
Three key data structures in ActivitySim are:
pandas.DataFrame - A data table with rows and columns, similar to an R data frame, Excel worksheet, or database table
pandas.Series - a vector of data, a column in a DataFrame table or a 1D array
numpy.array - an N-dimensional array of items of the same type; for example, a network skim matrix or a collection of skim matrices by time period or mode
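The three structures typically work together: a DataFrame holds the choosers, a Series is one of its columns, and a numpy array holds a skim that is indexed by zone. The sketch below uses a made-up 3-zone drive-time skim and hypothetical column names; it assumes zones are numbered from 1, so they are shifted by one to get array positions.

```python
import numpy as np
import pandas as pd

# DataFrame: a table of households (illustrative columns), indexed by household_id.
households = pd.DataFrame(
    {"home_zone": [1, 3, 2], "income": [45000, 120000, 30000]},
    index=pd.Index([10, 11, 12], name="household_id"),
)

# Series: a single column of the DataFrame, sharing its index.
income = households["income"]

# numpy array: a 3x3 zone-to-zone skim, e.g. drive time in minutes (made-up values).
# A 3-D array could stack one such matrix per time period.
drive_time = np.array([
    [2.0, 10.0, 15.0],
    [10.0, 3.0, 8.0],
    [15.0, 8.0, 2.5],
])

# Vectorized skim lookup from each household's home zone to zone 2
# (zones are numbered from 1, array positions from 0).
origins = households["home_zone"].to_numpy() - 1
time_to_zone_2 = pd.Series(drive_time[origins, 2 - 1], index=households.index)
```

Fancy indexing with an array of origins looks up all households in one operation rather than looping over them.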
ActivitySim exposes all model expressions in CSV files. These model expression CSV files contain Python expressions, mainly pandas/numpy expressions, and reference input data tables and network skim matrices. With this design, the Python code, which can be thought of as a generic expression engine, is kept separate from the specific model calculations, such as the utilities. This avoids modifying the actual Python code when making changes to the models, such as during model calibration. An example of model expressions is found in the example auto ownership model specification file - auto_ownership.csv. Refer to the Utility Expressions section for more detail.
Many of the models have pre- and post-processor table annotators, which read a CSV file of expressions, calculate required additional table fields, and join the fields to the target tables. An example table annotation expressions file is found in the example configuration files for households for the CDAP model -
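A table annotator can be sketched as a loop over (target, expression) rows. The two-column CSV layout, the `annotate_table` helper, and the household fields below are hypothetical. Each expression is evaluated with `DataFrame.eval` against the original columns plus any fields computed so far, and the new fields are then joined to the target table.

```python
import io

import pandas as pd

# Hypothetical annotation spec: the new column name and the expression defining it.
ANNOTATE_CSV = """Target,Expression
has_children,num_children > 0
income_in_thousands,income / 1000
low_income,income_in_thousands < 30
"""


def annotate_table(df, spec):
    """Compute each target field in order, then join the new fields to df.

    Later expressions can refer to targets created by earlier rows.
    """
    new_fields = pd.DataFrame(index=df.index)
    for target, expr in zip(spec["Target"], spec["Expression"]):
        new_fields[target] = df.join(new_fields).eval(expr)
    return df.join(new_fields)


spec = pd.read_csv(io.StringIO(ANNOTATE_CSV))
households = pd.DataFrame(
    {"num_children": [0, 2], "income": [25000, 90000]},
    index=pd.Index([1, 2], name="household_id"),
)
annotated = annotate_table(households, spec)
```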
Refer to Estimation for more information and the
ActivitySim currently supports multinomial (MNL) and nested logit (NL) choice models. Refer to Logit for more information. It also supports custom expressions as noted above, which can often be used to code additional types of choice models. In addition, developers can write their own choice models in Python and expose these through the framework.
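As a reminder of what an MNL model computes, the sketch below applies the textbook formula P(i) = exp(V_i) / sum_j exp(V_j) to a (choosers x alternatives) utility array, then picks one alternative per chooser by comparing a uniform draw with the cumulative probabilities. It is a generic illustration, not ActivitySim's Logit implementation. The uniform draws are passed in explicitly so the example is deterministic, whereas ActivitySim draws its random numbers through its own Random procedures. Nested logit is not shown.

```python
import numpy as np


def mnl_probabilities(utilities):
    """Multinomial logit probabilities for a (choosers x alternatives) utility array."""
    # Subtracting each row's maximum avoids overflow in exp() and leaves
    # the probabilities unchanged.
    shifted = utilities - utilities.max(axis=1, keepdims=True)
    exp_u = np.exp(shifted)
    return exp_u / exp_u.sum(axis=1, keepdims=True)


def make_choices(probs, rands):
    """Choose one alternative per row using one uniform draw per row."""
    cumulative = probs.cumsum(axis=1)
    # The count of cumulative probabilities <= draw is the chosen alternative's index.
    idx = (rands[:, None] >= cumulative).sum(axis=1)
    # Guard against round-off leaving the last cumulative value just below 1.
    return np.minimum(idx, probs.shape[1] - 1)


utilities = np.array([[0.0, 0.0], [0.0, np.log(3.0)]])
probs = mnl_probabilities(utilities)  # approximately [[0.5, 0.5], [0.25, 0.75]]
choices = make_choices(probs, rands=np.array([0.4, 0.3]))
```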
Person Time Windows
The departure time and duration models require person time windows. Time windows are adjacent time periods that are available for travel. ActivitySim maintains time windows in a pandas table where each row is a person and each time period is a column. As travel is scheduled throughout the simulation, the relevant columns for the tour, trip, etc. are updated as needed. Refer to Person Time Windows for more information.
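The person-by-period table can be sketched with pandas and numpy broadcasting. This sketch uses a simple 0 (free) / 1 (busy) flag per period. The periods, person IDs, and the `schedule` and `is_available` helpers are hypothetical, and the actual Person Time Windows implementation may encode more detail than this.

```python
import numpy as np
import pandas as pd

# Hypothetical periods 5..9 (e.g. hours of the day); 0 = free, 1 = busy.
periods = list(range(5, 10))
windows = pd.DataFrame(0, index=pd.Index([101, 102], name="person_id"), columns=periods)


def period_mask(columns, starts, ends):
    """Boolean (persons x periods) array, True where start <= period <= end."""
    cols = np.asarray(columns)
    starts = np.asarray(starts)[:, None]
    ends = np.asarray(ends)[:, None]
    # Broadcasting builds the whole mask at once, with no loop over persons.
    return (cols >= starts) & (cols <= ends)


def schedule(windows, person_ids, starts, ends):
    """Mark each person's periods from start to end (inclusive) as busy."""
    busy = period_mask(windows.columns, starts, ends)
    current = windows.loc[person_ids].to_numpy()
    windows.loc[person_ids] = current | busy.astype(current.dtype)


def is_available(windows, person_ids, starts, ends):
    """True for each person whose requested periods are all still free."""
    wanted = period_mask(windows.columns, starts, ends)
    busy = windows.loc[person_ids].to_numpy().astype(bool)
    return ~(wanted & busy).any(axis=1)


schedule(windows, [101], [6], [8])
available = is_available(windows, [101, 102], [8, 8], [9, 9])
```

Here person 101 is busy in periods 6 to 8, so a request for periods 8 to 9 conflicts for that person but not for person 102.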
An ActivitySim travel model is made up of a series of models, or steps in the data pipeline. A model typically does the following:
registers a model step that is called by the model runner
sets up logging and tracing
gets the relevant input data tables from the pipeline
gets all required settings, config files, etc.
runs a data preprocessor on each input table that needs additional fields for the calculation
solves the model in chunks of data table rows
runs a data postprocessor on the output table data that needs additional fields for later models
writes the resulting table data to the pipeline
See Models for more information.
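Several of the steps listed above (get the input table, run a preprocessor, solve in chunks of rows, log progress, and write the result back) can be sketched as a single function. Everything here is hypothetical: the `run_model_step`, `annotate_persons`, and `simulate_chunk` names, the dict standing in for the pipeline, and the work-from-home choice. It does not use ActivitySim's actual step registration, settings, or random number API; draws come from a seeded numpy Generator purely for illustration.

```python
import logging

import numpy as np
import pandas as pd

logger = logging.getLogger(__name__)


def annotate_persons(persons):
    """Hypothetical preprocessor: add a field the model needs."""
    out = persons.copy()
    out["is_worker"] = out["hours_worked"] > 0
    return out


def simulate_chunk(chunk, rng):
    """Hypothetical model logic for one chunk: a yes/no choice per person."""
    prob_work_from_home = np.where(chunk["is_worker"], 0.2, 0.0)
    draws = rng.random(len(chunk))
    return pd.Series(draws < prob_work_from_home, index=chunk.index)


def run_model_step(tables, chunk_size, seed=0):
    """Get inputs, preprocess, solve in chunks of rows, and write results back."""
    rng = np.random.default_rng(seed)
    persons = annotate_persons(tables["persons"])
    results = []
    for start in range(0, len(persons), chunk_size):
        chunk = persons.iloc[start:start + chunk_size]
        logger.info("simulating rows %d to %d", start, start + len(chunk) - 1)
        results.append(simulate_chunk(chunk, rng))
    persons["works_from_home"] = (
        pd.concat(results) if results else pd.Series(dtype=bool, index=persons.index)
    )
    tables["persons"] = persons  # stand-in for writing to the pipeline
    return tables


persons = pd.DataFrame({"hours_worked": [40, 0, 20, 0, 35]})
tables = run_model_step({"persons": persons}, chunk_size=2)
```

Chunking bounds memory use: each chunk's intermediate arrays are only `chunk_size` rows long, while the combined result still covers every row.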
The development version of ActivitySim can be installed as follows:
Clone or fork the source from the GitHub repository
Activate the correct conda environment if needed
Navigate to your local activitysim git directory
Run the command
python setup.py develop
The develop command is required in order to make changes to the source and see the results without reinstalling. You may need to first uninstall the pip-installed version before installing the development version from source with:
pip uninstall activitysim
ActivitySim development adheres to the following standards.
Imports should be one per line.
Imports should be grouped into standard library, third-party, and intra-library imports.
Imports of the form from x import y should follow regular import statements.
Within each group the imports should be alphabetized.
Imports of scientific Python libraries should follow these conventions:
import numpy as np
import pandas as pd
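A module header that follows the import conventions above might look like the sketch below. The particular modules are only examples. The intra-library line is commented out so the snippet runs without ActivitySim installed.

```python
# Standard library: regular imports first, then "from" imports, each alphabetized.
import logging
import os
from collections import OrderedDict

# Third-party: scientific libraries use their conventional aliases.
import numpy as np
import pandas as pd

# Intra-library imports form the last group, e.g.:
# from activitysim.core import inject
```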
Working Together in the Repository
We use GitHub Flow. The key points to our GitHub workflow are:
The master branch contains the latest working/release version of the ActivitySim resources
The master branch is protected and therefore can only be written to by the Travis CI system
Work is done in an issue/feature branch (or a fork) and then pushed to a new branch
The test system automatically runs the tests on the new branch
If the tests pass, then a manual pull request can be approved to merge into master
The repository administrator handles the pull request and makes sure that related resources such as the wiki, documentation, issues, etc. are updated. See Releases for more information.
ActivitySim uses the following versioning convention:
MAJOR.MINOR
where MAJOR designates a major revision number for the software, like 2 or 3 for Python. Usually, raising the major revision number means adding a lot of features, breaking backward compatibility, or drastically changing the APIs (Application Programming Interfaces) or ABIs (Application Binary Interfaces).
MINOR usually groups moderate changes to the software, like bug fixes or minor improvements. Most of the time, end users can upgrade their software to a new minor release without risk. If an API changes, end users will be notified with deprecation warnings. In other words, API and ABI stability is usually a promise between two minor releases.
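To make the MAJOR/MINOR rule concrete, here is a small hypothetical helper (not part of ActivitySim) that parses a version string and flags upgrades that, under this convention, may break backward compatibility. Any components after MINOR are ignored.

```python
def parse_version(version):
    """Return (MAJOR, MINOR) as ints from a string like '1.3' or '0.9.2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def upgrade_may_break(installed, candidate):
    """Only a change in MAJOR is expected to break backward compatibility."""
    return parse_version(candidate)[0] != parse_version(installed)[0]
```

For example, moving from 1.2 to 1.3 is a minor upgrade, while moving from 1.3 to 2.0 may require code changes.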
ActivitySim testing is done with three tools:
pycodestyle, a tool to check Python code against the PEP 8 style conventions
pytest, a Python testing tool
coveralls, a tool for measuring code coverage and publishing code coverage stats online
To run the tests locally, first make sure the required packages are installed:
pip install pytest pytest-cov coveralls pycodestyle
Next, run the tests with the following commands:
pycodestyle activitysim
py.test --cov activitysim --cov-report term-missing
These same tests are run by Travis with each push to the repository. These tests need to pass in order to merge the revisions into master.
In some cases, test targets need to be updated to match the new results produced by the code since these are now the correct results. In order to update the test targets, first determine which tests are failing and then review the failing lines in the source files. These are easy to identify since each test ultimately comes down to one of Python’s various types of assert statements. Once you identify which assert is failing, you can work your way back through the code that creates the test targets in order to update it. After updating the test targets, re-run the tests to confirm the new code passes all the tests.
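A test that compares results against a target usually boils down to one assert. The sketch below is hypothetical (the function under test and the target values are made up) and uses `pandas.testing.assert_series_equal`, which reports exactly which values, dtype, or index entries differ. When a code change legitimately alters the results, the `expected` target is the part you update.

```python
import pandas as pd
import pandas.testing as pdt


def compute_household_sizes(persons):
    """Hypothetical code under test: count persons per household."""
    return persons.groupby("household_id").size().rename("hhsize")


def test_household_sizes_match_target():
    persons = pd.DataFrame({"household_id": [1, 1, 2, 3, 3, 3]})
    # Test target: the expected results. If the code is intentionally changed
    # and the new results are correct, this target is what gets updated.
    expected = pd.Series(
        [2, 1, 3], index=pd.Index([1, 2, 3], name="household_id"), name="hhsize"
    )
    result = compute_household_sizes(persons)
    # Fails with a readable diff if any value, dtype, or index entry differs.
    pdt.assert_series_equal(result, expected)
```

pytest collects functions whose names start with test_, so this runs with the py.test command above when placed in a test module.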
A handy way to profile ActivitySim model runs is to use snakeviz. To do so, first install snakeviz and then run ActivitySim with the Python profiler (cProfile) to create a profiler file. Then run snakeviz on the profiler file to interactively explore the component runtimes.
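The profiling workflow can be sketched with Python's built-in cProfile and pstats modules. The `run_model` function below is a hypothetical stand-in for an ActivitySim run, and the output file name activitysim.prof is only an example.

```python
import cProfile
import io
import pstats


def run_model():
    """Hypothetical stand-in for an ActivitySim run; replace with the real entry point."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
run_model()
profiler.disable()

# Write the binary profile file that snakeviz reads.
profiler.dump_stats("activitysim.prof")

# Optional: print the top entries by cumulative time without leaving Python.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Equivalently, a whole script can be profiled from the command line with python -m cProfile -o activitysim.prof followed by the script name. The file is then opened in the browser with snakeviz activitysim.prof.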
The documentation is written in reStructuredText markup and built with Sphinx. In addition to converting rst files to html and other document formats, these tools also read the inline Python docstrings and convert them into html as well. ActivitySim’s docstrings are written in numpydoc format since it is easier to use than standard rst format.
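For reference, a function documented in numpydoc format looks like the sketch below. The function itself is a hypothetical example; the section headings (Parameters, Returns, Raises, Examples) and their underlines are the parts Sphinx's numpydoc extension parses.

```python
import numpy as np


def weighted_mean(values, weights):
    """
    Compute the weighted mean of an array.

    Parameters
    ----------
    values : array-like
        Values to average.
    weights : array-like
        Non-negative weights, same length as `values`.

    Returns
    -------
    float
        Sum of ``values * weights`` divided by the sum of ``weights``.

    Raises
    ------
    ValueError
        If the weights sum to zero.

    Examples
    --------
    >>> weighted_mean([1.0, 3.0], [1.0, 1.0])
    2.0
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total_weight = weights.sum()
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return float((values * weights).sum() / total_weight)
```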
To build the documentation, first make sure the required packages are installed:
pip install sphinx numpydoc sphinx_rtd_theme
Next, build the documentation in html format with the following command run from the
If the activitysim package is installed, then the documentation will be built from that version of the source code instead of the git repo version. Make sure to run pip uninstall activitysim before building the documentation if needed.
When pushing revisions to the repo, the documentation is automatically built by Travis after
successfully passing the tests. The documents are built with the
The script does the following:
installs the required Python packages
../activitysim/docs/_build/html/*pages to the
GitHub automatically publishes the gh-pages branch at https://activitysim.github.io/activitysim.
Before releasing a new version of ActivitySim, the following release checklist should be consulted:
Create the required Anaconda environment
Run all the examples, including the full scale example
Build the package
Install and run the package in a new Anaconda environment
Build the documentation
Run the tests
Increment the package version number
Update any necessary web links, such as switching from the develop branch to the master branch
Contribution Review Criteria
When contributing to ActivitySim, the set of questions below will be asked of the contribution. Make sure to also review the documentation above before making a submittal. The automated test system also provides some helpful information where identified.
To submit a contribution for review, issue a pull request with a comment introducing your contribution. The comment should include a brief overview, responses to the questions, and pointers to related information. The entire submittal should ideally be self-contained, so any additional documentation should be in the pull request as well. The PMC and/or its Contractor will handle the review request, comment on each question, complete the feedback form, and reply to the pull request. If accepted, the commit(s) will be squashed and merged. It’s a good idea to set up a pre-submittal meeting to discuss questions and better understand expectations.
Does it contain all the required elements, including a runnable example, documentation, and tests?
Does it implement good methods (i.e. is it consistent with good practices in travel modeling)?
Are the runtimes reasonable and does it provide documentation justifying this claim?
Does it include non-Python code, such as C/C++? If so, does it compile on any OS and are compilation instructions included?
Is it licensed with the ActivitySim license that allows the code to be freely distributed and modified and includes attribution so that the provenance of the code can be tracked? Does it include an official release of ownership from the funding agency if applicable?
Does it appropriately interact with the data pipeline (i.e. it doesn’t create new ways of managing data)?
Does it include regression tests to enable checking that consistent results will be returned when updates are made to the framework?
Does it include sufficient test coverage and test data for existing and proposed features?
Any other comments or suggestions for improving the developer experience?
The PMC and/or its Contractor will provide feedback for each review criterion above and tag each submittal category as follows:
Accept but recommend revisions
Do not accept