How It Works

Bayesian Symbolic Regression (BSR) has the following features:

  1. It models equations as expression trees, with root and intermediate nodes representing operators (e.g. * for a binary node and sin for a unary node) and leaf nodes representing features in the data. BSR then defines the search space as consisting of the following three parts (a minimal representation of such a tree is sketched after this list):

    • Tree structure (T): the structure of the expression tree (e.g. how the tree is recursively constructed and when a branch terminates in a leaf node), together with the assignment of operators to non-leaf nodes.
    • Leaf nodes (M): the assignment of data features to the leaf nodes defined by part T.
    • Operator parameters (\(\Theta\)): a vector \(\Theta\) collecting additional parameters for operators that require them (e.g. a linear operator ln with intercept and slope parameters).
  2. It specifies priors for each of the three parts above. AutoRA's implementation of BSR allows users either to specify custom priors for part T or to choose among a pre-specified set (an illustration of operator prior weights appears after this list).

  3. It defines actions that mutate one expression tree (the original) into a new expression tree (the proposed), and computes transition probabilities from the likelihoods of the original and proposed models (see the accept/reject sketch after this list).

  4. It designs and implements a Reversible-Jump Markov chain Monte Carlo (RJ-MCMC) algorithm, which iteratively accepts or rejects proposed samples (each sample being a valid expression tree) based on the transition probabilities calculated above. In each iteration, K expression trees are obtained, each either kept from the previous iteration or replaced by its accepted proposal.

  5. In each iteration, the candidate prediction model is a linear mixture of the K trees: the observed response is regressed on the outputs of the K expression trees to obtain the linear regression parameters \(\beta\) (see the mixture-fitting sketch after this list).
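
The sketch below illustrates the kind of expression-tree representation described in item 1: non-leaf nodes carry operators (part T), leaf nodes carry feature indices (part M), and operators that need extra parameters carry them in \(\Theta\). The `Node` class, operator table, and evaluation routine are illustrative placeholders, not AutoRA's internal data structures.

```python
import numpy as np

# Illustrative operator set: name -> (arity, function). Not AutoRA's actual registry.
OPERATORS = {
    "+": (2, lambda a, b: a + b),
    "*": (2, lambda a, b: a * b),
    "sin": (1, np.sin),
    # "ln" here denotes the linear operator a + b * x, whose (a, b) live in Theta.
    "ln": (1, None),
}

class Node:
    """One node of an expression tree.

    Non-leaf nodes carry an operator (part T); leaf nodes carry a feature
    index (part M); operators that need extra parameters carry them in
    `params` (part Theta).
    """
    def __init__(self, op=None, children=(), feature=None, params=None):
        self.op = op              # operator name, None for a leaf
        self.children = list(children)
        self.feature = feature    # column index of the data feature (leaves only)
        self.params = params      # e.g. (intercept, slope) for the linear operator

    def evaluate(self, X):
        if self.op is None:                      # leaf: return the feature column
            return X[:, self.feature]
        if self.op == "ln":                      # linear operator: a + b * child
            a, b = self.params
            return a + b * self.children[0].evaluate(X)
        arity, fn = OPERATORS[self.op]
        return fn(*(child.evaluate(X) for child in self.children))

# Example: sin(x0) * (0.5 + 2.0 * x1)
tree = Node(op="*", children=[
    Node(op="sin", children=[Node(feature=0)]),
    Node(op="ln", params=(0.5, 2.0), children=[Node(feature=1)]),
])
X = np.random.rand(100, 2)
y_hat = tree.evaluate(X)
```

For item 2, a prior over part T can be thought of as a set of weights over the operator vocabulary that controls how often each operator is proposed when a tree is grown or mutated. The dictionary below is only an illustration of that idea; its keys and format are assumptions, not AutoRA's actual prior specification.

```python
# Hypothetical prior weights over operators (part T); a higher weight means the
# operator is proposed more often when growing or mutating a tree.
operator_prior = {
    "+": 1.0,
    "*": 1.0,
    "sin": 0.5,
    "exp": 0.2,
    "ln": 0.8,
}

# Normalize the weights into proposal probabilities.
total = sum(operator_prior.values())
operator_probs = {op: w / total for op, w in operator_prior.items()}
```

Items 3 and 4 amount to a Metropolis-Hastings-style accept/reject step applied to each of the K trees. The sketch below shows the generic form of that step under two assumed helper functions, `propose` (applies one mutation and returns the forward and backward proposal probabilities) and `log_posterior` (log prior plus log likelihood); it omits the dimension-matching details that make the actual sampler reversible-jump.

```python
import numpy as np

def mh_step(tree, X, y, rng, propose, log_posterior):
    """One accept/reject step for a single expression tree (item 3).

    `propose` and `log_posterior` are hypothetical helpers: `propose` applies
    one mutation (grow, prune, change an operator or feature, ...) and returns
    the proposed tree plus forward/backward proposal probabilities;
    `log_posterior` evaluates log prior + log likelihood.
    """
    proposed, q_forward, q_backward = propose(tree, rng)
    log_alpha = (log_posterior(proposed, X, y) - log_posterior(tree, X, y)
                 + np.log(q_backward) - np.log(q_forward))
    if np.log(rng.uniform()) < log_alpha:
        return proposed   # accept the mutated tree
    return tree           # reject the proposal and keep the original tree

def rjmcmc(trees, X, y, n_iter, rng, propose, log_posterior):
    """Sweep over the K trees for n_iter iterations (item 4)."""
    for _ in range(n_iter):
        trees = [mh_step(t, X, y, rng, propose, log_posterior) for t in trees]
    return trees
```

For item 5, the mixture parameters \(\beta\) can be obtained by ordinary least squares on the outputs of the K sampled trees. The sketch below assumes the hypothetical `Node` class from the first sketch and uses NumPy's least-squares solver rather than AutoRA's actual fitting code.

```python
import numpy as np

def fit_mixture(trees, X, y):
    """Regress y on the outputs of the K trees to obtain beta (item 5)."""
    # Design matrix: one column per tree output, plus an intercept column.
    outputs = np.column_stack([t.evaluate(X) for t in trees])
    design = np.column_stack([np.ones(len(y)), outputs])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def predict_mixture(trees, X, beta):
    """Predict with the linear mixture of the K trees."""
    outputs = np.column_stack([t.evaluate(X) for t in trees])
    design = np.column_stack([np.ones(X.shape[0]), outputs])
    return design @ beta
```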

AutoRA's implementation of BSR is adapted from the original authors' codebase and includes comprehensive refactoring of the data structures and MCMC computations. It also provides new priors suited to the cognitive and behavioral sciences.
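
AutoRA theorists generally follow the scikit-learn estimator interface, so fitting the BSR theorist looks roughly like the minimal sketch below. The import path, constructor arguments (including how priors and the number of trees K are selected), and default settings are assumptions here and should be checked against the autora-theorist-bsr documentation.

```python
import numpy as np

# Hypothetical import path; verify the exact module and constructor arguments
# (e.g. how to choose a prior set) in the autora-theorist-bsr documentation.
from autora.theorist.bsr import BSRRegressor

X = np.random.rand(100, 2)
y = np.sin(X[:, 0]) + 2.0 * X[:, 1]

bsr = BSRRegressor()        # priors, K, and MCMC settings are constructor arguments
bsr.fit(X, y)
predictions = bsr.predict(X)
```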