How It Works
Bayesian Symbolic Regression (BSR) has the following features:
It models equations as expression trees, with root and intermediate tree nodes representing operators (e.g. `*` for a binary node and `sin` for a unary node) and leaf nodes representing features in the data. BSR then defines the search space as the union of the following three parts:
- Tree structure (T): this represents the structure of the expression tree (e.g. how to recursively construct the tree and when to stop by using leaf nodes), and also specifies the assignment of operators to non-leaf nodes.
- Leaf nodes (M): this assigns features to the leaf nodes defined in part T.
- Operator parameters (\(\Theta\)): this uses a vector \(\Theta\) to collect additional parameters for certain operators that require them (e.g. a linear operator `ln` with intercept and slope parameters).
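The three-part decomposition above can be illustrated with a minimal sketch. The `Node` class and the small operator set are hypothetical stand-ins for illustration, not AutoRA's actual data structures:

```python
import math

# Hypothetical expression-tree node: a non-leaf node holds an operator
# (part T), a leaf holds a feature index (part M), and some operators
# carry extra parameters (part Theta). Not AutoRA's actual data structure.
class Node:
    def __init__(self, op=None, children=(), feature=None, params=None):
        self.op = op                # operator name for non-leaf nodes (part T)
        self.children = list(children)
        self.feature = feature      # feature index for leaf nodes (part M)
        self.params = params or {}  # operator parameters (part Theta)

    def evaluate(self, x):
        if self.op is None:                  # leaf: look up a feature value
            return x[self.feature]
        if self.op == "*":                   # binary operator
            return self.children[0].evaluate(x) * self.children[1].evaluate(x)
        if self.op == "sin":                 # unary operator
            return math.sin(self.children[0].evaluate(x))
        if self.op == "ln":                  # linear operator with parameters
            a, b = self.params["slope"], self.params["intercept"]
            return a * self.children[0].evaluate(x) + b
        raise ValueError(f"unknown operator {self.op}")

# The tree for sin(x0) * (2 * x1 + 1):
tree = Node(op="*", children=[
    Node(op="sin", children=[Node(feature=0)]),
    Node(op="ln", params={"slope": 2.0, "intercept": 1.0},
         children=[Node(feature=1)]),
])
print(tree.evaluate([math.pi / 2, 3.0]))  # sin(pi/2) * (2*3 + 1) = 7.0
```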
It specifies priors for each of the three parts above.
AutoRA's implementation of BSR allows users to either specify custom priors for part `T` or choose among a pre-specified set.
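A custom operator prior for part `T` can be thought of as a set of weights over operators, normalized into a categorical distribution. The dictionary below is a hypothetical illustration of this idea, not AutoRA's actual configuration API:

```python
# Hypothetical prior over operators for part T: unnormalized weights,
# normalized into a categorical prior. Not AutoRA's actual API.
custom_ops_prior = {
    "sin": 1.0,  # unary operator
    "cos": 1.0,  # unary operator
    "*": 2.0,    # binary operator, weighted higher to favor products
    "ln": 2.0,   # linear operator with slope/intercept parameters
}

total = sum(custom_ops_prior.values())
prior_probs = {op: w / total for op, w in custom_ops_prior.items()}
print(prior_probs["*"])  # 2.0 / 6.0 ≈ 0.333
```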
It designs a set of `actions` that mutate one expression tree (`original`) into a new expression tree (`proposed`), and supports the calculation of transition probabilities based on the likelihoods of the original and proposed trees.
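One such action can be sketched as a "grow" move that wraps a randomly chosen leaf in a unary operator, turning an `original` tree into a `proposed` tree. The tuple representation and the `grow` helper are hypothetical illustrations, not AutoRA's implementation:

```python
import random

# Trees as nested tuples: ("op", child, ...) for operators,
# ("x", feature_index) for leaves. A hypothetical "grow" action wraps
# one leaf in a unary operator. Not AutoRA's implementation.
def grow(tree, rng):
    if tree[0] == "x":                        # at a leaf: wrap it
        return (rng.choice(["sin", "cos"]), tree)
    op, *children = tree                      # recurse into one random child
    i = rng.randrange(len(children))
    children[i] = grow(children[i], rng)
    return (op, *children)

rng = random.Random(0)
original = ("*", ("x", 0), ("sin", ("x", 1)))
proposed = grow(original, rng)
print(proposed)  # original with one leaf wrapped in sin or cos
```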
It designs and implements a Reversible-Jump Markov-Chain Monte-Carlo (RJ-MCMC) algorithm, which iteratively accepts new samples (where each sample is a valid expression tree) based on the transition probabilities calculated above. In each iteration, `K` expression trees are obtained either from the `original` samples or the newly `proposed` samples.
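The accept/reject step at the heart of this kind of sampler can be sketched as a Metropolis-Hastings ratio in log space. The function below is illustrative of the general technique, with the posterior and proposal log-probabilities assumed to be given; it is not AutoRA's implementation:

```python
import math
import random

# Sketch of a Metropolis-Hastings acceptance step as used in RJ-MCMC:
# accept the proposed tree with probability
#   min(1, [posterior(proposed) * q(proposed -> original)]
#        / [posterior(original) * q(original -> proposed)]),
# computed in log space for numerical stability. Illustrative only.
def accept(log_post_orig, log_post_prop,
           log_q_orig_to_prop, log_q_prop_to_orig, rng):
    log_ratio = (log_post_prop - log_post_orig
                 + log_q_prop_to_orig - log_q_orig_to_prop)
    return math.log(rng.random()) < log_ratio

rng = random.Random(42)
# A proposal with a much higher posterior is (almost) always accepted:
print(accept(-10.0, -2.0, math.log(0.5), math.log(0.5), rng))  # True
```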
With each iteration, the candidate prediction model is a linear mixture of the `K` trees, wherein the ground-truth response is regressed on the outputs generated by the `K` expression trees to obtain the linear regression parameters \(\beta\).
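The mixture step above amounts to an ordinary least-squares fit with one design-matrix column per tree output. The sketch below uses two stand-in callables in place of sampled expression trees; the data and mixture weights are synthetic, for illustration only:

```python
import numpy as np

# Sketch of the final mixture step: regress the response y on the outputs
# of the K sampled expression trees to obtain the mixture weights beta.
# The trees here are stand-in callables, not samples from an MCMC chain.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
trees = [lambda x: np.sin(x[:, 0]),      # K = 2 stand-in expression trees
         lambda x: x[:, 0] * x[:, 1]]

# Synthetic ground-truth response: a known mixture of the tree outputs
y = 1.5 * trees[0](x) - 0.5 * trees[1](x) + 0.01 * rng.normal(size=100)

# Design matrix: one column per tree output; least squares recovers beta
design = np.column_stack([t(x) for t in trees])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(beta, 2))  # approximately [ 1.5  -0.5 ]
```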
AutoRA's implementation of BSR is adapted from the original authors' codebase and includes comprehensive refactoring of the data structures and MCMC computations. It also provides new priors suited to the cognitive and behavioral sciences.