How It Works
The Bayesian Machine Scientist (BMS) uses Bayesian inference to search the space of possible equations. The following are the relevant quantities in this Bayesian approach:
Probability of : Conditional Probability of given : Joint Probability of and
Mathematically, we know:
Rearranging this expression, we get Bayes rule:
Here,
In essence, prior knowledge
BMS capitalizes on this process for updating knowledge:
1) It formulates the problem of fitting an equation to data by first specifying priors over equations. In their paper, Guimerà et al. use the empirical frequency of equations on Wikipedia to specify these priors.
2) It then scores different candidate equations using description length as a loss function. Formally, this description length is the number of natural units of information (nats) needed to jointly encode the data and the equation optimally.
3) Since the loss function is computationally intractable, it uses an approximation:
where
In this formulation, the goodness of fit
To better frame the problem, equations are modeled as acyclic graphs (i.e., trees).
Bayesian inference via MCMC is then applied to navigate the search space efficiently. Note, there are many sampling strategies other than MCMC that could be used.
The search space is very rugged, and local minima are difficult to escape, so BMS employs parallel tempering to overcome this.
One incremental unit of search in this approach involves two steps:
I) Markov chain Monte Carlo Sampling:
a) One of three mutations - Root Removal/Addition, Elementary Tree Replacement, Node Replacement - are selected for the equation tree.
b) Choosing the operator associated with the mutation relies on how likely the operator is to turn up (encoded in the priors).
c) Choosing a specific variable or parameter value is random.
d) Accepting or rejecting the mutation depends on Metropolis' rule.
II) Parallel Tree Swap:
a) Two parallel trees held at different temperatures are selected.
b) The temperatures of the two trees are swapped.
c) If this decreases the loss of the now colder tree, the tree temperatures are permanently swapped.
d) If not, the trees are reverted to preexisting temperatures.
After iterating over these two steps for
References
R. Guimerà et al., A Bayesian machine scientist to aid in the solution of challenging scientific problems. Sci. Adv. 6, eaav697 (2020). Wit, Ernst; Edwin van den Heuvel; Jan-Willem Romeyn (2012).