Distance Metrics

To evaluate the precision of a symbolic regression (SR) algorithm, or to compute the training loss of such an algorithm, we need a distance metric between equations. Evaluating such a distance is not straightforward. The Equation Tree package therefore includes various metrics that can be used individually or in combination:

  • Prediction Distance. Distance between function values as proposed by La Cava et al. (2021).
  • Symbolic Solution. Another metric proposed by La Cava et al. (2021), designed to capture SR models that differ from the true model only by an additive constant or a multiplicative scalar.
  • Normalized Edit Distance. Matsubara et al. (2022) propose a normalized edit distance between expression trees. For a pair of trees, the edit distance is the minimum cost of transforming one tree into the other through a sequence of operations, each of which either 1) inserts, 2) deletes, or 3) renames a node.

Prediction Distance

coming soon ...
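The idea behind prediction distance can be illustrated with a minimal sketch that is not the Equation Tree package's API. It assumes both equations are plain Python callables of one variable and uses root-mean-squared error (RMSE) over a set of sample points as the distance; the function name `prediction_distance` and the choice of RMSE are assumptions for illustration, and other fit measures such as R² are also common.

```python
import math
import random
from typing import Callable, Sequence


def prediction_distance(
    f: Callable[[float], float],
    g: Callable[[float], float],
    xs: Sequence[float],
) -> float:
    """Hypothetical helper: RMSE between f and g on the sample points xs."""
    if not xs:
        raise ValueError("at least one sample point is required")
    squared = [(f(x) - g(x)) ** 2 for x in xs]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    true_eq = lambda x: x ** 2                    # ground-truth equation
    candidate = lambda x: x ** 2 + 0.01 * x ** 3  # hypothetical SR output
    rng = random.Random(0)
    narrow = [rng.uniform(-1.0, 1.0) for _ in range(100)]
    wide = [rng.uniform(-100.0, 100.0) for _ in range(100)]
    # The same pair of equations looks close on a narrow sample ...
    print(prediction_distance(true_eq, candidate, narrow))
    # ... and far apart on a wide one.
    print(prediction_distance(true_eq, candidate, wide))
```

The two printed values differ by several orders of magnitude for the same pair of equations, which is the sample dependence noted under Cons below.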

Pros - ...

Cons - Can depend heavily on the input samples on which it is evaluated

Symbolic Solution

coming soon ...
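A symbolic check of this kind can be sketched with SymPy. The sketch assumes the criterion is: the candidate does not reduce to a constant, and either the difference between the true model and the candidate is a constant or their ratio is a nonzero constant. The function name `is_symbolic_solution` is hypothetical and is not part of the Equation Tree package; equivalence is decided by `sympy.simplify`, which can miss equivalences in complicated expressions.

```python
import sympy as sp


def is_symbolic_solution(true_expr, candidate) -> bool:
    """Hypothetical helper: True if `candidate` differs from `true_expr` only
    by an additive constant or a nonzero multiplicative constant.

    Both arguments may be SymPy expressions or strings SymPy can parse.
    """
    true_expr = sp.sympify(true_expr)
    candidate = sp.sympify(candidate)
    # A candidate that simplifies to a plain constant does not count.
    if not sp.simplify(candidate).free_symbols:
        return False
    # Case 1: true_expr - candidate is a constant.
    if not sp.simplify(true_expr - candidate).free_symbols:
        return True
    # Case 2: candidate / true_expr is a nonzero constant.
    ratio = sp.simplify(candidate / true_expr)
    return not ratio.free_symbols and ratio != 0


if __name__ == "__main__":
    print(is_symbolic_solution("x**2 + 3", "x**2"))           # True: constant offset
    print(is_symbolic_solution("2*sin(x)", "sin(x)"))         # True: scalar factor
    print(is_symbolic_solution("x**2", "x**2 + 0.001*x**3"))  # False
```

The last call returns `False`, exactly as it would for a completely unrelated candidate such as `sin(x)`, which illustrates the binary nature of this metric.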

Pros - ...

Cons - Is binary: all equations that differ from the true model by more than a scalar or a constant receive the same score, however close they are

Normalized Edit Distance

coming soon ...
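As we understand Matsubara et al. (2022), the edit distance between the predicted and the true tree is divided by the number of nodes in the true tree and clipped at 1. The sketch below follows that reading with unit costs for insert, delete, and rename. The tree encoding (nested `(label, children)` tuples) and the helper names `node`, `tree_size`, `tree_edit_distance`, and `normalized_edit_distance` are assumptions for illustration, not the Equation Tree package's API. It uses a plain memoized recursion over ordered forests instead of the more efficient Zhang-Shasha algorithm.

```python
from functools import lru_cache


def node(label, *children):
    """Hypothetical helper: build a hashable (label, children) tree."""
    return (label, tuple(children))


def tree_size(tree) -> int:
    """Number of nodes in a (label, children) tree."""
    _, children = tree
    return 1 + sum(tree_size(child) for child in children)


@lru_cache(maxsize=None)
def _forest_distance(f, g) -> int:
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:  # only insertions remain: insert the rightmost root of g
        _, kids = g[-1]
        return _forest_distance(f, g[:-1] + kids) + 1
    if not g:  # only deletions remain: delete the rightmost root of f
        _, kids = f[-1]
        return _forest_distance(f[:-1] + kids, g) + 1
    (lv, kv), (lw, kw) = f[-1], g[-1]
    return min(
        _forest_distance(f[:-1] + kv, g) + 1,  # delete rightmost root of f
        _forest_distance(f, g[:-1] + kw) + 1,  # insert rightmost root of g
        # map the two rightmost roots onto each other (rename if labels differ)
        _forest_distance(kv, kw)
        + _forest_distance(f[:-1], g[:-1])
        + (1 if lv != lw else 0),
    )


def tree_edit_distance(a, b) -> int:
    return _forest_distance((a,), (b,))


def normalized_edit_distance(pred, true) -> float:
    """Edit distance divided by the size of the true tree, clipped at 1."""
    return min(1.0, tree_edit_distance(pred, true) / tree_size(true))


if __name__ == "__main__":
    # x**2 + 3 as an expression tree (5 nodes)
    true = node("+", node("^", node("x"), node("2")), node("3"))
    # hypothetical SR output x**2 + 2: one rename away from the truth
    pred = node("+", node("^", node("x"), node("2")), node("2"))
    print(normalized_edit_distance(pred, true))  # 0.2
```

The memoized recursion is exact but can grow quickly with tree size, so it is only meant for small equations like the one above.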

Pros - ...

Cons - ...

References

La Cava, W. G., Orzechowski, P., Burlacu, B., de França, F. O., Virgolin, M., Jin, Y., Kommenda, M., & Moore, J. H. "Contemporary Symbolic Regression Methods and their Relative Performance." In CoRR (2021), Available at: https://arxiv.org/abs/2107.14351

Matsubara, Y., Chiba, N., Igarashi, R., & Ushiku, Y. "Rethinking Symbolic Regression Datasets and Benchmarks for Scientific Discovery." In arXiv preprint arXiv:2206.10540 (2022), Available at: https://arxiv.org/abs/2206.10540