!pip install equation_tree
import random
from equation_tree import sample
# To obtain reproducible results, we set a seed for the following section
import numpy as np
np.random.seed(42)
Then, we can sample an equation:
equation = sample()
Equation Representations And Features
First, let's look at the type of the equation:
type(equation)
It is a list! This is because we can sample multiple equations in one go:
equations = sample(n=100)
This returns 100 equations:
print(len(equations))
print(equations[0])
print(equations[42])
They are represented as strings, but we can look at other representations. For example, prefix notation (for more details on different representations of the equations, see the respective section of the documentation):
equations[42].prefix
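To make the idea of prefix notation concrete, here is a minimal standalone sketch that does not use equation_tree. The nested-tuple tree and the helper `to_prefix` are hypothetical names introduced only for illustration; the package's internal representation may differ.

```python
# Hypothetical expression tree for sin(x_1) + c_1, written as nested tuples:
# (symbol, child, child, ...). Leaves are plain strings.
tree = ("+", ("sin", "x_1"), "c_1")


def to_prefix(node):
    """Return the pre-order (prefix) token list of a nested-tuple tree."""
    if isinstance(node, str):
        return [node]
    symbol, *children = node
    tokens = [symbol]
    for child in children:
        tokens.extend(to_prefix(child))
    return tokens


prefix_tokens = to_prefix(tree)
print(prefix_tokens)  # ['+', 'sin', 'x_1', 'c_1']
```

In prefix notation every operator or function appears before its arguments, so no parentheses are needed to recover the tree.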
We can also look at various features of the equation. For example, the number of constants, the tree depth of the underlying tree, the number of nodes or the tree structure (for more details on these features, see the respective section of the documentation):
print(equations[42].n_constants)
print(equations[42].depth)
print(equations[42].n_nodes)
print(equations[42].structure)
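The following standalone sketch shows how such features could be derived from a tree. It assumes that the structure is the list of node depths in pre-order, which is how lists such as [0, 1, 1] read, and that depth is counted from 0 at the root. Both conventions are assumptions; the helpers `depth_list` and `leaves` are hypothetical and not part of equation_tree.

```python
# Hypothetical tree for sin(x_1) + c_1 as nested tuples (symbol, *children).
tree = ("+", ("sin", "x_1"), "c_1")


def depth_list(node, level=0):
    """List the depth of every node in pre-order (assumed structure format)."""
    if isinstance(node, str):
        return [level]
    depths = [level]
    for child in node[1:]:
        depths.extend(depth_list(child, level + 1))
    return depths


def leaves(node):
    """Collect all leaf symbols of the tree."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node[1:] for leaf in leaves(child)]


structure = depth_list(tree)
n_nodes = len(structure)      # one depth entry per node
depth = max(structure)        # root counted as depth 0 (assumption)
n_constants = sum(leaf.startswith("c_") for leaf in leaves(tree))

print(structure)    # [0, 1, 2, 1]
print(n_nodes)      # 4
print(depth)        # 2
print(n_constants)  # 1
```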
Instantiate Equations
Note that the constants in the sampled equations are abstract: symbols starting with c represent constants (c_1, c_2, ...). We can instantiate the constants with numbers:
# first we need to import the functionality
from equation_tree import instantiate_constants
import random
# then we can use a function to instantiate the constants. For example, for random constants between 0 and 1:
instantiated_equation = instantiate_constants(equations[42], lambda : random.random())
print(f'abstract: {equations[42]}', f', instantiated: {instantiated_equation}')
# we can also use other functions (for example, setting all constants to the same value)
instantiated_equation_ = instantiate_constants(equations[41], lambda : 1)
print(f'abstract: {equations[41]}', f', instantiated: {instantiated_equation_}')
Note, we can use arbitrary functions to instantiate the constants.
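As a rough illustration of what "arbitrary functions" means here, the sketch below fills constants in a prefix token list by calling a zero-argument function once per constant. It is a standalone example under the assumption that constants are the tokens starting with c_; `fill_constants` is a hypothetical helper, not the package's `instantiate_constants`.

```python
import random


def fill_constants(tokens, make_value):
    """Replace every token starting with 'c_' by a freshly drawn value."""
    return [
        make_value() if isinstance(t, str) and t.startswith("c_") else t
        for t in tokens
    ]


abstract = ["+", "*", "c_1", "x_1", "c_2"]  # prefix form of c_1 * x_1 + c_2

rng = random.Random(0)  # local generator so the draws are reproducible
random_filled = fill_constants(abstract, rng.random)
ones_filled = fill_constants(abstract, lambda: 1)

print(random_filled)
print(ones_filled)  # ['+', '*', 1, 'x_1', 1]
```

Because the function is called separately for each constant, a random generator yields a different value per constant, while a function such as `lambda: 1` gives every constant the same value.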
Evaluating Equations
After instantiating equations, we can evaluate them on arbitrary input:
# evaluate the instantiated equation on a list of values for x_1
values = instantiated_equation.evaluate({'x_1': [1, 2, 3, 4]})
values
# We can also use pandas dataframes as inputs:
# import functionality
import pandas as pd
# define the input and get the values
input_df = pd.DataFrame({'x_1': [1, 2, 3, 4]})
instantiated_equation.evaluate(input_df)
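To show what evaluating an instantiated equation on column-wise input amounts to, here is a small standalone sketch that evaluates a prefix token list row by row. It does not reproduce the package's `evaluate` method; `eval_prefix`, `evaluate` and the supported symbol tables are hypothetical and cover only a handful of functions and operators.

```python
import math

UNARY = {"sin": math.sin, "cos": math.cos}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def eval_prefix(tokens, row):
    """Evaluate a prefix token list for one assignment of the variables."""
    stack = []
    # Scanning right to left puts operands on the stack before their operator.
    for token in reversed(tokens):
        if token in UNARY:
            stack.append(UNARY[token](stack.pop()))
        elif token in BINARY:
            left = stack.pop()
            right = stack.pop()
            stack.append(BINARY[token](left, right))
        elif isinstance(token, str):
            stack.append(row[token])  # variable lookup
        else:
            stack.append(token)       # numeric constant
    return stack.pop()


def evaluate(tokens, inputs):
    """Evaluate on a dict of equally long columns, one result per row."""
    n_rows = len(next(iter(inputs.values())))
    return [
        eval_prefix(tokens, {name: column[i] for name, column in inputs.items()})
        for i in range(n_rows)
    ]


tokens = ["+", "*", 2, "x_1", 1]  # prefix form of 2 * x_1 + 1
values = evaluate(tokens, {"x_1": [1, 2, 3, 4]})
print(values)  # [3, 5, 7, 9]
```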
We can also adjust the input dimension of the equations:
equations_2d = sample(n=5, max_num_variables=2)
equations_2d
Note that not all of the equations have exactly two input variables. Some of them have only one. This is because equations with one input variable are still defined on two (or more) dimensions.
Equation Complexity
We can also manipulate the equation complexity (for example, via the tree depth):
equations_simple = sample(n=5, depth=3)
equations_complex = sample(n=5, depth=8)
print('*** simple equations ***\n', equations_simple, '\n')
print('*** complex equations ***\n', equations_complex)
Instead of an exact depth, we can also sample all equations up to a specified depth:
equations_simple = sample(n=5, max_depth=3)
equations_complex = sample(n=5, max_depth=8)
print('*** simple equations ***\n', equations_simple, '\n')
print('*** complex equations ***\n', equations_complex)
Using Priors
We can also make use of priors to fully customize the sampling. Here, the entries for the structures, features, functions and operators represent the probability of the respective attribute being sampled.
p = {
'structures': {'[0, 1, 1]': .3, '[0, 1, 2]': .3, '[0, 1, 2, 3, 2, 3, 1]': .4},
'features': {'constants': .2, 'variables': .8},
'functions': {'sin': .5, 'cos': .5},
'operators': {'+': .8, '-': .2}
}
equations_with_prior = sample(n=10, prior=p, max_num_variables=10)
equations_with_prior
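The sketch below illustrates how one entry of such a prior can be read as a sampling distribution. It uses only the standard library and is not the package's sampler; `draw` is a hypothetical helper, and the seed and sample size are arbitrary choices.

```python
import random
from collections import Counter

prior = {
    "functions": {"sin": 0.5, "cos": 0.5},
    "operators": {"+": 0.8, "-": 0.2},
}


def draw(distribution, rng):
    """Draw one key with probability proportional to its value."""
    names = list(distribution)
    weights = list(distribution.values())
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)
n_draws = 10_000
counts = Counter(draw(prior["operators"], rng) for _ in range(n_draws))

# The empirical frequencies should be close to the prior (about .8 and .2).
print({name: count / n_draws for name, count in counts.items()})
```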
This functionality extends to conditional priors, which are conditioned on the respective parent node. For example, we can manipulate the probability of specific features, functions and operators inside a sin function. In the prior below, a feature that is the child of a sin function will always be a variable, a function that is the child of a sin function will always be cos, and an operator that is the child of a sin function has a 1:1 chance of being + or -.
p_ = {
'structures': {'[0, 1, 1]': .3, '[0, 1, 2]': .3, '[0, 1, 2, 3, 2, 3, 1]': .4},
'features': {'constants': .2, 'variables': .8},
'functions': {'sin': .5, 'cos': .5},
'operators': {'+': .5, '-': .5},
'function_conditionals': {
'sin': {
'features': {'constants': 0., 'variables': 1.},
'functions': {'sin': 0., 'cos': 1.},
'operators': {'+': .5, '-': .5}
},
'cos': {
'features': {'constants': 0., 'variables': 1.},
'functions': {'cos': 1., 'sin': 0.},
'operators': {'+': 0., '-': 1.}
}
},
'operator_conditionals': {
'+': {
'features': {'constants': .5, 'variables': .5},
'functions': {'sin': 1., 'cos': 0.},
'operators': {'+': 1., '-': 0.}
},
'-': {
'features': {'constants': .3, 'variables': .7},
'functions': {'cos': .5, 'sin': .5},
'operators': {'+': .9, '-': .1}
}
},
}
equations_with_conditional_prior = sample(n=10, prior=p_, max_num_variables=10)
equations_with_conditional_prior
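As an illustration of how a conditional prior might be looked up, the following standalone sketch picks the distribution for a child node based on its parent and falls back to the unconditional entry when no conditional is given. The fallback rule and the helpers `child_distribution` and `draw` are assumptions made for this example, not a description of the package's internals.

```python
import random

prior = {
    "functions": {"sin": 0.5, "cos": 0.5},
    "function_conditionals": {
        "sin": {"functions": {"sin": 0.0, "cos": 1.0}},
    },
}


def child_distribution(prior, kind, parent=None):
    """Return the distribution for a child of type `kind` given its parent."""
    conditionals = prior.get("function_conditionals", {})
    if parent in conditionals and kind in conditionals[parent]:
        return conditionals[parent][kind]
    return prior[kind]  # assumed fallback: the unconditional prior


def draw(distribution, rng):
    """Draw one key with probability proportional to its value."""
    names = list(distribution)
    return rng.choices(names, weights=list(distribution.values()), k=1)[0]


rng = random.Random(0)
inside_sin = {
    draw(child_distribution(prior, "functions", "sin"), rng) for _ in range(100)
}
at_root = {draw(child_distribution(prior, "functions"), rng) for _ in range(100)}

print(inside_sin)  # {'cos'}
print(at_root)
```

With a probability of 0 for sin inside sin, nested sin(sin(...)) terms can never be drawn, while functions without a sin parent still follow the 50:50 prior.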
WARNING: If your application depends on these priors, you should "burn" samples before starting the sampling. During the sampling process, equations get simplified and invalid equations are discarded. This is likely to lead to disparities between the priors and the sampled frequencies. To counteract this, the package offers the functionality to "burn" samples and adjust the priors so that the outcome frequencies match them more closely. To burn samples, use the following code (we don't run it in the notebook since the adjusted priors are saved to disk for future use):
burn(
prior,
max_number_variables,
path_to_file,
number_of_burned_samples,
learning_rate
)
*This function should be run multiple times. The learning rate defines how strongly the adjusted priors from previous runs are changed in each run. After burning, you can load the adjusted priors via:
sample(..., file=path_to_file)
*Multiple adjusted priors can be stored in the same file.
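To give an intuition for why repeated runs and a learning rate are useful, here is a deliberately simplified sketch of one adjustment step. It is not the algorithm used by `burn`; the update rule, the helper `adjust` and the observed frequencies are all hypothetical and serve only to illustrate the idea of nudging a prior so that the sampled frequencies move toward the target.

```python
def adjust(adjusted, target, observed, learning_rate):
    """Shift each probability by learning_rate * (target - observed), then renormalize."""
    updated = {
        key: max(
            adjusted[key] + learning_rate * (target[key] - observed.get(key, 0.0)),
            0.0,
        )
        for key in target
    }
    total = sum(updated.values())
    return {key: value / total for key, value in updated.items()}


target = {"+": 0.8, "-": 0.2}
# Hypothetical frequencies measured after simplification and discarding.
observed = {"+": 0.7, "-": 0.3}

# One adjustment step, starting from the target prior itself.
adjusted = adjust(dict(target), target, observed, learning_rate=0.5)
print(adjusted)
```

Here "+" was sampled less often than intended, so its adjusted probability rises above .8. A smaller learning rate makes each step more cautious, which is one reason several runs may be needed before the frequencies settle.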