Basic Usage

Content:

Basic Functionality for sampling and processing equations
Advanced settings for sampling equations

Installation

The Equation Tree package is available on PyPI:

In [ ]:

!pip install equation_tree

Basic Functionality

Sampling With Default Settings

First, we need to import the functionality. Here, we also set a seed to ensure reproducible results.

In [ ]:

import random

import numpy as np

from equation_tree import sample

# To obtain reproducible results, we seed the random number generators
# used in the following sections
np.random.seed(42)
random.seed(42)

Then, we can sample an equation:

In [ ]:

equation = sample()

Equation Representations And Features

First, let's look at the type of the equation:

In [ ]:

type(equation)

It is a list! This is because we can sample multiple equations in one go:

In [ ]:

equations = sample(n=100)

This returns 100 equations:

In [ ]:

print(len(equations))
print(equations[0])
print(equations[42])

When printed, the equations are shown as strings, but we can look at other representations. For example, prefix notation (for more details on the different representations of the equations, see the respective section of the documentation):

In [ ]:

equations[42].prefix
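
To make the idea of prefix notation concrete, here is a minimal sketch that converts a prefix expression back into an infix string. It is not part of the equation_tree package: the function `prefix_to_infix`, the operator sets, and the assumption that a prefix expression is a flat list of tokens are all ours, chosen only for illustration.

```python
# Illustrative only: a tiny prefix-to-infix converter, not part of equation_tree.
# Assumption: a prefix expression is a flat list of string tokens.
BINARY = {"+", "-", "*", "/", "**"}
UNARY = {"sin", "cos", "exp", "log"}


def prefix_to_infix(tokens):
    """Convert a list of prefix tokens into a parenthesized infix string."""

    def build(pos):
        token = tokens[pos]
        if token in BINARY:
            # An operator is followed by its left and right operands
            left, pos = build(pos + 1)
            right, pos = build(pos)
            return f"({left} {token} {right})", pos
        if token in UNARY:
            # A function is followed by its single argument
            arg, pos = build(pos + 1)
            return f"{token}({arg})", pos
        # Anything else is a leaf: a variable or a constant
        return token, pos + 1

    expression, end = build(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete expression")
    return expression


example = ["+", "sin", "x_1", "*", "c_1", "x_1"]
infix = prefix_to_infix(example)
print(infix)
```

In prefix notation every operator or function precedes its arguments, so no parentheses are needed to encode the order of evaluation.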

We can also look at various features of the equation. For example, the number of constants, the tree depth of the underlying tree, the number of nodes or the tree structure (for more details on these features, see the respective section of the documentation):

In [ ]:

print(equations[42].n_constants)
print(equations[42].depth)
print(equations[42].n_nodes)
print(equations[42].structure)
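
As a rough illustration of how such features relate to a tree structure, the sketch below summarizes a structure list. It assumes that a structure such as [0, 1, 2, 3, 2, 3, 1] records the level of each node in pre-order traversal; this reading, the helper `describe_structure`, and the keys of its result are our assumptions and not the package's API or its exact definitions of depth and node count.

```python
# Illustrative only: not part of equation_tree.
# Assumption: the list holds the level of each node in pre-order traversal.
def describe_structure(levels):
    """Summarize a tree given the level of each node in pre-order."""
    if not levels or levels[0] != 0:
        raise ValueError("a structure must start with the root at level 0")
    children = [0] * len(levels)
    stack = []  # indices of the nodes on the path from the root
    for index, level in enumerate(levels):
        if level > len(stack) or (index > 0 and level == 0):
            raise ValueError("invalid level sequence")
        del stack[level:]  # climb back up to the parent of this node
        if stack:
            children[stack[-1]] += 1
        stack.append(index)
    return {
        "n_nodes": len(levels),
        "max_level": max(levels),
        "n_leaves": sum(1 for count in children if count == 0),
        "children_per_node": children,
    }


summary = describe_structure([0, 1, 2, 3, 2, 3, 1])
print(summary)
```

Under this reading, a node with two children corresponds to an operator, a node with one child to a function, and a node without children to a variable or constant.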

Instantiate Equations

Note, the constants in the sampled equation are abstract: symbols starting with c represent constants (c_1, c_2, ...). We can instantiate constants with numbers:

In [ ]:

# first we need to import the functionality
import random

from equation_tree import instantiate_constants

# then we can pass a function that generates the constants, for example random constants between 0 and 1:
instantiated_equation = instantiate_constants(equations[42], lambda: random.random())
print(f'abstract: {equations[42]}, instantiated: {instantiated_equation}')

In [ ]:

# we can also use other functions (for example, setting all constants to the same value)
instantiated_equation_ = instantiate_constants(equations[41], lambda: 1)
print(f'abstract: {equations[41]}, instantiated: {instantiated_equation_}')

Note, we can use arbitrary functions to instantiate the constants.
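
To illustrate what "arbitrary functions" means here, the following sketch mimics the idea on a plain token list: every constant is replaced by the result of a separate call to a zero-argument function. The helper `fill_constants` and the token-list representation are hypothetical stand-ins for illustration and do not reproduce how instantiate_constants works internally.

```python
import random


# Illustrative only: a simplified stand-in, not equation_tree's instantiate_constants.
def fill_constants(tokens, draw):
    """Replace each token starting with 'c_' by a freshly drawn value."""
    return [str(draw()) if token.startswith("c_") else token for token in tokens]


abstract = ["+", "*", "c_1", "x_1", "c_2"]

# The same fixed value for every constant
ones = fill_constants(abstract, lambda: 1)

# Integers between 1 and 10, drawn independently for each constant
rng = random.Random(0)
integers = fill_constants(abstract, lambda: rng.randint(1, 10))

print(ones)
print(integers)
```

Because the function is called once per constant, a random generator yields a different value for each constant, while a function returning a fixed number sets all of them to the same value.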

Evaluating Equations

After instantiating equations, we can evaluate them on arbitrary input:

In [ ]:

# evaluate the instantiated equation on a list of values for x_1
values = instantiated_equation.evaluate({'x_1': [1, 2, 3, 4]})
values

In [ ]:

# We can also use pandas dataframes as inputs:

# import functionality
import pandas as pd

# define the input and get the values
input_df = pd.DataFrame({'x_1': [1, 2, 3, 4]})
instantiated_equation.evaluate(input_df)
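
Conceptually, evaluating an equation on such an input means computing one output value per row of input values. The sketch below shows this for a small prefix expression and a dictionary of columns. It is our own illustration: `evaluate_row`, `evaluate_columns`, and the supported operators are assumptions and are not the package's evaluate method.

```python
import math
import operator

# Illustrative only: not equation_tree's evaluate method.
BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul}
UNARY = {"sin": math.sin, "cos": math.cos}


def evaluate_row(tokens, row):
    """Evaluate prefix tokens for one assignment of the input variables."""

    def build(pos):
        token = tokens[pos]
        if token in BINARY:
            left, pos = build(pos + 1)
            right, pos = build(pos)
            return BINARY[token](left, right), pos
        if token in UNARY:
            arg, pos = build(pos + 1)
            return UNARY[token](arg), pos
        if token in row:
            return float(row[token]), pos + 1  # input variable
        return float(token), pos + 1  # instantiated constant

    value, _ = build(0)
    return value


def evaluate_columns(tokens, columns):
    """Evaluate the expression once per row of the column dictionary."""
    names = list(columns)
    rows = zip(*(columns[name] for name in names))
    return [evaluate_row(tokens, dict(zip(names, row))) for row in rows]


# 2 * x_1 + 1 evaluated at x_1 = 1, 2, 3, 4
values = evaluate_columns(["+", "*", "2", "x_1", "1"], {"x_1": [1, 2, 3, 4]})
print(values)
```

A pandas DataFrame could be passed to this sketch after converting it with `input_df.to_dict('list')`, which yields the same dictionary-of-columns layout.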

Sample Settings

When sampling equations, we can control for a variety of features of the underlying distribution.

Input Dimensions

We can manipulate the space on which the equations are defined. For example, if we want equations that are defined on two dimensions, we can write:

In [ ]:

equations_2d = sample(n=5, max_num_variables=2)

In [ ]:

equations_2d

Note, not all the equations have exactly two input variables. Some of them have only one. This is because an equation with one input variable is still defined on a space of two (or more) dimensions.

Equation Complexity

We can also manipulate the equation complexity (measured, for example, by tree depth):

In [ ]:

equations_simple = sample(n=5, depth=3)
equations_complex = sample(n=5, depth=8)

In [ ]:

print('*** simple equations ***\n', equations_simple, '\n')
print('*** complex equations ***\n', equations_complex)

Instead of fixing an exact depth, we can also sample equations with any depth up to a specified maximum:

In [ ]:

equations_simple = sample(n=5, max_depth=3)
equations_complex = sample(n=5, max_depth=8)

In [ ]:

print('*** simple equations ***\n', equations_simple, '\n')
print('*** complex equations ***\n', equations_complex)

Using Priors

We can also make use of priors to fully customize the sampling. Here, the entries for the structures, features, functions and operators represent the probability of the respective attribute being sampled.

In [ ]:

p = {
    'structures': {'[0, 1, 1]': .3, '[0, 1, 2]': .3, '[0, 1, 2, 3, 2, 3, 1]': .4},
    'features': {'constants': .2, 'variables': .8},
    'functions': {'sin': .5, 'cos': .5},
    'operators': {'+': .8, '-': .2}
}
equations_with_prior = sample(n=10, prior=p, max_num_variables=10)
equations_with_prior
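
To show how the entries of such a prior can be read as probabilities, here is a small sketch that checks a prior and draws attributes from it using only the standard library. The helpers `check_prior` and `draw` are hypothetical, and the assumption that each entry sums to 1 is ours; the package's own sampler may validate and use priors differently.

```python
import random
from collections import Counter

# Illustrative only: not the sampler used by equation_tree.
prior = {
    "structures": {"[0, 1, 1]": 0.3, "[0, 1, 2]": 0.3, "[0, 1, 2, 3, 2, 3, 1]": 0.4},
    "features": {"constants": 0.2, "variables": 0.8},
    "functions": {"sin": 0.5, "cos": 0.5},
    "operators": {"+": 0.8, "-": 0.2},
}


def check_prior(prior, tolerance=1e-9):
    """Raise if the probabilities of any entry do not sum to 1 (our assumption)."""
    for name, distribution in prior.items():
        total = sum(distribution.values())
        if abs(total - 1.0) > tolerance:
            raise ValueError(f"probabilities for '{name}' sum to {total}")


def draw(distribution, rng):
    """Draw one key of the distribution according to its probability."""
    options = list(distribution)
    weights = [distribution[option] for option in options]
    return rng.choices(options, weights=weights, k=1)[0]


check_prior(prior)
rng = random.Random(42)
counts = Counter(draw(prior["operators"], rng) for _ in range(10_000))
print(counts)
```

With the prior above, roughly four out of five drawn operators should be `+`.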

This functionality extends to conditional priors, which are conditioned on the respective parent node. For example, we can manipulate the probability of specific features, functions and operators inside a sin function. In the example below, a feature that is the child of a sin function will always be a variable, a function that is the child of a sin function will always be cos, and an operator that is the child of a sin function has an equal chance of being + or -.

In [ ]:

p_ = {
    'structures': {'[0, 1, 1]': .3, '[0, 1, 2]': .3, '[0, 1, 2, 3, 2, 3, 1]': .4},
    'features': {'constants': .2, 'variables': .8},
    'functions': {'sin': .5, 'cos': .5},
    'operators': {'+': .5, '-': .5},
    'function_conditionals': {
        'sin': {
            'features': {'constants': 0., 'variables': 1.},
            'functions': {'sin': 0., 'cos': 1.},
            'operators': {'+': .5, '-': .5}
        },
        'cos': {
            'features': {'constants': 0., 'variables': 1.},
            'functions': {'cos': 1., 'sin': 0.},
            'operators': {'+': 0., '-': 1.}
        }
    },
    'operator_conditionals': {
        '+': {
            'features': {'constants': .5, 'variables': .5},
            'functions': {'sin': 1., 'cos': 0.},
            'operators': {'+': 1., '-': 0.}
        },
        '-': {
            'features': {'constants': .3, 'variables': .7},
            'functions': {'cos': .5, 'sin': .5},
            'operators': {'+': .9, '-': .1}
        }
    },
}
equations_with_conditional_prior = sample(n=10, prior=p_, max_num_variables=10)
equations_with_conditional_prior
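
One way to think about conditional priors is as a lookup that depends on the parent node. The sketch below picks the distribution for a child given its parent. The helper `distribution_for` is hypothetical, and the fallback to the unconditional prior when no conditional entry exists is our assumption, not documented behavior of the package.

```python
import random

# Illustrative only: not how equation_tree resolves conditional priors.
prior = {
    "features": {"constants": 0.2, "variables": 0.8},
    "functions": {"sin": 0.5, "cos": 0.5},
    "function_conditionals": {
        "sin": {
            "features": {"constants": 0.0, "variables": 1.0},
            "functions": {"sin": 0.0, "cos": 1.0},
        },
    },
}


def distribution_for(prior, kind, parent_kind=None, parent=None):
    """Return the distribution for `kind`, conditioned on the parent if available.

    `parent_kind` is 'function' or 'operator'; `parent` is the parent's symbol.
    Falling back to the unconditional prior is an assumption of this sketch.
    """
    if parent_kind is not None and parent is not None:
        conditionals = prior.get(f"{parent_kind}_conditionals", {})
        conditional = conditionals.get(parent, {}).get(kind)
        if conditional is not None:
            return conditional
    return prior[kind]


rng = random.Random(0)
inside_sin = distribution_for(prior, "features", "function", "sin")
draws = rng.choices(list(inside_sin), weights=list(inside_sin.values()), k=20)
print(draws)
```

With a probability of 0 for constants inside sin, every feature drawn for a child of sin in this sketch is a variable.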

WARNING: If your application depends on these priors, you should "burn" samples before you start sampling. During sampling, equations are simplified and invalid equations are discarded, which is likely to cause disparities between the priors and the sampled frequencies. To counteract this, the package can "burn" samples and adjust the priors so that the resulting frequencies match them more closely. To burn samples, use the following code (we don't run it in the notebook, since the adjusted priors are saved to disk for future use):

burn(
    prior,
    max_number_variables,
    path_to_file,
    number_of_burned_samples,
    learning_rate
    )

Note, this function should be run multiple times. The learning rate defines how strongly the adjusted priors from previous runs are changed in each run. After burning, you can load the adjusted priors via:

    sample(..., file=path_to_file)

Note, multiple adjusted priors can be stored in the same file.
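
To give an intuition for the role of the learning rate, the sketch below shows one possible way to nudge a prior toward producing the target frequencies. This is not the package's burn algorithm: the function `adjust_prior` and its update rule are assumptions made purely for illustration.

```python
# Illustrative only: NOT the update rule used by equation_tree's burn function.
def adjust_prior(target, observed, current, learning_rate):
    """Shift `current` against the gap between observed and target frequencies."""
    raw = {
        key: max(current[key] + learning_rate * (target[key] - observed.get(key, 0.0)), 0.0)
        for key in target
    }
    total = sum(raw.values())
    if total == 0:
        return dict(current)  # nothing sensible to normalize, keep the old values
    return {key: value / total for key, value in raw.items()}


target = {"+": 0.8, "-": 0.2}    # frequencies we want to see in the samples
observed = {"+": 0.7, "-": 0.3}  # frequencies measured after simplification
adjusted = adjust_prior(target, observed, current=target, learning_rate=0.5)
print(adjusted)
```

In this sketch, `+` was observed less often than desired, so its adjusted probability increases; a smaller learning rate would make each correction more cautious, which is why repeating the adjustment over several runs can be useful.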