Basic Usage¶

Aim: Use the Controller to recover a simple ground truth model from noisy data.

In [ ]:

Copied!

# Uncomment the following line when running on Google Colab
# !pip install autora
# Uncomment the following line when running on Google Colab
# !pip install autora

In [ ]:

Copied!





import numpy as np
from autora.experimentalist.pipeline import make_pipeline
from autora.variable import VariableCollection, Variable
from sklearn.linear_model import LinearRegression

from autora.workflow import Cycle
from itertools import takewhile
import numpy as np
from autora.experimentalist.pipeline import make_pipeline
from autora.variable import VariableCollection, Variable
from sklearn.linear_model import LinearRegression

from autora.workflow import Cycle
from itertools import takewhile

In [ ]:

Copied!

def ground_truth(x):
    return x + 1
def ground_truth(x):
    return x + 1

The space of allowed x values is the integers between 0 and 10 inclusive, and we record the allowed output values as well.

In [ ]:

Copied!





variables_0 = VariableCollection(
   independent_variables=[Variable(name="x1", allowed_values=range(11))],
   dependent_variables=[Variable(name="y", value_range=(-20, 20))],
   )
variables_0 = VariableCollection(
   independent_variables=[Variable(name="x1", allowed_values=range(11))],
   dependent_variables=[Variable(name="y", value_range=(-20, 20))],
   )

The experimentalist is used to propose experiments. Since the space of values is so restricted, we can just sample them all each time.

In [ ]:

Copied!

example_experimentalist = make_pipeline(
    [variables_0.independent_variables[0].allowed_values])
example_experimentalist = make_pipeline(
    [variables_0.independent_variables[0].allowed_values])

When we run a synthetic experiment, we get a reproducible noisy result:

In [ ]:

Copied!





def get_example_synthetic_experiment_runner():
    rng = np.random.default_rng(seed=180)
    def runner(x):
        return ground_truth(x) + rng.normal(0, 0.1, x.shape)
    return runner
example_synthetic_experiment_runner = get_example_synthetic_experiment_runner()
example_synthetic_experiment_runner(np.array([1]))
def get_example_synthetic_experiment_runner():
    rng = np.random.default_rng(seed=180)
    def runner(x):
        return ground_truth(x) + rng.normal(0, 0.1, x.shape)
    return runner
example_synthetic_experiment_runner = get_example_synthetic_experiment_runner()
example_synthetic_experiment_runner(np.array([1]))

Out[ ]:

array([2.04339546])

The theorist "tries" to work out the best model. We use a trivial scikit-learn regressor.

In [ ]:

Copied!

example_theorist = LinearRegression()
example_theorist = LinearRegression()

We initialize the Controller with the variables describing the domain of the model,
the theorist, experimentalist and experiment runner,
as well as a monitor which will let us know which cycle we're currently on.

In [ ]:

Copied!





cycle = Cycle(
    variables=variables_0,
    theorist=example_theorist,
    experimentalist=example_experimentalist,
    experiment_runner=example_synthetic_experiment_runner,
    monitor=lambda state: print(f"Generated {len(state.models)} models"),
)
cycle # doctest: +ELLIPSIS
cycle = Cycle(
    variables=variables_0,
    theorist=example_theorist,
    experimentalist=example_experimentalist,
    experiment_runner=example_synthetic_experiment_runner,
    monitor=lambda state: print(f"Generated {len(state.models)} models"),
)
cycle # doctest: +ELLIPSIS

Out[ ]:

<autora.workflow.cycle.Cycle at 0x157959660>

We can run the cycle by calling the run method:

In [ ]:

Copied!

_ = cycle.run(num_cycles=3)
_ = cycle.run(num_cycles=3)

Generated 1 models
Generated 2 models
Generated 3 models

We can now interrogate the results. The first set of conditions which went into the experiment runner were:

In [ ]:

Copied!

cycle.data.conditions[0]
cycle.data.conditions[0]

Out[ ]:

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

The observations include the conditions and the results:

In [ ]:

Copied!

cycle.data.observations[0]
cycle.data.observations[0]

Out[ ]:

array([[ 0.        ,  0.92675345],
       [ 1.        ,  1.89519928],
       [ 2.        ,  3.08746571],
       [ 3.        ,  3.93023943],
       [ 4.        ,  4.95429102],
       [ 5.        ,  6.04763988],
       [ 6.        ,  7.20770574],
       [ 7.        ,  7.85681519],
       [ 8.        ,  9.05735823],
       [ 9.        , 10.18713406],
       [10.        , 10.88517906]])

In the third cycle (index = 2) the first and last values are different again:

In [ ]:

Copied!

cycle.data.observations[2][[0,-1]]
cycle.data.observations[2][[0,-1]]

Out[ ]:

array([[ 0.        ,  1.08559827],
       [10.        , 11.08179553]])

The best fit model after the first cycle is:

In [ ]:

Copied!

cycle.data.models[0]
cycle.data.models[0]

Out[ ]:

LinearRegression()

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

In [ ]:

Copied!





def report_linear_fit(m: LinearRegression,  precision=4):
    s = f"y = {np.round(m.coef_[0].item(), precision)} x " \
        f"+ {np.round(m.intercept_.item(), 4)}"
    return s
report_linear_fit(cycle.data.models[0])
def report_linear_fit(m: LinearRegression,  precision=4):
    s = f"y = {np.round(m.coef_[0].item(), precision)} x " \
        f"+ {np.round(m.intercept_.item(), 4)}"
    return s
report_linear_fit(cycle.data.models[0])

Out[ ]:

'y = 1.0089 x + 0.9589'

The best fit model after all the cycles, including all the data, is:

In [ ]:

Copied!

report_linear_fit(cycle.data.models[-1])
report_linear_fit(cycle.data.models[-1])

Out[ ]:

'y = 0.9989 x + 1.0292'

This is close to the ground truth model of x -> (x + 1) We can also run the cycle with more control over the execution flow:

In [ ]:

Copied!

_ = next(cycle)
_ = next(cycle)

Generated 4 models

In [ ]:

Copied!

_ = next(cycle)
_ = next(cycle)

Generated 5 models

In [ ]:

Copied!

_ = next(cycle)
_ = next(cycle)

Generated 6 models

We can continue to run the cycle as long as we like, with a simple arbitrary stopping condition like the number of models generated:

In [ ]:

Copied!

_ = list(takewhile(lambda c: len(c.data.models) < 9, cycle))
_ = list(takewhile(lambda c: len(c.data.models) < 9, cycle))

Generated 7 models
Generated 8 models
Generated 9 models

... or the precision (here we keep iterating while the difference between the gradients of the second-last and last cycle is larger than 0.001).

In [ ]:

Copied!





_ = list(
        takewhile(
            lambda c: np.abs(c.data.models[-1].coef_.item() -
                           c.data.models[-2].coef_.item()) > 1e-3,
            cycle
        )
    )
_ = list(
        takewhile(
            lambda c: np.abs(c.data.models[-1].coef_.item() -
                           c.data.models[-2].coef_.item()) > 1e-3,
            cycle
        )
    )

Generated 10 models
Generated 11 models

... or continue to run as long as we like:

In [ ]:

Copied!

_ = cycle.run(num_cycles=100)
_ = cycle.run(num_cycles=100)

Generated 12 models
Generated 13 models
Generated 14 models
Generated 15 models
Generated 16 models
Generated 17 models
Generated 18 models
Generated 19 models
Generated 20 models
Generated 21 models
Generated 22 models
Generated 23 models
Generated 24 models
Generated 25 models
Generated 26 models
Generated 27 models
Generated 28 models
Generated 29 models
Generated 30 models
Generated 31 models
Generated 32 models
Generated 33 models
Generated 34 models
Generated 35 models
Generated 36 models
Generated 37 models
Generated 38 models
Generated 39 models
Generated 40 models
Generated 41 models
Generated 42 models
Generated 43 models
Generated 44 models
Generated 45 models
Generated 46 models
Generated 47 models
Generated 48 models
Generated 49 models
Generated 50 models
Generated 51 models
Generated 52 models
Generated 53 models
Generated 54 models
Generated 55 models
Generated 56 models
Generated 57 models
Generated 58 models
Generated 59 models
Generated 60 models
Generated 61 models
Generated 62 models
Generated 63 models
Generated 64 models
Generated 65 models
Generated 66 models
Generated 67 models
Generated 68 models
Generated 69 models
Generated 70 models
Generated 71 models
Generated 72 models
Generated 73 models
Generated 74 models
Generated 75 models
Generated 76 models
Generated 77 models
Generated 78 models
Generated 79 models
Generated 80 models
Generated 81 models
Generated 82 models
Generated 83 models
Generated 84 models
Generated 85 models
Generated 86 models
Generated 87 models
Generated 88 models
Generated 89 models
Generated 90 models
Generated 91 models
Generated 92 models
Generated 93 models
Generated 94 models
Generated 95 models
Generated 96 models
Generated 97 models
Generated 98 models
Generated 99 models
Generated 100 models
Generated 101 models
Generated 102 models
Generated 103 models
Generated 104 models
Generated 105 models
Generated 106 models
Generated 107 models
Generated 108 models
Generated 109 models
Generated 110 models
Generated 111 models