Basic Usage¶
Aim: Use the Controller to recover a simple ground truth model from noisy data.
# Uncomment the following line when running on Google Colab
# !pip install autora
import numpy as np
from autora.experimentalist.pipeline import make_pipeline
from autora.variable import VariableCollection, Variable
from sklearn.linear_model import LinearRegression
from autora.workflow import Cycle
from itertools import takewhile
def ground_truth(x):
return x + 1
The space of allowed x values is the integers between 0 and 10 inclusive, and we record the allowed output values as well.
variables_0 = VariableCollection(
independent_variables=[Variable(name="x1", allowed_values=range(11))],
dependent_variables=[Variable(name="y", value_range=(-20, 20))],
)
The experimentalist is used to propose experiments. Since the space of values is so restricted, we can just sample them all each time.
example_experimentalist = make_pipeline(
[variables_0.independent_variables[0].allowed_values])
When we run a synthetic experiment, we get a reproducible noisy result:
def get_example_synthetic_experiment_runner():
rng = np.random.default_rng(seed=180)
def runner(x):
return ground_truth(x) + rng.normal(0, 0.1, x.shape)
return runner
example_synthetic_experiment_runner = get_example_synthetic_experiment_runner()
example_synthetic_experiment_runner(np.array([1]))
array([2.04339546])
The theorist "tries" to work out the best model. We use a trivial scikit-learn regressor.
example_theorist = LinearRegression()
We initialize the Controller with the variables describing the domain of the model,
the theorist, experimentalist and experiment runner,
as well as a monitor which will let us know which cycle we're currently on.
cycle = Cycle(
variables=variables_0,
theorist=example_theorist,
experimentalist=example_experimentalist,
experiment_runner=example_synthetic_experiment_runner,
monitor=lambda state: print(f"Generated {len(state.models)} models"),
)
cycle # doctest: +ELLIPSIS
<autora.workflow.cycle.Cycle at 0x157959660>
We can run the cycle by calling the run method:
_ = cycle.run(num_cycles=3)
Generated 1 models Generated 2 models Generated 3 models
We can now interrogate the results. The first set of conditions which went into the experiment runner were:
cycle.data.conditions[0]
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
The observations include the conditions and the results:
cycle.data.observations[0]
array([[ 0. , 0.92675345], [ 1. , 1.89519928], [ 2. , 3.08746571], [ 3. , 3.93023943], [ 4. , 4.95429102], [ 5. , 6.04763988], [ 6. , 7.20770574], [ 7. , 7.85681519], [ 8. , 9.05735823], [ 9. , 10.18713406], [10. , 10.88517906]])
In the third cycle (index = 2) the first and last values are different again:
cycle.data.observations[2][[0,-1]]
array([[ 0. , 1.08559827], [10. , 11.08179553]])
The best fit model after the first cycle is:
cycle.data.models[0]
LinearRegression()In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
LinearRegression()
def report_linear_fit(m: LinearRegression, precision=4):
s = f"y = {np.round(m.coef_[0].item(), precision)} x " \
f"+ {np.round(m.intercept_.item(), 4)}"
return s
report_linear_fit(cycle.data.models[0])
'y = 1.0089 x + 0.9589'
The best fit model after all the cycles, including all the data, is:
report_linear_fit(cycle.data.models[-1])
'y = 0.9989 x + 1.0292'
This is close to the ground truth model of x -> (x + 1) We can also run the cycle with more control over the execution flow:
_ = next(cycle)
Generated 4 models
_ = next(cycle)
Generated 5 models
_ = next(cycle)
Generated 6 models
We can continue to run the cycle as long as we like, with a simple arbitrary stopping condition like the number of models generated:
_ = list(takewhile(lambda c: len(c.data.models) < 9, cycle))
Generated 7 models Generated 8 models Generated 9 models
... or the precision (here we keep iterating while the difference between the gradients of the second-last and last cycle is larger than 0.001).
_ = list(
takewhile(
lambda c: np.abs(c.data.models[-1].coef_.item() -
c.data.models[-2].coef_.item()) > 1e-3,
cycle
)
)
Generated 10 models Generated 11 models
... or continue to run as long as we like:
_ = cycle.run(num_cycles=100)
Generated 12 models Generated 13 models Generated 14 models Generated 15 models Generated 16 models Generated 17 models Generated 18 models Generated 19 models Generated 20 models Generated 21 models Generated 22 models Generated 23 models Generated 24 models Generated 25 models Generated 26 models Generated 27 models Generated 28 models Generated 29 models Generated 30 models Generated 31 models Generated 32 models Generated 33 models Generated 34 models Generated 35 models Generated 36 models Generated 37 models Generated 38 models Generated 39 models Generated 40 models Generated 41 models Generated 42 models Generated 43 models Generated 44 models Generated 45 models Generated 46 models Generated 47 models Generated 48 models Generated 49 models Generated 50 models Generated 51 models Generated 52 models Generated 53 models Generated 54 models Generated 55 models Generated 56 models Generated 57 models Generated 58 models Generated 59 models Generated 60 models Generated 61 models Generated 62 models Generated 63 models Generated 64 models Generated 65 models Generated 66 models Generated 67 models Generated 68 models Generated 69 models Generated 70 models Generated 71 models Generated 72 models Generated 73 models Generated 74 models Generated 75 models Generated 76 models Generated 77 models Generated 78 models Generated 79 models Generated 80 models Generated 81 models Generated 82 models Generated 83 models Generated 84 models Generated 85 models Generated 86 models Generated 87 models Generated 88 models Generated 89 models Generated 90 models Generated 91 models Generated 92 models Generated 93 models Generated 94 models Generated 95 models Generated 96 models Generated 97 models Generated 98 models Generated 99 models Generated 100 models Generated 101 models Generated 102 models Generated 103 models Generated 104 models Generated 105 models Generated 106 models Generated 107 models Generated 108 models Generated 109 models Generated 110 models Generated 111 models