Basic Introduction to Functions and States¶
Using the functions and objects in autora.state, we can build flexible pipelines and cycles which operate on state objects.
Theoretical Overview¶
The fundamental idea is this:
- We define a "state" object $S$ which can be modified by components of autora (theorist, experimentalist, experiment_runner), each of which contributes a change $\Delta S$.
- A new state at some point $i+1$ is $$S_{i+1} = S_i + \Delta S_{i+1}$$
- The cycle state after $n$ steps is thus $$S_n = S_{0} + \sum^{n}_{i=1} \Delta S_{i}$$
To represent $S$ and $\Delta S$ in code, you can use autora.state.State and autora.state.Delta respectively. To operate on these, we define functions.
Each component in an autora cycle (theorist, experimentalist, experiment_runner, etc.) is implemented as a function with $n$ arguments $s_j$ which are members of $S$ and $m$ others $a_k$ which are not:
$$ f(s_0, ..., s_n, a_0, ..., a_m) \rightarrow \Delta S_{i+1}$$
There is a wrapper function $w$ (autora.state.wrap_to_use_state) which changes the signature of $f$ to require $S$ and aggregates the resulting $\Delta S_{i+1}$:
$$w\left[f(s_0, ..., s_n, a_0, ..., a_m) \rightarrow \Delta S_{i+1}\right] \rightarrow \left[ f^\prime(S_i, a_0, ..., a_m) \rightarrow S_{i} + \Delta S_{i+1} = S_{i+1}\right]$$
Assuming that the other arguments $a_k$ are provided by partial evaluation of $f^\prime$, the full autora cycle can then be represented as:
$$S_n = f_n^\prime(...f_2^\prime(f_1^\prime(S_0)))$$
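The idea of folding deltas into a state can be illustrated without autora at all. The following is a minimal standard-library sketch: ToyState and apply_delta are stand-ins invented for illustration, not the real autora.state classes.

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class ToyState:
    # A toy stand-in for a state object S: two fields that
    # different components may update independently.
    conditions: list = field(default_factory=list)
    models: list = field(default_factory=list)

def apply_delta(state: ToyState, delta: dict) -> ToyState:
    # S_{i+1} = S_i + Delta S_{i+1}: each list field named in the
    # delta is extended, and a new (immutable) state is returned.
    updates = {key: getattr(state, key) + value for key, value in delta.items()}
    return replace(state, **updates)

s0 = ToyState()
s1 = apply_delta(s0, {"conditions": [1, 2, 3]})   # an "experimentalist" step
s2 = apply_delta(s1, {"models": ["linear fit"]})  # a "theorist" step
print(s2)
```

Because each step returns a fresh state rather than mutating the old one, intermediate states like s0 and s1 remain available for inspection, which is the same property the autora State machinery relies on.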
There are additional helper functions to wrap common experimentalists, experiment runners, and theorists so that we can define a full autora cycle using Python notation, as shown in the following example.
Example¶
First, initialize the State. In this case, we use the pre-defined StandardState, which implements the standard autora naming convention. There are two variables: x with a range of [-10, 10] and y with an unspecified range.
from autora.state import StandardState
from autora.variable import VariableCollection, Variable
s_0 = StandardState(
    variables=VariableCollection(
        independent_variables=[Variable("x", value_range=(-10, 10))],
        dependent_variables=[Variable("y")],
    )
)
Specify the experimentalist. Use the standard function random_pool. This draws 5 independent random samples (the default, configurable via an argument) from the value_range of the independent variables and returns them in a DataFrame. To make this work as a function on State objects, we wrap it with the on_state function and specify the state field it will operate on, namely conditions.
from autora.experimentalist.random import random_pool
from autora.state import on_state
experimentalist = on_state(function=random_pool, output=["conditions"])
s_1 = experimentalist(s_0, random_state=42)
s_1
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x 0 5.479121 1 -1.222431 2 7.171958 3 3.947361 4 -8.116453, experiment_data=None, models=[])
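Under the hood, random_pool draws uniform samples from each independent variable's value_range. The following is a rough sketch of that behavior, not the real implementation; random_pool_sketch is a hypothetical name used here for illustration.

```python
import numpy as np
import pandas as pd

def random_pool_sketch(value_range=(-10, 10), num_samples=5, random_state=None):
    # Draw num_samples values uniformly from value_range and return
    # them as a conditions DataFrame, roughly as random_pool does.
    rng = np.random.default_rng(random_state)
    low, high = value_range
    return pd.DataFrame({"x": rng.uniform(low, high, num_samples)})

conditions = random_pool_sketch(random_state=42)
print(conditions)
```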
Specify the experiment runner with the state field it will operate on, namely experiment_data. This experiment runner calculates a linear function, adds noise, and assigns the result to the y column of a new DataFrame.
from autora.state import on_state
import numpy as np
import pandas as pd
def experiment_runner(conditions: pd.DataFrame, c=(2, 4), random_state=None):
    # Ground truth: y = c[0] + c[1] * x, plus unit-variance Gaussian noise.
    rng = np.random.default_rng(random_state)
    x = conditions["x"]
    noise = rng.normal(0, 1, len(x))
    y = c[0] + (c[1] * x) + noise
    observations = conditions.assign(y=y)
    return observations
experiment_runner = on_state(function=experiment_runner, output=["experiment_data"])
s_2 = experiment_runner(s_1, random_state=43)
s_2
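Because the unwrapped runner operates on plain DataFrames, it can also be exercised in isolation, independent of any State. The sketch below repeats the same linear-plus-noise definition (under the hypothetical name run_experiment, to avoid clashing with the wrapped version above) so it is self-contained.

```python
import numpy as np
import pandas as pd

def run_experiment(conditions: pd.DataFrame, c=(2, 4), random_state=None):
    # Same generating process as the runner above: y = c[0] + c[1] * x + noise.
    rng = np.random.default_rng(random_state)
    x = conditions["x"]
    noise = rng.normal(0, 1, len(x))
    return conditions.assign(y=c[0] + (c[1] * x) + noise)

observations = run_experiment(pd.DataFrame({"x": [0.0, 1.0, 2.0]}), random_state=0)
print(observations)
```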
Specify a theorist, using a standard LinearRegression from scikit-learn. We do not need to specify the state field the theorist will operate on: it automatically operates on the models field.
from sklearn.linear_model import LinearRegression
from autora.state import estimator_on_state
theorist = estimator_on_state(LinearRegression(fit_intercept=True))
s_3 = theorist(experiment_runner(experimentalist(s_2)))
s_3
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x 0 0.785198 1 9.834543 2 0.616326 3 -4.376617 4 -3.698967, experiment_data= x y 0 5.479121 24.160713 1 -1.222431 -2.211546 2 7.171958 30.102304 3 3.947361 16.880769 4 -8.116453 -32.457650 5 0.785198 3.193693 6 9.834543 41.207621 7 0.616326 3.879125 8 -4.376617 -14.668082 9 -3.698967 -11.416276, models=[LinearRegression()])
If we like, we can run the experimentalist, experiment_runner, and theorist ten times:
s_ = s_0
for i in range(10):
    s_ = experimentalist(s_, random_state=180 + i)
    s_ = experiment_runner(s_, random_state=2 * 180 + i)
    s_ = theorist(s_)
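This loop is exactly the repeated composition $S_n = f_n^\prime(...f_1^\prime(S_0))$ from the theoretical overview. The same pattern can be sketched with only the standard library; the three step functions here are toy stand-ins, not autora components.

```python
from functools import reduce

# Toy state: a plain dict; toy steps: functions from state to new state.
def sample(state):
    return {**state, "conditions": [1, 2, 3]}

def observe(state):
    return {**state, "data": [x * 2 for x in state["conditions"]]}

def fit(state):
    return {**state, "model": "y = 2x"}

steps = [sample, observe, fit]
# S_n = f_n'(... f_1'(S_0) ...), folded left-to-right over the steps.
s_n = reduce(lambda state, step: step(state), steps, {})
print(s_n)
```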
The experiment_data has 50 entries (10 cycles and 5 samples per cycle):
s_.experiment_data
 | x | y |
---|---|---|
0 | 1.521127 | 8.997542 |
1 | 3.362120 | 15.339784 |
2 | 1.065391 | 5.938495 |
3 | -5.844244 | -21.453802 |
4 | -6.444732 | -24.975886 |
5 | 5.724585 | 24.929289 |
6 | 1.781805 | 9.555725 |
7 | -1.015081 | -2.632280 |
8 | 2.044083 | 12.001204 |
9 | 7.709324 | 30.806166 |
10 | -6.680454 | -24.846327 |
11 | -3.630735 | -11.346701 |
12 | -0.498322 | 1.794183 |
13 | -4.043702 | -15.594289 |
14 | 5.772865 | 25.094876 |
15 | 9.028931 | 37.677228 |
16 | 8.052637 | 34.472556 |
17 | 3.774115 | 16.791553 |
18 | -8.405662 | -31.734315 |
19 | 5.433506 | 22.975112 |
20 | -9.644367 | -36.919598 |
21 | 1.673131 | 7.548614 |
22 | 7.600316 | 32.294054 |
23 | 4.354666 | 20.998850 |
24 | 6.047273 | 26.670616 |
25 | -5.608438 | -20.570161 |
26 | 0.733890 | 5.029705 |
27 | -2.781912 | -9.190651 |
28 | -2.308464 | -6.179939 |
29 | -3.547105 | -12.875100 |
30 | 0.945089 | 6.013183 |
31 | 2.694897 | 14.141356 |
32 | 7.445893 | 31.312279 |
33 | 4.423105 | 19.647015 |
34 | 2.200961 | 11.587911 |
35 | -4.915881 | -17.061782 |
36 | -2.997968 | -10.397403 |
37 | 0.099454 | 4.949820 |
38 | -3.924786 | -13.532503 |
39 | 7.050950 | 31.085545 |
40 | -8.077780 | -31.084307 |
41 | 4.391481 | 17.991533 |
42 | 6.749162 | 30.242121 |
43 | 2.246804 | 10.411612 |
44 | 4.477989 | 19.571584 |
45 | -0.262734 | 1.181040 |
46 | -7.187250 | -26.718313 |
47 | -0.790985 | 0.058681 |
48 | 6.545334 | 27.510641 |
49 | -7.185274 | -26.510872 |
The fitted coefficients are close to the original values (intercept = 2, gradient = 4):
print(s_.models[-1].intercept_, s_.models[-1].coef_)
[2.08476524] [[4.00471062]]
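The recovery of the ground-truth parameters can also be checked standalone with scikit-learn. This is a minimal sketch using the same generating process (intercept 2, gradient 4, unit-variance noise), outside the autora State machinery.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(-10, 10, size=(200, 1))   # sampled conditions
y = 2 + 4 * x[:, 0] + rng.normal(0, 1, 200)  # linear ground truth + noise

# Fit the same model class the theorist uses above.
model = LinearRegression(fit_intercept=True).fit(x, y)
print(model.intercept_, model.coef_)
```

With 200 samples the estimates land close to the true intercept and gradient, just as in the ten-cycle run above.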