# Basic Introduction to Functions and States
Using the functions and objects in autora.state, we can build flexible pipelines and cycles which operate on state
objects.
## Theoretical Overview
The fundamental idea is this:

- We define a "state" object $S$ which can be modified by the components of AutoRA (theorist, experimentalist, experiment runner); each component's contribution is a delta $\Delta S$.
- A new state at some step $i+1$ is $$S_{i+1} = S_i + \Delta S_{i+1}$$
- The cycle state after $n$ steps is thus $$S_n = S_{0} + \sum^{n}_{i=1} \Delta S_{i}$$

To represent $S$ and $\Delta S$ in code, you can use autora.state.State and autora.state.Delta respectively. To operate on these, we define functions.

Each component in an AER cycle (theorist, experimentalist, experiment_runner, etc.) is implemented as a function with $n$ arguments $s_j$ which are members of $S$ and $m$ other arguments $a_k$ which are not:
$$ f(s_0, ..., s_n, a_0, ..., a_m) \rightarrow \Delta S_{i+1} $$

There is a wrapper function $w$ (autora.state.wrap_to_use_state) which changes the signature of $f$ to require $S$ and aggregates the resulting $\Delta S_{i+1}$:
$$ w\left[f(s_0, ..., s_n, a_0, ..., a_m) \rightarrow \Delta S_{i+1}\right] \rightarrow \left[ f^\prime(S_i, a_0, ..., a_m) \rightarrow S_{i} + \Delta S_{i+1} = S_{i+1}\right] $$

Assuming that the other arguments $a_k$ are provided by partial evaluation of $f^\prime$, the full AER cycle can then be represented as:
$$ S_n = f_n^\prime(...f_2^\prime(f_1^\prime(S_0))) $$

There are additional helper functions to wrap common experimentalists, experiment runners, and theorists, so that we can define a full AER cycle using Python notation, as shown in the following example.
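To make this concrete, here is a minimal sketch of the wrapping mechanism, assuming the behavior described above: wrap_to_use_state matches the function's parameters to state fields by name, expects a Delta as the return value, and fields declared with delta="extend" accumulate their values. The SimpleState class and add_next_value function are toy constructions for illustration, not part of the library.

```python
from dataclasses import dataclass, field

from autora.state import Delta, State, wrap_to_use_state

@dataclass(frozen=True)
class SimpleState(State):
    # "extend" means a Delta's values are appended to the existing list
    values: list = field(default_factory=list, metadata={"delta": "extend"})

@wrap_to_use_state
def add_next_value(values):
    # `values` is filled from the state field of the same name;
    # the returned Delta describes the change, not the new state
    return Delta(values=[len(values)])

s = SimpleState()
s = add_next_value(s)  # SimpleState(values=[0])
s = add_next_value(s)  # SimpleState(values=[0, 1])
```

Each call returns a new state equal to the old state plus the delta, which is exactly the $S_{i+1} = S_i + \Delta S_{i+1}$ relation above.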
## Example
First, initialize the state. In this case, we use the pre-defined StandardState, which implements the standard AutoRA naming convention. There are two variables: x, with a value range of [-10, 10], and y, with an unspecified range.
```python
from autora.state import StandardState
from autora.variable import VariableCollection, Variable

s_0 = StandardState(
    variables=VariableCollection(
        independent_variables=[Variable("x", value_range=(-10, 10))],
        dependent_variables=[Variable("y")]
    )
)
```
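At this point only the variables field is populated; the other standard fields start out empty, as suggested by the StandardState defaults visible in the output further below. A quick illustrative check:

```python
# Only `variables` is set so far; the other standard fields are empty
print(s_0.conditions)       # None: no conditions sampled yet
print(s_0.experiment_data)  # None: no observations yet
print(s_0.models)           # []: no fitted models yet
```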
Next, specify the experimentalist. Here we use the standard function random_pool, which by default draws 5 independent random samples (configurable via an argument) from the value_range of the independent variables and returns them in a DataFrame. To make this work as a function on State objects, we wrap it with on_state and specify the state field it will write to, namely conditions.
```python
from autora.experimentalist.random import random_pool
from autora.state import on_state

experimentalist = on_state(function=random_pool, output=["conditions"])
s_1 = experimentalist(s_0, random_state=42)
s_1
```
```
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=
          x
0  5.479121
1 -1.222431
2  7.171958
3  3.947361
4 -8.116453, experiment_data=None, models=[])
```
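Note that the experimentalist does not modify s_0 in place: states are immutable, and each wrapped function returns a new state object. A quick illustrative check:

```python
# The original state is unchanged; only the returned state has conditions
print(s_0.conditions)  # None
print(s_1.conditions)  # the DataFrame of 5 sampled x values shown above
```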
Next, specify the experiment runner, together with the state field it operates on, namely experiment_data. This experiment runner evaluates a linear function of x, adds Gaussian noise, and assigns the result to the y column of a new DataFrame.
```python
from autora.state import on_state
import numpy as np
import pandas as pd

def experiment_runner(conditions: pd.DataFrame, c=(2, 4), random_state=None):
    # Evaluate y = c[0] + c[1] * x with additive Gaussian noise
    rng = np.random.default_rng(random_state)
    x = conditions["x"]
    noise = rng.normal(0, 1, len(x))
    y = c[0] + (c[1] * x) + noise
    observations = conditions.assign(y=y)
    return observations

experiment_runner = on_state(function=experiment_runner, output=["experiment_data"])
s_2 = experiment_runner(s_1, random_state=43)
s_2
```
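The output argument tells on_state which state field the function's return value should update. Conceptually, the wrapped experiment runner reads the conditions field (matched to the parameter of the same name), calls the raw function, and folds the result back in as a delta. A hand-written equivalent might look like the following sketch (illustrative only, not the library's actual implementation; raw_experiment_runner stands for the unwrapped function defined above):

```python
from autora.state import Delta

def experiment_runner_by_hand(state, **kwargs):
    # Read `conditions` from the state, run the raw function, and fold
    # the observations back in as a Delta on `experiment_data`
    observations = raw_experiment_runner(state.conditions, **kwargs)
    return state + Delta(experiment_data=observations)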
Now specify a theorist, using a standard LinearRegression from scikit-learn. We do not need to specify the state field the theorist operates on: the estimator_on_state wrapper automatically writes fitted models to the models field.
```python
from sklearn.linear_model import LinearRegression
from autora.state import estimator_on_state

theorist = estimator_on_state(LinearRegression(fit_intercept=True))
s_3 = theorist(experiment_runner(experimentalist(s_2)))
s_3
```
```
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=
          x
0  0.785198
1  9.834543
2  0.616326
3 -4.376617
4 -3.698967, experiment_data=
          x          y
0  5.479121  24.160713
1 -1.222431  -2.211546
2  7.171958  30.102304
3  3.947361  16.880769
4 -8.116453 -32.457650
5  0.785198   3.193693
6  9.834543  41.207621
7  0.616326   3.879125
8 -4.376617 -14.668082
9 -3.698967 -11.416276, models=[LinearRegression()])
```
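The entry in models is an ordinary fitted scikit-learn estimator, so it can be used directly, for example to predict y at new x values (an illustrative snippet; the x values here are arbitrary):

```python
import pandas as pd

# Predict y for a few new x values with the most recent model
x_new = pd.DataFrame({"x": [-1.0, 0.0, 1.0]})
print(s_3.models[-1].predict(x_new))  # expect values near 2 + 4 * x
```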
If we like, we can run the experimentalist, experiment_runner and theorist ten times.
```python
s_ = s_0
for i in range(10):
    s_ = experimentalist(s_, random_state=180 + i)
    s_ = experiment_runner(s_, random_state=2 * 180 + i)
    s_ = theorist(s_)
```
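This loop mirrors the composition $S_n = f_n^\prime(...f_2^\prime(f_1^\prime(S_0)))$ from the theoretical overview. Equivalently, the ten cycles can be written as a fold over a single cycle function (an illustrative reformulation):

```python
from functools import reduce

def one_cycle(state, i):
    # One pass through experimentalist -> experiment_runner -> theorist
    state = experimentalist(state, random_state=180 + i)
    state = experiment_runner(state, random_state=2 * 180 + i)
    return theorist(state)

s_reduced = reduce(one_cycle, range(10), s_0)  # same result as the loop above
```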
The experiment_data field accumulates across cycles: each cycle's delta extends the existing DataFrame rather than replacing it, so after 10 cycles of 5 samples each it has 50 entries:
```python
s_.experiment_data
```
| | x | y |
|---|---|---|
| 0 | 1.521127 | 8.997542 |
| 1 | 3.362120 | 15.339784 |
| 2 | 1.065391 | 5.938495 |
| 3 | -5.844244 | -21.453802 |
| 4 | -6.444732 | -24.975886 |
| 5 | 5.724585 | 24.929289 |
| 6 | 1.781805 | 9.555725 |
| 7 | -1.015081 | -2.632280 |
| 8 | 2.044083 | 12.001204 |
| 9 | 7.709324 | 30.806166 |
| 10 | -6.680454 | -24.846327 |
| 11 | -3.630735 | -11.346701 |
| 12 | -0.498322 | 1.794183 |
| 13 | -4.043702 | -15.594289 |
| 14 | 5.772865 | 25.094876 |
| 15 | 9.028931 | 37.677228 |
| 16 | 8.052637 | 34.472556 |
| 17 | 3.774115 | 16.791553 |
| 18 | -8.405662 | -31.734315 |
| 19 | 5.433506 | 22.975112 |
| 20 | -9.644367 | -36.919598 |
| 21 | 1.673131 | 7.548614 |
| 22 | 7.600316 | 32.294054 |
| 23 | 4.354666 | 20.998850 |
| 24 | 6.047273 | 26.670616 |
| 25 | -5.608438 | -20.570161 |
| 26 | 0.733890 | 5.029705 |
| 27 | -2.781912 | -9.190651 |
| 28 | -2.308464 | -6.179939 |
| 29 | -3.547105 | -12.875100 |
| 30 | 0.945089 | 6.013183 |
| 31 | 2.694897 | 14.141356 |
| 32 | 7.445893 | 31.312279 |
| 33 | 4.423105 | 19.647015 |
| 34 | 2.200961 | 11.587911 |
| 35 | -4.915881 | -17.061782 |
| 36 | -2.997968 | -10.397403 |
| 37 | 0.099454 | 4.949820 |
| 38 | -3.924786 | -13.532503 |
| 39 | 7.050950 | 31.085545 |
| 40 | -8.077780 | -31.084307 |
| 41 | 4.391481 | 17.991533 |
| 42 | 6.749162 | 30.242121 |
| 43 | 2.246804 | 10.411612 |
| 44 | 4.477989 | 19.571584 |
| 45 | -0.262734 | 1.181040 |
| 46 | -7.187250 | -26.718313 |
| 47 | -0.790985 | 0.058681 |
| 48 | 6.545334 | 27.510641 |
| 49 | -7.185274 | -26.510872 |
The fitted coefficients are close to the original values (intercept $= 2$, gradient $= 4$):
```python
print(s_.models[-1].intercept_, s_.models[-1].coef_)
```

```
[2.08476524] [[4.00471062]]
```