The State mechanism¶
A State is an object representing data from an experiment, like the conditions, observed experiment data and models.
In the AutoRA framework, experimentalists, experiment runners and theorists are functions which
- operate on States, and
- return States.
The autora.state submodule provides classes and functions to help build these functions.
Core Principle: every procedure accepts a State and returns a State¶
The AutoRA State mechanism is an implementation of the functional programming paradigm. It distinguishes between:
- Data – stored as an immutable State
- Procedures – functions which act on State objects to add new data and return a new State.
Procedures generate data. Some common procedures which appear in AutoRA experiments, and the data they produce, are:
| Procedure | Data | 
|---|---|
| Experimentalist | Conditions | 
| Experiment Runner | Experiment Data | 
| Theorist | Model | 
The data produced by each procedure $f$ can be seen as additions to the existing data. Each procedure $f$:
- Takes in existing Data in a State $S$
- Adds new data $\Delta S$
- Returns an updated State $S^\prime$
$$ \begin{aligned} f(S) &= S + \Delta S \\ &= S^\prime \end{aligned} $$
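For instance, when the experimentalist from the table above runs, the new data $\Delta S$ are conditions, so its update can be written (this notation is just for illustration) as:
$$ f_\mathrm{experimentalist}(S) = S + \Delta S_\mathrm{conditions} = S^\prime $$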
AutoRA includes:
- Classes to represent the Data $S$ – the State object (and the derived StandardState – a pre-defined version with the common fields needed for cyclical experiments)
- Functions to make it easier to write procedures of the form $f(S) = S^\prime$
from dataclasses import dataclass, field
import numpy as np
import pandas as pd
import autora.state
from autora.variable import VariableCollection, Variable
State objects¶
State objects contain metadata describing an experiment, and the data gathered during an experiment. Any State
object used in an AutoRA cycle will be a subclass of autora.state.State, with the necessary fields specified.
(The autora.state.StandardState provides some sensible defaults.)
@dataclass(frozen=True)
class BasicState(autora.state.State):
   data: pd.DataFrame = field(default_factory=pd.DataFrame, metadata={"delta": "extend"})
   
s = BasicState()
Because it is a python dataclass, the State fields can be accessed using attribute notation, for example:
s.data  # an empty DataFrame
State objects can be updated by adding Delta objects. A Delta represents new data, and is combined with the
existing data in the State object. The State itself is immutable by design, so adding a Delta to it creates a new
State.
s + autora.state.Delta(data=pd.DataFrame({"x":[1], "y":[1]}))
BasicState(data= x y 0 1 1)
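This addition returns a new object and leaves s itself untouched; because the dataclass is frozen, assigning to a field directly raises an error. A minimal sketch illustrating the immutability:
import dataclasses

print(s.data.empty)                      # True – the original State still holds its empty default DataFrame
try:
    s.data = pd.DataFrame({"x": [1]})    # direct assignment is forbidden on a frozen dataclass
except dataclasses.FrozenInstanceError as e:
    print(e)                             # cannot assign to field 'data'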
When carrying out this "addition", s:
- inspects the Delta it has been passed and finds any field names matching fields on s, in this case data.
- For each matching field it combines the data in a way determined by the field's metadata. The key options are:
    - "replace" means that the data in the Delta object completely replace the data in the State,
    - "extend" means that the data in the Delta object are combined – for pandas DataFrames this means that the new data are concatenated to the bottom of the existing DataFrame.
For full details on which options are available, see the documentation for the autora.state module.
(s + 
 autora.state.Delta(data=pd.DataFrame({"x":[1], "y":[1]})) + 
 autora.state.Delta(data=pd.DataFrame({"x":[2], "y":[2]}))
 ).data  # Access just the data on the updated State
|   | x | y |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 2 |
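To see the difference between the two delta behaviours side by side, here is a small sketch. The ComparisonState class and its field names latest and history are hypothetical, invented only for this illustration; the metadata follow the same "delta": ... convention as BasicState above.
@dataclass(frozen=True)
class ComparisonState(autora.state.State):
    latest: pd.DataFrame = field(default_factory=pd.DataFrame, metadata={"delta": "replace"})
    history: pd.DataFrame = field(default_factory=pd.DataFrame, metadata={"delta": "extend"})

t = (ComparisonState()
     + autora.state.Delta(latest=pd.DataFrame({"x": [1]}), history=pd.DataFrame({"x": [1]}))
     + autora.state.Delta(latest=pd.DataFrame({"x": [2]}), history=pd.DataFrame({"x": [2]})))
t.latest   # only the most recent DataFrame remains ("replace")
t.history  # both DataFrames concatenated ("extend")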
StandardState¶
For typical AutoRA experiments, you can use the autora.state.StandardState object, which has fields for variables,
conditions, experiment data and models. You can initialize a StandardState object like this:
s_0 = autora.state.StandardState(
    variables=VariableCollection(
        independent_variables=[Variable("x", value_range=(-10, 10))],
        dependent_variables=[Variable("y")]
    ),
    conditions=pd.DataFrame({"x":[]}),
    experiment_data=pd.DataFrame({"x":[], "y":[]}),
    models=[]
)
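As with any State, new data can be added to a StandardState with a Delta; for instance, filling in some conditions:
s_0 + autora.state.Delta(conditions=pd.DataFrame({"x": [1., 2., 3.]}))  # a new StandardState with the conditions filled in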
Making a function of the correct form¶
There are several equivalent ways to make a function of the form $f(S) = S^\prime$. These are (from simplest but most restrictive, to most complex but with the greatest flexibility):
- Use the autora.state.on_state decorator
- Modify generate_conditions to accept a StandardState and update this with a Delta
There are also special cases, like the autora.state.estimator_on_state wrapper for scikit-learn estimators.
Say you have a function to generate new experimental conditions, given some variables.
def generate_conditions(variables, num_samples=5, random_state=42):
    rng = np.random.default_rng(random_state)               # Initialize a random number generator
    conditions = pd.DataFrame()                             # Create a DataFrame to hold the results  
    for iv in variables.independent_variables:              # Loop through the independent variables
        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range
        conditions[iv.name] = c                             #  - Save the new values to the DataFrame
    return conditions
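As written, this is an ordinary function of a VariableCollection, and can be called directly with the variables defined above:
generate_conditions(s_0.variables)  # a DataFrame with a column "x" of 5 uniformly sampled values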
We'll look at each of the ways you can make this into a function of the required form.
Use the autora.state.on_state decorator¶
autora.state.on_state is a wrapper for functions which allows them to accept State objects as the first argument.
The most concise way to use it is as a decorator on the function where it is defined. You can specify how the
returned values should be mapped to fields on the State using the @autora.state.on_state(output=...) argument.
@autora.state.on_state(output=["conditions"])
def generate_conditions(variables, num_samples=5, random_state=42):
    rng = np.random.default_rng(random_state)               # Initialize a random number generator
    conditions = pd.DataFrame()                             # Create a DataFrame to hold the results  
    for iv in variables.independent_variables:              # Loop through the independent variables
        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range
        conditions[iv.name] = c                             #  - Save the new values to the DataFrame
    return conditions
# Example
generate_conditions(s_0)
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x 0 5.479121 1 -1.222431 2 7.171958 3 3.947361 4 -8.116453, experiment_data=Empty DataFrame Columns: [x, y] Index: [], models=[])
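Additional arguments are passed through to the wrapped function, so, for example, the defaults can still be overridden:
generate_conditions(s_0, num_samples=3, random_state=180)  # extra keyword arguments are forwarded to the wrapped function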
Fully equivalently, you can modify generate_conditions to return a Delta of values with the appropriate field
names from State:
@autora.state.on_state
def generate_conditions(variables, num_samples=5, random_state=42):
    rng = np.random.default_rng(random_state)               # Initialize a random number generator
    conditions = pd.DataFrame()                             # Create a DataFrame to hold the results  
    for iv in variables.independent_variables:              # Loop through the independent variables
        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range
        conditions[iv.name] = c                             #  - Save the new values to the DataFrame
    return autora.state.Delta(conditions=conditions)        # Return a Delta with the appropriate names
    # return {"conditions": conditions}                     # Returning a dictionary is equivalent
# Example
generate_conditions(s_0)
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x 0 5.479121 1 -1.222431 2 7.171958 3 3.947361 4 -8.116453, experiment_data=Empty DataFrame Columns: [x, y] Index: [], models=[])
Deep dive: autora.state.on_state¶
The decorator notation is equivalent to the following:
def generate_conditions_inner(variables, num_samples=5, random_state=42):
    rng = np.random.default_rng(random_state)               # Initialize a random number generator
    result = pd.DataFrame()                             # Create a DataFrame to hold the results  
    for iv in variables.independent_variables:              # Loop through the independent variables
        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range
        result[iv.name] = c                             #  - Save the new values to the DataFrame
    return result
generate_conditions = autora.state.on_state(generate_conditions_inner, output=["conditions"])
# Example
generate_conditions(s_0, random_state=180)
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x 0 1.521127 1 3.362120 2 1.065391 3 -5.844244 4 -6.444732, experiment_data=Empty DataFrame Columns: [x, y] Index: [], models=[])
During the generate_conditions(s_0, random_state=180) call, autora.state.on_state does the following:
- Inspects the signature of generate_conditions_inner to see which variables are required – in this case: variables, num_samples and random_state.
- Looks for fields with those names on s_0:
    - Finds a field called variables.
- Calls generate_conditions_inner with those fields as arguments, plus any arguments specified in the generate_conditions call (here just random_state)
- Converts the returned value result into Delta(conditions=result) using the name specified in output=["conditions"]
- Returns s_0 + Delta(conditions=result)
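Written out by hand, the call above is therefore roughly equivalent to this sketch:
result = generate_conditions_inner(s_0.variables, random_state=180)  # call the inner function with the matching field
s_0 + autora.state.Delta(conditions=result)                          # wrap the result in a Delta and add it to the State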
Modify generate_conditions to accept a StandardState and update this with a Delta¶
Fully equivalently to using the autora.state.on_state wrapper, you can construct a function which takes and returns
State objects.
def generate_conditions(state: autora.state.StandardState, num_samples=5, random_state=42):
    rng = np.random.default_rng(random_state)               # Initialize a random number generator
    conditions = pd.DataFrame()                             # Create a DataFrame to hold the results  
    for iv in state.variables.independent_variables:        # Loop through the independent variables
        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range
        conditions[iv.name] = c                             #  - Save the new values to the DataFrame
    delta = autora.state.Delta(conditions=conditions)       # Construct a new Delta representing the updated data
    new_state = state + delta                               # Construct a new state, "adding" the Delta
    return new_state
# Example
generate_conditions(s_0)
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x 0 5.479121 1 -1.222431 2 7.171958 3 3.947361 4 -8.116453, experiment_data=Empty DataFrame Columns: [x, y] Index: [], models=[])
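Because the function both accepts and returns a State, calls can be chained directly, with each step starting from the previous result:
s_a = generate_conditions(s_0, random_state=1)  # first pass
s_b = generate_conditions(s_a, random_state=2)  # second pass, starting from the updated State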
Special case: autora.state.estimator_on_state for scikit-learn estimators¶
The "theorist" component in an AutoRA cycle is often a scikit-learn compatible estimator which implements a curve
fitting function like a linear, logistic or symbolic regression. scikit-learn estimators are classes, and they have
a specific wrapper: autora.state.estimator_on_state, used as follows:
from sklearn.linear_model import LinearRegression
estimator = LinearRegression(fit_intercept=True)       # Initialize the regressor with all its parameters
theorist = autora.state.estimator_on_state(estimator)  # Wrap the estimator
# Example
variables = s_0.variables          # Reuse the variables from before 
xs = np.linspace(-10, 10, 101)     # Make an array of x-values 
noise = np.random.default_rng(179).normal(0., 0.5, xs.shape)  # Gaussian noise
ys = (3.5 * xs + 2. + noise)       # Calculate y = 3.5 x + 2 + noise  
s_1 = autora.state.StandardState(  # Initialize the State with those data
    variables=variables,
    experiment_data=pd.DataFrame({"x":xs, "y":ys}),
)
s_1_prime = theorist(s_1)         # Run the theorist
print(f"Returned models: "
      f"{s_1_prime.models}")      
print(f"Last model's coefficients: "
      f"y = {s_1_prime.models[-1].coef_[0]} x + {s_1_prime.models[-1].intercept_}")
Returned models: [LinearRegression()]
Last model's coefficients: y = [3.49729147] x + [1.99930059]
During the theorist(s_1) call, autora.state.estimator_on_state does the following:
- Gets the names of the independent and dependent variables from s_1.variables
- Gathers the values of those variables from s_1.experiment_data
- Passes those values to the LinearRegression().fit(x, y) method
- Constructs Delta(models=[LinearRegression()]) with the fitted regressor
- Returns s_1 + Delta(models=[LinearRegression()])
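The returned model is an ordinary fitted scikit-learn estimator, so it can, for example, be used to predict y-values for new conditions:
model = s_1_prime.models[-1]                 # the fitted LinearRegression
model.predict(np.array([[0.], [1.], [2.]]))  # predicted y for x = 0, 1, 2 (approximately 2, 5.5, 9)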