The `State` mechanism¶
A `State` is an object representing data from an experiment, like the conditions, observed experiment data and models. In the AutoRA framework, experimentalists, experiment runners and theorists are functions which
- operate on `State`s, and
- return `State`s.

The `autora.state` submodule provides classes and functions to help build these functions.
Core Principle: every procedure accepts a `State` and returns a `State`¶
The AutoRA `State` mechanism is an implementation of the functional programming paradigm. It distinguishes between:
- Data – stored as an immutable `State`
- Procedures – functions which act on `State` objects to add new data and return a new `State`.
Procedures generate data. Some common procedures which appear in AutoRA experiments, and the data they produce, are:

| Procedure         | Data            |
|-------------------|-----------------|
| Experimentalist   | Conditions      |
| Experiment Runner | Experiment Data |
| Theorist          | Model           |
The data produced by each procedure $f$ can be seen as additions to the existing data. Each procedure $f$:
- takes in existing data in a `State` $S$,
- adds new data $\Delta S$,
- returns an updated `State` $S^\prime$:

$$ \begin{aligned} f(S) &= S + \Delta S \\ &= S^\prime \end{aligned} $$
AutoRA includes:
- Classes to represent the data $S$ – the `State` object (and the derived `StandardState` – a pre-defined version with the common fields needed for cyclical experiments)
- Functions to make it easier to write procedures of the form $f(S) = S^\prime$
from dataclasses import dataclass, field
import numpy as np
import pandas as pd
import autora.state
from autora.variable import VariableCollection, Variable
`State` objects¶
`State` objects contain metadata describing an experiment, and the data gathered during an experiment. Any `State` object used in an AutoRA cycle will be a subclass of `autora.state.State`, with the necessary fields specified. (The `autora.state.StandardState` provides some sensible defaults.)
@dataclass(frozen=True)
class BasicState(autora.state.State):
data: pd.DataFrame = field(default_factory=pd.DataFrame, metadata={"delta": "extend"})
s = BasicState()
Because it is a Python dataclass, the `State` fields can be accessed using attribute notation, for example:
s.data  # an empty DataFrame
`State` objects can be updated by adding `Delta` objects. A `Delta` represents new data, and is combined with the existing data in the `State` object. The `State` itself is immutable by design, so adding a `Delta` to it creates a new `State`.
s + autora.state.Delta(data=pd.DataFrame({"x":[1], "y":[1]}))
BasicState(data= x y 0 1 1)
When carrying out this "addition", `s`:

1. inspects the `Delta` it has been passed and finds any field names matching fields on `s` – in this case, `data`;
2. for each matching field, it combines the data in a way determined by the field's metadata. The key options (illustrated in the sketch below) are:
    - "replace" means that the data in the `Delta` object completely replace the data in the `State`,
    - "extend" means that the data in the `Delta` object are combined – for pandas DataFrames this means that the new data are concatenated to the bottom of the existing DataFrame.

For full details on which options are available, see the documentation for the `autora.state` module.
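To see both behaviours side by side, here is a minimal sketch (assuming that "replace", like "extend", is requested through the same `"delta"` metadata key shown above):

```python
from dataclasses import dataclass, field
import pandas as pd
import autora.state

@dataclass(frozen=True)
class MixedState(autora.state.State):
    label: str = field(default="", metadata={"delta": "replace"})                           # overwritten by each Delta
    data: pd.DataFrame = field(default_factory=pd.DataFrame, metadata={"delta": "extend"})  # rows are appended

m = MixedState(label="first", data=pd.DataFrame({"x": [1]}))
m_prime = m + autora.state.Delta(label="second", data=pd.DataFrame({"x": [2]}))
m_prime.label  # "second" – the old label was replaced
m_prime.data   # two rows, x = 1 and x = 2 – the new row was appended
```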
(s +
autora.state.Delta(data=pd.DataFrame({"x":[1], "y":[1]})) +
autora.state.Delta(data=pd.DataFrame({"x":[2], "y":[2]}))
).data # Access just the data on the updated State
|   | x | y |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 2 |
`StandardState`¶
For typical AutoRA experiments, you can use the `autora.state.StandardState` object, which has fields for variables, conditions, experiment data and models. You can initialize a `StandardState` object like this:
s_0 = autora.state.StandardState(
variables=VariableCollection(
independent_variables=[Variable("x", value_range=(-10, 10))],
dependent_variables=[Variable("y")]
),
conditions=pd.DataFrame({"x":[]}),
experiment_data=pd.DataFrame({"x":[], "y":[]}),
models=[]
)
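Updating a `StandardState` works the same way as in the `BasicState` example above. A short sketch, assuming the default field metadata (where `conditions` uses "replace" while `experiment_data` and `models` use "extend"):

```python
s_updated = (
    s_0
    + autora.state.Delta(conditions=pd.DataFrame({"x": [1.0]}))
    + autora.state.Delta(experiment_data=pd.DataFrame({"x": [1.0], "y": [3.0]}))
)
s_updated.conditions       # the new conditions row (the previous, empty conditions were replaced)
s_updated.experiment_data  # one row appended to the previously empty experiment data
```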
Making a function of the correct form¶
There are several equivalent ways to make a function of the form $f(S) = S^\prime$. These are (from simplest but most restrictive, to most complex but with the greatest flexibility):
- Use the `autora.state.on_state` decorator
- Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`

There are also special cases, like the `autora.state.estimator_on_state` wrapper for `scikit-learn` estimators.
Say you have a function to generate new experimental conditions, given some variables.
def generate_conditions(variables, num_samples=5, random_state=42):
rng = np.random.default_rng(random_state) # Initialize a random number generator
conditions = pd.DataFrame() # Create a DataFrame to hold the results
for iv in variables.independent_variables: # Loop through the independent variables
c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range
conditions[iv.name] = c # - Save the new values to the DataFrame
return conditions
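As written, this is an ordinary function of a `VariableCollection`, so you can call it directly on the variables defined earlier, for example:

```python
generate_conditions(s_0.variables)  # a DataFrame with five uniform samples in the column "x"
```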
We'll look at each of the ways you can make this into a function of the required form.
Use the `autora.state.on_state` decorator¶
`autora.state.on_state` is a wrapper for functions which allows them to accept `State` objects as the first argument. The most concise way to use it is as a decorator on the function where it is defined. You can specify how the returned values should be mapped to fields on the `State` using the `@autora.state.on_state(output=...)` argument.
@autora.state.on_state(output=["conditions"])
def generate_conditions(variables, num_samples=5, random_state=42):
rng = np.random.default_rng(random_state) # Initialize a random number generator
conditions = pd.DataFrame() # Create a DataFrame to hold the results
for iv in variables.independent_variables: # Loop through the independent variables
c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range
conditions[iv.name] = c # - Save the new values to the DataFrame
return conditions
# Example
generate_conditions(s_0)
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x 0 5.479121 1 -1.222431 2 7.171958 3 3.947361 4 -8.116453, experiment_data=Empty DataFrame Columns: [x, y] Index: [], models=[])
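Additional keyword arguments are passed through to the wrapped function (see the deep dive below), so the defaults can be overridden in the call, for example:

```python
generate_conditions(s_0, num_samples=3, random_state=181)  # num_samples and random_state are forwarded to the wrapped function
```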
Fully equivalently, you can modify `generate_conditions` to return a `Delta` of values with the appropriate field names from the `State`:
@autora.state.on_state
def generate_conditions(variables, num_samples=5, random_state=42):
rng = np.random.default_rng(random_state) # Initialize a random number generator
conditions = pd.DataFrame() # Create a DataFrame to hold the results
for iv in variables.independent_variables: # Loop through the independent variables
c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range
conditions[iv.name] = c # - Save the new values to the DataFrame
return autora.state.Delta(conditions=conditions) # Return a Delta with the appropriate names
# return {"conditions": conditions} # Returning a dictionary is equivalent
# Example
generate_conditions(s_0)
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x 0 5.479121 1 -1.222431 2 7.171958 3 3.947361 4 -8.116453, experiment_data=Empty DataFrame Columns: [x, y] Index: [], models=[])
Deep dive: `autora.state.on_state`¶
The decorator notation is equivalent to the following:
def generate_conditions_inner(variables, num_samples=5, random_state=42):
rng = np.random.default_rng(random_state) # Initialize a random number generator
result = pd.DataFrame() # Create a DataFrame to hold the results
for iv in variables.independent_variables: # Loop through the independent variables
c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range
result[iv.name] = c # - Save the new values to the DataFrame
return result
generate_conditions = autora.state.on_state(generate_conditions_inner, output=["conditions"])
# Example
generate_conditions(s_0, random_state=180)
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x 0 1.521127 1 3.362120 2 1.065391 3 -5.844244 4 -6.444732, experiment_data=Empty DataFrame Columns: [x, y] Index: [], models=[])
During the `generate_conditions(s_0, random_state=180)` call, `autora.state.on_state` does the following (sketched in code after this list):

1. Inspects the signature of `generate_conditions_inner` to see which variables are required – in this case `variables`, `num_samples` and `random_state`.
2. Looks for fields with those names on `s_0`:
    - Finds a field called `variables`.
3. Calls `generate_conditions_inner` with those fields as arguments, plus any arguments specified in the `generate_conditions` call (here just `random_state`).
4. Converts the returned value `result` into `Delta(conditions=result)`, using the name specified in `output=["conditions"]`.
5. Returns `s_0 + Delta(conditions=result)`.
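A rough, hand-written version of those steps looks like this (for illustration only – it is not the wrapper's actual implementation):

```python
result = generate_conditions_inner(s_0.variables, random_state=180)  # steps 1–3: call with matching fields plus extra kwargs
delta = autora.state.Delta(conditions=result)                        # step 4: wrap the result, named via output=["conditions"]
s_0_prime = s_0 + delta                                              # step 5: return the updated State
```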
Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`¶
Fully equivalently to using the `autora.state.on_state` wrapper, you can construct a function which takes and returns `State` objects.
def generate_conditions(state: autora.state.StandardState, num_samples=5, random_state=42):
rng = np.random.default_rng(random_state) # Initialize a random number generator
conditions = pd.DataFrame() # Create a DataFrame to hold the results
for iv in state.variables.independent_variables: # Loop through the independent variables
c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range
conditions[iv.name] = c # - Save the new values to the DataFrame
delta = autora.state.Delta(conditions=conditions) # Construct a new Delta representing the updated data
new_state = state + delta # Construct a new state, "adding" the Delta
return new_state
# Example
generate_conditions(s_0)
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x 0 5.479121 1 -1.222431 2 7.171958 3 3.947361 4 -8.116453, experiment_data=Empty DataFrame Columns: [x, y] Index: [], models=[])
Special case: `autora.state.estimator_on_state` for `scikit-learn` estimators¶
The "theorist" component in an AutoRA cycle is often a `scikit-learn` compatible estimator which implements a curve-fitting function like a linear, logistic or symbolic regression. `scikit-learn` estimators are classes, and they have a specific wrapper: `autora.state.estimator_on_state`, used as follows:
from sklearn.linear_model import LinearRegression
estimator = LinearRegression(fit_intercept=True) # Initialize the regressor with all its parameters
theorist = autora.state.estimator_on_state(estimator) # Wrap the estimator
# Example
variables = s_0.variables # Reuse the variables from before
xs = np.linspace(-10, 10, 101) # Make an array of x-values
noise = np.random.default_rng(179).normal(0., 0.5, xs.shape) # Gaussian noise
ys = (3.5 * xs + 2. + noise) # Calculate y = 3.5 x + 2 + noise
s_1 = autora.state.StandardState( # Initialize the State with those data
variables=variables,
experiment_data=pd.DataFrame({"x":xs, "y":ys}),
)
s_1_prime = theorist(s_1) # Run the theorist
print(f"Returned models: "
f"{s_1_prime.models}")
print(f"Last model's coefficients: "
f"y = {s_1_prime.models[-1].coef_[0]} x + {s_1_prime.models[-1].intercept_}")
Returned models: [LinearRegression()] Last model's coefficients: y = [3.49729147] x + [1.99930059]
During the `theorist(s_1)` call, `autora.state.estimator_on_state` does the following (sketched in code after this list):

1. Gets the names of the independent and dependent variables from `s_1.variables`.
2. Gathers the values of those variables from `s_1.experiment_data`.
3. Passes those values to the `LinearRegression().fit(x, y)` method.
4. Constructs `Delta(models=[LinearRegression()])` with the fitted regressor.
5. Returns `s_1 + Delta(models=[LinearRegression()])`.
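A rough, hand-written version of those steps (for illustration only – not the wrapper's actual implementation, and assuming a single independent and a single dependent variable, as in this example):

```python
from sklearn.linear_model import LinearRegression

iv_names = [iv.name for iv in s_1.variables.independent_variables]  # step 1: variable names, here ["x"]
dv_names = [dv.name for dv in s_1.variables.dependent_variables]    #         and ["y"]
X = s_1.experiment_data[iv_names]                                   # step 2: gather the values
y = s_1.experiment_data[dv_names]
fitted = LinearRegression(fit_intercept=True).fit(X, y)             # step 3: fit the estimator
s_1_manual = s_1 + autora.state.Delta(models=[fitted])              # steps 4–5: wrap in a Delta and add to the State
```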