The `State` mechanism¶
A `State` is an object representing data from an experiment, like the conditions, observed experiment data and models. In the AutoRA framework, experimentalists, experiment runners and theorists are functions which
- operate on `State`s, and
- return `State`s.

The `autora.state` submodule provides classes and functions to help build these functions.
Core Principle: every procedure accepts a `State` and returns a `State`¶
The AutoRA `State` mechanism is an implementation of the functional programming paradigm. It distinguishes between:
- Data – stored as an immutable `State`
- Procedures – functions which act on `State` objects to add new data and return a new `State`.
Procedures generate data. Some common procedures which appear in AutoRA experiments, and the data they produce, are:

| Procedure         | Data            |
|-------------------|-----------------|
| Experimentalist   | Conditions      |
| Experiment Runner | Experiment Data |
| Theorist          | Model           |
The data produced by each procedure $f$ can be seen as additions to the existing data. Each procedure $f$:
- takes in existing data in a `State` $S$,
- adds new data $\Delta S$,
- returns an updated `State` $S^\prime$:

$$ \begin{aligned} f(S) &= S + \Delta S \\ &= S^\prime \end{aligned} $$
AutoRA includes:
- Classes to represent the data $S$ – the `State` object (and the derived `StandardState` – a pre-defined version with the common fields needed for cyclical experiments)
- Functions to make it easier to write procedures of the form $f(S) = S^\prime$
from dataclasses import dataclass, field
import numpy as np
import pandas as pd
import autora.state
from autora.variable import VariableCollection, Variable
`State` objects¶
`State` objects contain metadata describing an experiment, and the data gathered during an experiment. Any `State` object used in an AutoRA cycle will be a subclass of `autora.state.State`, with the necessary fields specified. (The `autora.state.StandardState` provides some sensible defaults.)
@dataclass(frozen=True)
class BasicState(autora.state.State):
data: pd.DataFrame = field(default_factory=pd.DataFrame, metadata={"delta": "extend"})
s = BasicState()
Because it is a Python dataclass, the `State` fields can be accessed using attribute notation, for example:
s.data  # an empty DataFrame
`State` objects can be updated by adding `Delta` objects. A `Delta` represents new data, and is combined with the existing data in the `State` object. The `State` itself is immutable by design, so adding a `Delta` to it creates a new `State`.
s + autora.state.Delta(data=pd.DataFrame({"x":[1], "y":[1]}))
BasicState(data= x y 0 1 1)
When carrying out this "addition", `s`:

1. inspects the `Delta` it has been passed and finds any field names matching fields on `s` – in this case, `data`;
2. for each matching field, it combines the data in a way determined by the field's metadata. The key options (illustrated in the sketch below) are:
    - "replace" means that the data in the `Delta` object completely replace the data in the `State`,
    - "extend" means that the data in the `Delta` object are combined – for pandas DataFrames this means that the new data are concatenated to the bottom of the existing DataFrame.

For full details on which options are available, see the documentation for the `autora.state` module.
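To see both behaviours side by side, here is a minimal sketch (assuming that "replace", like "extend", is requested through the same `"delta"` metadata key shown above):

```python
from dataclasses import dataclass, field
import pandas as pd
import autora.state

@dataclass(frozen=True)
class MixedState(autora.state.State):
    label: str = field(default="", metadata={"delta": "replace"})                           # overwritten by each Delta
    data: pd.DataFrame = field(default_factory=pd.DataFrame, metadata={"delta": "extend"})  # rows are appended

m = MixedState(label="first", data=pd.DataFrame({"x": [1]}))
m_prime = m + autora.state.Delta(label="second", data=pd.DataFrame({"x": [2]}))
m_prime.label  # "second" – the old label was replaced
m_prime.data   # two rows, x = 1 and x = 2 – the new row was appended
```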
(s +
autora.state.Delta(data=pd.DataFrame({"x":[1], "y":[1]})) +
autora.state.Delta(data=pd.DataFrame({"x":[2], "y":[2]}))
).data # Access just the data on the updated State
|   | x | y |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 2 |
`StandardState`¶
For typical AutoRA experiments, you can use the `autora.state.StandardState` object, which has fields for variables, conditions, experiment data and models. You can initialize a `StandardState` object like this:
s_0 = autora.state.StandardState(
variables=VariableCollection(
independent_variables=[Variable("x", value_range=(-10, 10))],
dependent_variables=[Variable("y")]
),
conditions=pd.DataFrame({"x":[]}),
experiment_data=pd.DataFrame({"x":[], "y":[]}),
models=[]
)
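Updating a `StandardState` works the same way as in the `BasicState` example above. A short sketch, assuming the default field metadata (where `conditions` uses "replace" while `experiment_data` and `models` use "extend"):

```python
s_updated = (
    s_0
    + autora.state.Delta(conditions=pd.DataFrame({"x": [1.0]}))
    + autora.state.Delta(experiment_data=pd.DataFrame({"x": [1.0], "y": [3.0]}))
)
s_updated.conditions       # the new conditions row (the previous, empty conditions were replaced)
s_updated.experiment_data  # one row appended to the previously empty experiment data
```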
Making a function of the correct form¶
There are several equivalent ways to make a function of the form $f(S) = S^\prime$. These are (from simplest but most restrictive, to most complex but with the greatest flexibility):
- Use the `autora.state.on_state` decorator
- Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`

There are also special cases, like the `autora.state.estimator_on_state` wrapper for `scikit-learn` estimators.
Say you have a function to generate new experimental conditions, given some variables.
def generate_conditions(variables, num_samples=5, random_state=42):
rng = np.random.default_rng(random_state) # Initialize a random number generator
conditions = pd.DataFrame() # Create a DataFrame to hold the results
for iv in variables.independent_variables: # Loop through the independent variables
c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range
conditions[iv.name] = c # - Save the new values to the DataFrame
return conditions
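As written, this is an ordinary function of a `VariableCollection`, so you can call it directly on the variables defined earlier, for example:

```python
generate_conditions(s_0.variables)  # a DataFrame with five uniform samples in the column "x"
```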
We'll look at each of the ways you can make this into a function of the required form.
Use the `autora.state.on_state` decorator¶
`autora.state.on_state` is a wrapper for functions which allows them to accept `State` objects as the first argument. The most concise way to use it is as a decorator on the function where it is defined. You can specify how the returned values should be mapped to fields on the `State` using the `@autora.state.on_state(output=...)` argument.
@autora.state.on_state(output=["conditions"])
def generate_conditions(variables, num_samples=5, random_state=42):
rng = np.random.default_rng(random_state) # Initialize a random number generator
conditions = pd.DataFrame() # Create a DataFrame to hold the results
for iv in variables.independent_variables: # Loop through the independent variables
c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range
conditions[iv.name] = c # - Save the new values to the DataFrame
return conditions
# Example
generate_conditions(s_0)
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x 0 5.479121 1 -1.222431 2 7.171958 3 3.947361 4 -8.116453, experiment_data=Empty DataFrame Columns: [x, y] Index: [], models=[])
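Additional keyword arguments are passed through to the wrapped function (see the deep dive below), so the defaults can be overridden in the call, for example:

```python
generate_conditions(s_0, num_samples=3, random_state=181)  # num_samples and random_state are forwarded to the wrapped function
```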
Fully equivalently, you can modify `generate_conditions` to return a `Delta` of values with the appropriate field names from the `State`:
@autora.state.on_state
def generate_conditions(variables, num_samples=5, random_state=42):
rng = np.random.default_rng(random_state) # Initialize a random number generator
conditions = pd.DataFrame() # Create a DataFrame to hold the results
for iv in variables.independent_variables: # Loop through the independent variables
c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range
conditions[iv.name] = c # - Save the new values to the DataFrame
return autora.state.Delta(conditions=conditions) # Return a Delta with the appropriate names
# return {"conditions": conditions} # Returning a dictionary is equivalent
# Example
generate_conditions(s_0)
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x 0 5.479121 1 -1.222431 2 7.171958 3 3.947361 4 -8.116453, experiment_data=Empty DataFrame Columns: [x, y] Index: [], models=[])
Deep dive: `autora.state.on_state`¶
The decorator notation is equivalent to the following:
def generate_conditions_inner(variables, num_samples=5, random_state=42):
rng = np.random.default_rng(random_state) # Initialize a random number generator
result = pd.DataFrame() # Create a DataFrame to hold the results
for iv in variables.independent_variables: # Loop through the independent variables
c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range
result[iv.name] = c # - Save the new values to the DataFrame
return result
generate_conditions = autora.state.on_state(generate_conditions_inner, output=["conditions"])
# Example
generate_conditions(s_0, random_state=180)
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x 0 1.521127 1 3.362120 2 1.065391 3 -5.844244 4 -6.444732, experiment_data=Empty DataFrame Columns: [x, y] Index: [], models=[])
During the `generate_conditions(s_0, random_state=180)` call, `autora.state.on_state` does the following (sketched in code after this list):

1. Inspects the signature of `generate_conditions_inner` to see which variables are required – in this case `variables`, `num_samples` and `random_state`.
2. Looks for fields with those names on `s_0`:
    - Finds a field called `variables`.
3. Calls `generate_conditions_inner` with those fields as arguments, plus any arguments specified in the `generate_conditions` call (here just `random_state`).
4. Converts the returned value `result` into `Delta(conditions=result)`, using the name specified in `output=["conditions"]`.
5. Returns `s_0 + Delta(conditions=result)`.
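A rough, hand-written version of those steps looks like this (for illustration only – it is not the wrapper's actual implementation):

```python
result = generate_conditions_inner(s_0.variables, random_state=180)  # steps 1–3: call with matching fields plus extra kwargs
delta = autora.state.Delta(conditions=result)                        # step 4: wrap the result, named via output=["conditions"]
s_0_prime = s_0 + delta                                              # step 5: return the updated State
```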
Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`¶
Fully equivalently to using the `autora.state.on_state` wrapper, you can construct a function which takes and returns `State` objects.
def generate_conditions(state: autora.state.StandardState, num_samples=5, random_state=42):
rng = np.random.default_rng(random_state) # Initialize a random number generator
conditions = pd.DataFrame() # Create a DataFrame to hold the results
for iv in state.variables.independent_variables: # Loop through the independent variables
c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range
conditions[iv.name] = c # - Save the new values to the DataFrame
delta = autora.state.Delta(conditions=conditions) # Construct a new Delta representing the updated data
new_state = state + delta # Construct a new state, "adding" the Delta
return new_state
# Example
generate_conditions(s_0)
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x 0 5.479121 1 -1.222431 2 7.171958 3 3.947361 4 -8.116453, experiment_data=Empty DataFrame Columns: [x, y] Index: [], models=[])
Special case: `autora.state.estimator_on_state` for `scikit-learn` estimators¶
The "theorist" component in an AutoRA cycle is often a `scikit-learn` compatible estimator which implements a curve-fitting function like a linear, logistic or symbolic regression. `scikit-learn` estimators are classes, and they have a specific wrapper: `autora.state.estimator_on_state`, used as follows:
from sklearn.linear_model import LinearRegression
estimator = LinearRegression(fit_intercept=True) # Initialize the regressor with all its parameters
theorist = autora.state.estimator_on_state(estimator) # Wrap the estimator
# Example
variables = s_0.variables # Reuse the variables from before
xs = np.linspace(-10, 10, 101) # Make an array of x-values
noise = np.random.default_rng(179).normal(0., 0.5, xs.shape) # Gaussian noise
ys = (3.5 * xs + 2. + noise) # Calculate y = 3.5 x + 2 + noise
s_1 = autora.state.StandardState( # Initialize the State with those data
variables=variables,
experiment_data=pd.DataFrame({"x":xs, "y":ys}),
)
s_1_prime = theorist(s_1) # Run the theorist
print(f"Returned models: "
f"{s_1_prime.models}")
print(f"Last model's coefficients: "
f"y = {s_1_prime.models[-1].coef_[0]} x + {s_1_prime.models[-1].intercept_}")
Returned models: [LinearRegression()] Last model's coefficients: y = [3.49729147] x + [1.99930059]
During the `theorist(s_1)` call, `autora.state.estimator_on_state` does the following (sketched in code after this list):

1. Gets the names of the independent and dependent variables from `s_1.variables`.
2. Gathers the values of those variables from `s_1.experiment_data`.
3. Passes those values to the `LinearRegression().fit(x, y)` method.
4. Constructs `Delta(models=[LinearRegression()])` with the fitted regressor.
5. Returns `s_1 + Delta(models=[LinearRegression()])`.
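A rough, hand-written version of those steps (for illustration only – not the wrapper's actual implementation, and assuming a single independent and a single dependent variable, as in this example):

```python
from sklearn.linear_model import LinearRegression

iv_names = [iv.name for iv in s_1.variables.independent_variables]  # step 1: variable names, here ["x"]
dv_names = [dv.name for dv in s_1.variables.dependent_variables]    #         and ["y"]
X = s_1.experiment_data[iv_names]                                   # step 2: gather the values
y = s_1.experiment_data[dv_names]
fitted = LinearRegression(fit_intercept=True).fit(X, y)             # step 3: fit the estimator
s_1_manual = s_1 + autora.state.Delta(models=[fitted])              # steps 4–5: wrap in a Delta and add to the State
```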