{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The `State` mechanism"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A `State` is an object representing data from an experiment, like the conditions, observed experiment data and models. \n",
"In the AutoRA framework, experimentalists, experiment runners and theorists are functions which \n",
"- operate on `States` and \n",
"- return `States`.\n",
"\n",
"The `autora.state` submodule provides classes and functions to help build these functions. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Core Principle: every procedure accepts a `State` and returns a `State`\n",
"\n",
"The AutoRA `State` mechanism is an implementation of the functional programming paradigm. It distinguishes between:\n",
"- Data – stored as an immutable `State`\n",
"- Procedures – functions which act on `State` objects to add new data and return a new `State`.\n",
"\n",
"Procedures generate data. Some common procedures which appear in AutoRA experiments, and the data they produce are:\n",
"\n",
"| Procedure | Data |\n",
"|-------------------|-----------------|\n",
"| Experimentalist | Conditions |\n",
"| Experiment Runner | Experiment Data |\n",
"| Theorist | Model |\n",
"\n",
"The data produced by each procedure $f$ can be seen as additions to the existing data. Each procedure $f$:\n",
"- Takes in existing Data in a `State` $S$\n",
"- Adds new data $\\Delta S$\n",
"- Returns an updated `State` $S^\\prime$ \n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"f(S) &= S + \\Delta S \\\\\n",
" &= S^\\prime\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"AutoRA includes:\n",
"- Classes to represent the Data $S$ – the `State` object (and the derived `StandardState` – a pre-defined version \n",
"with the common fields needed for cyclical experiments) \n",
"- Functions to make it easier to write procedures of the form $f(S) = S^\\prime$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dataclasses import dataclass, field\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import autora.state\n",
"from autora.variable import VariableCollection, Variable"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `State` objects\n",
"\n",
"`State` objects contain metadata describing an experiment, and the data gathered during an experiment. Any `State` \n",
"object used in an AutoRA cycle will be a subclass of the `autora.state.State`, with the necessary fields specified. \n",
"(The `autora.state.StandardState` provides some sensible defaults.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"@dataclass(frozen=True)\n",
"class BasicState(autora.state.State):\n",
" data: pd.DataFrame = field(default_factory=pd.DataFrame, metadata={\"delta\": \"extend\"})\n",
" \n",
"s = BasicState()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because it is a python dataclass, the `State` fields can be accessed using attribute notation, for example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
"Empty DataFrame\n",
"Columns: []\n",
"Index: []"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s.data # an empty DataFrame with a column \"x\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`State` objects can be updated by adding `Delta` objects. A `Delta` represents new data, and is combined with the \n",
"existing data in the `State` object. The `State` itself is immutable by design, so adding a `Delta` to it creates a new \n",
"`State`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BasicState(data= x y\n",
"0 1 1)"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s + autora.state.Delta(data=pd.DataFrame({\"x\":[1], \"y\":[1]}))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When carrying out this \"addition\", `s`: \n",
"- inspects the `Delta` it has been passed and finds any field names matching fields on `s`, in this case \n",
"`data`.\n",
"- For each matching field it combines the data in a way determined by the field's metadata. The key options are:\n",
" - \"replace\" means that the data in the `Delta` object completely replace the data in the `State`,\n",
" - \"extend\" means that the data in the `Delta` object are combined – for pandas DataFrames this means that the new\n",
" data are concatenated to the bottom of the existing DataFrame.\n",
" \n",
" For full details on which options are available, see the documentation for the `autora.state` module. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" x | \n",
" y | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" x y\n",
"0 1 1\n",
"1 2 2"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(s + \n",
" autora.state.Delta(data=pd.DataFrame({\"x\":[1], \"y\":[1]})) + \n",
" autora.state.Delta(data=pd.DataFrame({\"x\":[2], \"y\":[2]}))\n",
" ).data # Access just the experiment_data on the updated State"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `StandardState`\n",
"\n",
"For typical AutoRA experiments, you can use the `autora.state.StandardState` object, which has fields for variables, \n",
"conditions, experiment data and models. You can initialize a `StandardState` object like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s_0 = autora.state.StandardState(\n",
" variables=VariableCollection(\n",
" independent_variables=[Variable(\"x\", value_range=(-10, 10))],\n",
" dependent_variables=[Variable(\"y\")]\n",
" ),\n",
" conditions=pd.DataFrame({\"x\":[]}),\n",
" experiment_data=pd.DataFrame({\"x\":[], \"y\":[]}),\n",
" models=[]\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Making a function of the correct form\n",
"\n",
"There are several equivalent ways to make a function of the form $f(S) = S^\\prime$. These are (from \n",
"simplest but most restrictive, to most complex but with the greatest flexibility):\n",
"- Use the `autora.state.on_state` decorator\n",
"- Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`\n",
"\n",
"There are also special cases, like the `autora.state.estimator_on_state` wrapper for `scikit-learn` estimators. \n",
"\n",
"Say you have a function to generate new experimental conditions, given some variables. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def generate_conditions(variables, num_samples=5, random_state=42):\n",
" rng = np.random.default_rng(random_state) # Initialize a random number generator\n",
" conditions = pd.DataFrame() # Create a DataFrame to hold the results \n",
" for iv in variables.independent_variables: # Loop through the independent variables\n",
" c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range\n",
" conditions[iv.name] = c # - Save the new values to the DataFrame\n",
" return conditions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll look at each of the ways you can make this into a function of the required form. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Use the `autora.state.on_state` decorator\n",
"\n",
"`autora.state.on_state` is a wrapper for functions which allows them to accept `State` objects as the first argument.\n",
"\n",
"The most concise way to use it is as a decorator on the function where it is defined. You can specify how the \n",
"returned values should be mapped to fields on the `State` using the `@autora.state.on_state(output=...)` argument."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n",
"0 5.479121\n",
"1 -1.222431\n",
"2 7.171958\n",
"3 3.947361\n",
"4 -8.116453, experiment_data=Empty DataFrame\n",
"Columns: [x, y]\n",
"Index: [], models=[])"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@autora.state.on_state(output=[\"conditions\"])\n",
"def generate_conditions(variables, num_samples=5, random_state=42):\n",
" rng = np.random.default_rng(random_state) # Initialize a random number generator\n",
" conditions = pd.DataFrame() # Create a DataFrame to hold the results \n",
" for iv in variables.independent_variables: # Loop through the independent variables\n",
" c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range\n",
" conditions[iv.name] = c # - Save the new values to the DataFrame\n",
" return conditions\n",
"\n",
"# Example\n",
"generate_conditions(s_0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fully equivalently, you can modify `generate_conditions` to return a Delta of values with the appropriate field \n",
"names from `State`: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n",
"0 5.479121\n",
"1 -1.222431\n",
"2 7.171958\n",
"3 3.947361\n",
"4 -8.116453, experiment_data=Empty DataFrame\n",
"Columns: [x, y]\n",
"Index: [], models=[])"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@autora.state.on_state\n",
"def generate_conditions(variables, num_samples=5, random_state=42):\n",
" rng = np.random.default_rng(random_state) # Initialize a random number generator\n",
" conditions = pd.DataFrame() # Create a DataFrame to hold the results \n",
" for iv in variables.independent_variables: # Loop through the independent variables\n",
" c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range\n",
" conditions[iv.name] = c # - Save the new values to the DataFrame\n",
" return autora.state.Delta(conditions=conditions) # Return a Delta with the appropriate names\n",
" # return {\"conditions\": conditions} # Returning a dictionary is equivalent\n",
"\n",
"# Example\n",
"generate_conditions(s_0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Deep dive: `autora.state_on_state`\n",
"The decorator notation is equivalent to the following:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n",
"0 1.521127\n",
"1 3.362120\n",
"2 1.065391\n",
"3 -5.844244\n",
"4 -6.444732, experiment_data=Empty DataFrame\n",
"Columns: [x, y]\n",
"Index: [], models=[])"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def generate_conditions_inner(variables, num_samples=5, random_state=42):\n",
" rng = np.random.default_rng(random_state) # Initialize a random number generator\n",
" result = pd.DataFrame() # Create a DataFrame to hold the results \n",
" for iv in variables.independent_variables: # Loop through the independent variables\n",
" c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range\n",
" result[iv.name] = c # - Save the new values to the DataFrame\n",
" return result\n",
"\n",
"generate_conditions = autora.state.on_state(generate_conditions_inner, output=[\"conditions\"])\n",
"\n",
"# Example\n",
"generate_conditions(s_0, random_state=180)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"During the `generate_conditions(s_0, random_state=180)` call, `autora.state.on_state` does the following:\n",
"- Inspects the signature of `generate_conditions_inner` to see which variables are required – in this case:\n",
" - `variables`, \n",
" - `num_samples` and \n",
" - `random_state`.\n",
"- Looks for fields with those names on `s_0`:\n",
" - Finds a field called `variables`.\n",
"- Calls `generate_conditions_inner` with those fields as arguments, plus any arguments specified in the \n",
"`generate_conditions` call (here just `random_state`)\n",
"- Converts the returned value `result` into `Delta(conditions=result)` using the name specified in `output=[\"conditions\"]`\n",
"- Returns `s_0 + Delta(conditions=result)`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fully equivalently to using the `autora.state.on_state` wrapper, you can construct a function which takes and returns \n",
"`State` objects. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n",
"0 5.479121\n",
"1 -1.222431\n",
"2 7.171958\n",
"3 3.947361\n",
"4 -8.116453, experiment_data=Empty DataFrame\n",
"Columns: [x, y]\n",
"Index: [], models=[])"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def generate_conditions(state: autora.state.StandardState, num_samples=5, random_state=42):\n",
" rng = np.random.default_rng(random_state) # Initialize a random number generator\n",
" conditions = pd.DataFrame() # Create a DataFrame to hold the results \n",
" for iv in state.variables.independent_variables: # Loop through the independent variables\n",
" c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range\n",
" conditions[iv.name] = c # - Save the new values to the DataFrame\n",
" delta = autora.state.Delta(conditions=conditions) # Construct a new Delta representing the updated data\n",
" new_state = state + delta # Construct a new state, \"adding\" the Delta\n",
" return new_state\n",
"\n",
"# Example\n",
"generate_conditions(s_0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Special case: `autora.state.estimator_on_state` for `scikit-learn` estimators\n",
"\n",
"The \"theorist\" component in an AutoRA cycle is often a `scikit-learn` compatible estimator which implements a curve \n",
"fitting function like a linear, logistic or symbolic regression. `scikit-learn` estimators are classes, and they have\n",
" a specific wrapper: `autora.state.estimator_on_state`, used as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Returned models: [LinearRegression()]\n",
"Last model's coefficients: y = [3.49729147] x + [1.99930059]\n"
]
}
],
"source": [
"from sklearn.linear_model import LinearRegression\n",
"\n",
"\n",
"estimator = LinearRegression(fit_intercept=True) # Initialize the regressor with all its parameters\n",
"theorist = autora.state.estimator_on_state(estimator) # Wrap the estimator\n",
"\n",
"\n",
"# Example\n",
"variables = s_0.variables # Reuse the variables from before \n",
"xs = np.linspace(-10, 10, 101) # Make an array of x-values \n",
"noise = np.random.default_rng(179).normal(0., 0.5, xs.shape) # Gaussian noise\n",
"ys = (3.5 * xs + 2. + noise) # Calculate y = 3.5 x + 2 + noise \n",
"\n",
"s_1 = autora.state.StandardState( # Initialize the State with those data\n",
" variables=variables,\n",
" experiment_data=pd.DataFrame({\"x\":xs, \"y\":ys}),\n",
")\n",
"s_1_prime = theorist(s_1) # Run the theorist\n",
"print(f\"Returned models: \"\n",
" f\"{s_1_prime.models}\") \n",
"print(f\"Last model's coefficients: \"\n",
" f\"y = {s_1_prime.models[-1].coef_[0]} x + {s_1_prime.models[-1].intercept_}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"During the `theorist(s_1)` call, `autora.state.estimator_on_state` does the following:\n",
"- Gets the names of the independent and dependent variables from the `s_1.variables`\n",
"- Gathers the values of those variables from `s_1.experiment_data`\n",
"- Passes those values to the `LinearRegression().fit(x, y)` method\n",
"- Constructs `Delta(models=[LinearRegression()])` with the fitted regressor\n",
"- Returns `s_1 + Delta(models=[LinearRegression()])`"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}