{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The `State` mechanism" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A `State` is an object representing data from an experiment, like the conditions, observed experiment data and models. \n", "In the AutoRA framework, experimentalists, experiment runners and theorists are functions which \n", "- operate on `States` and \n", "- return `States`.\n", "\n", "The `autora.state` submodule provides classes and functions to help build these functions. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Core Principle: every procedure accepts a `State` and returns a `State`\n", "\n", "The AutoRA `State` mechanism is an implementation of the functional programming paradigm. It distinguishes between:\n", "- Data – stored as an immutable `State`\n", "- Procedures – functions which act on `State` objects to add new data and return a new `State`.\n", "\n", "Procedures generate data. Some common procedures which appear in AutoRA experiments, and the data they produce are:\n", "\n", "| Procedure | Data |\n", "|-------------------|-----------------|\n", "| Experimentalist | Conditions |\n", "| Experiment Runner | Experiment Data |\n", "| Theorist | Model |\n", "\n", "The data produced by each procedure $f$ can be seen as additions to the existing data. 
Each procedure $f$:\n", "- Takes in existing data in a `State` $S$\n", "- Adds new data $\\Delta S$\n", "- Returns an updated `State` $S^\\prime$ \n", "\n", "$$\n", "\\begin{aligned}\n", "f(S) &= S + \\Delta S \\\\\n", " &= S^\\prime\n", "\\end{aligned}\n", "$$\n", "\n", "AutoRA includes:\n", "- Classes to represent the data $S$ – the `State` object (and the derived `StandardState` – a pre-defined version \n", "with the common fields needed for cyclical experiments) \n", "- Functions to make it easier to write procedures of the form $f(S) = S^\\prime$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dataclasses import dataclass, field\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import autora.state\n", "from autora.variable import VariableCollection, Variable" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `State` objects\n", "\n", "`State` objects contain metadata describing an experiment, and the data gathered during an experiment. Any `State` \n", "object used in an AutoRA cycle will be an instance of a subclass of `autora.state.State`, with the necessary fields specified. \n", "(The `autora.state.StandardState` provides some sensible defaults.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@dataclass(frozen=True)\n", "class BasicState(autora.state.State):\n", "    data: pd.DataFrame = field(default_factory=pd.DataFrame, metadata={\"delta\": \"extend\"})\n", "\n", "s = BasicState()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because it is a Python dataclass, the `State` fields can be accessed using attribute notation, for example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: []\n", "Index: []" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.data # an empty DataFrame with a column \"x\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`State` objects can be updated by adding `Delta` objects. A `Delta` represents new data, and is combined with the \n", "existing data in the `State` object. The `State` itself is immutable by design, so adding a `Delta` to it creates a new \n", "`State`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BasicState(data= x y\n", "0 1 1)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s + autora.state.Delta(data=pd.DataFrame({\"x\":[1], \"y\":[1]}))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When carrying out this \"addition\", `s`: \n", "- inspects the `Delta` it has been passed and finds any field names matching fields on `s`, in this case \n", "`data`.\n", "- For each matching field it combines the data in a way determined by the field's metadata. The key options are:\n", " - \"replace\" means that the data in the `Delta` object completely replace the data in the `State`,\n", " - \"extend\" means that the data in the `Delta` object are combined – for pandas DataFrames this means that the new\n", " data are concatenated to the bottom of the existing DataFrame.\n", " \n", " For full details on which options are available, see the documentation for the `autora.state` module. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
xy
011
122
\n", "
" ], "text/plain": [ " x y\n", "0 1 1\n", "1 2 2" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(s + \n", " autora.state.Delta(data=pd.DataFrame({\"x\":[1], \"y\":[1]})) + \n", " autora.state.Delta(data=pd.DataFrame({\"x\":[2], \"y\":[2]}))\n", " ).data # Access just the experiment_data on the updated State" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `StandardState`\n", "\n", "For typical AutoRA experiments, you can use the `autora.state.StandardState` object, which has fields for variables, \n", "conditions, experiment data and models. You can initialize a `StandardState` object like this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s_0 = autora.state.StandardState(\n", " variables=VariableCollection(\n", " independent_variables=[Variable(\"x\", value_range=(-10, 10))],\n", " dependent_variables=[Variable(\"y\")]\n", " ),\n", " conditions=pd.DataFrame({\"x\":[]}),\n", " experiment_data=pd.DataFrame({\"x\":[], \"y\":[]}),\n", " models=[]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Making a function of the correct form\n", "\n", "There are several equivalent ways to make a function of the form $f(S) = S^\\prime$. These are (from \n", "simplest but most restrictive, to most complex but with the greatest flexibility):\n", "- Use the `autora.state.on_state` decorator\n", "- Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`\n", "\n", "There are also special cases, like the `autora.state.estimator_on_state` wrapper for `scikit-learn` estimators. \n", "\n", "Say you have a function to generate new experimental conditions, given some variables. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def generate_conditions(variables, num_samples=5, random_state=42):\n", " rng = np.random.default_rng(random_state) # Initialize a random number generator\n", " conditions = pd.DataFrame() # Create a DataFrame to hold the results \n", " for iv in variables.independent_variables: # Loop through the independent variables\n", " c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range\n", " conditions[iv.name] = c # - Save the new values to the DataFrame\n", " return conditions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll look at each of the ways you can make this into a function of the required form. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use the `autora.state.on_state` decorator\n", "\n", "`autora.state.on_state` is a wrapper for functions which allows them to accept `State` objects as the first argument.\n", "\n", "The most concise way to use it is as a decorator on the function where it is defined. You can specify how the \n", "returned values should be mapped to fields on the `State` using the `@autora.state.on_state(output=...)` argument." 
 ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n", "0 5.479121\n", "1 -1.222431\n", "2 7.171958\n", "3 3.947361\n", "4 -8.116453, experiment_data=Empty DataFrame\n", "Columns: [x, y]\n", "Index: [], models=[])" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@autora.state.on_state(output=[\"conditions\"])\n", "def generate_conditions(variables, num_samples=5, random_state=42):\n", "    rng = np.random.default_rng(random_state)  # Initialize a random number generator\n", "    conditions = pd.DataFrame()  # Create a DataFrame to hold the results\n", "    for iv in variables.independent_variables:  # Loop through the independent variables\n", "        c = rng.uniform(*iv.value_range, size=num_samples)  # - Generate a uniform sample from the range\n", "        conditions[iv.name] = c  # - Save the new values to the DataFrame\n", "    return conditions\n", "\n", "# Example\n", "generate_conditions(s_0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Equivalently, you can modify `generate_conditions` to return a `Delta` of values with the appropriate field \n", "names from `State`: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', 
rescale=1, is_covariate=False)], covariates=[]), conditions= x\n", "0 5.479121\n", "1 -1.222431\n", "2 7.171958\n", "3 3.947361\n", "4 -8.116453, experiment_data=Empty DataFrame\n", "Columns: [x, y]\n", "Index: [], models=[])" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@autora.state.on_state\n", "def generate_conditions(variables, num_samples=5, random_state=42):\n", "    rng = np.random.default_rng(random_state)  # Initialize a random number generator\n", "    conditions = pd.DataFrame()  # Create a DataFrame to hold the results\n", "    for iv in variables.independent_variables:  # Loop through the independent variables\n", "        c = rng.uniform(*iv.value_range, size=num_samples)  # - Generate a uniform sample from the range\n", "        conditions[iv.name] = c  # - Save the new values to the DataFrame\n", "    return autora.state.Delta(conditions=conditions)  # Return a Delta with the appropriate names\n", "    # return {\"conditions\": conditions}  # Returning a dictionary is equivalent\n", "\n", "# Example\n", "generate_conditions(s_0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Deep dive: `autora.state.on_state`\n", "The decorator notation is equivalent to the following:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n", "0 1.521127\n", "1 3.362120\n", "2 1.065391\n", "3 -5.844244\n", "4 -6.444732, experiment_data=Empty DataFrame\n", "Columns: [x, y]\n", "Index: [], models=[])" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def 
generate_conditions_inner(variables, num_samples=5, random_state=42):\n", "    rng = np.random.default_rng(random_state)  # Initialize a random number generator\n", "    result = pd.DataFrame()  # Create a DataFrame to hold the results\n", "    for iv in variables.independent_variables:  # Loop through the independent variables\n", "        c = rng.uniform(*iv.value_range, size=num_samples)  # - Generate a uniform sample from the range\n", "        result[iv.name] = c  # - Save the new values to the DataFrame\n", "    return result\n", "\n", "generate_conditions = autora.state.on_state(generate_conditions_inner, output=[\"conditions\"])\n", "\n", "# Example\n", "generate_conditions(s_0, random_state=180)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "During the `generate_conditions(s_0, random_state=180)` call, `autora.state.on_state` does the following:\n", "- Inspects the signature of `generate_conditions_inner` to see which arguments it accepts – in this case:\n", "  - `variables`, \n", "  - `num_samples` and \n", "  - `random_state`.\n", "- Looks for fields with those names on `s_0`:\n", "  - Finds a field called `variables`.\n", "- Calls `generate_conditions_inner` with those fields as arguments, plus any arguments specified in the \n", "`generate_conditions` call (here just `random_state`)\n", "- Converts the returned value `result` into `Delta(conditions=result)` using the name specified in `output=[\"conditions\"]`\n", "- Returns `s_0 + Delta(conditions=result)`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an equivalent alternative to the `autora.state.on_state` wrapper, you can construct a function which takes and returns \n", "`State` objects. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n", "0 5.479121\n", "1 -1.222431\n", "2 7.171958\n", "3 3.947361\n", "4 -8.116453, experiment_data=Empty DataFrame\n", "Columns: [x, y]\n", "Index: [], models=[])" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def generate_conditions(state: autora.state.StandardState, num_samples=5, random_state=42):\n", " rng = np.random.default_rng(random_state) # Initialize a random number generator\n", " conditions = pd.DataFrame() # Create a DataFrame to hold the results \n", " for iv in state.variables.independent_variables: # Loop through the independent variables\n", " c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range\n", " conditions[iv.name] = c # - Save the new values to the DataFrame\n", " delta = autora.state.Delta(conditions=conditions) # Construct a new Delta representing the updated data\n", " new_state = state + delta # Construct a new state, \"adding\" the Delta\n", " return new_state\n", "\n", "# Example\n", "generate_conditions(s_0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Special case: `autora.state.estimator_on_state` for `scikit-learn` estimators\n", "\n", "The \"theorist\" component in an AutoRA cycle is often a `scikit-learn` compatible estimator which implements a curve \n", "fitting function like a linear, logistic or symbolic regression. 
`scikit-learn` estimators are classes, and they have\n", " a specific wrapper: `autora.state.estimator_on_state`, used as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Returned models: [LinearRegression()]\n", "Last model's coefficients: y = [3.49729147] x + [1.99930059]\n" ] } ], "source": [ "from sklearn.linear_model import LinearRegression\n", "\n", "\n", "estimator = LinearRegression(fit_intercept=True) # Initialize the regressor with all its parameters\n", "theorist = autora.state.estimator_on_state(estimator) # Wrap the estimator\n", "\n", "\n", "# Example\n", "variables = s_0.variables # Reuse the variables from before \n", "xs = np.linspace(-10, 10, 101) # Make an array of x-values \n", "noise = np.random.default_rng(179).normal(0., 0.5, xs.shape) # Gaussian noise\n", "ys = (3.5 * xs + 2. + noise) # Calculate y = 3.5 x + 2 + noise \n", "\n", "s_1 = autora.state.StandardState( # Initialize the State with those data\n", " variables=variables,\n", " experiment_data=pd.DataFrame({\"x\":xs, \"y\":ys}),\n", ")\n", "s_1_prime = theorist(s_1) # Run the theorist\n", "print(f\"Returned models: \"\n", " f\"{s_1_prime.models}\") \n", "print(f\"Last model's coefficients: \"\n", " f\"y = {s_1_prime.models[-1].coef_[0]} x + {s_1_prime.models[-1].intercept_}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "During the `theorist(s_1)` call, `autora.state.estimator_on_state` does the following:\n", "- Gets the names of the independent and dependent variables from the `s_1.variables`\n", "- Gathers the values of those variables from `s_1.experiment_data`\n", "- Passes those values to the `LinearRegression().fit(x, y)` method\n", "- Constructs `Delta(models=[LinearRegression()])` with the fitted regressor\n", "- Returns `s_1 + Delta(models=[LinearRegression()])`" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", 
"name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 1 }