{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The `State` mechanism"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A `State` is an object representing data from an experiment, like the conditions, observed experiment data and models. \n",
    "In the AutoRA framework, experimentalists, experiment runners and theorists are functions which \n",
    "- operate on `States` and \n",
    "- return `States`.\n",
    "\n",
    "The `autora.state` submodule provides classes and functions to help build these functions. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Core Principle: every procedure accepts a `State` and returns a `State`\n",
    "\n",
    "The AutoRA `State` mechanism is an implementation of the functional programming paradigm. It distinguishes between:\n",
    "- Data – stored as an immutable `State`\n",
    "- Procedures – functions which act on `State` objects to add new data and return a new `State`.\n",
    "\n",
    "Procedures generate data. Some common procedures in AutoRA experiments, and the data they produce, are:\n",
    "\n",
    "| Procedure         | Data            |\n",
    "|-------------------|-----------------|\n",
    "| Experimentalist   | Conditions      |\n",
    "| Experiment Runner | Experiment Data |\n",
    "| Theorist          | Model           |\n",
    "\n",
    "The data produced by each procedure $f$ can be seen as additions to the existing data. Each procedure $f$:\n",
    "- Takes in existing data in a `State` $S$\n",
    "- Adds new data $\\Delta S$\n",
    "- Returns an updated `State` $S^\\prime$  \n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "f(S) &= S + \\Delta S \\\\\n",
    "     &= S^\\prime\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "AutoRA includes:\n",
    "- Classes to represent the Data $S$ – the `State` object (and the derived `StandardState` – a pre-defined version \n",
    "with the common fields needed for cyclical experiments)  \n",
    "- Functions to make it easier to write procedures of the form $f(S) = S^\\prime$"
   ]
  },
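  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a worked instance of the formula above, consider one pass through the three procedures in the table. This is only a sketch: it assumes each procedure runs once, in this order, and the labels $f_e$ (experimentalist), $f_r$ (experiment runner) and $f_t$ (theorist) are illustrative names introduced here. Starting from an initial `State` $S_0$:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "S_1 &= f_e(S_0) = S_0 + \\Delta S_{\\text{conditions}} \\\\\n",
    "S_2 &= f_r(S_1) = S_1 + \\Delta S_{\\text{experiment data}} \\\\\n",
    "S_3 &= f_t(S_2) = S_2 + \\Delta S_{\\text{model}}\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "Because every procedure both accepts and returns a `State`, the steps compose directly: $S_3 = f_t(f_r(f_e(S_0)))$."
   ]
  },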
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dataclasses import dataclass, field\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import autora.state\n",
    "from autora.variable import VariableCollection, Variable"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `State` objects\n",
    "\n",
    "`State` objects contain metadata describing an experiment, and the data gathered during an experiment. Any `State` \n",
    "object used in an AutoRA cycle is an instance of a subclass of `autora.state.State`, with the necessary fields specified. \n",
    "(`autora.state.StandardState` provides some sensible defaults.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@dataclass(frozen=True)\n",
    "class BasicState(autora.state.State):\n",
    "    data: pd.DataFrame = field(default_factory=pd.DataFrame, metadata={\"delta\": \"extend\"})\n",
    "\n",
    "s = BasicState()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because a `State` is a Python dataclass, its fields can be accessed using attribute notation, for example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: []\n",
       "Index: []"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.data  # an empty DataFrame with no columns yet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`State` objects can be updated by adding `Delta` objects. A `Delta` represents new data, and is combined with the \n",
    "existing data in the `State` object. The `State` itself is immutable by design, so adding a `Delta` to it creates a new \n",
    "`State`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "BasicState(data=   x  y\n",
       "0  1  1)"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s + autora.state.Delta(data=pd.DataFrame({\"x\":[1], \"y\":[1]}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When carrying out this \"addition\", `s`: \n",
    "- Inspects the `Delta` it has been passed and finds any field names matching fields on `s`, in this case \n",
    "`data`.\n",
    "- For each matching field, combines the data in a way determined by the field's metadata (here `{\"delta\": \"extend\"}`). The key options are:\n",
    "    - \"replace\" means that the data in the `Delta` object completely replace the data in the `State`,\n",
    "    - \"extend\" means that the data in the `Delta` object are combined – for pandas DataFrames this means that the new\n",
    "     data are concatenated to the bottom of the existing DataFrame.\n",
    "    \n",
    "    For full details on which options are available, see the documentation for the `autora.state` module. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>x</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   x  y\n",
       "0  1  1\n",
       "1  2  2"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(s + \n",
    " autora.state.Delta(data=pd.DataFrame({\"x\":[1], \"y\":[1]})) + \n",
    " autora.state.Delta(data=pd.DataFrame({\"x\":[2], \"y\":[2]}))\n",
    " ).data  # Access just the `data` field on the updated State"
   ]
  },
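  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the \"replace\" and \"extend\" rules above concrete, the following cell is a deliberately simplified, stand-alone sketch of how such an addition could work. It is not AutoRA's implementation: `ToyState` is a hypothetical class written only for this illustration, a plain dictionary stands in for the `Delta`, and lists stand in for DataFrames. It uses only the Python standard library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dataclasses import dataclass, field, fields, replace\n",
    "\n",
    "\n",
    "@dataclass(frozen=True)\n",
    "class ToyState:\n",
    "    \"\"\"Hypothetical stand-in for a State; NOT the AutoRA class.\"\"\"\n",
    "    data: list = field(default_factory=list, metadata={\"delta\": \"extend\"})\n",
    "    note: str = field(default=\"\", metadata={\"delta\": \"replace\"})\n",
    "\n",
    "    def __add__(self, delta: dict) -> \"ToyState\":\n",
    "        updates = {}\n",
    "        for f in fields(self):\n",
    "            if f.name not in delta:              # Skip fields the delta does not mention\n",
    "                continue\n",
    "            if f.metadata[\"delta\"] == \"extend\":  # \"extend\": append new values to the old ones\n",
    "                updates[f.name] = getattr(self, f.name) + delta[f.name]\n",
    "            else:                                # \"replace\": the new value wins outright\n",
    "                updates[f.name] = delta[f.name]\n",
    "        return replace(self, **updates)          # Build a new object; the original is untouched\n",
    "\n",
    "\n",
    "t_0 = ToyState()\n",
    "t_1 = t_0 + {\"data\": [1], \"note\": \"first\"}\n",
    "t_2 = t_1 + {\"data\": [2], \"note\": \"second\"}\n",
    "print(t_0)  # ToyState(data=[], note='')\n",
    "print(t_2)  # ToyState(data=[1, 2], note='second')"
   ]
  },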
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `StandardState`\n",
    "\n",
    "For typical AutoRA experiments, you can use the `autora.state.StandardState` object, which has fields for variables, \n",
    "conditions, experiment data and models. You can initialize a `StandardState` object like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_0 = autora.state.StandardState(\n",
    "    variables=VariableCollection(\n",
    "        independent_variables=[Variable(\"x\", value_range=(-10, 10))],\n",
    "        dependent_variables=[Variable(\"y\")]\n",
    "    ),\n",
    "    conditions=pd.DataFrame({\"x\":[]}),\n",
    "    experiment_data=pd.DataFrame({\"x\":[], \"y\":[]}),\n",
    "    models=[]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Making a function of the correct form\n",
    "\n",
    "There are several equivalent ways to make a function of the form $f(S) = S^\\prime$. These are (from \n",
    "simplest but most restrictive, to most complex but with the greatest flexibility):\n",
    "- Use the `autora.state.on_state` decorator\n",
    "- Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`\n",
    "\n",
    "There are also special cases, like the `autora.state.estimator_on_state` wrapper for `scikit-learn` estimators.  \n",
    "\n",
    "Say you have a function to generate new experimental conditions, given some variables. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_conditions(variables, num_samples=5, random_state=42):\n",
    "    rng = np.random.default_rng(random_state)               # Initialize a random number generator\n",
    "    conditions = pd.DataFrame()                             # Create a DataFrame to hold the results  \n",
    "    for iv in variables.independent_variables:              # Loop through the independent variables\n",
    "        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range\n",
    "        conditions[iv.name] = c                             #  - Save the new values to the DataFrame\n",
    "    return conditions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll look at each of the ways you can make this into a function of the required form. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Use the `autora.state.on_state` decorator\n",
    "\n",
    "`autora.state.on_state` is a wrapper for functions which allows them to accept `State` objects as the first argument.\n",
    "\n",
    "The most concise way to use it is as a decorator on the function where it is defined. You can specify how the \n",
    "returned values should be mapped to fields on the `State` using the `@autora.state.on_state(output=...)` argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x\n",
       "0  5.479121\n",
       "1 -1.222431\n",
       "2  7.171958\n",
       "3  3.947361\n",
       "4 -8.116453, experiment_data=Empty DataFrame\n",
       "Columns: [x, y]\n",
       "Index: [], models=[])"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@autora.state.on_state(output=[\"conditions\"])\n",
    "def generate_conditions(variables, num_samples=5, random_state=42):\n",
    "    rng = np.random.default_rng(random_state)               # Initialize a random number generator\n",
    "    conditions = pd.DataFrame()                             # Create a DataFrame to hold the results  \n",
    "    for iv in variables.independent_variables:              # Loop through the independent variables\n",
    "        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range\n",
    "        conditions[iv.name] = c                             #  - Save the new values to the DataFrame\n",
    "    return conditions\n",
    "\n",
    "# Example\n",
    "generate_conditions(s_0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fully equivalently, you can modify `generate_conditions` to return a `Delta` of values whose names match the \n",
    "appropriate fields on the `State`. In that case, no `output` argument is needed: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x\n",
       "0  5.479121\n",
       "1 -1.222431\n",
       "2  7.171958\n",
       "3  3.947361\n",
       "4 -8.116453, experiment_data=Empty DataFrame\n",
       "Columns: [x, y]\n",
       "Index: [], models=[])"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@autora.state.on_state\n",
    "def generate_conditions(variables, num_samples=5, random_state=42):\n",
    "    rng = np.random.default_rng(random_state)               # Initialize a random number generator\n",
    "    conditions = pd.DataFrame()                             # Create a DataFrame to hold the results  \n",
    "    for iv in variables.independent_variables:              # Loop through the independent variables\n",
    "        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range\n",
    "        conditions[iv.name] = c                             #  - Save the new values to the DataFrame\n",
    "    return autora.state.Delta(conditions=conditions)        # Return a Delta with the appropriate names\n",
    "    # return {\"conditions\": conditions}                     # Returning a dictionary is equivalent\n",
    "\n",
    "# Example\n",
    "generate_conditions(s_0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Deep dive: `autora.state.on_state`\n",
    "The decorator notation is equivalent to the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x\n",
       "0  1.521127\n",
       "1  3.362120\n",
       "2  1.065391\n",
       "3 -5.844244\n",
       "4 -6.444732, experiment_data=Empty DataFrame\n",
       "Columns: [x, y]\n",
       "Index: [], models=[])"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def generate_conditions_inner(variables, num_samples=5, random_state=42):\n",
    "    rng = np.random.default_rng(random_state)               # Initialize a random number generator\n",
    "    result = pd.DataFrame()                                 # Create a DataFrame to hold the results  \n",
    "    for iv in variables.independent_variables:              # Loop through the independent variables\n",
    "        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range\n",
    "        result[iv.name] = c                                 #  - Save the new values to the DataFrame\n",
    "    return result\n",
    "\n",
    "generate_conditions = autora.state.on_state(generate_conditions_inner, output=[\"conditions\"])\n",
    "\n",
    "# Example\n",
    "generate_conditions(s_0, random_state=180)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "During the `generate_conditions(s_0, random_state=180)` call, `autora.state.on_state` does the following:\n",
    "- Inspects the signature of `generate_conditions_inner` to see which arguments it accepts – in this case:\n",
    "    - `variables`, \n",
    "    - `num_samples` and \n",
    "    - `random_state`.\n",
    "- Looks for fields with those names on `s_0`:\n",
    "    - Finds a field called `variables`.\n",
    "- Calls `generate_conditions_inner` with those fields as arguments, plus any arguments specified in the \n",
    "`generate_conditions` call (here just `random_state`)\n",
    "- Converts the returned value `result` into `Delta(conditions=result)` using the name specified in `output=[\"conditions\"]`\n",
    "- Returns `s_0 + Delta(conditions=result)`"
   ]
  },
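  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The steps listed above can be imitated in a few lines of standard-library Python. The cell below is a rough sketch under stated assumptions, not the real `autora.state.on_state`: `toy_on_state`, `ToyCycleState` and `double_each` are hypothetical names made up for this illustration, a dictionary stands in for the `Delta`, and every field is simply replaced rather than combined according to its metadata."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import inspect\n",
    "from dataclasses import dataclass, field, fields, replace\n",
    "\n",
    "\n",
    "@dataclass(frozen=True)\n",
    "class ToyCycleState:\n",
    "    \"\"\"Hypothetical stand-in for a State; NOT the AutoRA class.\"\"\"\n",
    "    variables: tuple = ()\n",
    "    conditions: list = field(default_factory=list)\n",
    "\n",
    "\n",
    "def toy_on_state(function, output):\n",
    "    \"\"\"Simplified imitation of the steps described above (an assumption-laden sketch).\"\"\"\n",
    "    parameter_names = inspect.signature(function).parameters  # Inspect the signature\n",
    "\n",
    "    def wrapped(state, **kwargs):\n",
    "        from_state = {                                 # Look for fields with matching names\n",
    "            f.name: getattr(state, f.name)\n",
    "            for f in fields(state)\n",
    "            if f.name in parameter_names\n",
    "        }\n",
    "        result = function(**{**from_state, **kwargs})  # Call with those fields plus extra arguments\n",
    "        delta = {output[0]: result}                    # Name the returned value\n",
    "        return replace(state, **delta)                 # Return an updated copy of the state\n",
    "\n",
    "    return wrapped\n",
    "\n",
    "\n",
    "def double_each(variables, offset=0):\n",
    "    return [2 * v + offset for v in variables]\n",
    "\n",
    "\n",
    "double_each_on_state = toy_on_state(double_each, output=[\"conditions\"])\n",
    "s_toy = ToyCycleState(variables=(1, 2, 3))\n",
    "print(double_each_on_state(s_toy, offset=1))  # ToyCycleState(variables=(1, 2, 3), conditions=[3, 5, 7])"
   ]
  },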
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fully equivalently to using the `autora.state.on_state` wrapper, you can construct a function which takes and returns \n",
    "`State` objects. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x\n",
       "0  5.479121\n",
       "1 -1.222431\n",
       "2  7.171958\n",
       "3  3.947361\n",
       "4 -8.116453, experiment_data=Empty DataFrame\n",
       "Columns: [x, y]\n",
       "Index: [], models=[])"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def generate_conditions(state: autora.state.StandardState, num_samples=5, random_state=42):\n",
    "    rng = np.random.default_rng(random_state)               # Initialize a random number generator\n",
    "    conditions = pd.DataFrame()                             # Create a DataFrame to hold the results  \n",
    "    for iv in state.variables.independent_variables:        # Loop through the independent variables\n",
    "        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range\n",
    "        conditions[iv.name] = c                             #  - Save the new values to the DataFrame\n",
    "    delta = autora.state.Delta(conditions=conditions)       # Construct a new Delta representing the updated data\n",
    "    new_state = state + delta                               # Construct a new state, \"adding\" the Delta\n",
    "    return new_state\n",
    "\n",
    "# Example\n",
    "generate_conditions(s_0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Special case: `autora.state.estimator_on_state` for `scikit-learn` estimators\n",
    "\n",
    "The \"theorist\" component in an AutoRA cycle is often a `scikit-learn` compatible estimator which implements a curve \n",
    "fitting function like a linear, logistic or symbolic regression. Because `scikit-learn` estimators are classes rather \n",
    "than plain functions, they have their own wrapper, `autora.state.estimator_on_state`, used as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Returned models: [LinearRegression()]\n",
      "Last model's coefficients: y = [3.49729147] x + [1.99930059]\n"
     ]
    }
   ],
   "source": [
    "from sklearn.linear_model import LinearRegression\n",
    "\n",
    "\n",
    "estimator = LinearRegression(fit_intercept=True)       # Initialize the regressor with all its parameters\n",
    "theorist = autora.state.estimator_on_state(estimator)  # Wrap the estimator\n",
    "\n",
    "\n",
    "# Example\n",
    "variables = s_0.variables          # Reuse the variables from before \n",
    "xs = np.linspace(-10, 10, 101)     # Make an array of x-values \n",
    "noise = np.random.default_rng(179).normal(0., 0.5, xs.shape)  # Gaussian noise\n",
    "ys = (3.5 * xs + 2. + noise)       # Calculate y = 3.5 x + 2 + noise  \n",
    "\n",
    "s_1 = autora.state.StandardState(  # Initialize the State with those data\n",
    "    variables=variables,\n",
    "    experiment_data=pd.DataFrame({\"x\":xs, \"y\":ys}),\n",
    ")\n",
    "s_1_prime = theorist(s_1)         # Run the theorist\n",
    "print(f\"Returned models: \"\n",
    "      f\"{s_1_prime.models}\")      \n",
    "print(f\"Last model's coefficients: \"\n",
    "      f\"y = {s_1_prime.models[-1].coef_[0]} x + {s_1_prime.models[-1].intercept_}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "During the `theorist(s_1)` call, `autora.state.estimator_on_state` does the following:\n",
    "- Gets the names of the independent and dependent variables from `s_1.variables`\n",
    "- Gathers the values of those variables from `s_1.experiment_data`\n",
    "- Passes those values to the `LinearRegression().fit(x, y)` method\n",
    "- Constructs `Delta(models=[LinearRegression()])` with the fitted regressor\n",
    "- Returns `s_1 + Delta(models=[LinearRegression()])`"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}