Introduction¶
AutoRA (Automated Research Assistant) is an open-source framework designed to automate various stages of empirical research, including model discovery, experimental design, and data collection.
This notebook is the fourth of four basic tutorials for autora. We suggest working through these notebooks in order, as each builds upon the last. However, each notebook is self-contained, so there is no need to run the previous notebook before starting the current one. The notebooks are listed below, and each one ends with a link to the next.
AutoRA Basic Tutorial I: Components
AutoRA Basic Tutorial II: Loop Constructs
AutoRA Basic Tutorial III: Functional Workflow
AutoRA Basic Tutorial IV: Customization
These notebooks provide a comprehensive introduction to the capabilities of autora. They demonstrate the fundamental components of autora and how they can be combined to facilitate automated (closed-loop) empirical research through synthetic experiments.
How to use this notebook: You can progress through the notebook section by section or jump directly to specific sections. If you choose the latter, it is recommended to execute all cells once at the start, so that you can later rerun the cells in any section without issues.
Tutorial Setup¶
Here we import some standard Python packages, set seeds for reproducibility, and define a plotting function.
#### Installation ####
!pip install -q "autora[theorist-bms]"
!pip install -q "autora[experiment-runner-synthetic-abstract-equation]"
#### Import modules ####
from typing import Optional
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sympy as sp
import torch
from autora.variable import Variable, ValueType, VariableCollection
from autora.state import StandardState, on_state, estimator_on_state
from autora.experimentalist.random import random_pool
from autora.theorist.bms import BMSRegressor
from autora.experiment_runner.synthetic.abstract.equation import equation_experiment
#### Set seeds ####
np.random.seed(42)
torch.manual_seed(42)
#### Define plot function ####
def plot_from_state(s: StandardState, expr: str):
    """
    Plots the data, the ground truth model, and the current predicted model
    """
    # Determine labels and variables
    print(s.models[-1])
    model_label = f"Model: {s.models[-1]}" if hasattr(s.models[-1], 'repr') else "Model"
    experiment_data = s.experiment_data.sort_values(by=["x"])
    ground_x = np.linspace(s.variables.independent_variables[0].value_range[0],
                           s.variables.independent_variables[0].value_range[1], 100)

    # Determine predicted ground truth
    equation = sp.simplify(expr)
    ground_predicted_y = [equation.evalf(subs={'x': x}) for x in ground_x]
    model_predicted_y = s.models[-1].predict(ground_x.reshape(-1, 1))

    # Plot the data and models
    f = plt.figure(figsize=(4, 3))
    plt.plot(experiment_data["x"], experiment_data["y"], 'o', label=None)
    plt.plot(ground_x, model_predicted_y, alpha=.8, label=model_label)
    plt.plot(ground_x, ground_predicted_y, alpha=.8, label=f'Ground Truth: {expr}')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.legend()
    plt.show()
Customizing Automated Empirical Research Components¶
autora is a flexible framework in which users can integrate their own experimentalists, experiment runners, and theorists into an automated empirical research workflow. This section illustrates the integration of custom autora components. For more information on how to contribute your own modules to the autora ecosystem, please refer to the Contributor Documentation.
To illustrate the use of custom experimentalists, experiment runners, and theorists, we consider a simple workflow:
- Generate 10 seed experimental conditions using random_pool
- Iterate through the following steps:
  - Identify new experimental conditions using an experimentalist
  - Collect observations using the experiment_runner
  - Identify a model relating conditions to observations using a theorist

Once this workflow is set up, we will replace each component with a custom function.
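Before filling in the concrete components, here is the loop in schematic form (illustrative pseudocode only; the cells below define and run the actual workflow):

#### Workflow sketch (illustrative pseudocode) ####
# s = StandardState(...)            # initialized with 10 seed conditions
# for cycle in range(n_cycles):
#     s = experimentalist(s, ...)   # propose new experimental conditions
#     s = experiment_runner(s, ...) # collect observations for those conditions
#     s = theorist(s)               # fit a model to all data collected so far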
#### Define metadata ####
iv = Variable(name="x", value_range=(0, 2 * np.pi), allowed_values=np.linspace(0, 2 * np.pi, 30))
dv = Variable(name="y", type=ValueType.REAL)
variables = VariableCollection(independent_variables=[iv],dependent_variables=[dv])
#### Define condition pool ####
conditions = random_pool(variables, num_samples=10, random_state=0)
#### Define state ####
s = StandardState(
variables = variables,
conditions = conditions,
experiment_data = pd.DataFrame(columns=["x","y"])
)
#### Define experimentalist and wrap with state functionality ####
experimentalist = on_state(random_pool, output=["conditions"])
#### Define experiment runner and wrap with state functionality ####
sin_experiment = equation_experiment(sp.simplify('sin(x)'), variables.independent_variables, variables.dependent_variables[0])
sin_runner = sin_experiment.run
experiment_runner = on_state(sin_runner, output=["experiment_data"])
#### Define theorist and wrap with state functionality ####
theorist = estimator_on_state(BMSRegressor(epochs=100))
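Note that estimator_on_state is not specific to the BMS regressor: any estimator that follows the scikit-learn fit/predict interface should work. As a purely illustrative sketch (not used in the remainder of this tutorial), a linear theorist could be wrapped in the same way:

#### Optional: wrap a different scikit-learn estimator (illustrative) ####
from sklearn.linear_model import LinearRegression
linear_theorist = estimator_on_state(LinearRegression())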
We should quickly test to make sure everything works as expected.
print('\033[1mPrevious State:\033[0m')
print(s)
for cycle in range(2):
    s = experimentalist(s, num_samples=10, random_state=42+cycle)
    s = experiment_runner(s, added_noise=0.5, random_state=42+cycle)
    s = theorist(s)
    plot_from_state(s, 'sin(x)')
print('\n\033[1mUpdated State:\033[0m')
print(s)
Previous State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0. , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
3.24992343, 3.466585 , 3.68324656, 3.89990812, 4.11656968,
4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x
0 5.416539
1 4.116570
2 3.249923
3 1.733292
4 1.949954
5 0.216662
6 0.433323
7 0.000000
8 1.083308
9 5.199877, experiment_data=Empty DataFrame
Columns: [x, y]
Index: [], models=[])
-0.21
sin(x)
Updated State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0. , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
3.24992343, 3.466585 , 3.68324656, 3.89990812, 4.11656968,
4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x
0 3.249923
1 4.116570
2 2.599939
3 0.216662
4 3.683247
5 0.000000
6 1.733292
7 5.416539
8 2.816600
9 3.683247, experiment_data= x y
0 0.433323 0.572248
1 4.983216 -1.483542
2 4.116570 -0.452463
3 2.816600 0.789584
4 2.599939 -0.459964
5 5.416539 -1.413252
6 0.433323 0.483809
7 4.333231 -1.087098
8 1.299969 0.955149
9 0.433323 -0.006633
10 3.249923 0.013996
11 4.116570 -0.488600
12 2.599939 0.222789
13 0.216662 -0.239366
14 3.683247 -1.511473
15 0.000000 0.485811
16 1.733292 0.995155
17 5.416539 -0.659296
18 2.816600 -0.072496
19 3.683247 0.097695, models=[sin(x), sin(x)])
Custom Experimentalists¶
Experimentalists must be implemented as functions. For instance, an experimentalist sampler function expects a pool of experimental conditions and returns a modified set of experimental conditions.
Requirements for working with the state:
- The function has a variables argument that accepts the VariableCollection type
- The function has a conditions argument that accepts a pandas.DataFrame
- The function returns a pandas.DataFrame
The custom uniform_sample below will select the conditions that are least represented in the data so far.
Note that when building custom experimentalists, we can either wrap the function with on_state(output=['conditions']), as we did in Tutorial III, or use the @on_state(output=['conditions']) decorator.
#==================================================================#
# Option 1 - Wrapping our Component #
#==================================================================#
def uniform_sample(variables: VariableCollection, conditions: pd.DataFrame, num_samples: int = 1, random_state: Optional[int] = None):
    """
    An experimentalist that selects the least represented datapoints
    """
    # Set rng seed
    rng = np.random.default_rng(random_state)
    # Retrieve the possible values
    allowed_values = variables.independent_variables[0].allowed_values
    # Determine the representation of each value
    conditions_count = np.array([conditions["x"].isin([value]).sum(axis=0) for value in allowed_values])
    # Sort to determine the least represented values
    conditions_sort = conditions_count.argsort()
    conditions_count = conditions_count[conditions_sort]
    values_count = allowed_values[conditions_sort]
    # Sample from the values with the smallest frequencies
    x = values_count[conditions_count <= conditions_count[num_samples-1]]
    x = rng.choice(x, num_samples)
    return pd.DataFrame({"x": x})

custom_experimentalist = on_state(uniform_sample, output=["conditions"])
#==================================================================#
# Option 2 - Using a Decorator #
#==================================================================#
@on_state(output=["conditions"])
def custom_experimentalist(variables: VariableCollection, conditions: pd.DataFrame, num_samples: int = 1, random_state: Optional [int] = None):
"""
An experimentalist that selects the least represented datapoints
"""
#Set rng seed
rng = np.random.default_rng(random_state)
#Retrieve the possible values
allowed_values = variables.independent_variables[0].allowed_values
#Determine the representation of each value
conditions_count = np.array([conditions["x"].isin([value]).sum(axis=0) for value in allowed_values])
#Sort to determine the least represented values
conditions_sort = conditions_count.argsort()
conditions_count = conditions_count[conditions_sort]
values_count = allowed_values[conditions_sort]
#Sample from values with the smallest frequency
x = values_count[conditions_count<=conditions_count[num_samples-1]]
x = rng.choice(x,num_samples)
return pd.DataFrame({"x": x})
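As a quick sanity check (not part of the original workflow), we can call the unwrapped uniform_sample from Option 1 directly on a toy conditions DataFrame. With x = 0 over-represented, the sampler should draw from the other allowed values:

#### Optional: stand-alone check of the sampler (illustrative) ####
toy_conditions = pd.DataFrame({"x": [0.0, 0.0, 0.0]})
print(uniform_sample(variables, toy_conditions, num_samples=3, random_state=0))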
Now, we will re-run our initial workflow while incorporating our custom experimentalist.
#### First, let's reinitialize the state object to get a clean state ####
iv = Variable(name="x", value_range=(0, 2 * np.pi), allowed_values=np.linspace(0, 2 * np.pi, 30))
dv = Variable(name="y", type=ValueType.REAL)
variables = VariableCollection(independent_variables=[iv],dependent_variables=[dv])
conditions = random_pool(variables, num_samples=10, random_state=0)
s = StandardState(variables = variables, conditions = conditions, experiment_data = pd.DataFrame(columns=["x","y"]))
#Report previous state
print('\033[1mPrevious State:\033[0m')
print(s)
#Cycle
for cycle in range(5):
    s = custom_experimentalist(s, num_samples=10, random_state=42+cycle)  # Our custom experimentalist
    s = experiment_runner(s, added_noise=0.5, random_state=42+cycle)
    s = theorist(s)
    plot_from_state(s, 'sin(x)')
#Report updated state
print('\n\033[1mUpdated State:\033[0m')
print(s)
Previous State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0. , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
3.24992343, 3.466585 , 3.68324656, 3.89990812, 4.11656968,
4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x
0 5.416539
1 4.116570
2 3.249923
3 1.733292
4 1.949954
5 0.216662
6 0.433323
7 0.000000
8 1.083308
9 5.199877, experiment_data=Empty DataFrame
Columns: [x, y]
Index: [], models=[])
-0.21
-0.09
sin(x)
0.07
sin(x)
Updated State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0. , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
3.24992343, 3.466585 , 3.68324656, 3.89990812, 4.11656968,
4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x
0 4.983216
1 5.849862
2 3.466585
3 4.116570
4 1.733292
5 3.249923
6 3.249923
7 5.199877
8 3.683247
9 4.333231, experiment_data= x y
0 5.849862 -0.267531
1 1.516631 0.478541
2 2.383277 1.062925
3 3.683247 -0.045271
4 3.683247 -1.491071
5 2.599939 -0.135536
6 5.849862 -0.355969
7 2.383277 0.529578
8 4.766554 -1.006934
9 5.849862 -0.846411
10 2.816600 0.441416
11 1.949954 1.268066
12 3.466585 -0.612066
13 5.633201 -1.059511
14 3.033262 -0.887800
15 0.000000 0.485811
16 4.333231 -0.920648
17 0.649985 0.708040
18 6.066524 -0.606768
19 2.166616 1.440938
20 1.299969 1.686531
21 3.683247 -0.464401
22 5.849862 -0.256513
23 4.983216 -0.395319
24 1.299969 1.375672
25 2.383277 0.977178
26 3.899908 -0.877285
27 4.549893 -1.496263
28 5.416539 -0.574078
29 4.766554 -1.254513
30 1.949954 0.700044
31 0.216662 -0.096459
32 0.866646 0.829807
33 0.649985 1.027126
34 0.649985 0.530925
35 6.283185 0.131242
36 2.166616 1.093523
37 1.516631 1.343437
38 0.649985 0.159225
39 3.033262 0.342529
40 4.983216 -1.215008
41 5.849862 0.189056
42 3.466585 -0.454861
43 4.116570 -0.462597
44 1.733292 0.402174
45 3.249923 -0.822852
46 3.249923 -0.119755
47 5.199877 -1.107408
48 3.683247 -0.471516
49 4.333231 -0.666329, models=[sin(x), sin(x), sin(x), sin(x), sin(x)])
Custom Experiment Runner¶
Experiment runners must be implemented as functions.
Requirements for working with the state:
- The function has a conditions argument that accepts a pandas.DataFrame
- The function returns a pandas.DataFrame
The custom quadratic_experiment below generates observations from a quadratic function of the conditions, y = x + x**2, plus Gaussian noise.
Note that when building custom experiment runners, we can either wrap the function with on_state(output=['experiment_data']), as we did in Tutorial III, or use the @on_state(output=['experiment_data']) decorator.
#==================================================================#
# Option 1 - Wrapping our Component #
#==================================================================#
def quadratic_experiment(conditions: pd.DataFrame, added_noise: float = 0.01, random_state: Optional[int] = None):
    # Set rng seed
    rng = np.random.default_rng(random_state)
    # Extract conditions
    x = conditions["x"]
    # Compute data
    y = (x + x**2) + rng.normal(0, added_noise, size=x.shape)
    # Assign to dataframe
    observations = conditions.assign(y=y)
    return observations

custom_experiment_runner = on_state(quadratic_experiment, output=["experiment_data"])
#==================================================================#
# Option 2 - Using a Decorator #
#==================================================================#
@on_state(output=["experiment_data"])
def quadratic_experiment(conditions: pd.DataFrame, added_noise: int = 0.01, random_state: Optional[int] = None):
#Set rng seed
rng = np.random.default_rng(random_state)
#Extract conditions
x = conditions["x"]
#Compute data
y = (x + x**2) + rng.normal(0, added_noise, size=x.shape)
#Assign to dataframe
observations = conditions.assign(y = y)
return observations
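An equivalent runner could also be built from the synthetic equation experiment used in the setup section, rather than hand-coding the observation function. The sketch below is illustrative (not part of the original tutorial) and reuses the variables object defined earlier:

#### Optional: the same ground truth via equation_experiment (illustrative) ####
quadratic_synthetic = equation_experiment(sp.simplify('x + x**2'), variables.independent_variables, variables.dependent_variables[0])
alternative_experiment_runner = on_state(quadratic_synthetic.run, output=["experiment_data"])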
Now, we will re-run our initial workflow while incorporating our custom experiment runner.
#### First, let's reinitialize the state object to get a clean state ####
iv = Variable(name="x", value_range=(0, 2 * np.pi), allowed_values=np.linspace(0, 2 * np.pi, 30))
dv = Variable(name="y", type=ValueType.REAL)
variables = VariableCollection(independent_variables=[iv],dependent_variables=[dv])
conditions = random_pool(variables, num_samples=10, random_state=0)
s = StandardState(variables = variables, conditions = conditions, experiment_data = pd.DataFrame(columns=["x","y"]))
#Report previous state
print('\033[1mPrevious State:\033[0m')
print(s)
#Cycle
for cycle in range(5):
    s = experimentalist(s, num_samples=10, random_state=42+cycle)
    s = custom_experiment_runner(s, added_noise=0.5, random_state=42+cycle)
    s = theorist(s)
    plot_from_state(s, 'x + x**2')
#Report updated state
print('\n\033[1mUpdated State:\033[0m')
print(s)
Previous State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0. , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
3.24992343, 3.466585 , 3.68324656, 3.89990812, 4.11656968,
4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x
0 5.416539
1 4.116570
2 3.249923
3 1.733292
4 1.949954
5 0.216662
6 0.433323
7 0.000000
8 1.083308
9 5.199877, experiment_data=Empty DataFrame
Columns: [x, y]
Index: [], models=[])
((x + 0.94) * x)
((x ** 1.76) * 1.76)
((x ** 1.76) * 1.76)
(x + (x * x))
(((x + 1.0) * 1.0) * x)
Updated State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0. , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
3.24992343, 3.466585 , 3.68324656, 3.89990812, 4.11656968,
4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x
0 3.249923
1 5.849862
2 1.299969
3 0.433323
4 3.466585
5 1.733292
6 1.516631
7 3.899908
8 0.866646
9 6.066524, experiment_data= x y
0 0.433323 0.773451
1 4.983216 29.295665
2 4.116570 21.437941
3 2.816600 11.220120
4 2.599939 8.384103
5 5.416539 34.104345
6 0.433323 0.685012
7 4.333231 22.952003
8 1.299969 2.981489
9 0.433323 0.194570
10 3.249923 13.934041
11 4.116570 21.401805
12 2.599939 9.066856
13 0.216662 -0.190733
14 3.683247 16.253633
15 0.000000 0.485811
16 1.733292 4.745924
17 5.416539 34.858300
18 2.816600 10.358040
19 3.683247 17.862801
20 4.333231 23.833106
21 0.649985 1.123617
22 5.199877 32.401980
23 1.516631 4.385032
24 4.116570 21.474838
25 2.599939 9.649098
26 0.433323 0.431507
27 6.283185 45.252166
28 3.466585 15.671881
29 0.866646 1.361742
30 5.849862 39.841817
31 3.683247 16.938122
32 4.549893 25.319062
33 3.249923 14.233877
34 3.249923 13.737676
35 4.766554 27.617837
36 4.549893 25.517251
37 5.199877 32.583507
38 3.466585 15.037847
39 3.249923 14.046336
40 3.249923 13.560468
41 5.849862 40.679695
42 1.299969 2.854330
43 0.433323 0.986184
44 3.466585 14.899144
45 1.733292 4.022862
46 1.516631 3.805164
47 3.899908 18.885295
48 0.866646 1.661760
49 6.066524 43.131882, models=[(((x + 1.0) * 1.0) * x), (((x + 1.0) * 1.0) * x), (((x + 1.0) * 1.0) * x), (((x + 1.0) * 1.0) * x), (((x + 1.0) * 1.0) * x)])
Custom Theorists¶
Theorists must be implemented as classes that inherit from sklearn.base.BaseEstimator. The class must implement the following methods:
- fit(self, conditions, observations)
- predict(self, conditions)
Requirements for working with the state:
- The fit method has a conditions argument that accepts a pandas.DataFrame
- The fit method has an observations argument that accepts a pandas.DataFrame
- The fit method returns self (i.e., the fitted model itself)
The custom PolynomialRegressor below fits a polynomial of a specified degree.
import numpy as np
from sklearn.base import BaseEstimator
class PolynomialRegressor(BaseEstimator):
    def __init__(self, degree: int = 3):
        self.degree = degree

    def fit(self, conditions: pd.DataFrame, observations: pd.DataFrame):
        c = np.array(conditions)
        o = np.array(observations)
        # polyfit expects 1D arrays
        if c.ndim > 1:
            c = c.flatten()
        if o.ndim > 1:
            o = o.flatten()
        # fit polynomial
        self.coeff = np.polyfit(c, o, self.degree)
        self.polynomial = np.poly1d(self.coeff)
        return self

    def predict(self, conditions: pd.DataFrame):
        c = np.array(conditions)
        return self.polynomial(c)
custom_theorist = estimator_on_state(PolynomialRegressor())
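Since degree is an ordinary constructor argument, the wrapped theorist can be configured when it is created. For instance (illustrative, not used below):

#### Optional: a theorist with a different polynomial degree (illustrative) ####
quintic_theorist = estimator_on_state(PolynomialRegressor(degree=5))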
Now, we will re-run our initial workflow while incorporating our custom theorist.
#### First, let's reinitialize the state object to get a clean state ####
iv = Variable(name="x", value_range=(0, 2 * np.pi), allowed_values=np.linspace(0, 2 * np.pi, 30))
dv = Variable(name="y", type=ValueType.REAL)
variables = VariableCollection(independent_variables=[iv],dependent_variables=[dv])
conditions = random_pool(variables, num_samples=10, random_state=0)
s = StandardState(variables = variables, conditions = conditions, experiment_data = pd.DataFrame(columns=["x","y"]))
#Report previous state
print('\033[1mPrevious State:\033[0m')
print(s)
#Cycle
for cycle in range(5):
    s = experimentalist(s, num_samples=10, random_state=42+cycle)
    s = experiment_runner(s, added_noise=0.5, random_state=42+cycle)
    s = custom_theorist(s)
    print(s.models[-1])
    plot_from_state(s, 'sin(x)')
#Report updated state
print('\n\033[1mUpdated State:\033[0m')
print(s)
Previous State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0. , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
3.24992343, 3.466585 , 3.68324656, 3.89990812, 4.11656968,
4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x
0 5.416539
1 4.116570
2 3.249923
3 1.733292
4 1.949954
5 0.216662
6 0.433323
7 0.000000
8 1.083308
9 5.199877, experiment_data=Empty DataFrame
Columns: [x, y]
Index: [], models=[])
PolynomialRegressor()
PolynomialRegressor()
PolynomialRegressor()
PolynomialRegressor()
PolynomialRegressor()
PolynomialRegressor()
PolynomialRegressor()
PolynomialRegressor()
PolynomialRegressor()
PolynomialRegressor()
Updated State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0. , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
3.24992343, 3.466585 , 3.68324656, 3.89990812, 4.11656968,
4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x
0 3.249923
1 5.849862
2 1.299969
3 0.433323
4 3.466585
5 1.733292
6 1.516631
7 3.899908
8 0.866646
9 6.066524, experiment_data= x y
0 0.433323 0.572248
1 4.983216 -1.483542
2 4.116570 -0.452463
3 2.816600 0.789584
4 2.599939 -0.459964
5 5.416539 -1.413252
6 0.433323 0.483809
7 4.333231 -1.087098
8 1.299969 0.955149
9 0.433323 -0.006633
10 3.249923 0.013996
11 4.116570 -0.488600
12 2.599939 0.222789
13 0.216662 -0.239366
14 3.683247 -1.511473
15 0.000000 0.485811
16 1.733292 0.995155
17 5.416539 -0.659296
18 2.816600 -0.072496
19 3.683247 0.097695
20 4.333231 -0.205995
21 0.649985 0.656327
22 5.199877 -0.720136
23 1.516631 1.566765
24 4.116570 -0.415567
25 2.599939 0.805032
26 0.433323 0.230304
27 6.283185 -0.509437
28 3.466585 -0.131217
29 0.866646 0.506182
30 5.849862 -0.648822
31 3.683247 -0.826983
32 4.549893 -0.919182
33 3.249923 0.313832
34 3.249923 -0.182368
35 4.766554 -0.867292
36 4.549893 -0.720993
37 5.199877 -0.538608
38 3.466585 -0.765251
39 3.249923 0.126291
40 3.249923 -0.359577
41 5.849862 0.189056
42 1.299969 0.827990
43 0.433323 0.784981
44 3.466585 -0.903954
45 1.733292 0.272093
46 1.516631 0.986897
47 3.899908 -0.911596
48 0.866646 0.806200
49 6.066524 0.047677, models=[PolynomialRegressor(), PolynomialRegressor(), PolynomialRegressor(), PolynomialRegressor(), PolynomialRegressor()])
Altogether Now¶
We have now created a custom experimentalist, experiment runner, and theorist. Let's combine them all in the same workflow to see our first fully customized autora workflow.
#### First, let's reinitialize the state object to get a clean state ####
iv = Variable(name="x", value_range=(0, 2 * np.pi), allowed_values=np.linspace(0, 2 * np.pi, 30))
dv = Variable(name="y", type=ValueType.REAL)
variables = VariableCollection(independent_variables=[iv],dependent_variables=[dv])
conditions = random_pool(variables, num_samples=10, random_state=0)
s = StandardState(variables = variables, conditions = conditions, experiment_data = pd.DataFrame(columns=["x","y"]))
#Report previous state
print('\033[1mPrevious State:\033[0m')
print(s)
#Cycle
for cycle in range(5):
    s = custom_experimentalist(s, num_samples=10, random_state=42+cycle)
    s = custom_experiment_runner(s, added_noise=0.5, random_state=42+cycle)
    s = custom_theorist(s)
    plot_from_state(s, 'x + x**2')
#Report updated state
print('\n\033[1mUpdated State:\033[0m')
print(s)
Previous State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0. , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
3.24992343, 3.466585 , 3.68324656, 3.89990812, 4.11656968,
4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x
0 5.416539
1 4.116570
2 3.249923
3 1.733292
4 1.949954
5 0.216662
6 0.433323
7 0.000000
8 1.083308
9 5.199877, experiment_data=Empty DataFrame
Columns: [x, y]
Index: [], models=[])
PolynomialRegressor()
PolynomialRegressor()
PolynomialRegressor()
PolynomialRegressor()
PolynomialRegressor()
Updated State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0. , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
3.24992343, 3.466585 , 3.68324656, 3.89990812, 4.11656968,
4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x
0 4.983216
1 5.849862
2 3.466585
3 4.116570
4 1.733292
5 3.249923
6 3.249923
7 5.199877
8 3.683247
9 4.333231, experiment_data= x y
0 5.849862 40.223108
1 1.516631 3.296808
2 2.383277 8.438513
3 3.683247 17.719834
4 3.683247 16.274034
5 2.599939 8.708530
6 5.849862 40.134670
7 2.383277 7.905166
8 4.766554 27.478194
9 5.849862 39.644228
10 2.816600 10.871952
11 1.949954 6.091364
12 3.466585 15.191032
13 5.633201 36.911813
14 3.033262 11.238020
15 0.000000 0.485811
16 4.333231 23.118453
17 0.649985 1.175330
18 6.066524 42.477437
19 2.166616 7.474088
20 1.299969 3.712871
21 3.683247 17.300704
22 5.849862 40.234126
23 4.983216 30.383888
24 1.299969 3.402012
25 2.383277 8.352766
26 3.899908 18.919606
27 4.549893 24.741981
28 5.416539 34.943519
29 4.766554 27.230615
30 1.949954 5.523342
31 0.216662 -0.047826
32 0.866646 1.685367
33 0.649985 1.494416
34 0.649985 0.998215
35 6.283185 45.892845
36 2.166616 7.126673
37 1.516631 4.161704
38 0.649985 0.626515
39 3.033262 12.468350
40 4.983216 29.564199
41 5.849862 40.679695
42 3.466585 15.348237
43 4.116570 21.427807
44 1.733292 4.152943
45 3.249923 13.097192
46 3.249923 13.800289
47 5.199877 32.014707
48 3.683247 17.293589
49 4.333231 23.372772, models=[PolynomialRegressor(), PolynomialRegressor(), PolynomialRegressor(), PolynomialRegressor(), PolynomialRegressor()])
Help¶
We hope this tutorial helped demonstrate the fundamental components of autora and how they can be combined to facilitate automated (closed-loop) empirical research through synthetic experiments. We encourage you to explore the other tutorials and check out the documentation.
If you encounter any issues, bugs, or questions, please reach out to us through the AutoRA Forum. Feel free to report any bugs by creating an issue in the AutoRA repository.
You may also post questions directly into the User Q&A Section.