Tutorial IV Customization¶

Introduction¶

AutoRA (Automated Research Assistant) is an open-source framework designed to automate various stages of empirical research, including model discovery, experimental design, and data collection.

This notebook is the fourth of four in the basic autora tutorials. We suggest working through the notebooks in order, as each builds upon the last. However, each notebook is self-contained, so you do not need to run the previous notebook before this one.

These notebooks provide a comprehensive introduction to the capabilities of autora. They demonstrate the fundamental components of autora and how they can be combined to facilitate automated (closed-loop) empirical research through synthetic experiments.

How to use this notebook: You can progress through the notebook section by section or navigate directly to specific sections. If you choose the latter, we recommend executing all cells in the notebook first, so that you can later rerun the cells in each section without issues.

Tutorial Setup¶

Here we import some standard Python packages, set seeds for reproducibility, and define a plotting function.

In [ ]:

#### Installation ####
!pip install -q "autora[theorist-bms]"
!pip install -q "autora[experiment-runner-synthetic-abstract-equation]"

#### Import modules ####
from typing import Optional
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sympy as sp
import torch

from autora.variable import Variable, ValueType, VariableCollection
from autora.state import StandardState, on_state, estimator_on_state
from autora.experimentalist.random import random_pool
from autora.theorist.bms import BMSRegressor
from autora.experiment_runner.synthetic.abstract.equation import equation_experiment

#### Set seeds ####
np.random.seed(42)
torch.manual_seed(42)

#### Define plot function ####
def plot_from_state(s: StandardState, expr: str):    
    
    """
    Plots the data, the ground truth model, and the current predicted model
    """
    
    #Determine labels and variables
    print(s.models[-1])
    model_label = f"Model: {s.models[-1]}" if hasattr(s.models[-1],'repr') else "Model"
    experiment_data = s.experiment_data.sort_values(by=["x"])
    ground_x = np.linspace(s.variables.independent_variables[0].value_range[0],s.variables.independent_variables[0].value_range[1],100)
    
    #Determine predicted ground truth
    equation = sp.simplify(expr)
    ground_predicted_y = [equation.evalf(subs={'x':x}) for x in ground_x]
    model_predicted_y = s.models[-1].predict(ground_x.reshape(-1, 1))

    #Plot the data and models
    f = plt.figure(figsize=(4,3))
    plt.plot(experiment_data["x"], experiment_data["y"], 'o', label = None)
    plt.plot(ground_x, model_predicted_y, alpha=.8, label=model_label)
    plt.plot(ground_x, ground_predicted_y, alpha=.8,  label=f'Ground Truth: {expr}')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.legend()
    plt.show()

WARNING: typer 0.12.3 does not provide the extra 'all'
WARNING: typer 0.12.3 does not provide the extra 'all'

Customizing Automated Empirical Research Components¶

autora is a flexible framework in which users can integrate their own experimentalists, experiment runners, and theorists in an automated empirical research workflow. This section illustrates the integration of custom autora components. For more information on how to contribute your own modules to the autora ecosystem, please refer to the Contributor Documentation.

To illustrate the use of custom experimentalists, experiment runners, and theorists, we consider a simple workflow:

1. Generate 10 seed experimental conditions using random_pool
2. Iterate through the following steps:
   - Identify 10 new experimental conditions using an experimentalist
   - Collect observations using the experiment_runner
   - Identify a model relating conditions to observations using a theorist

Once this workflow is set up, we will replace each component with a custom implementation.
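The control flow of this workflow can be sketched without autora at all. The snippet below is a minimal stand-in, assuming a hypothetical `ToyState` dataclass and toy component functions (none of these names come from autora). It only illustrates how each component takes a state and returns an updated one: new conditions replace the old ones, while observations and models accumulate.

```python
from dataclasses import dataclass, field, replace
import random


@dataclass(frozen=True)
class ToyState:
    """Hypothetical stand-in for a state object (not autora's StandardState)."""
    conditions: list = field(default_factory=list)
    experiment_data: list = field(default_factory=list)
    models: list = field(default_factory=list)


def toy_experimentalist(state, num_samples, seed):
    # Propose new conditions; they replace the previous ones
    rng = random.Random(seed)
    new_conditions = [rng.uniform(1, 2) for _ in range(num_samples)]
    return replace(state, conditions=new_conditions)


def toy_runner(state):
    # "Observe" y = 2 * x and append the new rows to the existing data
    new_rows = [(x, 2 * x) for x in state.conditions]
    return replace(state, experiment_data=state.experiment_data + new_rows)


def toy_theorist(state):
    # "Model" = least-squares slope of a line through the origin
    numerator = sum(x * y for x, y in state.experiment_data)
    denominator = sum(x * x for x, _ in state.experiment_data)
    return replace(state, models=state.models + [numerator / denominator])


state = ToyState()
for cycle in range(2):
    state = toy_experimentalist(state, num_samples=10, seed=42 + cycle)
    state = toy_runner(state)
    state = toy_theorist(state)
```

After two cycles the toy state holds 10 conditions, 20 observations, and 2 models, which is the same bookkeeping pattern the autora state follows in the cells of this section.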

In [ ]:

#### Define metadata ####
iv = Variable(name="x", value_range=(0, 2 * np.pi), allowed_values=np.linspace(0, 2 * np.pi, 30))
dv = Variable(name="y", type=ValueType.REAL)
variables = VariableCollection(independent_variables=[iv],dependent_variables=[dv])

#### Define condition pool ####
conditions = random_pool(variables, num_samples=10, random_state=0)

#### Define state ####
s = StandardState(
    variables = variables,
    conditions = conditions,
    experiment_data = pd.DataFrame(columns=["x","y"])
)

#### Define experimentalist and wrap with state functionality ####
experimentalist = on_state(random_pool, output=["conditions"])

#### Define experiment runner and wrap with state functionality ####
sin_experiment = equation_experiment(sp.simplify('sin(x)'), variables.independent_variables, variables.dependent_variables[0])
sin_runner = sin_experiment.run

experiment_runner = on_state(sin_runner, output=["experiment_data"])

#### Define theorist and wrap with state functionality ####
theorist = estimator_on_state(BMSRegressor(epochs=100))

We should quickly test to make sure everything works as expected.

In [ ]:

print('\033[1mPrevious State:\033[0m')
print(s)

for cycle in range(2):
    s = experimentalist(s, num_samples=10, random_state=42+cycle)
    s = experiment_runner(s, added_noise=0.5, random_state=42+cycle)
    s = theorist(s)
    
    plot_from_state(s, 'sin(x)')

print('\n\033[1mUpdated State:\033[0m')
print(s)

INFO:autora.theorist.bms.regressor:BMS fitting started

Previous State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0.        , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
       1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
       2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
       3.24992343, 3.466585  , 3.68324656, 3.89990812, 4.11656968,
       4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
       5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  5.416539
1  4.116570
2  3.249923
3  1.733292
4  1.949954
5  0.216662
6  0.433323
7  0.000000
8  1.083308
9  5.199877, experiment_data=Empty DataFrame
Columns: [x, y]
Index: [], models=[])

100%|██████████| 100/100 [00:04<00:00, 20.17it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished

sin(x)

INFO:autora.theorist.bms.regressor:BMS fitting started
100%|██████████| 100/100 [00:05<00:00, 19.45it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished

sin(x)

Updated State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0.        , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
       1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
       2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
       3.24992343, 3.466585  , 3.68324656, 3.89990812, 4.11656968,
       4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
       5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  3.249923
1  4.116570
2  2.599939
3  0.216662
4  3.683247
5  0.000000
6  1.733292
7  5.416539
8  2.816600
9  3.683247, experiment_data=           x         y
0   0.433323  0.572248
1   4.983216 -1.483542
2   4.116570 -0.452463
3   2.816600  0.789584
4   2.599939 -0.459964
5   5.416539 -1.413252
6   0.433323  0.483809
7   4.333231 -1.087098
8   1.299969  0.955149
9   0.433323 -0.006633
10  3.249923  0.013996
11  4.116570 -0.488600
12  2.599939  0.222789
13  0.216662 -0.239366
14  3.683247 -1.511473
15  0.000000  0.485811
16  1.733292  0.995155
17  5.416539 -0.659296
18  2.816600 -0.072496
19  3.683247  0.097695, models=[sin(x), sin(x)])

Custom Experimentalists¶

Experimentalists must be implemented as functions. For instance, an experimentalist sampler function expects a pool of experimental conditions and returns a modified set of experimental conditions.

Requirements for working with the state:

- The function has a variables argument that accepts the VariableCollection type
- The function has a conditions argument that accepts a pandas.DataFrame
- The function returns a pandas.DataFrame
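As a rough illustration of these requirements, the sketch below defines a deliberately trivial experimentalist that proposes the smallest allowed values not yet present in the conditions. The name `smallest_unused` is hypothetical, and a `SimpleNamespace` stands in for the `VariableCollection` so that the snippet runs without autora installed.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def smallest_unused(variables, conditions: pd.DataFrame, num_samples: int = 1) -> pd.DataFrame:
    """Toy experimentalist: propose the smallest allowed values not yet in `conditions`."""
    allowed = np.asarray(variables.independent_variables[0].allowed_values)
    # Keep only allowed values that do not appear in the existing conditions
    unused = allowed[~np.isin(allowed, conditions["x"].to_numpy())]
    return pd.DataFrame({"x": np.sort(unused)[:num_samples]})


# Stand-in for a VariableCollection with one independent variable (assumption)
toy_variables = SimpleNamespace(
    independent_variables=[SimpleNamespace(allowed_values=np.array([0.0, 1.0, 2.0, 3.0]))]
)

existing = pd.DataFrame({"x": [0.0, 2.0]})
proposed = smallest_unused(toy_variables, existing, num_samples=2)
```

A function of this shape could then be wrapped with `on_state(..., output=["conditions"])`, in the same way `random_pool` was wrapped earlier.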

The custom uniform_sample function below will select the conditions that are least represented in the data.

Note that when building custom experimentalists, we can either wrap the function with on_state(output=['conditions']) as we did in tutorial III, or else we can use the @on_state(output=['conditions']) decorator.
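These two options are the usual two ways of applying a Python decorator factory. The sketch below uses a hypothetical `tag_output` factory (not part of autora, and far simpler than `on_state`) purely to show that wrapping by hand and using the `@` syntax produce equivalent callables.

```python
import functools


def tag_output(output):
    """Hypothetical decorator factory: returns the result in a dict keyed by output[0]."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            return {output[0]: result}
        return wrapper
    return decorator


# Option 1: define the plain function, then wrap it explicitly
def double(x):
    return 2 * x


wrapped_double = tag_output(output=["conditions"])(double)


# Option 2: apply the same factory with decorator syntax
@tag_output(output=["conditions"])
def decorated_double(x):
    return 2 * x
```

One practical difference: with Option 1 the undecorated function (`double` here) stays available under its own name, which can be convenient for testing it in isolation.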

In [ ]:

#==================================================================#
#                 Option 1 - Wrapping our Component                #
#==================================================================#

def uniform_sample(variables: VariableCollection, conditions: pd.DataFrame, num_samples: int = 1, random_state: Optional[int] = None):

    """
    An experimentalist that selects the least represented datapoints
    """
    #Set rng seed
    rng = np.random.default_rng(random_state)

    #Retrieve the possible values
    allowed_values = variables.independent_variables[0].allowed_values
    
    #Determine the representation of each value
    conditions_count = np.array([conditions["x"].isin([value]).sum(axis=0) for value in allowed_values])
    
    #Sort to determine the least represented values
    conditions_sort = conditions_count.argsort()
    
    conditions_count = conditions_count[conditions_sort]
    values_count = allowed_values[conditions_sort]
    
    #Sample from values with the smallest frequency
    x = values_count[conditions_count<=conditions_count[num_samples-1]]
    x = rng.choice(x,num_samples)
    
    return pd.DataFrame({"x": x})

custom_experimentalist = on_state(uniform_sample, output=["conditions"])

#==================================================================#
#                   Option 2 - Using a Decorator                   #
#==================================================================#

@on_state(output=["conditions"])
def custom_experimentalist(variables: VariableCollection, conditions: pd.DataFrame, num_samples: int = 1, random_state: Optional[int] = None):

    """
    An experimentalist that selects the least represented datapoints
    """
    #Set rng seed
    rng = np.random.default_rng(random_state)

    #Retrieve the possible values
    allowed_values = variables.independent_variables[0].allowed_values
    
    #Determine the representation of each value
    conditions_count = np.array([conditions["x"].isin([value]).sum(axis=0) for value in allowed_values])
    
    #Sort to determine the least represented values
    conditions_sort = conditions_count.argsort()
    
    conditions_count = conditions_count[conditions_sort]
    values_count = allowed_values[conditions_sort]
    
    #Sample from values with the smallest frequency
    x = values_count[conditions_count<=conditions_count[num_samples-1]]
    x = rng.choice(x,num_samples)
    
    return pd.DataFrame({"x": x})

Now, we will re-run our initial workflow while incorporating our custom experimentalist.

In [ ]:

#### First, let's reinitialize the state object to get a clean state ####
iv = Variable(name="x", value_range=(0, 2 * np.pi), allowed_values=np.linspace(0, 2 * np.pi, 30))
dv = Variable(name="y", type=ValueType.REAL)
variables = VariableCollection(independent_variables=[iv],dependent_variables=[dv])

conditions = random_pool(variables, num_samples=10, random_state=0)

s = StandardState(variables = variables, conditions = conditions, experiment_data = pd.DataFrame(columns=["x","y"]))

#Report previous state
print('\033[1mPrevious State:\033[0m')
print(s)

#Cycle
for cycle in range(5):
    s = custom_experimentalist(s, num_samples = 10, random_state=42+cycle) #Our custom experimentalist
    s = experiment_runner(s, added_noise=0.5, random_state=42+cycle)
    s = theorist(s)
    
    plot_from_state(s,'sin(x)')

#Report updated state
print('\n\033[1mUpdated State:\033[0m')
print(s)

INFO:autora.theorist.bms.regressor:BMS fitting started

Previous State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0.        , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
       1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
       2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
       3.24992343, 3.466585  , 3.68324656, 3.89990812, 4.11656968,
       4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
       5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  5.416539
1  4.116570
2  3.249923
3  1.733292
4  1.949954
5  0.216662
6  0.433323
7  0.000000
8  1.083308
9  5.199877, experiment_data=Empty DataFrame
Columns: [x, y]
Index: [], models=[])

100%|██████████| 100/100 [00:04<00:00, 23.27it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished

-0.21

INFO:autora.theorist.bms.regressor:BMS fitting started
100%|██████████| 100/100 [00:05<00:00, 19.52it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished

cos((-1.49 + x))

INFO:autora.theorist.bms.regressor:BMS fitting started
100%|██████████| 100/100 [00:04<00:00, 20.38it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished

sin(x)

INFO:autora.theorist.bms.regressor:BMS fitting started
100%|██████████| 100/100 [00:05<00:00, 18.74it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished

sin(x)

INFO:autora.theorist.bms.regressor:BMS fitting started
100%|██████████| 100/100 [00:04<00:00, 20.01it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished

sin(x)

Updated State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0.        , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
       1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
       2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
       3.24992343, 3.466585  , 3.68324656, 3.89990812, 4.11656968,
       4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
       5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  4.983216
1  5.849862
2  3.466585
3  4.116570
4  1.733292
5  3.249923
6  3.249923
7  5.199877
8  3.683247
9  4.333231, experiment_data=           x         y
0   5.849862 -0.267531
1   1.516631  0.478541
2   2.383277  1.062925
3   3.683247 -0.045271
4   3.683247 -1.491071
5   2.599939 -0.135536
6   5.849862 -0.355969
7   2.383277  0.529578
8   4.766554 -1.006934
9   5.849862 -0.846411
10  2.816600  0.441416
11  1.949954  1.268066
12  3.466585 -0.612066
13  5.633201 -1.059511
14  3.033262 -0.887800
15  0.000000  0.485811
16  4.333231 -0.920648
17  0.649985  0.708040
18  6.066524 -0.606768
19  2.166616  1.440938
20  1.299969  1.686531
21  3.683247 -0.464401
22  5.849862 -0.256513
23  4.983216 -0.395319
24  1.299969  1.375672
25  2.383277  0.977178
26  3.899908 -0.877285
27  4.549893 -1.496263
28  5.416539 -0.574078
29  4.766554 -1.254513
30  1.949954  0.700044
31  0.216662 -0.096459
32  0.866646  0.829807
33  0.649985  1.027126
34  0.649985  0.530925
35  6.283185  0.131242
36  2.166616  1.093523
37  1.516631  1.343437
38  0.649985  0.159225
39  3.033262  0.342529
40  4.983216 -1.215008
41  5.849862  0.189056
42  3.466585 -0.454861
43  4.116570 -0.462597
44  1.733292  0.402174
45  3.249923 -0.822852
46  3.249923 -0.119755
47  5.199877 -1.107408
48  3.683247 -0.471516
49  4.333231 -0.666329, models=[sin(x), sin(x), sin(x), sin(x), sin(x)])

Custom Experiment Runner¶

Experiment runners must be implemented as functions.

Requirements for working with the state:

- The function has a conditions argument that accepts a pandas.DataFrame
- The function returns a pandas.DataFrame
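To make this contract concrete, here is a minimal sketch of a runner that satisfies both requirements. The function name `linear_experiment` and the relationship y = 2x + 1 are illustrative assumptions, not part of autora. The returned frame carries the condition columns plus the observation, and the input frame is left unmodified.

```python
from typing import Optional

import numpy as np
import pandas as pd


def linear_experiment(
    conditions: pd.DataFrame,
    added_noise: float = 0.0,
    random_state: Optional[int] = None,
) -> pd.DataFrame:
    """Toy experiment runner: y = 2 * x + 1 plus optional Gaussian noise."""
    rng = np.random.default_rng(random_state)
    x = conditions["x"]
    y = 2 * x + 1 + rng.normal(0, added_noise, size=x.shape)
    # assign returns a new DataFrame, so the caller's conditions stay untouched
    return conditions.assign(y=y)


toy_conditions = pd.DataFrame({"x": [0.0, 1.0, 2.0]})
observations = linear_experiment(toy_conditions)
```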

The custom quadratic_experiment below will apply a quadratic transform (x + x**2) to the conditions.

Note that when building custom experiment runners, we can either wrap the function with on_state(output=['experiment_data']) as we did in tutorial III, or else we can use the @on_state(output=['experiment_data']) decorator.

In [ ]:

#==================================================================#
#                 Option 1 - Wrapping our Component                #
#==================================================================#

def quadratic_experiment(conditions: pd.DataFrame, added_noise: float = 0.01, random_state: Optional[int] = None):
    
    #Set rng seed
    rng = np.random.default_rng(random_state)
    
    #Extract conditions
    x = conditions["x"]
    
    #Compute data
    y = (x + x**2) + rng.normal(0, added_noise, size=x.shape)
    
    #Assign to dataframe
    observations = conditions.assign(y = y)
    
    return observations

custom_experiment_runner = on_state(quadratic_experiment, output=["experiment_data"])

#==================================================================#
#                   Option 2 - Using a Decorator                   #
#==================================================================#

@on_state(output=["experiment_data"])
def custom_experiment_runner(conditions: pd.DataFrame, added_noise: float = 0.01, random_state: Optional[int] = None):
    
    #Set rng seed
    rng = np.random.default_rng(random_state)
    
    #Extract conditions
    x = conditions["x"]
    
    #Compute data
    y = (x + x**2) + rng.normal(0, added_noise, size=x.shape)
    
    #Assign to dataframe
    observations = conditions.assign(y = y)
    
    return observations

Now, we will re-run our initial workflow while incorporating our custom experiment runner.

In [ ]:

#### First, let's reinitialize the state object to get a clean state ####
iv = Variable(name="x", value_range=(0, 2 * np.pi), allowed_values=np.linspace(0, 2 * np.pi, 30))
dv = Variable(name="y", type=ValueType.REAL)
variables = VariableCollection(independent_variables=[iv],dependent_variables=[dv])

conditions = random_pool(variables, num_samples=10, random_state=0)

s = StandardState(variables = variables, conditions = conditions, experiment_data = pd.DataFrame(columns=["x","y"]))

#Report previous state
print('\033[1mPrevious State:\033[0m')
print(s)

#Cycle
for cycle in range(5):
    s = experimentalist(s, num_samples = 10, random_state=42+cycle)
    s = custom_experiment_runner(s, added_noise=0.5, random_state=42+cycle)
    s = theorist(s)
    
    plot_from_state(s, 'x + x**2')

#Report updated state
print('\n\033[1mUpdated State:\033[0m')
print(s)

INFO:autora.theorist.bms.regressor:BMS fitting started

Previous State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0.        , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
       1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
       2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
       3.24992343, 3.466585  , 3.68324656, 3.89990812, 4.11656968,
       4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
       5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  5.416539
1  4.116570
2  3.249923
3  1.733292
4  1.949954
5  0.216662
6  0.433323
7  0.000000
8  1.083308
9  5.199877, experiment_data=Empty DataFrame
Columns: [x, y]
Index: [], models=[])

100%|██████████| 100/100 [00:06<00:00, 15.44it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished

(x + ((x * x) * 0.99))

INFO:autora.theorist.bms.regressor:BMS fitting started
100%|██████████| 100/100 [00:06<00:00, 14.67it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished

((x + 0.45) ** 2)

INFO:autora.theorist.bms.regressor:BMS fitting started
100%|██████████| 100/100 [00:06<00:00, 15.63it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished

((x * (x + x)) ** 0.87)

INFO:autora.theorist.bms.regressor:BMS fitting started
100%|██████████| 100/100 [00:05<00:00, 17.21it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished

(x / (1.0 / (x + 1.0)))

INFO:autora.theorist.bms.regressor:BMS fitting started
100%|██████████| 100/100 [00:06<00:00, 15.98it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished

((x + 1.0) * x)

Updated State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0.        , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
       1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
       2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
       3.24992343, 3.466585  , 3.68324656, 3.89990812, 4.11656968,
       4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
       5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  3.249923
1  5.849862
2  1.299969
3  0.433323
4  3.466585
5  1.733292
6  1.516631
7  3.899908
8  0.866646
9  6.066524, experiment_data=           x          y
0   0.433323   0.773451
1   4.983216  29.295665
2   4.116570  21.437941
3   2.816600  11.220120
4   2.599939   8.384103
5   5.416539  34.104345
6   0.433323   0.685012
7   4.333231  22.952003
8   1.299969   2.981489
9   0.433323   0.194570
10  3.249923  13.934041
11  4.116570  21.401805
12  2.599939   9.066856
13  0.216662  -0.190733
14  3.683247  16.253633
15  0.000000   0.485811
16  1.733292   4.745924
17  5.416539  34.858300
18  2.816600  10.358040
19  3.683247  17.862801
20  4.333231  23.833106
21  0.649985   1.123617
22  5.199877  32.401980
23  1.516631   4.385032
24  4.116570  21.474838
25  2.599939   9.649098
26  0.433323   0.431507
27  6.283185  45.252166
28  3.466585  15.671881
29  0.866646   1.361742
30  5.849862  39.841817
31  3.683247  16.938122
32  4.549893  25.319062
33  3.249923  14.233877
34  3.249923  13.737676
35  4.766554  27.617837
36  4.549893  25.517251
37  5.199877  32.583507
38  3.466585  15.037847
39  3.249923  14.046336
40  3.249923  13.560468
41  5.849862  40.679695
42  1.299969   2.854330
43  0.433323   0.986184
44  3.466585  14.899144
45  1.733292   4.022862
46  1.516631   3.805164
47  3.899908  18.885295
48  0.866646   1.661760
49  6.066524  43.131882, models=[((x + 1.0) * x), ((x + 1.0) * x), ((x + 1.0) * x), ((x + 1.0) * x), ((x + 1.0) * x)])

Custom Theorists¶

Theorists must be implemented as classes that inherit from sklearn.base.BaseEstimator. The class must implement the following methods:

fit(self, conditions, observations)
predict(self, conditions)

Requirements for working with the state:

The fit method has a conditions argument that accepts a pandas.DataFrame
The fit method has an observations argument that accepts a pandas.DataFrame
The fit method returns self (i.e., the fitted model itself)
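
These requirements can be sketched with a deliberately trivial estimator. `MeanTheorist` is a hypothetical name introduced only for this illustration (it is not part of autora); it predicts the mean of the observations it was fitted on. The sketch assumes that scikit-learn, numpy, and pandas are installed.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, clone


class MeanTheorist(BaseEstimator):
    """Hypothetical toy theorist: always predicts the mean observation."""

    def fit(self, conditions: pd.DataFrame, observations: pd.DataFrame):
        # Store the learned quantity; the trailing underscore follows the
        # scikit-learn convention for attributes set during fit.
        self.mean_ = float(np.asarray(observations, dtype=float).mean())
        return self  # returning self allows calls such as fit(...).predict(...)

    def predict(self, conditions: pd.DataFrame) -> np.ndarray:
        # One prediction per row of conditions.
        return np.full(len(conditions), self.mean_)


conditions = pd.DataFrame({"x": [0.0, 1.0, 2.0]})
observations = pd.DataFrame({"y": [1.0, 2.0, 6.0]})

theorist = MeanTheorist()
fitted = theorist.fit(conditions, observations)
predictions = fitted.predict(conditions)

# Inheriting from BaseEstimator provides get_params(), which clone() relies on
# to create an unfitted copy with the same constructor parameters.
fresh_copy = clone(theorist)
```

Because `fit` returns the estimator itself, `fitted` and `theorist` refer to the same object, while `fresh_copy` is a new, unfitted instance.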

The custom PolynomialRegressor below fits a polynomial of a specified degree.

In [ ]:

import numpy as np
import pandas as pd  # used in the type hints below
from sklearn.base import BaseEstimator

class PolynomialRegressor(BaseEstimator):

    def __init__(self, degree: int = 3):
        self.degree = degree

    def fit(self, conditions: pd.DataFrame, observations: pd.DataFrame):
        c = np.array(conditions)
        o = np.array(observations)

        # polyfit expects a 1D array
        if c.ndim > 1:
            c = c.flatten()

        if o.ndim > 1:
            o = o.flatten()

        # fit polynomial
        self.coeff = np.polyfit(c, o, self.degree)
        self.polynomial = np.poly1d(self.coeff)
        return self

    def predict(self, conditions: pd.DataFrame):
        c = np.array(conditions)
        return self.polynomial(c)
    
custom_theorist = estimator_on_state(PolynomialRegressor())

Now, we will re-run our initial workflow while incorporating our custom theorist.

In [ ]:

#### First, let's reinitialize the state object to get a clean state ####
iv = Variable(name="x", value_range=(0, 2 * np.pi), allowed_values=np.linspace(0, 2 * np.pi, 30))
dv = Variable(name="y", type=ValueType.REAL)
variables = VariableCollection(independent_variables=[iv],dependent_variables=[dv])

conditions = random_pool(variables, num_samples=10, random_state=0)

s = StandardState(variables = variables, conditions = conditions, experiment_data = pd.DataFrame(columns=["x","y"]))

#Report previous state
print('\033[1mPrevious State:\033[0m')
print(s)

#Cycle
for cycle in range(5):
    s = experimentalist(s, num_samples=10, random_state=42+cycle)
    s = experiment_runner(s, added_noise=0.5, random_state=42+cycle)
    s = custom_theorist(s)
    
    print(s.models[-1])
    plot_from_state(s, 'sin(x)')

#Report updated state
print('\n\033[1mUpdated State:\033[0m')
print(s)

Previous State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0.        , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
       1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
       2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
       3.24992343, 3.466585  , 3.68324656, 3.89990812, 4.11656968,
       4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
       5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  5.416539
1  4.116570
2  3.249923
3  1.733292
4  1.949954
5  0.216662
6  0.433323
7  0.000000
8  1.083308
9  5.199877, experiment_data=Empty DataFrame
Columns: [x, y]
Index: [], models=[])
PolynomialRegressor()
PolynomialRegressor()

PolynomialRegressor()
PolynomialRegressor()

PolynomialRegressor()
PolynomialRegressor()

PolynomialRegressor()
PolynomialRegressor()

PolynomialRegressor()
PolynomialRegressor()

Updated State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0.        , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
       1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
       2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
       3.24992343, 3.466585  , 3.68324656, 3.89990812, 4.11656968,
       4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
       5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  3.249923
1  5.849862
2  1.299969
3  0.433323
4  3.466585
5  1.733292
6  1.516631
7  3.899908
8  0.866646
9  6.066524, experiment_data=           x         y
0   0.433323  0.572248
1   4.983216 -1.483542
2   4.116570 -0.452463
3   2.816600  0.789584
4   2.599939 -0.459964
5   5.416539 -1.413252
6   0.433323  0.483809
7   4.333231 -1.087098
8   1.299969  0.955149
9   0.433323 -0.006633
10  3.249923  0.013996
11  4.116570 -0.488600
12  2.599939  0.222789
13  0.216662 -0.239366
14  3.683247 -1.511473
15  0.000000  0.485811
16  1.733292  0.995155
17  5.416539 -0.659296
18  2.816600 -0.072496
19  3.683247  0.097695
20  4.333231 -0.205995
21  0.649985  0.656327
22  5.199877 -0.720136
23  1.516631  1.566765
24  4.116570 -0.415567
25  2.599939  0.805032
26  0.433323  0.230304
27  6.283185 -0.509437
28  3.466585 -0.131217
29  0.866646  0.506182
30  5.849862 -0.648822
31  3.683247 -0.826983
32  4.549893 -0.919182
33  3.249923  0.313832
34  3.249923 -0.182368
35  4.766554 -0.867292
36  4.549893 -0.720993
37  5.199877 -0.538608
38  3.466585 -0.765251
39  3.249923  0.126291
40  3.249923 -0.359577
41  5.849862  0.189056
42  1.299969  0.827990
43  0.433323  0.784981
44  3.466585 -0.903954
45  1.733292  0.272093
46  1.516631  0.986897
47  3.899908 -0.911596
48  0.866646  0.806200
49  6.066524  0.047677, models=[PolynomialRegressor(), PolynomialRegressor(), PolynomialRegressor(), PolynomialRegressor(), PolynomialRegressor()])
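
Each entry in `models` prints as `PolynomialRegressor()` because scikit-learn's default representation lists only constructor parameters that differ from their defaults, not the attributes set during `fit`. To see what was learned, one could inspect the `coeff` and `polynomial` attributes that `fit` stores. The sketch below reproduces those two attributes with plain numpy on made-up noisy sine data; the data and the seed are assumptions for illustration and are not the experiment data shown above.

```python
import numpy as np

# Made-up data for illustration: noisy samples of sin(x) on [0, 2*pi].
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 30)
y = np.sin(x) + rng.normal(0, 0.5, size=x.shape)

# The same two calls that PolynomialRegressor.fit makes with degree=3.
coeff = np.polyfit(x, y, 3)
polynomial = np.poly1d(coeff)

# polyfit orders coefficients from the highest power down to the constant.
for power, c in zip(range(3, -1, -1), coeff):
    print(f"x^{power}: {c:+.4f}")

# poly1d objects are callable, which is what predict() relies on.
print(polynomial(np.pi))
```

For a theorist fitted through the state, the corresponding attributes would presumably be reachable as `s.models[-1].coeff` and `s.models[-1].polynomial`.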

All Together Now¶

We have now created custom experimentalists, experiment runners, and theorists. Let's add them all to the same workflow to see our first fully customized autora workflow.

In [ ]:

#### First, let's reinitialize the state object to get a clean state ####
iv = Variable(name="x", value_range=(0, 2 * np.pi), allowed_values=np.linspace(0, 2 * np.pi, 30))
dv = Variable(name="y", type=ValueType.REAL)
variables = VariableCollection(independent_variables=[iv],dependent_variables=[dv])

conditions = random_pool(variables, num_samples=10, random_state=0)

s = StandardState(variables = variables, conditions = conditions, experiment_data = pd.DataFrame(columns=["x","y"]))

#Report previous state
print('\033[1mPrevious State:\033[0m')
print(s)

#Cycle
for cycle in range(5):
    s = custom_experimentalist(s, num_samples=10, random_state=42+cycle)
    s = custom_experiment_runner(s, added_noise=0.5, random_state=42+cycle)
    s = custom_theorist(s)
    
    plot_from_state(s, 'x + x**2')

#Report updated state
print('\n\033[1mUpdated State:\033[0m')
print(s)

Previous State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0.        , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
       1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
       2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
       3.24992343, 3.466585  , 3.68324656, 3.89990812, 4.11656968,
       4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
       5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  5.416539
1  4.116570
2  3.249923
3  1.733292
4  1.949954
5  0.216662
6  0.433323
7  0.000000
8  1.083308
9  5.199877, experiment_data=Empty DataFrame
Columns: [x, y]
Index: [], models=[])
PolynomialRegressor()

PolynomialRegressor()

PolynomialRegressor()

PolynomialRegressor()

PolynomialRegressor()

Updated State:
StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0.        , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
       1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
       2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
       3.24992343, 3.466585  , 3.68324656, 3.89990812, 4.11656968,
       4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
       5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  4.983216
1  5.849862
2  3.466585
3  4.116570
4  1.733292
5  3.249923
6  3.249923
7  5.199877
8  3.683247
9  4.333231, experiment_data=           x          y
0   5.849862  40.223108
1   1.516631   3.296808
2   2.383277   8.438513
3   3.683247  17.719834
4   3.683247  16.274034
5   2.599939   8.708530
6   5.849862  40.134670
7   2.383277   7.905166
8   4.766554  27.478194
9   5.849862  39.644228
10  2.816600  10.871952
11  1.949954   6.091364
12  3.466585  15.191032
13  5.633201  36.911813
14  3.033262  11.238020
15  0.000000   0.485811
16  4.333231  23.118453
17  0.649985   1.175330
18  6.066524  42.477437
19  2.166616   7.474088
20  1.299969   3.712871
21  3.683247  17.300704
22  5.849862  40.234126
23  4.983216  30.383888
24  1.299969   3.402012
25  2.383277   8.352766
26  3.899908  18.919606
27  4.549893  24.741981
28  5.416539  34.943519
29  4.766554  27.230615
30  1.949954   5.523342
31  0.216662  -0.047826
32  0.866646   1.685367
33  0.649985   1.494416
34  0.649985   0.998215
35  6.283185  45.892845
36  2.166616   7.126673
37  1.516631   4.161704
38  0.649985   0.626515
39  3.033262  12.468350
40  4.983216  29.564199
41  5.849862  40.679695
42  3.466585  15.348237
43  4.116570  21.427807
44  1.733292   4.152943
45  3.249923  13.097192
46  3.249923  13.800289
47  5.199877  32.014707
48  3.683247  17.293589
49  4.333231  23.372772, models=[PolynomialRegressor(), PolynomialRegressor(), PolynomialRegressor(), PolynomialRegressor(), PolynomialRegressor()])
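
One way to judge a fitted model numerically is to compare a cubic fit against the ground truth `x + x**2` used in the plots. The following sketch does this with plain numpy. The noise level mirrors `added_noise=0.5`, but the sampled conditions and the seed are assumptions, so the numbers will not match the state printed above.

```python
import numpy as np

rng = np.random.default_rng(42)
grid = np.linspace(0, 2 * np.pi, 30)  # same grid as allowed_values


def ground_truth(x):
    return x + x**2


# Made-up experiment: 50 conditions drawn from the grid, plus Gaussian noise.
x = rng.choice(grid, size=50)
y = ground_truth(x) + rng.normal(0, 0.5, size=x.shape)

# Cubic fit, as PolynomialRegressor does with its default degree of 3.
coeff = np.polyfit(x, y, 3)
fitted = np.poly1d(coeff)

# Mean squared error between the fit and the noise-free ground truth.
mse = float(np.mean((fitted(grid) - ground_truth(grid)) ** 2))
print("coefficients (x^3 ... x^0):", np.round(coeff, 3))
print("MSE vs. ground truth:", round(mse, 4))
```

If the fit recovers the underlying equation, the `x^3` and constant coefficients should be close to zero and the `x^2` and `x` coefficients close to one.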

Let's run the controller with the new theorist for 3 research cycles, defined by the number of models generated.

Help¶

We hope that this tutorial helped demonstrate the fundamental components of autora, and how they can be combined to facilitate automated (closed-loop) empirical research through synthetic experiments. We encourage you to explore other tutorials and check out the documentation.

If you run into any issues or have questions, please reach out to us through the AutoRA Forum. To report a bug, create an issue in the AutoRA repository.

You may also post questions directly into the User Q&A Section.