AutoRA (Automated Research Assistant) is an open-source framework designed to automate various stages of empirical research, including model discovery, experimental design, and data collection.
This notebook is the second of four notebooks within the basic tutorials of autora. We suggest that you go through these notebooks in order, as each builds upon the last. However, each notebook is self-contained, so there is no need to run the previous notebook before working through the current one.
These notebooks provide a comprehensive introduction to the capabilities of autora. They demonstrate the fundamental components of autora and how they can be combined to facilitate automated (closed-loop) empirical research through synthetic experiments.
How to use this notebook: You can progress through the notebook section by section or navigate directly to specific sections. If you choose the latter, it is recommended to execute all cells in the notebook first, so that you can easily rerun the cells in each section later without issues.
Tutorial Setup¶
This tutorial is self-contained, so you do not need to run the previous notebook to begin. However, the four notebooks are continuous: whatever we define in a previous notebook is assumed to still exist in this one. We therefore re-run the relevant code from past tutorials here. We will not walk through it again, but if you need a reminder of what it does, see the descriptions in the previous notebooks.
#### Installation ####
!pip install -q "autora[experimentalist-falsification]"
!pip install -q "autora[experimentalist-model-disagreement]"
!pip install -q "autora[theorist-bms]"
#### Import modules ####
import numpy as np
import torch
from autora.variable import Variable, ValueType, VariableCollection
from autora.experimentalist.random import random_pool
from autora.experimentalist.falsification import falsification_sample
from autora.experimentalist.model_disagreement import model_disagreement_sample
from autora.theorist.bms import BMSRegressor
from sklearn import linear_model
#### Set seeds ####
np.random.seed(42)
torch.manual_seed(42)
#### Define ground truth and experiment runner ####
ground_truth = lambda x: np.sin(x)
run_experiment = lambda x: ground_truth(x) + np.random.normal(0, 0.1, size=x.shape)
#### Define condition pool ####
condition_pool = np.linspace(0, 2 * np.pi, 100)
#### Define variables ####
iv = Variable(name="x", value_range=(0, 2 * np.pi), allowed_values=condition_pool)
dv = Variable(name="y", type=ValueType.REAL)
variables = VariableCollection(independent_variables=[iv], dependent_variables=[dv])
#### Define theorists ####
theorist_lr = linear_model.LinearRegression()
theorist_bms = BMSRegressor(epochs=100)
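As an optional sanity check (not part of the original tutorial), you can call the synthetic experiment directly to confirm that run_experiment simply adds Gaussian noise to the ground truth:
#### Optional: inspect the synthetic experiment (illustrative only) ####
x_test = np.linspace(0, 2 * np.pi, 5).reshape(-1, 1)
print(ground_truth(x_test).ravel())    # noiseless sine values
print(run_experiment(x_test).ravel())  # same values plus noise with standard deviation 0.1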
Loop Constructs¶
After defining all the components required for the empirical research process, we can create an automated workflow using basic loop constructs in Python.
The following code block demonstrates how to build such a workflow using the components introduced in the preceding notebook, such as:
- variables (object specifying the variables of the experiment)
- run_experiment (function for collecting data)
- theorist_bms (scikit-learn estimator for discovering equations using the Bayesian Machine Scientist)
- random_pool (function for generating a random pool of experimental conditions)
- falsification_sample (function for identifying novel experimental conditions using the falsification sampler)
We begin by implementing the following workflow:
- Generate 3 seed experimental conditions using random_pool
- Generate 3 seed observations using run_experiment
- Loop through the following steps 5 times:
  - Identify a model relating conditions to observations using theorist_bms
  - Identify 3 new experimental conditions using falsification_sample
  - Collect 3 new observations using run_experiment
  - Add the new conditions and observations to the dataset
We will here begin using the naming convention cycle to refer to one full pass through an AutoRA loop, in which the loop encounters all AutoRA components: experiment runner, theorist, and experimentalist. Within the scientific method, a cycle corresponds to running a single iteration of the experiment: collecting data, modeling those data, and conceptualizing the next iteration of the experiment.
For example, if our research concerns how much information a person acquires from a photo (dependent variable) depending on how bright the photo is (independent variable), we may first collect data at conditions of (say) 10%, 50%, and 90% brightness, then model the collected data to determine the relationship between brightness and photo perception, and finally determine which other brightness conditions may help us understand the true relationship. Probing other conditions, such as 25% and 75% brightness, would then be the next iteration of the experiment and thus, for us, the next cycle. The following code block iterates through five of these cycles.
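To make the structure of a cycle concrete before the full example, here is a minimal sketch of a single cycle. It uses the linear regression theorist and plain random sampling purely for illustration, and the variable names are hypothetical; the complete falsification-based loop follows in Example 1 below.
#### Sketch of a single cycle (illustrative only) ####
seed_conditions = np.random.choice(condition_pool, 3).reshape(-1, 1)   # conditions to probe
seed_observations = run_experiment(seed_conditions)                    # collect data
theorist_lr.fit(seed_conditions, seed_observations)                    # model the collected data
next_conditions = np.random.choice(condition_pool, 3).reshape(-1, 1)   # conceptualize the next probe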
Example 1: Falsification Sampler¶
num_cycles = 5 # number of empirical research cycles
measurements_per_cycle = 3 # number of data points to collect for each cycle
# generate an initial set of experimental conditions
conditions = random_pool(variables=variables,
num_samples=measurements_per_cycle)
# convert the sampled conditions into a 2-dimensional numpy array
conditions = np.array(list(conditions.values)).reshape(-1, 1)
# collect initial set of observations
observations = run_experiment(conditions)
for cycle in range(num_cycles):

    # use the BMS theorist to fit a model to the data
    theorist_bms.fit(conditions, observations)

    # obtain new conditions from the falsification sampler
    new_conditions = falsification_sample(
        conditions=condition_pool,
        model=theorist_bms,
        reference_conditions=conditions,
        reference_observations=observations,
        metadata=variables,
        num_samples=measurements_per_cycle,
    )
    print(new_conditions)

    # obtain new observations
    new_observations = run_experiment(new_conditions)

    # combine old and new conditions and observations
    conditions = np.concatenate((conditions, new_conditions))
    observations = np.concatenate((observations, new_observations))

    # evaluate the theorist's model based on its ability to predict the ground truth
    # across the entire space of experimental conditions
    loss = np.mean(np.square(theorist_bms.predict(condition_pool.reshape(-1, 1)) - ground_truth(condition_pool)))
    print("Loss in cycle {}: {}".format(cycle, loss))
    print("Discovered Model: " + theorist_bms.repr())
[[0. ] [0.06346652] [0.12693304]]
Loss in cycle 0: 0.99
Discovered Model: sin(X0)
[[0. ] [0.44426563] [0.38079911]]
Loss in cycle 1: 0.99
Discovered Model: sin(X0)
[[0. ] [0.57119866] [0.63466518]]
Loss in cycle 2: 0.99
Discovered Model: sin(X0)
[[0. ] [6.28318531] [6.21971879]]
Loss in cycle 3: 0.99
Discovered Model: sin(X0)
[[6.28318531] [6.21971879] [6.15625227]]
Loss in cycle 4: 0.99
Discovered Model: sin(X0)
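Optionally, you can visualize how well the discovered model tracks the ground truth across the full condition pool. The following snippet is not part of the original tutorial and assumes matplotlib is available in your environment:
#### Optional: visualize the discovered model (illustrative only) ####
import matplotlib.pyplot as plt
X_full = condition_pool.reshape(-1, 1)
plt.scatter(conditions.ravel(), observations.ravel(), label="collected data")
plt.plot(condition_pool, ground_truth(condition_pool), label="ground truth")
plt.plot(condition_pool, theorist_bms.predict(X_full).ravel(), "--", label="BMS model")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()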
Example 2: Model Disagreement Sampler¶
We can easily replace components in the workflow above. In the following code block, we add a linear regression theorist to fit a linear model to the data. In addition, we replace falsification_sample with model_disagreement_sample in order to sample experimental conditions that differentiate most between the linear model and the model discovered by the BMS theorist.
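Conceptually, the model disagreement sampler ranks candidate conditions by how strongly the predictions of the supplied models diverge and returns the most disputed ones. The following snippet sketches that idea manually for intuition; it is not the library's implementation, and it assumes both theorists have already been fitted (for example, after running the loop below):
#### Manual sketch of the model-disagreement idea (illustrative only) ####
X_pool = condition_pool.reshape(-1, 1)
disagreement = np.square(theorist_bms.predict(X_pool).ravel() - theorist_lr.predict(X_pool).ravel())
most_disputed = condition_pool[np.argsort(disagreement)[-measurements_per_cycle:]]
print(most_disputed)  # the conditions where the two models disagree most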
num_cycles = 5 # number of empirical research cycles
measurements_per_cycle = 3 # number of data points to collect for each cycle
# generate an initial set of experimental conditions
conditions = random_pool(variables=variables,
num_samples=measurements_per_cycle)
# convert the sampled conditions into a 2-dimensional numpy array
conditions = np.array(list(conditions.values)).reshape(-1, 1)
# collect initial set of observations
observations = run_experiment(conditions)
for cycle in range(num_cycles):

    # fit both theorists to the data
    theorist_bms.fit(conditions, observations)
    theorist_lr.fit(conditions, observations)

    # obtain new conditions that maximize disagreement between the two models
    new_conditions = model_disagreement_sample(
        condition_pool,
        models=[theorist_bms, theorist_lr],
        num_samples=measurements_per_cycle
    )
    print(new_conditions)

    # obtain new observations
    new_observations = run_experiment(new_conditions)

    # combine old and new conditions and observations
    conditions = np.concatenate((conditions, new_conditions))
    observations = np.concatenate((observations, new_observations))

    # evaluate the BMS theorist's model based on its ability to predict the ground truth
    # across the entire space of experimental conditions
    loss = np.mean(np.square(theorist_bms.predict(condition_pool.reshape(-1, 1)) - ground_truth(condition_pool)))
    print("Loss in cycle {}: {}".format(cycle, loss))
    print("Discovered BMS Model: " + theorist_bms.repr())
          0
0  0.000000
1  0.063467
2  0.126933
Loss in cycle 0: 0.5027103676355225
Discovered BMS Model: -0.09
           0
26  1.650129
25  1.586663
27  1.713596
Loss in cycle 1: 0.99
Discovered BMS Model: sin(X0)
           0
73  4.633056
74  4.696522
72  4.569589
Loss in cycle 2: 0.99
Discovered BMS Model: sin(X0)
           0
28  1.777063
27  1.713596
29  1.840529
Loss in cycle 3: 0.99
Discovered BMS Model: sin(X0)
           0
71  4.506123
72  4.569589
70  4.442656
Loss in cycle 4: 0.99
Discovered BMS Model: sin(X0)
Next Notebook¶
While the basic loop construct is flexible, there are more convenient ways to specify a research cycle in autora. The next notebook, AutoRA Basic Tutorial III: Functional Workflow, illustrates the use of these constructs.