**AutoRA** (**Au**tomated **R**esearch **A**ssistant) is an open-source framework designed to automate various stages of empirical research, including model discovery, experimental design, and data collection.

This notebook is the second of four notebooks within the basic tutorials of `autora`. We suggest that you go through these notebooks in order, as each builds upon the last. However, each notebook is self-contained, so you do not need to *run* the previous notebook before working through the current one. We list all four notebooks below, and each notebook also ends with a link to the next one.

AutoRA Basic Tutorial I: Components

AutoRA Basic Tutorial II: Loop Constructs

AutoRA Basic Tutorial III: Functional Workflow

AutoRA Basic Tutorial IV: Customization

These notebooks provide a comprehensive introduction to the capabilities of `autora`. **They demonstrate the fundamental components of `autora`, and how these components can be combined to facilitate automated (closed-loop) empirical research through synthetic experiments.**

**How to use this notebook** *You can progress through the notebook section by section or jump directly to specific sections. If you choose the latter, we recommend first executing all cells in the notebook once, so that you can later rerun the cells in each section without issues.*

## Tutorial Setup

This tutorial is self-contained: you do not need to run the previous notebook to begin. However, the four notebooks build on one another, so objects defined in a previous notebook are assumed to exist in this one. The code block below therefore re-runs the relevant code from past tutorials. We will not walk through it again; if you need a reminder of what it does, see the descriptions in the previous notebooks.

```
#### Installation ####
!pip install -q "autora[experimentalist-falsification]"
!pip install -q "autora[experimentalist-model-disagreement]"
!pip install -q "autora[theorist-bms]"
#### Import modules ####
import numpy as np
import torch
from autora.variable import Variable, ValueType, VariableCollection
from autora.experimentalist.random import random_pool
from autora.experimentalist.falsification import falsification_sample
from autora.experimentalist.model_disagreement import model_disagreement_sample
from autora.theorist.bms import BMSRegressor
from sklearn import linear_model
#### Set seeds ####
np.random.seed(42)
torch.manual_seed(42)
#### Define ground truth and experiment runner ####
ground_truth = lambda x: np.sin(x)
run_experiment = lambda x: ground_truth(x) + np.random.normal(0, 0.1, size=x.shape)
#### Define condition pool ####
condition_pool = np.linspace(0, 2 * np.pi, 100)
#### Define variables ####
iv = Variable(name="x", value_range=(0, 2 * np.pi), allowed_values=condition_pool)
dv = Variable(name="y", type=ValueType.REAL)
variables = VariableCollection(independent_variables=[iv], dependent_variables=[dv])
#### Define theorists ####
theorist_lr = linear_model.LinearRegression()
theorist_bms = BMSRegressor(epochs=100)
```

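Before wiring these components into a loop, it can help to probe the experiment runner directly. The following is a quick sanity check (not part of the original tutorial) using only `numpy`, with `ground_truth` and `run_experiment` redefined locally so the snippet stands alone:

```python
import numpy as np

np.random.seed(42)

# Ground truth and noisy experiment runner, as defined in the setup above
ground_truth = lambda x: np.sin(x)
run_experiment = lambda x: ground_truth(x) + np.random.normal(0, 0.1, size=x.shape)

# Probe the runner at a few conditions: with noise of std 0.1,
# observations should stay close to sin(x)
probe = np.array([0.0, np.pi / 2, np.pi])
obs = run_experiment(probe)
print(np.abs(obs - ground_truth(probe)))
```

Each printed deviation is a draw from a zero-mean normal distribution with standard deviation 0.1, so the observations scatter tightly around the ground truth.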

# Loop Constructs

After defining all the components required for the empirical research process, we can create an automated workflow using basic loop constructs in Python.

The following code block demonstrates how to build such a workflow using the components introduced in the preceding notebook:

- `variables` (object specifying the variables of the experiment)
- `run_experiment` (function for collecting data)
- `theorist_bms` (scikit-learn estimator for discovering equations using the Bayesian Machine Scientist)
- `random_pool` (function for generating a random pool of experimental conditions)
- `falsification_sample` (function for identifying novel experimental conditions using the falsification sampler)

We begin by implementing the following workflow:

- Generate 3 seed experimental conditions using `random_pool`
- Generate 3 seed observations using `run_experiment`
- Loop through the following steps 5 times:
    - Identify a model relating conditions to observations using `theorist_bms`
    - Identify 3 new experimental conditions using `falsification_sample`
    - Collect 3 new observations using `run_experiment`
    - Add the new conditions and observations to the dataset

We will here begin using the naming convention `cycle` to refer to one full pass through an AutoRA loop, in which the loop touches every AutoRA component: experiment runner, theorist, and experimentalist. Within the scientific method, a cycle corresponds to a single iteration of an experiment: collecting data, modeling that data, and conceptualizing the next iteration of the experiment. For example, suppose our research concerns how much information a person acquires from a photo (dependent variable) depending on how bright the photo is (independent variable). We might first collect data at conditions of, say, 10%, 50%, and 90% brightness, then model the collected data to determine the relationship between brightness and photo perception, and finally determine which other brightness conditions might help us understand the true relationship. Probing further conditions, such as brightness levels of 25% and 75%, would then be the next iteration of the experiment and thus, for us, the next cycle. The following code block iterates through five such cycles.
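The bare loop construct can be sketched without any `autora` components at all. The following hypothetical illustration uses plain `numpy` stand-ins (a polynomial fit as the "theorist", random sampling as the "experimentalist") purely to show the shape of a cycle; the actual tutorial code with the real components follows below:

```python
import numpy as np

np.random.seed(0)
ground_truth = lambda x: np.sin(x)
run_experiment = lambda x: ground_truth(x) + np.random.normal(0, 0.1, size=x.shape)
pool = np.linspace(0, 2 * np.pi, 100)

# Seed data: 3 random conditions and their observations
conditions = np.random.choice(pool, size=3)
observations = run_experiment(conditions)

for cycle in range(5):
    # "Theorist" stand-in: fit a degree-3 polynomial to all data so far
    model = np.polynomial.Polynomial.fit(conditions, observations, deg=3)
    # "Experimentalist" stand-in: naively sample new conditions at random
    new_conditions = np.random.choice(pool, size=3)
    # Experiment runner: collect new observations
    new_observations = run_experiment(new_conditions)
    # Grow the dataset for the next cycle
    conditions = np.concatenate((conditions, new_conditions))
    observations = np.concatenate((observations, new_observations))
    # Evaluate against the ground truth over the whole pool
    loss = np.mean((model(pool) - ground_truth(pool)) ** 2)
    print(f"cycle {cycle}: loss {loss:.3f}")
```

The structure (fit, sample, run, append) is identical to the examples below; only the stand-in components differ.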

## Example 1: Falsification Sampler

```
num_cycles = 5  # number of empirical research cycles
measurements_per_cycle = 3  # number of data points to collect for each cycle

# generate an initial set of experimental conditions
conditions = random_pool(variables=variables,
                         num_samples=measurements_per_cycle)

# convert iterator into 2-dimensional numpy array
conditions = np.array(list(conditions.values)).reshape(-1, 1)

# collect initial set of observations
observations = run_experiment(conditions)

for cycle in range(num_cycles):

    # use BMS theorist to fit the model to the data
    theorist_bms.fit(conditions, observations)

    # obtain new conditions
    new_conditions = falsification_sample(
        conditions=condition_pool,
        model=theorist_bms,
        reference_conditions=conditions,
        reference_observations=observations,
        metadata=variables,
        num_samples=measurements_per_cycle,
    )

    # obtain new observations
    print(new_conditions)
    new_observations = run_experiment(new_conditions)

    # combine old and new conditions and observations
    conditions = np.concatenate((conditions, new_conditions))
    observations = np.concatenate((observations, new_observations))

    # evaluate the theorist's model by its squared prediction error against the
    # ground truth, averaged over the entire space of experimental conditions
    loss = np.mean(np.square(theorist_bms.predict(condition_pool.reshape(-1, 1)) - ground_truth(condition_pool)))
    print("Loss in cycle {}: {}".format(cycle, loss))
    print("Discovered Model: " + theorist_bms.repr())
```

[[6.28318531] [6.21971879] [6.15625227]] Loss in cycle 0: 1.0728386415534816 Discovered Model: 0.76

[[6.28318531] [6.21971879] [1.3962634 ]] Loss in cycle 1: 0.99 Discovered Model: sin(X0)

[[4.25225672] [4.31572324] [4.1887902 ]] Loss in cycle 2: inf Discovered Model: (1.29 / X0)

[[0.        ] [0.06346652] [0.12693304]] Loss in cycle 3: 0.4973522295525592 Discovered Model: 0.05

[[0.06346652] [0.12693304] [0.19039955]] Loss in cycle 4: 0.99 Discovered Model: sin(X0)

## Example 2: Model Disagreement Sampler

We can easily replace components in the workflow above.

In the following code block, we add a linear regression theorist to fit a linear model to the data. In addition, we replace `falsification_sample` with `model_disagreement_sample` to sample experimental conditions that differentiate most between the linear model and the model discovered by the BMS theorist.
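The idea behind model-disagreement sampling is to score each candidate condition by how much the models' predictions differ there, and then pick the top-scoring conditions. Here is a minimal `numpy` sketch of that idea; it is a hypothetical illustration, not the `autora` implementation, and the function name `disagreement_sample` and the two toy models are assumptions for this example only:

```python
import numpy as np

def disagreement_sample(pool, models, num_samples):
    """Pick the conditions in `pool` where the models' predictions differ most."""
    X = pool.reshape(-1, 1)
    preds = [m(X).ravel() for m in models]
    # Score each condition by its summed squared pairwise prediction differences
    score = np.zeros(len(pool))
    for i in range(len(preds)):
        for j in range(i + 1, len(preds)):
            score += (preds[i] - preds[j]) ** 2
    # Return the highest-scoring conditions
    return pool[np.argsort(score)[::-1][:num_samples]]

# Two toy "models": a sine and a flat line; they disagree most near the extrema of sin
pool = np.linspace(0, 2 * np.pi, 100)
model_a = lambda X: np.sin(X)
model_b = lambda X: np.zeros_like(X)
picked = disagreement_sample(pool, [model_a, model_b], 3)
print(picked)  # conditions near pi/2 and 3*pi/2, where |sin(x)| is largest
```

Sampling where models disagree is informative because whatever observation comes back must contradict at least one of the competing models.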

```
num_cycles = 5  # number of empirical research cycles
measurements_per_cycle = 3  # number of data points to collect for each cycle

# generate an initial set of experimental conditions
conditions = random_pool(variables=variables,
                         num_samples=measurements_per_cycle)

# convert iterator into 2-dimensional numpy array
conditions = np.array(list(conditions.values)).reshape(-1, 1)

# collect initial set of observations
observations = run_experiment(conditions)

for cycle in range(num_cycles):

    # fit both theorists to the data
    theorist_bms.fit(conditions, observations)
    theorist_lr.fit(conditions, observations)

    # obtain new conditions
    new_conditions = model_disagreement_sample(
        condition_pool,
        models=[theorist_bms, theorist_lr],
        num_samples=measurements_per_cycle
    )

    # model_disagreement_sample may return a pandas DataFrame;
    # convert it into a 2-dimensional numpy array before collecting data
    new_conditions = np.array(new_conditions).reshape(-1, 1)

    # obtain new observations
    print(new_conditions)
    new_observations = run_experiment(new_conditions)

    # combine old and new conditions and observations
    conditions = np.concatenate((conditions, new_conditions))
    observations = np.concatenate((observations, new_observations))

    # evaluate the BMS theorist's model by its squared prediction error against the
    # ground truth, averaged over the entire space of experimental conditions
    loss = np.mean(np.square(theorist_bms.predict(condition_pool.reshape(-1, 1)) - ground_truth(condition_pool)))
    print("Loss in cycle {}: {}".format(cycle, loss))
    print("Discovered BMS Model: " + theorist_bms.repr())
```


# Next Notebook

While the basic loop construct is flexible, there are more convenient ways to specify a research cycle in `autora`. The next notebook illustrates the use of these constructs.

Follow this link for the next notebook tutorial:
AutoRA Basic Tutorial III: Functional Workflow