Experimentalist Pipeline Examples¶
This notebook demonstrates the use of the Pipeline class to create Experimentalists. Experimentalists consist of two main components:
- Condition Generation - Creating combinations of independent variables to test
- Experimental Design - Ensuring conditions meet design constraints.
The Pipeline class allows us to define a series of functions to generate and process a pool of conditions that conform to an experimental design.
# Uncomment the following line when running on Google Colab
# !pip install autora
import numpy as np
from autora.variable import DV, IV, ValueType, VariableCollection
from autora.experimentalist.pipeline import Pipeline
from autora.experimentalist.pooler.grid import grid_pool
from autora.experimentalist.sampler.random_sampler import random_sample
def weber_filter(values):
    return filter(lambda s: s[0] <= s[1], values)
Implementation¶
The Pipeline class consists of a series of steps:
- At most one "pool" step, which generates experimental conditions,
- An arbitrary number of steps to apply to the pool. Examples of steps may be:
  - samplers
  - conditional filters
  - sequencers
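As a rough illustration of this structure, independent of the autora implementation, a pipeline can be thought of as a pool whose output is passed through a list of functions in order. The runner run_steps and the toy_* steps below are hypothetical names invented for this sketch; they are not part of autora.

```python
import random
from itertools import product


def toy_pool():
    """Pool step: yield every (s1, s2) pair from two small value lists."""
    return product([0, 1, 2], [0, 1, 2])


def toy_filter(conditions):
    """Filter step: keep pairs whose first value does not exceed the second."""
    return (c for c in conditions if c[0] <= c[1])


def toy_sample(conditions, n, seed=None):
    """Sampler step: draw n conditions without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(conditions), n)


def run_steps(pool, steps):
    """Hypothetical runner: call the pool, then pipe the result through each step."""
    result = pool()
    for step in steps:
        result = step(result)
    return result


sampled = run_steps(toy_pool, [toy_filter, lambda c: toy_sample(c, n=3, seed=0)])
print(sampled)
```

Each step only needs to accept the output of the previous one, which is why samplers, filters, and sequencers can be combined freely.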
Example 1: Exhaustive Pool With Random Sampler¶
The examples in this notebook will create a Weber line-lengths experiment. The Weber experiment tests human detection of differences between the lengths of two lines. The first example will sample a pool with simple random sampling. We will first define the independent and dependent variables (IVs and DVs, respectively).
# Specifying Dependent and Independent Variables
# Specify independent variables
iv1 = IV(
    name="S1",
    allowed_values=np.linspace(0, 5, 5),
    units="intensity",
    variable_label="Stimulus 1 Intensity",
)
iv2 = IV(
    name="S2",
    allowed_values=np.linspace(0, 5, 5),
    units="intensity",
    variable_label="Stimulus 2 Intensity",
)
# The experimentalist pipeline doesn't actually use DVs; they are specified here only
# as an example.
dv1 = DV(
    name="difference_detected",
    value_range=(0, 1),
    units="probability",
    variable_label="P(difference detected)",
    type=ValueType.PROBABILITY,
)
# Variable collection with ivs and dvs
metadata = VariableCollection(
    independent_variables=[iv1, iv2],
    dependent_variables=[dv1],
)
Next we set up the Pipeline with three functions:
- grid_pool - Generates an exhaustive pool of condition combinations using the Cartesian product of discrete IV values.
  - The discrete IV values are specified with the allowed_values attribute when defining the IVs.
- weber_filter - Filter that enforces the experimental design constraint IV1 <= IV2.
- random_sample - Samples the pool of conditions.
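To make the pool sizes concrete, the sketch below reproduces the Cartesian product and the IV1 <= IV2 constraint with the standard library only. It does not call autora; the list levels is written out by hand to match the five values that np.linspace(0, 5, 5) yields.

```python
from itertools import product

# Same five levels that np.linspace(0, 5, 5) produces for S1 and S2.
levels = [0.0, 1.25, 2.5, 3.75, 5.0]

# Exhaustive pool: every (S1, S2) combination.
full_pool = list(product(levels, levels))

# Design constraint from weber_filter: keep conditions with S1 <= S2.
constrained_pool = [c for c in full_pool if c[0] <= c[1]]

print(len(full_pool))         # 25
print(len(constrained_pool))  # 15
```

The sampler in this example therefore draws its 10 conditions from the 15 that satisfy the constraint.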
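Functions that require keyword arguments receive them through the params dictionary passed as the second argument to Pipeline, keyed by step name. Binding the arguments in advance with the partial function from functools is an alternative.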
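For readers unfamiliar with it, functools.partial from the standard library binds keyword arguments to a function ahead of time and returns a new callable. The sampler sample_n below is a hypothetical stand-in written for this sketch, not an autora function.

```python
import random
from functools import partial


def sample_n(conditions, n):
    """Hypothetical sampler: draw n conditions without replacement."""
    return random.sample(list(conditions), n)


# Bind the keyword argument now; the resulting callable needs only the conditions.
sample_two = partial(sample_n, n=2)

picked = sample_two([(0, 1), (0, 2), (1, 2)])
print(picked)
```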
## Set up the pipeline; keyword arguments for each step are given in the params dictionary
# Initialize the pipeline
pipeline_random_samp = Pipeline(
    [
        ("grid_pool", grid_pool),
        ("weber_filer", weber_filter),  # Filter that selects conditions with IV1 <= IV2
        ("random_sample", random_sample),
    ],
    {"grid_pool": {"ivs": metadata.independent_variables}, "random_sample": {"n": 10}},
)
pipeline_random_samp
Pipeline(steps=[('grid_pool', <function grid_pool at 0x1077bdf70>), ('weber_filer', <function weber_filter at 0x1077c8550>), ('random_sampler', <function random_sampler at 0x1077c8160>)], params={'grid_pool': {'ivs': [IV(name='S1', value_range=None, allowed_values=array([0. , 1.25, 2.5 , 3.75, 5. ]), units='intensity', type=<ValueType.REAL: 'real'>, variable_label='Stimulus 1 Intensity', rescale=1, is_covariate=False), IV(name='S2', value_range=None, allowed_values=array([0. , 1.25, 2.5 , 3.75, 5. ]), units='intensity', type=<ValueType.REAL: 'real'>, variable_label='Stimulus 2 Intensity', rescale=1, is_covariate=False)]}, 'random_sampler': {'n': 10}})
The pipeline can be run by calling the run method.
The pipeline is run twice below to illustrate that random sampling is performed. Rerunning the cell will produce different results.
# Run the Pipeline
results1 = pipeline_random_samp.run()
results2 = pipeline_random_samp.run()
print('Sampled Conditions:')
print(f' Run 1: {results1}\n',
f'Run 2: {results2}')
Sampled Conditions: Run 1: [(3.75, 3.75), (1.25, 1.25), (1.25, 2.5), (3.75, 5.0), (2.5, 2.5), (1.25, 5.0), (2.5, 3.75), (0.0, 2.5), (0.0, 1.25), (5.0, 5.0)] Run 2: [(0.0, 5.0), (2.5, 5.0), (5.0, 5.0), (1.25, 1.25), (2.5, 3.75), (0.0, 1.25), (1.25, 3.75), (3.75, 3.75), (0.0, 0.0), (1.25, 2.5)]
An alternative method of passing an instantiated pool iterator is demonstrated below. Note the difference: here grid_pool is not passed to the Pipeline as a callable but is called before the Pipeline is initialized. grid_pool returns an iterator over the exhaustive pool, and that iterator is consumed by the first run. This results in unexpected behavior when the Pipeline is run multiple times.
## Instantiate the pool before building the pipeline
# Pool Function
pooler_iterator = grid_pool(metadata.independent_variables)
# Initialize the pipeline
pipeline_random_samp2 = Pipeline(
    [
        ("pool (iterator)", pooler_iterator),
        ("filter", weber_filter),  # Filter that selects conditions with IV1 <= IV2
        ("sample", random_sample),  # Sampler defined in the first implementation example
    ],
    {"sample": {"n": 10}},
)
# Run the Pipeline
results1 = pipeline_random_samp2.run()
results2 = pipeline_random_samp2.run()
print('Sampled Conditions:')
print(f' Run 1: {results1}\n',
f'Run 2: {results2}')
Sampled Conditions: Run 1: [(1.25, 1.25), (0.0, 5.0), (1.25, 5.0), (2.5, 3.75), (1.25, 2.5), (5.0, 5.0), (2.5, 5.0), (1.25, 3.75), (0.0, 1.25), (2.5, 2.5)] Run 2: []
Running the pipeline a second time returns an empty list because the iterator is exhausted after the first run and no longer yields results. If the pipeline needs to be run multiple times, pass the pool function itself as a step (with its arguments supplied through the params dictionary or bound with the partial function), so that a new iterator is created at the start of each run.
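The exhaustion behavior is a general property of Python iterators rather than something specific to autora. The sketch below shows it with itertools.product; make_pool is a hypothetical name used only here.

```python
from itertools import product


def make_pool():
    """Return a fresh iterator over all pairs each time it is called."""
    return product([0, 1], [0, 1])


# Case 1: a single iterator instance shared across runs.
shared = make_pool()
first_run = list(shared)
second_run = list(shared)  # the iterator is already consumed
print(len(first_run), len(second_run))  # 4 0

# Case 2: calling the function at the start of each run.
fresh_first = list(make_pool())
fresh_second = list(make_pool())
print(len(fresh_first), len(fresh_second))  # 4 4
```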
You could also use the scikit-learn "__" syntax to pass parameter sets into the pipeline:
pipeline_random_samp = Pipeline(
    [
        ("grid_pool", grid_pool),
        ("weber_filer", weber_filter),  # Filter that selects conditions with IV1 <= IV2
        ("random_sample", random_sample),
    ],
    {"grid_pool__ivs": metadata.independent_variables, "random_sample__n": 10},
)
pipeline_random_samp
Pipeline(steps=[('grid_pool', <function grid_pool at 0x1077bdf70>), ('weber_filer', <function weber_filter at 0x1077c8550>), ('random_sampler', <function random_sampler at 0x1077c8160>)], params={'grid_pool__ivs': [IV(name='S1', value_range=None, allowed_values=array([0. , 1.25, 2.5 , 3.75, 5. ]), units='intensity', type=<ValueType.REAL: 'real'>, variable_label='Stimulus 1 Intensity', rescale=1, is_covariate=False), IV(name='S2', value_range=None, allowed_values=array([0. , 1.25, 2.5 , 3.75, 5. ]), units='intensity', type=<ValueType.REAL: 'real'>, variable_label='Stimulus 2 Intensity', rescale=1, is_covariate=False)], 'random_sampler__n': 10})
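The correspondence between the "__" keys and the nested dictionary used earlier can be sketched as follows. How autora resolves these keys internally is not shown in this notebook, so nest_params is a hypothetical helper that only illustrates the mapping, with placeholder strings standing in for the IV objects.

```python
def nest_params(flat):
    """Hypothetical helper: turn {"step__param": v} into {"step": {"param": v}}."""
    nested = {}
    for key, value in flat.items():
        # Split on the first "__": the left part names the step, the right the parameter.
        step, _, param = key.partition("__")
        nested.setdefault(step, {})[param] = value
    return nested


flat = {"grid_pool__ivs": ["S1", "S2"], "random_sample__n": 10}
nested = nest_params(flat)
print(nested)
# {'grid_pool': {'ivs': ['S1', 'S2']}, 'random_sample': {'n': 10}}
```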