Experimentalist Pipeline Examples

This notebook demonstrates the use of the Pipeline class to create Experimentalists. Experimentalists consist of two main components:

Condition Generation - Creating combinations of independent variables to test
Experimental Design - Ensuring conditions meet design constraints

The Pipeline class allows us to define a series of functions to generate and process a pool of conditions that conform to an experimental design.

In [ ]:

# Uncomment the following line when running on Google Colab
# !pip install autora

In [ ]:

import numpy as np

from autora.variable import DV, IV, ValueType, VariableCollection
from autora.experimentalist.pipeline import Pipeline
from autora.experimentalist.pooler.grid import grid_pool
from autora.experimentalist.sampler.random_sampler import random_sample

In [ ]:

def weber_filter(values):
    # Keep only conditions (S1, S2) where the first stimulus does not exceed the second
    return filter(lambda s: s[0] <= s[1], values)

Implementation

The Pipeline class consists of a series of steps:

At most one "pool" step, which generates experimental conditions,
An arbitrary number of steps that are applied to the pool. Examples of such steps are:
- samplers
- conditional filters
- sequencers
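
The step structure above can be sketched in plain Python. This is an illustrative stand-in, not AutoRA's implementation: `run_steps`, `toy_pool`, `keep_ordered`, and `take_first` are hypothetical names, and the sketch assumes that the pool step takes no input while every later step receives the previous step's output.

```python
from itertools import product


def toy_pool():
    """Hypothetical pool step: yields every (s1, s2) pair from two small value sets."""
    return product([0, 1, 2], [0, 1, 2])


def keep_ordered(conditions):
    """Hypothetical filter step: keeps pairs whose first value does not exceed the second."""
    return (c for c in conditions if c[0] <= c[1])


def take_first(conditions, n):
    """Hypothetical sampler step: takes the first n conditions (deterministic for clarity)."""
    return list(conditions)[:n]


def run_steps(pool, steps):
    """Call the pool step, then pass its output through each later step in order."""
    result = pool()
    for step, kwargs in steps:
        result = step(result, **kwargs)
    return result


conditions = run_steps(toy_pool, [(keep_ordered, {}), (take_first, {"n": 4})])
print(conditions)
```

Each step only needs to accept an iterable of conditions and return another iterable, which is what lets samplers, filters, and sequencers be chained in any order.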

Example 1: Exhaustive Pool With Random Sampler

The examples in this notebook set up a Weber line-length experiment, which tests whether people can detect a difference between the lengths of two lines. The first example samples a pool with simple random sampling. We first define the independent and dependent variables (IVs and DVs, respectively).

In [ ]:

# Specify dependent and independent variables
# Specify independent variables
iv1 = IV(
    name="S1",
    allowed_values=np.linspace(0, 5, 5),
    units="intensity",
    variable_label="Stimulus 1 Intensity",
)

iv2 = IV(
    name="S2",
    allowed_values=np.linspace(0, 5, 5),
    units="intensity",
    variable_label="Stimulus 2 Intensity",
)

# The experimentalist pipeline doesn't actually use DVs; they are specified here
# only as an example.
dv1 = DV(
    name="difference_detected",
    value_range=(0, 1),
    units="probability",
    variable_label="P(difference detected)",
    type=ValueType.PROBABILITY,
)

# Variable collection with ivs and dvs
metadata = VariableCollection(
    independent_variables=[iv1, iv2],
    dependent_variables=[dv1],
)

Next we set up the Pipeline with three functions:

grid_pool - Generates an exhaustive pool of condition combinations using the Cartesian product of discrete IV values.
- The discrete IV values are specified with the allowed_values attribute when defining the IVs.
weber_filter - Filter that enforces the experimental design constraint IV1 <= IV2.
random_sample - Samples the pool of conditions
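
As a rough check on the sizes involved at each of these three stages, the pool, filter, and sample can be reproduced with the standard library alone. This sketch does not call AutoRA; it assumes only what is described here (a Cartesian product of the allowed values, the constraint S1 <= S2, and a sample of 10 conditions), and it uses `random.sample` as a stand-in for `random_sample`.

```python
import random
from itertools import product

# The five allowed values produced by np.linspace(0, 5, 5)
allowed_values = [0.0, 1.25, 2.5, 3.75, 5.0]

# Exhaustive pool: every (S1, S2) combination
pool = list(product(allowed_values, allowed_values))

# Design constraint: keep conditions where S1 <= S2
filtered = [s for s in pool if s[0] <= s[1]]

# Stand-in for the sampling step: draw 10 conditions without replacement
sampled = random.sample(filtered, 10)

print(len(pool), len(filtered), len(sampled))  # 25 15 10
```

The filter removes 10 of the 25 combinations, so a sample of 10 is drawn from the 15 conditions that satisfy the constraint.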

Functions that require keyword arguments receive them through the params dictionary, which is passed as the second argument to Pipeline and keyed by step name.

In [ ]:

## Set up the pipeline; keyword arguments are passed through the params dictionary

# Initialize the pipeline
pipeline_random_samp = Pipeline([
    ("grid_pool", grid_pool),
    ("weber_filer", weber_filter), # Filter that selects conditions with IV1 <= IV2
    ("random_sample", random_sample)
],
    {"grid_pool": {"ivs": metadata.independent_variables}, "random_sample": {"n": 10}}
)
pipeline_random_samp

Out[ ]:

Pipeline(steps=[('grid_pool', <function grid_pool at 0x1077bdf70>), ('weber_filer', <function weber_filter at 0x1077c8550>), ('random_sampler', <function random_sampler at 0x1077c8160>)], params={'grid_pool': {'ivs': [IV(name='S1', value_range=None, allowed_values=array([0.  , 1.25, 2.5 , 3.75, 5.  ]), units='intensity', type=<ValueType.REAL: 'real'>, variable_label='Stimulus 1 Intensity', rescale=1, is_covariate=False), IV(name='S2', value_range=None, allowed_values=array([0.  , 1.25, 2.5 , 3.75, 5.  ]), units='intensity', type=<ValueType.REAL: 'real'>, variable_label='Stimulus 2 Intensity', rescale=1, is_covariate=False)]}, 'random_sampler': {'n': 10}})

The pipeline can be run by calling the run method.

The pipeline is run twice below to illustrate that random sampling is performed. Rerunning the cell will produce different results.

In [ ]:

# Run the Pipeline
results1 = pipeline_random_samp.run()
results2 = pipeline_random_samp.run()
print('Sampled Conditions:')
print(f' Run 1: {results1}\n',
      f'Run 2: {results2}')

Sampled Conditions:
 Run 1: [(3.75, 3.75), (1.25, 1.25), (1.25, 2.5), (3.75, 5.0), (2.5, 2.5), (1.25, 5.0), (2.5, 3.75), (0.0, 2.5), (0.0, 1.25), (5.0, 5.0)]
 Run 2: [(0.0, 5.0), (2.5, 5.0), (5.0, 5.0), (1.25, 1.25), (2.5, 3.75), (0.0, 1.25), (1.25, 3.75), (3.75, 3.75), (0.0, 0.0), (1.25, 2.5)]

An alternative method, passing an already-instantiated pool iterator, is demonstrated below. Here grid_pool is not passed to the Pipeline as a function; instead it is called before the Pipeline is initialized, and the iterator over the exhaustive pool that it returns is used as the first step. This leads to unexpected behavior when the Pipeline is run more than once.

In [ ]:

## Set up the pipeline with an already-instantiated pool iterator
# Pool Function
pooler_iterator = grid_pool(metadata.independent_variables)

# Initialize the pipeline
pipeline_random_samp2 = Pipeline(
    [
        ("pool (iterator)", pooler_iterator),
        ("filter", weber_filter), # Filter that selects conditions with IV1 <= IV2
        ("sample", random_sample) # Sampler defined in the first implementation example
    ],
    {"sample": {"n": 10}}
)
# Run the Pipeline
results1 = pipeline_random_samp2.run()
results2 = pipeline_random_samp2.run()
print('Sampled Conditions:')
print(f' Run 1: {results1}\n',
      f'Run 2: {results2}')

Sampled Conditions:
 Run 1: [(1.25, 1.25), (0.0, 5.0), (1.25, 5.0), (2.5, 3.75), (1.25, 2.5), (5.0, 5.0), (2.5, 5.0), (1.25, 3.75), (0.0, 1.25), (2.5, 2.5)]
 Run 2: []

Running the pipeline a second time results in an empty list. This is because the iterator is exhausted after the first run and no longer yields results. If the pipeline needs to be run multiple times, passing the pool function itself as a callable (with its arguments supplied through the params dictionary, or bound with functools.partial) is recommended, because the iterator is then initialized at the start of each run.
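
The exhaustion behavior is a general property of Python iterators and can be reproduced without AutoRA. In this sketch, `make_pool` is a hypothetical stand-in for a pool function that returns an iterator.

```python
from itertools import product


def make_pool():
    """Hypothetical pool function: returns a fresh iterator each time it is called."""
    values = [0.0, 2.5, 5.0]
    return product(values, values)


# Case 1: one iterator created up front and reused
shared_iterator = make_pool()
first_pass = list(shared_iterator)
second_pass = list(shared_iterator)  # nothing left to yield
print(len(first_pass), len(second_pass))  # 9 0

# Case 2: the function is called at the start of every pass
fresh_first = list(make_pool())
fresh_second = list(make_pool())
print(len(fresh_first), len(fresh_second))  # 9 9
```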

You can also use the scikit-learn-style "__" syntax to pass parameters into the pipeline:

In [ ]:

pipeline_random_samp = Pipeline([
    ("grid_pool", grid_pool),
    ("weber_filer", weber_filter), # Filter that selects conditions with IV1 <= IV2
    ("random_sample", random_sample)
],
    {"grid_pool__ivs": metadata.independent_variables, "random_sample__n": 10}
)
pipeline_random_samp

Out[ ]:

Pipeline(steps=[('grid_pool', <function grid_pool at 0x1077bdf70>), ('weber_filer', <function weber_filter at 0x1077c8550>), ('random_sampler', <function random_sampler at 0x1077c8160>)], params={'grid_pool__ivs': [IV(name='S1', value_range=None, allowed_values=array([0.  , 1.25, 2.5 , 3.75, 5.  ]), units='intensity', type=<ValueType.REAL: 'real'>, variable_label='Stimulus 1 Intensity', rescale=1, is_covariate=False), IV(name='S2', value_range=None, allowed_values=array([0.  , 1.25, 2.5 , 3.75, 5.  ]), units='intensity', type=<ValueType.REAL: 'real'>, variable_label='Stimulus 2 Intensity', rescale=1, is_covariate=False)], 'random_sampler__n': 10})
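
One way to read the "__" keys is as a flattened form of the nested params dictionary used in the first example. The helper below, `nest_params`, is a hypothetical illustration of that correspondence and is not taken from AutoRA; it assumes each key is split once, on the first "__", into a step name and a parameter name.

```python
def nest_params(flat_params):
    """Convert {"step__param": value} into {"step": {"param": value}}."""
    nested = {}
    for key, value in flat_params.items():
        # Split only on the first "__" so parameter names may contain it
        step_name, param_name = key.split("__", 1)
        nested.setdefault(step_name, {})[param_name] = value
    return nested


flat = {"grid_pool__ivs": ["S1", "S2"], "random_sample__n": 10}
nested = nest_params(flat)
print(nested)
# {'grid_pool': {'ivs': ['S1', 'S2']}, 'random_sample': {'n': 10}}
```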