Basic Usage
Here, we show how to randomly sample a sequence of rewards that can be used in a bandit task. A bandit task provides the participant with multiple options (arms), each with its own reward probability. Below, we create sequences of reward probabilities and rewards for a 2-arm bandit task.
import pandas as pd
from autora.experimentalist.bandit_random import bandit_random_pool_proba, \
bandit_random_pool_from_proba, bandit_random_pool
This package provides functions to randomly sample a list of
- probability sequences
- reward sequences
Pool_proba
First, we can use the default values to create a sequence in which the reward probability is .5 for each arm. We only need to pass in the number of arms and the length of the sequence we want to generate:
default_probability_sequences = bandit_random_pool_proba(num_probabilities=2, sequence_length=4)
default_probability_sequences
[[[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]]
We can also set the initial probabilities:
constant_probability_sequence = bandit_random_pool_proba(num_probabilities=2, sequence_length=4, initial_probabilities=[.1, .9])
constant_probability_sequence
[[[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.1, 0.9]]]
We can do the same for drift rates, which are added to the probabilities on each trial:
changing_probability_sequence = bandit_random_pool_proba(num_probabilities=2, sequence_length=4, initial_probabilities=[0.1, .9], drift_rates=[.1, -.1])
changing_probability_sequence
[[[0.1, 0.9], [0.2, 0.8], [0.30000000000000004, 0.7000000000000001], [0.4, 0.6000000000000001]]]
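The drift rates are simply added to the probabilities trial by trial, so the sequence above can be reproduced by hand. A minimal sketch (not part of the package) for illustration:
# Reproduce the drifted probabilities manually: each trial adds the drift rate
# to the previous trial's probabilities (floating-point rounding included).
probs = [.1, .9]
drifts = [.1, -.1]
manual_sequence = []
for _ in range(4):
    manual_sequence.append(list(probs))
    probs = [p + d for p, d in zip(probs, drifts)]
manual_sequence
[[0.1, 0.9], [0.2, 0.8], [0.30000000000000004, 0.7000000000000001], [0.4, 0.6000000000000001]]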
Instead of fixed initial values and drift rates, we can also sample them from ranges by passing [min, max] pairs:
random_probability_sequence = bandit_random_pool_proba(num_probabilities=2, sequence_length=4, initial_probabilities=[[0.,.1], [.8, 1.]], drift_rates=[[0,.1],[-.1,0]])
random_probability_sequence
[[[0.015097359670462374, 0.975340226214809], [0.08820028316160722, 0.954897827401469], [0.16130320665275205, 0.934455428588129], [0.23440613014389688, 0.914013029774789]]]
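Since the initial probabilities are now sampled, the exact values will differ on every call, but the first trial should always fall inside the specified ranges. An illustrative check (not part of the package):
# The first trial of the sampled sequence should respect the [min, max] ranges passed above.
first_trial = random_probability_sequence[0][0]
0. <= first_trial[0] <= .1, .8 <= first_trial[1] <= 1.
(True, True)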
We pass in the number of sequences to generate as num_samples:
random_probability_sequence = bandit_random_pool_proba(num_probabilities=2, sequence_length=4, initial_probabilities=[[0.,.1], [.8, 1.]], drift_rates=[[0,.1],[-.1,0]], num_samples=4)
random_probability_sequence
[[[0.05277517161024149, 0.9157516813620797], [0.06601093773147637, 0.8512254219581823], [0.07924670385271125, 0.786699162554285], [0.09248246997394613, 0.7221729031503876]], [[0.03308990634055732, 0.8608567922155729], [0.05423027527564794, 0.8348824396142384], [0.07537064421073857, 0.8089080870129038], [0.09651101314582919, 0.7829337344115693]], [[0.05228116419768012, 0.9571430988304549], [0.10872837330001228, 0.922489870191641], [0.16517558240234442, 0.887836641552827], [0.22162279150467656, 0.8531834129140131]], [[0.017985053533171515, 0.9696895439983294], [0.07759069582130446, 0.9603867806583171], [0.13719633810943738, 0.9510840173183047], [0.1968019803975703, 0.9417812539782924]]]
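The returned list now contains one entry per requested sample:
# one independently sampled probability sequence per sample
len(random_probability_sequence)
4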
We can use the created probability sequences to create reward sequences:
reward_sequences = bandit_random_pool_from_proba(random_probability_sequence)
reward_sequences
[[[0, 1], [0, 1], [0, 1], [0, 1]], [[0, 1], [0, 1], [0, 1], [0, 0]], [[0, 1], [0, 1], [0, 1], [0, 0]], [[0, 1], [0, 0], [1, 1], [0, 1]]]
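The rewards are sampled from the given probabilities, so averaging the rewards per arm over a long sequence should roughly recover the underlying probabilities. A quick, illustrative sanity check (variable names are ours; exact values vary from run to run):
# With constant probabilities [.2, .8] over 100 trials, the per-arm reward
# frequencies should be close to .2 and .8.
long_probabilities = bandit_random_pool_proba(
    num_probabilities=2, sequence_length=100, initial_probabilities=[.2, .8])
long_rewards = bandit_random_pool_from_proba(long_probabilities)[0]
arm_1_rate = sum(trial[0] for trial in long_rewards) / len(long_rewards)
arm_2_rate = sum(trial[1] for trial in long_rewards) / len(long_rewards)
arm_1_rate, arm_2_rate  # expected to be roughly (.2, .8)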
Or, we can use bandit_random_pool with the same arguments as bandit_random_pool_proba (using num_rewards in place of num_probabilities) to generate reward sequences directly:
reward_sequences = bandit_random_pool(num_rewards=2, sequence_length=4, initial_probabilities=[[0.,.1], [.8, 1.]], drift_rates=[[0,.1],[-.1,0]], num_samples=4)
reward_sequences
[[[0, 0], [0, 1], [0, 1], [1, 1]], [[0, 1], [0, 1], [0, 1], [1, 1]], [[0, 1], [0, 1], [0, 1], [1, 1]], [[0, 1], [0, 0], [0, 1], [0, 1]]]
Use in State
!!! Warning: If you want to use this in the AutoRA StandardState, you need to convert the return value into a pd.DataFrame:
# First, we define the variables:
from autora.variable import VariableCollection, Variable
variables = VariableCollection(
independent_variables=[Variable(name="reward-trajectory")],
dependent_variables=[Variable(name="choice-trajectory")]
)
# With these variables, we initialize a StandardState
from autora.state import StandardState
state = StandardState(variables=variables)
# Here, we create random reward-sequences directly as an on_state function
from autora.state import Delta, on_state
@on_state()
def pool_on_state(num_rewards=2, sequence_length=10, num_samples=1, initial_probabilities=None,
drift_rates=None):
sequence_as_list = bandit_random_pool(
num_rewards=num_rewards, sequence_length=sequence_length, num_samples=num_samples,
initial_probabilities=initial_probabilities, drift_rates=drift_rates)
    # the conditions field of the state expects a pandas DataFrame
sequence_as_df = pd.DataFrame({"reward-trajectory": sequence_as_list})
return Delta(conditions=sequence_as_df)
# Now we can run pool_on_state on the state to create conditions:
state = pool_on_state(state)
state.conditions
|   | reward-trajectory |
|---|---|
| 0 | [[0, 1], [0, 1], [1, 0], [0, 0], [1, 1], [0, 1... |
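Each cell of the conditions DataFrame holds a complete reward trajectory, i.e., one [reward arm 1, reward arm 2] pair per trial. For illustration, we can inspect the first trajectory (the rewards themselves vary from run to run):
# Illustrative: one trajectory with sequence_length (here the default of 10) trials,
# each trial holding one reward per arm
first_trajectory = state.conditions["reward-trajectory"].iloc[0]
len(first_trajectory), first_trajectory[0]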
We can pass keyword arguments to the on_state function as well. Here, we create 3 sequences with initial values for the first arm between 0 and .3 and for the second arm between .7 and 1. The drift rates are sampled between 0 and .1 and between -.1 and 0, respectively:
state = pool_on_state(state, num_samples=3, initial_probabilities=[[0, .3], [.7, 1.]], drift_rates=[[0, .1], [-.1,0]])
state.conditions
|   | reward-trajectory |
|---|---|
| 0 | [[1, 1], [1, 1], [0, 0], [1, 0], [1, 0], [0, 0... |
| 1 | [[1, 0], [0, 1], [1, 1], [1, 1], [0, 0], [0, 0... |
| 2 | [[0, 1], [0, 0], [0, 0], [1, 0], [0, 1], [0, 1... |
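To use these conditions as reward sequences in an experiment (for example, to hand them to an experiment runner), the trajectories can be extracted back into plain Python lists. A minimal, illustrative sketch:
# Illustrative: recover the raw reward trajectories from the conditions DataFrame
trajectories = state.conditions["reward-trajectory"].tolist()
len(trajectories)  # one trajectory per requested sample, here 3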