Basic Usage
Here, we show how to randomly sample a sequence of rewards that can be used in a bandit task. A bandit task provides the participant with multiple options (arms), each with its own reward probability. Below, we create sequences of reward probabilities and rewards for a 2-arm bandit task.
import pandas as pd
from autora.experimentalist.bandit_random import bandit_random_pool_proba, \
bandit_random_pool_from_proba, bandit_random_pool
This package provides functions to randomly sample a list of
- probability sequences
- reward sequences
Pool_proba
First, we can use the default values to create a sequence in which the reward probability is .5 for each arm. We only need to pass in the number of arms and the length of the sequence we want to generate:
default_probability_sequences = bandit_random_pool_proba(num_probabilities=2, sequence_length=4)
default_probability_sequences
[[[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]]
We can also set the initial probabilities:
constant_probability_sequence = bandit_random_pool_proba(num_probabilities=2, sequence_length=4, initial_probabilities=[.1, .9])
constant_probability_sequence
[[[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.1, 0.9]]]
We can do the same for drift rates, which are added to the probabilities on each trial:
changing_probability_sequence = bandit_random_pool_proba(num_probabilities=2, sequence_length=4, initial_probabilities=[0.1, .9], drift_rates=[.1, -.1])
changing_probability_sequence
[[[0.1, 0.9], [0.2, 0.8], [0.30000000000000004, 0.7000000000000001], [0.4, 0.6000000000000001]]]
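The drift rates are simply added to the probabilities trial by trial, so the sequence above can be reproduced by hand. A minimal sketch (not part of the package) for illustration:
# Reproduce the drifted probabilities manually: each trial adds the drift rate
# to the previous trial's probabilities (floating-point rounding included).
probs = [.1, .9]
drifts = [.1, -.1]
manual_sequence = []
for _ in range(4):
    manual_sequence.append(list(probs))
    probs = [p + d for p, d in zip(probs, drifts)]
manual_sequence
[[0.1, 0.9], [0.2, 0.8], [0.30000000000000004, 0.7000000000000001], [0.4, 0.6000000000000001]]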
Instead of fixed initial values and drift rates, we can also sample them from ranges by passing [min, max] pairs:
random_probability_sequence = bandit_random_pool_proba(num_probabilities=2, sequence_length=4, initial_probabilities=[[0.,.1], [.8, 1.]], drift_rates=[[0,.1],[-.1,0]])
random_probability_sequence
[[[0.015097359670462374, 0.975340226214809], [0.08820028316160722, 0.954897827401469], [0.16130320665275205, 0.934455428588129], [0.23440613014389688, 0.914013029774789]]]
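Since the initial probabilities are now sampled, the exact values will differ on every call, but the first trial should always fall inside the specified ranges. An illustrative check (not part of the package):
# The first trial of the sampled sequence should respect the [min, max] ranges passed above.
first_trial = random_probability_sequence[0][0]
0. <= first_trial[0] <= .1, .8 <= first_trial[1] <= 1.
(True, True)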
We pass in the number of sequences to generate as num_samples:
random_probability_sequence = bandit_random_pool_proba(num_probabilities=2, sequence_length=4, initial_probabilities=[[0.,.1], [.8, 1.]], drift_rates=[[0,.1],[-.1,0]], num_samples=4)
random_probability_sequence
[[[0.05277517161024149, 0.9157516813620797], [0.06601093773147637, 0.8512254219581823], [0.07924670385271125, 0.786699162554285], [0.09248246997394613, 0.7221729031503876]], [[0.03308990634055732, 0.8608567922155729], [0.05423027527564794, 0.8348824396142384], [0.07537064421073857, 0.8089080870129038], [0.09651101314582919, 0.7829337344115693]], [[0.05228116419768012, 0.9571430988304549], [0.10872837330001228, 0.922489870191641], [0.16517558240234442, 0.887836641552827], [0.22162279150467656, 0.8531834129140131]], [[0.017985053533171515, 0.9696895439983294], [0.07759069582130446, 0.9603867806583171], [0.13719633810943738, 0.9510840173183047], [0.1968019803975703, 0.9417812539782924]]]
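The returned list now contains one entry per requested sample:
# one independently sampled probability sequence per sample
len(random_probability_sequence)
4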
We can use the created probability sequences to create reward sequences:
reward_sequences = bandit_random_pool_from_proba(random_probability_sequence)
reward_sequences
[[[0, 1], [0, 1], [0, 1], [0, 1]], [[0, 1], [0, 1], [0, 1], [0, 0]], [[0, 1], [0, 1], [0, 1], [0, 0]], [[0, 1], [0, 0], [1, 1], [0, 1]]]
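The rewards are sampled from the given probabilities, so averaging the rewards per arm over a long sequence should roughly recover the underlying probabilities. A quick, illustrative sanity check (variable names are ours; exact values vary from run to run):
# With constant probabilities [.2, .8] over 100 trials, the per-arm reward
# frequencies should be close to .2 and .8.
long_probabilities = bandit_random_pool_proba(
    num_probabilities=2, sequence_length=100, initial_probabilities=[.2, .8])
long_rewards = bandit_random_pool_from_proba(long_probabilities)[0]
arm_1_rate = sum(trial[0] for trial in long_rewards) / len(long_rewards)
arm_2_rate = sum(trial[1] for trial in long_rewards) / len(long_rewards)
arm_1_rate, arm_2_rate  # expected to be roughly (.2, .8)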
Or, we can use bandit_random_pool with the same arguments as bandit_random_pool_proba (using num_rewards in place of num_probabilities) to generate reward sequences directly:
reward_sequences = bandit_random_pool(num_rewards=2, sequence_length=4, initial_probabilities=[[0.,.1], [.8, 1.]], drift_rates=[[0,.1],[-.1,0]], num_samples=4)
reward_sequences
[[[0, 0], [0, 1], [0, 1], [1, 1]], [[0, 1], [0, 1], [0, 1], [1, 1]], [[0, 1], [0, 1], [0, 1], [1, 1]], [[0, 1], [0, 0], [0, 1], [0, 1]]]
Use in State
!!! Warning: If you want to use this in the AutoRA StandardState, you need to convert the return value into a pd.DataFrame:
# First, we define the variables:
from autora.variable import VariableCollection, Variable
variables = VariableCollection(
independent_variables=[Variable(name="reward-trajectory")],
dependent_variables=[Variable(name="choice-trajectory")]
)
# With these variables, we initialize a StandardState
from autora.state import StandardState
state = StandardState(variables=variables)
# Here, we create random reward-sequences directly as an on_state function
from autora.state import Delta, on_state
@on_state()
def pool_on_state(num_rewards=2, sequence_length=10, num_samples=1, initial_probabilities=None,
drift_rates=None):
sequence_as_list = bandit_random_pool(
num_rewards=num_rewards, sequence_length=sequence_length, num_samples=num_samples,
initial_probabilities=initial_probabilities, drift_rates=drift_rates)
    # the conditions field of the state expects a pandas DataFrame
sequence_as_df = pd.DataFrame({"reward-trajectory": sequence_as_list})
return Delta(conditions=sequence_as_df)
# Now we can run pool_on_state on the state to create conditions:
state = pool_on_state(state)
state.conditions
|   | reward-trajectory |
|---|---|
| 0 | [[0, 1], [0, 1], [1, 0], [0, 0], [1, 1], [0, 1... |
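Each cell of the conditions DataFrame holds a complete reward trajectory, i.e., one [reward arm 1, reward arm 2] pair per trial. For illustration, we can inspect the first trajectory (the rewards themselves vary from run to run):
# Illustrative: one trajectory with sequence_length (here the default of 10) trials,
# each trial holding one reward per arm
first_trajectory = state.conditions["reward-trajectory"].iloc[0]
len(first_trajectory), first_trajectory[0]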
We can pass keyword arguments to the on_state function as well. Here, we create 3 sequences with initial values for the first arm between 0 and .3 and for the second arm between .7 and 1. The drift rates are sampled between 0 and .1 and between -.1 and 0, respectively:
state = pool_on_state(state, num_samples=3, initial_probabilities=[[0, .3], [.7, 1.]], drift_rates=[[0, .1], [-.1,0]])
state.conditions
|   | reward-trajectory |
|---|---|
| 0 | [[1, 1], [1, 1], [0, 0], [1, 0], [1, 0], [0, 0... |
| 1 | [[1, 0], [0, 1], [1, 1], [1, 1], [0, 0], [0, 0... |
| 2 | [[0, 1], [0, 0], [0, 0], [1, 0], [0, 1], [0, 1... |
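To use these conditions as reward sequences in an experiment (for example, to hand them to an experiment runner), the trajectories can be extracted back into plain Python lists. A minimal, illustrative sketch:
# Illustrative: recover the raw reward trajectories from the conditions DataFrame
trajectories = state.conditions["reward-trajectory"].tolist()
len(trajectories)  # one trajectory per requested sample, here 3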