bandit-random
This package provides functions to randomly sample a list of
- probability sequences
- reward sequences
Probability sequence
A probability sequence is a sequence of vectors with elements in the range between 0 and 1:
Example for a probability function that can be used in a 3-arm bandit task:
[[0, 1., .3], [.6, .2, .8], ...]
Reward sequence
A reward sequences uses the probabilities to generate a sequence with elements of either 0 or 1:
Example for a probability function that can be used in a 3-arm bandit task:
[[0, 1, 0], [1, 0, 1], ...]
The probability sequence can be created by specifying an initial probability for each element and a drift:
For example:
initial_proba = [0, .5, 1.]
drift = [.1, 0., -.1]
...
sequence = [[0, .5, 1.], [.1, .5, .9], [.2, .5, .8], [.3, .5, .7]...]
Instead of fixed values for the initial probability and the drift, we can also use ranges. In that case the values are randomly sampled from the range.
initial_proa = [[0, .3], [.4, .7], [.8, 1.]]
drift = [[0, .1], [.1, .2], [.2, .3]]