autora.experimentalist.bandit_random

Experimentalist that returns probability sequences (sequences of vectors with elements between 0 and 1) or reward sequences (sequences of vectors with binary elements).

pool(num_rewards, sequence_length, initial_probabilities=None, sigmas=None, num_samples=1, random_state=None)

Returns a list of reward sequences. A reward sequence is a sequence of vectors of dimension num_rewards. Each entry of this vector is either 0 or 1. We can set a fixed initial value for the reward probability of the first vector of each sequence and a constant drift rate. We can also set a range to randomly sample these values.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| num_rewards | int | The number of rewards / the dimension of each element of the sequence | required |
| sequence_length | int | The length of the sequence | required |
| initial_probabilities | Optional[Iterable[Union[float, Iterable]]] | A list of initial reward probabilities. Each entry can be a range. | None |
| sigmas | Optional[Iterable[Union[float, Iterable]]] | A list of constant drift rates for each element of the probabilities. Each entry can be a range. The drift rate is defined as change per step. | None |
| num_samples | int | Number of experimental conditions to select | 1 |
| random_state | Optional[int] | The seed value for the random number generator | None |

Returns: Sampled pool of experimental conditions

Examples:

We create a reward sequence for a two-armed bandit task. The reward probabilities for each arm default to .5 and are constant.

>>> pool(num_rewards=2, sequence_length=3, num_samples=1, random_state=42)
[[[1, 0], [1, 1], [0, 1]]]

If we want more arms:

>>> pool(num_rewards=4, sequence_length=3, num_samples=1, random_state=42)
[[[1, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]]]

A longer sequence:

>>> pool(num_rewards=2, sequence_length=5, num_samples=1, random_state=42)
[[[1, 0], [1, 1], [0, 1], [1, 1], [0, 0]]]

More sequences:

>>> pool(num_rewards=2, sequence_length=3, num_samples=2, random_state=42)
[[[1, 0], [1, 1], [0, 1]], [[1, 1], [0, 0], [0, 1]]]

We can set fixed initial values:

>>> pool(num_rewards=2, sequence_length=3,
...     initial_probabilities=[0.,.4],
...     random_state=42)
[[[0, 0], [0, 1], [0, 1]]]

And drift rates:

>>> pool(num_rewards=2, sequence_length=3,
...     initial_probabilities=[0.,.4],
...     sigmas=[.2, .3],
...     random_state=42)
[[[0, 0], [0, 1], [0, 1]]]

We can also sample the initial values by passing a range:

>>> pool(num_rewards=2, sequence_length=3,
...     initial_probabilities=[[0, .2],[.8, 1.]],
...     sigmas=[[0., .2], [0., .3]],
...     random_state=42)
[[[0, 1], [1, 1], [0, 1]]]
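The returned pool is nested as num_samples × sequence_length × num_rewards. A quick shape check on the two-sample output shown in the examples above:

```python
# Output of pool(num_rewards=2, sequence_length=3, num_samples=2, random_state=42),
# copied from the example above.
samples = [[[1, 0], [1, 1], [0, 1]], [[1, 1], [0, 0], [0, 1]]]

assert len(samples) == 2        # num_samples: one entry per experimental condition
assert len(samples[0]) == 3     # sequence_length: one reward vector per trial
assert len(samples[0][0]) == 2  # num_rewards: one binary reward per arm
```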
Source code in temp_dir/bandit-random/src/autora/experimentalist/bandit_random/__init__.py
def pool(
        num_rewards: int,
        sequence_length: int,
        initial_probabilities: Optional[Iterable[Union[float, Iterable]]] = None,
        sigmas: Optional[Iterable[Union[float, Iterable]]] = None,
        num_samples: int = 1,
        random_state: Optional[int] = None,
) -> List[List[List[float]]]:
    """
    Returns a list of reward sequences.
    A reward sequence is a sequence of vectors of dimension `num_rewards`. Each entry
    of this vector is either 0 or 1.
    We can set a fixed initial value for the reward probability of the first vector of each
    sequence and a constant drift rate.
    We can also set a range to randomly sample these values.


    Args:
        num_rewards: The number of rewards / the dimension of each element of the sequence
        sequence_length: The length of the sequence
        initial_probabilities: A list of initial reward probabilities. Each
            entry can be a range.
        sigmas: A list of constant drift rates for each element of the probabilities. Each
            entry can be a range. The drift rate is defined as change per step.
        num_samples: number of experimental conditions to select
        random_state: the seed value for the random number generator
    Returns:
        Sampled pool of experimental conditions

    Examples:
        We create a reward sequence for a two-armed bandit task. The reward
        probabilities for each arm default to .5 and are constant.
        >>> pool(num_rewards=2, sequence_length=3, num_samples=1, random_state=42)
        [[[1, 0], [1, 1], [0, 1]]]

        If we want more arms:
        >>> pool(num_rewards=4, sequence_length=3, num_samples=1, random_state=42)
        [[[1, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]]]

        A longer sequence:
        >>> pool(num_rewards=2, sequence_length=5, num_samples=1, random_state=42)
        [[[1, 0], [1, 1], [0, 1], [1, 1], [0, 0]]]

        More sequences:
        >>> pool(num_rewards=2, sequence_length=3, num_samples=2, random_state=42)
        [[[1, 0], [1, 1], [0, 1]], [[1, 1], [0, 0], [0, 1]]]

        We can set fixed initial values:
        >>> pool(num_rewards=2, sequence_length=3,
        ...     initial_probabilities=[0.,.4],
        ...     random_state=42)
        [[[0, 0], [0, 1], [0, 1]]]

        And drift rates:
        >>> pool(num_rewards=2, sequence_length=3,
        ...     initial_probabilities=[0.,.4],
        ...     sigmas=[.2, .3],
        ...     random_state=42)
        [[[0, 0], [0, 1], [0, 1]]]

        We can also sample the initial values by passing a range:
        >>> pool(num_rewards=2, sequence_length=3,
        ...     initial_probabilities=[[0, .2],[.8, 1.]],
        ...     sigmas=[[0., .2], [0., .3]],
        ...     random_state=42)
        [[[0, 1], [1, 1], [0, 1]]]
    """
    _sequence = pool_proba(num_rewards,
                           sequence_length,
                           initial_probabilities,
                           sigmas,
                           num_samples,
                           random_state)
    return pool_from_proba(_sequence, random_state)

pool_from_proba(probability_sequence, random_state=None)

From a given probability sequence, sample rewards (0 or 1).

Example:

>>> proba_sequence = pool_proba(num_probabilities=2, sequence_length=3,
...     initial_probabilities=[.2,.8],
...     sigmas=[.2, .1], random_state=42)
>>> proba_sequence
[[[0.2, 0.8], [0.26094341595088627, 0.8750451195806458], [0.05294659470278715, 0.9691015912197671]]]
>>> pool_from_proba(proba_sequence, 42)
[[[0, 1], [1, 1], [0, 1]]]
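The sampling step can be sketched in plain Python: each probability p becomes a binary reward that is 1 with probability p. This is a minimal sketch of the idea only, not the library's implementation (which works on NumPy arrays), and `sample_rewards` is a hypothetical name:

```python
import random

def sample_rewards(probability_sequence, seed=None):
    """Draw a Bernoulli reward (0 or 1) for every probability in the
    nested num_samples x sequence_length x num_probabilities structure."""
    rng = random.Random(seed)
    return [[[1 if rng.random() < p else 0 for p in vector]
             for vector in sequence]
            for sequence in probability_sequence]

# Probabilities of 0 and 1 give deterministic rewards:
print(sample_rewards([[[0.0, 1.0], [0.0, 1.0]]]))  # [[[0, 1], [0, 1]]]
```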

Source code in temp_dir/bandit-random/src/autora/experimentalist/bandit_random/__init__.py
def pool_from_proba(
        probability_sequence: Iterable,
        random_state: Optional[int] = None,
) -> List[List[List[float]]]:
    """
    From a given probability sequence, sample rewards (0 or 1).

    Example:
        >>> proba_sequence = pool_proba(num_probabilities=2, sequence_length=3,
        ...     initial_probabilities=[.2,.8],
        ...     sigmas=[.2, .1], random_state=42)
        >>> proba_sequence
        [[[0.2, 0.8], [0.26094341595088627, 0.8750451195806458], \
[0.05294659470278715, 0.9691015912197671]]]
        >>> pool_from_proba(proba_sequence, 42)
        [[[0, 1], [1, 1], [0, 1]]]
    """
    rng = np.random.default_rng(random_state)
    probability_sequence_array = _sample_from_probabilities(probability_sequence, rng)
    probability_sequence_lst = [el for el in probability_sequence_array]
    return probability_sequence_lst

pool_proba(num_probabilities, sequence_length, initial_probabilities=None, sigmas=None, num_samples=1, random_state=None)

Returns a list of probability sequences. A probability sequence is a sequence of vectors of dimension num_probabilities. Each entry of this vector is a number between 0 and 1. We can set a fixed initial value for the first vector of each sequence and a constant drift rate. We can also set a range to randomly sample these values.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| num_probabilities | int | The number of probabilities / the dimension of each element of the sequence | required |
| sequence_length | int | The length of the sequence | required |
| initial_probabilities | Optional[Iterable[Union[float, Iterable]]] | A list of initial values for each element of the probabilities. Each entry can be a range. | None |
| sigmas | Optional[Iterable[Union[float, Iterable]]] | A list of sigmas of the normal distribution for the drift rate of each arm. Each entry can be a range to be sampled from. The drift rate is defined as change per step. | None |
| num_samples | int | Number of experimental conditions to select | 1 |
| random_state | Optional[int] | The seed value for the random number generator | None |

Returns: Sampled pool of experimental conditions

Examples:

We create a reward probability sequence for a two-armed bandit task. The reward probabilities for each arm default to .5 and are constant.

>>> pool_proba(num_probabilities=2, sequence_length=3, num_samples=1, random_state=42)
[[[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]]

If we want more arms:

>>> pool_proba(num_probabilities=4, sequence_length=3, num_samples=1, random_state=42)
[[[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]]

A longer sequence:

>>> pool_proba(num_probabilities=2, sequence_length=5, num_samples=1, random_state=42)
[[[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]]

More sequences:

>>> pool_proba(num_probabilities=2, sequence_length=3, num_samples=2, random_state=42)
[[[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]]

We can set fixed initial values:

>>> pool_proba(num_probabilities=2, sequence_length=3,
...     initial_probabilities=[0.,.4], random_state=42)
[[[0.0, 0.4], [0.0, 0.4], [0.0, 0.4]]]

And drift rates:

>>> pool_proba(num_probabilities=2, sequence_length=3,
...     initial_probabilities=[0.,.4],
...     sigmas=[.1, .5], random_state=42)
[[[0.0, 0.4], [0.030471707975443137, 0.7752255979032286], [0.0, 1.0]]]

We can also sample the initial values by passing a range:

>>> pool_proba(num_probabilities=2, sequence_length=3,
...     initial_probabilities=[[0, .2],[.8, 1.]],
...     sigmas=[[0., .25], [0., .5]],
...     random_state=42)
[[[0.15479120971119267, 0.81883546957753], [0.23713042219259264, 0.8811974469636589], [0.34032881599649456, 0.7269307761486841]]]
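The drift described above amounts to a Gaussian random walk clipped to [0, 1]: each step adds a draw from N(0, sigma) and clamps the result to a valid probability. A minimal sketch under that reading (`drift_walk` is a hypothetical helper, not part of the package):

```python
import random

def drift_walk(start, sigma, length, seed=None):
    """Random walk of a reward probability: start value, then
    `length - 1` steps of N(0, sigma) noise, clipped to [0, 1]."""
    rng = random.Random(seed)
    value = start
    walk = [value]
    for _ in range(length - 1):
        value += rng.gauss(0.0, sigma)
        value = max(0.0, min(value, 1.0))  # keep it a valid probability
        walk.append(value)
    return walk

# sigma=0 reproduces the constant default behaviour:
print(drift_walk(0.5, 0.0, 3))  # [0.5, 0.5, 0.5]
```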
Source code in temp_dir/bandit-random/src/autora/experimentalist/bandit_random/__init__.py
def pool_proba(
        num_probabilities: int,
        sequence_length: int,
        initial_probabilities: Optional[Iterable[Union[float, Iterable]]] = None,
        sigmas: Optional[Iterable[Union[float, Iterable]]] = None,
        num_samples: int = 1,
        random_state: Optional[int] = None,
) -> List[List[List[float]]]:
    """
    Returns a list of probability sequences.
    A probability sequence is a sequence of vectors of dimension `num_probabilities`. Each entry
    of this vector is a number between 0 and 1.
    We can set a fixed initial value for the first vector of each sequence and a constant
    drift rate.
    We can also set a range to randomly sample these values.


    Args:
        num_probabilities: The number of probabilities / the dimension of each element
            of the sequence
        sequence_length: The length of the sequence
        initial_probabilities: A list of initial values for each element of the
            probabilities. Each entry can be a range.
        sigmas: A list of sigmas of the normal distribution for the drift rate of each arm. Each
            entry can be a range to be sampled from. The drift rate is defined as change per step.
        num_samples: number of experimental conditions to select
        random_state: the seed value for the random number generator
    Returns:
        Sampled pool of experimental conditions

    Examples:
        We create a reward probability sequence for a two-armed bandit task. The reward
        probabilities for each arm default to .5 and are constant.
        >>> pool_proba(num_probabilities=2, sequence_length=3, num_samples=1, random_state=42)
        [[[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]]

        If we want more arms:
        >>> pool_proba(num_probabilities=4, sequence_length=3, num_samples=1, random_state=42)
        [[[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]]

        A longer sequence:
        >>> pool_proba(num_probabilities=2, sequence_length=5, num_samples=1, random_state=42)
        [[[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]]

        More sequences:
        >>> pool_proba(num_probabilities=2, sequence_length=3, num_samples=2, random_state=42)
        [[[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]]

        We can set fixed initial values:
        >>> pool_proba(num_probabilities=2, sequence_length=3,
        ...     initial_probabilities=[0.,.4], random_state=42)
        [[[0.0, 0.4], [0.0, 0.4], [0.0, 0.4]]]

        And drift rates:
        >>> pool_proba(num_probabilities=2, sequence_length=3,
        ...     initial_probabilities=[0.,.4],
        ...     sigmas=[.1, .5], random_state=42)
        [[[0.0, 0.4], [0.030471707975443137, 0.7752255979032286], [0.0, 1.0]]]

        We can also sample the initial values by passing a range:
        >>> pool_proba(num_probabilities=2, sequence_length=3,
        ...     initial_probabilities=[[0, .2],[.8, 1.]],
        ...     sigmas=[[0., .25], [0., .5]],
        ...     random_state=42)
        [[[0.15479120971119267, 0.81883546957753], \
[0.23713042219259264, 0.8811974469636589], \
[0.34032881599649456, 0.7269307761486841]]]
    """
    rng = np.random.default_rng(random_state)
    if initial_probabilities:
        assert len(initial_probabilities) == num_probabilities
    else:
        initial_probabilities = [.5 for _ in range(num_probabilities)]
    if sigmas:
        assert len(sigmas) == num_probabilities
    else:
        sigmas = [0 for _ in range(num_probabilities)]
    res = []
    for _ in range(num_samples):
        seq = []
        for idx, el in enumerate(initial_probabilities):

            if _is_iterable(el):
                start = rng.uniform(el[0], el[1])
            else:
                start = el
            if _is_iterable(sigmas[idx]):
                sigma = rng.uniform(sigmas[idx][0], sigmas[idx][1])
            else:
                sigma = sigmas[idx]
            prob = [start]
            for _ in range(sequence_length - 1):
                start += rng.normal(loc=0, scale=sigma)
                start = max(0., min(start, 1.))
                prob.append(start)
            seq.append(prob)
        res.append(seq)
    for idx in range(len(res)):
        res[idx] = _transpose_matrix(res[idx])
    return res