autora.experimentalist.random

random_pool = pool module-attribute

Alias for pool

random_sample = sample module-attribute

Alias for sample

pool(variables, num_samples=5, random_state=None, replace=True)

Create a sequence of conditions randomly sampled from independent variables.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| variables | VariableCollection | the description of all the variables in the AER experiment | required |
| num_samples | int | the number of conditions to produce | 5 |
| random_state | Optional[int] | the seed value for the random number generator | None |
| replace | bool | if True, allow repeated values | True |

Returns: the generated conditions as a dataframe
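
How num_samples and replace interact for a discrete variable can be sketched with NumPy alone. This is a minimal illustration, assuming (as the source listing on this page shows) that values are drawn from allowed_values with Generator.choice; the names allowed_values and draw are hypothetical stand-ins and not part of autora.

```python
import numpy as np

allowed_values = np.arange(3)  # hypothetical variable with only three allowed values


def draw(num_samples, replace, seed=0):
    """Draw num_samples values from allowed_values with a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.choice(allowed_values, size=num_samples, replace=replace)


with_repeats = draw(5, replace=True)      # fine: values may repeat
without_repeats = draw(3, replace=False)  # fine: a permutation of the three values

try:
    draw(5, replace=False)  # asks for more samples than there are distinct values
    oversample_error = None
except ValueError as err:
    oversample_error = err  # NumPy refuses to oversample without replacement
```

With replace=False, num_samples therefore should not exceed the number of allowed values. In the source listing, replace is only passed to the allowed_values branch; the value_range branch does not use it.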

Examples:

>>> from autora.state import State
>>> from autora.variable import VariableCollection, Variable
>>> from dataclasses import dataclass, field
>>> import pandas as pd
>>> import numpy as np

With one independent variable "x" and some allowed_values, we get a sample of those values back when running the experimentalist:

>>> pool(
...     VariableCollection(
...         independent_variables=[Variable(name="x", allowed_values=range(10))
... ]), random_state=1)
   x
0  4
1  5
2  7
3  9
4  0

With one independent variable "x" and a value_range, we get values drawn uniformly from that range when running the experimentalist:

>>> pool(
...     VariableCollection(independent_variables=[
...         Variable(name="x", value_range=(-5, 5))
... ]), random_state=1)
          x
0  0.118216
1  4.504637
2 -3.558404
3  4.486494
4 -1.881685
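
The effect of random_state on a value_range variable can be checked with NumPy directly. This is a sketch, assuming the uniform draw shown in the source listing on this page; low, high, first and second are illustrative names only.

```python
import numpy as np

low, high = -5, 5  # the same bounds as the value_range example

# Two generators built from the same seed produce the same sequence of draws.
first = np.random.default_rng(1).uniform(low, high, size=5)
second = np.random.default_rng(1).uniform(low, high, size=5)

same_draws = np.array_equal(first, second)
# Uniform draws fall in the half-open interval [low, high).
in_range = bool(np.all((first >= low) & (first < high)))
```

Passing the same seed gives repeatable conditions; leaving random_state as None gives a fresh sequence on each call.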

Either allowed_values or value_range must be specified; otherwise a ValueError is raised:

>>> pool(VariableCollection(independent_variables=[Variable(name="x")]))
Traceback (most recent call last):
...
ValueError: allowed_values or [value_range and type==REAL] needs to be set...

With two independent variables, we get independent samples on both axes:

>>> pool(VariableCollection(independent_variables=[
...         Variable(name="x1", allowed_values=range(1, 5)),
...         Variable(name="x2", allowed_values=range(1, 500)),
... ]), num_samples=10, replace=True, random_state=1)
   x1   x2
0   2  434
1   3  212
2   4  137
3   4  414
4   1  129
5   1  205
6   4  322
7   4  275
8   1   43
9   2   14
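
What "independent samples on both axes" means can be sketched without autora. The snippet below is an assumption-labelled illustration that mirrors the per-variable loop in the source listing on this page; the dictionary allowed and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # one generator shared by every column

# Hypothetical stand-ins for the allowed_values of x1 and x2.
allowed = {"x1": np.arange(1, 5), "x2": np.arange(1, 500)}

# Each column is drawn on its own, so rows pair up values that were
# sampled separately; this is not a Cartesian grid of x1 and x2.
columns = {
    name: rng.choice(values, size=10, replace=True)
    for name, values in allowed.items()
}
conditions = pd.DataFrame(columns)

n_rows, n_cols = conditions.shape  # num_samples rows, one column per variable
```

The result always has num_samples rows, regardless of how many combinations of x1 and x2 exist.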

If any variable has neither allowed_values nor a value_range, we get an error:

>>> pool(
...     VariableCollection(independent_variables=[
...         Variable(name="x1", allowed_values=[1, 2]),
...         Variable(name="x2"),
... ]))
Traceback (most recent call last):
...
ValueError: allowed_values or [value_range and type==REAL] needs to be set...

We can specify arrays of allowed values:

>>> pool(
...     VariableCollection(independent_variables=[
...         Variable(name="x", allowed_values=np.linspace(-10, 10, 101)),
...         Variable(name="y", allowed_values=[3, 4]),
...         Variable(name="z", allowed_values=np.linspace(20, 30, 11)),
... ]), random_state=1)
     x  y     z
0 -0.6  3  29.0
1  0.2  4  24.0
2  5.2  4  23.0
3  9.0  3  29.0
4 -9.4  3  22.0
Source code in autora/experimentalist/random.py
def pool(
    variables: VariableCollection,
    num_samples: int = 5,
    random_state: Optional[int] = None,
    replace: bool = True,
) -> pd.DataFrame:
    """
    Create a sequence of conditions randomly sampled from independent variables.

    Args:
        variables: the description of all the variables in the AER experiment.
        num_samples: the number of conditions to produce
        random_state: the seed value for the random number generator
        replace: if True, allow repeated values

    Returns: the generated conditions as a dataframe

    Examples:
        >>> from autora.state import State
        >>> from autora.variable import VariableCollection, Variable
        >>> from dataclasses import dataclass, field
        >>> import pandas as pd
        >>> import numpy as np

        With one independent variable "x", and some allowed_values we get some of those values
        back when running the experimentalist:
        >>> pool(
        ...     VariableCollection(
        ...         independent_variables=[Variable(name="x", allowed_values=range(10))
        ... ]), random_state=1)
           x
        0  4
        1  5
        2  7
        3  9
        4  0


        ... with one independent variable "x", and a value_range,
        we get a sample of the range back when running the experimentalist:
        >>> pool(
        ...     VariableCollection(independent_variables=[
        ...         Variable(name="x", value_range=(-5, 5))
        ... ]), random_state=1)
                  x
        0  0.118216
        1  4.504637
        2 -3.558404
        3  4.486494
        4 -1.881685



        The allowed_values or value_range must be specified:
        >>> pool(VariableCollection(independent_variables=[Variable(name="x")]))
        Traceback (most recent call last):
        ...
        ValueError: allowed_values or [value_range and type==REAL] needs to be set...

        With two independent variables, we get independent samples on both axes:
        >>> pool(VariableCollection(independent_variables=[
        ...         Variable(name="x1", allowed_values=range(1, 5)),
        ...         Variable(name="x2", allowed_values=range(1, 500)),
        ... ]), num_samples=10, replace=True, random_state=1)
           x1   x2
        0   2  434
        1   3  212
        2   4  137
        3   4  414
        4   1  129
        5   1  205
        6   4  322
        7   4  275
        8   1   43
        9   2   14

        If any of the variables have unspecified allowed_values, we get an error:
        >>> pool(
        ...     VariableCollection(independent_variables=[
        ...         Variable(name="x1", allowed_values=[1, 2]),
        ...         Variable(name="x2"),
        ... ]))
        Traceback (most recent call last):
        ...
        ValueError: allowed_values or [value_range and type==REAL] needs to be set...


        We can specify arrays of allowed values:

        >>> pool(
        ...     VariableCollection(independent_variables=[
        ...         Variable(name="x", allowed_values=np.linspace(-10, 10, 101)),
        ...         Variable(name="y", allowed_values=[3, 4]),
        ...         Variable(name="z", allowed_values=np.linspace(20, 30, 11)),
        ... ]), random_state=1)
             x  y     z
        0 -0.6  3  29.0
        1  0.2  4  24.0
        2  5.2  4  23.0
        3  9.0  3  29.0
        4 -9.4  3  22.0


    """
    rng = np.random.default_rng(random_state)

    raw_conditions = {}
    for iv in variables.independent_variables:
        if iv.allowed_values is not None:
            raw_conditions[iv.name] = rng.choice(
                iv.allowed_values, size=num_samples, replace=replace
            )
        elif (iv.value_range is not None) and (iv.type == ValueType.REAL):
            raw_conditions[iv.name] = rng.uniform(*iv.value_range, size=num_samples)

        else:
            raise ValueError(
                "allowed_values or [value_range and type==REAL] needs to be set for "
                "%s" % (iv)
            )

    return pd.DataFrame(raw_conditions)

sample(conditions, num_samples=1, random_state=None, replace=False)

Take a random sample from some input conditions.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| conditions | Union[DataFrame, ndarray, recarray] | the conditions to sample from | required |
| num_samples | int | the number of conditions to produce | 1 |
| random_state | Optional[int] | the seed value for the random number generator | None |
| replace | bool | if True, allow repeated values | False |

Returns: a DataFrame of the sampled conditions

Examples:

From a pd.DataFrame:

>>> import pandas as pd
>>> sample(
...     pd.DataFrame({"x": range(100, 200)}), num_samples=5, random_state=180)
     x
0  167
1  171
2  164
3  163
4  196

From a range (the input is converted to a DataFrame before sampling):

>>> sample(range(1000), num_samples=5, random_state=180)
     0
0  270
1  908
2  109
3  331
4  978
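
The row-sampling behaviour can be explored with pandas alone. This sketch assumes the DataFrame.sample call shown in the source listing on this page, with the same keyword arguments; the frame conditions and the result names are illustrative.

```python
import pandas as pd

conditions = pd.DataFrame({"x": range(100, 110)})  # ten candidate conditions

picked = conditions.sample(n=5, random_state=180, replace=False, ignore_index=True)

# Without replacement, each source row appears at most once...
all_unique = picked["x"].is_unique
# ...and ignore_index=True relabels the result 0..n-1 instead of keeping source labels.
fresh_index = list(picked.index) == [0, 1, 2, 3, 4]

try:
    conditions.sample(n=11, replace=False)  # more rows than are available
    too_many = None
except ValueError as err:
    too_many = err  # pandas refuses to oversample without replacement
```

Because replace defaults to False here, num_samples should not exceed the number of input conditions unless replace=True is passed.
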
Source code in autora/experimentalist/random.py
def sample(
    conditions: Union[pd.DataFrame, np.ndarray, np.recarray],
    num_samples: int = 1,
    random_state: Optional[int] = None,
    replace: bool = False,
) -> pd.DataFrame:
    """
    Take a random sample from some input conditions.

    Args:
        conditions: the conditions to sample from
        num_samples: the number of conditions to produce
        random_state: the seed value for the random number generator
        replace: if True, allow repeated values

    Returns: a DataFrame of the sampled conditions

    Examples:
        From a pd.DataFrame:
        >>> import pandas as pd
        >>> sample(
        ...     pd.DataFrame({"x": range(100, 200)}), num_samples=5, random_state=180)
             x
        0  167
        1  171
        2  164
        3  163
        4  196

        From a range (the input is converted to a DataFrame before sampling):
        >>> sample(range(1000), num_samples=5, random_state=180)
             0
        0  270
        1  908
        2  109
        3  331
        4  978
    """
    conditions_ = pd.DataFrame(conditions)
    return pd.DataFrame.sample(
        conditions_,
        random_state=random_state,
        n=num_samples,
        replace=replace,
        ignore_index=True,
    )