utils

predict(model, x, y)

Maps independent variable data onto expected dependent variable data, and plots the predictions against the actual values

Parameters:

model (Tree, required): The equation / function that best maps x onto y
x (pd.DataFrame, required): The independent variables of the data
y (pd.DataFrame, required): The dependent variable of the data

Source code in autora/theorist/bms/utils.py
def predict(model: Tree, x: pd.DataFrame, y: pd.DataFrame) -> dict:
    """
    Maps independent variable data onto expected dependent variable data
    and plots the predictions against the actual values

    Args:
        model: The equation / function that best maps x onto y
        x: The independent variables of the data
        y: The dependent variable of the data

    Returns: Predicted values for y given x and the trained model
    """
    predictions = model.predict(x)  # evaluate the model once and reuse the result

    # Scatter predicted against actual values
    plt.figure(figsize=(6, 6))
    plt.scatter(predictions, y)

    # Draw the identity line across the combined range of actual and predicted values
    all_y = np.append(y, predictions)
    y_range = all_y.min().item(), all_y.max().item()
    plt.plot(y_range, y_range)

    plt.xlabel("MDL model predictions", fontsize=14)
    plt.ylabel("Actual values", fontsize=14)
    plt.show()
    return predictions
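
A minimal usage sketch (variable names are illustrative): model is assumed to be a fitted Tree, e.g. the first value returned by run, documented below, and x_test / y_test are held-out dataframes with the columns the model was trained on.

from autora.theorist.bms.utils import predict

# `model` is a fitted Tree; `x_test` and `y_test` are illustrative held-out data.
y_pred = predict(model, x_test, y_test)  # also shows a predicted-vs-actual plot
print(y_pred[:5])  # inspect the first few predictions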

present_results(model, model_len, desc_len)

Prints the best equation and its description length, and plots how the description length has evolved over the course of the search

Parameters:

model (Tree, required): The equation which best describes the data
model_len (float, required): The equation loss (defined as description length)
desc_len (List[float], required): Record of equation loss over time

Source code in autora/theorist/bms/utils.py
def present_results(model: Tree, model_len: float, desc_len: List[float]) -> None:
    """
    Prints the best equation and its description length,
    and plots how the description length has evolved over the course of the search

    Args:
        model: The equation which best describes the data
        model_len: The equation loss (defined as description length)
        desc_len: Record of equation loss over time

    Returns: Nothing

    """
    print("Best model:\t", model)
    print("Desc. length:\t", model_len)
    plt.figure(figsize=(15, 5))
    plt.plot(desc_len)
    plt.xlabel("MCMC step", fontsize=14)
    plt.ylabel("Description length", fontsize=14)
    plt.title("MDL model: $%s$" % model.latex())
    plt.show()
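
A minimal usage sketch, assuming pms is an already-configured Parallel machine scientist (its construction is not covered on this page); present_results is typically fed the three values returned by run, documented below.

from autora.theorist.bms.utils import run, present_results

# `pms` is assumed to be an already-configured Parallel machine scientist.
model, model_len, desc_len = run(pms, num_steps=1000)
present_results(model, model_len, desc_len)  # prints the equation and plots the trace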

run(pms, num_steps, thinning=100)

Runs the Parallel Machine Scientist for num_steps MCMC steps (with tree swaps between the parallel chains), recording a thinned trace of description lengths and keeping the best, i.e. minimum-description-length, model found.

Parameters:

pms (Parallel, required): Parallel Machine Scientist (BMS is essentially a wrapper for pms)
num_steps (int, required): number of epochs / MCMC step & tree swap iterations
thinning (int, default 100): number of epochs between recording model loss to the trace

Returns:

model (Tree): The equation which best describes the data
model_len (float): The loss function score, defined as description length
desc_len (List[float]): Record of loss function score over time

Source code in autora/theorist/bms/utils.py
def run(
    pms: Parallel, num_steps: int, thinning: int = 100
) -> Tuple[Tree, float, List[float]]:
    """
    Runs the Parallel Machine Scientist for num_steps MCMC steps
    and keeps the minimum-description-length model found along the way

    Args:
        pms: Parallel Machine Scientist (BMS is essentially a wrapper for pms)
        num_steps: number of epochs / MCMC step & tree swap iterations
        thinning: number of epochs between recording model loss to the trace

    Returns:
        model: The equation which best describes the data
        model_len: The loss function score, defined as description length
        desc_len: Record of loss function score over time

    """
    desc_len, model, model_len = [], pms.t1, np.inf
    for n in range(num_steps):
        pms.mcmc_step()  # advance each parallel chain by one MCMC step
        pms.tree_swap()  # propose swapping trees between neighboring temperatures
        if n % thinning == 0:  # sample less often if we thin more
            desc_len.append(pms.t1.E)  # add the description length to the trace
        if pms.t1.E < model_len:  # check if this is the MDL expression so far
            model, model_len = deepcopy(pms.t1), pms.t1.E
        _logger.info("Finished iteration {}".format(n))
    return model, model_len, desc_len
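
A note on the trace length, plus an end-to-end sketch. With the thinning condition above, the description length is recorded whenever n % thinning == 0, so run(pms, num_steps=5000, thinning=100) records 50 samples (steps 0, 100, ..., 4900). The make_pms helper below is hypothetical, standing in for however the Parallel machine scientist is constructed in your code; all data variables are illustrative.

from autora.theorist.bms.utils import run, present_results, predict

pms = make_pms(x_train, y_train)  # hypothetical setup helper; see the Parallel class
model, model_len, desc_len = run(pms, num_steps=5000, thinning=100)
assert len(desc_len) == 50  # sampled at steps 0, 100, ..., 4900
present_results(model, model_len, desc_len)  # report the best equation found
y_pred = predict(model, x_test, y_test)  # evaluate on held-out data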