Usage with Cylc workflow manager and Slurm

The command line interface can be used with cylc in environments that use a job scheduler such as slurm.

Prerequisites

This example requires:

slurm, e.g. on a high performance computing cluster.
familiarity with and a working installation of cylc (e.g. by going through the tutorial)
virtualenv
python (so you can run virtualenv venv -p python)

A new environment will be created during the setup phase of the cylc workflow run.

Cylc requires a site-specific setup when using a scheduler like slurm. See the cylc documentation for a guide on setting up cylc on your platform. For Oscar at Brown University, we can use the following configuration in ./global.cylc:

[platforms]
    [[oscar]]
        hosts = localhost
        install target = localhost
        job runner = slurm
        retrieve job logs = True
        global init-script = """
            module load python/3.9.0
        """

Setup

To initialize the workflow, we define a file in the lib/python directory (a cylc convention) containing the code for the experiment: lib/python/runner.py, including all the required functions.

import pandas as pd
from sklearn.linear_model import LinearRegression

from autora.experimentalist.grid import grid_pool
from autora.state import StandardState, estimator_on_state, on_state
from autora.variable import Variable, VariableCollection


def initial_state(_):
    state = StandardState(
        variables=VariableCollection(
            independent_variables=[Variable(name="x", allowed_values=range(100))],
            dependent_variables=[Variable(name="y")],
            covariates=[],
        ),
        conditions=None,
        experiment_data=pd.DataFrame({"x": [], "y": []}),
        models=[],
    )
    return state


experimentalist = on_state(grid_pool, output=["conditions"])

experiment_runner = on_state(
    lambda conditions: conditions.assign(y=2 * conditions["x"] + 0.5),
    output=["experiment_data"],
)

theorist = estimator_on_state(LinearRegression(fit_intercept=True))

These functions will be called in turn by the autora.workflow script.
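
To make the data flow between the three steps concrete, here is a dependency-free sketch of what one pass through them computes. It does not use AutoRA: propose_conditions, run_experiment and fit_line are hypothetical stand-ins for experimentalist, experiment_runner and theorist, and a closed-form least-squares fit stands in for scikit-learn's LinearRegression.

```python
def propose_conditions():
    """Stand-in for the experimentalist: every allowed value of x."""
    return list(range(100))


def run_experiment(xs):
    """Stand-in for the experiment_runner: the synthetic rule y = 2x + 0.5."""
    return [(x, 2 * x + 0.5) for x in xs]


def fit_line(data):
    """Stand-in for the theorist: least squares for y = slope * x + intercept."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# One pass: conditions -> experiment data -> model.
conditions = propose_conditions()
experiment_data = run_experiment(conditions)
slope, intercept = fit_line(experiment_data)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Because the synthetic data is noise-free, the fit should recover the slope of 2 and intercept of 0.5 that are built into experiment_runner.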

The flow.cylc file defines the workflow.

[scheduling]
    cycling mode = integer
    initial cycle point = 0
    final cycle point = 5
    [[graph]]
        R1/0 = """
            setup_python => initial_state
        """
        R1/1 = """
            initial_state[^] => experimentalist => experiment_runner => theorist
        """
        2/P1 = """
            theorist[-P1] => experimentalist => experiment_runner => theorist
        """

[runtime]
    [[setup_python]]
        script = """
            virtualenv "$CYLC_WORKFLOW_SHARE_DIR/env" -p python
            source "$CYLC_WORKFLOW_SHARE_DIR/env/bin/activate"
            pip install --upgrade pip
            pip install -r "$CYLC_WORKFLOW_RUN_DIR/requirements.txt"
        """
        platform = oscar
        execution time limit = PT20M

    [[initial_state]]
        script = """
            $CYLC_WORKFLOW_SHARE_DIR/env/bin/python \
                -m autora.workflow \
                runner.initial_state \
                --out-path "$CYLC_WORKFLOW_SHARE_DIR/$CYLC_TASK_CYCLE_POINT/result"
        """
        platform = oscar

    [[experimentalist]]
        script = """
            $CYLC_WORKFLOW_SHARE_DIR/env/bin/python -m autora.workflow \
                runner.experimentalist \
                --in-path "$CYLC_WORKFLOW_SHARE_DIR/$((CYLC_TASK_CYCLE_POINT - 1))/result" \
                --out-path "$CYLC_WORKFLOW_SHARE_DIR/$CYLC_TASK_CYCLE_POINT/conditions"
        """
        platform = oscar

    [[experiment_runner]]
        script = """
            $CYLC_WORKFLOW_SHARE_DIR/env/bin/python -m autora.workflow \
                runner.experiment_runner \
                --in-path "$CYLC_WORKFLOW_SHARE_DIR/$CYLC_TASK_CYCLE_POINT/conditions" \
                --out-path "$CYLC_WORKFLOW_SHARE_DIR/$CYLC_TASK_CYCLE_POINT/data"
        """
        platform = oscar

    [[theorist]]
        script = """
            $CYLC_WORKFLOW_SHARE_DIR/env/bin/python -m autora.workflow \
                runner.theorist \
                --in-path "$CYLC_WORKFLOW_SHARE_DIR/$CYLC_TASK_CYCLE_POINT/data" \
                --out-path "$CYLC_WORKFLOW_SHARE_DIR/$CYLC_TASK_CYCLE_POINT/result"
        """
        platform = oscar
        execution time limit = PT1H
        [[[directives]]]
            --partition = gpu
            --gres = gpu:1
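
Each task reads the file written by the task before it, and the experimentalist reads the result from the previous cycle point. The following sketch only restates the --in-path and --out-path arguments from the flow.cylc file above so that the chain is easy to inspect. The task_paths helper is hypothetical (it is not part of AutoRA or cylc), and "share" is a placeholder for the value of $CYLC_WORKFLOW_SHARE_DIR.

```python
from pathlib import PurePosixPath


def task_paths(share_dir, cycle_point):
    """Return {task: (in_path, out_path)} for one cycle point >= 1."""
    share = PurePosixPath(share_dir)
    here = share / str(cycle_point)
    previous = share / str(cycle_point - 1)
    return {
        # Reads the state left by the previous cycle (or by initial_state at 0).
        "experimentalist": (previous / "result", here / "conditions"),
        "experiment_runner": (here / "conditions", here / "data"),
        "theorist": (here / "data", here / "result"),
    }


# Print the file hand-offs for the first two cycles.
for point in (1, 2):
    for task, (in_path, out_path) in task_paths("share", point).items():
        print(f"{point} {task}: {in_path} -> {out_path}")
```

The result written by the theorist at one cycle point is the input of the experimentalist at the next, which is what the theorist[-P1] dependency in the graph expresses.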

Execution

We can call the cylc command line interface as follows, in a shell session:

Validate, install and play the flow in a single step:

cylc vip .

We can view the workflow running in the graphical user interface (GUI):

cylc gui

Results

We can load and interrogate the resulting object in Python as follows:

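from pathlib import Path

from autora.serializer import load_state

# "~" is not expanded automatically, so start from the home directory.
# Results are written per cycle point; 5 is the final cycle point in flow.cylc.
state = load_state(Path.home() / "cylc-run/cylc-slurm-pip/runN/share/5/result")
print(state)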
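
Since every cycle writes its own result under the share directory (see the --out-path arguments in flow.cylc), it can be convenient to locate the most recent one programmatically. This is a minimal sketch, assuming the share directory contains one integer-named directory per cycle point and that each result is a single file. The latest_result helper is hypothetical and not part of AutoRA or cylc.

```python
from pathlib import Path


def latest_result(share_dir):
    """Return the "result" file of the highest-numbered cycle, or None."""
    candidates = [
        (int(entry.name), entry / "result")
        for entry in Path(share_dir).expanduser().iterdir()
        # Skip non-cycle directories (such as env) and unfinished cycles.
        if entry.is_dir() and entry.name.isdigit() and (entry / "result").is_file()
    ]
    if not candidates:
        return None
    # Compare cycle points numerically, so that 10 sorts after 9.
    return max(candidates, key=lambda candidate: candidate[0])[1]
```

The returned path can then be passed to load_state in place of a hard-coded cycle point.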