Usage with Cylc workflow manager and Slurm
The autora.workflow command line interface can be used with cylc in environments that use a scheduler like slurm.
Prerequisites
This example requires:
- slurm, e.g. on a high performance computing cluster.
- familiarity with and a working installation of cylc (e.g. by going through the tutorial)
- virtualenv
- python (so you can run virtualenv venv -p python)
A new environment will be created during the setup phase of the cylc
workflow run.
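Before starting, it can help to confirm that the prerequisite commands are visible in the shell that will launch the workflow. The following is a minimal sketch using only the Python standard library. The helper name missing_tools is hypothetical, and using sbatch as the command that indicates a slurm installation is an assumption; on clusters that use environment modules, python may only appear after the relevant module has been loaded.

```python
import shutil


def missing_tools(tools=("cylc", "sbatch", "virtualenv", "python")):
    """Return the names from `tools` that are not found on the current PATH."""
    # shutil.which returns None when no executable with that name is found.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All prerequisite commands were found on PATH.")
```

An empty list means every command was found; it does not check versions or that cylc has been configured for the site.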
Cylc requires a site-specific setup when using a scheduler like slurm. See the cylc documentation for a guide on setting up cylc on your platform.
For Oscar at Brown University, we can use the following configuration in ./global.cylc:
[platforms]
    [[oscar]]
        hosts = localhost
        install target = localhost
        job runner = slurm
        retrieve job logs = True
        global init-script = """
            module load python/3.9.0
        """
Setup
To initialize the workflow, we put the code for the experiment, including all the required functions, in lib/python/runner.py (placing it in the lib/python directory is a cylc convention):
import pandas as pd
from sklearn.linear_model import LinearRegression

from autora.experimentalist.grid import grid_pool
from autora.state import StandardState, estimator_on_state, on_state
from autora.variable import Variable, VariableCollection


def initial_state(_):
    state = StandardState(
        variables=VariableCollection(
            independent_variables=[Variable(name="x", allowed_values=range(100))],
            dependent_variables=[Variable(name="y")],
            covariates=[],
        ),
        conditions=None,
        experiment_data=pd.DataFrame({"x": [], "y": []}),
        models=[],
    )
    return state


experimentalist = on_state(grid_pool, output=["conditions"])

experiment_runner = on_state(
    lambda conditions: conditions.assign(y=2 * conditions["x"] + 0.5),
    output=["experiment_data"],
)

theorist = estimator_on_state(LinearRegression(fit_intercept=True))
These functions will be called in turn by the autora.workflow script.
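To see what one pass through the experimentalist, experiment runner and theorist does to the data, the following sketch mimics the cycle in plain Python with no autora, pandas or scikit-learn dependency. The dictionary-based state and the function bodies are stand-ins written for illustration and are not the autora API. Accumulating the data and models across cycles, while replacing the conditions, reflects an assumption about how StandardState behaves.

```python
def experimentalist(state):
    """Propose every allowed value of x as a condition (a full grid)."""
    return {**state, "conditions": list(state["allowed_x"])}


def experiment_runner(state):
    """Append an observation y = 2x + 0.5 for each proposed condition."""
    new_rows = [(x, 2 * x + 0.5) for x in state["conditions"]]
    return {**state, "data": state["data"] + new_rows}


def theorist(state):
    """Fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in state["data"]]
    ys = [y for _, y in state["data"]]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {**state, "models": state["models"] + [(slope, intercept)]}


# Stand-in for the initial state: x may take the values 0..99, nothing observed yet.
state = {"allowed_x": range(100), "conditions": None, "data": [], "models": []}

# Two cycles of experimentalist => experiment_runner => theorist.
for _ in range(2):
    state = theorist(experiment_runner(experimentalist(state)))

slope, intercept = state["models"][-1]
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

In the cylc workflow each of these calls is a separate job, and the state is passed between jobs through files rather than through a Python variable.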
The flow.cylc file defines the workflow:
[scheduling]
    cycling mode = integer
    initial cycle point = 0
    final cycle point = 5
    [[graph]]
        R1/0 = """
            setup_python => initial_state
        """
        R1/1 = """
            initial_state[^] => experimentalist => experiment_runner => theorist
        """
        2/P1 = """
            theorist[-P1] => experimentalist => experiment_runner => theorist
        """
[runtime]
    [[setup_python]]
        script = """
            virtualenv "$CYLC_WORKFLOW_SHARE_DIR/env" -p python
            source "$CYLC_WORKFLOW_SHARE_DIR/env/bin/activate"
            pip install --upgrade pip
            pip install -r "$CYLC_WORKFLOW_RUN_DIR/requirements.txt"
        """
        platform = oscar
        execution time limit = PT20M
    [[initial_state]]
        script = """
            $CYLC_WORKFLOW_SHARE_DIR/env/bin/python \
                -m autora.workflow \
                runner.initial_state \
                --out-path "$CYLC_WORKFLOW_SHARE_DIR/$CYLC_TASK_CYCLE_POINT/result"
        """
        platform = oscar
    [[experimentalist]]
        script = """
            $CYLC_WORKFLOW_SHARE_DIR/env/bin/python -m autora.workflow \
                runner.experimentalist \
                --in-path "$CYLC_WORKFLOW_SHARE_DIR/$((CYLC_TASK_CYCLE_POINT - 1))/result" \
                --out-path "$CYLC_WORKFLOW_SHARE_DIR/$CYLC_TASK_CYCLE_POINT/conditions"
        """
        platform = oscar
    [[experiment_runner]]
        script = """
            $CYLC_WORKFLOW_SHARE_DIR/env/bin/python -m autora.workflow \
                runner.experiment_runner \
                --in-path "$CYLC_WORKFLOW_SHARE_DIR/$CYLC_TASK_CYCLE_POINT/conditions" \
                --out-path "$CYLC_WORKFLOW_SHARE_DIR/$CYLC_TASK_CYCLE_POINT/data"
        """
        platform = oscar
    [[theorist]]
        script = """
            $CYLC_WORKFLOW_SHARE_DIR/env/bin/python -m autora.workflow \
                runner.theorist \
                --in-path "$CYLC_WORKFLOW_SHARE_DIR/$CYLC_TASK_CYCLE_POINT/data" \
                --out-path "$CYLC_WORKFLOW_SHARE_DIR/$CYLC_TASK_CYCLE_POINT/result"
        """
        platform = oscar
        execution time limit = PT1H
        [[[directives]]]
            --partition = gpu
            --gres = gpu:1
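The --in-path and --out-path options in the runtime section chain the tasks together through files in the workflow's share directory. The sketch below reproduces that layout in Python so the hand-offs between tasks and cycles are easy to inspect. task_io is a hypothetical helper written for this illustration and is not part of cylc or autora; the relative directory name "share" stands in for $CYLC_WORKFLOW_SHARE_DIR.

```python
from pathlib import PurePosixPath


def task_io(share_dir, cycle_point):
    """Map each task in a cycle to its (in_path, out_path), following flow.cylc."""
    share = PurePosixPath(share_dir)
    here = share / str(cycle_point)
    if cycle_point == 0:
        # Cycle 0 only creates the initial state, so it has no input file.
        return {"initial_state": (None, here / "result")}
    # From cycle 1 onwards, the experimentalist reads the previous cycle's result.
    previous = share / str(cycle_point - 1)
    return {
        "experimentalist": (previous / "result", here / "conditions"),
        "experiment_runner": (here / "conditions", here / "data"),
        "theorist": (here / "data", here / "result"),
    }


for point in range(3):
    for task, (src, dst) in task_io("share", point).items():
        print(f"cycle {point}: {task}: {src} -> {dst}")
```

Because every cycle writes into its own numbered directory, the output of the last theorist task with final cycle point = 5 is the file 5/result inside the share directory.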
Execution
We can call the cylc command line interface as follows, in a shell session.
Validate, install and play the flow in a single step:
cylc vip .
We can view the workflow running in the graphical user interface (GUI):
cylc gui
Results
We can load and interrogate the resulting object in Python as follows:
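from pathlib import Path
from autora.serializer import load_state
# The theorist writes to share/<cycle point>/result; with final cycle point = 5 the last file is share/5/result.
path = Path("~/cylc-run/cylc-slurm-pip/runN/share/5/result").expanduser()  # expand "~" explicitly
state = load_state(path)
print(state)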
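If the final cycle point changes, or the workflow is stopped early, the most recent result will sit in a different numbered directory. The following sketch finds it using only the standard library. latest_result is a hypothetical helper, and it assumes the share directory contains one directory per integer cycle point, as written by the flow.cylc in this example.

```python
from pathlib import Path


def latest_result(share_dir):
    """Return the `result` file from the highest-numbered cycle directory, or None."""
    share = Path(share_dir).expanduser()
    if not share.is_dir():
        return None
    # Keep only integer-named directories that already contain a result file.
    cycles = [
        p for p in share.iterdir()
        if p.is_dir() and p.name.isdigit() and (p / "result").is_file()
    ]
    if not cycles:
        return None
    # Compare numerically so that cycle 10 sorts after cycle 9.
    return max(cycles, key=lambda p: int(p.name)) / "result"


# Hypothetical usage, assuming the run directory name used in this example:
# path = latest_result("~/cylc-run/cylc-slurm-pip/runN/share")
```

The returned path can then be passed to load_state in place of a hard-coded cycle number.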