# Q-Learning

In [1]:
# Uncomment the following line when running on Google Colab
# !pip install "autora"

The q-learning experiment has to be initialized with a specific formula and effects.

In [2]:
import numpy as np
from autora.experiment_runner.synthetic.psychology.q_learning import  q_learning

s = q_learning()

Check the docstring to get information about the model

In [3]:
help(q_learning)

Help on function q_learning in module autora.experiment_runner.synthetic.psychology.q_learning:

q_learning(name='Q-Learning', learning_rate: float = 0.2, decision_noise: float = 3.0, n_actions: int = 2, forget_rate: float = 0.0, perseverance_bias: float = 0.0, correlated_reward: bool = False)
    An agent that runs simple Q-learning for an n-armed bandits tasks.
    
    Args:
        name: name of the experiment
        trials: number of trials
        learning_rate: learning rate for Q-learning
        decision_noise: softmax parameter for decision noise
        n_actions: number of actions
        forget_rate: rate of forgetting
        perseverance_bias: bias towards choosing the previously chosen action
        correlated_reward: whether rewards are correlated
    
    Examples:
        >>> experiment = q_learning()
    
        # The runner can accept numpy arrays or pandas DataFrames, but the return value will
        # always be a list of numpy arrays. Each array corresponds t

... or use the describe function:

In [4]:
from autora.experiment_runner.synthetic.utilities import describe

print(describe(s))


    An agent that runs simple Q-learning for an n-armed bandits tasks.

    Args:
        name: name of the experiment
        trials: number of trials
        learning_rate: learning rate for Q-learning
        decision_noise: softmax parameter for decision noise
        n_actions: number of actions
        forget_rate: rate of forgetting
        perseverance_bias: bias towards choosing the previously chosen action
        correlated_reward: whether rewards are correlated

    Examples:
        >>> experiment = q_learning()

        # The runner can accept numpy arrays or pandas DataFrames, but the return value will
        # always be a list of numpy arrays. Each array corresponds to the choices made by the agent
        # for each trial in the input. Thus, arrays have shape (n_trials, n_actions).
        >>> experiment.run(np.array([[0, 1], [0, 1], [0, 1], [1, 0], [1, 0], [1, 0]]),
        ...                random_state=42)
        [array([[1., 0.],
               [0., 1.],
      

The synthetic experiement `s` has properties like the name of the experiment:

In [5]:
s.name

'Q-Learning'

... a valid variables description:

In [6]:
s.variables

VariableCollection(independent_variables=[IV(name='reward array', value_range=None, allowed_values=None, units='reward', type=<ValueType.BOOLEAN: 'boolean'>, variable_label='Reward Sequence', rescale=1, is_covariate=False)], dependent_variables=[DV(name='choice array', value_range=None, allowed_values=None, units='actions', type=<ValueType.REAL: 'real'>, variable_label='Action Sequence', rescale=1, is_covariate=False)], covariates=[])

... the conditions for this experiment are reward sequences. This is a variable type not yet fully integrated in AutoRA. Therefore ther is no domain yet:

In [7]:
x = s.domain()
x

... the plotter is not implemented yet:

In [8]:
s.plotter()

NotImplementedError: 

We can wrap this functions to use with the state logic of AutoRA:
First, we create the state with the variables:

In [9]:
from autora.state import StandardState, on_state, Delta, experiment_runner_on_state, estimator_on_state
# We can get the variables from the runner
variables = s.variables

# With the variables, we initialize a StandardState
state = StandardState(variables)

Here, we use a special experimentalist that can generate random trial sequences and wrap it with the `on_state` function to use them on state:

In [10]:
%%capture
!pip install autora-experimentalist-bandit-random

In [11]:
from autora.experimentalist.bandit_random import bandit_random_pool
# Wrap the functions to use on state
# Experimentalists:

@on_state()
def pool_on_state(num_samples):
      return Delta(conditions=bandit_random_pool(num_rewards=2, sequence_length=20, num_samples=num_samples))


state = pool_on_state(state, num_samples=2)
print(state.conditions)

                                        reward array
0  [[0, 0], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1...
1  [[0, 0], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1...


Wrap the runner with the `experiment_runner_on_state` wrapper to use it on state:

In [12]:
# Runner:
run_on_state = experiment_runner_on_state(s.run)
state = run_on_state(state)

state.experiment_data

Unnamed: 0,reward array,choice array
0,"[[0, 0], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1...","[[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0..."
1,"[[0, 0], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1...","[[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0..."


Wrap the regressor with the `estimator_on_state` wrapper:

In [17]:
theorist = LinearRegression()
theorist_on_state = estimator_on_state(theorist)

state = theorist_on_state(state)
# Access the last model:
model = state.models[-1]


print(f"choose_A1 = "
      f"{model.coef_[0][0]:.2f}*similarity_category_A1 "
      f"{model.coef_[0][1]:.2f}*similarity_category_A2 "
      f"{model.coef_[0][2]:.2f}*similarity_category_B1 "
      f"{model.coef_[0][3]:.2f}*similarity_category_B2 "
      f"{model.intercept_[0]:+.2f} ")

Epoch 86/100 --- Loss: 0.5586882; Time: 0.0762s; Convergence value: 1.78e-01
Epoch 87/100 --- Loss: 0.7901477; Time: 0.0767s; Convergence value: 1.82e-01
Epoch 88/100 --- Loss: 0.5265486; Time: 0.0751s; Convergence value: 1.92e-01
Epoch 89/100 --- Loss: 0.4401408; Time: 0.0743s; Convergence value: 1.86e-01
Epoch 90/100 --- Loss: 0.3039415; Time: 0.0756s; Convergence value: 1.82e-01
Epoch 91/100 --- Loss: 0.3906522; Time: 0.0771s; Convergence value: 1.73e-01
Epoch 92/100 --- Loss: 0.5437022; Time: 0.0769s; Convergence value: 1.65e-01
Epoch 93/100 --- Loss: 0.4635772; Time: 0.0737s; Convergence value: 1.54e-01
Epoch 94/100 --- Loss: 0.4845441; Time: 0.0743s; Convergence value: 1.48e-01
Epoch 95/100 --- Loss: 0.2648371; Time: 0.0770s; Convergence value: 1.56e-01
Epoch 96/100 --- Loss: 0.3382604; Time: 0.0748s; Convergence value: 1.37e-01
Epoch 97/100 --- Loss: 0.2581106; Time: 0.0742s; Convergence value: 1.25e-01
Epoch 98/100 --- Loss: 0.6365235; Time: 0.0737s; Convergence value: 1.47e-01

RuntimeError: Parent directory trained_models does not exist.