(7) Generating Text-Based Experiments for LLMs
Large language models (LLMs) have the potential to simulate human behavior in a variety of tasks, making them valuable simulators for behavioral experiments. SweetBean provides functionality for generating prompts for LLMs based on the same experiment specification used for human participants. In this example, we will execute our task switching experiment on a large language model.
Note: If you are running this tutorial in Google Colaboratory, we recommend changing the Runtime type to "T4 GPU".
Installing sweetbean
!pip install sweetbean
Experiment specification
Below, we have the complete task switching experiment from Tutorial (5). Let's run the code below.
from sweetbean import Block, Experiment
from sweetbean.variable import TimelineVariable
from sweetbean.stimulus import Text
## Specify timeline
timeline = [
    {'color': 'red', 'word': 'RED', 'task': 'color_naming'},
    {'color': 'green', 'word': 'GREEN', 'task': 'color_naming'},
    {'color': 'green', 'word': 'RED', 'task': 'word_reading'},
    {'color': 'red', 'word': 'GREEN', 'task': 'word_reading'},
    {'color': 'red', 'word': 'GREEN', 'task': 'word_reading'},
    {'color': 'red', 'word': 'RED', 'task': 'color_naming'},
    {'color': 'green', 'word': 'RED', 'task': 'word_reading'},
    {'color': 'red', 'word': 'GREEN', 'task': 'color_naming'},
    {'color': 'green', 'word': 'RED', 'task': 'color_naming'},
    {'color': 'red', 'word': 'GREEN', 'task': 'word_reading'},
]
# declare timeline variables
color = TimelineVariable('color')
word = TimelineVariable('word')
task = TimelineVariable('task')
# Define the instruction text blocks
instruction_welcome = Text(
    text='Welcome to our task-switching experiment.<br><br> \
          In this experiment, you will alternate between two tasks: color naming and word reading.<br><br> \
          Press the SPACE key to continue.',
    choices=[' ']
)
instruction_fixation = Text(
    text='At the beginning of each trial, you will see a fixation cue:<br><br> \
          A "+" means you should perform the color-naming task.<br> \
          An "x" means you should perform the word-reading task.<br><br> \
          Press the SPACE key to continue.',
    choices=[' ']
)
instruction_tasks = Text(
    text='For the color-naming task:<br> \
          Identify the COLOR of the text, ignoring the word.<br><br> \
          For the word-reading task:<br> \
          Read the WORD, ignoring its color.<br><br> \
          Press the SPACE key to continue.',
    choices=[' ']
)
instruction_responses = Text(
    text='You will respond using the following keys:<br><br> \
          For RED (color or word): press the "f" key.<br> \
          For GREEN (color or word): press the "j" key.<br><br> \
          The stimulus will be displayed for a short period of time, so respond quickly.<br><br> \
          Press the SPACE key to continue.',
    choices=[' ']
)
instruction_note = Text(
    text='Remember:<br> \
          Pay attention to the fixation cue ("+" for color naming or "x" for word reading)<br><br> \
          to determine the task.<br><br> \
          Press the SPACE key to BEGIN the experiment.',
    choices=[' ']
)
instruction_list = [
    instruction_welcome,
    instruction_fixation,
    instruction_tasks,
    instruction_responses,
    instruction_note
]
instruction_block = Block(instruction_list)
# Determine fixation cross based on task
from sweetbean.variable import FunctionVariable
def fixation_shape_fct(task):
    if task == 'color_naming':
        return '+'
    return 'x'
fixation_shape = FunctionVariable('fixation_shape', fixation_shape_fct, [task])
# Determine correct response based on task, color, and word
def correct_key_fct(word, color, task):
    if (task == 'word_reading' and word == 'RED') or \
            (task == 'color_naming' and color == 'red'):
        return 'f'
    return 'j'
correct_key = FunctionVariable('correct_key', correct_key_fct, [word, color, task])
# Combine stimuli
fixation = Text(1000, fixation_shape)
so_s = Text(800)
stroop = Text(2000, word, color, ['f', 'j'], correct_key)
so_f = Text(300)
# Declare block
task_switching_block = Block([fixation, so_s, stroop, so_f], timeline)
experiment = Experiment([task_switching_block])
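Before moving on, the two condition functions defined above can be sanity-checked in isolation. The assertions below simply restate the instructions: "+" cues color naming, "x" cues word reading, "f" maps to RED/red, and "j" maps to GREEN/green.

```python
# Self-contained copies of the condition functions from the specification above.
def fixation_shape_fct(task):
    if task == 'color_naming':
        return '+'
    return 'x'

def correct_key_fct(word, color, task):
    if (task == 'word_reading' and word == 'RED') or \
            (task == 'color_naming' and color == 'red'):
        return 'f'
    return 'j'

# Color naming ignores the word; word reading ignores the color.
print(fixation_shape_fct('color_naming'))               # +
print(correct_key_fct('GREEN', 'red', 'color_naming'))  # f
print(correct_key_fct('RED', 'green', 'word_reading'))  # f
print(correct_key_fct('GREEN', 'red', 'word_reading'))  # j
```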
Running the experiment in natural language
Instead of compiling the experiment into a web-based version for human participants, we can generate a text-based version in natural language. Run the following line to execute the experiment interactively: you must enter a key press for each stimulus that requires a response.
data = experiment.run_on_language(get_input=input)
The run_on_language function returns a dictionary with the data from the experiment.
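The get_input argument accepts any callable that maps a prompt string to a response string, which is what makes it possible to swap a human (via input) for a model later on. As a minimal sketch, a scripted stand-in participant (the function name here is ours, not part of SweetBean) makes dry runs reproducible:

```python
# A hypothetical scripted participant: always answers 'f', regardless of
# the prompt. Any callable with this signature works as get_input.
def always_press_f(prompt):
    return 'f'

# Usage with the experiment object defined above (commented out so this
# sketch stays self-contained):
# data = experiment.run_on_language(get_input=always_press_f)
print(always_press_f("Which key do you press? "))  # f
```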
Execute the experiment with an LLM as participant
Let's install a package for running LLMs.
!pip install unsloth "xformers==0.0.28.post2"
Next, we want to execute the experiment with an LLM. Here, we use Centaur, an LLM fine-tuned on human behavior in cognitive psychology experiments. However, any model (for example, via the OpenAI, Hugging Face, Llama, or Google APIs) can be used as a synthetic participant.
To execute the experiment with an LLM, we need to define a function that returns the LLM's response to each prompt it receives.
from unsloth import FastLanguageModel
import transformers
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="marcelbinz/Llama-3.1-Centaur-8B-adapter",
    max_seq_length=32768,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
pipe = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    trust_remote_code=True,
    pad_token_id=0,
    do_sample=True,
    temperature=1.0,
    max_new_tokens=1,
)
def generate(prompt):
    return pipe(prompt)[0]["generated_text"][len(prompt):]
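The slicing in the generate function works because the Hugging Face text-generation pipeline returns the prompt followed by the model's continuation; only the newly generated characters should be passed back to SweetBean as the response. The same idea, isolated from the model (the helper name here is ours):

```python
# The pipeline's "generated_text" echoes the prompt, so the model's answer
# is everything after the prompt's length.
def strip_prompt(prompt, generated_text):
    return generated_text[len(prompt):]

print(strip_prompt("You press <<", "You press <<f"))  # f
```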
Next, we can run the experiment on an LLM. Note that the LLM receives the full trial history as a prompt (including information about the responses it made in previous trials).
data = experiment.run_on_language(get_input=generate)
Let's have a look at the LLM's responses. In this case, we index the first block of the experiment (0) and the eighth trial (7):
data[0][7]
We observe that the network responded with the key "j", which was incorrect for this trial.
Note that the data produced here is in the same format as data produced via web experiments with human participants. This eases the comparison of human and LLM behavior and, combined with automated data collection via web experiments as supported by AutoRA, enables aligning LLMs with the behavior of human participants.