A/B Testing
Run experiments on your bot's behavior. Test different conversation flows, refund strategies, or response styles — and measure which performs better.
A/B testing lets you show different bot behavior to different customers and compare the results. You might want to test whether a generous refund offer retains more customers than a conservative one, or whether asking one clarifying question converts better than asking three.
Octocom handles A/B testing through condition providers and workflow variants. The idea is simple: when a workflow runs, a condition provider randomly assigns the customer to a group, and the system routes them to the matching variant — each with its own instructions.
Every customer stays in the same group for the entire conversation, and the assignment is automatically tagged so you can filter and compare results later.
A simple test: two response styles
Let's say you want to test whether a friendly, casual tone leads to better outcomes than a formal one when handling returns.
Step 1 — Create a condition provider
```python
def evaluate_conditions(context):
    variant = get_ab_test_variant(context, "return-tone", ["casual", "formal"])
    return {
        "conditions": {
            "isCasual": variant == "casual",
            "isFormal": variant == "formal",
        },
        "data": {},
    }
```
That's the entire condition provider. `get_ab_test_variant` randomly picks a group (50/50 by default), remembers it for this conversation, and tags the conversation with `ab:return-tone:casual` or `ab:return-tone:formal`.
Step 2 — Set up workflow variants
In your "Return Request" workflow, create two variants:
| Variant | Condition | Instructions |
|---|---|---|
| Casual | isCasual | Use a warm, conversational tone. Use first names. Say things like "No worries, let's get this sorted!" |
| Formal | isFormal | Use a professional tone. Address the customer formally. Say "We apologize for the inconvenience." |
Both variants follow the same steps — collect order info, confirm the return — but with different wording. The system routes each customer to one variant based on the condition provider's output.
Step 3 — Measure results
After the test has run for a while, filter conversations in the dashboard by the tags ab:return-tone:casual and ab:return-tone:formal. Compare metrics like resolution rate, handoff rate, or customer satisfaction.
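If you export tagged conversations for analysis, the comparison is a matter of filtering and counting. A minimal sketch, assuming hypothetical conversation records with a `tags` list and a `resolved` flag (neither field name comes from the platform):

```python
def resolution_rate(conversations, tag):
    # Share of conversations carrying `tag` that were resolved by the bot.
    tagged = [c for c in conversations if tag in c["tags"]]
    if not tagged:
        return None  # no data for this variant yet
    return sum(c["resolved"] for c in tagged) / len(tagged)
```

Compute this once per tag (`ab:return-tone:casual`, `ab:return-tone:formal`) and compare the two numbers.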
Controlling the split with weights
By default, get_ab_test_variant splits traffic evenly. If you want to be cautious — say, showing a new flow to only 20% of customers — use weighted variants:
```python
def evaluate_conditions(context):
    variant = get_ab_test_variant(
        context,
        "new-cancel-flow",
        [
            {"name": "control", "weight": 4},   # 80%
            {"name": "new_flow", "weight": 1},  # 20%
        ],
    )
    return {
        "conditions": {
            "isNewFlow": variant == "new_flow",
        },
        "data": {},
    }
```
Weights are relative: 4 and 1 mean 80/20. You could also write 80 and 20 for the same result.
With only two groups, you only need one condition. The "control" variant is the default (last in the variant list, no conditions), and the "new_flow" variant matches when isNewFlow is true.
| Variant | Condition | Instructions |
|---|---|---|
| New flow | isNewFlow | The new cancellation flow |
| Control | (none) | The existing cancellation flow |
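A weighted split like the one above can be sketched with the standard library. The helper name here is hypothetical; the platform's `get_ab_test_variant` does this (and more) for you:

```python
import random

def pick_weighted(variants, rng=random):
    # variants: [{"name": ..., "weight": ...}, ...]; weights are relative,
    # so {"weight": 4} vs {"weight": 1} yields an 80/20 split.
    names = [v["name"] for v in variants]
    weights = [v["weight"] for v in variants]
    return rng.choices(names, weights=weights, k=1)[0]
```

Over many draws, roughly four out of five picks land on the heavier variant.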
Combining A/B testing with real data
In practice, you often want to A/B test and check external state in the same condition provider. For example: test two different refund offer strategies, but only for orders that haven't already been refunded.
```python
import requests

def evaluate_conditions(context):
    order_id = context["args"]["orderId"]

    # Fetch order state from your system
    response = requests.get(
        "https://api.example.com/orders",
        params={"id": order_id},
        timeout=30,
    )
    if response.status_code != 200:
        return {"conditions": {"orderNotFound": True}, "data": {}}
    order = response.json()

    # Determine A/B variant
    variant = get_ab_test_variant(context, "refund-strategy", ["standard", "generous"])

    return {
        "conditions": {
            "orderNotFound": False,
            "isRefunded": order.get("refundedAt") is not None,
            "isShipped": order.get("shippedAt") is not None,
            "isGenerous": variant == "generous",
        },
        "data": {
            "orderTotal": order.get("total"),
        },
    }
```
The workflow variants handle both the state checks and the A/B split:
| # | Variant | Conditions | Instructions |
|---|---|---|---|
| 0 | Not found | orderNotFound | Ask customer to double-check order ID |
| 1 | Already refunded | isRefunded | Inform customer, no further action |
| 2 | Generous offer | isShipped, isGenerous | Offer 30% partial refund to keep the order |
| 3 | Standard offer | isShipped | Offer 15% partial refund to keep the order |
| 4 | Default | (none) | Process cancellation normally |
Notice how the state checks (not found, already refunded) come first — they always take priority. The A/B split only matters for the "shipped" case where you're testing different retention strategies.
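The priority ordering the table relies on can be sketched as a first-match scan over the variants in order. This is an illustration of the routing behavior described above, not the platform's actual code:

```python
def select_variant(variants, conditions):
    # variants: ordered (name, required_condition_names) pairs.
    # The first variant whose conditions are all true wins;
    # a variant with no conditions acts as the catch-all default.
    for name, required in variants:
        if all(conditions.get(c) for c in required):
            return name
    return None

refund_variants = [
    ("Not found", ["orderNotFound"]),
    ("Already refunded", ["isRefunded"]),
    ("Generous offer", ["isShipped", "isGenerous"]),
    ("Standard offer", ["isShipped"]),
    ("Default", []),
]
```

With `isShipped` true and `isGenerous` false, the scan falls through to "Standard offer"; if `isRefunded` is true, the scan stops there regardless of the A/B assignment.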
What happens behind the scenes
When get_ab_test_variant runs for the first time in a conversation:
- It randomly assigns a group based on weights
- It saves the assignment as conversation metadata (`ab_test:your-test-name`)
- It tags the conversation with `ab:your-test-name:variant-name`
- It records a conversation event
If the same test name is called again in the same conversation, it returns the saved assignment — the customer always sees the same variant.
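Taken together, that sticky behavior can be approximated in a few lines. This is a hypothetical re-implementation for illustration only: the context layout and tag storage are assumptions, and the real helper also records a conversation event:

```python
import random

def get_ab_test_variant_sketch(context, test_name, variants, rng=random):
    # Assumes context carries dict-like "metadata" and a "tags" list;
    # the real platform manages this storage itself.
    metadata = context.setdefault("metadata", {})
    key = f"ab_test:{test_name}"
    if key not in metadata:
        # Accept plain strings or {"name": ..., "weight": ...} dicts.
        names = [v["name"] if isinstance(v, dict) else v for v in variants]
        weights = [v.get("weight", 1) if isinstance(v, dict) else 1 for v in variants]
        metadata[key] = rng.choices(names, weights=weights, k=1)[0]
        context.setdefault("tags", []).append(f"ab:{test_name}:{metadata[key]}")
    return metadata[key]
```

The second call with the same test name skips the random draw and returns the stored assignment, which is what keeps a customer in one group for the whole conversation.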
Tips
- Start with equal weights. Only skew the split if you're worried about the impact of the new variant. You can always adjust later.
- Test one thing at a time. If you change the tone and the refund amount in the same test, you won't know which made the difference.
- Give it time. A/B tests need enough conversations to be meaningful. Don't draw conclusions from 20 conversations.
- Use descriptive test names. `"refund-strategy-v2"` is better than `"test1"`. The name shows up in tags and metadata, so make it easy to find later.
- Clean up when done. Once you've picked a winner, update the workflow to use the winning variant and remove the condition provider. There's no need to keep the test running.