Conditional Contextual Bandit
Conditional Contextual Bandit (CCB) is an extension of Contextual Bandit (CB) in which there are multiple slots, and an action can be chosen for each slot. There is a shared context, as well as features for each action and each slot. The CCB reduction (ccb_explore_adf) calls into cb_sample (see Sampling) and then cb_explore_adf, so it essentially reduces to a sequence of CB operations. Each slot is automatically assigned an id in a reserved namespace. Interactions between this reserved namespace and every other namespace and existing interaction are then added so that the slot interactions can be learned. Rewards can be specified for any slot.
CCB provides several improvements:
- Ability to learn from any slot, not just the top action
- Diversity in predictions
- Richer learning around slot dependent situations
Since CB is called several times, each CB result must be sampled between calls for exploration to work. The cb_sample reduction does this automatically, potentially swapping the top action based on the pdf produced by the underlying cb_explore_adf call. Since exploration is done for every slot, it is recommended to divide your epsilon by the number of slots (epsilon/num_slots) to maintain a similar overall amount of exploration.
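The swap step and the epsilon adjustment described above can be sketched as follows. This is a simplified illustration under stated assumptions, not VW's cb_sample implementation: `ActionScore`, `per_slot_epsilon` and `sample_and_swap` are hypothetical names, and the random draw is passed in as a uniform number in [0, 1) so that the behavior is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for an (action, probability) pair; not VW's own type.
struct ActionScore
{
  uint32_t action;
  float score;
};

// Divide the exploration budget evenly across slots, as recommended above.
inline float per_slot_epsilon(float epsilon, std::size_t num_slots)
{
  assert(num_slots > 0);
  return epsilon / static_cast<float>(num_slots);
}

// Sample an index from the pdf using a uniform draw in [0, 1) and swap the
// sampled entry into position 0, so that index 0 holds the chosen action.
// Returns the index that was sampled (its position before the swap).
inline std::size_t sample_and_swap(std::vector<ActionScore>& pdf, float uniform_draw)
{
  assert(!pdf.empty());
  float cumulative = 0.f;
  std::size_t chosen = pdf.size() - 1;  // fall back to the last entry on rounding error
  for (std::size_t i = 0; i < pdf.size(); ++i)
  {
    cumulative += pdf[i].score;
    if (uniform_draw < cumulative)
    {
      chosen = i;
      break;
    }
  }
  std::swap(pdf[0], pdf[chosen]);
  return chosen;
}
```

With a pdf of 0.8, 0.1, 0.1, a draw below 0.8 leaves the top action in place (exploit), while a larger draw swaps one of the other actions to the front (explore).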
The label type of CCB is CCB::label. It contains the example type (one of shared, action or slot), an outcome if one was supplied (for labelled examples), and the currently unused explicit_included_actions. The outcome holds the cost associated with this example and all action-probability pairs for this slot. This information corresponds directly to what is encoded in the text format described below.
struct conditional_contexual_bandit_outcome
{
  // Cost associated with this example
  float cost;
  // All action-probability pairs for this slot
  ACTION_SCORE::action_scores probabilities;
};

enum example_type : uint8_t
{
  unset = 0,
  shared = 1,
  action = 2,
  slot = 3
};

struct label
{
  example_type type;
  // Supplied for labelled examples only
  conditional_contexual_bandit_outcome* outcome;
  // Currently unused
  v_array<uint32_t> explicit_included_actions;
};
The prediction type for CCB is CCB::decision_scores_t, defined as follows:
typedef v_array<ACTION_SCORE::action_scores> decision_scores_t;
This prediction contains an array of action scores for every slot, so the chosen action for each slot is the item at index 0 of its array. The remaining contents of each array are the results of the corresponding CB call. The probability values indicate whether the top action is an exploit or an explore action: an exploit action has the largest, unique probability, while an explore action has a smaller probability that is duplicated among other actions.
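As a rough illustration of reading such a prediction, the sketch below uses `std::vector` stand-ins rather than VW's `v_array` and `ACTION_SCORE` types. The names `ActionScore`, `SlotScores`, `DecisionScores`, `chosen_actions` and `is_exploit` are hypothetical, and the exploit test simply encodes the "largest and unique probability" rule from the paragraph above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for VW's action score and prediction types.
struct ActionScore
{
  uint32_t action;
  float score;
};
using SlotScores = std::vector<ActionScore>;
using DecisionScores = std::vector<SlotScores>;

// The chosen action for each slot is the entry at index 0 of that slot's array.
inline std::vector<uint32_t> chosen_actions(const DecisionScores& decisions)
{
  std::vector<uint32_t> chosen;
  chosen.reserve(decisions.size());
  for (const SlotScores& slot : decisions)
  {
    assert(!slot.empty());
    chosen.push_back(slot[0].action);
  }
  return chosen;
}

// Treat the top action as an exploit action when its probability is strictly
// larger than every other probability in the slot; otherwise it was explored.
inline bool is_exploit(const SlotScores& slot)
{
  assert(!slot.empty());
  for (std::size_t i = 1; i < slot.size(); ++i)
  {
    if (slot[i].score >= slot[0].score) { return false; }
  }
  return true;
}
```

Note that a slot containing a single action is reported as exploit under this rule, since there is nothing else it could have been swapped with.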
CCB format is a multi-line example format with 3 different example/line types. Lines are identified by explicit types as part of the label. This differs from the previous implicit action example type.
ccb shared | ...
ccb action | ...
ccb slot [<chosen_action>:<cost>:<probability>[,<action>:<probability>,...]] [<action_ids_to_include>,...] | ...
- Both additional sections in the slot label are optional
  - If action_ids_to_include is excluded then all actions are implicitly included
    - This is currently unsupported
- Action ids are zero indexed
- The list of action probability pairs in the first section is optional
  - If included, the entire collection of probabilities must sum to 1.0
- Test labels omit the entire chosen_action:cost:probability section
ccb shared | s_1 s_2
ccb action | a:1 b:1 c:1
ccb action | a:0.5 b:2 c:1
ccb action | a:0.5
ccb action | c:1
ccb slot | d:4
ccb slot 1:0.8:0.8,0:0.2 0,1,3 | d:7
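To make the slot label layout concrete, here is a small sketch that builds the label part of a `ccb slot` line (everything before the `|`) and checks the sum-to-one rule. The helper names `ActionProb`, `format_slot_label` and `sums_to_one`, as well as the tolerance value, are assumptions made for illustration and are not part of VW.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical (action, probability) pair used only for this sketch.
struct ActionProb
{
  uint32_t action;
  float probability;
};

// Build the label part of a `ccb slot` line. `pairs[0]` is the chosen action;
// an empty `pairs` produces a test label with no outcome section.
inline std::string format_slot_label(float cost, const std::vector<ActionProb>& pairs,
    const std::vector<uint32_t>& include_ids)
{
  std::ostringstream out;
  out << "ccb slot";
  if (!pairs.empty())
  {
    out << ' ' << pairs[0].action << ':' << cost << ':' << pairs[0].probability;
    for (std::size_t i = 1; i < pairs.size(); ++i)
    {
      out << ',' << pairs[i].action << ':' << pairs[i].probability;
    }
  }
  // Optional comma-separated list of zero-indexed action ids to include.
  for (std::size_t i = 0; i < include_ids.size(); ++i)
  {
    out << (i == 0 ? ' ' : ',') << include_ids[i];
  }
  return out.str();
}

// Check that the supplied probabilities sum to 1.0 within a tolerance
// (the tolerance is an assumed value, chosen for this example).
inline bool sums_to_one(const std::vector<ActionProb>& pairs, float tolerance = 1e-4f)
{
  float total = 0.f;
  for (const ActionProb& p : pairs) { total += p.probability; }
  return std::fabs(total - 1.f) <= tolerance;
}
```

Called with cost 0.8, the pairs (1, 0.8) and (0, 0.2), and include ids 0, 1, 3, the formatter reproduces the label of the last slot line in the example above; called with no pairs and no ids, it yields the bare `ccb slot` test label.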
The JSON format is identical to the CB format, with the addition of a _slots field. The _slots field contains all of the slot information, similar to _multi for actions. It is an array of objects, where each object is one slot. _a can be supplied to specify the explicitly included actions, but this is not yet supported by the reduction.
{
  "shared_feature": "feature",
  "_multi": [
    {
      "feature1": 3.0,
      "feature2": "name1"
    },
    {
      "feature1": 2.0,
      "feature2": "name2"
    },
    {
      "feature1": 3.0,
      "feature2": "name3"
    }
  ],
  "_slots": [
    {
      "size": "small",
      "_a": [0, 2]
    },
    {
      "size": "large"
    }
  ]
}
The DSJSON format for CCB is also similar to CB. The context field, c, is the same as for CB: a valid object in VW JSON format. The slots are therefore defined inside the context field. The _outcomes field contains one object per slot. Each object specifies the cost associated with the slot (_label_cost), the actions (_a) and probabilities (_p), each given as either a single value or an array, and the outcomes reported for the slot (_o).
{
  "Timestamp": "timestamp_utc",
  "Version": "1",
  "c": {
    "shared_feature": "feature",
    "_multi": [
      {
        "feature1": 3.0,
        "feature2": "name1"
      },
      {
        "feature1": 2.0,
        "feature2": "name2"
      },
      {
        "feature1": 3.0,
        "feature2": "name3"
      }
    ],
    "_slots": [
      {
        "size": "small",
        "_a": [0, 2]
      },
      {
        "size": "large"
      }
    ]
  },
  "_outcomes": [
    {
      "_id": "id1",
      "_label_cost": 0,
      "_a": [2, 0, 1],
      "_p": [0.67, 0.165, 0.165],
      "_o": []
    },
    {
      "_id": "id2",
      "_label_cost": 0,
      "_a": 1,
      "_p": 0.34,
      "_o": [-1.0, 0.0]
    }
  ],
  "VWState": {
    "m": "vm_state"
  }
}