-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DASH HA #3
base: master
Are you sure you want to change the base?
Conversation
experimental/saiexperimentaldashha.h
Outdated
* @flags CREATE_AND_SET | ||
* @default 100 | ||
*/ | ||
SAI_DASH_HA_ATTR_VIP_CONVERGENCE_TIMEOUT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call this SWITCHOVER_TIMEOUT.
add a usage comment to indicate planned switchover
* @flags CREATE_ONLY | ||
* @default empty | ||
*/ | ||
SAI_DASH_DEVICE_ATTR_TRUSTED_VNI_LIST = SAI_DASH_DEVICE_ATTR_START, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have the VNI APIs in DASH, so we can use that to move forward without introducing the new one.
if any other things required on DPU level, they could go to switch extensions.
/** End of custom range base */ | ||
SAI_DASH_HA_ATTR_CUSTOM_RANGE_END | ||
|
||
} sai_dash_ha_attr_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is the HA set, since it defines the DPU pair, so we could rename this to ha set, which will better align with the HLD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One mismatch is that here the HA object has multiple VIP ID support for the same DPU pair.
* @flags CREATE_AND_SET | ||
* @default empty | ||
*/ | ||
SAI_DASH_HA_ATTR_VIP_INFO_LIST, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using a list with structs is not going to be DASH compatible, because we won't be able to model this in p4 - the action parameter doesn't take struct as input, hence both bmv2 and API generation will fail.
* @flags CREATE_AND_SET | ||
* @default NULL | ||
*/ | ||
SAI_SWITCH_ATTR_DASH_HA_FSM_EVENT_NOTIFY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since DASH is still SAI extension, it is better to move this into the switch extension header we have in DASH.
SAI_DASH_HA_STAT_HEARTBEAT_MSG_RECEIVED, | ||
|
||
/** DASH HA stat flow sync requests sent */ | ||
SAI_DASH_HA_STAT_FLOW_SYNC_REQ_SENT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is better to track flow sync req on ENI level, otherwise it won't be useful for live site debugging, especially post mortem analysis for historical issues (a specific VM was having problem)
SAI_DASH_HA_STAT_HEARTBEAT_MSG_SENT, | ||
|
||
/** DASH HA stat heartbeat messages received */ | ||
SAI_DASH_HA_STAT_HEARTBEAT_MSG_RECEIVED, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for data path probe, we will need record the failed count. either for debugging or alerting, the failed count is usually more accurate and intuitive.
(sent can be more or less than received to due to timing issues, which is painful for logging metrics - is 95% success rate a problem? or what does 105% success rate mean...)
/** DASH HA stat flow sync acknowledgements sent */ | ||
SAI_DASH_HA_STAT_FLOW_SYNC_ACK_RECEIVED, | ||
|
||
} sai_dash_ha_stat_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please feel free to take a look of the counter design, which have the counters defined here included with many extra ones: https://github.com/r12f/DASH/blob/user/r12f/ha-session-api/documentation/high-avail/ha-api-hld.md#46-counters
* @flags CREATE_ONLY | ||
* @default 0 | ||
*/ | ||
SAI_ENI_ATTR_HA_VIP_ID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will be better to avoid using VIP as the id to control the HA, since it creates a strong binding between VIP and HA scope.
* @type bool | ||
* @flags MANDATORY_ON_CREATE | CREATE_ONLY | ||
*/ | ||
SAI_ENI_ATTR_HA_OWNER, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name "HA owner" is a bit confusing, as it makes people think it is active side. maybe we could come up with a better name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something like FLOW_MASTER or CONFIG_OWNER
/** HA Stop followed by HA Start API needs to be invoked */ | ||
SAI_DASH_HA_FSM_EVENT_STOP_START_REQUIRED, | ||
|
||
} sai_dash_ha_fsm_event_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to be a missing feature:
for every HA state change, we need it to be reported all the way back to SDN controller, and this event is missing here. this event is critical for debugging and monitoring, so it is better to add it.
sai_get_dash_ha_stats_ext_fn get_dash_ha_stats_ext; | ||
sai_clear_dash_ha_stats_fn clear_dash_ha_stats; | ||
sai_clear_dash_ha_all_stats_fn clear_dash_ha_all_stats; | ||
sai_start_dash_ha_fsm_fn start_dash_ha_fsm; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these APIs will be non-standard APIs in SAI API, and it will be better to avoid these changes. otherwise, to support it there will be a lot of additional changes in sairedis, vendor sai, client/server sai, syncd and so on.
and as an example: start_dash_ha_fsm can be an sai set API call to update the oper_role or other attribuet on ha fsm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for stop, activate, switchover.
for doing flow resimulation, we will provide the flow resimulation api in separate dedicated doc, so we don't need it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These triggers are per vip-id in the API model.
*/ | ||
typedef enum _sai_dash_ha_oper_role_t | ||
{ | ||
SAI_DASH_HA_OPER_ROLE_NONE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comments to describe each role
63f8d9c
to
0773867
Compare
No description provided.