Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent Behavior in Agent Registration Between .NET and Python Libraries #5314

Open
abdomohamed opened this issue Feb 1, 2025 · 2 comments
Labels
dotnet issues related to AutoGen.Net proj-extensions x-lang
Milestone

Comments

@abdomohamed
Copy link

Description: I have encountered an issue where the behavior of the runtime worker trying to register an agent to the Servicer differs between .NET and Python libraries. Below are the code snippets that demonstrate the registration process:

.NET Code:

https://github.com/microsoft/autogen/blob/756e2a4865b83276270fa8a23b86297b06d88f92/python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime_host_servicer.py#L191C1-L210C60

`

    if (!_supportedAgentTypes.TryGetValue(registration.Type, out var supportedAgentTypes))
    {
        supportedAgentTypes = _supportedAgentTypes[registration.Type] = [];
    }

    if (!supportedAgentTypes.Contains(gateway))
    {
        supportedAgentTypes.Add(gateway);
    }

    var workerState = GetOrAddWorker(gateway);
    workerState.SupportedTypes.Add(registration.Type);

    await state.WriteStateAsync().ConfigureAwait(false);

`

Python Code:

https://github.com/microsoft/autogen/blob/756e2a4865b83276270fa8a23b86297b06d88f92/python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime_host_servicer.py#L191C1-L210C60

`

    async with self._agent_type_to_client_id_lock:
        if request.type in self._agent_type_to_client_id:
            existing_client_id = self._agent_type_to_client_id[request.type]
            await context.abort(
                grpc.StatusCode.INVALID_ARGUMENT,
                f"Agent type {request.type} already registered with client {existing_client_id}.",
            )
        else:
            self._agent_type_to_client_id[request.type] = client_id


    return agent_worker_pb2.RegisterAgentTypeResponse()`

There is a discrepancy in the behavior of the runtime worker when registering an agent to the Servicer between the .NET and Python libraries. In the .NET version, a HashSet is used for tracking the registered agent types, making the registration operation idempotent. If the worker registers the same type again, the operation is treated as successful, but it wouldn't add the agent twice. The Python version, on the other hand, throws an exception if the agent type is already registered, preventing duplicate registrations.

This behavior in Python could be problematic if the worker-runtime doesn't receive the acknowledgment for the registration request from the Servicer. In such cases, the worker would retry the operation, leading to a scenario where the agent might already be registered, but the acknowledgment failed. This could cause the worker to keep retrying until it fails.

The ways is see to handle this issue can be to either:

  • Add a try-catch on the registration of the agent and unpack the error message to validate if it is related to this behavior.
  • Delete the registration from the Servicer, which would allow the worker to register the agent again.
@raimondasl
Copy link
Contributor

.NET link is a duplicate of Python link. It should be

public async ValueTask RegisterAgentTypeAsync(RegisterAgentTypeRequest registration, IGateway gateway)
{
if (!_supportedAgentTypes.TryGetValue(registration.Type, out var supportedAgentTypes))
{
supportedAgentTypes = _supportedAgentTypes[registration.Type] = [];
}
if (!supportedAgentTypes.Contains(gateway))
{
supportedAgentTypes.Add(gateway);
}
var workerState = GetOrAddWorker(gateway);
workerState.SupportedTypes.Add(registration.Type);
await state.WriteStateAsync().ConfigureAwait(false);
}

@raimondasl raimondasl added dotnet issues related to AutoGen.Net proj-extensions labels Feb 1, 2025
@jackgerrits
Copy link
Member

We're in the progress of rewriting the dotnet servicer, so it's a good time to fix this. In general we want it to be possible for a client to recover from losing connection and so reregistering an existing registration is something that should succeed. Perhaps we should require client id and type to match for a re-register.

Anyway, I think Python's current behavior is wrong as you pointed out so we should fix that. Thanks for the issue!

@jackgerrits jackgerrits added this to the python-v0.4.6 milestone Feb 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dotnet issues related to AutoGen.Net proj-extensions x-lang
Projects
None yet
Development

No branches or pull requests

3 participants