Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INTPYTHON-504 Add DriverInfo to MongoClients #73

Merged

Conversation

caseyclements
Copy link
Collaborator

@caseyclements caseyclements commented Jan 31, 2025

Consistently add DriverInfo to MongoClient calls so that we track metadata.

@caseyclements caseyclements requested a review from prakul January 31, 2025 17:47
vector_store = MongoDBAtlasVectorSearch(
collection=MONGODB_COLLECTION,
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
connection_string="mongodb+srv://user:[email protected]/?w=majority&appName=Cluster0"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe, to make this easier on the eyes, to remove all of the metadata in the connection_string? So it's just
connection_string="mongodb+srv://user:[email protected]?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keeping the parts in that are part of the standard Atlas connection string makes sense

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep it to MONGODB_ATLAS_CLUSTER_URI

client = MongoClient(connection_string)
client = MongoClient(
connection_string,
driver=DriverInfo(name="Langchain", version=version("langchain-mongodb")),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use the official capitalization LangChain, or use langchain-mongodb if we need to differentiate from langgraph (though, can we just use LangGraph?). We should not need to report language specifically, because that will be known based on the driver.

I believe the version is intended to be the library version number, which should equal 0.2.69. If it's not available, it seems the empty string is better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had similar questions originally. The decision to use Langchain, not LangChain was made long ago, and fees a number of BI (business intelligence) reports. langchain-mongodb is actually our package name. (e.g. from langchain_mongodb.pipelines import vector_search_stage). In any case, I am not starting from scratch here. I am just doing a review of our monorepo and adding the DriverInfo where it could have been added previously.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, the main thing for the library naming is consistency.

Regarding the version, our handshake specification says the following:

The version of the library wrapping the driver.

I would agree with what @blink1073 is suggesting below, which is adding the package or feature, if needed, to the name, for example Langchain|chat-message (and the versions would then be 0.2.69|). This is what we would do in Java. Note that:

The entire client metadata BSON document MUST NOT exceed 512 bytes.

This means that we have relatively few bytes to work with, and a long name like langchain-mongodb-chat-message increases the chance that the entire driver info is dropped from the message.

vector_store = MongoDBAtlasVectorSearch(
collection=MONGODB_COLLECTION,
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
connection_string="mongodb+srv://user:[email protected]/?w=majority&appName=Cluster0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keeping the parts in that are part of the standard Atlas connection string makes sense

collection=MONGODB_COLLECTION,
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
connection_string="mongodb+srv://user:[email protected]/?w=majority&appName=Cluster0"
namespace=f"db_name.collection_name",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary f-string

Copy link
Collaborator

@blink1073 blink1073 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

self.client = MongoClient(
connection_string,
driver=DriverInfo(
name="Langchain", version=version("langchain-mongodb")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we overload the version info to also include info about the specific feature?
for instance here it would be version=version("langchain-mongodb-chat-message")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version is only tracked at the package level. If we want to include more info, we usually add it to the name, like PyMongo|c|async, so in this case it would be Langchain|chat-message.

@caseyclements caseyclements merged commit 88a0858 into langchain-ai:main Feb 5, 2025
15 checks passed
@caseyclements caseyclements deleted the adds-driverinfo-to-mongoclients branch February 5, 2025 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants