diff --git a/docs/40-dev-env/2-setup-pre-reqs.mdx b/docs/40-dev-env/2-setup-pre-reqs.mdx index 12b0e0bb..174e014a 100644 --- a/docs/40-dev-env/2-setup-pre-reqs.mdx +++ b/docs/40-dev-env/2-setup-pre-reqs.mdx @@ -1,3 +1,3 @@ # 👐 Setup prerequisites -Fill in any `` placeholders and run the cells under the **Step 1: Install libraries** and **Step 2: Setup prerequisites** sections in the notebook. \ No newline at end of file +Replace any placeholders and run the cells under the **Step 1: Install libraries** and **Step 2: Setup prerequisites** sections in the notebook. \ No newline at end of file diff --git a/docs/50-prepare-the-data/3-chunk-data.mdx b/docs/50-prepare-the-data/3-chunk-data.mdx index 4e9d138f..9d422e53 100644 --- a/docs/50-prepare-the-data/3-chunk-data.mdx +++ b/docs/50-prepare-the-data/3-chunk-data.mdx @@ -6,7 +6,7 @@ Fill in any `` placeholders and run the cells under the **Step 4: The answers for code blocks in this section are as follows: -**CODE_BLOCK_3** +**CODE_BLOCK_1**
Answer @@ -19,18 +19,7 @@ RecursiveCharacterTextSplitter.from_tiktoken_encoder(
-**CODE_BLOCK_4** - -
-Answer -
-```python -doc[text_field] -``` -
-
- -**CODE_BLOCK_5** +**CODE_BLOCK_2**
Answer @@ -41,29 +30,13 @@ text_splitter.split_text(text)
-**CODE_BLOCK_6** - -
-Answer -
-```python -for chunk in chunks: - temp = doc.copy() - temp[text_field] = chunk - chunked_data.append(temp) -``` -
-
- -**CODE_BLOCK_7** +**CODE_BLOCK_3**
Answer
```python -for doc in docs: - chunks = get_chunks(doc, "body") - split_docs.extend(chunks) +get_chunks(doc, "body") ```
\ No newline at end of file diff --git a/docs/50-prepare-the-data/4-embed-data.mdx b/docs/50-prepare-the-data/4-embed-data.mdx index c2a21f3a..e628b7b7 100644 --- a/docs/50-prepare-the-data/4-embed-data.mdx +++ b/docs/50-prepare-the-data/4-embed-data.mdx @@ -6,38 +6,24 @@ Fill in any `` placeholders and run the cells under the **Step 5: The answers for code blocks in this section are as follows: -**CODE_BLOCK_8** +**CODE_BLOCK_4**
Answer
```python -SentenceTransformer("thenlper/gte-small") +embedding_model.encode(text) ```
-**CODE_BLOCK_9** +**CODE_BLOCK_5**
Answer
```python -embedding = embedding_model.encode(text) -return embedding.tolist() -``` -
-
- -**CODE_BLOCK_10** - -
-Answer -
-```python -for doc in tqdm(split_docs): - doc["embedding"] = get_embedding(doc["body"]) - embedded_docs.append(doc) +doc["embedding"] = get_embedding(doc["body"]) ```
diff --git a/docs/50-prepare-the-data/5-ingest-data.mdx b/docs/50-prepare-the-data/5-ingest-data.mdx index 860288ea..235a98d8 100644 --- a/docs/50-prepare-the-data/5-ingest-data.mdx +++ b/docs/50-prepare-the-data/5-ingest-data.mdx @@ -8,7 +8,7 @@ Fill in any `` placeholders and run the cells under the **Step 6: The answers for code blocks in this section are as follows: -**CODE_BLOCK_11** +**CODE_BLOCK_6**
Answer @@ -19,18 +19,7 @@ mongodb_client[DB_NAME][COLLECTION_NAME]
-**CODE_BLOCK_12** - -
-Answer -
-```python -collection.delete_many({}) -``` -
-
- -**CODE_BLOCK_13** +**CODE_BLOCK_7**
Answer diff --git a/docs/60-perform-semantic-search/3-vector-search.mdx b/docs/60-perform-semantic-search/3-vector-search.mdx index 5cb7d22a..805a3605 100644 --- a/docs/60-perform-semantic-search/3-vector-search.mdx +++ b/docs/60-perform-semantic-search/3-vector-search.mdx @@ -6,7 +6,7 @@ Fill in any `` placeholders and run the cells under the **Step 8: The answers for code blocks in this section are as follows: -**CODE_BLOCK_14** +**CODE_BLOCK_8**
Answer @@ -17,7 +17,7 @@ get_embedding(user_query)
-**CODE_BLOCK_15** +**CODE_BLOCK_9**
Answer @@ -37,15 +37,15 @@ get_embedding(user_query) "$project": { "_id": 0, "body": 1, - "score": {"$meta": "vectorSearchScore"}, + "score": {"$meta": "vectorSearchScore"} } - }, + } ] ```
-**CODE_BLOCK_16** +**CODE_BLOCK_10**
Answer diff --git a/docs/60-perform-semantic-search/4-pre-filtering.mdx b/docs/60-perform-semantic-search/4-pre-filtering.mdx index 10a909a1..5d85fda1 100644 --- a/docs/60-perform-semantic-search/4-pre-filtering.mdx +++ b/docs/60-perform-semantic-search/4-pre-filtering.mdx @@ -2,39 +2,37 @@ Pre-filtering a technique to optimize vector search by only considering documents that match certain criteria during vector search. -Fill in any `` placeholders and run the cells under the **🦹‍♀️ Combine pre-filtering with vector search** section in the notebook to get a sense of how to combine pre-filtering with MongoDB Atlas Vector Search. +## Filter for documents where the content type is `Video` -:::caution -**DO NOT** actually modify the existing vector index definitions in the Atlas UI, or the existing pipeline definitions in the code. -::: - -The answers for code blocks in this section are as follows: - -**CODE_BLOCK_17** +To do this, you will first need to modify the vector search index you created previously. The new index definition should look as follows:
Answer
```python { - "fields": [ - { - "numDimensions": 1024, - "path": "embedding", - "similarity": "cosine", - "type": "vector" - }, - { - "path": "metadata.contentType", - "type": "filter" - } - ] + "fields": [ + { + "type": "vector", + "path": "embedding", + "numDimensions": 384, + "similarity": "cosine" + }, + { + "type":"filter", + "path":"metadata.contentType" + } + ] } ```
-**CODE_BLOCK_18** +Once you have updated the vector search index, fill in `` and run the cells under the **Filter for documents where the content type is Video** section in the notebook to see how the filter impacts the vector search results. + +The answer for this code block is as follows: + +**CODE_BLOCK_11**
Answer @@ -43,9 +41,9 @@ The answers for code blocks in this section are as follows: [ { "$vectorSearch": { - "index": ATLAS_VECTOR_SEARCH_INDEX_NAME, - "queryVector": query_embedding, + "index": "vector_index", "path": "embedding", + "queryVector": query_embedding, "numCandidates": 150, "limit": 5, "filter": {"metadata.contentType": "Video"} @@ -63,35 +61,42 @@ The answers for code blocks in this section are as follows:
-**CODE_BLOCK_19** + +## Filter on documents which have been updated on or after `2024-05-19` and where the content type is `Tutorial` + +Again, you will need to modify the vector search index. The new index definition should look as follows:
Answer
```python { - "fields": [ - { - "numDimensions": 1024, - "path": "embedding", - "similarity": "cosine", - "type": "vector" - }, - { - "path": "metadata.contentType", - "type": "filter" - }, - { - "path": "updated", - "type": "filter" - } - ] + "fields": [ + { + "type": "vector", + "path": "embedding", + "numDimensions": 384, + "similarity": "cosine" + }, + { + "type":"filter", + "path":"metadata.contentType" + }, + { + "type":"filter", + "path":"updated" + } + ] } ```
-**CODE_BLOCK_20** +Once you have updated the vector search index, fill in `` and run the cells under the **Filter on documents which have been updated on or after 2024-05-19 and where the content type is Tutorial** section in the notebook to see how the filter impacts the vector search results. + +The answer for this code block is as follows: + +**CODE_BLOCK_12**
Answer @@ -100,16 +105,14 @@ The answers for code blocks in this section are as follows: [ { "$vectorSearch": { - "index": ATLAS_VECTOR_SEARCH_INDEX_NAME, - "queryVector": query_embedding, + "index": "vector_index", "path": "embedding", + "queryVector": query_embedding, "numCandidates": 150, "limit": 5, "filter": { - "$and": [ - {"metadata.contentType": "Video"}, - {"updated": {"$gte": "2024-05-20"}} - ] + "metadata.contentType": "Tutorial", + "updated": {"$gte": "2024-05-19"} } } }, diff --git a/docs/70-build-rag-app/1-build-rag-app.mdx b/docs/70-build-rag-app/1-build-rag-app.mdx index 7d7cac1f..bc75df62 100644 --- a/docs/70-build-rag-app/1-build-rag-app.mdx +++ b/docs/70-build-rag-app/1-build-rag-app.mdx @@ -6,7 +6,7 @@ Fill in any `` placeholders and run the cells under the **Step 9: The answers for code blocks in this section are as follows: -**CODE_BLOCK_21** +**CODE_BLOCK_13**
Answer @@ -17,34 +17,27 @@ vector_search(user_query)
-**CODE_BLOCK_22** +**CODE_BLOCK_14**
Answer
```python -"\n\n".join([d.get("body", "") for d in context]) +create_prompt(user_query) ```
-**CODE_BLOCK_23** +**CODE_BLOCK_15**
Answer
```python -response = fw_client.chat.completions.create( +fw_client.chat.completions.create( model=model, - temperature=0, - messages=[ - { - "role": "user", - "content": create_prompt(user_query), - } - ], + messages=[{"role": "user", "content": prompt}] ) -print(response.choices[0].message.content) ```
\ No newline at end of file diff --git a/docs/70-build-rag-app/2-add-reranking.mdx b/docs/70-build-rag-app/2-add-reranking.mdx index cf8406dd..c75eaa28 100644 --- a/docs/70-build-rag-app/2-add-reranking.mdx +++ b/docs/70-build-rag-app/2-add-reranking.mdx @@ -6,7 +6,7 @@ Fill in any `` placeholders and run the cells under the **🦹 The answers for code blocks in this section are as follows: -**CODE_BLOCK_24** +**CODE_BLOCK_16**
Answer @@ -17,15 +17,4 @@ rerank_model.rank( ) ``` -
- -**CODE_BLOCK_25** - -
-Answer -
-```python -"\n\n".join([d.get("text", "") for d in reranked_documents]) -``` -
\ No newline at end of file diff --git a/docs/70-build-rag-app/3-stream-responses.mdx b/docs/70-build-rag-app/3-stream-responses.mdx index 1fb37998..f62611e6 100644 --- a/docs/70-build-rag-app/3-stream-responses.mdx +++ b/docs/70-build-rag-app/3-stream-responses.mdx @@ -6,7 +6,18 @@ Fill in any `` placeholders and run the cells under the **🦹 The answers for code blocks in this section are as follows: -**CODE_BLOCK_26** +**CODE_BLOCK_17** +
+Answer +
+```python +create_prompt(user_query) +``` +
+
+ +**CODE_BLOCK_18**
Answer @@ -14,20 +25,14 @@ The answers for code blocks in this section are as follows: ```python fw_client.chat.completions.create( model=model, - temperature=0, - stream=True, - messages=[ - { - "role": "user", - "content": create_prompt(user_query), - } - ], + messages=[{"role": "user", "content": prompt}], + stream=True ) ```
-**CODE_BLOCK_27** +**CODE_BLOCK_19**
Answer diff --git a/docs/80-add-memory/2-add-memory.mdx b/docs/80-add-memory/2-add-memory.mdx index 172a7c09..4ed04fae 100644 --- a/docs/80-add-memory/2-add-memory.mdx +++ b/docs/80-add-memory/2-add-memory.mdx @@ -6,7 +6,7 @@ Fill in any `` placeholders and run the cells under the **Step 10: The answers for code blocks in this section are as follows: -**CODE_BLOCK_28** +**CODE_BLOCK_20**
Answer @@ -17,23 +17,7 @@ history_collection.create_index("session_id")
-**CODE_BLOCK_29** - -
-Answer -
-```python -{ - "session_id": session_id, - "role": role, - "content": content, - "timestamp": datetime.now(), -} -``` -
-
- -**CODE_BLOCK_30** +**CODE_BLOCK_21**
Answer @@ -44,7 +28,7 @@ history_collection.insert_one(message)
-**CODE_BLOCK_31** +**CODE_BLOCK_22**
Answer @@ -55,42 +39,29 @@ history_collection.find({"session_id": session_id}).sort("timestamp", 1)
-**CODE_BLOCK_32** - -
-Answer -
-```python -[{"role": msg["role"], "content": msg["content"]} for msg in cursor] -``` -
-
- -**CODE_BLOCK_33** +**CODE_BLOCK_23**
Answer
```python -message_history = retrieve_session_history(session_id) -messages.extend(message_history) +retrieve_session_history(session_id) ```
-**CODE_BLOCK_34** +**CODE_BLOCK_24**
Answer
```python -user_message = {"role": "user", "content": user_query} -messages.append(user_message) +{"role": "user", "content": user_query} ```
-**CODE_BLOCK_35** +**CODE_BLOCK_25**
Answer