diff --git a/docs/Getting Started.ipynb b/docs/Getting Started.ipynb
index f73dfffe..8ab8e6a8 100644
--- a/docs/Getting Started.ipynb
+++ b/docs/Getting Started.ipynb
@@ -9,35 +9,33 @@
  "\n",
  "Here you will learn how to use the fastembed package to embed your data into a vector space. The package is designed to be easy to use and fast. It is built on top of the [ONNX](https://onnx.ai/) standard, which allows for fast inference on a variety of hardware (called Runtimes in ONNX). \n",
  "\n",
- "## Installation\n",
+ "## Quick Start\n",
  "\n",
- "To get started, install the fastembed package with pip."
+ "The fastembed package is designed to be easy to use. The main class is the `Embedding` class. It takes a list of strings as input and returns a list of vectors as output. The `Embedding` class is initialized with a model file."
  ]
  },
  {
  "cell_type": "code",
  "execution_count": null,
- "id": "a3b5612a",
+ "id": "ada95c6a",
  "metadata": {},
  "outputs": [],
  "source": [
- "!pip install fastembed"
+ "!pip install fastembed --upgrade # Install fastembed"
  ]
  },
  {
  "cell_type": "markdown",
- "id": "6b5b503b",
+ "id": "ed81d725",
  "metadata": {},
  "source": [
- "# Quick Start\n",
- "\n",
- "The fastembed package is designed to be easy to use. The main class is the `Embedding` class. It takes a list of strings as input and returns a list of vectors as output. The `Embedding` class is initialized with a model file."
+ "Make the necessary imports, initialize the `Embedding` class, and embed your data into vectors:"
  ]
  },
  {
  "cell_type": "code",
  "execution_count": null,
- "id": "8349e72d",
+ "id": "b61c6552",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -63,7 +61,7 @@
  "id": "8c49ae50",
  "metadata": {},
  "source": [
- "## Explanation of what is happening"
+ "## What is happening under the hood?"
  ]
  },
  {
@@ -93,10 +91,10 @@
  "source": [
  "Notice that we are using the FlagEmbedding -- which is state of the Art and beats OpenAI's Embedding by a large margin. \n",
  "\n",
- "## Prepare your Documents\n",
+ "### Prepare your Documents\n",
  "You can define a list of documents that you'd like to encode. These can be sentences, paragraphs, or even entire documents. \n",
  "\n",
- "### Format of the List:\n",
+ "#### Format of the List:\n",
  "1. List of Strings: Your documents must be in a list, and each document must be a string.\n",
  "2. For Retrieval Tasks: If you're working with queries and passages, you can add special labels to them:\n",
  "- **Queries**: Add \"query:\" at the beginning of each query string.\n",
@@ -125,7 +123,7 @@
  "id": "1cb3cc87",
  "metadata": {},
  "source": [
- "## Load the Embedding Model Weights\n",
+ "### Load the Embedding Model Weights\n",
  "Next, initialize the Embedding class with the desired parameters. Here, \"BAAI/bge-small-en\" is the pre-trained model name, and max_length=512 is the maximum token length for each document."
  ]
  },
@@ -145,7 +143,7 @@
  "id": "5549d501",
  "metadata": {},
  "source": [
- "## Embed your Documents\n",
+ "### Embed your Documents\n",
  "\n",
  "Use the encode method of the embedding model to transform the documents into a List of np.array. The method returns a generator, so we cast it to a list to get the embeddings."
  ]