Quick quicktour (#234)
lyie28 authored May 16, 2024
1 parent 73024d7 commit 52e5e5a
Showing 2 changed files with 27 additions and 130 deletions.
118 changes: 14 additions & 104 deletions docs/docs/get-started/quick-tour-notebook/quick-tour.ipynb
@@ -51,26 +51,6 @@
"!pip install lavague"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we will initialize the default Selenium webdriver, which will be used to execute our actions on the web."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "ysYtqlbPF19H"
},
"outputs": [],
"source": [
"from lavague.drivers.selenium import SeleniumDriver\n",
"\n",
"selenium_driver = SeleniumDriver()"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -107,12 +87,11 @@
"id": "GhHF9m4zd8mJ"
},
"source": [
"We will then build an `ActionEngine`, which is responsible for generating automation code for text instructions and executing them.\n",
"## ActionEngine\n",
"\n",
"An agent is made up of two components: an `ActionEngine` and a `WorldModel`.\n",
"\n",
"By default, our`AcionEngine` will use the following configuration:\n",
"- LLM: `OpenAI's gpt-4-1106-preview`\n",
"- Embedder: `OpenAI's text-embedding-3-large`\n",
"- Retriever: `OPSM retriever`"
"Let's start by initializing an `ActionEngine`, which is responsible for generating automation code for text instructions and executing them."
]
},
{
@@ -124,7 +103,9 @@
"outputs": [],
"source": [
"from lavague.core import ActionEngine\n",
"from lavague.drivers.selenium import SeleniumDriver\n",
"\n",
"selenium_driver = SeleniumDriver()\n",
"action_engine = ActionEngine(selenium_driver)"
]
},
@@ -136,71 +117,7 @@
"source": [
"# World model\n",
"\n",
"Here we will introduce World Models, which are models whose goal is to take a given set of:\n",
"- Objective: here the goal to be achieved\n",
"- State: here a screenshot of the current page\n",
"\n",
"and outputs an instruction that our `ActionEngine` can turn into Selenium code.\n",
"\n",
"Our current world model uses GPT4 with Vision to output an instruction using a screenshot and a given objective.\n",
"\n",
"We can have a look at the current prompt template we will use:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "35tLJc9Hd8mK",
"outputId": "917ccf77-2508-47a9-bcca-fad63a79b2f9"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"You are an AI system specialized in high level reasoning. Your goal is to generate instructions for other specialized AIs to perform web actions to reach objectives given by humans.\n",
"Your inputs are an objective in natural language, as well as a screenshot of the current page of the browser.\n",
"Your output are a list of thoughts in bullet points detailling your reasoning, followed by your conclusion on what the next step should be in the form of an instruction.\n",
"You can assume the instruction is used by another AI to generate the action code to select the element to be interacted with and perform the action demanded by the human.\n",
"\n",
"The instruction should be detailled as possible and only contain the next step. \n",
"Do not make assumptions about elements you do not see.\n",
"If the objective is already achieved in the screenshot, provide the instruction 'STOP'.\n",
"\n",
"Here are previous examples:\n",
"${examples}\n",
"\n",
"Objective: ${objective}\n",
"Thought:\n",
"\n"
]
}
],
"source": [
"from lavague.core.world_model import WORLD_MODEL_PROMPT_TEMPLATE\n",
"\n",
"print(WORLD_MODEL_PROMPT_TEMPLATE.template)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EoLE2TCVd8mM"
},
"source": [
"Next, we will initialize our WorldModel. To do this, we need to provide the WorldModel with knowledge on how to interact with our chosen website. This knowledge consists of previous examples for this website of turning observations into instructions, that are then turned into actions.\n",
"\n",
"We can initialize our WorldModel with one of three methods, allowing us to provide this knowledge in different formats:\n",
"- `WorldModel.from_hub(\"URL_SLUG\")` : with the `from_hub()` method, we can pull the knowledge from a `.txt` file in the `examples/knowledge` folder of our GitHub repo, which acts as a hub for sharing knowledge files. For our `examples/knowledge/hf_example.txt` file, we provide `hf_example` as input to our `from_hub()` method.\n",
"- `WorldModel.from_local(\"PATH_TO_LOCAL_FILE\")`: With the `from_local()` method, you can provide knowledge from a local file.\n",
"- `WorldModel(\"KNOWLEDGE_AS_STRING\")`: You can also directly initialize a `WorldModel` with your knowledge as a string.\n",
"\n",
"For the purposes of this demo, we will use the `from_hub()` method."
"Next, we will initialize our `WorldModel`, providing it with examples of global objectives for actions on this website being broken down into a chain of thoughts and then the next instruction to be passed to the `ActionEngine`."
]
},
{
@@ -222,7 +139,7 @@
"id": "umuDUrJNbsGe"
},
"source": [
"# Demo"
"# WebAgent Demo"
]
},
{
@@ -231,20 +148,9 @@
"id": "3M6DerDyd8mM"
},
"source": [
"We can now play with it, with a small example where we show our World Model can help achieve a specific goal, here going on the quicktour of Hugging Face's PEFT framework for model finetuning, by providing instructions to our `ActionEngine`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Lk86AwLjF19J"
},
"outputs": [],
"source": [
"from lavague.core import WebAgent\n",
"We can now use these two elements to initialize a `WebAgent` and start playing with it!\n",
"\n",
"agent = WebAgent(action_engine, world_model)"
"In the following example, we show how our agent can achieve a user-defined goal, here going on the quicktour of Hugging Face's PEFT framework for model finetuning."
]
},
{
@@ -287,6 +193,10 @@
}
],
"source": [
"from lavague.core import WebAgent\n",
"\n",
"agent = WebAgent(action_engine, world_model)\n",
"\n",
"agent.get(\"https://huggingface.co/docs\")\n",
"agent.run(\"Go on the quicktour of PEFT\")"
]
39 changes: 13 additions & 26 deletions docs/docs/get-started/quick-tour.md
@@ -5,10 +5,12 @@

!!! tips "Pre-requisites"

**Note**: We use OpenAI's models, for the embedding, LLM and Vision model. You will need to set the OPENAI_API_KEY variable in your local environment with a valid API key for this example to work.
- We use OpenAI's models for the embedding model, LLM, and vision model. You will need to set the `OPENAI_API_KEY` variable in your local environment with a valid API key for this example to work.

If you don't have an OpenAI API key, please get one [here](https://platform.openai.com/docs/quickstart/developer-quickstart).

- Our package currently supports only Python versions 3.10 or greater. Please upgrade your Python version if you are using an older one; a quick version check is sketched below.
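
The following is a minimal sketch of such a check, using only the standard library:

```python
import sys

# LaVague requires Python 3.10 or newer; fail early with a clear message otherwise.
assert sys.version_info >= (3, 10), (
    f"Python 3.10+ is required, but you are running {sys.version.split()[0]}."
)
```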

## Installation

We start by downloading LaVague.
@@ -20,9 +22,12 @@ pip install lavague
!!! tip "OPENAI_API_KEY"
If you haven't already set a valid OpenAI API Key as the `OPENAI_API_KEY` environment variable in your local environment, you will need to do that now.
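
For example, one minimal way to set it for the current Python session is sketched below; the key value is a placeholder that you must replace with your own.

```python
import os

# Placeholder only: substitute your real OpenAI API key (or export OPENAI_API_KEY in your shell instead).
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
```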


## Action Engine

Next, we will build an `ActionEngine`, which is responsible for generating automation code for text instructions and executing them.
An agent is made up of two components: an `ActionEngine` and a `WorldModel`.

Let's start by initializing an `ActionEngine`, which is responsible for generating automation code from text instructions and executing it.

```python
from lavague.core import ActionEngine
from lavague.drivers.selenium import SeleniumDriver

selenium_driver = SeleniumDriver()
action_engine = ActionEngine(selenium_driver)
```

## World model

Here we will introduce World Models, which are models whose goal is to take a given set of:
- Objective: here the goal to be achieved
- State: here a screenshot of the current page

and outputs an instruction that our `ActionEngine` can turn into Selenium code.

Our current world model uses GPT4 with Vision to output an instruction using a screenshot and a given objective.

We can have a look at the current prompt template [here](https://github.com/lavague-ai/LaVague/blob/main/lavague-core/lavague/core/world_model.py#L77).
## World Model

Next, we will initialize our WorldModel. To do this, we need to provide the WorldModel with knowledge on how to interact with our chosen website. This knowledge consists of previous examples for this website of turning observations into instructions, that are then turned into actions.

We can initialize our WorldModel with one of three methods, allowing us to provide this knowledge in different formats:

- `WorldModel.from_hub("URL_SLUG")` : with the `from_hub()` method, we can pull the knowledge from a `.txt` file in the `examples/knowledge` folder of our GitHub repo, which acts as a hub for sharing knowledge files. For our `examples/knowledge/hf_example.txt` file, we provide `hf_example` as input to our `from_hub()` method.
- `WorldModel.from_local("PATH_TO_LOCAL_FILE")`: With the `from_local()` method, you can provide knowledge from a local file.
- `WorldModel("KNOWLEDGE_AS_STRING")`: You can also directly initialize a `WorldModel` with your knowledge as a string.

For the purposes of this demo, we will use the `from_hub()` method.
Next, we will initialize our `WorldModel`, providing it with examples of how global objectives for this website are broken down into a chain of thoughts and then into the next instruction to be passed to the `ActionEngine`.

```python
from lavague.core import WorldModel

world_model = WorldModel.from_hub("hf_example")
```
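
The `WorldModel` can also be initialized from a local knowledge file via `WorldModel.from_local()`, or directly from a knowledge string. The sketch below is illustrative only, and the file path is a hypothetical placeholder:

```python
from lavague.core import WorldModel

# Hypothetical path: a local .txt file containing your own objective -> thoughts -> instruction examples.
world_model = WorldModel.from_local("my_knowledge.txt")

# Alternatively, pass the knowledge directly as a string:
# world_model = WorldModel("KNOWLEDGE_AS_STRING")
```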

## Demo

We can now play with it, with a small example where we show our World Model can help achieve a specific goal, here going on the quicktour of Hugging Face's PEFT framework for model finetuning, by providing instructions to our `ActionEngine`:
We can now use these two elements to initialize a `WebAgent` and start playing with it!

In the following example, we show how our agent can achieve a user-defined goal, in this case going through the quicktour of Hugging Face's PEFT framework for model fine-tuning.

```python
from lavague.core import WebAgent

agent = WebAgent(action_engine, world_model)
```

```python
agent.get("https://huggingface.co/docs")
agent.run("Go on the quicktour of PEFT")
```

![qt_output](../../assets/demo_agent_hf.gif)
