Quick quicktour (#234)
lyie28 authored May 16, 2024
1 parent 73024d7 commit 52e5e5a
Showing 2 changed files with 27 additions and 130 deletions.
118 changes: 14 additions & 104 deletions docs/docs/get-started/quick-tour-notebook/quick-tour.ipynb
@@ -51,26 +51,6 @@
"!pip install lavague"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we will initialize the default Selenium webdriver, which will be used to execute our actions on the web."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "ysYtqlbPF19H"
},
"outputs": [],
"source": [
"from lavague.drivers.selenium import SeleniumDriver\n",
"\n",
"selenium_driver = SeleniumDriver()"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -107,12 +87,11 @@
"id": "GhHF9m4zd8mJ"
},
"source": [
"We will then build an `ActionEngine`, which is responsible for generating automation code for text instructions and executing them.\n",
"## ActionEngine\n",
"\n",
"An agent is made up of two components: an `ActionEngine` and a `WorldModel`.\n",
"\n",
"By default, our`AcionEngine` will use the following configuration:\n",
"- LLM: `OpenAI's gpt-4-1106-preview`\n",
"- Embedder: `OpenAI's text-embedding-3-large`\n",
"- Retriever: `OPSM retriever`"
"Let's start by initializing an `ActionEngine`, which is responsible for generating automation code for text instructions and executing them."
]
},
{
@@ -124,7 +103,9 @@
"outputs": [],
"source": [
"from lavague.core import ActionEngine\n",
"from lavague.drivers.selenium import SeleniumDriver\n",
"\n",
"selenium_driver = SeleniumDriver()\n",
"action_engine = ActionEngine(selenium_driver)"
]
},
@@ -136,71 +117,7 @@
"source": [
"# World model\n",
"\n",
"Here we will introduce World Models, which are models whose goal is to take a given set of:\n",
"- Objective: here the goal to be achieved\n",
"- State: here a screenshot of the current page\n",
"\n",
"and outputs an instruction that our `ActionEngine` can turn into Selenium code.\n",
"\n",
"Our current world model uses GPT4 with Vision to output an instruction using a screenshot and a given objective.\n",
"\n",
"We can have a look at the current prompt template we will use:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "35tLJc9Hd8mK",
"outputId": "917ccf77-2508-47a9-bcca-fad63a79b2f9"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"You are an AI system specialized in high level reasoning. Your goal is to generate instructions for other specialized AIs to perform web actions to reach objectives given by humans.\n",
"Your inputs are an objective in natural language, as well as a screenshot of the current page of the browser.\n",
"Your output are a list of thoughts in bullet points detailling your reasoning, followed by your conclusion on what the next step should be in the form of an instruction.\n",
"You can assume the instruction is used by another AI to generate the action code to select the element to be interacted with and perform the action demanded by the human.\n",
"\n",
"The instruction should be detailled as possible and only contain the next step. \n",
"Do not make assumptions about elements you do not see.\n",
"If the objective is already achieved in the screenshot, provide the instruction 'STOP'.\n",
"\n",
"Here are previous examples:\n",
"${examples}\n",
"\n",
"Objective: ${objective}\n",
"Thought:\n",
"\n"
]
}
],
"source": [
"from lavague.core.world_model import WORLD_MODEL_PROMPT_TEMPLATE\n",
"\n",
"print(WORLD_MODEL_PROMPT_TEMPLATE.template)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EoLE2TCVd8mM"
},
"source": [
"Next, we will initialize our WorldModel. To do this, we need to provide the WorldModel with knowledge on how to interact with our chosen website. This knowledge consists of previous examples for this website of turning observations into instructions, that are then turned into actions.\n",
"\n",
"We can initialize our WorldModel with one of three methods, allowing us to provide this knowledge in different formats:\n",
"- `WorldModel.from_hub(\"URL_SLUG\")` : with the `from_hub()` method, we can pull the knowledge from a `.txt` file in the `examples/knowledge` folder of our GitHub repo, which acts as a hub for sharing knowledge files. For our `examples/knowledge/hf_example.txt` file, we provide `hf_example` as input to our `from_hub()` method.\n",
"- `WorldModel.from_local(\"PATH_TO_LOCAL_FILE\")`: With the `from_local()` method, you can provide knowledge from a local file.\n",
"- `WorldModel(\"KNOWLEDGE_AS_STRING\")`: You can also directly initialize a `WorldModel` with your knowledge as a string.\n",
"\n",
"For the purposes of this demo, we will use the `from_hub()` method."
"Next, we will initialize our `WorldModel`, providing it with examples of global objectives for actions on this website being broken down into a chain of thoughts and then the next instruction to be passed to the `ActionEngine`."
]
},
{
@@ -222,7 +139,7 @@
"id": "umuDUrJNbsGe"
},
"source": [
"# Demo"
"# WebAgent Demo"
]
},
{
@@ -231,20 +148,9 @@
"id": "3M6DerDyd8mM"
},
"source": [
"We can now play with it, with a small example where we show our World Model can help achieve a specific goal, here going on the quicktour of Hugging Face's PEFT framework for model finetuning, by providing instructions to our `ActionEngine`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Lk86AwLjF19J"
},
"outputs": [],
"source": [
"from lavague.core import WebAgent\n",
"We can now use these two elements to initialize a `WebAgent` and start playing with it!\n",
"\n",
"agent = WebAgent(action_engine, world_model)"
"In the following example, we show how our agent can achieve a user-defined goal, here going on the quicktour of Hugging Face's PEFT framework for model finetuning."
]
},
{
@@ -287,6 +193,10 @@
}
],
"source": [
"from lavague.core import WebAgent\n",
"\n",
"agent = WebAgent(action_engine, world_model)\n",
"\n",
"agent.get(\"https://huggingface.co/docs\")\n",
"agent.run(\"Go on the quicktour of PEFT\")"
]
39 changes: 13 additions & 26 deletions docs/docs/get-started/quick-tour.md
@@ -5,10 +5,12 @@

!!! tips "Pre-requisites"

**Note**: We use OpenAI's models, for the embedding, LLM and Vision model. You will need to set the OPENAI_API_KEY variable in your local environment with a valid API key for this example to work.
- We use OpenAI's models for the embedding model, LLM, and vision model. You will need to set the `OPENAI_API_KEY` variable in your local environment with a valid API key for this example to work.

If you don't have an OpenAI API key, please get one [here](https://platform.openai.com/docs/quickstart/developer-quickstart).

- Our package currently supports only Python versions 3.10 or greater. Please upgrade your Python version if you are using an older one; a quick version check is sketched below.
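
The following is a minimal sketch of such a check, using only the standard library:

```python
import sys

# LaVague requires Python 3.10 or newer; fail early with a clear message otherwise.
assert sys.version_info >= (3, 10), (
    f"Python 3.10+ is required, but you are running {sys.version.split()[0]}."
)
```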

## Installation

We start by downloading LaVague.
@@ -20,9 +22,12 @@ pip install lavague
!!! tip "OPENAI_API_KEY"
If you haven't already set a valid OpenAI API Key as the `OPENAI_API_KEY` environment variable in your local environment, you will need to do that now.
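
For example, one minimal way to set it for the current Python session is sketched below; the key value is a placeholder that you must replace with your own.

```python
import os

# Placeholder only: substitute your real OpenAI API key (or export OPENAI_API_KEY in your shell instead).
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
```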


## Action Engine

Next, we will build an `ActionEngine`, which is responsible for generating automation code for text instructions and executing them.
An agent is made up of two components: an `ActionEngine` and a `WorldModel`.

Let's start by initializing an `ActionEngine`, which is responsible for generating automation code from text instructions and executing it.

```python
from lavague.core import ActionEngine
from lavague.drivers.selenium import SeleniumDriver

selenium_driver = SeleniumDriver()
action_engine = ActionEngine(selenium_driver)
```

## World model

Here we will introduce World Models, which are models whose goal is to take a given set of:
- Objective: here the goal to be achieved
- State: here a screenshot of the current page

and outputs an instruction that our `ActionEngine` can turn into Selenium code.

Our current world model uses GPT4 with Vision to output an instruction using a screenshot and a given objective.

We can have a look at the current prompt template [here](https://github.com/lavague-ai/LaVague/blob/main/lavague-core/lavague/core/world_model.py#L77).
## World Model

Next, we will initialize our WorldModel. To do this, we need to provide the WorldModel with knowledge on how to interact with our chosen website. This knowledge consists of previous examples for this website of turning observations into instructions, that are then turned into actions.

We can initialize our WorldModel with one of three methods, allowing us to provide this knowledge in different formats:

- `WorldModel.from_hub("URL_SLUG")` : with the `from_hub()` method, we can pull the knowledge from a `.txt` file in the `examples/knowledge` folder of our GitHub repo, which acts as a hub for sharing knowledge files. For our `examples/knowledge/hf_example.txt` file, we provide `hf_example` as input to our `from_hub()` method.
- `WorldModel.from_local("PATH_TO_LOCAL_FILE")`: With the `from_local()` method, you can provide knowledge from a local file.
- `WorldModel("KNOWLEDGE_AS_STRING")`: You can also directly initialize a `WorldModel` with your knowledge as a string.

For the purposes of this demo, we will use the `from_hub()` method.
Next, we will initialize our `WorldModel`, providing it with examples of how global objectives for this website are broken down into a chain of thoughts and then into the next instruction to be passed to the `ActionEngine`.

```python
from lavague.core import WorldModel

world_model = WorldModel.from_hub("hf_example")
```
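
The `WorldModel` can also be initialized from a local knowledge file via `WorldModel.from_local()`, or directly from a knowledge string. The sketch below is illustrative only, and the file path is a hypothetical placeholder:

```python
from lavague.core import WorldModel

# Hypothetical path: a local .txt file containing your own objective -> thoughts -> instruction examples.
world_model = WorldModel.from_local("my_knowledge.txt")

# Alternatively, pass the knowledge directly as a string:
# world_model = WorldModel("KNOWLEDGE_AS_STRING")
```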

## Demo

We can now play with it, with a small example where we show our World Model can help achieve a specific goal, here going on the quicktour of Hugging Face's PEFT framework for model finetuning, by providing instructions to our `ActionEngine`:
We can now use these two elements to initialize a `WebAgent` and start playing with it!

In the following example, we show how our agent can achieve a user-defined goal, in this case going through the quicktour of Hugging Face's PEFT framework for model fine-tuning.

```python
from lavague.core import WebAgent

agent = WebAgent(action_engine, world_model)
```

```python
agent.get("https://huggingface.co/docs")
agent.run("Go on the quicktour of PEFT")
```

![qt_output](../../assets/demo_agent_hf.gif)
