Record assist-mini scores
Issue #40
allenporter committed Sep 25, 2024
1 parent 737cf09 commit f9bd2bb
Showing 54 changed files with 3,701 additions and 16 deletions.
13 changes: 13 additions & 0 deletions models.yaml
@@ -129,6 +129,19 @@ models:
    config_entry_options:
      llm_hass_api: assist

  - model_id: mistral-nemo
    domain: ollama
    description: A state-of-the-art 12B model with 128k context length, built by Mistral AI in collaboration with NVIDIA.
    urls:
      - https://mistral.ai/news/mistral-nemo/
      - https://ollama.com/library/mistral-nemo
    config_entry_data:
      url: !secret ollama_url
      model: mistral-nemo
    config_entry_options:
      llm_hass_api: assist
      num_ctx: 8192 # Note: Model has 128k context length

  - model_id: llama3-groq-tool-use
    domain: ollama
    description: Groq tool use model fine tuned from llama3 (8B) using Ollama
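The new entry sets `num_ctx` to 8192 even though its own comment notes that the model supports a 128k context. The sketch below is minimal and assumes the `models:` list has already been parsed into plain Python dicts by some YAML loader. It defines a hypothetical `ModelEntry` class that gives each entry a typed shape and reads back the configured context window. The 131072-token ceiling (128 × 1024) is an assumption about what "128k" means. The `url` value is a placeholder because Home Assistant resolves `!secret`, not this code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

# Assumption: "128k" is read as 128 * 1024 tokens; models.yaml only says "128k".
MISTRAL_NEMO_MAX_CTX = 128 * 1024


@dataclass
class ModelEntry:
    """Hypothetical typed view of one item in the models.yaml `models:` list."""

    model_id: str
    domain: str
    description: str = ""
    urls: list[str] = field(default_factory=list)
    config_entry_data: dict[str, Any] = field(default_factory=dict)
    config_entry_options: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> ModelEntry:
        """Build an entry, failing fast if the identifying keys are absent."""
        missing = {"model_id", "domain"} - set(raw)
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
        return cls(**raw)  # unknown keys raise TypeError

    def num_ctx(self) -> int | None:
        """Return the configured Ollama context window, or None if unset."""
        value = self.config_entry_options.get("num_ctx")
        return None if value is None else int(value)


# Mirrors the mistral-nemo entry added in this commit (secret left unresolved).
entry = ModelEntry.from_dict({
    "model_id": "mistral-nemo",
    "domain": "ollama",
    "urls": [
        "https://mistral.ai/news/mistral-nemo/",
        "https://ollama.com/library/mistral-nemo",
    ],
    "config_entry_data": {"url": "<!secret ollama_url>", "model": "mistral-nemo"},
    "config_entry_options": {"llm_hass_api": "assist", "num_ctx": 8192},
})
ctx = entry.num_ctx()
print(f"{entry.model_id}: num_ctx={ctx}, model max ~{MISTRAL_NEMO_MAX_CTX}")
```

Keeping `num_ctx` optional lets entries for other domains omit it entirely.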
4 changes: 4 additions & 0 deletions reports/CHAGELOG.md
@@ -2,6 +2,10 @@

This is a work log of major updates to the leaderboard.

## 2024-09-24

- mistral-nemo for `assist-mini` scores 81%

## 2024-09-02

- claude-3.5-sonnet `assist-mini` scores 95%
43 changes: 27 additions & 16 deletions reports/README.md
@@ -14,6 +14,7 @@
| assistant | 37.5% (CI: 10.6%, 2024.6.3) | 63.3% (CI: 13.5%, 2024.8.0dev) | 98.8% (CI: 1.7%, 2024.6.3) |
| llama3-groq-tool-use | 20.0% (CI: 8.8%, 2024.8.0b) | 51.0% (CI: 14.0%, 2024.8.0b0) | 11.5% (CI: 4.9%, 2024.8.0b) |
| mistral-v3 | 3.8% (CI: 4.2%, 2024.8.0b) | 2.0% (CI: 4.0%, 2024.8.0dev) | 10.3% (CI: 4.6%, 2024.8.0b) |
| mistral-nemo | | 81.6% (CI: 10.8%, 2024.9.2) | |
| xlam-1b | | 27.1% (CI: 12.6%, 2024.8.0b0) | |
| claude-3-5-sonnet | | 95.9% (CI: 5.5%, 2024.9.0b2) | |
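The CI figures in this table are consistent with the half-width of a 95% normal-approximation (Wald) interval on the pass rate. For example, mistral-nemo's 81.6% with a CI of 10.8% matches 40 passes out of 49 tasks. The task counts below are inferred from the published percentages and are not stated in this commit. Treat this as a sketch for reproducing the numbers, not as the leaderboard's actual scoring code.

```python
import math


def wald_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return (score %, CI half-width %) using the normal approximation.

    z=1.96 corresponds to a two-sided 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return 100 * p, 100 * half_width


# Pass counts inferred from the table (49 tasks; 48 for xlam-1b).
for model, passed, total in [
    ("mistral-nemo", 40, 49),
    ("claude-3-5-sonnet", 47, 49),
    ("xlam-1b", 13, 48),
]:
    score, half = wald_interval(passed, total)
    print(f"{model}: {score:.1f}% (CI: {half:.1f}%)")
# mistral-nemo: 81.6% (CI: 10.8%)
# claude-3-5-sonnet: 95.9% (CI: 5.5%)
# xlam-1b: 27.1% (CI: 12.6%)
```

The Wald interval is known to be unreliable near 0% and 100% and with small samples. A Wilson interval would be a safer choice for rows such as mistral-v3 (2.0%), but it would no longer reproduce the published CI values.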

@@ -98,27 +99,28 @@ config:
themeVariables:
xyChart:
titleColor: "#ff0000"
plotColorPalette: "#6aa84f, #0f9d58, #4285f4, #0f9d58, #f4b400, #46bdc6, #d9ead3, #674ea7, #ff6d01, #1155cc, #34a853, #d5a6bd, #4285f4, #4285f4"
plotColorPalette: "#6aa84f, #0f9d58, #4285f4, #f4b400, #f4b400, #46bdc6, #d9ead3, #4285f4, #674ea7, #ff6d01, #1155cc, #34a853, #d5a6bd, #0f9d58, #4285f4"
---
xychart-beta
title "assist-mini"
x-axis "Model" [assistant, gpt-4o-mini, gemini-1.5-flash, claude-3-5-sonnet, claude-3-haiku, functionary-small-v2.5, mistral-v3, llama3-groq-tool-use, llama3.1, xlam-7b, assist-llm, home-llm, xlam-1b, .]
x-axis "Model" [assistant, gpt-4o-mini, gemini-1.5-flash, claude-3-5-sonnet, claude-3-haiku, functionary-small-v2.5, mistral-v3, mistral-nemo, llama3-groq-tool-use, llama3.1, xlam-7b, assist-llm, home-llm, xlam-1b, .]
y-axis "Score" 1 --> 100
bar [63.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 98.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 98.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 95.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 98.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 63.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 51.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 83.7, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 85.7, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 81.6, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 34.7, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 27.1, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0]
bar [63.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 98.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 98.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 95.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 98.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 63.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 81.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 51.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 83.7, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 85.7, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 81.6, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 34.7, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 27.1, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0]
```

### intents
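Each `bar` row in the updated chart holds one model's score and 0.0 everywhere else. One row is emitted per model, plus a trailing `.` category set to -1.0. Emitting one-hot series like this presumably gives each model its own color from `plotColorPalette`, which grows to 15 entries alongside the 15 categories. A small generator keeps the rows aligned when a model is inserted mid-list, as `mistral-nemo` is here. The `xychart_bars` helper is hypothetical and is not the repository's report script.

```python
def xychart_bars(scores: list[tuple[str, float]], placeholder: str = ".") -> list[str]:
    """Build a Mermaid xychart-beta x-axis line plus one one-hot bar series per model.

    Row i carries model i's score and 0.0 elsewhere; a trailing placeholder
    category with -1.0 is appended to mirror the chart in this diff.
    """
    names = [name for name, _ in scores] + [placeholder]
    values = [score for _, score in scores] + [-1.0]
    lines = [f'x-axis "Model" [{", ".join(names)}]']
    for i, value in enumerate(values):
        row = [0.0] * len(values)
        row[i] = value
        lines.append("bar [" + ", ".join(f"{v:.1f}" for v in row) + "]")
    return lines


# assist-mini scores in the order of the updated chart.
ASSIST_MINI = [
    ("assistant", 63.3), ("gpt-4o-mini", 98.0), ("gemini-1.5-flash", 98.0),
    ("claude-3-5-sonnet", 95.9), ("claude-3-haiku", 98.0),
    ("functionary-small-v2.5", 63.3), ("mistral-v3", 2.0), ("mistral-nemo", 81.6),
    ("llama3-groq-tool-use", 51.0), ("llama3.1", 83.7), ("xlam-7b", 85.7),
    ("assist-llm", 81.6), ("home-llm", 34.7), ("xlam-1b", 27.1),
]

for line in xychart_bars(ASSIST_MINI):
    print(line)
```

Generating the palette from the same list would also keep the color count in step with the number of series.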
@@ -244,6 +246,15 @@ More information:
- https://mistral.ai/news/announcing-mistral-7b/


### mistral-nemo

A state-of-the-art 12B model with 128k context length, built by Mistral AI in collaboration with NVIDIA.

More information:
- https://mistral.ai/news/mistral-nemo/
- https://ollama.com/library/mistral-nemo


### llama3-groq-tool-use

Groq tool use model fine tuned from llama3 (8B) using Ollama
@@ -0,0 +1,79 @@
---
uuid: 468554bc-5397-4ee9-9d3b-6d4bccd6fc79
task_id: dom1_pl_lights_lights-dining_room_light_off
category: light
task:
input_text: Dining room light off
expect_changes:
light.dining_room_light:
state: 'off'
attributes:
brightness: null
color_mode: null
response: The Dining room light is off.
context:
unexpected_states: {}
conversation_trace:
- event_type: async_process
data:
text: Dining room light off
context:
id: 01J8KJEW95V9BMVHJ18QC3NWQH
parent_id: null
user_id: null
conversation_id: null
device_id: null
language: en
agent_id: conversation.mock_title
- event_type: agent_detail
data:
messages:
- role: system
content: "Current time is 20:09:09. Today's date is 2024-09-24.\nYou are a\
\ voice assistant for Home Assistant.\nAnswer questions about the world\
\ truthfully.\nAnswer in plain text. Keep it simple and to the point.\n\
When controlling Home Assistant always call the intent tools. Use HassTurnOn\
\ to lock and HassTurnOff to unlock a lock. When controlling a device, prefer\
\ passing just name and domain. When controlling an area, prefer passing\
\ just area name and domain.\nWhen a user asks to turn on all devices of\
\ a specific type, ask user to specify an area, unless there is only one\
\ device of that type.\nThis device is not able to start timers.\nAn overview\
\ of the areas and the devices in this smart home:\n- names: Kitchen Light\n\
\ domain: light\n state: 'on'\n areas: Kitchen\n attributes:\n brightness:\
\ '100'\n- names: Living Room Light\n domain: light\n state: 'off'\n \
\ areas: Living Room\n attributes:\n brightness:\n- names: Dining Room\
\ Light\n domain: light\n state: 'on'\n areas: Dining Room\n attributes:\n\
\ brightness: '100'\n- names: Bedroom 1 Light\n domain: light\n state:\
\ 'off'\n areas: Bedroom 1\n- names: Bedroom 2 Light\n domain: light\n\
\ state: 'off'\n areas: Bedroom 2\n- names: Bedroom 3 Light\n domain:\
\ light\n state: 'off'\n areas: Bedroom 3\n- names: Bedroom 4 Light\n\
\ domain: light\n state: 'off'\n areas: Bedroom 4\n- names: Garden Light\n\
\ domain: light\n state: 'off'\n areas: Backyard\n attributes:\n \
\ brightness:\n"
- role: user
content: Dining room light off
- role: assistant
content: ''
tool_calls:
- function:
name: HassTurnOff
arguments:
area: Dining Room
domain:
- light
- role: tool
content: '{"speech": {}, "response_type": "action_done", "data": {"targets":
[], "success": [{"name": "Dining Room", "type": "area", "id": "dining_room"},
{"name": "Dining Room Light", "type": "entity", "id": "light.dining_room_light"}],
"failed": []}}'
- role: assistant
content: The Dining room light is off.
tool_calls: null
- event_type: tool_call
data:
tool_name: HassTurnOff
tool_args:
area: Dining Room
domain:
- light
tries: 0
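Each trace pairs the task's `expect_changes` with an `unexpected_states` map, which is empty here. A scorer in that spirit could compare the post-run entity states against the expected ones and flag any other entity that changed. The `check_expected_changes` function below is hypothetical, and the sample states are illustrative assumptions modeled on the dining-room task. Neither is the dataset's evaluation code.

```python
from typing import Any

State = dict[str, Any]  # {"state": ..., "attributes": {...}}


def check_expected_changes(
    before: dict[str, State],
    after: dict[str, State],
    expect_changes: dict[str, State],
) -> tuple[list[str], dict[str, State]]:
    """Return (mismatches, unexpected_states) for one task.

    Every entity in expect_changes must end with the expected state and
    attribute values; any other entity whose state or attributes differ
    from `before` is reported as unexpected.
    """
    mismatches: list[str] = []
    for entity_id, expected in expect_changes.items():
        actual = after.get(entity_id, {})
        if actual.get("state") != expected.get("state"):
            mismatches.append(
                f"{entity_id}: state {actual.get('state')!r} != {expected.get('state')!r}"
            )
        attrs = actual.get("attributes", {})
        for key, value in expected.get("attributes", {}).items():
            if attrs.get(key) != value:
                mismatches.append(f"{entity_id}: {key} {attrs.get(key)!r} != {value!r}")
    unexpected = {
        entity_id: state
        for entity_id, state in after.items()
        if entity_id not in expect_changes and before.get(entity_id) != state
    }
    return mismatches, unexpected


# Illustrative states for "Dining room light off".
before = {
    "light.dining_room_light": {"state": "on", "attributes": {"brightness": 100}},
    "light.kitchen_light": {"state": "on", "attributes": {"brightness": 100}},
}
after = {
    "light.dining_room_light": {"state": "off", "attributes": {"brightness": None, "color_mode": None}},
    "light.kitchen_light": {"state": "on", "attributes": {"brightness": 100}},
}
expect_changes = {
    "light.dining_room_light": {"state": "off", "attributes": {"brightness": None, "color_mode": None}},
}
print(check_expected_changes(before, after, expect_changes))  # ([], {})
```

This sketch does not report entities that disappear between `before` and `after`.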
@@ -0,0 +1,78 @@
---
uuid: 9250bfe1-050e-4ffd-8e41-cc2ae5b0e04b
task_id: dom1_pl_lights_lights-kitchen_light_off
category: light
task:
input_text: Kitchen light off
expect_changes:
light.kitchen_light:
state: 'off'
attributes:
brightness: null
color_mode: null
response: The kitchen light is now off.
context:
unexpected_states: {}
conversation_trace:
- event_type: async_process
data:
text: Kitchen light off
context:
id: 01J8KJEF6HB1Q9TC245XYD9BMQ
parent_id: null
user_id: null
conversation_id: null
device_id: null
language: en
agent_id: conversation.mock_title
- event_type: agent_detail
data:
messages:
- role: system
content: "Current time is 20:08:55. Today's date is 2024-09-24.\nYou are a\
\ voice assistant for Home Assistant.\nAnswer questions about the world\
\ truthfully.\nAnswer in plain text. Keep it simple and to the point.\n\
When controlling Home Assistant always call the intent tools. Use HassTurnOn\
\ to lock and HassTurnOff to unlock a lock. When controlling a device, prefer\
\ passing just name and domain. When controlling an area, prefer passing\
\ just area name and domain.\nWhen a user asks to turn on all devices of\
\ a specific type, ask user to specify an area, unless there is only one\
\ device of that type.\nThis device is not able to start timers.\nAn overview\
\ of the areas and the devices in this smart home:\n- names: Kitchen Light\n\
\ domain: light\n state: 'on'\n areas: Kitchen\n attributes:\n brightness:\
\ '100'\n- names: Living Room Light\n domain: light\n state: 'off'\n \
\ areas: Living Room\n attributes:\n brightness:\n- names: Dining Room\
\ Light\n domain: light\n state: 'on'\n areas: Dining Room\n attributes:\n\
\ brightness: '100'\n- names: Bedroom 1 Light\n domain: light\n state:\
\ 'off'\n areas: Bedroom 1\n- names: Bedroom 2 Light\n domain: light\n\
\ state: 'off'\n areas: Bedroom 2\n- names: Bedroom 3 Light\n domain:\
\ light\n state: 'off'\n areas: Bedroom 3\n- names: Bedroom 4 Light\n\
\ domain: light\n state: 'off'\n areas: Bedroom 4\n- names: Garden Light\n\
\ domain: light\n state: 'off'\n areas: Backyard\n attributes:\n \
\ brightness:\n"
- role: user
content: Kitchen light off
- role: assistant
content: ''
tool_calls:
- function:
name: HassTurnOff
arguments:
domain:
- light
name: Kitchen Light
- role: tool
content: '{"speech": {}, "response_type": "action_done", "data": {"targets":
[], "success": [{"name": "Kitchen Light", "type": "entity", "id": "light.kitchen_light"}],
"failed": []}}'
- role: assistant
content: The kitchen light is now off.
tool_calls: null
- event_type: tool_call
data:
tool_name: HassTurnOff
tool_args:
domain:
- light
name: Kitchen Light
tries: 0
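The `tool` message content is a JSON string in the intent response shape shown in these traces. It carries `response_type` and the `data.success` / `data.failed` lists. The sketch below uses only the standard library to extract the entity ids that an intent acted on, dropping area targets such as `dining_room`. It assumes `failed` entries have the same shape as `success` entries. These traces cannot confirm that, because the `failed` list is always empty.

```python
import json
from typing import NamedTuple


class IntentResult(NamedTuple):
    response_type: str
    succeeded: list[str]
    failed: list[str]


def parse_intent_result(content: str) -> IntentResult:
    """Decode a tool message from an intent call into entity-level outcomes."""
    payload = json.loads(content)
    data = payload.get("data") or {}

    def entity_ids(items: list[dict]) -> list[str]:
        # Area targets (type "area") are dropped; only entities are kept.
        return [item["id"] for item in items if item.get("type") == "entity"]

    return IntentResult(
        response_type=payload.get("response_type", ""),
        succeeded=entity_ids(data.get("success", [])),
        failed=entity_ids(data.get("failed", [])),
    )


# Tool message content from the kitchen-light-off trace, unfolded onto one line.
kitchen = (
    '{"speech": {}, "response_type": "action_done", "data": {"targets": [], '
    '"success": [{"name": "Kitchen Light", "type": "entity", "id": "light.kitchen_light"}], '
    '"failed": []}}'
)
print(parse_intent_result(kitchen))
# IntentResult(response_type='action_done', succeeded=['light.kitchen_light'], failed=[])
```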
@@ -0,0 +1,78 @@
---
uuid: 878bee3e-7406-449e-aef5-ee084336c521
task_id: dom1_pl_lights_lights-please_turn_on_the_kitchen_light
category: light
task:
input_text: Please turn on the kitchen light
expect_changes:
light.kitchen_light:
state: 'on'
attributes:
brightness: 0
color_mode: brightness
response: The kitchen light is now on. -
context:
unexpected_states: {}
conversation_trace:
- event_type: async_process
data:
text: Please turn on the kitchen light
context:
id: 01J8KJCZCTFTSSHBZARN4REB05
parent_id: null
user_id: null
conversation_id: null
device_id: null
language: en
agent_id: conversation.mock_title
- event_type: agent_detail
data:
messages:
- role: system
content: "Current time is 20:08:06. Today's date is 2024-09-24.\nYou are a\
\ voice assistant for Home Assistant.\nAnswer questions about the world\
\ truthfully.\nAnswer in plain text. Keep it simple and to the point.\n\
When controlling Home Assistant always call the intent tools. Use HassTurnOn\
\ to lock and HassTurnOff to unlock a lock. When controlling a device, prefer\
\ passing just name and domain. When controlling an area, prefer passing\
\ just area name and domain.\nWhen a user asks to turn on all devices of\
\ a specific type, ask user to specify an area, unless there is only one\
\ device of that type.\nThis device is not able to start timers.\nAn overview\
\ of the areas and the devices in this smart home:\n- names: Kitchen Light\n\
\ domain: light\n state: 'off'\n areas: Kitchen\n attributes:\n brightness:\n\
- names: Living Room Light\n domain: light\n state: 'on'\n areas: Living\
\ Room\n attributes:\n brightness: '100'\n- names: Dining Room Light\n\
\ domain: light\n state: 'on'\n areas: Dining Room\n attributes:\n \
\ brightness: '100'\n- names: Bedroom 1 Light\n domain: light\n state:\
\ 'off'\n areas: Bedroom 1\n- names: Bedroom 2 Light\n domain: light\n\
\ state: 'off'\n areas: Bedroom 2\n- names: Bedroom 3 Light\n domain:\
\ light\n state: 'off'\n areas: Bedroom 3\n- names: Bedroom 4 Light\n\
\ domain: light\n state: 'off'\n areas: Bedroom 4\n- names: Garden Light\n\
\ domain: light\n state: 'on'\n areas: Backyard\n attributes:\n brightness:\
\ '100'\n"
- role: user
content: Please turn on the kitchen light
- role: assistant
content: ''
tool_calls:
- function:
name: HassTurnOn
arguments:
domain:
- light
name: Kitchen Light
- role: tool
content: '{"speech": {}, "response_type": "action_done", "data": {"targets":
[], "success": [{"name": "Kitchen Light", "type": "entity", "id": "light.kitchen_light"}],
"failed": []}}'
- role: assistant
content: The kitchen light is now on. -
tool_calls: null
- event_type: tool_call
data:
tool_name: HassTurnOn
tool_args:
domain:
- light
name: Kitchen Light
tries: 0
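The system prompt gives two rules: pass just name and domain when controlling a device, and pass just area name and domain when controlling an area. Each trace also logs the call twice, once in the assistant message and once as a `tool_call` event. The helpers below are hypothetical checks for both properties, run over the arguments recorded in the three traces above.

```python
from collections import Counter
from typing import Any


def argument_style(args: dict[str, Any]) -> str:
    """Classify intent tool arguments against the system prompt's guidance.

    "device": exactly name + domain; "area": exactly area + domain;
    "other": any other combination of keys.
    """
    keys = set(args)
    if keys == {"name", "domain"}:
        return "device"
    if keys == {"area", "domain"}:
        return "area"
    return "other"


def tool_call_matches(event_data: dict[str, Any], function: dict[str, Any]) -> bool:
    """True if a tool_call event repeats the assistant's function call exactly."""
    return (
        event_data.get("tool_name") == function.get("name")
        and event_data.get("tool_args") == function.get("arguments")
    )


# Arguments as recorded in the three traces above.
recorded = [
    {"area": "Dining Room", "domain": ["light"]},
    {"domain": ["light"], "name": "Kitchen Light"},
    {"domain": ["light"], "name": "Kitchen Light"},
]
print(Counter(argument_style(args) for args in recorded))
# Counter({'device': 2, 'area': 1})

event = {"tool_name": "HassTurnOn", "tool_args": recorded[2]}
function = {"name": "HassTurnOn", "arguments": recorded[2]}
print(tool_call_matches(event, function))  # True
```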