Evaluate claude-3-haiku on assist mini
allenporter committed Sep 2, 2024
1 parent eb06f77 commit cf68c7b
Showing 52 changed files with 3,603 additions and 19 deletions.
16 changes: 14 additions & 2 deletions models.yaml
@@ -72,17 +72,29 @@ models:
 chat_model: models/gemini-1.5-flash-latest
 llm_hass_api: assist


 - model_id: claude-3-5-sonnet
 domain: anthropic
-description: Google Generative AI integration using gemini flash (v1.5)
+description: Anthropic integration using Claude 3.5 Sonnet
 urls:
-- https://blog.google/products/gemini/google-gemini-new-features-july-2024/
+- https://www.anthropic.com/news/claude-3-5-sonnet
 config_entry_data:
 api_key: !secret anthropic_api_key
 config_entry_options:
 chat_model: claude-3-5-sonnet-20240620
 llm_hass_api: assist
+
+- model_id: claude-3-haiku
+domain: anthropic
+description: Anthropic integration using Claude 3 Haiku
+urls:
+- https://www.anthropic.com/news/claude-3-haiku
+config_entry_data:
+api_key: !secret anthropic_api_key
+config_entry_options:
+chat_model: claude-3-haiku-20240307
+llm_hass_api: assist
+
 #
 # Open Source smaller function calling models
 #
44 changes: 27 additions & 17 deletions reports/README.md
@@ -15,6 +15,7 @@
 | mistral-v3 | 3.8% (CI: 4.2%, 2024.8.0b) | 2.0% (CI: 4.0%, 2024.8.0dev) | 10.3% (CI: 4.6%, 2024.8.0b) |
 | xlam-1b | | 27.1% (CI: 12.6%, 2024.8.0b0) | |
 | claude-3-5-sonnet | | 95.9% (CI: 5.5%, 2024.9.0b2) | |
+| claude-3-haiku | | 98.0% (CI: 4.0%, 2024.9.0b2) | |

 Implementation notes:
 - CI is large given small number of samples in the datasets.
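
The intervals in the table are consistent with a normal-approximation (Wald) 95% interval on a pass rate over roughly 49 tasks. Both the formula and the sample size are inferred from the reported numbers, not stated in the README, so treat the following as a sketch under that assumption rather than the project's actual scoring code:

```python
import math


def wald_ci_half_width(passed: int, total: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation CI for a pass rate, in percent."""
    p = passed / total
    return 100.0 * z * math.sqrt(p * (1.0 - p) / total)


def format_score(passed: int, total: int) -> str:
    """Format a score in the same 'X% (CI: Y%)' style as the table."""
    pct = 100.0 * passed / total
    return f"{pct:.1f}% (CI: {wald_ci_half_width(passed, total):.1f}%)"


# Hypothetical counts: 48 of 49 tasks passed.
print(format_score(48, 49))
```

With so few samples a single extra failure moves the score by about two points, which is why the intervals of the top models overlap.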
@@ -96,26 +97,27 @@ config:
 themeVariables:
 xyChart:
 titleColor: "#ff0000"
-plotColorPalette: "#d5a6bd, #0f9d58, #4285f4, #4285f4, #ff6d01, #674ea7, #6aa84f, #34a853, #46bdc6, #fbbc04, #1155cc, #d9ead3, #4285f4"
+plotColorPalette: "#d5a6bd, #0f9d58, #4285f4, #4285f4, #0f9d58, #ff6d01, #674ea7, #6aa84f, #34a853, #46bdc6, #fbbc04, #1155cc, #d9ead3, #4285f4"
 ---
 xychart-beta
 title "assist-mini"
-x-axis "Model" [assistant, gpt-4o-mini, gemini-1.5-flash, claude-3-5-sonnet, functionary-small-v2.5, mistral-v3, llama3-groq-tool-use, llama3.1, xlam-7b, assist-llm, home-llm, xlam-1b, .]
+x-axis "Model" [assistant, gpt-4o-mini, gemini-1.5-flash, claude-3-5-sonnet, claude-3-haiku, functionary-small-v2.5, mistral-v3, llama3-groq-tool-use, llama3.1, xlam-7b, assist-llm, home-llm, xlam-1b, .]
 y-axis "Score" 1 --> 100
-bar [63.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
-bar [0.0, 98.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
-bar [0.0, 0.0, 98.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
-bar [0.0, 0.0, 0.0, 95.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
-bar [0.0, 0.0, 0.0, 0.0, 63.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
-bar [0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
-bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 51.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
-bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 83.7, 0.0, 0.0, 0.0, 0.0, 0.0]
-bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 85.7, 0.0, 0.0, 0.0, 0.0]
-bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 81.6, 0.0, 0.0, 0.0]
-bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 34.7, 0.0, 0.0]
-bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 27.1, 0.0]
-bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0]
+bar [63.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
+bar [0.0, 98.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
+bar [0.0, 0.0, 98.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
+bar [0.0, 0.0, 0.0, 95.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
+bar [0.0, 0.0, 0.0, 0.0, 98.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
+bar [0.0, 0.0, 0.0, 0.0, 0.0, 63.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
+bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
+bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 51.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
+bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 83.7, 0.0, 0.0, 0.0, 0.0, 0.0]
+bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 85.7, 0.0, 0.0, 0.0, 0.0]
+bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 81.6, 0.0, 0.0, 0.0]
+bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 34.7, 0.0, 0.0]
+bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 27.1, 0.0]
+bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0]
 ```
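
Each model appears to get its own `bar` series that is non-zero only at that model's x-axis position, presumably so that every bar picks up a distinct color from `plotColorPalette`. A sketch of how such rows could be generated follows; the function name and the score dictionary are illustrative and not taken from the repository:

```python
def one_hot_bar_rows(scores: dict[str, float]) -> list[str]:
    """Emit one Mermaid `bar` line per model, zero everywhere but its own slot."""
    values = list(scores.values())
    rows = []
    for i in range(len(values)):
        cells = [v if j == i else 0.0 for j, v in enumerate(values)]
        rows.append("bar [" + ", ".join(f"{c:.1f}" for c in cells) + "]")
    return rows


# First four entries of the assist-mini chart, in x-axis order.
scores = {
    "assistant": 63.3,
    "gpt-4o-mini": 98.0,
    "gemini-1.5-flash": 98.0,
    "claude-3-5-sonnet": 95.9,
}
for row in one_hot_bar_rows(scores):
    print(row)
```

Adding a model then means adding one x-axis label, one palette entry, and one more column in every row, which matches the shape of this change.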

### intents
@@ -207,10 +209,18 @@ More information:

 ### claude-3-5-sonnet

-Google Generative AI integration using gemini flash (v1.5)
+Anthropic integration using Claude 3.5 Sonnet

 More information:
-- https://blog.google/products/gemini/google-gemini-new-features-july-2024/
+- https://www.anthropic.com/news/claude-3-5-sonnet


+### claude-3-haiku
+
+Anthropic integration using Claude 3 Haiku
+
+More information:
+- https://www.anthropic.com/news/claude-3-haiku
+
+
 ### functionary-small-v2.5
@@ -0,0 +1,79 @@
---
uuid: 31c8000e-04a4-478b-9d49-d40a11194013
task_id: dom1_pl_lights_lights-dining_room_light_off
category: light
task:
input_text: Dining room light off
expect_changes:
light.dining_room_light:
state: 'off'
attributes:
brightness: null
color_mode: null
response: I have turned off the Dining Room Light.
context:
unexpected_states: {}
conversation_trace:
- event_type: async_process
data:
text: Dining room light off
context:
id: 01J6SPSMEQDS0JXW3RY77YE9AM
parent_id: null
user_id: null
conversation_id: null
device_id: null
language: en
agent_id: conversation.mock_title
- event_type: agent_detail
data:
system: "Current time is 08:48:58. Today's date is 2024-09-02.\nYou are a voice\
\ assistant for Home Assistant.\nAnswer questions about the world truthfully.\n\
Answer in plain text. Keep it simple and to the point.\nWhen controlling Home\
\ Assistant always call the intent tools. Use HassTurnOn to lock and HassTurnOff\
\ to unlock a lock. When controlling a device, prefer passing just name and\
\ domain. When controlling an area, prefer passing just area name and domain.\n\
When a user asks to turn on all devices of a specific type, ask user to specify\
\ an area, unless there is only one device of that type.\nThis device is not\
\ able to start timers.\nAn overview of the areas and the devices in this\
\ smart home:\n- names: Kitchen Light\n domain: light\n state: 'on'\n areas:\
\ Kitchen\n attributes:\n brightness: '100'\n- names: Living Room Light\n\
\ domain: light\n state: 'off'\n areas: Living Room\n attributes:\n \
\ brightness:\n- names: Dining Room Light\n domain: light\n state: 'on'\n\
\ areas: Dining Room\n attributes:\n brightness: '100'\n- names: Bedroom\
\ 1 Light\n domain: light\n state: 'off'\n areas: Bedroom 1\n- names: Bedroom\
\ 2 Light\n domain: light\n state: 'off'\n areas: Bedroom 2\n- names: Bedroom\
\ 3 Light\n domain: light\n state: 'off'\n areas: Bedroom 3\n- names: Bedroom\
\ 4 Light\n domain: light\n state: 'off'\n areas: Bedroom 4\n- names: Garden\
\ Light\n domain: light\n state: 'off'\n areas: Backyard\n attributes:\n\
\ brightness:\n"
messages:
- role: user
content: Dining room light off
- role: assistant
content:
- type: tool_use
id: toolu_01TFiYNB4X2JxFnjNt6vTF1K
name: HassTurnOff
input:
domain:
- light
name: Dining Room Light
- role: user
content:
- type: tool_result
tool_use_id: toolu_01TFiYNB4X2JxFnjNt6vTF1K
content: '{"speech": {}, "response_type": "action_done", "data": {"targets":
[], "success": [{"name": "Dining Room Light", "type": "entity", "id":
"light.dining_room_light"}], "failed": []}}'
- role: assistant
content:
- type: text
text: I have turned off the Dining Room Light.
- event_type: tool_call
data:
tool_name: HassTurnOff
tool_args:
domain:
- light
name: Dining Room Light
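
Each record pairs the expected state changes with a `conversation_trace`, and the `tool_call` events carry the tool name and arguments the model actually used. The following is a minimal sketch of pulling those out of an already-parsed record, using only keys visible in these traces. The helper name is hypothetical, and normalizing `domain` to a list is an assumption, motivated by it appearing as a list in this record but as a plain string in others:

```python
from typing import Any


def extract_tool_calls(record: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Return (tool_name, tool_args) for every tool_call event in a record."""
    calls = []
    for event in record.get("conversation_trace", []):
        if event.get("event_type") != "tool_call":
            continue
        data = event["data"]
        args = dict(data["tool_args"])
        # Accept both `domain: light` and `domain: [light]`.
        if isinstance(args.get("domain"), str):
            args["domain"] = [args["domain"]]
        calls.append((data["tool_name"], args))
    return calls


record = {
    "task_id": "dom1_pl_lights_lights-dining_room_light_off",
    "conversation_trace": [
        {"event_type": "async_process", "data": {"text": "Dining room light off"}},
        {
            "event_type": "tool_call",
            "data": {
                "tool_name": "HassTurnOff",
                "tool_args": {"domain": ["light"], "name": "Dining Room Light"},
            },
        },
    ],
}
print(extract_tool_calls(record))
```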
@@ -0,0 +1,77 @@
---
uuid: 1edf6335-7b44-4490-a082-57cf359b6bd0
task_id: dom1_pl_lights_lights-kitchen_light_off
category: light
task:
input_text: Kitchen light off
expect_changes:
light.kitchen_light:
state: 'off'
attributes:
brightness: null
color_mode: null
response: The kitchen light has been turned off.
context:
unexpected_states: {}
conversation_trace:
- event_type: async_process
data:
text: Kitchen light off
context:
id: 01J6SQ7DNJ8EGRJ17DB0T8ASJE
parent_id: null
user_id: null
conversation_id: null
device_id: null
language: en
agent_id: conversation.mock_title
- event_type: agent_detail
data:
system: "Current time is 08:56:30. Today's date is 2024-09-02.\nYou are a voice\
\ assistant for Home Assistant.\nAnswer questions about the world truthfully.\n\
Answer in plain text. Keep it simple and to the point.\nWhen controlling Home\
\ Assistant always call the intent tools. Use HassTurnOn to lock and HassTurnOff\
\ to unlock a lock. When controlling a device, prefer passing just name and\
\ domain. When controlling an area, prefer passing just area name and domain.\n\
When a user asks to turn on all devices of a specific type, ask user to specify\
\ an area, unless there is only one device of that type.\nThis device is not\
\ able to start timers.\nAn overview of the areas and the devices in this\
\ smart home:\n- names: Kitchen Light\n domain: light\n state: 'on'\n areas:\
\ Kitchen\n attributes:\n brightness: '100'\n- names: Living Room Light\n\
\ domain: light\n state: 'off'\n areas: Living Room\n attributes:\n \
\ brightness:\n- names: Dining Room Light\n domain: light\n state: 'on'\n\
\ areas: Dining Room\n attributes:\n brightness: '100'\n- names: Bedroom\
\ 1 Light\n domain: light\n state: 'off'\n areas: Bedroom 1\n- names: Bedroom\
\ 2 Light\n domain: light\n state: 'off'\n areas: Bedroom 2\n- names: Bedroom\
\ 3 Light\n domain: light\n state: 'off'\n areas: Bedroom 3\n- names: Bedroom\
\ 4 Light\n domain: light\n state: 'off'\n areas: Bedroom 4\n- names: Garden\
\ Light\n domain: light\n state: 'off'\n areas: Backyard\n attributes:\n\
\ brightness:\n"
messages:
- role: user
content: Kitchen light off
- role: assistant
content:
- type: tool_use
id: toolu_01AvSVpEZrT7bHJRCPN1KaMi
name: HassTurnOff
input:
name: Kitchen Light
domain: light
- role: user
content:
- type: tool_result
tool_use_id: toolu_01AvSVpEZrT7bHJRCPN1KaMi
content: '{"speech": {}, "response_type": "action_done", "data": {"targets":
[], "success": [{"name": "Kitchen Light", "type": "entity", "id": "light.kitchen_light"}],
"failed": []}}'
- role: assistant
content:
- type: text
text: The kitchen light has been turned off.
- event_type: tool_call
data:
tool_name: HassTurnOff
tool_args:
name: Kitchen Light
domain: light
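
The `tool_result` content in these traces is a JSON string whose `data.success` and `data.failed` lists name the entities the intent touched. A small sketch of decoding it with the standard library is shown here; the function name is illustrative, and the field layout is taken only from the payloads recorded in these files:

```python
import json


def summarize_tool_result(content: str) -> tuple[list[str], list[str]]:
    """Return (succeeded_ids, failed_ids) from a tool_result JSON payload."""
    data = json.loads(content).get("data", {})
    succeeded = [item.get("id", "") for item in data.get("success", [])]
    failed = [item.get("id", "") for item in data.get("failed", [])]
    return succeeded, failed


payload = (
    '{"speech": {}, "response_type": "action_done", "data": {"targets": [], '
    '"success": [{"name": "Kitchen Light", "type": "entity", '
    '"id": "light.kitchen_light"}], "failed": []}}'
)
print(summarize_tool_result(payload))
```

Comparing the succeeded ids against the keys of `expect_changes` is one simple way to check that the model acted on the intended entity.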
@@ -0,0 +1,77 @@
---
uuid: 1cdc6296-2d9b-40af-89d6-647cbd598d16
task_id: dom1_pl_lights_lights-please_turn_on_the_kitchen_light
category: light
task:
input_text: Please turn on the kitchen light
expect_changes:
light.kitchen_light:
state: 'on'
attributes:
brightness: 0
color_mode: brightness
response: I've turned on the Kitchen Light.
context:
unexpected_states: {}
conversation_trace:
- event_type: async_process
data:
text: Please turn on the kitchen light
context:
id: 01J6SPNZ2A72M83JJE2EB5KTZC
parent_id: null
user_id: null
conversation_id: null
device_id: null
language: en
agent_id: conversation.mock_title
- event_type: agent_detail
data:
system: "Current time is 08:46:58. Today's date is 2024-09-02.\nYou are a voice\
\ assistant for Home Assistant.\nAnswer questions about the world truthfully.\n\
Answer in plain text. Keep it simple and to the point.\nWhen controlling Home\
\ Assistant always call the intent tools. Use HassTurnOn to lock and HassTurnOff\
\ to unlock a lock. When controlling a device, prefer passing just name and\
\ domain. When controlling an area, prefer passing just area name and domain.\n\
When a user asks to turn on all devices of a specific type, ask user to specify\
\ an area, unless there is only one device of that type.\nThis device is not\
\ able to start timers.\nAn overview of the areas and the devices in this\
\ smart home:\n- names: Kitchen Light\n domain: light\n state: 'off'\n \
\ areas: Kitchen\n attributes:\n brightness:\n- names: Living Room Light\n\
\ domain: light\n state: 'on'\n areas: Living Room\n attributes:\n \
\ brightness: '100'\n- names: Dining Room Light\n domain: light\n state:\
\ 'on'\n areas: Dining Room\n attributes:\n brightness: '100'\n- names:\
\ Bedroom 1 Light\n domain: light\n state: 'off'\n areas: Bedroom 1\n-\
\ names: Bedroom 2 Light\n domain: light\n state: 'off'\n areas: Bedroom\
\ 2\n- names: Bedroom 3 Light\n domain: light\n state: 'off'\n areas: Bedroom\
\ 3\n- names: Bedroom 4 Light\n domain: light\n state: 'off'\n areas: Bedroom\
\ 4\n- names: Garden Light\n domain: light\n state: 'on'\n areas: Backyard\n\
\ attributes:\n brightness: '100'\n"
messages:
- role: user
content: Please turn on the kitchen light
- role: assistant
content:
- type: tool_use
id: toolu_01JTjfrJj5BqAndXa6Jaxtp4
name: HassTurnOn
input:
name: Kitchen Light
domain: light
- role: user
content:
- type: tool_result
tool_use_id: toolu_01JTjfrJj5BqAndXa6Jaxtp4
content: '{"speech": {}, "response_type": "action_done", "data": {"targets":
[], "success": [{"name": "Kitchen Light", "type": "entity", "id": "light.kitchen_light"}],
"failed": []}}'
- role: assistant
content:
- type: text
text: I've turned on the Kitchen Light.
- event_type: tool_call
data:
tool_name: HassTurnOn
tool_args:
name: Kitchen Light
domain: light