Skip to content

Commit

Permalink
Re-run claude 3.5 sonnet timeouts
Browse files Browse the repository at this point in the history
  • Loading branch information
allenporter committed Sep 2, 2024
1 parent 9e956a6 commit eb06f77
Show file tree
Hide file tree
Showing 15 changed files with 380 additions and 135 deletions.
4 changes: 2 additions & 2 deletions reports/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
| llama3-groq-tool-use | 20.0% (CI: 8.8%, 2024.8.0b) | 51.0% (CI: 14.0%, 2024.8.0b0) | 11.5% (CI: 4.9%, 2024.8.0b) |
| mistral-v3 | 3.8% (CI: 4.2%, 2024.8.0b) | 2.0% (CI: 4.0%, 2024.8.0dev) | 10.3% (CI: 4.6%, 2024.8.0b) |
| xlam-1b | | 27.1% (CI: 12.6%, 2024.8.0b0) | |
| claude-3-5-sonnet | | 81.6% (CI: 10.8%, 2024.9.0b2) | |
| claude-3-5-sonnet | | 95.9% (CI: 5.5%, 2024.9.0b2) | |

Implementation notes:
- CI is large given small number of samples in the datasets.
Expand Down Expand Up @@ -106,7 +106,7 @@ xychart-beta
bar [63.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 98.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 98.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 81.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 95.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 63.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bar [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 51.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
uuid: b9660328-0b9b-4ceb-bb4f-f0d5a595b39c
uuid: f35b7635-1d2d-4a6e-b260-f59bfb3ff214
task_id: dom1_pl_lights_lights-dining_room_light_off
category: light
task:
Expand All @@ -10,24 +10,16 @@ task:
attributes:
brightness: null
color_mode: null
response: 'Timeout: '
response: The dining room light has been turned off successfully. Is there anything
else you'd like me to do?
context:
unexpected_states:
light.dining_room_light:
expected:
brightness: null
color_mode: null
state: 'off'
got:
brightness: 100
color_mode: brightness
state: 'on'
unexpected_states: {}
conversation_trace:
- event_type: async_process
data:
text: Dining room light off
context:
id: 01J6SMBHDT7RVB12EMFBAPQJA9
id: 01J6SMY7NV8EGRJ17DB0T8ASJE
parent_id: null
user_id: null
conversation_id: null
Expand All @@ -36,7 +28,7 @@ context:
agent_id: conversation.mock_title
- event_type: agent_detail
data:
system: "Current time is 08:06:19. Today's date is 2024-09-02.\nYou are a voice\
system: "Current time is 08:16:32. Today's date is 2024-09-02.\nYou are a voice\
\ assistant for Home Assistant.\nAnswer questions about the world truthfully.\n\
Answer in plain text. Keep it simple and to the point.\nWhen controlling Home\
\ Assistant always call the intent tools. Use HassTurnOn to lock and HassTurnOff\
Expand All @@ -59,3 +51,33 @@ context:
messages:
- role: user
content: Dining room light off
- role: assistant
content:
- type: text
text: Certainly! I'll turn off the dining room light for you.
- type: tool_use
id: toolu_01E2713rU2miPbPXXdz3XUyg
name: HassTurnOff
input:
name: Dining Room Light
domain:
- light
- role: user
content:
- type: tool_result
tool_use_id: toolu_01E2713rU2miPbPXXdz3XUyg
content: '{"speech": {}, "response_type": "action_done", "data": {"targets":
[], "success": [{"name": "Dining Room Light", "type": "entity", "id":
"light.dining_room_light"}], "failed": []}}'
- role: assistant
content:
- type: text
text: The dining room light has been turned off successfully. Is there anything
else you'd like me to do?
- event_type: tool_call
data:
tool_name: HassTurnOff
tool_args:
name: Dining Room Light
domain:
- light
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
uuid: 22a95543-dc8e-48b6-bd19-4c8f63a5ff7a
uuid: fd0215a0-67d1-4dc9-9307-1e2f0a02180d
task_id: dom1_pl_lights_lights-turn_on_the_living_room_light
category: light
task:
Expand All @@ -10,24 +10,16 @@ task:
attributes:
brightness: 0
color_mode: brightness
response: 'Timeout: '
response: The living room light has been turned on successfully. Is there anything
else you'd like me to do?
context:
unexpected_states:
light.living_room_light:
expected:
brightness: 0
color_mode: brightness
state: 'on'
got:
brightness: null
color_mode: null
state: 'off'
unexpected_states: {}
conversation_trace:
- event_type: async_process
data:
text: Turn on the living room light
context:
id: 01J6SM9RZA2DKVS9BF0G8DV90R
id: 01J6SN4TFE8EGRJ17DB0T8ASJE
parent_id: null
user_id: null
conversation_id: null
Expand All @@ -36,7 +28,7 @@ context:
agent_id: conversation.mock_title
- event_type: agent_detail
data:
system: "Current time is 08:05:22. Today's date is 2024-09-02.\nYou are a voice\
system: "Current time is 08:20:08. Today's date is 2024-09-02.\nYou are a voice\
\ assistant for Home Assistant.\nAnswer questions about the world truthfully.\n\
Answer in plain text. Keep it simple and to the point.\nWhen controlling Home\
\ Assistant always call the intent tools. Use HassTurnOn to lock and HassTurnOff\
Expand All @@ -59,3 +51,33 @@ context:
messages:
- role: user
content: Turn on the living room light
- role: assistant
content:
- type: text
text: Certainly! I'll turn on the living room light for you.
- type: tool_use
id: toolu_01A23tjxKn1VKE8j5ZDYcJKM
name: HassTurnOn
input:
name: Living Room Light
domain:
- light
- role: user
content:
- type: tool_result
tool_use_id: toolu_01A23tjxKn1VKE8j5ZDYcJKM
content: '{"speech": {}, "response_type": "action_done", "data": {"targets":
[], "success": [{"name": "Living Room Light", "type": "entity", "id":
"light.living_room_light"}], "failed": []}}'
- role: assistant
content:
- type: text
text: The living room light has been turned on successfully. Is there anything
else you'd like me to do?
- event_type: tool_call
data:
tool_name: HassTurnOn
tool_args:
name: Living Room Light
domain:
- light
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
uuid: 3523582e-c9de-4d30-8071-bc47bb5987ad
uuid: e5d186c3-f320-43a6-b266-738adb608afa
task_id: dom1_pl_todo_todo-add_history_homework_to_my_tasks
category: todo
task:
Expand All @@ -8,15 +8,16 @@ task:
todo.personal_tasks:
state: '1'
attributes: null
response: 'Timeout: '
response: Great! I've added "history homework" to your Personal Tasks list. Is there
anything else you'd like me to do for you?
context:
unexpected_states: {}
conversation_trace:
- event_type: async_process
data:
text: Add history homework to my tasks
context:
id: 01J6SMCZSMN039782V50206KCA
id: 01J6SMZCSTEVC7WJHENTWJ76BX
parent_id: null
user_id: null
conversation_id: null
Expand All @@ -25,7 +26,7 @@ context:
agent_id: conversation.mock_title
- event_type: agent_detail
data:
system: "Current time is 08:07:07. Today's date is 2024-09-02.\nYou are a voice\
system: "Current time is 08:17:10. Today's date is 2024-09-02.\nYou are a voice\
\ assistant for Home Assistant.\nAnswer questions about the world truthfully.\n\
Answer in plain text. Keep it simple and to the point.\nWhen controlling Home\
\ Assistant always call the intent tools. Use HassTurnOn to lock and HassTurnOff\
Expand All @@ -41,20 +42,25 @@ context:
- role: assistant
content:
- type: text
text: Certainly! I can help you add "history homework" to your tasks list.
I'll use the HassListAddItem function to do this for you.
text: Certainly! I'll add "history homework" to your tasks list. I'll use
the HassListAddItem function to do this for you.
- type: tool_use
id: toolu_01JBd7z99m3BzTbfXhApeRFd
id: toolu_019pd7tZD3q5DNGzNszd2Afs
name: HassListAddItem
input:
name: Personal Tasks
item: history homework
- role: user
content:
- type: tool_result
tool_use_id: toolu_01JBd7z99m3BzTbfXhApeRFd
tool_use_id: toolu_019pd7tZD3q5DNGzNszd2Afs
content: '{"speech": {}, "response_type": "action_done", "data": {"targets":
[], "success": [], "failed": []}}'
- role: assistant
content:
- type: text
text: Great! I've added "history homework" to your Personal Tasks list.
Is there anything else you'd like me to do for you?
- event_type: tool_call
data:
tool_name: HassListAddItem
Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
uuid: d7e98f8d-4584-4418-8493-a2eac7fc0a11
uuid: eb2c3d1f-8af5-4b84-bb25-3e08296aca19
task_id: home1_us_cover_garage_cover_garage-open_the_garage_door
category: cover
task:
Expand All @@ -8,20 +8,16 @@ task:
cover.garage_door_opener:
state: open
attributes: null
response: 'Timeout: '
response: I've successfully opened the garage door for you. The Garage Door Opener
has been activated. Is there anything else you need assistance with?
context:
unexpected_states:
cover.garage_door_opener:
expected:
state: open
got:
state: closed
unexpected_states: {}
conversation_trace:
- event_type: async_process
data:
text: Open the garage door
context:
id: 01J6SM4BNSBW9X759T4VDY3CKE
id: 01J6SMZWFD4R91Q9AW95A1EV7E
parent_id: null
user_id: null
conversation_id: null
Expand All @@ -30,7 +26,7 @@ context:
agent_id: conversation.mock_title
- event_type: agent_detail
data:
system: "Current time is 08:02:24. Today's date is 2024-09-02.\nYou are a voice\
system: "Current time is 08:17:26. Today's date is 2024-09-02.\nYou are a voice\
\ assistant for Home Assistant.\nAnswer questions about the world truthfully.\n\
Answer in plain text. Keep it simple and to the point.\nWhen controlling Home\
\ Assistant always call the intent tools. Use HassTurnOn to lock and HassTurnOff\
Expand All @@ -46,3 +42,34 @@ context:
messages:
- role: user
content: Open the garage door
- role: assistant
content:
- type: text
text: Certainly! I'll open the garage door for you right away.
- type: tool_use
id: toolu_01P6hLghKjn11ExXJ4KEAqhK
name: HassTurnOn
input:
name: Garage Door Opener
domain:
- cover
- role: user
content:
- type: tool_result
tool_use_id: toolu_01P6hLghKjn11ExXJ4KEAqhK
content: '{"speech": {}, "response_type": "action_done", "data": {"targets":
[], "success": [{"name": "Garage Door Opener", "type": "entity", "id":
"cover.garage_door_opener"}], "failed": []}}'
- role: assistant
content:
- type: text
text: I've successfully opened the garage door for you. The Garage Door
Opener has been activated. Is there anything else you need assistance
with?
- event_type: tool_call
data:
tool_name: HassTurnOn
tool_args:
name: Garage Door Opener
domain:
- cover
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
uuid: 67428ed0-8172-4397-a6c3-aeea0cfde40f
uuid: 027c16ec-c526-458a-9fd3-73b6eac6905d
task_id: home1_us_lock_smart_lock-lock_smart_lock
category: lock
task:
Expand All @@ -8,15 +8,21 @@ task:
lock.smart_lock:
state: locked
attributes: null
response: 'Timeout: '
response: The smart lock has been successfully locked. Is there anything else you
need assistance with?
context:
unexpected_states: {}
unexpected_states:
lock.smart_lock:
expected:
state: locked
got:
state: unlocked
conversation_trace:
- event_type: async_process
data:
text: Lock smart lock
context:
id: 01J6SM0K7HQQ9J8GWP5DVFERAP
id: 01J6SN0H9W8739ZNAE70S9R6XC
parent_id: null
user_id: null
conversation_id: null
Expand All @@ -25,7 +31,7 @@ context:
agent_id: conversation.mock_title
- event_type: agent_detail
data:
system: "Current time is 08:00:21. Today's date is 2024-09-02.\nYou are a voice\
system: "Current time is 08:17:47. Today's date is 2024-09-02.\nYou are a voice\
\ assistant for Home Assistant.\nAnswer questions about the world truthfully.\n\
Answer in plain text. Keep it simple and to the point.\nWhen controlling Home\
\ Assistant always call the intent tools. Use HassTurnOn to lock and HassTurnOff\
Expand All @@ -48,24 +54,31 @@ context:
- role: assistant
content:
- type: text
text: Certainly! I'll lock the smart lock for you right away.
text: Certainly! I'll lock the smart lock for you. To do this, I'll use
the HassTurnOff function, as locking a smart lock is equivalent to turning
it off in Home Assistant.
- type: tool_use
id: toolu_015Q5tVLraE9iGRqdJqrkVdP
name: HassTurnOn
id: toolu_018Gy55jAL1LfychmpHMWCFo
name: HassTurnOff
input:
name: Smart Lock
domain:
- lock
- role: user
content:
- type: tool_result
tool_use_id: toolu_015Q5tVLraE9iGRqdJqrkVdP
tool_use_id: toolu_018Gy55jAL1LfychmpHMWCFo
content: '{"speech": {}, "response_type": "action_done", "data": {"targets":
[], "success": [{"name": "Smart Lock", "type": "entity", "id": "lock.smart_lock"}],
"failed": []}}'
- role: assistant
content:
- type: text
text: The smart lock has been successfully locked. Is there anything else
you need assistance with?
- event_type: tool_call
data:
tool_name: HassTurnOn
tool_name: HassTurnOff
tool_args:
name: Smart Lock
domain:
Expand Down
Loading

0 comments on commit eb06f77

Please sign in to comment.