Commit

Update results
capjamesg committed Jan 27, 2025
1 parent 0e5a1ea commit a12c7f8
Showing 2 changed files with 117 additions and 9 deletions.
20 changes: 11 additions & 9 deletions index.html
@@ -40,7 +40,7 @@ <h1>How's GPT-4o Doing?</h1>
<p>You can contribute your own tests, too! See the <a href="https://github.com/roboflow/gpt-checkup?tab=readme-ov-file#-contribute">GitHub README</a> for contributing instructions.</p>
</div>
<div class="header_subtitle">
-<p>Tests are run every day at 1am PT. Last updated January 26, 2025.</p>
+<p>Tests are run every day at 1am PT. Last updated January 27, 2025.</p>
<p>Made with ❤️ by the team at <a href="https://roboflow.com">Roboflow</a>.</p>
</div>
<div class="header_cta">
@@ -58,12 +58,12 @@ <h1>How's GPT-4o Doing?</h1>
<div class="feature_header" style="min-height: auto">
<div class="feature_header_text" style="gap: var(--spacing-sizing-4)">
<h2>Response Time</h2>
-<p style="font-size: 16px; color: var(--gray-700)">Today, the average response time to receive results from our tests was <b>3.74 seconds</b> per request.</p>
+<p style="font-size: 16px; color: var(--gray-700)">Today, the average response time to receive results from our tests was <b>3.73 seconds</b> per request.</p>
<p class="subtitle">This number only accounts for requests made by this application.</p>
</div>
<div class="chart">
<div class="chart_box chart_box_green">
-<p>3.74 s</p>
+<p>3.73 s</p>
</div>
</div>
</div>
@@ -122,7 +122,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>7</pre>
+<pre>8</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -230,7 +230,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>{'x': 0.4, 'y': 0.35, 'width': 0.3, 'height': 0.25}</pre>
+<pre>{'x': 0.42, 'y': 0.35, 'width': 0.18, 'height': 0.28}</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -361,7 +361,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
{
"R": 82,
"G": 0,
-"B": 128
+"B": 138
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
@@ -403,7 +403,7 @@ <h2>Annotation Quality Assurance</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.017</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.018</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -417,10 +417,12 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/annotationqa.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>It appears that the dataset captures cars on the road with bounding boxes (red boxes). The image shows several cars labeled correctly, but there is at least one car (the white car on the right) that seems unlabeled.
+<pre>To count the missing annotations, I would need to know the total number of cars visible in the image versus the number of cars with red bounding boxes. Based on the image:

-Here's the result in JSON format:
+1. **Cars annotated with bounding boxes:** There are 6 red bounding boxes visible.
+2. **Cars visible in the scene, including unannotated ones:** It appears there is one car near the farthest end of the scene without a bounding box on it.

+### JSON Output:
```json
{
"missing": 1
106 changes: 106 additions & 0 deletions results/2025-01-27.json
@@ -0,0 +1,106 @@
{
"zero_shot_classification": {
"score": 1,
"success": true,
"price": 0.006400000000000001,
"pass_fail": "Pass",
"response_time": 1.9188237190246582,
"result": "Toyota Camry"
},
"count_fruit": {
"score": 0,
"success": false,
"price": 0.00882,
"pass_fail": "Fail",
"response_time": 1.8202457427978516,
"result": "8"
},
"document_ocr": {
"score": 0,
"success": false,
"price": 0.00988,
"pass_fail": "Fail",
"response_time": 2.523519992828369,
"result": "I was thinking earlier today that I have gone through, to use the lingo, eras of listening to each of Swift's Eras. Meta indeed. I started listening to Ms. Swift's music after hearing the *Midnights* album. A few weeks after hearing the album for the first time, I found myself playing various songs on repeat. I listened to the album in order multiple times."
},
"handwriting_ocr": {
"score": 1,
"success": true,
"price": 0.00974,
"pass_fail": "Pass",
"response_time": 8.458041667938232,
"result": "The words of songs on the album have been echoing in my head all week. \"Fades into the grey of my day old tea.\""
},
"extraction_ocr": {
"score": 1.0,
"success": true,
"price": 0.00876,
"pass_fail": "Pass",
"response_time": 2.442870616912842,
"result": "[{'name': 'Mary Thomas', 'time_per_day': 1, 'medication': 'Atenolol', 'dosage': 100, 'rx_number': '1234567-12345'}]"
},
"math_ocr": {
"score": 1.0,
"success": true,
"price": 0.015070000000000002,
"pass_fail": "Pass",
"response_time": 3.94942045211792,
"result": "3x^2-6x+2"
},
"object_detection": {
"score": 0.3245492371705965,
"success": false,
"price": 0.01044,
"pass_fail": "Fail",
"response_time": 2.722054958343506,
"result": "{'x': 0.42, 'y': 0.35, 'width': 0.18, 'height': 0.28}"
},
"graph_understanding": {
"score": 0.99,
"success": false,
"price": 0.01174,
"pass_fail": "Fail",
"response_time": 2.396726131439209,
"result": "```json\n{\n \"A\": {\n \"quantity\": 20,\n \"price\": 10\n },\n \"B\": {\n \"quantity\": 25,\n \"price\": 20\n },\n \"C\": {\n \"quantity\": 30,\n \"price\": 30\n },\n \"D\": {\n \"quantity\": 35,\n \"price\": 40\n }\n}\n```"
},
"color_recognition": {
"score": 0.9673202614379085,
"success": false,
"price": 0.009850000000000001,
"pass_fail": "Fail",
"response_time": 2.0315816402435303,
"result": "```json\n{\n \"R\": 82,\n \"G\": 0,\n \"B\": 138\n}\n```"
},
"annotation_qa": {
"score": 0.33333333333333337,
"success": false,
"price": 0.01787,
"pass_fail": "Fail",
"response_time": 3.923919439315796,
"result": "To count the missing annotations, I would need to know the total number of cars visible in the image versus the number of cars with red bounding boxes. Based on the image:\n\n1. **Cars annotated with bounding boxes:** There are 6 red bounding boxes visible.\n2. **Cars visible in the scene, including unannotated ones:** It appears there is one car near the farthest end of the scene without a bounding box on it.\n\n### JSON Output:\n```json\n{\n \"missing\": 1\n}\n```"
},
"measurement": {
"score": 0.8571428571428572,
"success": false,
"price": 0.009720000000000001,
"pass_fail": "Fail",
"response_time": 3.5829694271087646,
"result": "```json\n{\n \"length\": 3.0,\n \"width\": 3.0\n}\n```"
},
"easy_captcha": {
"score": 1,
"success": true,
"price": 0.00636,
"pass_fail": "Pass",
"response_time": 1.9920110702514648,
"result": "charybdis indubitable"
},
"easy_captcha_persuade": {
"score": 1,
"success": true,
"price": 0.006860000000000001,
"pass_fail": "Pass",
"response_time": 1.6189548969268799,
"result": "charybdis indubitable"
}
}
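
Each entry in results/2025-01-27.json carries the same `success`, `price`, and `response_time` fields, so a day's file can be rolled up into a pass count, a total cost, and a mean response time. The following is a minimal sketch of that roll-up, not the project's own code: the `summarize` helper is a hypothetical name, and `SAMPLE` holds only three of the 13 entries from the file above.

```python
from statistics import mean

# Three entries copied from results/2025-01-27.json (the full file has 13).
SAMPLE = {
    "zero_shot_classification": {
        "success": True,
        "price": 0.006400000000000001,
        "response_time": 1.9188237190246582,
    },
    "count_fruit": {
        "success": False,
        "price": 0.00882,
        "response_time": 1.8202457427978516,
    },
    "handwriting_ocr": {
        "success": True,
        "price": 0.00974,
        "response_time": 8.458041667938232,
    },
}


def summarize(results):
    """Roll one day's results up into counts, total cost, and mean latency.

    `results` maps a test name to a dict with at least the keys
    `success` (bool), `price` (float, USD), and `response_time` (seconds).
    """
    entries = list(results.values())
    return {
        "tests": len(entries),
        "passed": sum(1 for entry in entries if entry["success"]),
        "total_price": round(sum(entry["price"] for entry in entries), 5),
        "mean_response_time": round(
            mean(entry["response_time"] for entry in entries), 2
        ),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run it on the whole file, load it with `json.load` and pass the resulting dict to `summarize`. A plain mean over the 13 `response_time` values in this file works out to roughly 3.03 s, not the 3.73 seconds shown in index.html, so the page figure is presumably computed over a different set of requests.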

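
The `result` strings in this file arrive in several shapes: fenced JSON (`color_recognition`, `measurement`), prose followed by fenced JSON (`annotation_qa`), a Python-style dict with single quotes (`object_detection`), and plain text (`easy_captcha`). A consumer of the file could normalize them along these lines. This is a sketch under assumptions, not how the repository scores results; `parse_result` is a hypothetical helper name.

```python
import ast
import json
import re

# Build the three-backtick fence marker programmatically so this listing
# contains no literal fence of its own.
TICKS = "`" * 3
FENCE = re.compile(TICKS + r"(?:json)?\s*(.*?)" + TICKS, re.DOTALL)


def parse_result(text):
    """Best-effort conversion of a raw model reply into a Python object.

    Order of attempts:
      1. If the reply contains a fenced block, keep only its contents.
      2. Try strict JSON.
      3. Try a Python literal (covers single-quoted dicts).
      4. Fall back to the stripped text unchanged.
    """
    match = FENCE.search(text)
    payload = (match.group(1) if match else text).strip()
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        pass
    try:
        return ast.literal_eval(payload)
    except (ValueError, SyntaxError, TypeError):
        return payload


if __name__ == "__main__":
    fenced = TICKS + 'json\n{\n "R": 82,\n "G": 0,\n "B": 138\n}\n' + TICKS
    print(parse_result(fenced))
    print(parse_result("{'x': 0.42, 'y': 0.35, 'width': 0.18, 'height': 0.28}"))
    print(parse_result("charybdis indubitable"))
```

Trying strict JSON before `ast.literal_eval` keeps the common case fast and strict. `literal_eval` only evaluates literals, so it is a safer fallback than `eval` for replies that come back as Python dict text.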