From 8c4f9f1d8fa1a843f2115026a13b52d36dc50d83 Mon Sep 17 00:00:00 2001 From: sachaarbonel Date: Wed, 13 Mar 2024 14:38:39 +0530 Subject: [PATCH] update docs upon feedback --- .../docs/api/recording/recording_calls.mdx | 2 +- .../api/transcription/transcribing_calls.mdx | 31 ++++++++++++++----- .../docusaurus/docs/api/webhooks/events.mdx | 4 +++ 3 files changed, 28 insertions(+), 9 deletions(-) diff --git a/docusaurus/video/docusaurus/docs/api/recording/recording_calls.mdx b/docusaurus/video/docusaurus/docs/api/recording/recording_calls.mdx index e2154def..79ddfeba 100644 --- a/docusaurus/video/docusaurus/docs/api/recording/recording_calls.mdx +++ b/docusaurus/video/docusaurus/docs/api/recording/recording_calls.mdx @@ -10,7 +10,7 @@ import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; Calls can be recorded for later use. Calls recording can be started/stopped via API calls or configured to start automatically when the first user joins the call. -Call recording is done by Stream server-side and later stored on AWS S3. You can also configure your Stream application to have files stored on your own S3 bucket (in that case, storage costs will not apply). +Call recording is done server-side by Stream and later stored on AWS S3. There is no charge for storage of recordings. You can also configure your Stream application to have files stored on your own S3 bucket. By default, calls will be recorded as mp4 video files. You can configure recording to only capture the audio. 
diff --git a/docusaurus/video/docusaurus/docs/api/transcription/transcribing_calls.mdx b/docusaurus/video/docusaurus/docs/api/transcription/transcribing_calls.mdx index 640bcb56..e620c856 100644 --- a/docusaurus/video/docusaurus/docs/api/transcription/transcribing_calls.mdx +++ b/docusaurus/video/docusaurus/docs/api/transcription/transcribing_calls.mdx @@ -8,11 +8,13 @@ title: Transcribing calls import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; -Transcribing calls allows for the conversion of spoken words into written text. Transcription can be started/stopped via API calls or configured to start automatically when the first user joins the call. Call transcription is done by the Stream server-side and later stored on AWS S3. You can also configure your Stream application to have files stored on your own S3 bucket (in that case, storage costs will not apply). +Transcribing calls allows for the conversion of spoken words into written text. Transcription can be started/stopped via API calls or configured to start automatically when the first user joins the call. Call transcription is done server-side by Stream and later stored on AWS S3. There is no charge for storage of transcriptions. You can also configure your Stream application to have files stored on your own S3 bucket. By default, transcriptions will be provided in a jsonl file. -Note: Transcriptions will capture all speakers in a single file. +> **Note:** Transcriptions will capture all speakers in a single file. + +> **Note:** Transcriptions should not be used as a replacement for closed captioning (CC). Support for CC is planned on our [roadmap](https://github.com/GetStream/protocol/discussions/127). 
## Start and stop call transcription @@ -31,10 +33,10 @@ call.stopTranscription(); ```py -// starts transcribing +# starts transcribing call.start_transcription() -// stops the transcription for the call +# stops the transcription for the call call.stop_transcription() ``` @@ -82,10 +84,23 @@ curl -X GET "https://video.stream-io-api.com/video/call/default/${CALL_ID}/trans These events are sent to users connected to the call and your webhook/SQS: -- `call.transcription_started` when the call transcription has started -- `call.transcription_stopped` when the call transcription has stopped -- `call.transcription_ready` when the transcription is available for download -- `call.transcription_failed` when transcribing fails for any reason +- `call.transcription_started` sent when the call transcription has started +- `call.transcription_stopped` sent only when the transcription is explicitly stopped through an API call, not when the transcription process encounters an error +- `call.transcription_ready` sent when the transcription is completed and available for download; an example payload is detailed below +- `call.transcription_failed` sent when the transcription process encounters an error 
+ + +## Transcription JSONL file format + + ```jsonl + {"type":"speech", "start_time": "2024-02-28T08:18:18.061031795Z", "stop_time":"2024-02-28T08:18:22.401031795Z", "speaker_id": "Sacha_Arbonel", "text": "hello"} + {"type":"speech", "start_time": "2024-02-28T08:18:22.401031795Z", "stop_time":"2024-02-28T08:18:26.741031795Z", "speaker_id": "Sacha_Arbonel", "text": "how are you"} + {"type":"speech", "start_time": "2024-02-28T08:18:26.741031795Z", "stop_time":"2024-02-28T08:18:31.081031795Z", "speaker_id": "Tommaso_Barbugli", "text": "I'm good"} + {"type":"speech", "start_time": "2024-02-28T08:18:31.081031795Z", "stop_time":"2024-02-28T08:18:35.421031795Z", "speaker_id": "Tommaso_Barbugli", "text": "how about you"} + {"type":"speech", "start_time": "2024-02-28T08:18:35.421031795Z", "stop_time":"2024-02-28T08:18:39.761031795Z", "speaker_id": "Sacha_Arbonel", "text": "I'm good too"} + {"type":"speech", "start_time": "2024-02-28T08:18:39.761031795Z", "stop_time":"2024-02-28T08:18:44.101031795Z", "speaker_id": "Tommaso_Barbugli", "text": "that's great"} + {"type":"speech", "start_time": "2024-02-28T08:18:44.101031795Z", "stop_time":"2024-02-28T08:18:48.441031795Z", "speaker_id": "Tommaso_Barbugli", "text": "I'm glad to hear that"} + ``` ## User Permissions diff --git a/docusaurus/video/docusaurus/docs/api/webhooks/events.mdx b/docusaurus/video/docusaurus/docs/api/webhooks/events.mdx index 5ac85e79..fcbae0ec 100644 --- a/docusaurus/video/docusaurus/docs/api/webhooks/events.mdx +++ b/docusaurus/video/docusaurus/docs/api/webhooks/events.mdx @@ -39,6 +39,10 @@ Here you can find the list of events are sent to Webhook and SQS. 
| call.recording_stopped | Sent when call recording has stopped | | call.recording_ready | Sent when the recording is available for download | | call.recording_failed | Sent when recording fails for any reason | +| call.transcription_started | Sent when call transcription has started | +| call.transcription_stopped | Sent when the transcription is explicitly stopped through an API call | +| call.transcription_ready | Sent when the transcription is available for download | +| call.transcription_failed | Sent when transcription fails for any reason | You can find the definition of each events in the OpenAPI spec available [here](https://github.com/GetStream/protocol/blob/main/openapi/video-openapi.yaml)
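
As a companion to the JSONL file format documented in this patch: each line of the downloaded transcription file is an independent JSON object, so the file can be parsed line by line. This is a minimal sketch using only the Python standard library; the sample segments are copied from the example in the docs above, and the field names (`speaker_id`, `text`, etc.) are those shown there.

```python
import json

# Two sample lines in the transcription JSONL format shown in the docs above.
raw = (
    '{"type":"speech", "start_time": "2024-02-28T08:18:18.061031795Z", '
    '"stop_time":"2024-02-28T08:18:22.401031795Z", "speaker_id": "Sacha_Arbonel", "text": "hello"}\n'
    '{"type":"speech", "start_time": "2024-02-28T08:18:22.401031795Z", '
    '"stop_time":"2024-02-28T08:18:26.741031795Z", "speaker_id": "Sacha_Arbonel", "text": "how are you"}\n'
)

# JSONL: parse each non-empty line as its own JSON object.
segments = [json.loads(line) for line in raw.splitlines() if line.strip()]

# Render a simple speaker-labelled transcript.
for segment in segments:
    print(f'{segment["speaker_id"]}: {segment["text"]}')
```

Because every speaker ends up in the same file, grouping or filtering by `speaker_id` after parsing is the natural way to build per-speaker views.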