
Support for the <mark> element? #4

Open
davidschachter opened this issue Oct 2, 2020 · 2 comments

Comments

@davidschachter

Basic support for <mark>, defined in section "3.3.2 mark Element" of the SSML specification, would call back to a user-supplied function when the <mark> element is encountered. This might be done by replacing line 287, utterance.onend = resume;, with a callback that executes the user-supplied function and then calls resume. That only works for a single mark at the end of an utterance; to support multiple marks at the end, a queue of callbacks would be needed. Likewise, a mark at the beginning of an utterance could be handled with the onstart event.
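The callback-queue idea could be sketched roughly as follows. The class name and methods here are hypothetical; a real implementation would wire handleStart and handleEnd into utterance.onstart and utterance.onend respectively:

```javascript
// Hypothetical sketch: a queue of <mark> callbacks fired at utterance
// boundaries, assuming the parser knows which marks sit at the start
// and end of each utterance.
class MarkCallbackQueue {
  constructor() {
    this.onstartMarks = [];
    this.onendMarks = [];
    this.fired = []; // record of mark names already reported
  }
  addStartMark(name, callback) {
    this.onstartMarks.push({ name, callback });
  }
  addEndMark(name, callback) {
    this.onendMarks.push({ name, callback });
  }
  // Would be called from utterance.onstart: fire queued start marks in order.
  handleStart() {
    for (const { name, callback } of this.onstartMarks) {
      callback(name);
      this.fired.push(name);
    }
    this.onstartMarks = [];
  }
  // Would be called from utterance.onend: fire queued end marks, then resume.
  handleEnd(resume) {
    for (const { name, callback } of this.onendMarks) {
      callback(name);
      this.fired.push(name);
    }
    this.onendMarks = [];
    resume();
  }
}
```

Multiple end marks are handled by simply queueing several callbacks before the utterance ends; resume is only called after the whole queue has drained.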

However, a mark in the middle of an utterance can't work this way: the utterance would need to be split into multiple utterances, and in my experience WebSpeech browser implementations mess up the prosody in that case.

More sophisticated support for <mark> would include the startmark and endmark attributes on the <speech> tag, per section "3.1.1.1 Trimming Attributes" of the SSML specification.
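Trimming could be sketched as below, assuming the parser first flattens the SSML into a list of text and mark tokens; the token shape and function name are hypothetical, not part of the parser's actual API:

```javascript
// Hypothetical sketch of startmark/endmark trimming per SSML 1.1
// "3.1.1.1 Trimming Attributes": keep only the text tokens between
// the startmark and the endmark.
function trimByMarks(tokens, startmark, endmark) {
  // tokens: array of { type: 'text' | 'mark', value: string }
  let inRange = startmark == null; // no startmark: render from the beginning
  const kept = [];
  for (const token of tokens) {
    if (token.type === 'mark') {
      if (token.value === startmark) inRange = true;      // begin rendering here
      else if (token.value === endmark) break;            // stop rendering here
    } else if (inRange) {
      kept.push(token.value);
    }
  }
  return kept;
}
```

With either attribute omitted, rendering runs from the document start or to the document end, respectively.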

@belmec

belmec commented Oct 6, 2020

I really need to use mark. How can I do it? Please give me a concrete example. Very important, thank you.

@guest271314
Owner

However, a mark in the middle of an utterance can't work. The utterance would need to be split into multiple utterances

According to the specification, https://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/#S3.3.2, the <mark> element does not affect speech synthesis output:

When processing a mark element, a synthesis processor must do one or both of the following:

  • inform the hosting environment with the value of the name attribute and with information allowing the platform to retrieve the corresponding position in the rendered output.
  • when audio output of the SSML document reaches the mark, issue an event that includes the required name attribute of the element. The hosting environment defines the destination of the event.

The mark element does not affect the speech output process.

The element can be placed anywhere in the input SSML.

Do you propose that in this case the hosting environment is the code in this repository, and that one or both of the requirements be met by notifying that code, that is, the caller of the parser instance?
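If so, the second option in the spec could be sketched as a minimal event emitter issuing an event that carries the mark's required name attribute; this is a hypothetical stand-in, not the parser's actual API:

```javascript
// Hypothetical sketch: the parser, acting at the hosting-environment
// boundary, issues an event with the mark's `name` when playback reaches it.
class MarkEventEmitter {
  constructor() {
    this.listeners = [];
  }
  // Caller registers a listener for mark events.
  onMark(listener) {
    this.listeners.push(listener);
  }
  // Would be invoked when audio output reaches a <mark name="…"/>.
  reachMark(name) {
    for (const listener of this.listeners) {
      listener({ type: 'mark', name });
    }
  }
}
```

The caller of the parser instance would register via onMark and decide what to do with each event, which matches the spec's statement that "the hosting environment defines the destination of the event."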

and my experience with WebSpeech browser implementations is that they mess up the prosody in this case.

Can you provide a minimal, complete, verifiable example?

More sophisticated support for <mark> would include the startmark and endmark attributes on the <speech> tag, per section "3.1.1.1 Trimming Attributes" of the SSML specification.

Are you referring to the <speak> tag?

Web Speech API specification does not provide any means to capture or adjust the raw audio output in "real-time".

What is possible is either capturing the audio output using getUserMedia() to get a MediaStreamTrack of the output, which can be processed through Web Audio API, or calling the speech synthesis engine directly and then adjusting the output dynamically based on SSML.

Another alternative is implementing the <mark> tag per specification in code first, then testing with espeak-ng or another speech synthesis application locally, where it is possible to get the audio output for specific inputs and adjust the raw data before outputting to speakers or a file.
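As a starting point for that approach, the marks could be located in the SSML input so the surrounding text can be synthesized segment by segment with a local engine and each mark name reported at the corresponding boundary. This sketch uses a regex and only handles the simple self-closing form; a real implementation should use a proper XML parser:

```javascript
// Hypothetical sketch: split an SSML string at self-closing <mark/>
// elements, returning text segments paired with the mark name (if any)
// that follows each segment.
function splitAtMarks(ssml) {
  const segments = [];
  const markRe = /<mark\s+name="([^"]*)"\s*\/>/g;
  let last = 0;
  let m;
  while ((m = markRe.exec(ssml)) !== null) {
    // Text before this mark, plus the mark's required name attribute.
    segments.push({ text: ssml.slice(last, m.index), mark: m[1] });
    last = m.index + m[0].length;
  }
  // Trailing text after the final mark (or the whole input if no marks).
  segments.push({ text: ssml.slice(last), mark: null });
  return segments;
}
```

Each segment could then be fed to the local engine in turn, with the mark event fired between segments before any prosody adjustments are made to the raw audio.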
