
Support for the <mark> element? #4

Open
davidschachter opened this issue Oct 2, 2020 · 2 comments

Comments

@davidschachter

Basic support for <mark>, defined in section "3.3.2 mark Element" of the SSML specification, would call back to a user-supplied function when the <mark> element is encountered. This might be done by replacing line 287, utterance.onend = resume;, with a callback that executes the user-supplied function and then calls resume. That only works for a single mark at the end of an utterance; to support multiple marks at the end, a queue of callbacks would be needed. Likewise, a mark at the beginning of an utterance could be handled with the onstart event.
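The callback-queue idea could be sketched roughly as follows. The class name and methods here are hypothetical; a real implementation would wire handleStart and handleEnd into utterance.onstart and utterance.onend respectively:

```javascript
// Hypothetical sketch: a queue of <mark> callbacks fired at utterance
// boundaries, assuming the parser knows which marks sit at the start
// and end of each utterance.
class MarkCallbackQueue {
  constructor() {
    this.onstartMarks = [];
    this.onendMarks = [];
    this.fired = []; // record of mark names already reported
  }
  addStartMark(name, callback) {
    this.onstartMarks.push({ name, callback });
  }
  addEndMark(name, callback) {
    this.onendMarks.push({ name, callback });
  }
  // Would be called from utterance.onstart: fire queued start marks in order.
  handleStart() {
    for (const { name, callback } of this.onstartMarks) {
      callback(name);
      this.fired.push(name);
    }
    this.onstartMarks = [];
  }
  // Would be called from utterance.onend: fire queued end marks, then resume.
  handleEnd(resume) {
    for (const { name, callback } of this.onendMarks) {
      callback(name);
      this.fired.push(name);
    }
    this.onendMarks = [];
    resume();
  }
}
```

Multiple end marks are handled by simply queueing several callbacks before the utterance ends; resume is only called after the whole queue has drained.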

However, a mark in the middle of an utterance can't work this way: the utterance would need to be split into multiple utterances, and in my experience WebSpeech browser implementations mess up the prosody in that case.

More sophisticated support for <mark> would include the startmark and endmark attributes on the <speech> tag, per section "3.1.1.1 Trimming Attributes" of the SSML specification.
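Trimming could be sketched as below, assuming the parser first flattens the SSML into a list of text and mark tokens; the token shape and function name are hypothetical, not part of the parser's actual API:

```javascript
// Hypothetical sketch of startmark/endmark trimming per SSML 1.1
// "3.1.1.1 Trimming Attributes": keep only the text tokens between
// the startmark and the endmark.
function trimByMarks(tokens, startmark, endmark) {
  // tokens: array of { type: 'text' | 'mark', value: string }
  let inRange = startmark == null; // no startmark: render from the beginning
  const kept = [];
  for (const token of tokens) {
    if (token.type === 'mark') {
      if (token.value === startmark) inRange = true;      // begin rendering here
      else if (token.value === endmark) break;            // stop rendering here
    } else if (inRange) {
      kept.push(token.value);
    }
  }
  return kept;
}
```

With either attribute omitted, rendering runs from the document start or to the document end, respectively.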

@belmec

belmec commented Oct 6, 2020

I really need to use mark. How can I do it? Please give me a concrete example. Very important, thank you.

@guest271314
Owner

However, a mark in the middle of an utterance can't work. The utterance would need to be split into multiple utterances

According to the specification, https://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/#S3.3.2, the <mark> element does not affect speech synthesis output:

When processing a mark element, a synthesis processor must do one or both of the following:

  • inform the hosting environment with the value of the name attribute and with information allowing the platform to retrieve the corresponding position in the rendered output.
  • when audio output of the SSML document reaches the mark, issue an event that includes the required name attribute of the element. The hosting environment defines the destination of the event.

The mark element does not affect the speech output process.

The element can be placed anywhere in the input SSML.

Do you propose that in this case the hosting environment is the code in this repository, and that one or both of the requirements be met by notifying that code, that is, the caller of the parser instance?
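If so, the second option in the spec could be sketched as a minimal event emitter issuing an event that carries the mark's required name attribute; this is a hypothetical stand-in, not the parser's actual API:

```javascript
// Hypothetical sketch: the parser, acting at the hosting-environment
// boundary, issues an event with the mark's `name` when playback reaches it.
class MarkEventEmitter {
  constructor() {
    this.listeners = [];
  }
  // Caller registers a listener for mark events.
  onMark(listener) {
    this.listeners.push(listener);
  }
  // Would be invoked when audio output reaches a <mark name="…"/>.
  reachMark(name) {
    for (const listener of this.listeners) {
      listener({ type: 'mark', name });
    }
  }
}
```

The caller of the parser instance would register via onMark and decide what to do with each event, which matches the spec's statement that "the hosting environment defines the destination of the event."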

and my experience with WebSpeech browser implementations is that they mess up the prosody in this case.

Can you provide a minimal, complete, verifiable example?

More sophisticated support for <mark> would include the startmark and endmark attributes on the <speech> tag, per section "3.1.1.1 Trimming Attributes" of the SSML specification.

Are you referring to the <speak> tag?

Web Speech API specification does not provide any means to capture or adjust the raw audio output in "real-time".

What is possible is either capturing the audio output using getUserMedia() to get a MediaStreamTrack of the output, which can be processed through Web Audio API, or calling the speech synthesis engine directly and then adjusting the output dynamically based on SSML.

Another alternative is implementing the <mark> tag per specification in code first, then testing with espeak-ng or another speech synthesis application locally, where it is possible to get the audio output for specific inputs and adjust the raw data before outputting to speakers or a file.
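As a starting point for that approach, the marks could be located in the SSML input so the surrounding text can be synthesized segment by segment with a local engine and each mark name reported at the corresponding boundary. This sketch uses a regex and only handles the simple self-closing form; a real implementation should use a proper XML parser:

```javascript
// Hypothetical sketch: split an SSML string at self-closing <mark/>
// elements, returning text segments paired with the mark name (if any)
// that follows each segment.
function splitAtMarks(ssml) {
  const segments = [];
  const markRe = /<mark\s+name="([^"]*)"\s*\/>/g;
  let last = 0;
  let m;
  while ((m = markRe.exec(ssml)) !== null) {
    // Text before this mark, plus the mark's required name attribute.
    segments.push({ text: ssml.slice(last, m.index), mark: m[1] });
    last = m.index + m[0].length;
  }
  // Trailing text after the final mark (or the whole input if no marks).
  segments.push({ text: ssml.slice(last), mark: null });
  return segments;
}
```

Each segment could then be fed to the local engine in turn, with the mark event fired between segments before any prosody adjustments are made to the raw audio.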
