Aws::Translate::Client - translate_text Returns Random Do-Not-Translate IDs or Skips Translations Depending on < and > Character Usage #3191

Open
1 task
chadrschroeder opened this issue Feb 13, 2025 · 1 comment
Labels
service-api General API label for AWS Services.

Comments

@chadrschroeder

Describe the bug

Sorry if this is the wrong place to report this, since the issue is with the API and not the Ruby client. I noticed it while testing the do-not-translate syntax, and I have Ruby examples that illustrate the problems. There are two issues, which are probably related.

A. Internal IDs Are Returned

# Good.  Skips translating the text.
Aws::Translate::Client.new.translate_text(text: '<span translate="no">Skip me</span>', source_language_code: 'en', target_language_code: 'fr').translated_text
 => "<span translate=\"no\">Skip me</span>"

# Good.  Extra ">" on the right side doesn't cause problems.
Aws::Translate::Client.new.translate_text(text: '<span translate="no">Skip me</span>>', source_language_code: 'en', target_language_code: 'fr').translated_text
 => "<span translate=\"no\">Skip me</span> >"

# Bad.  Extra "<" on the left side returns gibberish.
Aws::Translate::Client.new.translate_text(text: '<<span translate="no">Skip me</span>', source_language_code: 'en', target_language_code: 'fr').translated_text
 => "<DNT_GEBKJMMFHEHCKOAJBKHKJHCAkDHDNDDD"

This appears to be a unique random ID for some kind of "Do-Not-Translate" element: after the leading "<", the response always starts with "DNT_" or "dnt_", followed by 32 random characters.

This isn't a huge issue because the example is contrived and the input isn't valid HTML, but there may be a problem somewhere down in the do-not-translate parsing which reveals internal values.
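
Until the service side is fixed, a caller could at least detect the leak before using the output. Below is a minimal sketch of such a check. The helper name `leaked_dnt_placeholders` is hypothetical (it is not part of aws-sdk-translate), and the pattern is only an assumption inferred from the output observed above ("DNT_" or "dnt_" followed by 32 letters), not a documented format.

```ruby
# Hypothetical helper, not part of aws-sdk-translate.
# ASSUMPTION: the placeholder format is inferred from the observed output
# ("DNT_" or "dnt_" followed by 32 ASCII letters); it is not documented.
DNT_PLACEHOLDER = /dnt_[a-z]{32}/i

# Returns every substring of the translated text that looks like a
# leaked do-not-translate placeholder (empty array if none are found).
def leaked_dnt_placeholders(translated_text)
  translated_text.scan(DNT_PLACEHOLDER)
end

# Output observed in the example above.
p leaked_dnt_placeholders('<DNT_GEBKJMMFHEHCKOAJBKHKJHCAkDHDNDDD')
# => ["DNT_GEBKJMMFHEHCKOAJBKHKJHCAkDHDNDDD"]

# A well-formed response produces no matches.
p leaked_dnt_placeholders('<span translate="no">Skip me</span>')
# => []
```

In practice the argument would be the `translated_text` returned by `translate_text`. A non-empty result means the response should not be trusted as-is.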

B. Text Between Greater Than and Less Than Characters Can Remain Untranslated

Probably as a result of how the do-not-translate parsing works, there's also a different issue where text between < and > characters sometimes isn't translated depending on other punctuation involved. I'm not sure how to avoid this because even when using &lt; and &gt; to escape the characters some text may remain untranslated.

# Good.  Translates everything when there's a "." between the "<" and ">"
text = "If expenses < revenue then we have a profit. But if expenses > revenue then we have a loss."
Aws::Translate::Client.new.translate_text(text: text, source_language_code: 'en', target_language_code: 'fr').translated_text
 => "Si les dépenses sont inférieures aux recettes, nous avons un bénéfice. Mais si dépenses > recettes, nous avons une perte."

# Bad.  Skips translating text between "<" and ">".
text = "If expenses < revenue then we have a profit, but if expenses > revenue then we have a loss."
Aws::Translate::Client.new.translate_text(text: text, source_language_code: 'en', target_language_code: 'fr').translated_text
 => "Si les dépenses sont < revenue then we have a profit, but if expenses > des recettes, nous avons une perte."

# Bad.  Escaping as "&lt;" and "&gt;" doesn't help; the same text is still skipped.
text = "If expenses &lt; revenue then we have a profit, but if expenses &gt; revenue then we have a loss."
Aws::Translate::Client.new.translate_text(text: text, source_language_code: 'en', target_language_code: 'fr').translated_text
 => "Si les dépenses sont < revenue then we have a profit, but if expenses > des recettes, nous avons une perte."
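
To catch this case in application code, one option is to check whether the source text between a "<" and the next ">" comes back verbatim in the translation. This is a rough sketch only: the helper name `untranslated_between_angle_brackets` is hypothetical, and the heuristic can give false positives when the enclosed text is legitimately identical in both languages (for example a product name).

```ruby
# Hypothetical helper, not part of aws-sdk-translate.
# Returns the source segments found between "<" and ">" (written literally
# or as &lt; / &gt;) that appear unchanged in the translated text.
def untranslated_between_angle_brackets(source, translated)
  # Treat escaped and literal angle brackets the same way.
  plain = source.gsub('&lt;', '<').gsub('&gt;', '>')

  plain.scan(/<([^<>]+)>/).flatten.map(&:strip).select do |segment|
    !segment.empty? && translated.include?(segment)
  end
end

source = 'If expenses &lt; revenue then we have a profit, but if expenses &gt; revenue then we have a loss.'
translated = 'Si les dépenses sont < revenue then we have a profit, but if expenses > des recettes, nous avons une perte.'

p untranslated_between_angle_brackets(source, translated)
# => ["revenue then we have a profit, but if expenses"]
```

Run against the "Good" example above (the one with a "." between the two characters), the same helper returns an empty array, because the enclosed English text does not appear in the French output.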

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

A. Do not return DNT ID values when there are extra < or > characters in the text.
B. Do not skip translating text between &lt; and &gt; characters.

Current Behavior

A. Returns "<DNT_..." value which wipes out some of the input text.
B. Skips translating text between &lt; and &gt; characters.

Reproduction Steps

# Returns internal DNT ID
Aws::Translate::Client.new.translate_text(text: '<<span translate="no">Skip me</span>', source_language_code: 'en', target_language_code: 'fr').translated_text
 => "<DNT_GEBKJMMFHEHCKOAJBKHKJHCAkDHDNDDD"

# Doesn't translate text between "<" and ">" characters
text = "If expenses &lt; revenue then we have a profit, but if expenses &gt; revenue then we have a loss."
Aws::Translate::Client.new.translate_text(text: text, source_language_code: 'en', target_language_code: 'fr').translated_text
 => "Si les dépenses sont < revenue then we have a profit, but if expenses > des recettes, nous avons une perte."

Possible Solution

No response

Additional Information/Context

No response

Gem name ('aws-sdk', 'aws-sdk-resources' or service gems like 'aws-sdk-s3') and its version

aws-sdk-translate (1.79.0)

Environment details (Version of Ruby, OS environment)

Ruby 3.2.6, macOS Sonoma 14.7.3

@chadrschroeder added the bug and needs-triage labels Feb 13, 2025
@mullermp
Contributor

Thanks for the bug report. I agree this is for the service team, so not actionable by us. I created an internal ticket at V1674562147.

@mullermp added the service-api label and removed the bug and needs-triage labels Feb 13, 2025