This repository has been archived by the owner on Jan 21, 2023. It is now read-only.

Screenshot shows boxes instead of Korean characters #108

Closed
ryanckulp opened this issue Apr 13, 2020 · 9 comments
Labels
bug Something isn't working

Comments

@ryanckulp

ryanckulp commented Apr 13, 2020

hi folks,

great utility, deploying to a site with thousands of pages. my challenge is: Korean (한글) characters aren't working.

this is similar to my issue (Hindi language):
puppeteer/puppeteer#5311

here's what i'm seeing in my Zeit-deployed production instance, w/ text "경찰" World:

Screen Shot 2020-04-13 at 11 46 07 PM

but in localhost:3000 it works great!

Screen Shot 2020-04-13 at 11 44 41 PM

i've tried changing:

  • UTF
  • lang
  • fonts (e.g. Arial vs Inter)
  • hard-coding Korean characters into the template e.g. <div class="heading">경찰 test (inside template.js)

i tried these measures to make sure local sanitizers aren't doing anything funky.

every one of my tests yields the same: works in localhost, doesn't in production. thanks for reading.

@styfle styfle added the bug Something isn't working label Apr 13, 2020
@styfle
Member

styfle commented Apr 13, 2020

Hi @ryanckulp

We haven't tried rendering non-English languages since all our content is in English at this time.

This could be related to #70.

@styfle styfle changed the title Puppeteer screenshots with boxes > characters Screenshot shows boxes instead of Korean characters Apr 13, 2020
@ryanckulp
Author

update: after 10 hours i'm almost there.

tl;dr: replaced puppeteer w/ selenium-webdriver.

one question about Zeit server config: what's the executable path i should be using for Chrome, in prod?

tried: /tmp/chromium // /usr/local/bin/chromedriver

the latter works locally but not in production, says "PATH NOT FOUND."

@styfle
Member

styfle commented Apr 14, 2020

ZEIT Now doesn't ship Chrome so you would have to consult the docs for selenium-webdriver to find out where it installs Chrome for Linux. If you can set a path, I would suggest /tmp because that is the only writable directory on the filesystem.

@AntwanSherif

@ryanckulp Could you share how you solved this issue? Also, is it applicable to other scripts, for example Arabic?

@r0mflip

r0mflip commented May 31, 2020

Not all fonts support all scripts. A common suggestion is to use a font from the Noto family, but even then a single font doesn't include CJK, Devanagari, Arabic, ...

If #70 works, it would be awesome: we could load the required font and tweak the CSS a bit to support the required script. IMHO an all-in-one solution might be far-fetched.
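To illustrate the per-script idea, here is a minimal sketch, not something from this repository. It guesses which fallback font a string needs by checking code point ranges, then builds a `font-family` value. The names `SCRIPT_FONTS`, `fallbackFontsFor` and `fontFamilyFor` are hypothetical, and the font names are only examples of Noto families that would still have to be loaded separately. Han ideographs are deliberately left out, because they are shared between Chinese, Japanese and Korean and can't be mapped to one font by range alone.

```typescript
// Hypothetical mapping from Unicode block ranges to an example fallback font.
// Only a few BMP blocks are listed; this is an illustration, not a full table.
type ScriptRange = { name: string; from: number; to: number; font: string };

const SCRIPT_FONTS: ScriptRange[] = [
    { name: 'Hangul syllables', from: 0xac00, to: 0xd7a3, font: 'Noto Sans KR' },
    { name: 'Hiragana', from: 0x3040, to: 0x309f, font: 'Noto Sans JP' },
    { name: 'Katakana', from: 0x30a0, to: 0x30ff, font: 'Noto Sans JP' },
    { name: 'Arabic', from: 0x0600, to: 0x06ff, font: 'Noto Sans Arabic' },
    { name: 'Devanagari', from: 0x0900, to: 0x097f, font: 'Noto Sans Devanagari' },
];

// Return the distinct fallback fonts needed by `text`, in order of first use.
function fallbackFontsFor(text: string): string[] {
    const fonts: string[] = [];
    for (let i = 0; i < text.length; i++) {
        const code = text.charCodeAt(i);
        for (const range of SCRIPT_FONTS) {
            const inRange = code >= range.from && code <= range.to;
            if (inRange && fonts.indexOf(range.font) === -1) {
                fonts.push(range.font);
            }
        }
    }
    return fonts;
}

// Build a CSS font-family value: base font, detected fallbacks, generic family.
function fontFamilyFor(text: string, base: string = 'Inter'): string {
    const quoted = [base, ...fallbackFontsFor(text)].map((name) => `'${name}'`);
    return quoted.concat('sans-serif').join(', ');
}
```

For the text from the original report, `fontFamilyFor('경찰 World')` picks the Korean fallback only, so a deployment would embed one extra font rather than every script.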

@hoosan

hoosan commented Jul 11, 2020

I have run into the same "tofu" issue with Japanese characters and got stuck for a while.
As @r0mflip mentioned, adding the Noto family was the way to go.
I'm sharing the procedure in case someone runs into the same trouble.

  1. Put a font file (e.g. *.otf) in "_fonts" directory.
    image

  2. Read the font file in template.ts

const noto = readFileSync(`${__dirname}/../_fonts/NotoSansJP-Black.otf`).toString('base64');

image

  3. Inject it into the CSS
    @font-face {
        font-family: 'Noto Sans Japanese';
        font-style: normal;
        font-weight: normal;
        src: url(data:font/otf;charset=utf-8;base64,${noto}) format('opentype');
    }
  4. Add its font family name to the heading class
    .heading {
        font-family: 'Inter', 'Noto Sans Japanese', sans-serif;
        font-size: ${sanitizeHtml(fontSize)};
        font-style: normal;
        color: ${foreground};
        line-height: 1.8;
    }
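Put together, steps 2 and 3 can be sketched roughly as follows. This is a minimal sketch under assumptions, not the project's actual template.ts: `buildFontFace` and `loadFontFace` are hypothetical helper names, and the font is assumed to be an OpenType (.otf) file as in the steps above.

```typescript
import { readFileSync } from 'fs';

// Turn raw font bytes into an @font-face rule with the font inlined as a
// base64 data URI, so the headless browser needs no installed system font.
function buildFontFace(fontData: Buffer, family: string): string {
    const base64 = fontData.toString('base64');
    return `@font-face {
    font-family: '${family}';
    font-style: normal;
    font-weight: normal;
    src: url(data:font/otf;charset=utf-8;base64,${base64}) format('opentype');
}`;
}

// Hypothetical convenience wrapper: read the font from disk, then build the rule.
function loadFontFace(fontPath: string, family: string): string {
    return buildFontFace(readFileSync(fontPath), family);
}
```

Something like `loadFontFace(`${__dirname}/../_fonts/NotoSansJP-Black.otf`, 'Noto Sans Japanese')` would then produce the rule from step 3, and the family name still has to be listed in the `.heading` rule as in step 4.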

That's it. It works fine most of the time.

image

An unresolved issue is that some characters are still not visible.

image

But if I double the character ("あ" in this case), the problem goes away.

image

Dunno why.. I will try to add some more font files and see how it goes.

@fkymy fkymy mentioned this issue Mar 3, 2022
@tomy0000000

I want to point out that even though most of us can work around this problem by injecting Noto fonts into the HTML templates for now, the true issue actually lies within chrome-aws-lambda.

I ran some experiments:

  • Ran it on my local dev machine: worked -> which means the currently provided fonts are fine.
  • Deployed on Vercel: failed -> something goes wrong in the cloud environment.
  • Deployed another instance with the OG_HTML_DEBUG=1 env variable: worked -> the rendered HTML displays properly, which leaves the screenshot function as the only possible culprit.

@ryanckulp
Author

@tomy0000000 thanks for the fix! really appreciate it.

@leerob
Member

leerob commented Jan 19, 2023

Closing in favor of #226

@leerob leerob closed this as completed Jan 19, 2023
7 participants