.$detach() returns null when being used on image element #274

Open
markfixgg opened this issue Jun 13, 2024 · 6 comments

markfixgg commented Jun 13, 2024

I am trying to extract an image element from the page to get it as base64. I see a few approaches:

  1. Intercept the HTTP request using tab.on("resource")
  2. Extract "src" and load the image by making a request
  3. Extract the element from the page, draw it onto a canvas, and then use canvas.toDataURL to get base64

I prefer the third approach, since it does not make extra calls and can be reused as many times as I want without affecting performance. I know for sure that .$detach() on an image element worked before, because I tested it successfully, but now it returns null instead of an ISuperElement.

Here is a snippet to reproduce the issue:

import Hero from '@ulixee/hero-playground';

(async () => {
    const hero = new Hero({
        showChrome: true,
        userAgent: '~ chrome >= 105 && windows >= 10'
    });

    await hero.goto('https://nopecha.com/demo/recaptcha#hard');

    const iframeElement = await hero.querySelector('div[class="g-recaptcha"] iframe').$waitForVisible();

    const iframe = await hero.getFrameEnvironment(iframeElement);
    if (!iframe) return console.log('Iframe not loaded');

    const checkbox = await iframe.querySelector('span[class*="recaptcha-checkbox"][aria-checked="false"]').$waitForVisible();
    await checkbox.$click();

    await (async () => {
        const iframeElement = await hero.querySelector('iframe[title*="challenge"]').$waitForVisible();

        const iframe = await hero.getFrameEnvironment(iframeElement);
        if (!iframe) return console.log('Iframe not loaded');

        const image = await iframe.querySelector('img[class*="rc-image-tile"]').$waitForVisible();

        console.log(image); // => image is loaded and attributes such as "src" can be extracted
        console.log(await image.$detach()); // => returns null
    })()

    await new Promise((resolve) => setTimeout(resolve, 60000));
})();
@markfixgg markfixgg changed the title .$detach() not working on image element .$detach() returns null when being used on image element Jun 13, 2024
@blakebyrnes
Contributor

It seems like detach is indeed broken here. You could look in the logs/session database to try to figure out if there's any kind of error shown.

However, this won't work in a detached DOM in any case. Canvas doesn't produce DOM changes, and we haven't yet built anything to record all the canvas changes that occur.

I think your best option is actually to use toDataURL() on the image itself in the page. Does that API not work?

markfixgg commented Jun 13, 2024

An image doesn't have a "toDataURL" method, if I am not wrong.

Regarding canvas: I use canvas on the Node.js side:

import { createCanvas } from 'canvas';

export const getBase64Image = async (image: ISuperElement) => {
    const canvas = createCanvas(Number(await image.width), Number(await image.height));

    const ctx = canvas.getContext("2d");
    ctx.drawImage(await image.$detach() as any, 0, 0);

    return canvas.toDataURL("image/png").replace(/^data:image\/[a-zA-Z]+;base64,/, '');
}
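
A note on the prefix-stripping step in the snippet above: the character class `[A-z]` covers ASCII codes 65 to 122, so it also matches the punctuation that sits between `Z` and `a`, and it misses subtypes such as `svg+xml`. A minimal sketch of a stricter helper follows; the name `stripImageDataUrlPrefix` is hypothetical and not part of Hero or the canvas package.

```typescript
// Hypothetical helper (not from Hero or node-canvas): removes a leading
// "data:image/<subtype>;base64," prefix and returns the bare base64 payload.
// Strings without such a prefix are returned unchanged.
function stripImageDataUrlPrefix(dataUrl: string): string {
  // Subtype characters allowed here: letters, digits, ".", "+", "-"
  // (enough for png, jpeg, webp, svg+xml, and similar).
  return dataUrl.replace(/^data:image\/[a-z0-9.+-]+;base64,/i, '');
}
```

It could be applied to the result of canvas.toDataURL("image/png") in place of the inline replace call.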

blakebyrnes commented Jun 13, 2024

Sorry, I confused myself on this one. The 1st option is the preferred approach if these are HTTP images (e.g., not drawn in the page), since it won't require any extra work. The backend is already loading the image, so this is just a step of sending it to the client. It will also exist in your session database, if that's preferable. Is there a reason not to use the 1st?

@blakebyrnes
Contributor

I guess you want base64. The data will be a raw Buffer, so you would just call toString('base64') on it in a modern version of Node.
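
A rough sketch of the first option, for readers following along. It is written against a small structural interface rather than Hero's own types, so treat the shape as an assumption: it presumes the object passed to a tab.on("resource") listener exposes `url`, `type`, and a promise-returning `buffer`. The names `ImageResourceLike` and `imageResourceToBase64` are hypothetical; check the Hero resource documentation before relying on them.

```typescript
// Structural stand-in for the object handed to a "resource" listener.
// Assumption: it exposes `url`, `type`, and a promise-returning `buffer`.
interface ImageResourceLike {
  url: string;
  type: string;
  buffer: Promise<Buffer>;
}

// Hypothetical helper: resolves to the base64 body for image resources
// and to null for every other resource type.
async function imageResourceToBase64(
  resource: ImageResourceLike,
): Promise<string | null> {
  // Compare case-insensitively, since the exact casing of the type value
  // is an assumption here.
  if (resource.type.toLowerCase() !== 'image') return null;

  const body = await resource.buffer; // raw bytes of the response
  return body.toString('base64');     // Node's built-in base64 encoding
}
```

Inside a resource listener, the handler would await `imageResourceToBase64(resource)` and keep the result, perhaps keyed by `resource.url`, so the image's `src` can be matched to its data later without a second request.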

markfixgg commented Jun 14, 2024

This approach is also acceptable for me and I am using it right now; it works perfectly as well. But I would like to leave this issue open, if you don't mind. Thank you for your reply. Whenever I have some free time, I will try to figure out why detach is not working on image elements, and maybe even try to contribute a fix.

mpopov commented Jun 19, 2024

Also, .$detach() returns null when the Hero instance is created with the viewport option set. Without the viewport option it works fine.
