Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Critical Deserialization issue #167

Closed
Wicpar opened this issue Aug 12, 2023 · 18 comments · Fixed by #197
Closed

Critical Deserialization issue #167

Wicpar opened this issue Aug 12, 2023 · 18 comments · Fixed by #197

Comments

@Wicpar
Copy link

Wicpar commented Aug 12, 2023

Hi, i have been using this crate for months without issues and suddenly i get these errors:

2023-08-12T23:41:01.307650Z ERROR chromiumoxide::conn: Failed to deserialize WS response data did not match any variant of untagged enum Message
2023-08-12T23:41:01.307668Z ERROR chromiumoxide::handler: WS Connection error: Serde(Error("data did not match any variant of untagged enum Message", line: 0, column: 0))

this makes no sense, i reinstalled the fetched (using the fetcher) chromium, reinstalled the crate. The code hasn't changed from when it worked...

This is highly blocking as my service depends heavily on this crate.

The only explaination i can imagine is that the Message enum depends on remote data and there is no catchall to resolve potential changes.

@Wicpar
Copy link
Author

Wicpar commented Aug 13, 2023

same issues happen when i run it on WSL, so it proves it's not system dependent, i verified there is no code change related to chromium, restarted the computer, so it must be a change in chromium.

@Wicpar
Copy link
Author

Wicpar commented Aug 13, 2023

The last Message to be parsed before the issues is the following:

{"method":"Page.lifecycleEvent","params":{"frameId":"B0FCF18A982213C9947D313EAA8F934A","loaderId":"547DA6CC3D4A41314EA08A88BFA62B21","name":"commit","timestamp":1531.878478},"sessionId":"E0BCD37484373226136272710B8CB432"}

Edit: Confirmed to originate from

tracing::error!("Failed to deserialize WS response {}", err);

@Wicpar
Copy link
Author

Wicpar commented Aug 13, 2023

CdpEventMessage seems to be the curlpit, but it does have a catchall, it should match

@Wicpar
Copy link
Author

Wicpar commented Aug 13, 2023

my entire code is the following, you should be able to try it out with minimal modification:

#[derive(Clone)]
pub struct PDFService {
    browser: Arc<Browser>,
    semaphore: Arc<Semaphore>,
}

impl PDFService {
    pub async fn new(templates: Option<&TemplateService>) -> anyhow::Result<Self> {
        let download_path = Path::new("./download");
        tokio::fs::create_dir_all(&download_path).await?;
        let fetcher = BrowserFetcher::new(
            BrowserFetcherOptions::builder()
                .with_path(&download_path)
                .build()?,
        );
        info!("Loading Chromium...");
        let info = fetcher.fetch().await?;
        let browser = Arc::new(Self::browser(&info.executable_path).await?);
        info!("Chromium Loaded: {}", info);
        let this = Self {
            browser,
            semaphore: Arc::new(Semaphore::new(
                available_parallelism()
                    .unwrap_or(NonZeroUsize::new(1).unwrap())
                    .get(),
            )),
        };
        if let Some(templates) = templates {
            // move this part to a better place once more templates are added
            info!("Verifying Builtin templates...");
            CertificateTemplate::test(&this, templates).await?;
        }
        Ok(this)
    }

    async fn browser(path: impl AsRef<Path>) -> anyhow::Result<Browser> {
        let (browser, mut handler) = Browser::launch(
            BrowserConfig::builder()
                .chrome_executable(path)
                .request_timeout(Duration::from_secs(1))
                .no_sandbox()
                .build()
                .map_err(|it| anyhow!("Failed to create Chromium context: {}", it))?,
        )
        .await?;
        // spawn a new task that continuously polls the handler, it is required to drive the events, it will end and deallocate when the handler closes
        let _handle = tokio::task::spawn(async move {
            while let Some(h) = handler.next().await {
                if let Err(err) = h {
                    error!("Error in chromium handler: {}", err);
                    break;
                }
            }
        });
        Ok(browser)
    }

    pub(crate) async fn render_template(&self, html: String) -> anyhow::Result<Bytes> {
        let _semaphore = self.semaphore.acquire().await?; // do at most n simultaneously
        let html = html.replace(
            "<head>",
            &format!(
                "<head><script>{}</script>",
                include_str!("pagedjs.polyfill.min.js")
            ),
        );
        let page = self.browser.new_page("about:blank").await?; // <-- error here
        let res = page
            .set_content(html)
            .await?
            .pdf(
                chromiumoxide::cdp::browser_protocol::page::PrintToPdfParams {
                    prefer_css_page_size: Some(true),
                    print_background: Some(true),
                    ..Default::default()
                },
            )
            .await?;
        page.close().await?;
        Ok(res.into())
    }
}

@Wicpar Wicpar changed the title Sudden connection issues Critical Deserialization issue Aug 13, 2023
@Wicpar
Copy link
Author

Wicpar commented Aug 13, 2023

  • clearing cache: nope
  • reinstalling: nope
  • restarting: nope
  • pulling from git: nope
  • changing the relevant conn.rs lines to the following:
 match Stream::poll_next(Pin::new(&mut pin.ws), cx) {
          Poll::Ready(Some(Ok(msg))) => {
              let data = msg.into_data();
              return match serde_json::from_slice::<Message<T>>(&data) {
                  Ok(msg) => {
                      tracing::trace!("Received {:?}", msg);
                      Poll::Ready(Some(Ok(msg)))
                  }
                  Err(_) => {
                      return match serde_json::from_slice::<T>(&data) {
                          Ok(msg) => {
                              tracing::trace!("Received {:?}", msg);
                              Poll::Ready(Some(Ok(Message::Event(msg)))) // <-- actually goes here !
                          }
                          Err(err) => {
                              tracing::error!("Failed to deserialize WS response {}", err);
                              Poll::Ready(Some(Err(err.into())))
                          }
                      };
                  }
              };
          }
          Poll::Ready(Some(Err(err))) => {
              return Poll::Ready(Some(Err(CdpError::Ws(err))));
          }
          _ => {}
      }

fixes it

It's the craziest thing, looks like a serde issue, at least i have a temp fix.

@Wicpar
Copy link
Author

Wicpar commented Aug 13, 2023

it seems to be in an issue with how serde visits untagged enums.
a quick and dirty implemnetation of the deserializer fixes it too:

impl<'de, T>  Deserialize<'de> for Message<T> where T: for<'t> Deserialize<'t> {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error> where D: Deserializer<'de> {
        let value = Value::deserialize(deserializer)?;
        if let Ok(response) = serde_json::from_value(value.clone()) {
            return Ok(Self::Response(response))
        }
        use serde::de::{Error};
        Ok(Self::Event(serde_json::from_value(value.clone()).map_err(|err|Error::custom(format!("{:?}", err)))?))
    }
}

i'm open to doing a PR, i think the best utility-wise would be to implement deserialize manually, maybe there is a better non-bugged way to implement it. Maybe CdpEventMessage can be changed to play nicer with untagged enums

@Sytten
Copy link
Contributor

Sytten commented Dec 12, 2023

Did we end up fixing this issue?

@Wicpar
Copy link
Author

Wicpar commented Dec 12, 2023

i don't think so, i still use my fork with the fix.

@MOZGIII
Copy link
Contributor

MOZGIII commented Dec 18, 2023

I have had a similar issue, but in my case the issue comes from an invalid message:

{
  "requestId": "...",
  "blockedCookies": [
    {
      "blockedReasons": ["SameSiteUnspecifiedTreatedAsLax"],
      "cookieLine": "...=5; expires=Mon, 18 Dec 2023 01:25:44 GMT; Max-Age=3600; Path=/",
      "cookie": {
        "name": "...",
        "value": "5",
        "domain": "...",
        "path": "/",
        "expires": 1702862744.118586,
        "size": 21,
        "httpOnly": false,
        "secure": false,
        "session": false,
        "priority": "Medium",
        "sameParty": false,
        "sourceScheme": "Secure",
        "sourcePort": 443
      }
    },
    {
      "blockedReasons": ["SameSiteUnspecifiedTreatedAsLax"],
      "cookieLine": "...=\"\"; expires=Thu, 01 Jan 1970 00:00:00 GMT; Max-Age=0; Path=/",
      "cookie": {
        "name": "...",
        "value": "\"\"",
        "domain": "...",
        "path": "/",
        "expires": null,
        "size": 20,
        "httpOnly": false,
        "secure": false,
        "session": false,
        "priority": "Medium",
        "sameParty": false,
        "sourceScheme": "Secure",
        "sourcePort": 443
      }
    },
    {
      "blockedReasons": ["SameSiteUnspecifiedTreatedAsLax"],
      "cookieLine": "...=\"\"; expires=Thu, 01 Jan 1970 00:00:00 GMT; Max-Age=0; Path=/",
      "cookie": {
        "name": "...",
        "value": "\"\"",
        "domain": "...",
        "path": "/",
        "expires": null,
        "size": 11,
        "httpOnly": false,
        "secure": false,
        "session": false,
        "priority": "Medium",
        "sameParty": false,
        "sourceScheme": "Secure",
        "sourcePort": 443
      }
    },
    {
      "blockedReasons": ["SameSiteUnspecifiedTreatedAsLax"],
      "cookieLine": "...=\"\"; expires=Thu, 01 Jan 1970 00:00:00 GMT; Max-Age=0; Path=/",
      "cookie": {
        "name": "...",
        "value": "\"\"",
        "domain": "...",
        "path": "/",
        "expires": null,
        "size": 12,
        "httpOnly": false,
        "secure": false,
        "session": false,
        "priority": "Medium",
        "sameParty": false,
        "sourceScheme": "Secure",
        "sourcePort": 443
      }
    }
  ],
  "headers": {
    "server": "cloudflare",
    "more": "redacted"
  },
  "resourceIPAddressSpace": "Public",
  "statusCode": 200,
  "cookiePartitionKey": "https://...",
  "cookiePartitionKeyOpaque": false
}

The error was at "invalid type: null, expected f64", line: 31, column: 33.
It is, in fact, looking like the browser sends bad data - either that or the parsed/codegen is wrong, or the PDL is wrong.

Anyhow, it looks like in any case, a failure to parse a message can be shoved under the rug in 95% of cases - so I propose that we make the parsing errors non-critical as opposed to crashing the whole handler.
In my case, I do not care about that particular message type that crashes and takes everything else (that I actually do care about) with it.

@Wicpar
Copy link
Author

Wicpar commented Jan 8, 2024

Closing this as it seems fixed in latest version

@Wicpar Wicpar closed this as completed Jan 8, 2024
@MOZGIII
Copy link
Contributor

MOZGIII commented Jan 8, 2024

It's not fixed - it's a design issue. If you don't encounter it it is because you don't receive messages that the CDP schema you have can't handle.
The issue is unknown responses message the whole protocol, instead of being ignored.

So the issue is likely hidden in the latest version, rather being fundamentally fixed.

@Wicpar
Copy link
Author

Wicpar commented Jan 8, 2024

i switched to short lived browser instances it might mitigate the issue

@ryo33
Copy link
Contributor

ryo33 commented Feb 14, 2024

I guess the source of the original problem came from #[serde(untagged)]. A derived deserializer of an untagged enum does not work correctly with serde_json and f64 etc somewhere in its type definition. The CdpEvent type seems to contain these types which is problematic on using an untagged enum and serde_json.
This seems the reason why #194 works as a fix.

@MOZGIII
Copy link
Contributor

MOZGIII commented Feb 14, 2024

Interesting, is there a test you could reproduce it with? It seems regression prevention would be very necessary here.
If it is indeed a bug within serde and not the missing types - it would clearly be a better fix than skipping an error.

However this does not mean the loop should be crashing when it encounters a parsing error...

@Wicpar
Copy link
Author

Wicpar commented Feb 14, 2024

Big structures that can be partially parsed should be partially parsed. if unknown data is deserialized it sould be deserialized as unknown.

@ryo33
Copy link
Contributor

ryo33 commented Feb 15, 2024

However this does not mean the loop should be crashing when it encounters a parsing error...

Big structures that can be partially parsed should be partially parsed. if unknown data is deserialized it sould be deserialized as unknown.

I agree with both. I think adding CdpEvent::Unknown(serde_json::Value) as a variant is the first choice.
I believe there is no chance of failing to parse a JSON intended to be a response (by Chromium) because the Response type seems durable to unexpected compatibility issues. (I said the same thing on #206 (comment)) So we can ignore the case, an invalid response would be sent from Chromium. So, I believe that adding the variant that catches invalid events solves almost all cases we met and will meet.

Also, I think crashing the handler is ok in the case the Chromium sends an invalid response intended as a response, not an event. Because it will break the semantics of the request/response pattern if skipped. (Conversely, I think it's acceptable to skip processing invalid events if Chromium intended it as an event, not a response, and it's documented on event handler APIs in chromiumoxide.)

@Wicpar
Copy link
Author

Wicpar commented Feb 15, 2024

Yes, if the whole struct is unusable it should crash, what i meant is if the return value contains a single invalid element in an array it should not invalidate the rest even in responses. One can then choose to retry if the value is likely wrong.

@ryo33
Copy link
Contributor

ryo33 commented Feb 16, 2024

IMO one of the greatest benefits of using chromiumoxide is that the types reflect the exact protocol definition. If we are going to allow the nice partial parsing as you meant (if I understand it correctly), it introduces some complexity to the type generation and user experience. But, indeed, chromium occasionally sends invalid responses or events to the protocol, so we need some way to deal with the situation. If at most the handler does not crash immediately and we can know the wrong message, we can choose how to resolve that, like ignoring, filing an issue to crbug, or handling the serde_json::Value in custom code for the situation.

I think the problem is that we cannot prevent the handler crashes, without staying to the chromium version that worked before, like MOZGIII said. As for events, I suggest the solution on #207, and as for responses it already handles the parsing error without crashing (but we still cannot get the serde_json::Value of the wrong response, and I'd like to fix this in another pull request).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants