-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Critical Deserialization issue #167
Comments
same issues happen when i run it on WSL, so it proves it's not system dependent, i verified there is no code change related to chromium, restarted the computer, so it must be a change in chromium. |
The last {"method":"Page.lifecycleEvent","params":{"frameId":"B0FCF18A982213C9947D313EAA8F934A","loaderId":"547DA6CC3D4A41314EA08A88BFA62B21","name":"commit","timestamp":1531.878478},"sessionId":"E0BCD37484373226136272710B8CB432"} Edit: Confirmed to originate from Line 145 in 939a126
|
CdpEventMessage seems to be the curlpit, but it does have a catchall, it should match |
my entire code is the following, you should be able to try it out with minimal modification: #[derive(Clone)]
pub struct PDFService {
browser: Arc<Browser>,
semaphore: Arc<Semaphore>,
}
impl PDFService {
pub async fn new(templates: Option<&TemplateService>) -> anyhow::Result<Self> {
let download_path = Path::new("./download");
tokio::fs::create_dir_all(&download_path).await?;
let fetcher = BrowserFetcher::new(
BrowserFetcherOptions::builder()
.with_path(&download_path)
.build()?,
);
info!("Loading Chromium...");
let info = fetcher.fetch().await?;
let browser = Arc::new(Self::browser(&info.executable_path).await?);
info!("Chromium Loaded: {}", info);
let this = Self {
browser,
semaphore: Arc::new(Semaphore::new(
available_parallelism()
.unwrap_or(NonZeroUsize::new(1).unwrap())
.get(),
)),
};
if let Some(templates) = templates {
// move this part to a better place once more templates are added
info!("Verifying Builtin templates...");
CertificateTemplate::test(&this, templates).await?;
}
Ok(this)
}
async fn browser(path: impl AsRef<Path>) -> anyhow::Result<Browser> {
let (browser, mut handler) = Browser::launch(
BrowserConfig::builder()
.chrome_executable(path)
.request_timeout(Duration::from_secs(1))
.no_sandbox()
.build()
.map_err(|it| anyhow!("Failed to create Chromium context: {}", it))?,
)
.await?;
// spawn a new task that continuously polls the handler, it is required to drive the events, it will end and deallocate when the handler closes
let _handle = tokio::task::spawn(async move {
while let Some(h) = handler.next().await {
if let Err(err) = h {
error!("Error in chromium handler: {}", err);
break;
}
}
});
Ok(browser)
}
pub(crate) async fn render_template(&self, html: String) -> anyhow::Result<Bytes> {
let _semaphore = self.semaphore.acquire().await?; // do at most n simultaneously
let html = html.replace(
"<head>",
&format!(
"<head><script>{}</script>",
include_str!("pagedjs.polyfill.min.js")
),
);
let page = self.browser.new_page("about:blank").await?; // <-- error here
let res = page
.set_content(html)
.await?
.pdf(
chromiumoxide::cdp::browser_protocol::page::PrintToPdfParams {
prefer_css_page_size: Some(true),
print_background: Some(true),
..Default::default()
},
)
.await?;
page.close().await?;
Ok(res.into())
}
} |
match Stream::poll_next(Pin::new(&mut pin.ws), cx) {
Poll::Ready(Some(Ok(msg))) => {
let data = msg.into_data();
return match serde_json::from_slice::<Message<T>>(&data) {
Ok(msg) => {
tracing::trace!("Received {:?}", msg);
Poll::Ready(Some(Ok(msg)))
}
Err(_) => {
return match serde_json::from_slice::<T>(&data) {
Ok(msg) => {
tracing::trace!("Received {:?}", msg);
Poll::Ready(Some(Ok(Message::Event(msg)))) // <-- actually goes here !
}
Err(err) => {
tracing::error!("Failed to deserialize WS response {}", err);
Poll::Ready(Some(Err(err.into())))
}
};
}
};
}
Poll::Ready(Some(Err(err))) => {
return Poll::Ready(Some(Err(CdpError::Ws(err))));
}
_ => {}
} fixes it It's the craziest thing, looks like a serde issue, at least i have a temp fix. |
it seems to be in an issue with how serde visits untagged enums. impl<'de, T> Deserialize<'de> for Message<T> where T: for<'t> Deserialize<'t> {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error> where D: Deserializer<'de> {
let value = Value::deserialize(deserializer)?;
if let Ok(response) = serde_json::from_value(value.clone()) {
return Ok(Self::Response(response))
}
use serde::de::{Error};
Ok(Self::Event(serde_json::from_value(value.clone()).map_err(|err|Error::custom(format!("{:?}", err)))?))
}
} i'm open to doing a PR, i think the best utility-wise would be to implement deserialize manually, maybe there is a better non-bugged way to implement it. Maybe CdpEventMessage can be changed to play nicer with untagged enums |
Did we end up fixing this issue? |
i don't think so, i still use my fork with the fix. |
I have had a similar issue, but in my case the issue comes from an invalid message:
The error was at Anyhow, it looks like in any case, a failure to parse a message can be shoved under the rug in 95% of cases - so I propose that we make the parsing errors non-critical as opposed to crashing the whole handler. |
Closing this as it seems fixed in latest version |
It's not fixed - it's a design issue. If you don't encounter it it is because you don't receive messages that the CDP schema you have can't handle. So the issue is likely hidden in the latest version, rather being fundamentally fixed. |
i switched to short lived browser instances it might mitigate the issue |
I guess the source of the original problem came from |
Interesting, is there a test you could reproduce it with? It seems regression prevention would be very necessary here. However this does not mean the loop should be crashing when it encounters a parsing error... |
Big structures that can be partially parsed should be partially parsed. if unknown data is deserialized it sould be deserialized as unknown. |
I agree with both. I think adding Also, I think crashing the handler is ok in the case the Chromium sends an invalid response intended as a response, not an event. Because it will break the semantics of the request/response pattern if skipped. (Conversely, I think it's acceptable to skip processing invalid events if Chromium intended it as an event, not a response, and it's documented on event handler APIs in chromiumoxide.) |
Yes, if the whole struct is unusable it should crash, what i meant is if the return value contains a single invalid element in an array it should not invalidate the rest even in responses. One can then choose to retry if the value is likely wrong. |
IMO one of the greatest benefits of using chromiumoxide is that the types reflect the exact protocol definition. If we are going to allow the nice partial parsing as you meant (if I understand it correctly), it introduces some complexity to the type generation and user experience. But, indeed, chromium occasionally sends invalid responses or events to the protocol, so we need some way to deal with the situation. If at most the handler does not crash immediately and we can know the wrong message, we can choose how to resolve that, like ignoring, filing an issue to crbug, or handling the I think the problem is that we cannot prevent the handler crashes, without staying to the chromium version that worked before, like MOZGIII said. As for events, I suggest the solution on #207, and as for responses it already handles the parsing error without crashing (but we still cannot get the |
Hi, i have been using this crate for months without issues and suddenly i get these errors:
this makes no sense, i reinstalled the fetched (using the fetcher) chromium, reinstalled the crate. The code hasn't changed from when it worked...
This is highly blocking as my service depends heavily on this crate.
The only explaination i can imagine is that the
Message
enum depends on remote data and there is no catchall to resolve potential changes.The text was updated successfully, but these errors were encountered: