Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adoption agency limit #297

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

DemiMarie
Copy link

@DemiMarie DemiMarie commented Aug 6, 2017

This hardens the code against denial of service attacks by only going back a certain number of elements (set at parser construction time) in the list of active formatting elements and the stack of open elements. With this change, I have not been able to find any ways to cause html5ever to hang, even on pathological input.

This is a breaking change, since it adds an element to TreeBuilderOpts and causes non-conforming parsing of very deeply nested input. It is intended that the parsing of valid input should not be affected.

@DemiMarie
Copy link
Author

My understanding is that it is impossible to solve #289 without imposing some arbitrary limit – otherwise, it is possible to produce DOMs that are quadratic size.

@bors-servo
Copy link
Contributor

☔ The latest upstream changes (presumably #274) made this pull request unmergeable. Please resolve the merge conflicts.

@Ygg01
Copy link
Contributor

Ygg01 commented Aug 9, 2017

@DemiMarie #274 moved implentations from html5ever/src/tree_builder/actions.rs into html5ever/src/tree_builder/mod.rs, which broke this change. You will need to rebase.

@DemiMarie
Copy link
Author

@Ygg01 Rebased and squashed.

I went back to a limit of 10, because I doubt that most web pages will hit that limit, and because the limit*8 is roughly the “amplification factor” that a malicious document can achieve. This does cause some tests to fail though.

Part of this can be solved by better algorithms, but part of it is inherent to (and IMO a bug in) the HTML parsing algorithm, which allows for quadratically-sized DOMs to be created from linearly-sized input.

This hardens the code against denial of service attacks by only going
back a certain number of elements (set at parser construction time)
in the list of active formatting elements and the stack of open
elements.
@Ygg01
Copy link
Contributor

Ygg01 commented Aug 11, 2017

Hm. It seems your code is causing some errors in tree building. Or perhaps testing requires larger limit? I'll inspect once I'm home.

@DemiMarie
Copy link
Author

DemiMarie commented Aug 11, 2017 via email

@Ygg01
Copy link
Contributor

Ygg01 commented Aug 11, 2017

I think that what is really needed is a better data structure. At some point,

Are there any examples what said data structure is?

@DemiMarie
Copy link
Author

@Ygg01 Testing requires a larger limit. I chose the limit I did because the code is already slow and I didn't want for an attacker to be able to DoS a server.

I think that what is really needed is a better data structure. At some point, PRs should be issued against the spec that replace the algorithm with an equivalent but faster version. The algorithm in the spec should be one that people can actually use securely.

@DemiMarie
Copy link
Author

DemiMarie commented Aug 12, 2017 via email

@usamec
Copy link

usamec commented Oct 18, 2017

Any updates on this PR?
(having HTML parser, which does not blow on pathological pages would be really nice).

@SimonSapin
Copy link
Member

@hsivonen, what does Gecko do here?

@S2Ler
Copy link

S2Ler commented Aug 20, 2018

Tried this PR with the latest version and it doesn't fix stack overflow issue during deallocation phase.

@hsivonen
Copy link

I'm so bad at GitHub notifications. Sorry.

@SimonSapin, it's a bit unclear to me what "here" means. The adoption agency algorithm has its outer loop count limited to 8 in the spec, and that's what Gecko limits it to.

Additionally, there's a need to limit the general DOM depth. On that point, you shouldn't copy Gecko but Blink. Please take a look at the Gecko patch to match Blink. It hasn't landed due to the sad combination of Android runtime stack limits and Robohornet. I'm hoping to resolve the Android issue one way or another in 2018H2.

@Ygg01
Copy link
Contributor

Ygg01 commented Dec 17, 2018

@hsivonen Could you explain the gist behind the Blink's limit, in that patch, My C++ foo isn't great. From what I could tell. It creates a limited Vec or array that has up to 512 (or rather 511) elements and errors and returns last element of said array.

@bors-servo
Copy link
Contributor

☔ The latest upstream changes (presumably #357) made this pull request unmergeable. Please resolve the merge conflicts.

@hsivonen
Copy link

Could you explain the gist behind the Blink's limit, in that patch, My C++ foo isn't great.

ReviewBoard went away, but here is rebased version of the Gecko patch.

In places where the code calls nodeFromStackWithBlinkCompat(), instead of taking the most-recently-pushed element from the tree builder stack, element at index 511 is taken if there are more elements on the stack. Assumes zero-based indexing where the "html" element is at index 0.

@Ygg01
Copy link
Contributor

Ygg01 commented Dec 24, 2018

@DemiMarie could you work in the changes hsivonen mentioned?

@DemiMarie
Copy link
Author

@hsivonen I am working on it.

@hsivonen
Copy link

The fix has finally landed in Gecko and is available for testing in Firefox Nightly.

@freddyb
Copy link

freddyb commented Jul 29, 2019

@DemiMarie Are you stil working on this?

@DemiMarie
Copy link
Author

DemiMarie commented Jul 29, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants