Mechanize parses <script> contents with bad results #111

Open
stoduk opened this issue Nov 25, 2015 · 1 comment

stoduk commented Nov 25, 2015

I've got a page with a bunch of JavaScript (which I believe Mechanize makes no effort to handle), and, helpfully, there is something in that page that looks like the start of a tag. This ends badly! I get a ParseError("nested SELECTs") exception when a genuine select is then found.

So, two questions:

1. Long term, is it right that Mechanize parses the contents of the <script> tag at all? Not handling JavaScript is fine, but trying to parse that JavaScript as if it were HTML won't work well.
2. Can anyone think of a short-term fix to get me moving? Can I manipulate pages within Mechanize (in this case, so I can delete the <script> elements)?


stoduk commented Nov 25, 2015

I found an answer for (2) - see below. Epic hack, but works :)

No idea how to fix (1); I won't spend time looking until someone confirms it is at least something that should be fixed. [For instance, I don't know whether a script tag can ever have children. If it can, that would complicate things, as we'd only want to exclude the script's own contents and not its children.]
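On the bracketed question: in HTML, the content of a script element is raw text, so it cannot have element children. As an illustration only, here is a small sketch using Python 3's standard-library `html.parser` (not Mechanize's bundled `_sgmllib_copy` parser, which is what misbehaves here). `StartTagCollector` and `sample` are made-up names for this example, and the sample page is invented rather than taken from the real site.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Record the name of every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


# Hypothetical page: a quoted "<select>" inside JavaScript, then a real one.
sample = (
    "<script>var s = '<select>';</script>"
    "<form><select name='x'><option>1</option></select></form>"
)

collector = StartTagCollector()
collector.feed(sample)
collector.close()

# The quoted "<select>" inside the script is passed through as data,
# so only the genuine select shows up as a start tag.
print(collector.start_tags)
```

A parser that treats script content this way never sees the fake tag, which suggests the long-term fix for (1) would be to stop tokenising inside script elements rather than to special-case quote characters.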

_sgmllib_copy.py:parse_starttag()
         self.__starttag_text = rawdata[start_pos:j]
+        if self.rawdata[i-1] != "'":  # HACK - quick way to exclude fake tags hiding in quoted JavaScript strings
+            self.finish_starttag(tag, attrs)
         return j
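
A less invasive short-term workaround might be to strip the script elements from the response body before Mechanize parses the forms, instead of patching `_sgmllib_copy.py`. The following is only a sketch: `strip_script_elements` is a hypothetical helper, the regular expression is a rough heuristic rather than a real HTML parser (it will be confused by a `>` inside a script attribute value, for example), and the Mechanize wiring in the trailing comment reflects my understanding of the response API (`get_data`, `set_data`, `set_response`), so it should be checked against the installed version.

```python
import re

# Heuristic: match a whole <script ...>...</script> element,
# case-insensitively and across newlines, as lazily as possible.
SCRIPT_RE = re.compile(
    r"<script\b[^>]*>.*?</script\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_script_elements(html):
    """Return html with every script element (tags and body) removed."""
    return SCRIPT_RE.sub("", html)


# Invented sample page with a fake tag hiding inside a JavaScript string.
page = "<script>var s = '<select>';</script><form><select name='x'></select></form>"
cleaned = strip_script_elements(page)
print(cleaned)

# Assumed wiring into Mechanize (not executed here; br is a mechanize.Browser):
#   response = br.open(url)
#   response.set_data(strip_script_elements(response.get_data()))
#   br.set_response(response)
# If the body comes back as bytes, decode it before stripping and
# encode it again before calling set_data.
```

This leaves the parser untouched, at the cost of losing anything that genuinely lives inside a script element.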
