gh-153027: Make CDATA section parsing in HTMLParser context-dependent #153028

Open
serhiy-storchaka wants to merge 1 commit into python:main from serhiy-storchaka:gh-135661-cdata-context

Conversation

@serhiy-storchaka

Member

HTMLParser now follows start and end tags to detect foreign content (the content of svg and math elements). This approximates the HTML5 standard's tree construction dispatcher and its rules for parsing tokens in foreign content: integration points, breakout tags, and self-closing foreign elements.

In HTML content, <![CDATA[...]]> is now parsed as a bogus comment which ends at the first >. In foreign content, RAWTEXT and RCDATA elements (such as script, style, title or textarea) are now parsed as normal elements.
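To see which callback a given interpreter uses for a CDATA-like construct in HTML content, a small recorder can help. This is only a sketch: the class name `EventRecorder` and the sample markup are illustrative and not part of this change, and the recorded events depend on the Python version and on whether this change is applied.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: record how CDATA-like markup is reported."""

    def __init__(self):
        super().__init__()
        self.events = []

    def unknown_decl(self, data):
        # Called when the markup is recognized as a CDATA section.
        self.events.append(("unknown_decl", data))

    def handle_comment(self, data):
        # Called when the markup is treated as a (bogus) comment.
        self.events.append(("comment", data))

    def handle_data(self, data):
        self.events.append(("data", data))


recorder = EventRecorder()
recorder.feed("<p><![CDATA[x > y]]></p>")
recorder.close()
for kind, text in recorder.events:
    print(kind, repr(text))
```

If the section is recognized, a single `unknown_decl` event is expected. If it is parsed as a bogus comment, the comment should end at the first `>` and the remainder should arrive as ordinary text.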

The new support_cdata parameter of the constructor allows forcing recognition of CDATA sections in any context (the previous default behavior) or in no context. Calling the private method _set_support_cdata() disables the automatic detection, so existing code which maintains its own tracking machinery in handle_starttag() and handle_endtag() works as before.
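The kind of hand-written tracking mentioned above might look roughly like the following. It is a simplified sketch, not code from this change: the class name `ManualCDATATracker`, the `foreign_depth` counter, and the `_sync_cdata()` helper are hypothetical, and the tracking ignores integration points and breakout tags. Because `_set_support_cdata()` is private and absent from older Python versions, the sketch looks it up defensively.

```python
from html.parser import HTMLParser

FOREIGN_ROOTS = {"svg", "math"}


class ManualCDATATracker(HTMLParser):
    """Hypothetical subclass that tracks foreign content by hand."""

    def __init__(self):
        super().__init__()
        self.foreign_depth = 0
        self.cdata_sections = []
        self._sync_cdata()

    def _sync_cdata(self):
        # _set_support_cdata() is private and missing on older
        # Python versions, so look it up defensively.
        setter = getattr(self, "_set_support_cdata", None)
        if setter is not None:
            setter(self.foreign_depth > 0)

    def handle_starttag(self, tag, attrs):
        if tag in FOREIGN_ROOTS:
            self.foreign_depth += 1
            self._sync_cdata()

    def handle_endtag(self, tag):
        if tag in FOREIGN_ROOTS and self.foreign_depth:
            self.foreign_depth -= 1
            self._sync_cdata()

    def unknown_decl(self, data):
        # CDATA sections are reported through unknown_decl().
        self.cdata_sections.append(data)


parser = ManualCDATATracker()
parser.feed("<svg><![CDATA[a>b]]></svg><p><![CDATA[c>d]]></p>")
parser.close()
print(parser.cdata_sections)
```

The section inside `svg` is reported through `unknown_decl()`. Whether the one inside `p` is also reported there depends on whether the running interpreter provides `_set_support_cdata()`.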

There is no performance impact for parsing HTML content without foreign elements: the stack of open elements is only maintained inside foreign content.

…endent

HTMLParser now follows start and end tags to detect foreign content
(the content of "svg" and "math" elements), in accordance with the
HTML5 standard.
In HTML content, "<![CDATA[...]]>" is now parsed as a bogus comment
which ends at the first ">".
In foreign content, RAWTEXT and RCDATA elements (such as "script",
"style", "title" or "textarea") are now parsed as normal elements.
The new "support_cdata" parameter of the constructor allows forcing
recognition of CDATA sections in any context (the previous behavior)
or in no context.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@read-the-docs-community


Documentation build overview

📚 cpython-previews | 🛠️ Build #33438924 | 📁 Comparing bd53440 against main (ed370d3)

  🔍 Preview build  

3 files changed
± library/html.parser.html
± whatsnew/3.16.html
± whatsnew/changelog.html
