Federated Republic of Sean is a user on retro.social.
Experiencing irrational feelings of jealousy watching rin and kaniini rewrite a fast html scrubber. Not sure it would be a reasonable thing for me to dive into. Alas.

@tedu @kaniini hopefully "fast" doesn't mean "foolable". Any scrubber should be super conservative with what it allows through and be unable to "fail open." If in doubt, throw it out!

Federated Republic of Sean @freakazoid

@kaniini @tedu I mostly mention this because if I needed to scrub HTML I'd probably do it the slow way, by using a maintained parser, a visitor that reconstructs the parse tree only pulling out info it understands (i.e. no blind recursion), and then re-serialize that. To do it the fast way I'd probably use an event-driven parser that tries as hard as possible to avoid any memory allocation or backtracking, possibly even operating in-place.
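
A rough sketch of the "slow way" described above, assuming Python and its standard-library `html.parser`. It rebuilds the output from scratch and only emits tags and attributes it explicitly recognizes. Note that it works from parser events rather than a full parse tree, and the allowlist contents and the names `ALLOWED`, `AllowlistScrubber` and `scrub` are made up for illustration. This is not a vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> attributes permitted on that tag.
ALLOWED = {"p": set(), "br": set(), "em": set(), "strong": set(), "a": {"href"}}
VOID = {"br"}                      # allowed tags that never get a closing tag
DROP_CONTENT = {"script", "style"} # elements whose text is discarded too
SAFE_SCHEMES = ("http://", "https://")


class AllowlistScrubber(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the rebuilt document
        self.open_tags = []  # allowed tags we have emitted and not yet closed
        self.skip = 0        # depth inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip += 1
            return
        if tag not in ALLOWED:
            return  # unknown element: emit nothing for the tag itself
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip:
                self.skip -= 1
            return
        # Only close tags we opened ourselves, closing inner ones first.
        if tag in self.open_tags:
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))

    def result(self):
        # Close anything the input left open so the output stays balanced.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def scrub(html_text):
    scrubber = AllowlistScrubber()
    scrubber.feed(html_text)
    scrubber.close()
    return scrubber.result()
```

Comments, doctype declarations and processing instructions are dropped simply because the scrubber never defines handlers that would emit them, so nothing passes through by default.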


@tedu @kaniini I guess one could make that reasonably safe by assuming reasonably valid HTML and bailing out completely if anything unexpected turns up, perhaps even zeroing out the string if it operates in-place, to make sure the caller can't go on and use it anyway.
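
To make the "bail out and zero the buffer" idea concrete, here is a minimal sketch in Python, assuming a mutable `bytearray` and a tiny hypothetical allowlist of attribute-free, lowercase tags. The name `check_or_wipe` is invented for this example. It only validates; it does not rewrite anything or try to avoid allocation, and it ignores entities entirely.

```python
ALLOWED_TAGS = {b"p", b"br", b"em", b"strong"}  # hypothetical allowlist


def check_or_wipe(buf: bytearray) -> bool:
    """Return True if buf contains only text and allowlisted bare tags.

    On anything unexpected, overwrite buf with zero bytes (same length)
    and return False, so the caller cannot keep using the input.
    """
    i, n = 0, len(buf)
    ok = True
    while i < n:
        c = buf[i]
        if c == ord(">"):
            ok = False  # stray '>' outside a tag
            break
        if c != ord("<"):
            i += 1      # ordinary text byte
            continue
        end = buf.find(b">", i + 1)
        if end == -1:
            ok = False  # unterminated tag
            break
        name = bytes(buf[i + 1:end])
        if name.startswith(b"/"):
            name = name[1:]  # closing tag: compare the bare name
        if name not in ALLOWED_TAGS:
            ok = False  # unknown tag, attributes, odd casing, etc.
            break
        i = end + 1
    if not ok:
        buf[:] = bytes(n)  # zero the buffer in place
    return ok
```

Because any tag carrying attributes, unusual casing or whitespace fails the exact-match lookup, the check errs on the side of rejecting input rather than guessing at what it meant.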

@freakazoid @tedu that's basically what we did. the previous gold standard for html parsing used mochiweb, we replaced it with a parser from an actual browser.