HTML cleaner and sanitizer for Python projects, also usable as a standalone app
Requirements:
- Python >= 2.5
- BeautifulSoup
html_cleaner.clear.clear_html_code(text)
Removes tags that are not allowed from HTML code. The structure of allowed tags is defined in needs.cfg. clear.py is generated by html_cleaner/generator.py, which uses needs.cfg as its configuration file.
Simple usage:
    from html_cleaner.clear import clear_html_code

    clear_html_code("""
    <a href="/" title="test" alt="test">link</a>
    <javascript>alert(0);</javascript>
    """)
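To illustrate the general idea of white-list cleaning, here is a minimal sketch that uses only the Python 3 standard library. It is not the code generated into clear.py and does not use BeautifulSoup. The names ALLOWED, WhitelistCleaner and sketch_clean are hypothetical, and the white-list shown is an assumption, not the contents of needs.cfg.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical white-list: tag name -> set of allowed attribute names.
# This is an illustration only, not the rules from needs.cfg.
ALLOWED = {
    "a": {"href", "title"},
    "b": set(),
    "p": set(),
}


class WhitelistCleaner(HTMLParser):
    """Keeps white-listed tags and attributes, drops all other markup."""

    def __init__(self):
        HTMLParser.__init__(self, convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag itself; its text content is still kept
        kept = [(k, v) for k, v in attrs if k in ALLOWED[tag] and v is not None]
        attr_text = "".join(
            ' %s="%s"' % (k, escape(v, quote=True)) for k, v in kept
        )
        self.out.append("<%s%s>" % (tag, attr_text))

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Re-escape text so it cannot be read back as markup.
        self.out.append(escape(data, quote=False))


def sketch_clean(text):
    parser = WhitelistCleaner()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

In this sketch the text inside a dropped tag is kept as plain text, and unbalanced tags are not repaired. A real cleaner may well make different choices on both points.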
./generator.py
Generates the clear.py source file according to the rules specified in needs.cfg. A simpler example configuration file can be found in example.cfg.
The configuration file contains hierarchical white-list rules for the HTML cleaner.
For examples, see example.cfg and needs.cfg (the one we use).
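As a rough illustration of what hierarchical white-list rules can mean, the sketch below models them as a mapping from a parent tag to the tags allowed directly inside it, and reports tags that appear in a disallowed position. The RULES structure, the HierarchyChecker class and the rejected_tags function are all hypothetical. The actual syntax of needs.cfg and example.cfg is not shown here and may differ. The code assumes Python 3 and uses only the standard library.

```python
from html.parser import HTMLParser

# Hypothetical hierarchical white-list: each key is an allowed tag and its
# value is the set of tags allowed directly inside it. "" stands for the root.
RULES = {
    "": {"p", "ul"},
    "p": {"b"},
    "ul": {"li"},
    "li": {"b"},
    "b": set(),
}


class HierarchyChecker(HTMLParser):
    """Records tags that appear where RULES does not allow them."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.stack = [""]    # currently open tags, root first
        self.rejected = []   # (parent, tag) pairs that violate RULES

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1]
        if tag not in RULES.get(parent, set()):
            self.rejected.append((parent, tag))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open tag.
        if len(self.stack) > 1 and self.stack[-1] == tag:
            self.stack.pop()


def rejected_tags(text):
    checker = HierarchyChecker()
    checker.feed(text)
    checker.close()
    return checker.rejected
```

For instance, with these assumed rules a `li` directly inside a `p` is reported, while the same `li` inside a `ul` is accepted.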
Development of html-cleaner happens on GitHub: https://github.com/ProstoKsi/html-cleaner/
Copyright (C) 2009-2013 Illia Polosukhin, Vladyslav Frolov. This program is licensed under the MIT License (see LICENSE).