strip_js v0.9.2 StripJs
StripJs is an Elixir module for stripping executable JavaScript from blocks of HTML and CSS.
It handles:
- <script>...</script> and <script src="..."></script> tags
- Event handler attributes such as onclick="..."
- javascript:... URLs in HTML and CSS
- CSS expression(...) directives
- HTML entity attacks (like &lt;script&gt;)
Installation
Add strip_js to your application's mix.exs:
def application do
  [applications: [:strip_js]]
end

def deps do
  [{:strip_js, "~> 0.9.2"}]
end
Usage
clean_html/2 removes all JS vectors from an HTML string:
iex> html = "<button onclick=\"alert('pwnt')\">Hi!</button>"
iex> StripJs.clean_html(html)
"<button>Hi!</button>"
clean_css/2 removes all JS vectors from a CSS string:
iex> css = "body { background-image: url('javascript:alert()'); }"
iex> StripJs.clean_css(css)
"body { background-image: url('removed_by_strip_js:alert()'); }"
StripJs relies on the Floki HTML parser library, which is built using Mochiweb.
StripJs provides a clean_html_tree/1 function to strip JS from Floki.parse/1- and :mochiweb_html.parse/1-style HTML parse trees.
Bugs and Limitations
Invalid HTML may come out of clean_html/2 more broken than it went in.
In uncommon cases, innocent CSS which very closely resembles JS-injection techniques may be mangled by clean_css/2.
StripJs may not block 100% of executable JavaScript, though it gets quite close. If you believe there are JS injection methods not covered by this library, please submit an issue with a test case!
Authorship and License
Copyright 2017, Appcues, Inc.
Project homepage: StripJs
StripJs is released under the MIT License.
Summary
Functions
clean_css(css, opts \\ [])
Removes JS vectors from the given CSS string; i.e., the contents of a stylesheet or <style> tag.
clean_html(html, opts \\ [])
Removes JS vectors from the given HTML string.
clean_html_tree(trees, opts \\ [])
Removes JS vectors from the given Floki/Mochiweb-style HTML tree (html_tree/0).
Types
html_attr()
html_node()
html_tag()
html_tag() :: String.t()
html_tree()
opts()
opts() :: Keyword.t()
Functions
clean_css(css, opts \\ [])
Removes JS vectors from the given CSS string; i.e., the contents of a stylesheet or <style> tag.
Does not HTML-escape its output. Care is taken to maintain valid CSS syntax.
Example:
iex> css = "tt { background-color: expression('alert()'); }"
iex> StripJs.clean_css(css)
"tt { background-color: removed_by_strip_js('alert()'); }"
Warning: this step is performed using regexes, not a parser, so it is possible for innocent CSS containing either of the strings javascript: or expression( to be mangled.
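To make the warning concrete, here is a minimal sketch of a regex-only replacement and of how it rewrites harmless CSS. The module name RegexCssSketch and its two patterns are assumptions made for illustration; this is not StripJs's actual implementation, only the general technique the warning describes.

```elixir
defmodule RegexCssSketch do
  # Hypothetical illustration only; NOT StripJs's real code or patterns.
  @marker "removed_by_strip_js"

  # Replaces the two suspicious substrings wherever they occur,
  # with no knowledge of CSS structure (strings, comments, selectors).
  def neutralize(css) when is_binary(css) do
    css
    |> String.replace(~r/javascript:/i, @marker <> ":")
    |> String.replace(~r/expression\(/i, @marker <> "(")
  end
end

# A real injection vector is neutralized:
RegexCssSketch.neutralize("tt { background-color: expression('alert()'); }")

# Innocent text inside a CSS string literal is rewritten as well,
# because the regex cannot tell it apart from a URL:
RegexCssSketch.neutralize("q::before { content: 'javascript: a primer'; }")
```

A real CSS parser could distinguish a string literal from a URL or function call; a regex pass cannot, which is the trade-off the warning describes.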
clean_html(html, opts \\ [])
Removes JS vectors from the given HTML string.
All non-tag text and tag attribute values will be HTML-escaped, except for the contents of <style> tags, which are passed through clean_css/2.
Even if the input HTML contained no JS, the output of clean_html/2 is not guaranteed to match its input byte-for-byte.
Examples:
iex> StripJs.clean_html("<button onclick=\"alert('phear');\">Click here</button>")
"<button>Click here</button>"
iex> StripJs.clean_html("<script> console.log('oh heck'); </script>")
""
iex> StripJs.clean_html("&lt;script&gt; console.log('oh heck'); &lt;/script&gt;")
"&lt;script&gt; console.log('oh heck'); &lt;/script&gt;" ## HTML entity attack didn't work
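The entity attack fails because text is HTML-escaped on output: once the parser has decoded the entities into literal angle brackets, escaping turns them back into inert text. The sketch below shows that escaping step in isolation. The module name EscapeSketch and the exact set of escaped characters are assumptions; the library's own escaping routine may differ.

```elixir
defmodule EscapeSketch do
  # Hypothetical illustration only; not StripJs's real escaping routine.
  # "&" is replaced first so the entities produced by the later
  # replacements are not escaped a second time.
  def escape(text) when is_binary(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
  end
end

# Text that a parser decoded from entities is re-encoded on the way out,
# so it never becomes a live <script> tag:
EscapeSketch.escape("<script> console.log('oh heck'); </script>")
```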
clean_html_tree(trees, opts \\ [])
Removes JS vectors from the given Floki/Mochiweb-style HTML tree (html_tree/0).
All attribute values and tag bodies except embedded stylesheets will be HTML-escaped.
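For readers unfamiliar with the tree format, a Floki/Mochiweb-style tree is a list of {tag, attributes, children} tuples, with text nodes as plain binaries. The sketch below walks such a tree and drops <script> elements and on* attributes. TreeStripSketch is a hypothetical module written for illustration; it is not how StripJs is implemented, and it skips javascript: URLs, <style> contents, and escaping entirely.

```elixir
defmodule TreeStripSketch do
  # Hypothetical illustration only; not StripJs's implementation.

  # A list of nodes: clean each one and drop those that were removed.
  def strip(nodes) when is_list(nodes) do
    nodes
    |> Enum.map(&strip/1)
    |> Enum.reject(&is_nil/1)
  end

  # Text nodes pass through unchanged in this sketch.
  def strip(text) when is_binary(text), do: text

  # Elements: remove <script> entirely; otherwise drop event handler
  # attributes and recurse into the children.
  def strip({tag, attrs, children}) when is_binary(tag) do
    if String.downcase(tag) == "script" do
      nil
    else
      {tag, Enum.reject(attrs, &event_handler?/1), strip(children)}
    end
  end

  # Anything else (e.g. comment tuples) is left alone.
  def strip(other), do: other

  # Treats every attribute whose name starts with "on" as an event handler.
  defp event_handler?({name, _value}) do
    String.starts_with?(String.downcase(name), "on")
  end
end

tree = [
  {"div", [{"class", "box"}],
   [
     {"button", [{"onclick", "alert(1)"}, {"id", "b"}], ["Hi!"]},
     {"script", [], ["alert(2)"]}
   ]}
]

TreeStripSketch.strip(tree)
```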