StripJs (strip_js v0.1.0)

StripJs is an Elixir module for stripping executable JavaScript from blocks of HTML. It removes <script> tags and neutralizes javascript:... links and event handler attributes such as onclick, as follows:

  • <script>...</script> and <script src="..."></script> tags are removed entirely.

  • <a href="javascript:..."> is converted to <a href="#" data-href-javascript="...">.

  • Event handler attributes are renamed with a data- prefix, so that e.g. onclick="..." becomes data-onclick="...".
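
The three rules above can be sketched on a parse tree of {tag, attributes, children} tuples, the shape Floki uses for elements. This is only an illustration of the rules, not StripJs's actual implementation: the StripSketch module and its strip/1 function are hypothetical names, and the matching is deliberately naive.

```elixir
defmodule StripSketch do
  # Hypothetical illustration, NOT part of StripJs.
  # Takes a list of nodes, where an element is {tag, attrs, children}
  # and a text node is a plain binary.
  def strip(nodes) when is_list(nodes) do
    nodes
    |> Enum.reject(&script?/1)
    |> Enum.map(&strip_node/1)
  end

  defp strip_node({tag, attrs, children}) when is_binary(tag) do
    {tag, Enum.flat_map(attrs, &rewrite_attr/1), strip(children)}
  end

  # Text nodes and any other node types pass through unchanged.
  defp strip_node(other), do: other

  # Rule 1: <script> elements are dropped entirely.
  defp script?({"script", _attrs, _children}), do: true
  defp script?(_other), do: false

  # Rule 2: javascript: links become href="#" plus a data attribute.
  # Naive: case-sensitive, and leading whitespace is not handled.
  defp rewrite_attr({"href", "javascript:" <> code}) do
    [{"href", "#"}, {"data-href-javascript", code}]
  end

  # Rule 3: on* attributes get a data- prefix.
  # Naive: matches any attribute name that starts with "on".
  defp rewrite_attr({"on" <> _rest = name, value}) do
    [{"data-" <> name, value}]
  end

  defp rewrite_attr(attr), do: [attr]
end

tree = [
  {"div", [], [
    {"script", [], ["alert(1)"]},
    {"a", [{"href", "javascript:go()"}], ["x"]},
    {"button", [{"onclick", "go()"}, {"class", "b"}], ["Hi!"]}
  ]}
]

IO.inspect(StripSketch.strip(tree))
```

In this sketch the <script> child disappears, the link keeps its code only in data-href-javascript, and the button's class attribute is left untouched. A real sanitizer has to be stricter about how it recognizes javascript: URLs and handler attributes than these pattern matches are.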

Installation

Add strip_js to your application’s dependencies in mix.exs:

def deps do
  [{:strip_js, "~> 0.1.0"}]
end

Usage

strip_js/1 returns a copy of its input, with all JS removed.

iex> html = "<button onclick=\"alert('pwnt')\">Hi!</button>"
iex> StripJs.strip_js(html)
"<button data-onclick=\"alert('pwnt')\">Hi!</button>"

strip_js_with_status/1 performs the same function as strip_js/1, also returning a boolean indicating whether any JS was removed from the input.

iex> html = "<button onclick=\"alert('pwnt')\">Hi!</button>"
iex> StripJs.strip_js_with_status(html)
{"<button data-onclick=\"alert('pwnt')\">Hi!</button>", true}

StripJs relies on the Floki HTML parser library. If you already have a Floki HTML parse tree, strip_js_from_tree/1 strips JS from the tree directly.

Summary

Functions

strip_js(html)
Returns a copy of the given HTML string with all JS removed

strip_js_from_tree(tree)
Returns a copy of the given Floki HTML tree with all JS removed

strip_js_with_status(html)
Returns a tuple containing a copy of the given HTML string with all JS removed, as well as a boolean that is true when there was JS present in the original HTML and false otherwise

Functions

strip_js(html)
strip_js(String.t) :: String.t

Returns a copy of the given HTML string with all JS removed.

The output may differ from the input byte-for-byte even when the input contains no JS, because the HTML is parsed and then serialized again.

strip_js_from_tree(tree)
strip_js_from_tree(Floki.html_tree) :: Floki.html_tree

Returns a copy of the given Floki HTML tree with all JS removed.

strip_js_with_status(html)
strip_js_with_status(String.t) :: {String.t, boolean}

Returns a tuple containing a copy of the given HTML string with all JS removed, as well as a boolean that is true when there was JS present in the original HTML and false otherwise.

The returned HTML may differ from the input byte-for-byte even when the input contains no JS, because the HTML is parsed and then serialized again.