FastHTML

A C Node wrapping lexborisov's myhtml. Primarily used with FastSanitize.

  • Available as a hex package: {:fast_html, "~> 1.0"}
  • Documentation
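
As a minimal sketch, the hex dependency above would typically go into the deps list of a project's mix.exs. The surrounding deps/0 function is the usual Mix layout and is an assumption here, not something this README prescribes; only the {:fast_html, "~> 1.0"} tuple comes from the text above.

```elixir
# mix.exs (fragment) - assumes the standard Mix project layout
defp deps do
  [
    # Version requirement taken from the package line above
    {:fast_html, "~> 1.0"}
  ]
end
```

Running mix deps.get afterwards fetches the package.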

Benchmarks

The following table shows the median time each HTML parser usable from Elixir takes to decode a string into a tree. Benchmarks were run on a machine with an Intel Core i7-3520M @ 2.90GHz CPU and 16GB of RAM. To run the benchmark yourself, use the mix fast_html.bench task.

File/Parser           fast_html (C-Node)  mochiweb_html (erlang)  html5ever (Rust NIF)  Myhtmlex (NIF)¹
document-large.html   178.13 ms           3471.70 ms              799.20 ms             402.64 ms
document-medium.html  2.85 ms             26.58 ms                9.06 ms               3.72 ms
document-small.html   1.08 ms             5.45 ms                 2.10 ms               1.24 ms
fragment-large.html   1.50 ms             10.91 ms                6.03 ms               1.91 ms
fragment-small.html²  434.64 μs           83.02 μs                57.97 μs              311.39 μs

  1. Myhtmlex also has a C-Node mode, but it was not benchmarked here because it segfaults on document-large.html.
  2. The slowdown on fragment-small.html is due to C-Node overhead. Unlike html5ever and Myhtmlex in NIF mode, fast_html has the parser process isolated and communicates with it over the network, so even if a fatal crash in the parser happens, it won't bring down the entire VM.

Note about running with Swarm

Since the myhtml worker runs as a separate node, Swarm will try to sync with it. This fails because the worker is not a real Erlang node. To prevent Swarm from attempting the sync, add the following to your configuration:

config :swarm, node_blacklist: [~r/myhtml_.*$/]

Contribution / Bug Reports

  • Please make sure you run git submodule update after a checkout/pull
  • The project aims to be fully tested