Floki
Floki is a simple HTML parser that enables searching with CSS-like selectors.
You can search for elements by class, tag name, and id.
Example
Assuming that you have the following HTML:
<!doctype html>
<html>
<body>
<section id="content">
<p class="headline">Floki</p>
<a href="http://github.com/philss/floki">Github page</a>
</section>
</body>
</html>
You can perform the following queries:
- Floki.find(html, "#content") : returns the section with all children;
- Floki.find(html, ".headline") : returns a list with the p element;
- Floki.find(html, "a") : returns a list with the a element.
Each HTML node is represented by a tuple like:
{tag_name, attributes, children_nodes}
Example of node:
{"p", [{"class", "headline"}], ["Floki"]}
So even if the only child node is the element text, it is represented inside a list.
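Because a node is an ordinary tuple, it can be taken apart with plain Elixir pattern matching. The following is a minimal sketch that uses only the standard library and the example node above. The variable names are chosen for illustration and are not part of Floki.

```elixir
# A node in the shape described above: {tag_name, attributes, children_nodes}
node = {"p", [{"class", "headline"}], ["Floki"]}

# Destructure the tuple with pattern matching.
{tag_name, attributes, children} = node

# Attributes are {name, value} tuples, so List.keyfind/3 can look one up by name.
{"class", class_value} = List.keyfind(attributes, "class", 0)

IO.inspect(tag_name)     # prints "p"
IO.inspect(class_value)  # prints "headline"
IO.inspect(children)     # prints ["Floki"]
```

Note that the text "Floki" arrives wrapped in a list, as described above.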
You can write a simple HTML crawler (with the help of HTTPoison) in a few lines of code:
html
|> Floki.find(".pages")
|> Floki.find("a")
|> Floki.attribute("href")
|> Enum.map(fn(url) -> HTTPoison.get!(url) end)
It is as simple as that!
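To make the link-collecting part of that pipeline more concrete, here is a rough sketch of how href values could be gathered by hand from a tree of node tuples. It is not how Floki is implemented. The module LinkSketch and its hrefs function are hypothetical names introduced only for this illustration, and the tree is a hand-built version of the section element from the example HTML above.

```elixir
defmodule LinkSketch do
  # Hypothetical helper, not part of Floki: collects every "href" value
  # found on "a" nodes in a tree of {tag_name, attributes, children_nodes} tuples.
  def hrefs(nodes) when is_list(nodes), do: Enum.flat_map(nodes, &hrefs/1)

  def hrefs({"a", attributes, children}) do
    own =
      case List.keyfind(attributes, "href", 0) do
        {"href", url} -> [url]
        nil -> []
      end

    own ++ hrefs(children)
  end

  def hrefs({_tag, _attributes, children}), do: hrefs(children)

  # Text children are plain strings and carry no links.
  def hrefs(text) when is_binary(text), do: []
end

# Hand-written tree mirroring the section element of the example HTML.
tree =
  {"section", [{"id", "content"}],
   [
     {"p", [{"class", "headline"}], ["Floki"]},
     {"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]}
   ]}

links = LinkSketch.hrefs(tree)
IO.inspect(links)
```

The resulting list of URLs is the kind of value the last step of the crawler maps over with HTTPoison.get!.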
API
To parse an HTML document, try:
html = """
<html>
<body>
<div class="example"></div>
</body>
</html>
"""
Floki.parse(html)
# => {"html", [], [{"body", [], [{"div", [{"class", "example"}], []}]}]}
To find elements with the class example, try:
Floki.find(html, ".example")
# => [{"div", [{"class", "example"}], []}]
To fetch some attribute from elements, try:
Floki.attribute(html, ".example", "class") # href or src are good possibilities to fetch links
# => ["example"]
You can also get attributes from elements that you already have:
Floki.find(html, ".example")
|> Floki.attribute("class")
# => ["example"]
If you want to get the text from an element, try:
Floki.find(html, ".headline")
|> Floki.text()
# => "Floki"
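As a rough illustration of where that text comes from in the tuple representation, the sketch below concatenates the string children of a node by hand. TextSketch is a hypothetical module, not part of Floki. It descends into nested elements, which is an assumption of this sketch and not a statement about how Floki.text behaves.

```elixir
defmodule TextSketch do
  # Hypothetical helper, not part of Floki: joins all text found under a node.
  def text(nodes) when is_list(nodes), do: Enum.map_join(nodes, "", &text/1)

  # For an element, only its children can hold text.
  def text({_tag, _attributes, children}), do: text(children)

  # A text child is already a string.
  def text(string) when is_binary(string), do: string
end

headline = {"p", [{"class", "headline"}], ["Floki"]}
nested = {"p", [], ["Hello ", {"b", [], ["world"]}]}

headline_text = TextSketch.text(headline)
nested_text = TextSketch.text(nested)

IO.inspect(headline_text)  # prints "Floki"
IO.inspect(nested_text)    # prints "Hello world"
```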
License
Floki is released under the MIT license. Check the LICENSE file for more details.