Searchex Readme
A full-text search engine written in pure Elixir.
This application is UNDER CONSTRUCTION - not yet ready for use.
About Searchex
Searchex provides a search capability for full-text documents. Example document types include:
- text, markdown, XML and JSON files
- source code files
- product descriptions
- blog and forum posts
- chat rooms and twitter feeds
- web pages
Quick Start
TBD
Searchex Architecture
A Searchex DOCUMENT
has two key elements:
document
META-DATA
, liketitle
,author_name
,publication_date
- the
FULL-TEXT
of the document
Searchex organizes documents into separate COLLECTIONS
. Each collection has
two main elements:
the
CATALOG
, a table-like structure that contains the document ID, meta-data, and document location.- the
INDEX
, an inverted index that is built for fast search and retrieval
Each collection is defined by a CONFIG
file, a yaml file that specifies
things like:
- document directories
- file types
- meta-data fields definitions and extraction regexes
- document separator regex (for multi-doc files)
Using Searchex
A searchex
command-line program can be used to manage config files, catalogs
and indexes, and to perform searches.
There is an API that Elixir developers can use to embed Searchex into their applications.
Why Erlang/Elixir
Text indexing and search can be incredibly compute intensive, and benefit massively from the concurrent/distributed capabilities of the BEAM.
Package Installation
Add searchex
to your list of dependencies in mix.exs
:
def deps do
[{:searchex, "~> 0.0.1-alpha.2"}]
end
Then run mix deps.get
Escript Installation
If you have Elixir 1.3+ enter this at the console:
mix escript.install https://raw.githubusercontent.com/andyl/stemex/master/stemex
Tab Completion
Get a tab-completion script by typing searchex completion
To install: copy this script to /etc/bash_completion.d
(or equivalent)
$ searchex completion > /etc/bash_completion.d/searchex_completion.bash
$ chmod a+rx /etc/bash_completion.d/searchex_completion.bash
Comparables and Reference Docs
http://blog.equanimity.nl/blog/2011/10/07/a-basic-full-text-search-server-in-erlang/
- http://himmele.blogspot.com/2011/04/build-your-own-internet-search-engine.html
System Overview
The search engine allows the establishment of one or more document collections. The source data from a document collection is a textual file which cataloged and indexed by Searchex.