Lexical.Document.LineParser (lexical_shared v0.5.0)

A parser that parses a binary into Lexical.Document.Line records.

The parser first makes a pass over the binary to find where the lines break, what their endings are, and whether each line is ASCII. It stores this information as it goes and, once the pass is done, goes back and splits up the binary using binary_slice. This performs 3x faster than iterating through the binary and collecting iolists that represent each line.
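The two-pass idea can be illustrated with a minimal sketch. This is not the library's implementation: the module name LineScanSketch, the tuple shapes, and the choice to recognise only "\n" and "\r\n" endings are assumptions made for illustration. It uses Kernel.binary_slice/3, which requires Elixir 1.14 or later.

```elixir
defmodule LineScanSketch do
  # Hypothetical module for illustration only; not part of lexical_shared.

  # Returns a list of {text, ending, ascii?} tuples, one per line.
  def split(binary) when is_binary(binary) do
    binary
    # Pass 1: record where each line starts, how long it is, how it ends,
    # and whether every byte in it is ASCII.
    |> scan(0, 0, true, [])
    # Pass 2: slice the original binary using the recorded offsets.
    |> Enum.map(fn {start, length, ending, ascii?} ->
      {binary_slice(binary, start, length), ending, ascii?}
    end)
  end

  # End of input: emit any trailing text that has no line ending.
  defp scan(<<>>, offset, start, ascii?, acc) do
    acc =
      if offset > start do
        [{start, offset - start, "", ascii?} | acc]
      else
        acc
      end

    Enum.reverse(acc)
  end

  # A CRLF ending closes the current line; the next line starts after it.
  defp scan(<<"\r\n", rest::binary>>, offset, start, ascii?, acc) do
    line = {start, offset - start, "\r\n", ascii?}
    scan(rest, offset + 2, offset + 2, true, [line | acc])
  end

  # An LF ending closes the current line; the next line starts after it.
  defp scan(<<"\n", rest::binary>>, offset, start, ascii?, acc) do
    line = {start, offset - start, "\n", ascii?}
    scan(rest, offset + 1, offset + 1, true, [line | acc])
  end

  # Any other byte: advance one byte and update the ASCII flag.
  defp scan(<<byte, rest::binary>>, offset, start, ascii?, acc) do
    scan(rest, offset + 1, start, ascii? and byte > 0 and byte < 128, acc)
  end
end
```

In this sketch the first pass builds only small tuples of integers and flags, and the line text is cut from the original binary in the second pass rather than being accumulated byte by byte.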

It determines whether a line is ASCII (which here really means ASCII-only UTF-8) by checking that each byte is greater than 0 and less than 128. UTF-16 files won't be marked as ASCII. Knowing that a line is ASCII allows us to skip a lot of byte conversions later in the process.
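The byte-range check described above might look like the following sketch. The module and function names are hypothetical and are not taken from the library.

```elixir
defmodule AsciiCheckSketch do
  # Hypothetical helper for illustration only; not part of lexical_shared.

  # A binary counts as ASCII when every byte is greater than 0 and less than 128.
  def ascii?(<<>>), do: true

  def ascii?(<<byte, rest::binary>>) when byte > 0 and byte < 128 do
    ascii?(rest)
  end

  # Any byte outside that range (including a zero byte) fails the check.
  def ascii?(<<_byte, _rest::binary>>), do: false
end
```

Under this check, "hello" passes and "héllo" does not, because "é" is encoded in UTF-8 as two bytes that are both above 127. The bytes <<104, 0, 105, 0>> (the text "hi" in little-endian UTF-16) also fail, because of the zero bytes, which is why the lower bound matters.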

Summary

Functions

parse(text, starting_index)

Parses the text into lines

Functions

parse(text, starting_index)

Parses the text into lines

Parses the given text into lines, using starting_index as the line number of the first line. Passing 0 as starting_index yields a zero-based collection, while passing 1 yields a one-based collection.
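The effect of starting_index on line numbering can be illustrated with a small standalone sketch. It does not call the library and does not reproduce the shape of the Lexical.Document.Line records that parse/2 returns. The module LineNumberSketch and its {text, line_number} tuples are assumptions made for illustration.

```elixir
defmodule LineNumberSketch do
  # Hypothetical module for illustration only; not part of lexical_shared.

  # Pairs each line with a line number, counting up from starting_index.
  def number(text, starting_index) do
    text
    |> String.split("\n")
    |> Enum.with_index(starting_index)
  end
end

# Zero-based numbering: the first line is line 0.
LineNumberSketch.number("first\nsecond", 0)
# => [{"first", 0}, {"second", 1}]

# One-based numbering: the first line is line 1.
LineNumberSketch.number("first\nsecond", 1)
# => [{"first", 1}, {"second", 2}]
```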