Lexical.Document.LineParser (lexical_shared v0.5.0)
A parser that parses a binary into Lexical.Document.Line records.
The approach taken by the parser is to first go through the binary to find out where
the lines break, what their endings are, and whether each line is ASCII. As we go through
the binary, we store this information, and when we're done, we go back and split up the
binary using binary_slice. This performs 3x faster than iterating through the binary and
collecting IO lists that represent each line.
It determines whether a line is ASCII (more precisely, ASCII-only UTF-8) by checking that each byte is greater than 0 and less than 128. Requiring each byte to be greater than 0 means UTF-16 files won't be marked as ASCII, and lines that are marked as ASCII let us skip a lot of byte conversions later in the process.
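The byte-range check described above might look something like this. The module and function names are hypothetical, and the real parser presumably folds this check into its single scan rather than running it as a separate pass.

```elixir
defmodule AsciiCheckSketch do
  # Hypothetical helper for illustration only.
  # Returns true when every byte is greater than 0 and less than 128.
  def ascii?(<<byte, rest::binary>>) when byte > 0 and byte < 128, do: ascii?(rest)
  def ascii?(<<>>), do: true
  # Any byte outside 1..127 (including a zero byte) ends the check.
  def ascii?(_other), do: false
end
```

For example, "hi" encoded as UTF-16LE is <<104, 0, 105, 0>>, and its zero bytes cause the check to fail, while the same text in UTF-8 passes.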
Summary
Functions
Parses the text into lines
Functions
Parses the text into lines
Parses the given text into lines, and uses starting_index as the first line's line number.
Passing 0 as starting_index yields a zero-based collection, while passing 1 yields a one-based
collection.
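The effect of starting_index can be illustrated without calling the parser itself. The sketch below is only an analogy: NumberingSketch and its number/2 function are hypothetical, and the example uses Enum.with_index/2 to show how the same lines receive different line numbers depending on the starting value.

```elixir
defmodule NumberingSketch do
  # Hypothetical helper for illustration only: pairs each line with a
  # line number, counting up from starting_index.
  def number(lines, starting_index) when is_integer(starting_index) do
    Enum.with_index(lines, starting_index)
  end
end

lines = ["first", "second", "third"]

# Zero-based numbering: the first line is line 0.
zero_based = NumberingSketch.number(lines, 0)

# One-based numbering: the first line is line 1.
one_based = NumberingSketch.number(lines, 1)
```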