Renewex.Tokenizer (renewex v0.13.0)
This module implements a simple tokenizer that splits a string into the lexemes needed to parse Renew *.rnw files.
Renew *.rnw files are text files containing a human-readable serialization of Renew Java objects.
To read such a file, the text content must first be split into tokens (lexemes) that are similar to Java tokens. The original Renew Java implementation uses the Java tokenizer, but the Renew file format does not make use of all the tokens defined by the Java language. Hence this module defines only a subset of the Java syntax tokens.
*.rnw files contain an object graph. Each node starts with the name of a Java class. This class name determines how the following tokens are parsed. Primitive values such as integers, floats, or strings are the leaf nodes.
A *.rnw file may contain cyclic references. These are represented by REF <int> tokens, each of which refers to a previously parsed object.
The <int> is an index into the array of already parsed objects. For example, REF 5 points to the fifth object that has occurred while parsing the file.
A REF <int> token must not contain an integer that is larger than the number of already parsed objects, i.e. no forward references are possible.
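The back-reference rule can be illustrated with a minimal sketch. It is written in Python as a language-neutral illustration and is not the library's implementation. The function name `resolve_ref` and the sample objects are hypothetical, and the 1-based indexing is an assumption taken from the wording "REF 5 points to the fifth object"; the actual file format may count differently.

```python
def resolve_ref(parsed_objects, index):
    """Return the object that a REF <index> token points to.

    Assumption: indices are 1-based, following the wording
    "REF 5 points to the fifth object". The real format may differ.
    """
    # A reference may only point at an object that has already been parsed,
    # so anything beyond the current length would be a forward reference.
    if index < 1 or index > len(parsed_objects):
        raise ValueError(
            f"REF {index} is out of range: "
            f"only {len(parsed_objects)} objects parsed so far"
        )
    return parsed_objects[index - 1]


# Hypothetical objects, in the order they occurred while parsing.
parsed = ["drawing", "place", "transition", "arc", "inscription"]
print(resolve_ref(parsed, 5))  # prints "inscription"
```

Under these assumptions, `resolve_ref(parsed, 6)` raises an error because the sixth object has not been parsed yet.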
Functions
Converts a given string into a given type.
Parameters
type: The type to convert the string into
string: The string to be converted
Returns
The string converted into the given type
Splits a string into tokens.
The string is expected to be in the format of a Renew *.rnw file.
Parameters
input: The string to be tokenized
Returns
A stream of token tuples {type, value}. See Renewex.Tokenizer.token_types for a list of possible types.
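As a rough illustration of how a regex with one capture group per token type can produce a lazy stream of {type, value} pairs, here is a minimal Python sketch. It is not the module's actual implementation: the individual patterns, the spelling of the null and boolean literals, the choice to match REF together with its integer as a single token, and the name `tokenize` are all assumptions made for the example. Only the token type names and their order come from this page.

```python
import re

# One named capture group per token type, in the order listed by token_types.
# Every pattern below is an assumption made for illustration.
TOKEN_PATTERN = re.compile(
    r'''
    (?P<white>\s+)
  | (?P<float>-?\d+\.\d+(?:[eE][-+]?\d+)?)
  | (?P<int>-?\d+)
  | (?P<boolean>\b(?:true|false)\b)
  | (?P<null>\bNULL\b)
  | (?P<ref>\bREF\s+\d+)
  | (?P<class_name>[A-Za-z_][\w.$]*)
  | (?P<string>"(?:[^"\\]|\\.)*")
    ''',
    re.VERBOSE,
)


def tokenize(text):
    """Lazily yield (type, value) tuples for the given text."""
    pos = 0
    while pos < len(text):
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise ValueError(
                f"unexpected character at position {pos}: {text[pos]!r}"
            )
        # lastgroup is the name of the alternative that matched.
        yield match.lastgroup, match.group()
        pos = match.end()


# Hypothetical input; real *.rnw content will look different.
for kind, value in tokenize('com.example.Node 3 "label" 1.5 REF 1'):
    print(kind, repr(value))
```

The order of the alternatives matters in this sketch: the float pattern is tried before the int pattern so that `1.5` is not split, and the keyword patterns are tried before the class name pattern so that `true` or `NULL` are not read as class names.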
Takes a list of tokens and removes all tokens that are regarded as white space.
Parameters
tokens: A stream or list of tokens.
Returns
A list of tokens with all whitespace tokens removed.
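The filtering step amounts to dropping every tuple whose type is the whitespace type. A minimal Python sketch, assuming tokens are (type, value) pairs and that the whitespace type is named "white" as in the token type list; the function name `skip_whitespace` is hypothetical:

```python
def skip_whitespace(tokens):
    """Return a list of all tokens whose type is not "white".

    Accepts any iterable (a list or a lazy stream) and always returns a list.
    """
    return [(kind, value) for kind, value in tokens if kind != "white"]


# A lazy iterator stands in for the token stream here.
stream = iter([
    ("class_name", "Node"),
    ("white", " "),
    ("int", "3"),
    ("white", "\n"),
])
print(skip_whitespace(stream))  # [('class_name', 'Node'), ('int', '3')]
```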
Converts token value of a given type back into the corresponding binary string.
Parameters
type: The type of the token
string: The value of the token
Returns
A string representing the given token value
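The two conversion functions are inverses of each other: one turns token text into a typed value, the other turns the value back into text. The following Python sketch shows that round trip under several assumptions: the names `cast` and `to_text` are hypothetical, null is assumed to be spelled NULL, and string literals are assumed to use escapes compatible with JSON, which may not hold for every Java string literal.

```python
import json


def cast(kind, text):
    """Convert token text into a value of the given token type."""
    if kind == "int":
        return int(text)
    if kind == "float":
        return float(text)
    if kind == "boolean":
        return text == "true"
    if kind == "null":
        return None
    if kind == "string":
        # Assumption: the quoted literal uses JSON-compatible escapes.
        return json.loads(text)
    # Other token types are kept as plain text in this sketch.
    return text


def to_text(kind, value):
    """Convert a typed token value back into its textual form."""
    if kind == "boolean":
        return "true" if value else "false"
    if kind == "null":
        return "NULL"  # assumed spelling of the null literal
    if kind == "string":
        return json.dumps(value)
    return str(value)


for kind, text in [("int", "42"), ("boolean", "false"), ("string", '"a\\"b"')]:
    value = cast(kind, text)
    print(kind, repr(value), to_text(kind, value) == text)
```

In this sketch the round trip is only textually exact for canonical spellings; a float written as `1.50`, for example, would come back as `1.5`.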
Get the list of token types defined by the tokenizer.
Returns
[:white, :float, :int, :boolean, :null, :ref, :class_name, :string], in the order determined by the capture groups in the compiled regex