Renewex.Tokenizer (renewex v0.10.0)

This module implements a simple tokenizer for splitting a string into the lexemes needed to parse Renew *.rnw files.

Renew *.rnw files are text files containing a human-readable serialization of Renew Java objects.

To read such a file, the text content must first be split into tokens/lexemes similar to Java tokens. The original Renew Java implementation uses the Java tokenizer, but the Renew file format does not make use of all the tokens defined by the Java language. Hence this module defines only a subset of the Java syntax tokens.

*.rnw files contain an object graph. Each node starts with the name of a Java class. This class name determines how the following tokens shall be parsed. Primitive values like integers, floats, or strings are the leaf nodes. A *.rnw file may contain cyclic references. These are represented by REF <int> tokens, each referring to a previously parsed object. The <int> is an index into the list of already parsed objects. For example, REF 5 points to the fifth object that has occurred while parsing the file.

A REF <int> token must not contain an integer that is larger than the number of already parsed objects, i.e. no forward references are possible.
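
A minimal sketch of the idea, assuming the {type, value} token shape described below and an assumed integer payload for :ref tokens (both the class name and the token values are illustrative, not verified output):

    # Hypothetical tokens for a node followed by a back reference.
    tokens = [
      # a class name token opens an object node (example class name)
      {:class_name, "CH.ifa.draw.figures.RectangleFigure"},
      # a back reference to the fifth object parsed so far (assumed shape)
      {:ref, 5}
    ]

    # A back reference is only valid if its index does not exceed the number
    # of objects parsed so far, i.e. no forward references.
    valid_ref? = fn {:ref, index}, parsed_count -> index <= parsed_count end
    valid_ref?.(Enum.at(tokens, 1), 7)
    # => true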

Summary

Functions

Converts a given string into a given type.

Splits a string into tokens.

Takes a list of tokens and removes all tokens that are regarded as whitespace.

Converts a token value of a given type back into the corresponding binary string.

Gets the list of token types defined by the tokenizer.

Functions

cast_value(type, string)

Converts a given string into a given type.

Parameters

  • type: The type to convert the string into
  • string: The string to be converted

Returns

The string converted into the given type
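
A usage sketch; the concrete results shown in the comments are assumptions based on the description above, not verified output:

    # Assumed: casting primitive token strings to their Elixir values.
    Renewex.Tokenizer.cast_value(:int, "42")
    # assumed result: 42

    Renewex.Tokenizer.cast_value(:float, "1.5")
    # assumed result: 1.5

    Renewex.Tokenizer.cast_value(:boolean, "true")
    # assumed result: true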

Splits a string into tokens.

The string is expected to be in the format of a Renew *.rnw file.

Parameters

  • input: The string to be tokenized

Returns

A stream of token tuples {type, value}. See Renewex.Tokenizer.token_types for a list of possible types.
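
A usage sketch. The name of the splitting function is not shown in this excerpt; tokenize/1 is assumed here, and the input fragment is illustrative only:

    # Assumed function name: tokenize/1. The result is a stream of
    # {type, value} tuples, so Enum.to_list/1 is used to materialize it.
    "CH.ifa.draw.figures.RectangleFigure 42 1.5 REF 3"
    |> Renewex.Tokenizer.tokenize()
    |> Enum.to_list()
    # assumed shape of the result:
    # [{:class_name, ...}, {:white, ...}, {:int, ...}, {:white, ...}, ...]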

Takes a list of tokens and removes all tokens that are regarded as whitespace.

Parameters

  • tokens: A stream or list of tokens.

Returns

A list of tokens with all whitespace tokens removed.
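
A usage sketch that chains tokenization and whitespace removal. Both function names (tokenize/1 and skip_whitespace/1) are assumptions; neither name is shown in this excerpt:

    # Assumed names: tokenize/1 and skip_whitespace/1.
    "REF 3"
    |> Renewex.Tokenizer.tokenize()
    |> Renewex.Tokenizer.skip_whitespace()
    # assumed result: the same tokens with every {:white, _} tuple dropped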

token_to_binary(type, value)

Converts a token value of a given type back into the corresponding binary string.

Parameters

  • type: The type of the token
  • value: The value of the token

Returns

A string representing the given token value
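
A usage sketch showing an assumed round trip with cast_value/2; the exact formatting of the returned string is an assumption:

    # Cast a token string to a value, then render it back to a binary.
    value = Renewex.Tokenizer.cast_value(:int, "42")
    Renewex.Tokenizer.token_to_binary(:int, value)
    # assumed result: "42"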

Gets the list of token types defined by the tokenizer.

Returns

[:white, :float, :int, :boolean, :null, :ref, :class_name, :string], in the order determined by the capture groups in the compiled regex.
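
For example, assuming an arity-0 call (the returned list is quoted from the description above):

    Renewex.Tokenizer.token_types()
    # => [:white, :float, :int, :boolean, :null, :ref, :class_name, :string]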