Delimit.Schema (delimit v0.4.1)

Defines schema structures and functions for working with delimited data.

This module handles schema definitions, data type conversions, and transformations between delimited data and Elixir structs.

Summary

Types

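schema_options()

Options for schema handling.

t()

Schema definition structure.

Functions

add_embed(schema, name, module, opts \\ [])

Adds an embedded schema to the parent schema.

add_field(schema, name, type, opts \\ [])

Adds a field to the schema.

canonical_delimiter()

Default delimiter used by canonical_string/3 and row_hash/3.

canonical_string(schema, struct_or_map, opts \\ [])

Returns a stable string encoding of a struct based on its schema.

collect_all_fields(schema)

Returns a flat list of {display_name, Field.t()} tuples for all leaf fields, including fields from flattened embeds.

field_names(schema)

Gets field names in order of definition.

field_widths(schema)

Returns a list of {Field.t(), start_offset, width} tuples using cumulative offsets.

get_embed_prefix(field, default_prefix \\ nil)

Gets the header prefix for an embedded field.

get_embeds(schema)

Gets all embedded fields defined in the schema.

get_field(schema, name)

Gets a field by name.

headers(schema, prefix \\ nil)

Gets the headers for the schema.

new(module, options \\ [])

Creates a new schema definition.

populate_derived(schema, struct_or_map, row)

row_hash(schema, struct_or_map, opts \\ [])

Returns a binary cryptographic hash of a struct's canonical encoding.

to_row(schema, struct_or_map)

Converts a struct or map to a row of values based on the schema.

to_struct(schema, row, opts \\ [])

Converts a row of data to a struct based on the schema.

to_struct_from_flat_values(schema, values, opts \\ [])

Builds a struct (including embeds) from a flat list of raw string values.

to_struct_with_headers(schema, row, headers, opts \\ [])

Converts a row of data to a struct based on the schema, using headers for field mapping.

type_to_typespec(type)

Converts a field type to an Elixir typespec.

validate_fixed_width!(schema)

Validates that all fields (including flattened embed fields) have a positive integer width: option.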

Types

schema_options()

@type schema_options() :: [
  delimiter: String.t(),
  skip_lines: non_neg_integer(),
  skip_while: (String.t() -> boolean()),
  trim_fields: boolean(),
  nil_on_empty: boolean(),
  line_ending: String.t(),
  format: atom()
]

Options for schema handling.

  • :delimiter - Field delimiter character (default: comma)
  • :skip_lines - Number of lines to skip at the beginning of the file
  • :skip_while - Predicate function (String.t() -> boolean()) that determines which lines to skip
  • :trim_fields - Whether to trim whitespace from fields (default: true)
  • :nil_on_empty - Convert empty strings to nil (default: true)
  • :line_ending - Line ending character(s) for output
  • :format - Predefined format (:csv, :tsv, :psv) that sets appropriate options
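As an illustration, a schema_options() keyword list for a tab-separated file that starts with comment lines might look like the sketch below. Only the option keys come from the type above; the concrete values and the comment-line predicate are assumptions chosen for the example.

```elixir
# Hypothetical option values; only the keys are taken from schema_options().
options = [
  delimiter: "\t",
  skip_while: fn line -> String.starts_with?(line, "#") end,
  trim_fields: true,
  nil_on_empty: true,
  line_ending: "\n"
]

# The predicate has the shape (String.t() -> boolean()).
skip? = Keyword.fetch!(options, :skip_while)
skip?.("# exported file")
skip?.("name\tage")
```

Because the options are a plain keyword list, they can be built once and passed as the second argument to new/2.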

t()

@type t() :: %Delimit.Schema{
  embeds: %{required(atom()) => module()},
  fields: [Delimit.Field.t()],
  module: module(),
  options: schema_options()
}

Schema definition structure.

  • :module - The module associated with the schema
  • :fields - List of field definitions
  • :options - Additional options for the schema
  • :embeds - Map of module references for embedded schemas

Functions

add_embed(schema, name, module, opts \\ [])

@spec add_embed(t(), atom(), module(), Keyword.t()) :: t()

Adds an embedded schema to the parent schema.

Parameters

  • schema - The parent schema to add the embedded schema to
  • name - The name for the embedded schema as an atom
  • module - The module defining the embedded schema
  • opts - Options for the embedded schema

Returns

  • Updated schema structure

add_field(schema, name, type, opts \\ [])

@spec add_field(t(), atom(), atom(), Keyword.t()) :: t()

Adds a field to the schema.

Parameters

  • schema - The schema to add the field to
  • name - The name of the field as an atom
  • type - The type of the field (:string, :integer, etc.)
  • opts - Options for the field

Returns

  • Updated schema structure

canonical_delimiter()

Default delimiter used by canonical_string/3 and row_hash/3.

The delimiter is the ASCII Unit Separator (0x1F). It was chosen because it is highly unlikely to appear in real-world delimited file content, so the canonical encoding remains unambiguous regardless of the file's actual delimiter.

canonical_string(schema, struct_or_map, opts \\ [])

@spec canonical_string(t(), struct() | map(), Keyword.t()) :: String.t()

Returns a stable string encoding of a struct based on its schema.

The encoding is deterministic for a given schema and struct content:

  • Fields appear in schema definition order.
  • Each field's value is encoded as it would be written to a file (using configured format: / formats: / write_fn, etc.).
  • nil values encode as the empty string.
  • Embedded schemas contribute their own canonical encoding recursively (in their declared schema order, no prefix).
  • Derived field types (:row_hash, :raw_row) are skipped — their values come from the parsed source row, not from canonical state.
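The joining step of these rules can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: CanonicalSketch and encode/2 are hypothetical names, and the input is assumed to be a list of already-ordered field values.

```elixir
defmodule CanonicalSketch do
  # ASCII Unit Separator (0x1F), the default delimiter described in this module.
  @unit_separator <<0x1F>>

  # Joins field values in the order given.
  # nil encodes as the empty string, as in the rules above.
  def encode(values, delimiter \\ @unit_separator) do
    Enum.map_join(values, delimiter, fn
      nil -> ""
      value -> to_string(value)
    end)
  end
end

CanonicalSketch.encode(["Alice", 30])
CanonicalSketch.encode(["Alice", nil, 30], "|")

# With a readable delimiter, two different rows can collide:
CanonicalSketch.encode(["a|b", "c"], "|") == CanonicalSketch.encode(["a", "b|c"], "|")
```

The last expression shows why a delimiter that can occur inside field values makes the encoding ambiguous.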

Options

  • :delimiter - The separator placed between encoded field values. Defaults to Delimit.Schema.canonical_delimiter/0 (ASCII Unit Separator). Pass delimiter: "|" for a human-readable form; the encoding becomes ambiguous if any field value contains the chosen delimiter.

Example

iex> %MyApp.Person{first_name: "Alice", age: 30}
...> |> MyApp.Person.canonical_string()
"Alice<US>30"

In the output above, <US> stands for the ASCII Unit Separator character (0x1F).

collect_all_fields(schema)

@spec collect_all_fields(t()) :: [{atom(), Delimit.Field.t()}]

Returns a flat list of {display_name, Field.t()} tuples for all leaf fields, including fields from flattened embeds.

For regular fields, display_name is the field name atom. For embed fields, display_name includes the embed prefix (e.g., :billing_address_street).
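One way such display names could be formed is sketched below. DisplayNameSketch is a hypothetical helper, and the sketch assumes the embed prefix is a string that already ends in an underscore, as in the headers/2 example.

```elixir
defmodule DisplayNameSketch do
  # Regular fields keep their own name.
  def display_name(nil, field_name), do: field_name

  # Embed fields get the prefix (a string such as "billing_address_")
  # prepended, and the result is converted back to an atom.
  def display_name(prefix, field_name) when is_binary(prefix) do
    String.to_atom(prefix <> Atom.to_string(field_name))
  end
end

DisplayNameSketch.display_name(nil, :name)
DisplayNameSketch.display_name("billing_address_", :street)
```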

field_names(schema)

@spec field_names(t()) :: [atom()]

Gets field names in order of definition.

Parameters

  • schema - The schema definition

Returns

  • List of field names as atoms

field_widths(schema)

@spec field_widths(t()) :: [{Delimit.Field.t(), non_neg_integer(), pos_integer()}]

Returns a list of {Field.t(), start_offset, width} tuples using cumulative offsets.

Used by the fixed-width reader to slice lines into field values.
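The idea of cumulative offsets can be sketched as follows. OffsetSketch is a hypothetical module that works on plain {name, width} pairs instead of Field.t() structs, and it assumes widths are counted in bytes and that each line is at least as long as the total width.

```elixir
defmodule OffsetSketch do
  # Turns [{name, width}] into [{name, start_offset, width}], where each
  # start offset is the sum of all preceding widths.
  def offsets(fields) do
    {result, _total_width} =
      Enum.map_reduce(fields, 0, fn {name, width}, offset ->
        {{name, offset, width}, offset + width}
      end)

    result
  end

  # Slices one fixed-width line into raw values using those offsets.
  def slice(line, fields) do
    for {_name, start, width} <- offsets(fields) do
      binary_part(line, start, width)
    end
  end
end

fields = [name: 10, age: 3]
OffsetSketch.offsets(fields)
OffsetSketch.slice("John Doe  042", fields)
```

The sliced values are still raw, padded strings; trimming and type conversion would be separate steps.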

get_embed_prefix(field, default_prefix \\ nil)

@spec get_embed_prefix(Delimit.Field.t(), String.t() | nil) :: String.t()

Gets the header prefix for an embedded field.

Parameters

  • field - The embedded field definition
  • default_prefix - Default prefix to use if none specified

Returns

  • String prefix to use for field headers

get_embeds(schema)

@spec get_embeds(t()) :: [Delimit.Field.t()]

Gets all embedded fields defined in the schema.

Parameters

  • schema - The schema definition

Returns

  • List of embedded field definitions

get_field(schema, name)

@spec get_field(t(), atom()) :: Delimit.Field.t() | nil

Gets a field by name.

Parameters

  • schema - The schema definition
  • name - The field name to find

Returns

  • The field definition or nil if not found

headers(schema, prefix \\ nil)

@spec headers(t(), String.t() | nil) :: [String.t()]

Gets the headers for the schema.

Parameters

  • schema - The schema definition
  • prefix - Optional prefix to apply to all headers

Returns

  • List of header strings

Example

iex> schema = Delimit.Schema.new(MyApp.Person)
iex> schema = Delimit.Schema.add_field(schema, :name, :string)
iex> schema = Delimit.Schema.add_field(schema, :age, :integer)
iex> Delimit.Schema.headers(schema)
["name", "age"]

iex> Delimit.Schema.headers(schema, "person_")
["person_name", "person_age"]

new(module, options \\ [])

@spec new(module(), schema_options()) :: t()

Creates a new schema definition.

Parameters

  • module - The module associated with the schema
  • options - Options for the schema

Returns

  • A new schema structure

populate_derived(schema, struct_or_map, row)

@spec populate_derived(t(), struct() | map(), [String.t()]) :: struct() | map()

row_hash(schema, struct_or_map, opts \\ [])

@spec row_hash(t(), struct() | map(), Keyword.t()) :: binary()

Returns a binary cryptographic hash of a struct's canonical encoding.

Options

  • :algorithm - Hash algorithm passed to :crypto.hash/2. Defaults to :sha256.
  • :truncate - Number of bytes to truncate the digest to. Defaults to 16. Pass nil to disable truncation.

See canonical_string/3 for the encoding rules.
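The hashing and truncation steps can be sketched with :crypto.hash/2 directly. HashSketch is a hypothetical module; in particular, keeping the leading bytes of the digest when truncating is an assumption of this sketch, not something the documentation states.

```elixir
defmodule HashSketch do
  # Hashes an already-encoded string and optionally truncates the digest.
  # Assumption: truncation keeps the first `truncate` bytes.
  def hash(encoded, algorithm \\ :sha256, truncate \\ 16) do
    digest = :crypto.hash(algorithm, encoded)

    case truncate do
      nil -> digest
      n when is_integer(n) and n > 0 -> binary_part(digest, 0, min(n, byte_size(digest)))
    end
  end
end

encoded = "Alice" <> <<0x1F>> <> "30"
short = HashSketch.hash(encoded)
full = HashSketch.hash(encoded, :sha256, nil)
{byte_size(short), byte_size(full)}
```

With the defaults, the result is a 16-byte binary; with truncation disabled, SHA-256 yields the full 32 bytes.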

to_row(schema, struct_or_map)

@spec to_row(t(), struct() | map()) :: [String.t()]

Converts a struct or map to a row of values based on the schema.

Parameters

  • schema - The schema definition
  • struct_or_map - A struct or map containing field values

Returns

  • A list of field values

Examples

iex> schema = Delimit.Schema.new(MyApp.Person)
iex> schema = Delimit.Schema.add_field(schema, :name, :string)
iex> Delimit.Schema.to_row(schema, %{name: "John Doe"})
["John Doe"]

to_struct(schema, row, opts \\ [])

@spec to_struct(t(), [String.t()], Keyword.t()) :: struct()

Converts a row of data to a struct based on the schema.

Parameters

  • schema - The schema definition
  • row - A list of field values or a map of field name/values
  • opts - Additional options for processing

Returns

  • A struct based on the schema with field values

Example

iex> schema = Delimit.Schema.new(MyApp.Person)
iex> schema = Delimit.Schema.add_field(schema, :name, :string)
iex> schema = Delimit.Schema.add_field(schema, :age, :integer)
iex> Delimit.Schema.to_struct(schema, ["John Doe", "42"])
%MyApp.Person{name: "John Doe", age: 42}

to_struct_from_flat_values(schema, values, opts \\ [])

@spec to_struct_from_flat_values(t(), [String.t() | nil], Keyword.t()) :: struct()

Builds a struct (including embeds) from a flat list of raw string values.

This is needed for the fixed-width format, where fields are position-based rather than header-based. Each value is parsed with Field.parse_value/2.

to_struct_with_headers(schema, row, headers, opts \\ [])

@spec to_struct_with_headers(t(), [String.t()], [String.t()], Keyword.t()) :: struct()

Converts a row of data to a struct based on the schema, using headers for field mapping.

Parameters

  • schema - The schema definition
  • row - A list of field values
  • headers - A list of header strings matching the row fields
  • opts - Additional options for processing

Returns

  • A struct based on the schema with field values
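Header-based mapping can be sketched as pairing each header with the value at the same position. HeaderMapSketch is a hypothetical module: it returns a plain map of raw strings rather than a typed struct, and silently ignoring unknown headers is an assumption of the sketch.

```elixir
defmodule HeaderMapSketch do
  # Pairs each header with the value in the same column, then keeps only
  # the columns listed in `wanted`, keyed by field name.
  # `wanted` maps header strings to field atoms, e.g. %{"name" => :name}.
  def map_row(row, headers, wanted) do
    headers
    |> Enum.zip(row)
    |> Enum.flat_map(fn {header, value} ->
      case Map.fetch(wanted, header) do
        {:ok, field} -> [{field, value}]
        :error -> []
      end
    end)
    |> Map.new()
  end
end

wanted = %{"name" => :name, "age" => :age}
mapped = HeaderMapSketch.map_row(["42", "John Doe", "x"], ["age", "name", "extra"], wanted)
```

Because the lookup goes through the headers, the column order in the file does not have to match the field order in the schema.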

type_to_typespec(type)

@spec type_to_typespec(atom() | tuple()) :: Macro.t()

Converts a field type to an Elixir typespec.

The result is a quoted expression suitable for use in generated @type definitions.

Parameters

  • type - The field type or a tuple with more specific type information

Returns

  • An Elixir typespec expression

Example

iex> Delimit.Schema.type_to_typespec(:string)
quote do: String.t()

iex> Delimit.Schema.type_to_typespec({:list, :string})
quote do: [String.t()]

validate_fixed_width!(schema)

@spec validate_fixed_width!(t()) :: :ok

Validates that all fields (including flattened embed fields) have a positive integer width: option.

Raises ArgumentError if any field is missing width: or has a non-positive width.
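The check described above can be sketched on plain keyword lists. WidthCheckSketch is a hypothetical module that takes {name, opts} pairs instead of a schema, and the error message wording is invented for the example.

```elixir
defmodule WidthCheckSketch do
  # Raises ArgumentError for the first field whose width is missing or
  # not a positive integer; returns :ok otherwise.
  def validate!(fields) do
    Enum.each(fields, fn {name, opts} ->
      case Keyword.get(opts, :width) do
        width when is_integer(width) and width > 0 ->
          :ok

        other ->
          raise ArgumentError,
                "field #{inspect(name)} needs a positive integer width, got: #{inspect(other)}"
      end
    end)

    :ok
  end
end

WidthCheckSketch.validate!(name: [width: 10], age: [width: 3])
```

Running such a check once, before any line is read, surfaces a misconfigured field as a single clear error instead of a slicing failure partway through a file.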