ExEtlFramework
ExEtlFramework is a powerful and flexible ETL (Extract, Transform, Load) framework built in Elixir. It simplifies the process of creating robust data processing pipelines with built-in support for validation, error handling, and performance monitoring.
Features
- Modular Pipeline Structure: Easy-to-define ETL steps using a simple DSL
- Flexible Data Validation: Schema-based validation with built-in and custom validators
- Error Handling Strategies: Choose between fail-fast or error collection approaches
- Telemetry Integration: Built-in performance measurement and reporting
- Extensible: Easy to add custom steps and validators
Installation
Add ex_etl_framework to your list of dependencies in mix.exs:
def deps do
  [
    {:ex_etl_framework, "~> 0.1.0"}
  ]
end
Usage
Defining a Pipeline
Create a module for your pipeline and use the ExEtlFramework.Pipeline macro:
defmodule MyPipeline do
  use ExEtlFramework.Pipeline

  step :extract do
    # Extraction logic
    {:ok, %{data: [1, 2, 3]}}
  end

  step :transform do
    # Transformation logic
    {:ok, %{data: [2, 4, 6]}}
  end

  step :load do
    # Loading logic
    {:ok, %{result: "Data loaded successfully"}}
  end

  # Optional: Define validation for each step
  def validate_extract(data) do
    schema = %{
      # type/1 is assumed to return a one-argument validator function.
      data: [&ExEtlFramework.Validator.required/1, ExEtlFramework.Validator.type(List)]
    }

    ExEtlFramework.Validator.validate(data, schema)
  end
end
Running a Pipeline
Run the pipeline with an initial data map, optionally passing an error handling strategy:
result = MyPipeline.run(%{initial: "data"}, error_strategy: :collect_errors)
Key Components
Pipeline
The core module for defining and executing ETL steps. It provides:
- A DSL for defining pipeline steps
- Automatic error handling
- Integration with the validation system
Validator
A flexible data validation system:
- Define validation schemas with built-in and custom validators
- Easy to use in pipeline steps
- Supports complex data structures
Example of a validation schema:
# type/1 is assumed to return a one-argument validator function.
schema = %{
  name: [&ExEtlFramework.Validator.required/1, ExEtlFramework.Validator.type(String)],
  age: [ExEtlFramework.Validator.type(Integer)],
  email: [&ExEtlFramework.Validator.required/1, &custom_email_validator/1]
}
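To make the schema shape concrete, here is a minimal sketch of how a map of fields to validator lists could be applied to a record. It is not ExEtlFramework's implementation: the SchemaSketch module, its validate/2 function, and the {:ok, data} / {:error, errors} return shape are assumptions made for illustration. Only the validator contract (return :ok or {:error, message}) is taken from the examples in this README.

```elixir
defmodule SchemaSketch do
  # Hypothetical helper, not part of ExEtlFramework.
  # A schema maps each field to a list of one-argument validator functions;
  # each validator returns :ok or {:error, message}.
  def validate(data, schema) do
    errors =
      for {field, validators} <- schema,
          validator <- validators,
          # Only {:error, message} results match; :ok results are skipped.
          {:error, message} <- [validator.(Map.get(data, field))] do
        {field, message}
      end

    case errors do
      [] -> {:ok, data}
      _ -> {:error, errors}
    end
  end
end

# Two illustrative validators written as anonymous functions.
required = fn
  nil -> {:error, "is required"}
  _ -> :ok
end

integer = fn
  value when is_integer(value) -> :ok
  _ -> {:error, "must be an integer"}
end

schema = %{name: [required], age: [integer]}

valid_result = SchemaSketch.validate(%{name: "Ada", age: 36}, schema)
invalid_result = SchemaSketch.validate(%{age: "36"}, schema)
```

In this sketch every validator runs for every field, so a single call reports all failures for a record rather than only the first one.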
Telemetry Integration
Built-in performance monitoring using Telemetry:
- Automatically measures duration of pipeline runs and individual steps
- Tracks errors in pipelines
- Easy to integrate with your preferred monitoring solution
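As a rough sketch of what hooking into these measurements might look like, the module below defines a standard four-argument Telemetry handler and attaches it with :telemetry.attach/4. The event name [:ex_etl_framework, :pipeline, :stop], the :duration measurement key, and the PipelineMetricsSketch module are all assumptions, not names taken from the library; check the library's documentation for the events it actually emits.

```elixir
defmodule PipelineMetricsSketch do
  # Hypothetical handler module. The event name is an assumption and is
  # not taken from ExEtlFramework's documentation.
  @event [:ex_etl_framework, :pipeline, :stop]

  # Attaches the handler; requires the :telemetry package to be available.
  def attach do
    :telemetry.attach("pipeline-metrics-sketch", @event, &__MODULE__.handle_event/4, nil)
  end

  # Standard four-argument Telemetry handler signature.
  def handle_event(event, measurements, metadata, _config) do
    IO.puts(format(event, measurements, metadata))
  end

  # Pure formatting helper, so the message can be checked without Telemetry.
  # The :duration key is assumed; :unknown is printed when it is absent.
  def format(event, measurements, metadata) do
    name = Enum.join(event, ".")
    duration = Map.get(measurements, :duration, :unknown)
    "#{name} duration=#{inspect(duration)} metadata=#{inspect(metadata)}"
  end
end
```

Calling PipelineMetricsSketch.attach() once at application start would be enough; from there the handler could forward values to a metrics library instead of printing them.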
Advanced Usage
Custom Validators
Create custom validation functions:
def custom_email_validator(value) do
  if String.contains?(value, "@") do
    :ok
  else
    {:error, "Invalid email format"}
  end
end
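The validator above can be made a little more defensive. The following sketch follows the same contract (return :ok or {:error, message}) and adds two ideas: a guard so that non-string input yields an error instead of raising, and a parameterised validator that returns a function. The MyValidators module and its function names are hypothetical, and whether a schema accepts the result of in_range/2 directly is an assumption about how the library calls validators.

```elixir
defmodule MyValidators do
  # Hypothetical module. Each validator returns :ok or {:error, message}.

  # Guarding on is_binary/1 avoids a FunctionClauseError from
  # String.contains?/2 when the value is nil or not a string.
  def email(value) when is_binary(value) do
    if String.contains?(value, "@"), do: :ok, else: {:error, "Invalid email format"}
  end

  def email(_value), do: {:error, "Email must be a string"}

  # A parameterised validator: returns a one-argument function that
  # checks whether a number lies within the inclusive range min..max.
  def in_range(min, max) do
    fn
      value when is_number(value) and value >= min and value <= max -> :ok
      _ -> {:error, "Must be between #{min} and #{max}"}
    end
  end
end
```

A schema entry might then read `age: [MyValidators.in_range(0, 150)]` or `email: [&MyValidators.email/1]`.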
Error Handling Strategies
Choose between two error handling strategies:
- :fail_fast: Stops the pipeline at the first error
- :collect_errors: Continues processing and collects all errors
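The difference between the two strategies can be illustrated in plain Elixir. This is a sketch of the general technique, not ExEtlFramework's implementation: the StrategySketch module, the {name, fun} step representation, and the return shapes are all assumptions chosen for the example.

```elixir
defmodule StrategySketch do
  # Hypothetical illustration; not ExEtlFramework's implementation.
  # Each step is a {name, fun} pair where fun takes the current data and
  # returns {:ok, new_data} or {:error, reason}.

  # :fail_fast halts at the first failing step.
  def run(steps, data, :fail_fast) do
    Enum.reduce_while(steps, {:ok, data}, fn {name, fun}, {:ok, acc} ->
      case fun.(acc) do
        {:ok, next} -> {:cont, {:ok, next}}
        {:error, reason} -> {:halt, {:error, [{name, reason}]}}
      end
    end)
  end

  # :collect_errors keeps going with the last good data and records each failure.
  def run(steps, data, :collect_errors) do
    {final, errors} =
      Enum.reduce(steps, {data, []}, fn {name, fun}, {acc, errors} ->
        case fun.(acc) do
          {:ok, next} -> {next, errors}
          {:error, reason} -> {acc, [{name, reason} | errors]}
        end
      end)

    case errors do
      [] -> {:ok, final}
      _ -> {:error, Enum.reverse(errors), final}
    end
  end
end

steps = [
  extract: fn _ -> {:ok, [1, 2, 3]} end,
  transform: fn _ -> {:error, "bad row"} end,
  load: fn _ -> {:error, "db down"} end
]

fail_fast_result = StrategySketch.run(steps, %{}, :fail_fast)
collected_result = StrategySketch.run(steps, %{}, :collect_errors)
```

With fail-fast, only the transform failure is reported and load never runs. With error collection, both failures are reported along with the last successfully produced data, which tends to suit batch jobs where a full error report is more useful than an early stop.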
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License.