DuckDB can scan local files, remote object stores, and lakehouse table formats directly. QuackDB exposes small SQL builders around those features so paths, options, extension names, and credentials are formatted consistently.
Source helpers
QuackDB.Source builds DuckDB table-function fragments:
alias QuackDB.Source
Source.csv("events.csv", header: true, columns: %{id: "INTEGER", name: "VARCHAR"})
#=> "read_csv('events.csv', header = TRUE, columns = {'id': 'INTEGER', 'name': 'VARCHAR'})"Available helpers:
Source.parquet/2→read_parquet(...)Source.csv/2→read_csv(...)Source.json/2→read_json(...)Source.xlsx/2→read_xlsx(...)Source.delta/2→delta_scan(...)Source.iceberg/2→iceberg_scan(...)
Use Source.table_function/3 for a DuckDB table function that QuackDB does not wrap yet.
Ecto queries
Source fragments can be used as Ecto sources for analytical reads:
use QuackDB.Ecto
alias QuackDB.Source
source = Source.parquet("s3://bucket/events/*.parquet", hive_partitioning: true)
MyApp.AnalyticsRepo.all(
from event in source,
group_by: event.category,
select: %{category: event.category, events: count()}
)Materialize sources for FTS or repeated queries
DuckDB features such as FTS indexes operate on tables, so source scans are often materialized first:
use QuackDB.Ecto
alias QuackDB.{DDL, FTS, Source}
source = Source.parquet("s3://bucket/documents/*.parquet")
query =
from doc in source,
select: %{id: doc.id, title: doc.title, body: doc.body}
QuackDB.query!(conn, DDL.create_table("docs", as: query, temporary: true))
QuackDB.query!(conn, FTS.create_index("docs", :id, [:title, :body], overwrite: true))See the full-text search guide for the full BM25 search flow.
Direct SQL queries
The same source fragments can be composed into direct QuackDB SQL when that is clearer:
alias QuackDB.Source
source = Source.csv("events.csv", header: true)
QuackDB.query!(conn, [
"SELECT category, count(*) AS events FROM ",
source,
" GROUP BY category ORDER BY category"
])Local files
A local path works when it is visible to the DuckDB server process:
Source.parquet("/data/events/*.parquet")
Source.csv("file:///data/events.csv", header: true)When QuackDB is connected to a remote DuckDB server, paths are resolved on the server side, not in the Elixir VM. QuackDB does not automatically upload local files; use paths that the DuckDB server can see, or put data behind HTTP(S), object storage, Azure/ADLS, Hugging Face, or another DuckDB-supported source.
HTTP and HTTPS
DuckDB reads HTTP(S) files through the httpfs extension:
alias QuackDB.{Extension, Source}
QuackDB.query!(conn, Extension.install(:httpfs))
QuackDB.query!(conn, Extension.load(:httpfs))
source = Source.parquet("https://example.com/events.parquet")HTTP credentials are configured with DuckDB secrets:
alias QuackDB.Secret
QuackDB.query!(conn, Secret.create(:http, name: :api, bearer_token: token))
QuackDB.query!(
conn,
Secret.create(:http,
name: :headers,
extra_http_headers: %{"Authorization" => "Bearer #{token}"}
)
)S3-compatible stores
DuckDB's httpfs extension supports S3 and S3-compatible storage such as MinIO, lakeFS, Cloudflare R2, Tigris, and Google Cloud Storage interoperability endpoints.
alias QuackDB.{Extension, Secret, Source}
QuackDB.query!(conn, Extension.install(:httpfs))
QuackDB.query!(conn, Extension.load(:httpfs))
QuackDB.query!(conn, Secret.create(:s3, provider: :credential_chain))
source = Source.parquet("s3://bucket/events/*.parquet", hive_partitioning: true)Explicit S3 credentials:
Secret.create(:s3,
key_id: System.fetch_env!("AWS_ACCESS_KEY_ID"),
secret: System.fetch_env!("AWS_SECRET_ACCESS_KEY"),
region: "us-east-1",
scope: "s3://bucket/prefix/"
)Cloudflare R2:
Secret.create(:r2,
account_id: System.fetch_env!("CLOUDFLARE_ACCOUNT_ID"),
key_id: System.fetch_env!("R2_ACCESS_KEY_ID"),
secret: System.fetch_env!("R2_SECRET_ACCESS_KEY")
)
Source.parquet("r2://bucket/events/*.parquet")Google Cloud Storage via DuckDB's GCS secret type:
Secret.create(:gcs,
key_id: System.fetch_env!("GCS_HMAC_KEY_ID"),
secret: System.fetch_env!("GCS_HMAC_SECRET")
)
Source.parquet("gcs://bucket/events/*.parquet")Azure Blob Storage and ADLS
DuckDB reads Azure storage through the azure extension:
alias QuackDB.{Extension, Secret, Source}
QuackDB.query!(conn, Extension.install(:azure))
QuackDB.query!(conn, Extension.load(:azure))
QuackDB.query!(
conn,
Secret.create(:azure, provider: :credential_chain, account_name: "storage_account")
)
Source.parquet("az://container/events/*.parquet")
Source.parquet("abfss://filesystem/events/*.parquet")Hugging Face datasets
DuckDB can scan Hugging Face dataset files with hf:// URLs through httpfs:
alias QuackDB.{Extension, Secret, Source}
QuackDB.query!(conn, Extension.install(:httpfs))
QuackDB.query!(conn, Extension.load(:httpfs))
source =
Source.parquet(
"hf://datasets/datasets-examples/doc-formats-parquet-1/data/train-00000-of-00001.parquet"
)Private or gated datasets can use a Hugging Face secret:
QuackDB.query!(conn, Secret.create(:huggingface, name: :hf_token, token: token))Lakehouse formats
DuckDB can scan Delta and Iceberg tables when the relevant extensions and storage credentials are configured:
alias QuackDB.Source
Source.delta("s3://bucket/delta/events")
Source.iceberg("s3://bucket/warehouse/events")Use Ecto or direct SQL around those fragments just like CSV and Parquet sources.