ChromicPDF (ChromicPDF v1.1.1)
ChromicPDF is a fast HTML-to-PDF/A renderer based on Chrome & Ghostscript.
Usage
Start
Start ChromicPDF as part of your supervision tree:
defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    children = [
      # other apps...
      {ChromicPDF, chromic_pdf_opts()}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end

  defp chromic_pdf_opts do
    []
  end
end
Print a PDF or PDF/A
ChromicPDF.print_to_pdf({:url, "file:///example.html"}, output: "output.pdf")
PDF printing comes with a ton of options. Please see ChromicPDF.print_to_pdf/2 and ChromicPDF.convert_to_pdfa/2 for details.
Security Considerations
Before adding a browser to your application's (perhaps already long) list of dependencies, you may want to consider the security hints below.
Escape user-supplied data
Make sure to escape any user-provided data with something like Phoenix.HTML.html_escape.
Chrome is designed to make displaying HTML pages relatively safe, in terms of preventing
undesired access of a page to the host operating system. However, the attack surface of your
application is still increased. Running this in a containerized application with a small RPC
interface creates an additional barrier (and has other benefits).
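As a rough illustration of the escaping step, the sketch below interpolates a user-supplied value into an HTML snippet. The `EscapeSketch` module and its hand-rolled `escape/1` are hypothetical stand-ins so that the example runs without dependencies; in a real application, Phoenix.HTML.html_escape (or your template engine's automatic escaping) is the better choice.

```elixir
defmodule EscapeSketch do
  # Hypothetical, dependency-free stand-in for Phoenix.HTML.html_escape.
  # Replaces the characters that are significant in HTML with entities.
  # "&" must be replaced first so that later entities are not re-escaped.
  def escape(text) when is_binary(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
    |> String.replace("'", "&#39;")
  end

  # Builds markup that could be handed to ChromicPDF.print_to_pdf/2
  # as {:html, html}.
  def greeting_html(user_name) do
    "<h1>Hello, " <> escape(user_name) <> "!</h1>"
  end
end
```

With this in place, a name such as `<script>alert(1)</script>` ends up as inert text in the printed document instead of being interpreted by Chrome.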
Running in offline mode
For some apparent security bonus, browser targets can be spawned in "offline mode" (using the DevTools command Network.emulateNetworkConditions). Chrome targets with network conditions set to offline can't resolve any external URLs (e.g. https://), neither entered as navigation URL nor contained within the HTML body.
defp chromic_pdf_opts do
  [offline: true]
end
Chrome Sandbox in Docker containers
By default, ChromicPDF will allow Chrome to make use of its own "sandbox" process jail. The sandbox tries to limit system resource access of the renderer processes to the minimum resources they require to perform their task.
However, in Docker containers running Linux images (e.g. images based on Alpine), and which are configured to run their main job as a non-root user, this causes Chrome to crash on startup as it requires root privileges to enter the sandbox.
The error output (discard_stderr: false option) looks as follows:
Failed to move to new namespace: PID namespaces supported, Network namespace supported,
but failed: errno = Operation not permitted
The best way to resolve this issue is to configure your Docker container to use seccomp rules that grant Chrome access to the relevant system calls. See the excellent Zenika/alpine-chrome repository for details on how to make this work.
Alternatively, you may choose to disable Chrome's sandbox with the no_sandbox option.
defp chromic_pdf_opts do
  [no_sandbox: true]
end
SSL connections
If you are fetching your print source from a https:// URL, Chrome verifies the remote host's SSL certificate as usual when establishing the secure connection, and aborts navigation if the certificate has expired or is not signed by a known certificate authority (i.e. no self-signed certificates).
For production systems, this security check is essential and should not be circumvented. However, if for some reason you need to bypass certificate verification in development or test, you can do this with the :ignore_certificate_errors option.
defp chromic_pdf_opts do
  [ignore_certificate_errors: true]
end
Worker pools
ChromicPDF spawns two worker pools, the session pool and the ghostscript pool. By default, it will create as many sessions (browser tabs) as schedulers are online, and allow the same number of concurrent Ghostscript processes to run.
Concurrency
To increase or limit the number of concurrent workers, you can pass pool configuration to the supervisor. Please note that these are non-queueing worker pools. If you intend to max them out, you will need a job queue as well.
defp chromic_pdf_opts do
  [
    session_pool: [size: 3],
    ghostscript_pool: [size: 10]
  ]
end
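Because the pools do not queue, callers have to bound their own concurrency. One simple way, sketched below under the assumption that a batch of documents is printed from a single process, is `Task.async_stream/3` with `max_concurrency` set to the session pool size. `BatchSketch` and its `print_fun` argument are hypothetical names; in practice `print_fun` would wrap a call to ChromicPDF.print_to_pdf/2. For work arriving from many processes at once, a proper job queue is still needed.

```elixir
defmodule BatchSketch do
  # Runs print_fun for every source, with at most pool_size calls in
  # flight at any time. Results are returned in input order.
  # The 10_000 ms per-task timeout is an arbitrary example value.
  def print_all(sources, pool_size, print_fun) do
    sources
    |> Task.async_stream(print_fun, max_concurrency: pool_size, timeout: 10_000)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

# Usage (assumes ChromicPDF was started with session_pool: [size: 3]):
#
#   BatchSketch.print_all(urls, 3, fn url ->
#     ChromicPDF.print_to_pdf({:url, url})
#   end)
```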
Operation timeouts
By default, ChromicPDF allows the print process to take 5 seconds to finish. In case you are printing large PDFs and run into timeouts, these can be configured by passing the timeout option to the session pool.
defp chromic_pdf_opts do
  [
    session_pool: [timeout: 10_000] # in milliseconds
  ]
end
Automatic session restarts to avoid memory drain
By default, ChromicPDF will restart sessions within the Chrome process after 1000 operations. This helps to prevent infinite growth in Chrome's memory consumption. The "max age" of a session can be configured with the :max_session_uses option.
defp chromic_pdf_opts do
  [max_session_uses: 1000]
end
Chrome zombies
Help, a Chrome army tries to take over my memory!
ChromicPDF tries its best to gracefully close the external Chrome process when its supervisor is terminated. Unfortunately, when the BEAM is not shut down gracefully, Chrome processes will keep running. While in a containerized production environment this is unlikely to be of concern, in development it can lead to unpleasant performance degradation of your operating system.
In particular, the BEAM is not shut down properly…
- when you exit your application or iex console with the Ctrl+C abort mechanism (see issue #56),
- and when you run your tests. No, after an ExUnit run your application's supervisor is not terminated cleanly.
There are a few ways to mitigate this issue.
"On Demand" mode
In case you habitually end your development server with Ctrl+C, you should consider enabling "On Demand" mode which disables the session pool, and instead starts and stops Chrome instances as needed. If multiple PDF operations are requested simultaneously, multiple Chrome processes will be launched (each with a pool size of 1, disregarding the pool configuration).
defp chromic_pdf_opts do
  [on_demand: true]
end
To enable it only for development, you can load the option from the application environment.
# config/config.exs
config :my_app, ChromicPDF, on_demand: false

# config/dev.exs
config :my_app, ChromicPDF, on_demand: true

# application.ex
@chromic_pdf_opts Application.compile_env!(:my_app, ChromicPDF)

defp chromic_pdf_opts do
  @chromic_pdf_opts ++ [... other opts ...]
end
Terminating your supervisor after your test suite
You can enable "On Demand" mode for your tests, as well. However, please be aware that each test that prints a PDF will have an increased runtime (plus about 0.5s) due to the added Chrome boot time cost. Luckily, ExUnit provides a method to run code at the end of your test suite.
# test/test_helper.exs
ExUnit.after_suite(fn _ -> Supervisor.stop(MyApp.Supervisor) end)
ExUnit.start()
Only start ChromicPDF in production
The easiest way to prevent Chrome from spawning in development is to only run ChromicPDF in the prod environment. However, obviously you won't be able to print PDFs in development or test then.
Chrome Options
Custom command line switches
The :chrome_args option allows you to pass arbitrary options to the Chrome/Chromium executable.
defp chromic_pdf_opts do
  [chrome_args: "--font-render-hinting=none"]
end
The :chrome_executable option allows you to specify a custom Chrome/Chromium executable.
defp chromic_pdf_opts do
  [chrome_executable: "/usr/bin/google-chrome-beta"]
end
Debugging Chrome errors
Chrome's stderr logging is silently discarded so as not to clutter your logfiles. In case you would like to take a peek, add the discard_stderr: false option.
defp chromic_pdf_opts do
  [discard_stderr: false]
end
Telemetry support
To provide insights into PDF and PDF/A generation performance, ChromicPDF executes the following telemetry events:
[:chromic_pdf, :print_to_pdf, :start | :stop | :exception]
[:chromic_pdf, :capture_screenshot, :start | :stop | :exception]
[:chromic_pdf, :convert_to_pdfa, :start | :stop | :exception]
Please see :telemetry.span/3 for details on their payloads, and :telemetry.attach/4 for how to attach to them.
Each of the corresponding functions accepts a telemetry_metadata option which is passed to the attached event handler. This can, for instance, be used to mark events with custom tags such as the type of the print document.
ChromicPDF.print_to_pdf(..., telemetry_metadata: %{template: "invoice"})
The print_to_pdfa function emits both the print_to_pdf and convert_to_pdfa event series, in that order.
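A handler for the :stop events might look roughly like the sketch below. It assumes the :telemetry package is available and uses :telemetry.attach_many/4, the multi-event sibling of :telemetry.attach/4. The `TelemetrySketch` module name and the handler id are hypothetical, and the handler makes no assumption about the exact shape of the metadata map; it only inspects it.

```elixir
defmodule TelemetrySketch do
  require Logger

  # :stop events emitted via :telemetry.span/3 carry a :duration
  # measurement in native time units.
  @events [
    [:chromic_pdf, :print_to_pdf, :stop],
    [:chromic_pdf, :convert_to_pdfa, :stop]
  ]

  # Call once, e.g. at application start. Requires the :telemetry package.
  def attach do
    :telemetry.attach_many("chromic-pdf-logger", @events, &__MODULE__.handle_event/4, nil)
  end

  # Handler with the four arguments :telemetry passes to every handler.
  # The return value is ignored by :telemetry; it is returned here only
  # to make the function easy to try out by hand.
  def handle_event([:chromic_pdf, operation, :stop], %{duration: duration}, metadata, _config) do
    ms = System.convert_time_unit(duration, :native, :millisecond)
    Logger.info("#{operation} took #{ms}ms, metadata: #{inspect(metadata)}")
    {operation, ms}
  end
end
```

Passing the handler as a remote capture (`&__MODULE__.handle_event/4`) rather than an anonymous function follows the recommendation in the :telemetry documentation.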
How it works
PDF Printing
- ChromicPDF spawns an instance of Chromium/Chrome (an OS process) and connects to its "DevTools" channel via file descriptors.
- The Chrome process is supervised and the connected processes will automatically recover if it crashes.
- A number of "targets" in Chrome are spawned, 1 per worker process in the SessionPool. By default, ChromicPDF will spawn each session in a new browser context (i.e., a profile).
- When a PDF print is requested, a session will instruct its assigned "target" to navigate to the given URL, then wait until it receives a "frameStoppedLoading" event, and proceed to call the printToPDF function.
- The printed PDF will be sent to the session as Base64 encoded chunks.
PDF/A Conversion
- To convert a PDF to a PDF/A-3, ChromicPDF uses the Ghostscript utility.
- Since it is required to embed a color scheme into PDF/A files, ChromicPDF ships with a copy of the royalty-free eciRGB_V2 scheme by the European Color Initiative. If you need to be able to use a different color scheme, please open an issue.
Summary
Functions
Captures a screenshot.
Returns a specification to start this module as part of a supervision tree.
Converts a PDF to PDF/A (either PDF/A-2b or PDF/A-3b).
Prints a PDF.
Prints a PDF and converts it to PDF/A in a single call.
Starts ChromicPDF.
Types
Specs
blob() :: iodata()
Specs
capture_screenshot_option() :: {:capture_screenshot, map()} | navigate_option() | output_option() | telemetry_metadata_option()
Specs
evaluate_option() :: {:evaluate, %{expression: binary()}}
Specs
ghostscript_pool_option() :: {:size, non_neg_integer()}
Specs
global_option() :: {:offline, boolean()} | {:max_session_uses, non_neg_integer()} | {:session_pool, [session_pool_option()]} | {:no_sandbox, boolean()} | {:discard_stderr, boolean()} | {:chrome_args, binary()} | {:chrome_executable, binary()} | {:ignore_certificate_errors, boolean()} | {:ghostscript_pool, [ghostscript_pool_option()]} | {:on_demand, boolean()}
Specs
output_function() :: (blob() -> output_function_result())
Specs
output_function_result() :: any()
Specs
output_option() :: {:output, binary()} | {:output, output_function()}
Specs
path() :: binary()
Specs
pdf_option() :: {:print_to_pdf, map()} | navigate_option() | output_option() | telemetry_metadata_option()
Specs
pdfa_option() :: {:pdfa_version, binary()} | {:pdfa_def_ext, binary()} | info_option() | output_option() | telemetry_metadata_option()
Specs
return() :: :ok | {:ok, binary()} | {:ok, output_function_result()}
Specs
session_pool_option() :: {:size, non_neg_integer()} | {:timeout, timeout()}
Specs
source_and_options() :: %{source: source(), opts: [pdf_option()]}
Specs
telemetry_metadata_option() :: {:telemetry_metadata, map()}
Specs
url() :: binary()
Functions
Specs
capture_screenshot(url :: source(), opts :: [capture_screenshot_option()]) :: return()
Captures a screenshot.
This call blocks until the screenshot has been created.
Print and return Base64-encoded PNG
{:ok, blob} = ChromicPDF.capture_screenshot({:url, "file:///example.html"})
Options
Options to the Page.captureScreenshot call can be given by passing a map to the :capture_screenshot option.
ChromicPDF.capture_screenshot(
  {:url, "file:///example.html"},
  capture_screenshot: %{
    format: "jpeg"
  }
)
For navigational options (source, cookies, evaluating scripts) see print_to_pdf/2.
Specs
child_spec([global_option()]) :: Supervisor.child_spec()
Returns a specification to start this module as part of a supervision tree.
Specs
convert_to_pdfa(pdf_path :: path(), opts :: [pdfa_option()]) :: return()
Converts a PDF to PDF/A (either PDF/A-2b or PDF/A-3b).
Convert an input PDF and return a Base64-encoded blob
{:ok, blob} = ChromicPDF.convert_to_pdfa("some_pdf_file.pdf")
Convert and write to file
ChromicPDF.convert_to_pdfa("some_pdf_file.pdf", output: "output.pdf")
PDF/A versions & levels
Ghostscript supports both PDF/A-2 and PDF/A-3 versions, both in their b (basic) level. By default, ChromicPDF generates PDF/A-3b files. Set the pdfa_version option for version 2.
ChromicPDF.convert_to_pdfa("some_pdf_file.pdf", pdfa_version: "2")
Specifying PDF metadata
The converter is able to transfer PDF metadata (the Info dictionary) from the original PDF file to the output file. However, files printed by Chrome do not contain any metadata information (except "Creator" being "Chrome").
The :info option of the PDF/A converter allows you to specify metadata for the output file directly.
ChromicPDF.convert_to_pdfa("some_pdf_file.pdf", info: %{creator: "ChromicPDF"})
The converter understands the following keys, all of which accept only String values:
:title
:author
:subject
:keywords
:creator
:creation_date
:mod_date
By specification, date values in :creation_date and :mod_date do not need to follow a specific syntax. However, Ghostscript inserts date strings like "D:20200208153049+00'00'" and Info extractor tools might rely on this or another specific format. The converter will automatically format given DateTime values like this.
Both :creation_date and :mod_date are filled with the current date automatically (by Ghostscript), if the original file did not contain any.
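To make the date format concrete, the following sketch produces such a string from a UTC DateTime. It is only an illustration of the format and not ChromicPDF's own implementation; `PdfDateSketch` is a hypothetical name, and only UTC input is handled. Since the converter formats DateTime values itself, passing a DateTime in the :info map should normally be all that is needed.

```elixir
defmodule PdfDateSketch do
  # Formats a UTC DateTime as a PDF date string of the form
  # D:YYYYMMDDHHmmSS+00'00'. Non-UTC input is rejected by the pattern
  # match, as offset handling is left out of this sketch.
  def format(%DateTime{utc_offset: 0, std_offset: 0} = datetime) do
    Calendar.strftime(datetime, "D:%Y%m%d%H%M%S") <> "+00'00'"
  end
end
```

For example, `PdfDateSketch.format(~U[2020-02-08 15:30:49Z])` yields the string quoted above.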
Adding more PostScript to the conversion
The pdfa_def_ext option can be used to feed more PostScript code into the final conversion step.
ChromicPDF.convert_to_pdfa(
  "some_pdf_file.pdf",
  pdfa_def_ext: "[/Title (OverriddenTitle) /DOCINFO pdfmark"
)
Specs
print_to_pdf(input :: source() | source_and_options(), opts :: [pdf_option()]) :: return()
Prints a PDF.
This call blocks until the PDF has been created.
Output options
Print and return Base64-encoded PDF
{:ok, blob} = ChromicPDF.print_to_pdf({:url, "file:///example.html"})
# Can be displayed in iframes
"data:application/pdf;base64,#{blob}"
Print to file
:ok = ChromicPDF.print_to_pdf({:url, "file:///example.html"}, output: "output.pdf")
Print to temporary file
{:ok, :some_result} =
  ChromicPDF.print_to_pdf({:url, "file:///example.html"}, output: fn path ->
    send_download(path)
    :some_result
  end)
The temporary file passed to the callback will be deleted when the callback returns.
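When no :output option is given, the returned blob is Base64-encoded, so it has to be decoded before it can be stored as a regular PDF file. A minimal sketch of that step follows; `BlobSketch` is a hypothetical helper, and the blob is first flattened because the blob() type is declared as iodata.

```elixir
defmodule BlobSketch do
  # Decodes the Base64 blob returned by print_to_pdf/2 and writes the
  # raw PDF bytes to the given path. Raises if the blob is not valid
  # Base64 or the file cannot be written.
  def write_decoded!(blob, path) do
    File.write!(path, Base.decode64!(IO.iodata_to_binary(blob)))
  end
end

# Usage:
#
#   {:ok, blob} = ChromicPDF.print_to_pdf({:url, "file:///example.html"})
#   BlobSketch.write_decoded!(blob, "output.pdf")
```

If the file is all you need, passing output: "output.pdf" directly avoids the round trip through Base64 in your own code.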
Input options
ChromicPDF offers two primary methods of supplying Chrome with the HTML source to print. You can choose between passing in a URL for Chrome to load and injecting the HTML markup directly into the DOM through the remote debugging API.
Print from URL
Passing in a URL is the simplest way of printing a PDF. A target in Chrome is told to navigate to the given URL. When navigation is finished, the PDF is printed.
ChromicPDF.print_to_pdf({:url, "file:///example.html"})
ChromicPDF.print_to_pdf({:url, "http://example.net"})
ChromicPDF.print_to_pdf({:url, "https://example.net"})
Cookies
If your URL requires authentication, you can pass in a session cookie. The cookie is automatically cleared after the PDF has been printed.
cookie = %{
  name: "foo",
  value: "bar",
  domain: "localhost"
}
ChromicPDF.print_to_pdf({:url, "http://example.net"}, set_cookie: cookie)
See Network.setCookie for options. The name and value keys are required.
Print from in-memory HTML
Alternatively, print_to_pdf/2 allows you to pass an in-memory HTML blob to Chrome in a {:html, blob()} tuple. The HTML is sent to the target using the Page.setDocumentContent function. Oftentimes this method is preferable over printing a URL if you intend to render PDFs from templates rendered within the application that also hosts ChromicPDF, without the need to route the content through an actual HTTP endpoint. Also, this way of passing the HTML source has slightly better performance than printing a URL.
ChromicPDF.print_to_pdf(
  {:html, "<h1>Hello World!</h1>"}
)
In-memory content can be iodata
In-memory HTML for both the main input parameter as well as the header and footer options can be passed as iodata. Such lists are converted to String before submission to the session process by passing them through :erlang.iolist_to_binary/1.
ChromicPDF.print_to_pdf(
  {:html, ["<style>p { color: green; }</style>", "<p>green paragraph</p>"]}
)
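The practical benefit of iodata is that a document can be assembled from nested lists without intermediate string concatenation. The sketch below builds such a structure and flattens it with the same :erlang.iolist_to_binary/1 call mentioned above; `IodataSketch` and its functions are hypothetical names used only for illustration.

```elixir
defmodule IodataSketch do
  # Builds a document as nested iodata: a style block followed by one
  # <p> element per paragraph. No strings are concatenated here.
  def document(paragraphs) do
    [
      "<style>p { color: green; }</style>",
      Enum.map(paragraphs, fn text -> ["<p>", text, "</p>"] end)
    ]
  end

  # Flattens iodata into a single binary.
  def flatten(iodata), do: :erlang.iolist_to_binary(iodata)
end

# Usage:
#
#   ChromicPDF.print_to_pdf({:html, IodataSketch.document(["first", "second"])})
```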
Caveats
Please mind the following caveats.
References to external files in HTML source
Please note that since the document content is replaced without navigating to a URL, Chrome has no way of telling which host to prepend to relative URLs contained in the source. This means, if your HTML contains markup like
<!-- BAD: relative link to stylesheet in <head> element -->
<head>
<link rel="stylesheet" href="selfhtml.css">
</head>
<!-- BAD: relative link to image -->
<img src="some_logo.png">
... you will need to replace these lines with either absolute URLs or inline data.
Of course, absolute URLs can use the file:// scheme to point to files on the local filesystem, assuming Chrome has access to them. For the purpose of displaying small inline images (e.g. logos), data URLs are a good way of embedding them without the need for an absolute URL.
<!-- GOOD: inline styles -->
<style>
/* ... */
</style>
<!-- GOOD: data URLs -->
<img src="...">
<!-- GOOD: absolute URLs -->
<img src="http://localhost/path/to/image.png">
<img src="file:///path/to/image.png">
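A data URL for a small image can be produced with nothing but the standard library. The following is a minimal sketch; `DataUrlSketch`, its function names, and the default MIME type are assumptions made for the example.

```elixir
defmodule DataUrlSketch do
  # Builds a data URL from raw bytes and a MIME type.
  def data_url(bytes, mime_type) when is_binary(bytes) do
    "data:" <> mime_type <> ";base64," <> Base.encode64(bytes)
  end

  # Renders an <img> tag that embeds the image file at the given path.
  # Raises if the file cannot be read.
  def img_tag(path, mime_type \\ "image/png") do
    "<img src=\"" <> data_url(File.read!(path), mime_type) <> "\">"
  end
end
```

Since Base64 inflates the payload by about a third, this approach is best kept to small assets such as logos.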
Content from Phoenix templates
If your content is generated by a Phoenix template (and hence comes in the form of
{:safe, iodata()}
), you will need to pass it to Phoenix.HTML.safe_to_string/1
first.
content = SomeView.render("body.html") |> Phoenix.HTML.safe_to_string()
ChromicPDF.print_to_pdf({:html, content})
PDF printing options
ChromicPDF.print_to_pdf(
  {:url, "file:///example.html"},
  print_to_pdf: %{
    # Margins are given in inches
    marginTop: 0.393701,
    marginLeft: 0.787402,
    marginRight: 0.787402,
    marginBottom: 1.1811,

    # Print header and footer (on each page).
    # This will print the default templates if none are given.
    displayHeaderFooter: true,

    # Even on empty string.
    # To disable header or footer, pass an empty element.
    headerTemplate: "<span></span>",

    # Example footer template.
    # They are completely unstyled by default and have a font-size of zero,
    # so don't despair if they don't show up at first.
    # There's a lot of documentation online about how to style them properly,
    # this is just a basic example. Also, take a look at the documentation for the
    # ChromicPDF.Template module.
    # The <span> classes shown below are interpolated by Chrome.
    footerTemplate: """
    <style>
      p {
        color: #333;
        font-size: 10pt;
        text-align: right;
        margin: 0 0.787402in;
        width: 100%;
        z-index: 1000;
      }
    </style>
    <p>
    Page <span class="pageNumber"></span> of <span class="totalPages"></span>
    </p>
    """
  }
)
Please note the camel-case. For a full list of options to the printToPDF function, please see the Chrome documentation at:
https://chromedevtools.github.io/devtools-protocol/tot/Page#method-printToPDF
Page size and margins
Chrome will use the provided paperWidth and paperHeight dimensions as the PDF paper format. Please be aware that the @page section in the body CSS is not correctly interpreted, see ChromicPDF.Template for a discussion.
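Since printToPDF takes paper dimensions and margins in inches, metric sizes have to be converted first. The sketch below does this for an A4 page (210 mm x 297 mm); `PaperSketch` and the 20 mm margins are hypothetical choices for the example.

```elixir
defmodule PaperSketch do
  @mm_per_inch 25.4

  # Converts millimetres to inches, the unit used by printToPDF.
  def mm_to_inches(mm), do: mm / @mm_per_inch

  # printToPDF options for an A4 page with 20 mm margins on all sides.
  def a4_options do
    %{
      paperWidth: mm_to_inches(210),
      paperHeight: mm_to_inches(297),
      marginTop: mm_to_inches(20),
      marginBottom: mm_to_inches(20),
      marginLeft: mm_to_inches(20),
      marginRight: mm_to_inches(20)
    }
  end
end

# Usage:
#
#   ChromicPDF.print_to_pdf({:url, "file:///example.html"},
#     print_to_pdf: PaperSketch.a4_options()
#   )
```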
Header and footer
Chrome's support for native header and footer sections is a little bit finicky. Still, to the best of my knowledge, Chrome is currently the only well-functioning solution for HTML-to-PDF conversion if you need headers or footers that are repeated on multiple pages even in the presence of body elements stretching across a page break.
In order to make header and footer visible in the first place, you will need to be aware of a couple of caveats:
- HTML for header and footer is interpreted in a new page context which means no body styles will be applied. In fact, even default browser styles are not present, so all content will have a default font-size of zero, and so on.
- You need to make space for the header and footer templates first, by adding page margins. Margins can either be given using the marginTop and marginBottom options or with CSS styles. If you use the options, the height of header and footer elements will inherit these values. If you use CSS styles, make sure to set the height of the elements in CSS as well.
- Header and footer have a default padding to the page ends of 0.4 centimeters. To remove this, add the following to header/footer template styles (source).
  #header, #footer { padding: 0 !important; }
- Header and footer have a default zoom level of 1/0.75 so everything appears to be smaller than in the body when the same styles are applied.
- If header or footer are not displayed even though they should, make sure your HTML is valid. Tuning the margins for an hour looking for mistakes there, only to discover that you are missing a closing </style> tag, can be quite painful.
- Javascript is not interpreted.
- Background colors are not applied unless you include -webkit-print-color-adjust: exact in your stylesheet.
See print_header_footer_template.html from the Chromium sources to see how these values are interpreted.
Dynamic Content
Evaluate script before printing
In case your print source is generated by client-side scripts, for instance to render graphics or load additional resources, you can trigger these by evaluating a JavaScript expression before the PDF is printed.
evaluate = %{
  expression: """
  document.querySelector('body').innerHTML = 'hello world';
  """
}
ChromicPDF.print_to_pdf({:url, "http://example.net"}, evaluate: evaluate)
If your script returns a Promise, Chrome will wait for it to be resolved.
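As a small illustration of the Promise case, the sketch below builds an :evaluate option whose expression resolves after a fixed delay. `EvaluateSketch` and the delay-based approach are assumptions for the example; a real page would more likely resolve the Promise once its own rendering work has finished.

```elixir
defmodule EvaluateSketch do
  # Builds an :evaluate option whose JavaScript expression returns a
  # Promise that resolves after delay_ms milliseconds.
  def delay(delay_ms) when is_integer(delay_ms) and delay_ms >= 0 do
    %{expression: "new Promise(resolve => setTimeout(resolve, #{delay_ms}));"}
  end
end

# Usage:
#
#   ChromicPDF.print_to_pdf({:url, "http://example.net"},
#     evaluate: EvaluateSketch.delay(500)
#   )
```

Keep the delay well below the session pool timeout, otherwise the print operation may time out before the Promise resolves.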
Wait for attribute on element
Some JavaScript libraries signal their successful initialization to the user by setting an attribute on a DOM element. The wait_for option allows you to wait for this attribute to be set before printing. It evaluates a script that repeatedly queries the element given by the query selector and tests whether it has the given attribute.
wait_for = %{
  selector: "#my-element",
  attribute: "ready-to-print"
}
ChromicPDF.print_to_pdf({:url, "http://example.net"}, wait_for: wait_for)
Specs
print_to_pdfa( input :: source() | source_and_options(), opts :: [pdf_option() | pdfa_option()] ) :: return()
Prints a PDF and converts it to PDF/A in a single call.
See print_to_pdf/2 and convert_to_pdfa/2 for options.
Example
ChromicPDF.print_to_pdfa({:url, "https://example.net"})
Specs
start_link([global_option()]) :: Supervisor.on_start() | Agent.on_start()
Starts ChromicPDF.
If the given config includes the on_demand: true flag, this will instead spawn an Agent process that holds this configuration until a PDF operation is triggered, which will then launch a supervisor temporarily, process the operation, and proceed to perform a graceful shutdown.