ChromicPDF v0.3.0
ChromicPDF is a fast HTML-to-PDF/A renderer based on Chrome & Ghostscript.
Usage
Start
Start ChromicPDF as part of your supervision tree:
defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    children = [
      # other apps...
      {ChromicPDF, chromic_pdf_opts()}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end

  defp chromic_pdf_opts do
    []
  end
end
Print a PDF or PDF/A
ChromicPDF.print_to_pdf({:url, "file:///example.html"}, output: "output.pdf")
See ChromicPDF.print_to_pdf/2 and ChromicPDF.convert_to_pdfa/2.
Options
ChromicPDF spawns two worker pools, the session pool and the ghostscript pool. By default, it will create 5 workers with no overflow. To change these options, you can pass configuration to the supervisor. Please note that these are only worker pools. If you intend to max them out, you will need a job queue as well.
Please see https://github.com/devinus/poolboy for available options.
defp chromic_pdf_opts do
  [
    session_pool: [
      size: 3,
      max_overflow: 0
    ],
    ghostscript_pool: [
      size: 10,
      max_overflow: 2
    ]
  ]
end
Security Considerations
Before adding a browser to your application's (perhaps already long) list of dependencies, you may want to consider the security hints below.
Escape user-supplied data
If you can, make sure to escape any data provided by users with something like Phoenix.HTML.html_escape/1.
Chrome is designed to make displaying HTML pages relatively safe, in terms of preventing
undesired access of a page to the host operating system. However, the attack surface of your
application is still increased. Running this in a contained application with a small HTTP
interface creates an additional barrier (and has other benefits).
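As a rough illustration of the escaping step, the sketch below replaces the five HTML-significant characters before user input is interpolated into markup. The module name EscapeSketch and the variable names are hypothetical and not part of ChromicPDF; in a Phoenix application the escaping helper from Phoenix.HTML would normally be used instead.

```elixir
defmodule EscapeSketch do
  # "&" must be replaced first so that the entities added later
  # are not escaped a second time.
  @replacements [
    {"&", "&amp;"},
    {"<", "&lt;"},
    {">", "&gt;"},
    {"\"", "&quot;"},
    {"'", "&#39;"}
  ]

  # Returns `text` with HTML-significant characters replaced by entities.
  def escape(text) when is_binary(text) do
    Enum.reduce(@replacements, text, fn {char, entity}, acc ->
      String.replace(acc, char, entity)
    end)
  end
end

# Hypothetical user-supplied value that must not reach Chrome unescaped.
user_input = "<script>alert('hi')</script>"
html = "<p>" <> EscapeSketch.escape(user_input) <> "</p>"
```

The resulting string could then be handed to the printer as in-memory HTML.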
Running in online mode
Browser targets will be spawned in "offline mode" by default (using the DevTools command Network.emulateNetworkConditions).
Users are required to take this extra step (basically reading this paragraph) to re-consider
whether remote printing is a requirement.
However, there are a lot of valid use-cases for printing from a URL, particularly from a
webserver on localhost. To switch to "online mode", pass the offline: false option.
defp chromic_pdf_opts do
  [offline: false]
end
Chrome Sandbox
By default, ChromicPDF will run Chrome targets in a sandboxed OS process. If you absolutely must run Chrome as root, you can turn off its sandbox by passing the no_sandbox: true option.
defp chromic_pdf_opts do
  [no_sandbox: true]
end
How it works
PDF Printing
- ChromicPDF spawns an instance of Chromium/Chrome (an OS process) and connects to its "DevTools" channel via file descriptors.
- The Chrome process is supervised and the connected processes will automatically recover if it crashes.
- A number of "targets" in Chrome are spawned, 1 per worker process in the SessionPool. By default, ChromicPDF will spawn each session in a new browser context (i.e., a profile).
- When a PDF print is requested, a session will instruct its assigned "target" to navigate to the given URL, then wait until it receives a "frameStoppedLoading" event, and proceed to call the printToPDF function.
- The printed PDF will be sent to the session as Base64 encoded chunks.
Summary
Functions
Captures a screenshot.
Returns a specification to start this module under a supervisor.
Converts a PDF to PDF/A (either PDF/A-2b or PDF/A-3b).
Prints a PDF.
Prints a PDF and converts it to PDF/A in a single call.
Functions
capture_screenshot(input, opts \\ [])
capture_screenshot(url :: ChromicPDF.Processor.source(), opts :: keyword()) :: :ok | {:ok, ChromicPDF.Processor.blob()}
Captures a screenshot.
This call blocks until the screenshot has been created.
Capture and return Base64-encoded PNG
{:ok, blob} = ChromicPDF.capture_screenshot({:url, "file:///example.html"})
Options
Options for the underlying DevTools call are given as a map under the :capture_screenshot key.
ChromicPDF.capture_screenshot(
  {:url, "file:///example.html"},
  capture_screenshot: %{
    format: "jpeg"
  }
)
Please see docs for details:
https://chromedevtools.github.io/devtools-protocol/tot/Page#method-captureScreenshot
child_spec/1
Returns a specification to start this module under a supervisor.
See Supervisor.
convert_to_pdfa(pdf_path, opts \\ [])
convert_to_pdfa(pdf_path :: ChromicPDF.Processor.path(), opts :: [ChromicPDF.Processor.pdfa_option()]) :: :ok | {:ok, ChromicPDF.Processor.blob()}
Converts a PDF to PDF/A (either PDF/A-2b or PDF/A-3b).
Convert an input PDF and return a Base64-encoded blob
{:ok, blob} = ChromicPDF.convert_to_pdfa("some_pdf_file.pdf")
Convert and write to file
ChromicPDF.convert_to_pdfa("some_pdf_file.pdf", output: "output.pdf")
PDF/A versions & levels
Ghostscript supports both PDF/A-2 and PDF/A-3 versions, both in their b (basic) level. By default, ChromicPDF generates version PDF/A-3b files. Set the pdfa_version option for version 2.
ChromicPDF.convert_to_pdfa("some_pdf_file.pdf", pdfa_version: "2")
Specifying PDF metadata
The converter is able to transfer PDF metadata (the Info dictionary) from the original PDF file to the output file. However, files printed by Chrome do not contain any metadata information (except "Creator" being "Chrome").
The :info option of the PDF/A converter lets you specify metadata for the output file directly.
ChromicPDF.convert_to_pdfa("some_pdf_file.pdf", info: %{creator: "ChromicPDF"})
The converter understands the following keys, all of which accept only String values:
:title
:author
:subject
:keywords
:creator
:creation_date
:mod_date
By specification, date values in :creation_date and :mod_date do not need to follow a specific syntax. However, Ghostscript inserts date strings like "D:20200208153049+00'00'" and Info extractor tools might rely on this or another specific format. The converter will automatically format given DateTime values like this.
Both :creation_date and :mod_date are filled with the current date automatically (by Ghostscript), if the original file did not contain any.
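To make the date format concrete, here is a minimal sketch that renders a UTC DateTime in the "D:..." form quoted above. It is not the converter's own implementation; the module name PdfDateSketch is hypothetical, and only UTC input is handled.

```elixir
defmodule PdfDateSketch do
  # Formats a UTC DateTime as a PDF date string such as
  # "D:20200208153049+00'00'". Non-UTC input does not match this clause;
  # handling other offsets is out of scope for the sketch.
  def format(%DateTime{utc_offset: 0, std_offset: 0} = datetime) do
    Calendar.strftime(datetime, "D:%Y%m%d%H%M%S") <> "+00'00'"
  end
end

formatted = PdfDateSketch.format(~U[2020-02-08 15:30:49Z])
```

A string built this way could be supplied as :creation_date or :mod_date in the :info map when full control over the value is wanted.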
Adding more PostScript to the conversion
The pdfa_def_ext option can be used to feed more PostScript code into the final conversion step. This can be useful to add additional features to the generated PDF/A file, for instance a ZUGFeRD invoice.
ChromicPDF.convert_to_pdfa(
  "some_pdf_file.pdf",
  pdfa_def_ext: "[/Title (OverriddenTitle) /DOCINFO pdfmark"
)
print_to_pdf(input, opts \\ [])
print_to_pdf(input :: ChromicPDF.Processor.source() | ChromicPDF.Processor.source_and_options(), opts :: [ChromicPDF.Processor.pdf_option()]) :: :ok | {:ok, ChromicPDF.Processor.blob()}
Prints a PDF.
This call blocks until the PDF has been created.
Output options
Print and return Base64-encoded PDF
{:ok, blob} = ChromicPDF.print_to_pdf({:url, "file:///example.html"})
# Can be displayed in iframes
"data:application/pdf;base64,#{blob}"
Print to file
ChromicPDF.print_to_pdf({:url, "file:///example.html"}, output: "output.pdf")
Print to temporary file
ChromicPDF.print_to_pdf({:url, "file:///example.html"}, output: fn path ->
send_download(path)
end)
The temporary file passed to the callback will be deleted when the callback returns.
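When no :output option is given, the returned blob is Base64 text rather than raw PDF bytes. The following sketch shows one way to decode such a blob and write it to disk using only the Elixir standard library. BlobSketch and the sample blob are hypothetical stand-ins; a real blob would come from {:ok, blob}.

```elixir
defmodule BlobSketch do
  # Decodes a Base64-encoded blob into raw bytes and writes them to `path`.
  # Raises if the blob is not valid Base64 or the file cannot be written.
  def write!(blob, path) when is_binary(blob) do
    File.write!(path, Base.decode64!(blob))
  end
end

# Stand-in for a blob returned by the printer; a real one holds a full PDF.
blob = Base.encode64("%PDF-1.4 example")
path = Path.join(System.tmp_dir!(), "blob_sketch.pdf")
:ok = BlobSketch.write!(blob, path)
```

For writing straight to a file, the :output option shown above avoids the decoding step altogether.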
Input options
Print from URL
Passing in a URL is the simplest way of printing a PDF. A target in Chrome is told to navigate to the given URL. When navigation is finished, the PDF is printed.
ChromicPDF.print_to_pdf({:url, "file:///example.html"})
You can pass http or https URLs in the same way, but you will need to enable "online mode" first. See "Running in online mode" for an explanation.
ChromicPDF.print_to_pdf({:url, "http://example.net"})
Cookies
If your URL requires authentication, you can pass in a session cookie. The cookie is automatically cleared after the PDF has been printed.
cookie = %{
  name: "foo",
  value: "bar",
  domain: "localhost"
}
ChromicPDF.print_to_pdf({:url, "http://example.net"}, set_cookie: cookie)
See Network.setCookie for options. The name and value keys are required.
Print from in-memory HTML
For convenience, it is also possible to pass an HTML blob to print_to_pdf/2. The HTML is sent to the target using the Page.setDocumentContent function.
ChromicPDF.print_to_pdf(
{:html, "<h1>Hello World!</h1>"}
)
In-memory content can be iodata
In-memory HTML for both the main input parameter as well as the header and footer options can be passed as iodata. Such lists are converted to String before submission to the session process by passing them through :erlang.iolist_to_binary/1.
ChromicPDF.print_to_pdf(
{:html, ["<style>p { color: green; }</style>", "<p>green paragraph</p>"]}
)
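The flattening step can be tried on its own. This small sketch runs a nested list through :erlang.iolist_to_binary/1 and, for comparison, through the equivalent IO.iodata_to_binary/1 from the Elixir standard library; the variable names are illustrative only.

```elixir
# Nested lists of binaries are valid iodata.
iodata = ["<style>p { color: green; }</style>", ["<p>", "green paragraph", "</p>"]]

# The conversion named in the text above.
binary = :erlang.iolist_to_binary(iodata)

# Equivalent call from the Elixir standard library.
same = IO.iodata_to_binary(iodata)
```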
Content from Phoenix templates
If your content is generated by a Phoenix template (and hence comes in the form of {:safe, iodata()}), you will need to pass it to Phoenix.HTML.safe_to_string/1 first.
content = SomeView.render("body.html") |> Phoenix.HTML.safe_to_string()
ChromicPDF.print_to_pdf({:html, content})
PDF printing options
ChromicPDF.print_to_pdf(
  {:url, "file:///example.html"},
  print_to_pdf: %{
    pageRanges: "1-2"
  }
)
Please note the camel-case. For a full list of options to the printToPDF function, please see the Chrome documentation at:
https://chromedevtools.github.io/devtools-protocol/tot/Page#method-printToPDF
Header and footer
Chrome's support for native header and footer sections is a little bit finicky. Still, to the best of my knowledge, headerTemplate and footerTemplate are the only well-functioning solutions if you need headers or footers that are repeated on multiple pages even in the presence of body elements stretching across a page break.
In order to make header and footer visible in the first place, you will need to be aware of a couple of caveats:
- HTML for header and footer is interpreted in a new page context, which means no body styles will be applied. In fact, even default browser styles are not present, so all content will have a default font-size of zero, and so on.
- You need to make space for the header and footer templates first, by adding page margins. Margins can either be given using the marginTop and marginBottom options or with CSS styles. If you use the options, the height of header and footer elements will inherit these values. If you use CSS styles, make sure to set the height of the elements in CSS as well.
- Header and footer have a default padding to the page ends of 0.4 centimeters. To remove this, add the following to the header/footer template styles:
  #header, #footer { padding: 0 !important; }
- If header or footer is not displayed when it should be, make sure your HTML is valid. Tuning the margins for an hour looking for mistakes there, only to discover that you are missing a closing </style> tag, can be quite painful.
- JavaScript is not interpreted.
- Background colors are not applied unless you set -webkit-print-color-adjust: exact in the CSS.
See print_header_footer_template.html from the Chromium sources to see how these values are interpreted.
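Putting the caveats together, a header and footer setup might look roughly like the sketch below. It only builds the options map; the final call is left as a comment because it needs a running ChromicPDF instance. The option names follow Chrome's printToPDF parameters, while the template contents and the variable names header, footer and print_opts are made up for illustration.

```elixir
# Templates carry their own styles, since no body or browser styles apply.
header = """
<style>
  #header { padding: 0 !important; }
  p { font-size: 10px; margin: 0 auto; }
</style>
<p>Example header</p>
"""

# Chrome fills elements with the classes pageNumber and totalPages.
footer = """
<style>
  #footer { padding: 0 !important; }
  p { font-size: 10px; margin: 0 auto; }
</style>
<p>Page <span class="pageNumber"></span> of <span class="totalPages"></span></p>
"""

print_opts = %{
  displayHeaderFooter: true,
  headerTemplate: header,
  footerTemplate: footer,
  # Margins are given in inches and make room for the templates.
  marginTop: 1.0,
  marginBottom: 1.0
}

# With ChromicPDF started, the map would be passed like this:
# ChromicPDF.print_to_pdf({:url, "file:///example.html"}, print_to_pdf: print_opts)
```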
Page size and margins
Chrome will use the provided paperWidth and paperHeight dimensions as the PDF paper format, unless the preferCSSPageSize option is set to true, in which case it prioritizes values set in a @page section in the (body) CSS. However, any margin applied to the page using the options above is generally overridden by margin rules in the @page section.
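Chrome's printToPDF expects paper dimensions in inches, so metric formats need converting first. The sketch below derives A4 dimensions from millimeters; PaperSketch and a4_opts are hypothetical names, and rounding to two decimals is an arbitrary choice for readability.

```elixir
defmodule PaperSketch do
  @mm_per_inch 25.4

  # Converts millimeters to inches, rounded to two decimal places.
  def mm_to_inches(mm), do: Float.round(mm / @mm_per_inch, 2)
end

# A4 is 210 mm x 297 mm.
a4_opts = %{
  paperWidth: PaperSketch.mm_to_inches(210),
  paperHeight: PaperSketch.mm_to_inches(297)
}

# With ChromicPDF started, the map would be passed like this:
# ChromicPDF.print_to_pdf({:url, "file:///example.html"}, print_to_pdf: a4_opts)
```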
print_to_pdfa(input, opts \\ [])
print_to_pdfa(input :: ChromicPDF.Processor.source() | ChromicPDF.Processor.source_and_options(), opts :: [ChromicPDF.Processor.pdf_option() | ChromicPDF.Processor.pdfa_option()]) :: :ok | {:ok, ChromicPDF.Processor.blob()}
Prints a PDF and converts it to PDF/A in a single call.
See print_to_pdf/2 and convert_to_pdfa/2 for options.
Example
ChromicPDF.print_to_pdfa({:url, "https://example.net"})