Sourceror (Sourceror v0.4.0)

Installation

Add :sourceror as a dependency to your project's mix.exs:

defp deps do
  [
    {:sourceror, "~> 0.4.0"}
  ]
end

A note on compatibility

Sourceror is compatible with Elixir versions down to 1.10 and OTP 21. For Elixir versions prior to 1.13 it uses a vendored version of the Elixir parser and formatter modules. This means that for Elixir versions prior to 1.12 it will successfully parse the new syntax for stepped ranges instead of raising a SyntaxError, but everything else should work as expected.

Background

There have been several attempts at source code manipulation in the Elixir community. Thanks to its metaprogramming features, Elixir provides built-in tools that let us get the AST of any Elixir code, but when it comes to turning the AST back into code as text, we had limited options. Macro.to_string/2 exists, but the code it produces is generally ugly, for example because of extra parentheses or because it turns string interpolations into calls to Erlang modules. This meant that, even if we used Macro.to_string/2 to get a string and then passed that to the Elixir formatter via Code.format_string!/2, the output would still be suboptimal, as the formatter is not designed to change the semantics of the code, only to pretty print it. For example, calls to Erlang modules would be kept as is instead of being turned back into interpolations.

We also had the additional problem of comments being discarded by the tokenizer, and of literals not having information like line numbers or delimiter characters. This makes the regular AST too lossy to be useful for source code manipulation, where we need as much information as possible in order to stay as close to the original source as possible. There have been several proposals in the past to bring all this information to the Elixir AST, but they all meant a change that would either break macros, due to the addition of new types of AST nodes, or force a compromise in core Elixir itself by storing comments in the nodes' metadata.

Despite all these issues, the Elixir formatter is still capable of manipulating the source code to pretty print it. Under the hood it does some neat tricks to have all this information available: on one hand, it tells the tokenizer to extract the comments from the source code and keep them at hand (not in the AST itself, but in a separate data structure), and on the other hand it tells the parser to wrap literals in block nodes so their metadata can be preserved. Once it has all it needs, it converts the AST and comments into an algebra document, and ultimately converts that to a string. This functionality was private, and if we wanted to do it ourselves we would have had to replicate or vendor the Elixir formatter with all of its more than 2000 lines of code. This approach was explored by Wojtek Mach in wojtekmach/fix, but it involved vendoring the Elixir formatter code, was tightly coupled to the formatting process, and any change in Elixir would break the code.
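To see both the lossiness and the literal-wrapping trick in isolation, here is a small sketch that uses only Elixir's standard library. It assumes a recent Elixir (1.13 or later, as in the rest of this section) and the :literal_encoder option of Code.string_to_quoted!/2; the wrapping function mirrors the trick described above and is not the formatter's internal code.

```elixir
source = """
# This comment is discarded by the tokenizer
:a
"""

# Plain parsing: the comment is gone and the atom literal carries no metadata.
plain = Code.string_to_quoted!(source)

# Wrapping every literal in a :__block__ node gives it a place to keep
# metadata, such as the line it was found on.
wrapped =
  Code.string_to_quoted!(source,
    literal_encoder: fn literal, meta -> {:ok, {:__block__, meta, [literal]}} end
  )

IO.inspect(plain)
IO.inspect(wrapped)
```

Here plain is just the atom :a, while wrapped is a :__block__ node whose metadata records that :a was found on line 2. The comment is absent from both.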

Since Elixir 1.13, this functionality of the formatter is finally exposed via the Code.string_to_quoted_with_comments/2 and Code.quoted_to_algebra/2 functions. The former gives us access to the list of comments in a shape the Elixir formatter is able to use, and the latter lets us turn any arbitrary Elixir AST into an algebra document. If we also give it the list of comments, it will merge them together, allowing us to format the AST and preserve the comments. Now all we need to care about is manipulating the AST, and we can let the formatter do the rest.
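As a rough sketch of that workflow using only the standard library (Elixir 1.13+), the snippet below parses a string together with its comments, merges AST and comments back into an algebra document, and renders it. The option set follows our reading of the Code.quoted_to_algebra/2 documentation; treat the exact options as assumptions to double-check against your Elixir version.

```elixir
source = """
# A comment that should survive the round trip
:ok
"""

# Parse, keeping comments in a separate list and wrapping literals so that
# their metadata is preserved.
{quoted, comments} =
  Code.string_to_quoted_with_comments!(source,
    literal_encoder: &{:ok, {:__block__, &2, [&1]}},
    token_metadata: true,
    unescape: false
  )

# Merge AST and comments into an algebra document, then render it as a string
# with a maximum line width of 98 characters.
formatted =
  quoted
  |> Code.quoted_to_algebra(comments: comments, escape: false)
  |> Inspect.Algebra.format(98)
  |> IO.iodata_to_binary()

IO.puts(formatted)
```

The AST itself never contains the comment; it is reattached purely from the separate comments list during the conversion to algebra.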

Sourceror's AST

Having the AST and comments as separate entities allows Elixir to expose the code formatting utilities without making any changes to its AST, but it also delegates to us the task of figuring out the most appropriate way to work with them.

Sourceror's take is to use the node metadata to store the comments. This allows us to work with an AST that is as close to the regular Elixir AST as possible. It also allows you to move nodes around without worrying about leaving a comment behind and ending up with misplaced comments.

Two metadata fields are added to the regular Elixir AST:

  • :leading_comments - holds the comments that are directly above the node or on the same line as it. For example:

    test "parses leading comments" do
      quoted = """
      # Comment for :a
      :a # Also a comment for :a
      """ |> Sourceror.parse_string!()
      assert {:__block__, meta, [:a]} = quoted
      assert meta[:leading_comments] == [
        %{line: 1, previous_eol_count: 1, next_eol_count: 1, text: "# Comment for :a"},
        %{line: 2, previous_eol_count: 0, next_eol_count: 1, text: "# Also a comment for :a"},
      ]
    end
  • :trailing_comments - holds the comments that are inside the node but don't lead any of its children. For example:

    test "parses trailing comments" do
      quoted = """
      def foo() do
      :ok
      # A trailing comment
      end # Also a trailing comment for :foo
      """ |> Sourceror.parse_string!()
      assert {:def, meta, _} = quoted
      assert meta[:trailing_comments] == [
        %{line: 3, previous_eol_count: 1, next_eol_count: 1, text: "# A trailing comment"},
        %{line: 4, previous_eol_count: 0, next_eol_count: 1, text: "# Also a trailing comment for :foo"},
      ]
    end

Note that Sourceror considers leading comments to be the ones found before or on the same line as a node, and trailing comments to be the ones found before or on the ending line of a node, based on its end, closing or end_of_expression line. This also makes the Sourceror AST consistent with the way the Elixir formatter works, making it easier to reason about how a given AST would be formatted.
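The following is a deliberately simplified sketch, not Sourceror's implementation, of how leading comments could be stored in node metadata. It uses only the standard library (Elixir 1.13+) and handles just a flat list of sibling nodes; the module name LeadingCommentsSketch is hypothetical. Each node receives the still-unassigned comments whose line is at or before its own line.

```elixir
defmodule LeadingCommentsSketch do
  # Hypothetical helper, NOT Sourceror's implementation. It only handles a flat
  # list of sibling nodes: each node receives, under :leading_comments, the
  # still-unassigned comments whose line is at or before the node's own line.
  def attach(nodes, comments) do
    {nodes, _remaining} =
      Enum.map_reduce(nodes, comments, fn node, remaining ->
        line = node_line(node)
        {leading, rest} = Enum.split_with(remaining, &(&1.line <= line))
        {Macro.update_meta(node, &Keyword.put(&1, :leading_comments, leading)), rest}
      end)

    # Comments still in `_remaining` come after the last node; in Sourceror's
    # model they would be trailing comments of the enclosing node.
    nodes
  end

  defp node_line({_form, meta, _args}), do: Keyword.get(meta, :line, 1)
end

source = """
# Comment for :a
:a # Also a comment for :a
:b
"""

# Literals are wrapped in :__block__ nodes so that they carry a :line.
{{:__block__, _, nodes}, comments} =
  Code.string_to_quoted_with_comments!(source,
    literal_encoder: &{:ok, {:__block__, &2, [&1]}}
  )

[node_a, node_b] = LeadingCommentsSketch.attach(nodes, comments)
```

With this rule, node_a ends up holding both comments (the one above it and the one on its own line), and node_b holds none. This matches the :leading_comments example above.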

Working with line numbers

The way the Elixir formatter combines AST and comments depends on their line numbers and the order in which the AST is traversed. This means that whenever you move a node around, you also need to change its line numbers to reflect its new position. This is best seen with an example. Let's imagine you have a list of atoms and you want to sort them in alphabetical order:

:a
# Comment for :j
:j
:c
# Comment for :b
:b

Sorting them is trivial, as you just need to use Enum.sort_by/2 with Atom.to_string/1. But consider their line numbers:

1 :a
2 # Comment for :j
3 :j
4 :c
5 # Comment for :b
6 :b

If we sort them, we end up with this:

1 :a
6 :b
4 :c
3 :j

And the comments will be associated with the line number of the node they're leading:

6 # Comment for :b
3 # Comment for :j

When the formatter traverses the AST, it will find node :b with line 6, see the comment with line 6, and print that comment. But it will also see the comment with line 3 and think "hey, this comment has a line number smaller than this node's, so it is a leading comment too!", and print that comment as well. That will make it output this code:

:a
# Comment for :b
# Comment for :j
:b
:c
:j

And that's not what we want at all. To avoid this issue, we need to calculate how the line numbers changed during the sorting and correct them appropriately. Sourceror provides a correct_lines(node, line_correction) function that takes care of correcting all the line numbers associated with a node, so all you have to do is figure out the line corrections. One way to do it in this example is to get the line numbers before the change, reorder the nodes, zip the old line numbers with the nodes, and correct each node's line numbers by the difference between the new and the old one. Translated to code, it would look something like this:

test "sorts atoms with correct comments placement" do
  {:__block__, meta, atoms} = """
  :a
  # Comment for :j
  :j
  :c
  # Comment for :b
  :b
  """ |> Sourceror.parse_string!()

  lines = Enum.map(atoms, &Sourceror.get_line/1)

  atoms =
    Enum.sort_by(atoms, fn {:__block__, _, [atom]} ->
      Atom.to_string(atom)
    end)
    |> Enum.zip(lines)
    |> Enum.map(fn {atom, old_line} ->
      line_correction = old_line - Sourceror.get_line(atom)
      Macro.update_meta(atom, &Sourceror.correct_lines(&1, line_correction))
    end)

  assert Sourceror.to_string({:__block__, meta, atoms}) == """
  :a
  # Comment for :b
  :b
  :c
  # Comment for :j
  :j
  """ |> String.trim()
end

Which will produce the code we expect:

:a
# Comment for :b
:b
:c
# Comment for :j
:j
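To make the misplacement concrete, here is a toy model of the placement rule described above, written as plain Elixir. It is an assumption for illustration only, not the formatter's actual algorithm: nodes are {atom, line} pairs in print order, comments are {line, text} pairs kept in the order of the nodes they belong to, and before printing each node the model emits comments from the front of that list while their line is at or before the node's line. The module name CommentPlacementSketch is hypothetical.

```elixir
defmodule CommentPlacementSketch do
  # Toy model of the placement rule described above; an assumption for
  # illustration, not the Elixir formatter's real algorithm.
  #   nodes:    [{atom, line}] in print order
  #   comments: [{line, text}] in the order of the nodes they belong to
  def render(nodes, comments) do
    {lines, _unprinted} =
      Enum.flat_map_reduce(nodes, comments, fn {atom, line}, pending ->
        # Emit comments from the front of the list while their line is <= the node's line.
        {due, rest} = Enum.split_while(pending, fn {comment_line, _text} -> comment_line <= line end)
        {Enum.map(due, fn {_line, text} -> text end) ++ [inspect(atom)], rest}
      end)

    Enum.join(lines, "\n")
  end
end

# Sorted, but still carrying the original line numbers.
misplaced =
  CommentPlacementSketch.render(
    [a: 1, b: 6, c: 4, j: 3],
    [{6, "# Comment for :b"}, {3, "# Comment for :j"}]
  )

# Sorted, with line numbers corrected to match the new positions.
corrected =
  CommentPlacementSketch.render(
    [a: 1, b: 3, c: 4, j: 6],
    [{3, "# Comment for :b"}, {6, "# Comment for :j"}]
  )

IO.puts(misplaced)
IO.puts(corrected)
```

With the original line numbers the model prints both comments above :b, reproducing the broken output. With the corrected line numbers it reproduces the expected output.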

In other cases, you may want to add lines to the code, which would cause the new nodes to have higher line numbers than the nodes that come after them, and that would also mess up comment placement. For those use cases Sourceror provides the Sourceror.postwalk/3 function. It's a wrapper over Macro.postwalk/3 that lets you set the line correction that should be applied to subsequent nodes, and it will automatically correct them for you before calling your function on each node. You can see this in action in the examples/expand_multi_alias.exs example.

License

Copyright (c) 2021 dorgandash@gmail.com

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Summary

Functions

Compares two positions.

Shifts the line numbers of the node or metadata by the given line_correction.

Returns the arguments of the node.

Returns the column of a node. If none is found, the default value is returned (defaults to 1).

Returns the line where the given node ends. It recursively checks for end, closing and end_of_expression line numbers. If none is found, the default value is returned (defaults to 1).

Returns the end position of the quoted expression.

Returns the line of a node. If none is found, the default value is returned (defaults to 1).

Returns how many lines a quoted expression used in the original source code.

Returns the metadata of the given node.

Gets the range used by the given quoted expression in the source code.

Returns the start position of a node.

Parses a single expression from the given string.

Parses the source code into an extended AST suitable for source manipulation as described in Code.quoted_to_algebra/2.

Same as parse_string/1 but raises on error.

Performs a depth-first post-order traversal of a quoted expression, correcting line numbers as it goes.

Performs a depth-first post-order traversal of a quoted expression with an accumulator, correcting line numbers as it goes.

A wrapper around Code.quoted_to_algebra/2 for compatibility with pre 1.13 Elixir versions.

A wrapper around Code.string_to_quoted_with_comments/2 for compatibility with pre 1.13 Elixir versions.

A wrapper around Code.string_to_quoted_with_comments!/2 for compatibility with pre 1.13 Elixir versions.

Converts a quoted expression to a string.

Updates the arguments for the given node.

Types

Specs

position() :: keyword()

Specs

postwalk_function() ::
  (Macro.t(), Sourceror.PostwalkState.t() ->
     {Macro.t(), Sourceror.PostwalkState.t()})

Specs

range() :: %{start: position(), end: position()}

Functions

compare_positions(left, right)

Specs

compare_positions(position(), position()) :: :gt | :eq | :lt

Compares two positions.

Returns :gt if the first position comes after the second one, and :lt if it comes before it. If the two positions are equal, :eq is returned.

nil values for line or column are strictly less than integer values.
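As an illustration of these rules, the sketch below compares keyword-list positions by line first and then by column, treating nil as smaller than any integer. The line-then-column order and the module name PositionSketch are assumptions; this is not Sourceror's source. Elixir's built-in term ordering places atoms (including nil) above numbers, which is why nil needs explicit clauses.

```elixir
defmodule PositionSketch do
  # Hypothetical re-implementation of the rules above, not Sourceror's code.
  # Assumes positions are compared by :line first and then by :column.
  def compare(left, right) do
    case compare_values(left[:line], right[:line]) do
      :eq -> compare_values(left[:column], right[:column])
      result -> result
    end
  end

  # nil is strictly less than any integer. Elixir's term ordering would put
  # nil (an atom) above numbers, so it is handled explicitly.
  defp compare_values(same, same), do: :eq
  defp compare_values(nil, _), do: :lt
  defp compare_values(_, nil), do: :gt
  defp compare_values(left, right) when left > right, do: :gt
  defp compare_values(_, _), do: :lt
end

PositionSketch.compare([line: 1, column: 5], [line: 2, column: 1])
PositionSketch.compare([line: nil, column: 1], [line: 1, column: 1])
```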

correct_lines(meta, line_correction, opts \\ [])

Specs

correct_lines(Macro.t() | Macro.metadata(), integer(), Macro.metadata()) ::
  Macro.t() | Macro.metadata()

Shifts the line numbers of the node or metadata by the given line_correction.

This function will update the :line, :closing, :do, :end and :end_of_expression line numbers of the node metadata if such fields are present.
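A minimal sketch of what this means for a metadata keyword list, assuming that :closing, :do, :end and :end_of_expression hold nested keyword lists with their own :line, as they do in Elixir's token metadata. The module CorrectLinesSketch is hypothetical. It only handles metadata, not whole nodes or the opts argument, and it is not Sourceror's implementation.

```elixir
defmodule CorrectLinesSketch do
  # Metadata fields whose values are nested keyword lists carrying a :line.
  @nested_fields [:closing, :do, :end, :end_of_expression]

  # Shifts the node's own :line and the :line of each nested field present.
  def shift(meta, line_correction) when is_list(meta) and is_integer(line_correction) do
    meta = shift_line(meta, line_correction)

    Enum.reduce(@nested_fields, meta, fn field, acc ->
      if Keyword.has_key?(acc, field) do
        Keyword.update!(acc, field, &shift_line(&1, line_correction))
      else
        acc
      end
    end)
  end

  # Adds the correction to :line only if the keyword list has one.
  defp shift_line(keyword, line_correction) do
    if Keyword.has_key?(keyword, :line) do
      Keyword.update!(keyword, :line, &(&1 + line_correction))
    else
      keyword
    end
  end
end

meta = [line: 1, do: [line: 1, column: 9], end: [line: 3, column: 1]]
CorrectLinesSketch.shift(meta, 2)
```

Columns are left untouched; only line numbers move, which is what keeps comment placement consistent after a node is relocated.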

Specs

get_args(Macro.t()) :: [Macro.t()]

Returns the arguments of the node.

iex> Sourceror.get_args({:foo, [], [{:__block__, [], [:ok]}]})
[{:__block__, [], [:ok]}]

get_column(arg, default \\ 1)

Specs

get_column(Macro.t(), default :: integer() | nil) :: integer() | nil

Returns the column of a node. If none is found, the default value is returned (defaults to 1).

A default of nil may also be provided if the column number is meant to be coalesced with a value that is not known upfront.

iex> Sourceror.get_column({:foo, [column: 5], []})
5

iex> Sourceror.get_column({:foo, [], []}, 3)
3

get_end_line(quoted, default \\ 1)

Specs

get_end_line(Macro.t(), integer()) :: integer()

Returns the line where the given node ends. It recursively checks for end, closing and end_of_expression line numbers. If none is found, the default value is returned (defaults to 1).

iex> Sourceror.get_end_line({:foo, [end: [line: 4]], []})
4

iex> Sourceror.get_end_line({:foo, [closing: [line: 2]], []})
2

iex> Sourceror.get_end_line({:foo, [end_of_expression: [line: 5]], []})
5

iex> Sourceror.get_end_line({:foo, [closing: [line: 2], end: [line: 4]], []})
4

iex> """
...> alias Foo.{
...>   Bar
...> }
...> """ |> Sourceror.parse_string!() |> Sourceror.get_end_line()
3

get_end_position(quoted, default \\ [line: 1, column: 1])

Specs

get_end_position(Macro.t(), position()) :: position()

Returns the end position of the quoted expression.

iex> quoted = ~S"""
...> A.{
...>   B
...> }
...> """ |>  Sourceror.parse_string!()
iex> Sourceror.get_end_position(quoted)
[line: 3, column: 1]

iex> quoted = ~S"""
...> foo do
...>   :ok
...> end
...> """ |>  Sourceror.parse_string!()
iex> Sourceror.get_end_position(quoted)
[line: 3, column: 1]

iex> quoted = ~S"""
...> foo(
...>   :a,
...>   :b
...>    )
...> """ |>  Sourceror.parse_string!()
iex> Sourceror.get_end_position(quoted)
[line: 4, column: 4]

get_line(arg, default \\ 1)

Specs

get_line(Macro.t(), default :: integer() | nil) :: integer() | nil

Returns the line of a node. If none is found, the default value is returned (defaults to 1).

A default of nil may also be provided if the line number is meant to be coalesced with a value that is not known upfront.

iex> Sourceror.get_line({:foo, [line: 5], []})
5

iex> Sourceror.get_line({:foo, [], []}, 3)
3

Specs

get_line_span(Macro.t()) :: integer()

Returns how many lines a quoted expression used in the original source code.

iex> "foo do :ok end" |> Sourceror.parse_string!() |> Sourceror.get_line_span()
1

iex> """
...> foo do
...>   :ok
...> end
...> """ |> Sourceror.parse_string!() |> Sourceror.get_line_span()
3

Specs

get_meta(Macro.t()) :: Macro.metadata()

Returns the metadata of the given node.

iex> Sourceror.get_meta({:foo, [line: 5], []})
[line: 5]

Specs

get_range(Macro.t()) :: range()

Gets the range used by the given quoted expression in the source code.

The range is a map with :start and :end positions. Since the end position normally points at the start of the closing token, its column is adjusted to point at the last character of the closing token.

iex> quoted = ~S"""
...> def foo do
...>   :ok
...> end
...> """ |> Sourceror.parse_string!()
iex> Sourceror.get_range(quoted)
%{start: [line: 1, column: 1], end: [line: 3, column: 3]}

iex> quoted = ~S"""
...> Foo.{
...>   Bar
...> }
...> """ |> Sourceror.parse_string!()
iex> Sourceror.get_range(quoted)
%{start: [line: 1, column: 1], end: [line: 3, column: 1]}
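The column adjustment in these examples is simple arithmetic: the end column is the start column of the closing token plus its length minus one. For the end keyword starting at column 1 that gives 1 + 3 - 1 = 3, and for a closing brace at column 1 it gives 1. A tiny hypothetical helper (not part of Sourceror) expressing that rule:

```elixir
defmodule RangeEndSketch do
  # Hypothetical helper: the column of the last character of a closing token
  # that starts at `start_column`.
  def last_column(start_column, closing_token) do
    start_column + String.length(closing_token) - 1
  end
end

RangeEndSketch.last_column(1, "end")
RangeEndSketch.last_column(1, "}")
```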

get_start_position(quoted, default \\ [line: 1, column: 1])

Specs

get_start_position(Macro.t(), position()) :: position()

Returns the start position of a node.

iex> quoted = Sourceror.parse_string!(" :foo")
iex> Sourceror.get_start_position(quoted)
[line: 1, column: 2]

iex> quoted = Sourceror.parse_string!("\n\nfoo()")
iex> Sourceror.get_start_position(quoted)
[line: 3, column: 1]

iex> quoted = Sourceror.parse_string!("Foo.{Bar}")
iex> Sourceror.get_start_position(quoted)
[line: 1, column: 1]

iex> quoted = Sourceror.parse_string!("foo[:bar]")
iex> Sourceror.get_start_position(quoted)
[line: 1, column: 1]

iex> quoted = Sourceror.parse_string!("foo(:bar)")
iex> Sourceror.get_start_position(quoted)
[line: 1, column: 1]

parse_expression(string, opts \\ [])

Specs

parse_expression(String.t(), keyword()) ::
  {:ok, Macro.t(), String.t()} | {:error, String.t()}

Parses a single expression from the given string.

Returns {:ok, quoted, rest} on success or {:error, source} on error.

Examples

iex> ~S"""
...> 42
...>
...> :ok
...> """ |> Sourceror.parse_expression()
{:ok, {:__block__, [trailing_comments: [], leading_comments: [],
                    token: "42", line: 2, column: 1], [42]}, "\n:ok"}

Options

  • :from_line - The line at which parsing should start. Defaults to 1.

Specs

parse_string(String.t()) :: {:ok, Macro.t()} | {:error, term()}

Parses the source code into an extended AST suitable for source manipulation as described in Code.quoted_to_algebra/2.

Two additional fields are added to the nodes' metadata:

  • :leading_comments - a list holding the comments found before the node.
  • :trailing_comments - a list holding the comments found before the end of the node. For example, comments right before the end keyword.

Comments are the same maps returned by Code.string_to_quoted_with_comments/2.

Specs

parse_string!(String.t()) :: Macro.t()

Same as parse_string/1 but raises on error.

Specs

postwalk(Macro.t(), postwalk_function()) :: Macro.t()

Performs a depth-first post-order traversal of a quoted expression, correcting line numbers as it goes.

See postwalk/3 for more information.

postwalk(quoted, acc, fun)

Specs

postwalk(Macro.t(), term(), postwalk_function()) :: {Macro.t(), term()}

Performs a depth-first post-order traversal of a quoted expression with an accumulator, correcting line numbers as it goes.

fun is a function that will receive the current node as the first argument and the traversal state as the second one. It must return a {quoted, state} tuple, in the same way it would return {quoted, acc} when using Macro.postwalk/3.

Before fun is called on a node, the node's line numbers will be corrected by state.line_correction. If you need to manually correct the line number of a node, use correct_lines/2.

The state is a map with the following keys:

  • :line_correction - an integer representing how many lines subsequent nodes should be shifted. If the function adds more nodes to the tree that should go on a new line, the line numbers of the subsequent nodes need to be updated in order for comments to be correctly placed during formatting. If the function makes this kind of change, it must update the :line_correction field by adding the number of lines that should be shifted. Note that this field is cumulative; setting it to 0 resets it for the rest of the traversal. Starts at 0.

  • :acc - The accumulator. Defaults to nil if none is given.
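To illustrate this contract, here is a sketch built only on Macro.postwalk/3. The module name PostwalkSketch is hypothetical, and unlike correct_lines/2 it only shifts each node's own :line (not :closing, :do, :end or :end_of_expression); it is not Sourceror's implementation. In the usage example, the callback pretends that :foo was expanded into two lines and bumps :line_correction, so every node visited afterwards is shifted.

```elixir
defmodule PostwalkSketch do
  # Walks `quoted` post-order, shifting each node's :line by the accumulated
  # correction before handing it to `fun`, which receives and returns the
  # state map %{line_correction: integer, acc: term}.
  def postwalk(quoted, acc, fun) do
    {quoted, state} =
      Macro.postwalk(quoted, %{line_correction: 0, acc: acc}, fn node, state ->
        fun.(shift(node, state.line_correction), state)
      end)

    {quoted, state.acc}
  end

  defp shift({form, meta, args}, correction) when is_list(meta) and correction != 0 do
    if Keyword.has_key?(meta, :line) do
      {form, Keyword.update!(meta, :line, &(&1 + correction)), args}
    else
      {form, meta, args}
    end
  end

  defp shift(node, _correction), do: node
end

quoted = {:__block__, [], [{:foo, [line: 1], []}, {:bar, [line: 2], []}]}

{result, expanded} =
  PostwalkSketch.postwalk(quoted, 0, fn
    # Pretend :foo was expanded into two lines: shift everything after it by 1.
    {:foo, _, _} = node, state ->
      {node, %{state | line_correction: state.line_correction + 1, acc: state.acc + 1}}

    node, state ->
      {node, state}
  end)
```

Here :bar moves from line 2 to line 3, and the accumulator counts one expansion. Because the traversal is post-order, an enclosing node that carries a :line would also be shifted once its children have been visited.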

quoted_to_algebra(quoted, opts) (macro)

A wrapper around Code.quoted_to_algebra/2 for compatibility with pre 1.13 Elixir versions.

string_to_quoted(string, opts) (macro)

A wrapper around Code.string_to_quoted_with_comments/2 for compatibility with pre 1.13 Elixir versions.

string_to_quoted!(string, opts) (macro)

A wrapper around Code.string_to_quoted_with_comments!/2 for compatibility with pre 1.13 Elixir versions.

to_string(quoted, opts \\ [])

Specs

to_string(Macro.t(), keyword()) :: String.t()

Converts a quoted expression to a string.

The comments' line numbers are ignored, and the line number of the associated node is used when formatting the code.

Options

  • :line_length - The max line length for the formatted code.

  • :indent - how many indentations to insert at the start of each line. Note that this only prepends the indents without checking the indentation of nested blocks. Defaults to 0.

  • :indent_type - the type of indentation to use. It can be one of :spaces, :single_space or :tabs. Defaults to :spaces.

Specs

update_args(Macro.t(), ([Macro.t()] -> [Macro.t()])) :: Macro.t()

Updates the arguments for the given node.

iex> node = {:foo, [line: 1], [{:__block__, [line: 1], [2]}]}
iex> updater = fn args -> Enum.map(args, &Sourceror.correct_lines(&1, 2)) end
iex> Sourceror.update_args(node, updater)
{:foo, [line: 1], [{:__block__, [line: 3], [2]}]}