Building a linter

Building a linter#

This tutorial will show how to use TOML Fortran to build a linter for your configuration files. Linters provide a way to encourage or enforce a certain style or flag up common usage errors.

Target selection#

This tutorial will look into finding lint in the package manifest from the Fortran package manager (fpm). We will use its plugin mechanism to create a new subcommand called lint.

We start with setting up the package manifest for our linter:

fpm.toml#

name = "fpm-lint"
version = "0.1.0"

[dependencies]
toml-f.git = "https://github.com/toml-lang/toml-f.git"

Configuration of the linter#

To configure our linter we will use the extra section in the manifest which is specially reserved for tools integrating with fpm and boldly claim extra.fpm.lint as our configuration section. Using the package manifest provides us with two advantages, first this document will be present in all projects using fpm, second if we can read our configuration from the manifest, we are already sure it is valid TOML.

fpm.toml#

# ...
[extra.fpm.lint]
package-name = true
bare-keys = true

Now we will set up our main program to run the linter.

app/main.f90#

program main
  use, intrinsic :: iso_fortran_env, only : stderr => error_unit, stdout => output_unit
  use fpm_lint_utils, only : get_argument
  use tomlf, only : toml_table, toml_load, toml_error, toml_context, toml_parser_config
  implicit none
  logical, parameter :: color = .true.
  character(:), allocatable :: manifest
  type(toml_table), allocatable :: table
  type(toml_error), allocatable :: error
  type(toml_context) :: context

  call get_argument(1, manifest)
  if (.not.allocated(manifest)) manifest = "fpm.toml"

  call toml_load(table, manifest, error=error, context=context, &
    & config=toml_parser_config(color=color))
  call handle_error(error)

contains

  subroutine handle_error(error)
    type(toml_error), intent(in), optional :: error
    if (present(error)) then
      write(stderr, '(a)') error%message
      stop 1
    end if
  end subroutine handle_error

end program main

We create a utility module for the get_argument function used to retrieve the manifest name, in most cases we can default to fpm.toml, but for testing it is convenient to pass an argument.

src/utils.f90#

!> Misc utilities for the fpm-lint implementation
module fpm_lint_utils
  implicit none
  private

  public :: get_argument

contains

  !> Obtain the command line argument at a given index
  subroutine get_argument(idx, arg)
    !> Index of command line argument, range [0:command_argument_count()]
    integer, intent(in) :: idx
    !> Command line argument
    character(len=:), allocatable, intent(out) :: arg

    integer :: length, stat

    call get_command_argument(idx, length=length, status=stat)
    if (stat == 0) then
      allocate(character(len=length) :: arg, stat=stat)
    end if

    if (stat == 0 .and. length > 0) then
      call get_command_argument(idx, arg, status=stat)
      if (stat /= 0) deallocate(arg)
    end if
  end subroutine get_argument

end module fpm_lint_utils

The first error source we can encounter stems from parsing the TOML document itself. This is outside of our responsibility to handle, still we want to check whether we can report the error correctly.

fpm.toml (invalid)#

name = "demo"

[extra.fpm.lint]
package-name =
bare-keys = true

Running the linter on this document will break with the following message produced by the toml_load procedure.

❯ fpm run -- invalid.toml
error: Invalid expression for value
 --> invalid.toml:4:15
  |
4 | package-name =
  |               ^ unexpected newline
  |

With this case covered we proceed with reading the configuration for our linter.

Our configuration from the package manifest will be stored in a lint_config type which we define in a separate module. Reading the configuration will happen from the root table, meaning we have to advance through several subtables first before we can process the options for our linter. We want to report errors with rich context information here as well, therefore we request the origin in every call to the get_value interface and produce a report using the context we obtained in the main program.

src/config.f90#

!> Configuration data for the manifest linting
module fpm_lint_config
  use tomlf, only : toml_table, toml_context, toml_terminal, toml_error, &
    & toml_stat, get_value
  implicit none

  !> Configuration for the manifest linting
  type :: lint_config
    !> Check package name
    logical :: package_name
    !> Check all key paths
    logical :: bare_keys
  end type lint_config

contains

  !> Load the configuration for the linter from the package manifest
  subroutine load_lint_config(config, table, context, terminal, error)
    !> Configuration for the linter
    type(lint_config), intent(out) :: config
    !> TOML data structure representing the manifest
    type(toml_table), intent(inout) :: table
    !> Context describing the data structure
    type(toml_context), intent(in) :: context
    !> Terminal for output
    type(toml_terminal), intent(in) :: terminal
    !> Error handler
    type(toml_error), allocatable, intent(out) :: error

    integer :: origin, stat
    type(toml_table), pointer :: child1, child2, child3

    call get_value(table, "extra", child1, origin=origin)
    if (.not.associated(child1)) then
      call make_error(error, context%report("The 'extra' table is missing.", &
        & origin, "expected table", color=terminal))
      return
    end if
    call get_value(child1, "fpm", child2, origin=origin)
    if (.not.associated(child2)) then
      call make_error(error, context%report("The 'fpm' table is missing.", &
        & origin, "expected table", color=terminal))
      return
    end if
    call get_value(child2, "lint", child3, origin=origin)
    if (.not.associated(child3)) then
      call make_error(error, context%report("The 'lint' table is missing.", &
        & origin, "expected table", color=terminal))
      return
    end if

    call get_value(child3, "package-name", config%package_name, .true., &
      & stat=stat, origin=origin)
    if (stat /= toml_stat%success) then
      call make_error(error, context%report("Entry in 'package-name' must be boolean", &
        & origin, "expected boolean value", color=terminal))
      return
    end if
    call get_value(child3, "bare-keys", config%bare_keys, .true., &
      & stat=stat, origin=origin)
    if (stat /= toml_stat%success) then
      call make_error(error, context%report("Entry in 'bare-key' must be boolean", &
        & origin, "expected boolean value", color=terminal))
      return
    end if
  end subroutine load_lint_config

  !> Create an error message
  subroutine make_error(error, message)
    !> Error handler
    type(toml_error), allocatable, intent(out) :: error
    !> Message to be displayed
    character(len=*), intent(in) :: message

    allocate(error)
    error%message = message
    error%stat = toml_stat%fatal
  end subroutine make_error

end module fpm_lint_config

For convenience, we defined a make_error routine to allocate the error handler and store our report from the context. At this point, we should check whether our error reporting works and run the linter on an incorrect TOML document.

fpm.toml#

name = "demo"

[extra.fpm.lint]
package-name = "true"
bare-keys = true

Running our linter on this file will correctly flag this as an error since a string value is provided rather than a boolean value.

❯ fpm run -- fpm.toml
error: Entry in 'package-name' must be boolean
 --> fpm.toml:4:16-21
  |
4 | package-name = "true"
  |                ^^^^^^ expected boolean value
  |

Finally, we define a logging mechanism to capture our actual linting messages which are not fatal. The logger provides two procedures, add_message to store a message and show_log to display all stored messages.

src/logger.f90#

module fpm_lint_logger
  implicit none
  private

  public :: lint_logger, new_logger


  type :: log_message
    character(:), allocatable :: output
  end type log_message

  type :: lint_logger
    type(log_message), allocatable :: message(:)
  contains
    procedure :: add_message
    procedure :: show_log
  end type lint_logger

contains

  subroutine new_logger(logger)
    type(lint_logger), intent(out) :: logger

    allocate(logger%message(0))
  end subroutine new_logger

  subroutine add_message(logger, message)
    class(lint_logger), intent(inout) :: logger
    character(*), intent(in) :: message

    logger%message = [logger%message, log_message(message)]
  end subroutine add_message

  subroutine show_log(logger, io)
    class(lint_logger), intent(in) :: logger
    integer, intent(in) :: io

    integer :: it

    do it = 1, size(logger%message)
      write(io, '(a)') logger%message(it)%output
    end do
  end subroutine show_log

end module fpm_lint_logger

Recommended package name#

As a first linting check we will inspect the package name, for this we will apply the following rules:

the package name should be a TOML bare key to not require quotes in dependency sections, characters like dots, colons, or slashes are not allowed
TOML generally favors lowercase dashed keys, therefore we will discourage capitalization (camelCase and PascalCase) as well as underscores (snake_case)
there are several ways to declare strings in TOML, we want to favor the normal string one

An example of a package name we would disallow would be fpmLinter as seen in the manifest below.

fpm.toml#

name = "fpmLinter"

Let’s start with our implementation of this check. For convenience we will reexport the other modules from the fpm_lint module, this allows one clean import in the main program. Then we define the lint_data procedure, where we first check whether the name key is present, if not we create a message at the info level and leave our block scope, as all further checks rely on the presence of the entry.

We can now check whether the entry is provided as a string or maybe as something else, like a literal string, which we can flag. Furthermore, we verify that the package name uses only lowercase letters, numbers, and dashes with the verify intrinsic.

src/lint.f90#

!> Linter for package manifests used with the Fortran package manager
module fpm_lint
  use fpm_lint_config, only : lint_config, load_lint_config
  use fpm_lint_logger, only : lint_logger, new_logger
  use fpm_lint_utils, only : resize, get_argument
  use tomlf, only : toml_table, toml_context, toml_terminal, toml_error, toml_level, &
    & toml_key, get_value
  use tomlf_de_token, only : token_kind, stringify
  implicit none
  private

  public :: lint_config, load_lint_config
  public :: lint_logger, new_logger
  public :: lint_data
  public :: get_argument

contains

  !> Entry point for linting the data structure representing the package manifest
  subroutine lint_data(logger, config, table, context, terminal)
    !> Instance of the logger
    type(lint_logger), intent(inout) :: logger
    !> Configuration for the linter
    type(lint_config), intent(in) :: config
    !> TOML data structure
    type(toml_table), intent(inout) :: table
    !> Context describing the data structure
    type(toml_context), intent(in) :: context
    !> Terminal for output
    type(toml_terminal), intent(in) :: terminal

    if (config%package_name) then
      check_package_name: block
        character(:), allocatable :: package_name
        integer :: origin

        call get_value(table, "name", package_name, origin=origin)

        if (.not.allocated(package_name)) then
          call logger%add_message(context%report( &
            "Package name entry is required in the top-level", &
            origin, &
            level=toml_level%info, color=terminal))
          exit check_package_name
        end if

        if (context%token(origin)%kind /= token_kind%string) then
          call logger%add_message(context%report( &
            "Package name should not be a "//stringify(context%token(origin)), &
            origin, &
            "prefer string value (double quotes)", &
            level=toml_level%info, color=terminal))
        end if

        if (verify(package_name, "abcdefghijklmnopqrstuvwxyz0123456789-") > 0) then
          call logger%add_message(context%report( &
            "Package name should be lowercase with dashes", &
            origin, &
            level=toml_level%info, color=terminal))
        end if
      end block check_package_name
    end if

  end subroutine lint_data

end module fpm_lint

Tip

The toml_level parameter provides a statically initialized derived type enumerating all available report levels. Similarly, the token_kind parameter provides an enumeration of the token kinds. You can think of it as an enumerator with a proper namespace.

We check this on the camelCase package name from above and can find the following output.

❯ fpm run -- fpm.toml
info: Package name should be lowercase with dashes
 --> fpm.toml:1:8-18
  |
1 | name = "fpmLinter"
  |        ^^^^^^^^^^^
  |

Exercise

Add a check for the length of the package name, everything under three characters is probably a bad choice, so is a too long package name.

Create an example to trigger the error with your new check. What happens if a too long camelCase package name is used?

Bare key paths preferred#

TOML allows to quote keys, however this might become visually distracting if some keys are quoted and others are not. With our package name rule, there should not be the need to quote any keys even in dependency sections.

To determine whether a string is used in the context of a key we need a way to identify all keys. We could check all entries in the data structures by implementing a visitor object which walks through all tables and checks the keys. However, this is somewhat inefficient and we can also miss keys that are not recorded.

fpm.toml#

name = "demo"

[dependencies]
toml-f.git = "https://github.com/toml-f/toml-f"
"toml-f".tag = "v0.2.3"

In this example, the second occurrence of the key toml-f will only reference the table but it is already defined the line before. The quotation marks are visually identifiable as lint and we need a programmatic way to flag this.

Instead of working with the data structure, we will use the parser to record more tokens in the context. Rather than using the context to only report errors, we will use it to identify keys. This is done by increasing the context_detail option in the config keyword of the parser to one. Now all tokens except for whitespace and comments will be recorded.

app/main.f90#

call toml_load(table, manifest, error=error, context=context, &
  & config=toml_parser_config(color=color, context_detail=1))

Tip

Increasing the context_detail to two will also record whitespace and comments. This can be useful when writing checks for whitespace or indentation styles.

Our linter pass will work as follows:

identifying all relevant keys in the manifest
check whether they are keypath tokens
create a report for any key that is a string or a literal

Our implementation reflects this by first collecting an array of toml_key objects in list and then iterating over all entries checking whether they are the correct token_kind.

src/lint.f90#

  !> Entry point for linting the keys in the TOML document
  subroutine lint_keys(logger, config, context, terminal)
    !> Instance of the logger
    type(lint_logger), intent(inout) :: logger
    !> Configuration for the linter
    type(lint_config), intent(in) :: config
    !> Context describing the data structure
    type(toml_context), intent(in) :: context
    !> Terminal for output
    type(toml_terminal), intent(in) :: terminal

    integer :: it
    type(toml_key), allocatable :: list(:)

    call identify_keys(list, context)

    if (config%bare_keys) then
      do it = 1, size(list)
        associate(token => context%token(list(it)%origin))
          if (token%kind /= token_kind%keypath) then
            call logger%add_message(context%report( &
              "String used in key path", &
              list(it)%origin, &
              "use bare key instead", &
              level=toml_level%info, color=terminal))
          end if
        end associate
      end do
    end if
  end subroutine lint_keys

To create the list we need to implement the identify_keys procedure. The rules in TOML for key paths are simple: before an equal sign we can have key paths and keypath can only be present in table bodies or inline tables. This can be implemented by using a stack storing whether the current scope belongs in a table, array, or value.

We will always push a new scope on the respective token opening it, i.e. a value is opened by an equal sign, an array by a right bracket, and an inline table by a right curly brace. To distinguish table headers from inline arrays we only push arrays on our stack after an equal sign. Finally, we default to a table scope if no other scope is present and we have collected all required rules to identify key paths. Similarly, we can identify the endings of the scopes.

We then can check whether the current scope on the top of the stack allows key paths and record those in our list.

src/lint.f90#

  !> Collect all key paths used in TOML document
  subroutine identify_keys(list, context)
    !> List of all keypaths in the TOML document
    type(toml_key), allocatable, intent(out) :: list(:)
    !> Context describing the data structure
    type(toml_context), intent(in) :: context

    integer, parameter :: table_scope = 1, array_scope = 2, value_scope = 3
    integer :: it, top
    integer, allocatable :: scopes(:)

    allocate(list(0))

    top = 0
    call resize(scopes)

    ! Documents always start with a table scope
    call push_back(scopes, top, table_scope)
    do it = 1, context%top
      select case(context%token(it)%kind)
      case(token_kind%keypath)
        ! Record all key path
        associate(token => context%token(it))
          list = [list, toml_key(context%source(token%first:token%last), it)]
        end associate

      case(token_kind%string, token_kind%literal)
        ! Record all strings used in key paths
        if (scopes(top) == table_scope) then
          associate(token => context%token(it))
            list = [list, toml_key(context%source(token%first+1:token%last-1), it)]
          end associate
        end if

      case(token_kind%equal)  ! Open value scope
        call push_back(scopes, top, value_scope)

      case(token_kind%lbrace)  ! Open inline table scope
        call push_back(scopes, top, table_scope)

      case(token_kind%lbracket)  ! Open array scope
        if (scopes(top) /= table_scope) then
          call push_back(scopes, top, array_scope)
        end if

      case(token_kind%newline)  ! Close value scope in key-value pair 
        call pop(scopes, top, value_scope)

      case(token_kind%rbrace)  ! Close value and table scope in inline table
        call pop(scopes, top, value_scope)
        call pop(scopes, top, table_scope)

      case(token_kind%comma)  ! Close value scope in inline table
        call pop(scopes, top, value_scope)

      case(token_kind%rbracket)  ! Close array scope
        call pop(scopes, top, array_scope)

      end select
    end do

  contains

    !> Push a new scope onto the stack
    pure subroutine push_back(scopes, top, this_scope)
      !> Stack top
      integer, intent(inout) :: top
      !> Current stack of scopes
      integer, allocatable, intent(inout) :: scopes(:)
      !> Scope to push onto the stack
      integer, intent(in) :: this_scope

      top = top + 1
      if (top > size(scopes)) call resize(scopes)
      scopes(top) = this_scope
    end subroutine push_back

    !> Remove a matching scope from the stack
    subroutine pop(scopes, top, this_scope)
      !> Stack top
      integer, intent(inout) :: top
      !> Current stack of scopes
      integer, allocatable, intent(inout) :: scopes(:)
      !> Scope to remove from the stack
      integer, intent(in) :: this_scope

      if (top > 0) then
        if (scopes(top) == this_scope) top = top - 1
      end if
    end subroutine pop

  end subroutine identify_keys

For convenience, we implement a push_back and pop function to add and remove scopes from our stack. The pop function will additionally perform a check whether we want to remove a matching scope and save us some repetition in the loop this way.

In our utility module, we implement the resize procedure for an array of integers

src/utils.f90#

!> Misc utilities for the fpm-lint implementation
module fpm_lint_utils
  implicit none
  private

  public :: resize
  public :: get_argument

  !> Resize a 1D array to a new size
  interface resize
    module procedure :: resize_ints
  end interface resize

contains

  !> Reallocate list of integer
  pure subroutine resize_ints(var, n)
    !> Instance of the array to be resized
    integer, allocatable, intent(inout) :: var(:)
    !> Dimension of the final array size
    integer, intent(in), optional :: n

    integer, allocatable :: tmp(:)
    integer :: this_size, new_size
    integer, parameter :: initial_size = 8

    if (allocated(var)) then
      this_size = size(var, 1)
      call move_alloc(var, tmp)
    else
      this_size = initial_size
    end if

    if (present(n)) then
      new_size = n
    else
      new_size = this_size + this_size/2 + 1
    end if

    allocate(var(new_size))

    if (allocated(tmp)) then
      this_size = min(size(tmp, 1), size(var, 1))
      var(:this_size) = tmp(:this_size)
      deallocate(tmp)
    end if
  end subroutine resize_ints

end module fpm_lint_utils

At this point, we can now add a call in our main program to run the linter.

❯ fpm run -- fpm.toml
info: String used in key path
 --> fpm.toml:5:1-8
  |
5 | "toml-f".tag = "v0.2.3"
  | ^^^^^^^^ use bare key instead
  |

Now for something more tricky with an inline table to check whether our scoping rules are working correctly.

fpm.toml#

name = "demo"

[dependencies]
toml-f = {git = "https://github.com/toml-f/toml-f", "tag" = "v0.2.3"}

Our linter can correctly identify the tag entry as a string in the key path context and produces the appropriate message.

❯ fpm run -- fpm.toml
info: String used in key path
 --> fpm.toml:4:53-57
  |
4 | toml-f = {git = "https://github.com/toml-f/toml-f", "tag" = "v0.2.3"}
  |                                                     ^^^^^ use bare key instead
  |

Exercise

Previously, we flagged the usage of a literal string as a value for the package name, however a package manifest can contain much more string values.

Create a check for all string values in the manifest to ensure they use double-quotes. Collect string values (string, literal, mstring, and mliteral) from array and value scopes for this purpose.

Can you make a meaningful suggestion if a literal string contains characters that must be escaped in a double-quoted string?

Summary#

This concludes the linting we wanted to implement for the fpm package manifest. For a feature-complete linter, the rule set to check for is usually growing with time and might also shift as new rules are adopted. Our linter currently provides only a few rules but has the potential to include more checks as the need arises.

Exercise

Our output is currently in the order of the checks, rather than in the order of reports occurring in the TOML document. The output of the reports might become more intuitive if it was sorted according to the source lines.

Record the first character in the output together with the messages in the logger. Have the logger sort the messages according to their order before printing them.

Important

In this tutorial, you have learned how to report custom error messages in your TOML input data. You can now

report colorized error messages with rich context information
create error messages when reading a TOML data structure
control the details captured in the context describing the TOML document
check a TOML document based on the token information in the context

Building a linter

Contents

Building a linter#

Target selection#

Configuration of the linter#

Recommended package name#

Bare key paths preferred#

Summary#