Building a linter#

Difficulty: Beginner

This tutorial will show how to use TOML Fortran to build a linter for your configuration files. Linters provide a way to encourage or enforce a certain style or flag up common usage errors.

Target selection#

This tutorial will look into finding lint in the package manifest from the Fortran package manager (fpm). We will use its plugin mechanism to create a new subcommand called lint.

We start with setting up the package manifest for our linter:

fpm.toml#
name = "fpm-lint"
version = "0.1.0"

[dependencies]
toml-f.git = "https://github.com/toml-lang/toml-f.git"

Configuration of the linter#

To configure our linter we will use the extra section in the manifest which is specially reserved for tools integrating with fpm and boldly claim extra.fpm.lint as our configuration section. Using the package manifest provides us with two advantages, first this document will be present in all projects using fpm, second if we can read our configuration from the manifest, we are already sure it is valid TOML.

fpm.toml#
# ...
[extra.fpm.lint]
package-name = true
bare-keys = true

Now we will set up our main program to run the linter.

app/main.f90#
program main
  use, intrinsic :: iso_fortran_env, only : stderr => error_unit, stdout => output_unit
  use fpm_lint_utils, only : get_argument
  use tomlf, only : toml_table, toml_load, toml_error, toml_context, toml_parser_config
  implicit none
  logical, parameter :: color = .true.
  character(:), allocatable :: manifest
  type(toml_table), allocatable :: table
  type(toml_error), allocatable :: error
  type(toml_context) :: context

  call get_argument(1, manifest)
  if (.not.allocated(manifest)) manifest = "fpm.toml"

  call toml_load(table, manifest, error=error, context=context, &
    & config=toml_parser_config(color=color))
  call handle_error(error)

contains

  subroutine handle_error(error)
    type(toml_error), intent(in), optional :: error
    if (present(error)) then
      write(stderr, '(a)') error%message
      stop 1
    end if
  end subroutine handle_error

end program main

We create a utility module for the get_argument function used to retrieve the manifest name, in most cases we can default to fpm.toml, but for testing it is convenient to pass an argument.

src/utils.f90#
!> Misc utilities for the fpm-lint implementation
module fpm_lint_utils
  implicit none
  private

  public :: get_argument

contains

  !> Obtain the command line argument at a given index
  subroutine get_argument(idx, arg)
    !> Index of command line argument, range [0:command_argument_count()]
    integer, intent(in) :: idx
    !> Command line argument
    character(len=:), allocatable, intent(out) :: arg

    integer :: length, stat

    call get_command_argument(idx, length=length, status=stat)
    if (stat == 0) then
      allocate(character(len=length) :: arg, stat=stat)
    end if

    if (stat == 0 .and. length > 0) then
      call get_command_argument(idx, arg, status=stat)
      if (stat /= 0) deallocate(arg)
    end if
  end subroutine get_argument

end module fpm_lint_utils

The first error source we can encounter stems from parsing the TOML document itself. This is outside of our responsibility to handle, still we want to check whether we can report the error correctly.

fpm.toml (invalid)#
name = "demo"

[extra.fpm.lint]
package-name =
bare-keys = true

Running the linter on this document will break with the following message produced by the toml_load procedure.

❯ fpm run -- invalid.toml
error: Invalid expression for value
 --> invalid.toml:4:15
  |
4 | package-name =
  |               ^ unexpected newline
  |

With this case covered we proceed with reading the configuration for our linter.

Our configuration from the package manifest will be stored in a lint_config type which we define in a separate module. Reading the configuration will happen from the root table, meaning we have to advance through several subtables first before we can process the options for our linter. We want to report errors with rich context information here as well, therefore we request the origin in every call to the get_value interface and produce a report using the context we obtained in the main program.

src/config.f90#
!> Configuration data for the manifest linting
module fpm_lint_config
  use tomlf, only : toml_table, toml_context, toml_terminal, toml_error, &
    & toml_stat, get_value
  implicit none

  !> Configuration for the manifest linting
  type :: lint_config
    !> Check package name
    logical :: package_name
    !> Check all key paths
    logical :: bare_keys
  end type lint_config

contains

  !> Load the configuration for the linter from the package manifest
  subroutine load_lint_config(config, table, context, terminal, error)
    !> Configuration for the linter
    type(lint_config), intent(out) :: config
    !> TOML data structure representing the manifest
    type(toml_table), intent(inout) :: table
    !> Context describing the data structure
    type(toml_context), intent(in) :: context
    !> Terminal for output
    type(toml_terminal), intent(in) :: terminal
    !> Error handler
    type(toml_error), allocatable, intent(out) :: error

    integer :: origin, stat
    type(toml_table), pointer :: child1, child2, child3

    call get_value(table, "extra", child1, origin=origin)
    if (.not.associated(child1)) then
      call make_error(error, context%report("The 'extra' table is missing.", &
        & origin, "expected table", color=terminal))
      return
    end if
    call get_value(child1, "fpm", child2, origin=origin)
    if (.not.associated(child2)) then
      call make_error(error, context%report("The 'fpm' table is missing.", &
        & origin, "expected table", color=terminal))
      return
    end if
    call get_value(child2, "lint", child3, origin=origin)
    if (.not.associated(child3)) then
      call make_error(error, context%report("The 'lint' table is missing.", &
        & origin, "expected table", color=terminal))
      return
    end if

    call get_value(child3, "package-name", config%package_name, .true., &
      & stat=stat, origin=origin)
    if (stat /= toml_stat%success) then
      call make_error(error, context%report("Entry in 'package-name' must be boolean", &
        & origin, "expected boolean value", color=terminal))
      return
    end if
    call get_value(child3, "bare-keys", config%bare_keys, .true., &
      & stat=stat, origin=origin)
    if (stat /= toml_stat%success) then
      call make_error(error, context%report("Entry in 'bare-key' must be boolean", &
        & origin, "expected boolean value", color=terminal))
      return
    end if
  end subroutine load_lint_config

  !> Create an error message
  subroutine make_error(error, message)
    !> Error handler
    type(toml_error), allocatable, intent(out) :: error
    !> Message to be displayed
    character(len=*), intent(in) :: message

    allocate(error)
    error%message = message
    error%stat = toml_stat%fatal
  end subroutine make_error

end module fpm_lint_config

For convenience, we defined a make_error routine to allocate the error handler and store our report from the context. At this point, we should check whether our error reporting works and run the linter on an incorrect TOML document.

fpm.toml#
name = "demo"

[extra.fpm.lint]
package-name = "true"
bare-keys = true
current main program

Putting everything together in the main program should look like this.

app/main.f90#
program main
  use, intrinsic :: iso_fortran_env, only : stderr => error_unit, stdout => output_unit
  use fpm_lint_config, only : lint_config, load_lint_config
  use fpm_lint_utils, only : get_argument
  use tomlf, only : toml_table, toml_load, toml_error, toml_context, toml_parser_config, &
    & toml_terminal
  implicit none
  logical, parameter :: color = .true.
  character(:), allocatable :: manifest
  type(toml_terminal) :: terminal
  type(toml_table), allocatable :: table
  type(toml_error), allocatable :: error
  type(toml_context) :: context
  type(lint_config) :: config

  terminal = toml_terminal(color)
  call get_argument(1, manifest)
  if (.not.allocated(manifest)) manifest = "fpm.toml"

  call toml_load(table, manifest, error=error, context=context, &
    & config=toml_parser_config(color=terminal))
  call handle_error(error)

  call load_lint_config(config, table, context, terminal, error)
  call handle_error(error)

contains

  subroutine handle_error(error)
    type(toml_error), intent(in), optional :: error
    if (present(error)) then
      write(stderr, '(a)') error%message
      stop 1
    end if
  end subroutine handle_error

end program main

Running our linter on this file will correctly flag this as an error since a string value is provided rather than a boolean value.

❯ fpm run -- fpm.toml
error: Entry in 'package-name' must be boolean
 --> fpm.toml:4:16-21
  |
4 | package-name = "true"
  |                ^^^^^^ expected boolean value
  |

Finally, we define a logging mechanism to capture our actual linting messages which are not fatal. The logger provides two procedures, add_message to store a message and show_log to display all stored messages.

src/logger.f90#
module fpm_lint_logger
  implicit none
  private

  public :: lint_logger, new_logger


  type :: log_message
    character(:), allocatable :: output
  end type log_message

  type :: lint_logger
    type(log_message), allocatable :: message(:)
  contains
    procedure :: add_message
    procedure :: show_log
  end type lint_logger

contains

  subroutine new_logger(logger)
    type(lint_logger), intent(out) :: logger

    allocate(logger%message(0))
  end subroutine new_logger

  subroutine add_message(logger, message)
    class(lint_logger), intent(inout) :: logger
    character(*), intent(in) :: message

    logger%message = [logger%message, log_message(message)]
  end subroutine add_message

  subroutine show_log(logger, io)
    class(lint_logger), intent(in) :: logger
    integer, intent(in) :: io

    integer :: it

    do it = 1, size(logger%message)
      write(io, '(a)') logger%message(it)%output
    end do
  end subroutine show_log

end module fpm_lint_logger

Bare key paths preferred#

TOML allows to quote keys, however this might become visually distracting if some keys are quoted and others are not. With our package name rule, there should not be the need to quote any keys even in dependency sections.

To determine whether a string is used in the context of a key we need a way to identify all keys. We could check all entries in the data structures by implementing a visitor object which walks through all tables and checks the keys. However, this is somewhat inefficient and we can also miss keys that are not recorded.

fpm.toml#
name = "demo"

[dependencies]
toml-f.git = "https://github.com/toml-f/toml-f"
"toml-f".tag = "v0.2.3"

In this example, the second occurrence of the key toml-f will only reference the table but it is already defined the line before. The quotation marks are visually identifiable as lint and we need a programmatic way to flag this.

Instead of working with the data structure, we will use the parser to record more tokens in the context. Rather than using the context to only report errors, we will use it to identify keys. This is done by increasing the context_detail option in the config keyword of the parser to one. Now all tokens except for whitespace and comments will be recorded.

app/main.f90#
call toml_load(table, manifest, error=error, context=context, &
  & config=toml_parser_config(color=color, context_detail=1))

Tip

Increasing the context_detail to two will also record whitespace and comments. This can be useful when writing checks for whitespace or indentation styles.

Our linter pass will work as follows:

  1. identifying all relevant keys in the manifest

  2. check whether they are keypath tokens

  3. create a report for any key that is a string or a literal

Our implementation reflects this by first collecting an array of toml_key objects in list and then iterating over all entries checking whether they are the correct token_kind.

src/lint.f90#
  !> Entry point for linting the keys in the TOML document
  subroutine lint_keys(logger, config, context, terminal)
    !> Instance of the logger
    type(lint_logger), intent(inout) :: logger
    !> Configuration for the linter
    type(lint_config), intent(in) :: config
    !> Context describing the data structure
    type(toml_context), intent(in) :: context
    !> Terminal for output
    type(toml_terminal), intent(in) :: terminal

    integer :: it
    type(toml_key), allocatable :: list(:)

    call identify_keys(list, context)

    if (config%bare_keys) then
      do it = 1, size(list)
        associate(token => context%token(list(it)%origin))
          if (token%kind /= token_kind%keypath) then
            call logger%add_message(context%report( &
              "String used in key path", &
              list(it)%origin, &
              "use bare key instead", &
              level=toml_level%info, color=terminal))
          end if
        end associate
      end do
    end if
  end subroutine lint_keys

To create the list we need to implement the identify_keys procedure. The rules in TOML for key paths are simple: before an equal sign we can have key paths and keypath can only be present in table bodies or inline tables. This can be implemented by using a stack storing whether the current scope belongs in a table, array, or value.

We will always push a new scope on the respective token opening it, i.e. a value is opened by an equal sign, an array by a right bracket, and an inline table by a right curly brace. To distinguish table headers from inline arrays we only push arrays on our stack after an equal sign. Finally, we default to a table scope if no other scope is present and we have collected all required rules to identify key paths. Similarly, we can identify the endings of the scopes.

We then can check whether the current scope on the top of the stack allows key paths and record those in our list.

src/lint.f90#
  !> Collect all key paths used in TOML document
  subroutine identify_keys(list, context)
    !> List of all keypaths in the TOML document
    type(toml_key), allocatable, intent(out) :: list(:)
    !> Context describing the data structure
    type(toml_context), intent(in) :: context

    integer, parameter :: table_scope = 1, array_scope = 2, value_scope = 3
    integer :: it, top
    integer, allocatable :: scopes(:)

    allocate(list(0))

    top = 0
    call resize(scopes)

    ! Documents always start with a table scope
    call push_back(scopes, top, table_scope)
    do it = 1, context%top
      select case(context%token(it)%kind)
      case(token_kind%keypath)
        ! Record all key path
        associate(token => context%token(it))
          list = [list, toml_key(context%source(token%first:token%last), it)]
        end associate

      case(token_kind%string, token_kind%literal)
        ! Record all strings used in key paths
        if (scopes(top) == table_scope) then
          associate(token => context%token(it))
            list = [list, toml_key(context%source(token%first+1:token%last-1), it)]
          end associate
        end if

      case(token_kind%equal)  ! Open value scope
        call push_back(scopes, top, value_scope)

      case(token_kind%lbrace)  ! Open inline table scope
        call push_back(scopes, top, table_scope)

      case(token_kind%lbracket)  ! Open array scope
        if (scopes(top) /= table_scope) then
          call push_back(scopes, top, array_scope)
        end if

      case(token_kind%newline)  ! Close value scope in key-value pair 
        call pop(scopes, top, value_scope)

      case(token_kind%rbrace)  ! Close value and table scope in inline table
        call pop(scopes, top, value_scope)
        call pop(scopes, top, table_scope)

      case(token_kind%comma)  ! Close value scope in inline table
        call pop(scopes, top, value_scope)

      case(token_kind%rbracket)  ! Close array scope
        call pop(scopes, top, array_scope)

      end select
    end do

  contains

    !> Push a new scope onto the stack
    pure subroutine push_back(scopes, top, this_scope)
      !> Stack top
      integer, intent(inout) :: top
      !> Current stack of scopes
      integer, allocatable, intent(inout) :: scopes(:)
      !> Scope to push onto the stack
      integer, intent(in) :: this_scope

      top = top + 1
      if (top > size(scopes)) call resize(scopes)
      scopes(top) = this_scope
    end subroutine push_back

    !> Remove a matching scope from the stack
    subroutine pop(scopes, top, this_scope)
      !> Stack top
      integer, intent(inout) :: top
      !> Current stack of scopes
      integer, allocatable, intent(inout) :: scopes(:)
      !> Scope to remove from the stack
      integer, intent(in) :: this_scope

      if (top > 0) then
        if (scopes(top) == this_scope) top = top - 1
      end if
    end subroutine pop

  end subroutine identify_keys

For convenience, we implement a push_back and pop function to add and remove scopes from our stack. The pop function will additionally perform a check whether we want to remove a matching scope and save us some repetition in the loop this way.

In our utility module, we implement the resize procedure for an array of integers

src/utils.f90#
!> Misc utilities for the fpm-lint implementation
module fpm_lint_utils
  implicit none
  private

  public :: resize
  public :: get_argument

  !> Resize a 1D array to a new size
  interface resize
    module procedure :: resize_ints
  end interface resize

contains

  !> Reallocate list of integer
  pure subroutine resize_ints(var, n)
    !> Instance of the array to be resized
    integer, allocatable, intent(inout) :: var(:)
    !> Dimension of the final array size
    integer, intent(in), optional :: n

    integer, allocatable :: tmp(:)
    integer :: this_size, new_size
    integer, parameter :: initial_size = 8

    if (allocated(var)) then
      this_size = size(var, 1)
      call move_alloc(var, tmp)
    else
      this_size = initial_size
    end if

    if (present(n)) then
      new_size = n
    else
      new_size = this_size + this_size/2 + 1
    end if

    allocate(var(new_size))

    if (allocated(tmp)) then
      this_size = min(size(tmp, 1), size(var, 1))
      var(:this_size) = tmp(:this_size)
      deallocate(tmp)
    end if
  end subroutine resize_ints

end module fpm_lint_utils
current main program

Putting everything together in the main program should look like this.

app/main.f90#
program main
  use, intrinsic :: iso_fortran_env, only : stderr => error_unit, stdout => output_unit
  use fpm_lint, only : lint_config, load_lint_config, lint_logger, new_logger, &
    & lint_data, lint_keys, get_argument
  use tomlf, only : toml_table, toml_load, toml_error, toml_context, toml_parser_config, &
    & toml_terminal
  implicit none
  logical, parameter :: color = .true.
  integer, parameter :: detail = 1
  character(:), allocatable :: manifest
  type(toml_terminal) :: terminal
  type(toml_table), allocatable :: table
  type(toml_error), allocatable :: error
  type(toml_context) :: context
  type(lint_logger) :: logger
  type(lint_config) :: config

  terminal = toml_terminal(color)
  call get_argument(1, manifest)
  if (.not.allocated(manifest)) manifest = "fpm.toml"

  call toml_load(table, manifest, error=error, context=context, &
    & config=toml_parser_config(color=terminal, context_detail=detail))
  call handle_error(error)

  call load_lint_config(config, table, context, terminal, error)
  call handle_error(error)

  call new_logger(logger)

  call lint_data(logger, config, table, context, terminal)
  call lint_keys(logger, config, context, terminal)

  call logger%show_log(stdout)

contains

  subroutine handle_error(error)
    type(toml_error), intent(in), optional :: error
    if (present(error)) then
      write(stderr, '(a)') error%message
      stop 1
    end if
  end subroutine handle_error

end program main

At this point, we can now add a call in our main program to run the linter.

❯ fpm run -- fpm.toml
info: String used in key path
 --> fpm.toml:5:1-8
  |
5 | "toml-f".tag = "v0.2.3"
  | ^^^^^^^^ use bare key instead
  |

Now for something more tricky with an inline table to check whether our scoping rules are working correctly.

fpm.toml#
name = "demo"

[dependencies]
toml-f = {git = "https://github.com/toml-f/toml-f", "tag" = "v0.2.3"}

Our linter can correctly identify the tag entry as a string in the key path context and produces the appropriate message.

❯ fpm run -- fpm.toml
info: String used in key path
 --> fpm.toml:4:53-57
  |
4 | toml-f = {git = "https://github.com/toml-f/toml-f", "tag" = "v0.2.3"}
  |                                                     ^^^^^ use bare key instead
  |

Exercise

Previously, we flagged the usage of a literal string as a value for the package name, however a package manifest can contain much more string values.

Create a check for all string values in the manifest to ensure they use double-quotes. Collect string values (string, literal, mstring, and mliteral) from array and value scopes for this purpose.

Can you make a meaningful suggestion if a literal string contains characters that must be escaped in a double-quoted string?

Summary#

This concludes the linting we wanted to implement for the fpm package manifest. For a feature-complete linter, the rule set to check for is usually growing with time and might also shift as new rules are adopted. Our linter currently provides only a few rules but has the potential to include more checks as the need arises.

Exercise

Our output is currently in the order of the checks, rather than in the order of reports occurring in the TOML document. The output of the reports might become more intuitive if it was sorted according to the source lines.

Record the first character in the output together with the messages in the logger. Have the logger sort the messages according to their order before printing them.

Important

In this tutorial, you have learned how to report custom error messages in your TOML input data. You can now

  • report colorized error messages with rich context information

  • create error messages when reading a TOML data structure

  • control the details captured in the context describing the TOML document

  • check a TOML document based on the token information in the context