Building a linter
Contents
Building a linter#
This tutorial will show how to use TOML Fortran to build a linter for your configuration files. Linters provide a way to encourage or enforce a certain style or flag up common usage errors.
Target selection#
This tutorial will look into finding lint in the package manifest from the Fortran package manager (fpm).
We will use its plugin mechanism to create a new subcommand called lint
.
We start with setting up the package manifest for our linter:
name = "fpm-lint"
version = "0.1.0"
[dependencies]
toml-f.git = "https://github.com/toml-lang/toml-f.git"
Configuration of the linter#
To configure our linter we will use the extra section in the manifest which is specially reserved for tools integrating with fpm and boldly claim extra.fpm.lint as our configuration section. Using the package manifest provides us with two advantages, first this document will be present in all projects using fpm, second if we can read our configuration from the manifest, we are already sure it is valid TOML.
# ...
[extra.fpm.lint]
package-name = true
bare-keys = true
Now we will set up our main program to run the linter.
program main
use, intrinsic :: iso_fortran_env, only : stderr => error_unit, stdout => output_unit
use fpm_lint_utils, only : get_argument
use tomlf, only : toml_table, toml_load, toml_error, toml_context, toml_parser_config
implicit none
logical, parameter :: color = .true.
character(:), allocatable :: manifest
type(toml_table), allocatable :: table
type(toml_error), allocatable :: error
type(toml_context) :: context
call get_argument(1, manifest)
if (.not.allocated(manifest)) manifest = "fpm.toml"
call toml_load(table, manifest, error=error, context=context, &
& config=toml_parser_config(color=color))
call handle_error(error)
contains
subroutine handle_error(error)
type(toml_error), intent(in), optional :: error
if (present(error)) then
write(stderr, '(a)') error%message
stop 1
end if
end subroutine handle_error
end program main
We create a utility module for the get_argument function used to retrieve the manifest name, in most cases we can default to fpm.toml, but for testing it is convenient to pass an argument.
!> Misc utilities for the fpm-lint implementation
module fpm_lint_utils
implicit none
private
public :: get_argument
contains
!> Obtain the command line argument at a given index
subroutine get_argument(idx, arg)
!> Index of command line argument, range [0:command_argument_count()]
integer, intent(in) :: idx
!> Command line argument
character(len=:), allocatable, intent(out) :: arg
integer :: length, stat
call get_command_argument(idx, length=length, status=stat)
if (stat == 0) then
allocate(character(len=length) :: arg, stat=stat)
end if
if (stat == 0 .and. length > 0) then
call get_command_argument(idx, arg, status=stat)
if (stat /= 0) deallocate(arg)
end if
end subroutine get_argument
end module fpm_lint_utils
The first error source we can encounter stems from parsing the TOML document itself. This is outside of our responsibility to handle, still we want to check whether we can report the error correctly.
name = "demo"
[extra.fpm.lint]
package-name =
bare-keys = true
Running the linter on this document will break with the following message produced by the toml_load procedure.
❯ fpm run -- invalid.toml error: Invalid expression for value --> invalid.toml:4:15 | 4 | package-name = | ^ unexpected newline |
With this case covered we proceed with reading the configuration for our linter.
Our configuration from the package manifest will be stored in a lint_config type which we define in a separate module. Reading the configuration will happen from the root table, meaning we have to advance through several subtables first before we can process the options for our linter. We want to report errors with rich context information here as well, therefore we request the origin in every call to the get_value interface and produce a report using the context we obtained in the main program.
!> Configuration data for the manifest linting
module fpm_lint_config
use tomlf, only : toml_table, toml_context, toml_terminal, toml_error, &
& toml_stat, get_value
implicit none
!> Configuration for the manifest linting
type :: lint_config
!> Check package name
logical :: package_name
!> Check all key paths
logical :: bare_keys
end type lint_config
contains
!> Load the configuration for the linter from the package manifest
subroutine load_lint_config(config, table, context, terminal, error)
!> Configuration for the linter
type(lint_config), intent(out) :: config
!> TOML data structure representing the manifest
type(toml_table), intent(inout) :: table
!> Context describing the data structure
type(toml_context), intent(in) :: context
!> Terminal for output
type(toml_terminal), intent(in) :: terminal
!> Error handler
type(toml_error), allocatable, intent(out) :: error
integer :: origin, stat
type(toml_table), pointer :: child1, child2, child3
call get_value(table, "extra", child1, origin=origin)
if (.not.associated(child1)) then
call make_error(error, context%report("The 'extra' table is missing.", &
& origin, "expected table", color=terminal))
return
end if
call get_value(child1, "fpm", child2, origin=origin)
if (.not.associated(child2)) then
call make_error(error, context%report("The 'fpm' table is missing.", &
& origin, "expected table", color=terminal))
return
end if
call get_value(child2, "lint", child3, origin=origin)
if (.not.associated(child3)) then
call make_error(error, context%report("The 'lint' table is missing.", &
& origin, "expected table", color=terminal))
return
end if
call get_value(child3, "package-name", config%package_name, .true., &
& stat=stat, origin=origin)
if (stat /= toml_stat%success) then
call make_error(error, context%report("Entry in 'package-name' must be boolean", &
& origin, "expected boolean value", color=terminal))
return
end if
call get_value(child3, "bare-keys", config%bare_keys, .true., &
& stat=stat, origin=origin)
if (stat /= toml_stat%success) then
call make_error(error, context%report("Entry in 'bare-key' must be boolean", &
& origin, "expected boolean value", color=terminal))
return
end if
end subroutine load_lint_config
!> Create an error message
subroutine make_error(error, message)
!> Error handler
type(toml_error), allocatable, intent(out) :: error
!> Message to be displayed
character(len=*), intent(in) :: message
allocate(error)
error%message = message
error%stat = toml_stat%fatal
end subroutine make_error
end module fpm_lint_config
For convenience, we defined a make_error routine to allocate the error handler and store our report from the context. At this point, we should check whether our error reporting works and run the linter on an incorrect TOML document.
name = "demo"
[extra.fpm.lint]
package-name = "true"
bare-keys = true
current main program
Putting everything together in the main program should look like this.
program main
use, intrinsic :: iso_fortran_env, only : stderr => error_unit, stdout => output_unit
use fpm_lint_config, only : lint_config, load_lint_config
use fpm_lint_utils, only : get_argument
use tomlf, only : toml_table, toml_load, toml_error, toml_context, toml_parser_config, &
& toml_terminal
implicit none
logical, parameter :: color = .true.
character(:), allocatable :: manifest
type(toml_terminal) :: terminal
type(toml_table), allocatable :: table
type(toml_error), allocatable :: error
type(toml_context) :: context
type(lint_config) :: config
terminal = toml_terminal(color)
call get_argument(1, manifest)
if (.not.allocated(manifest)) manifest = "fpm.toml"
call toml_load(table, manifest, error=error, context=context, &
& config=toml_parser_config(color=terminal))
call handle_error(error)
call load_lint_config(config, table, context, terminal, error)
call handle_error(error)
contains
subroutine handle_error(error)
type(toml_error), intent(in), optional :: error
if (present(error)) then
write(stderr, '(a)') error%message
stop 1
end if
end subroutine handle_error
end program main
Running our linter on this file will correctly flag this as an error since a string value is provided rather than a boolean value.
❯ fpm run -- fpm.toml error: Entry in 'package-name' must be boolean --> fpm.toml:4:16-21 | 4 | package-name = "true" | ^^^^^^ expected boolean value |
Finally, we define a logging mechanism to capture our actual linting messages which are not fatal. The logger provides two procedures, add_message to store a message and show_log to display all stored messages.
module fpm_lint_logger
implicit none
private
public :: lint_logger, new_logger
type :: log_message
character(:), allocatable :: output
end type log_message
type :: lint_logger
type(log_message), allocatable :: message(:)
contains
procedure :: add_message
procedure :: show_log
end type lint_logger
contains
subroutine new_logger(logger)
type(lint_logger), intent(out) :: logger
allocate(logger%message(0))
end subroutine new_logger
subroutine add_message(logger, message)
class(lint_logger), intent(inout) :: logger
character(*), intent(in) :: message
logger%message = [logger%message, log_message(message)]
end subroutine add_message
subroutine show_log(logger, io)
class(lint_logger), intent(in) :: logger
integer, intent(in) :: io
integer :: it
do it = 1, size(logger%message)
write(io, '(a)') logger%message(it)%output
end do
end subroutine show_log
end module fpm_lint_logger
Recommended package name#
As a first linting check we will inspect the package name, for this we will apply the following rules:
the package name should be a TOML bare key to not require quotes in dependency sections, characters like dots, colons, or slashes are not allowed
TOML generally favors lowercase dashed keys, therefore we will discourage capitalization (camelCase and PascalCase) as well as underscores (snake_case)
there are several ways to declare strings in TOML, we want to favor the normal string one
An example of a package name we would disallow would be fpmLinter as seen in the manifest below.
name = "fpmLinter"
Let’s start with our implementation of this check. For convenience we will reexport the other modules from the fpm_lint module, this allows one clean import in the main program. Then we define the lint_data procedure, where we first check whether the name key is present, if not we create a message at the info level and leave our block scope, as all further checks rely on the presence of the entry.
We can now check whether the entry is provided as a string or maybe as something else, like a literal string, which we can flag. Furthermore, we verify that the package name uses only lowercase letters, numbers, and dashes with the verify intrinsic.
!> Linter for package manifests used with the Fortran package manager
module fpm_lint
use fpm_lint_config, only : lint_config, load_lint_config
use fpm_lint_logger, only : lint_logger, new_logger
use fpm_lint_utils, only : resize, get_argument
use tomlf, only : toml_table, toml_context, toml_terminal, toml_error, toml_level, &
& toml_key, get_value
use tomlf_de_token, only : token_kind, stringify
implicit none
private
public :: lint_config, load_lint_config
public :: lint_logger, new_logger
public :: lint_data
public :: get_argument
contains
!> Entry point for linting the data structure representing the package manifest
subroutine lint_data(logger, config, table, context, terminal)
!> Instance of the logger
type(lint_logger), intent(inout) :: logger
!> Configuration for the linter
type(lint_config), intent(in) :: config
!> TOML data structure
type(toml_table), intent(inout) :: table
!> Context describing the data structure
type(toml_context), intent(in) :: context
!> Terminal for output
type(toml_terminal), intent(in) :: terminal
if (config%package_name) then
check_package_name: block
character(:), allocatable :: package_name
integer :: origin
call get_value(table, "name", package_name, origin=origin)
if (.not.allocated(package_name)) then
call logger%add_message(context%report( &
"Package name entry is required in the top-level", &
origin, &
level=toml_level%info, color=terminal))
exit check_package_name
end if
if (context%token(origin)%kind /= token_kind%string) then
call logger%add_message(context%report( &
"Package name should not be a "//stringify(context%token(origin)), &
origin, &
"prefer string value (double quotes)", &
level=toml_level%info, color=terminal))
end if
if (verify(package_name, "abcdefghijklmnopqrstuvwxyz0123456789-") > 0) then
call logger%add_message(context%report( &
"Package name should be lowercase with dashes", &
origin, &
level=toml_level%info, color=terminal))
end if
end block check_package_name
end if
end subroutine lint_data
end module fpm_lint
Tip
The toml_level
parameter provides a statically initialized derived type enumerating all available report levels.
Similarly, the token_kind
parameter provides an enumeration of the token kinds.
You can think of it as an enumerator with a proper namespace.
current main program
Putting everything together in the main program should look like this.
program main
use, intrinsic :: iso_fortran_env, only : stderr => error_unit, stdout => output_unit
use fpm_lint, only : lint_config, load_lint_config, lint_logger, new_logger, &
& lint_data, get_argument
use tomlf, only : toml_table, toml_load, toml_error, toml_context, toml_parser_config, &
& toml_terminal
implicit none
logical, parameter :: color = .true.
integer, parameter :: detail = 1
character(:), allocatable :: manifest
type(toml_terminal) :: terminal
type(toml_table), allocatable :: table
type(toml_error), allocatable :: error
type(toml_context) :: context
type(lint_logger) :: logger
type(lint_config) :: config
terminal = toml_terminal(color)
call get_argument(1, manifest)
if (.not.allocated(manifest)) manifest = "fpm.toml"
call toml_load(table, manifest, error=error, context=context, &
& config=toml_parser_config(color=terminal, context_detail=detail))
call handle_error(error)
call load_lint_config(config, table, context, terminal, error)
call handle_error(error)
call new_logger(logger)
call lint_data(logger, config, table, context, terminal)
call logger%show_log(stdout)
contains
subroutine handle_error(error)
type(toml_error), intent(in), optional :: error
if (present(error)) then
write(stderr, '(a)') error%message
stop 1
end if
end subroutine handle_error
end program main
We check this on the camelCase package name from above and can find the following output.
❯ fpm run -- fpm.toml info: Package name should be lowercase with dashes --> fpm.toml:1:8-18 | 1 | name = "fpmLinter" | ^^^^^^^^^^^ |
Exercise
Add a check for the length of the package name, everything under three characters is probably a bad choice, so is a too long package name.
Create an example to trigger the error with your new check. What happens if a too long camelCase package name is used?
Bare key paths preferred#
TOML allows to quote keys, however this might become visually distracting if some keys are quoted and others are not. With our package name rule, there should not be the need to quote any keys even in dependency sections.
To determine whether a string is used in the context of a key we need a way to identify all keys. We could check all entries in the data structures by implementing a visitor object which walks through all tables and checks the keys. However, this is somewhat inefficient and we can also miss keys that are not recorded.
name = "demo"
[dependencies]
toml-f.git = "https://github.com/toml-f/toml-f"
"toml-f".tag = "v0.2.3"
In this example, the second occurrence of the key toml-f
will only reference the table but it is already defined the line before.
The quotation marks are visually identifiable as lint and we need a programmatic way to flag this.
Instead of working with the data structure, we will use the parser to record more tokens in the context. Rather than using the context to only report errors, we will use it to identify keys. This is done by increasing the context_detail option in the config keyword of the parser to one. Now all tokens except for whitespace and comments will be recorded.
call toml_load(table, manifest, error=error, context=context, &
& config=toml_parser_config(color=color, context_detail=1))
Tip
Increasing the context_detail
to two will also record whitespace and comments.
This can be useful when writing checks for whitespace or indentation styles.
Our linter pass will work as follows:
identifying all relevant keys in the manifest
check whether they are keypath tokens
create a report for any key that is a string or a literal
Our implementation reflects this by first collecting an array of toml_key objects in list and then iterating over all entries checking whether they are the correct token_kind.
!> Entry point for linting the keys in the TOML document
subroutine lint_keys(logger, config, context, terminal)
!> Instance of the logger
type(lint_logger), intent(inout) :: logger
!> Configuration for the linter
type(lint_config), intent(in) :: config
!> Context describing the data structure
type(toml_context), intent(in) :: context
!> Terminal for output
type(toml_terminal), intent(in) :: terminal
integer :: it
type(toml_key), allocatable :: list(:)
call identify_keys(list, context)
if (config%bare_keys) then
do it = 1, size(list)
associate(token => context%token(list(it)%origin))
if (token%kind /= token_kind%keypath) then
call logger%add_message(context%report( &
"String used in key path", &
list(it)%origin, &
"use bare key instead", &
level=toml_level%info, color=terminal))
end if
end associate
end do
end if
end subroutine lint_keys
To create the list we need to implement the identify_keys procedure. The rules in TOML for key paths are simple: before an equal sign we can have key paths and keypath can only be present in table bodies or inline tables. This can be implemented by using a stack storing whether the current scope belongs in a table, array, or value.
We will always push a new scope on the respective token opening it, i.e. a value is opened by an equal sign, an array by a right bracket, and an inline table by a right curly brace. To distinguish table headers from inline arrays we only push arrays on our stack after an equal sign. Finally, we default to a table scope if no other scope is present and we have collected all required rules to identify key paths. Similarly, we can identify the endings of the scopes.
We then can check whether the current scope on the top of the stack allows key paths and record those in our list.
!> Collect all key paths used in TOML document
subroutine identify_keys(list, context)
!> List of all keypaths in the TOML document
type(toml_key), allocatable, intent(out) :: list(:)
!> Context describing the data structure
type(toml_context), intent(in) :: context
integer, parameter :: table_scope = 1, array_scope = 2, value_scope = 3
integer :: it, top
integer, allocatable :: scopes(:)
allocate(list(0))
top = 0
call resize(scopes)
! Documents always start with a table scope
call push_back(scopes, top, table_scope)
do it = 1, context%top
select case(context%token(it)%kind)
case(token_kind%keypath)
! Record all key path
associate(token => context%token(it))
list = [list, toml_key(context%source(token%first:token%last), it)]
end associate
case(token_kind%string, token_kind%literal)
! Record all strings used in key paths
if (scopes(top) == table_scope) then
associate(token => context%token(it))
list = [list, toml_key(context%source(token%first+1:token%last-1), it)]
end associate
end if
case(token_kind%equal) ! Open value scope
call push_back(scopes, top, value_scope)
case(token_kind%lbrace) ! Open inline table scope
call push_back(scopes, top, table_scope)
case(token_kind%lbracket) ! Open array scope
if (scopes(top) /= table_scope) then
call push_back(scopes, top, array_scope)
end if
case(token_kind%newline) ! Close value scope in key-value pair
call pop(scopes, top, value_scope)
case(token_kind%rbrace) ! Close value and table scope in inline table
call pop(scopes, top, value_scope)
call pop(scopes, top, table_scope)
case(token_kind%comma) ! Close value scope in inline table
call pop(scopes, top, value_scope)
case(token_kind%rbracket) ! Close array scope
call pop(scopes, top, array_scope)
end select
end do
contains
!> Push a new scope onto the stack
pure subroutine push_back(scopes, top, this_scope)
!> Stack top
integer, intent(inout) :: top
!> Current stack of scopes
integer, allocatable, intent(inout) :: scopes(:)
!> Scope to push onto the stack
integer, intent(in) :: this_scope
top = top + 1
if (top > size(scopes)) call resize(scopes)
scopes(top) = this_scope
end subroutine push_back
!> Remove a matching scope from the stack
subroutine pop(scopes, top, this_scope)
!> Stack top
integer, intent(inout) :: top
!> Current stack of scopes
integer, allocatable, intent(inout) :: scopes(:)
!> Scope to remove from the stack
integer, intent(in) :: this_scope
if (top > 0) then
if (scopes(top) == this_scope) top = top - 1
end if
end subroutine pop
end subroutine identify_keys
For convenience, we implement a push_back and pop function to add and remove scopes from our stack. The pop function will additionally perform a check whether we want to remove a matching scope and save us some repetition in the loop this way.
In our utility module, we implement the resize procedure for an array of integers
!> Misc utilities for the fpm-lint implementation
module fpm_lint_utils
implicit none
private
public :: resize
public :: get_argument
!> Resize a 1D array to a new size
interface resize
module procedure :: resize_ints
end interface resize
contains
!> Reallocate list of integer
pure subroutine resize_ints(var, n)
!> Instance of the array to be resized
integer, allocatable, intent(inout) :: var(:)
!> Dimension of the final array size
integer, intent(in), optional :: n
integer, allocatable :: tmp(:)
integer :: this_size, new_size
integer, parameter :: initial_size = 8
if (allocated(var)) then
this_size = size(var, 1)
call move_alloc(var, tmp)
else
this_size = initial_size
end if
if (present(n)) then
new_size = n
else
new_size = this_size + this_size/2 + 1
end if
allocate(var(new_size))
if (allocated(tmp)) then
this_size = min(size(tmp, 1), size(var, 1))
var(:this_size) = tmp(:this_size)
deallocate(tmp)
end if
end subroutine resize_ints
end module fpm_lint_utils
current main program
Putting everything together in the main program should look like this.
program main
use, intrinsic :: iso_fortran_env, only : stderr => error_unit, stdout => output_unit
use fpm_lint, only : lint_config, load_lint_config, lint_logger, new_logger, &
& lint_data, lint_keys, get_argument
use tomlf, only : toml_table, toml_load, toml_error, toml_context, toml_parser_config, &
& toml_terminal
implicit none
logical, parameter :: color = .true.
integer, parameter :: detail = 1
character(:), allocatable :: manifest
type(toml_terminal) :: terminal
type(toml_table), allocatable :: table
type(toml_error), allocatable :: error
type(toml_context) :: context
type(lint_logger) :: logger
type(lint_config) :: config
terminal = toml_terminal(color)
call get_argument(1, manifest)
if (.not.allocated(manifest)) manifest = "fpm.toml"
call toml_load(table, manifest, error=error, context=context, &
& config=toml_parser_config(color=terminal, context_detail=detail))
call handle_error(error)
call load_lint_config(config, table, context, terminal, error)
call handle_error(error)
call new_logger(logger)
call lint_data(logger, config, table, context, terminal)
call lint_keys(logger, config, context, terminal)
call logger%show_log(stdout)
contains
subroutine handle_error(error)
type(toml_error), intent(in), optional :: error
if (present(error)) then
write(stderr, '(a)') error%message
stop 1
end if
end subroutine handle_error
end program main
At this point, we can now add a call in our main program to run the linter.
❯ fpm run -- fpm.toml info: String used in key path --> fpm.toml:5:1-8 | 5 | "toml-f".tag = "v0.2.3" | ^^^^^^^^ use bare key instead |
Now for something more tricky with an inline table to check whether our scoping rules are working correctly.
name = "demo"
[dependencies]
toml-f = {git = "https://github.com/toml-f/toml-f", "tag" = "v0.2.3"}
Our linter can correctly identify the tag entry as a string in the key path context and produces the appropriate message.
❯ fpm run -- fpm.toml info: String used in key path --> fpm.toml:4:53-57 | 4 | toml-f = {git = "https://github.com/toml-f/toml-f", "tag" = "v0.2.3"} | ^^^^^ use bare key instead |
Exercise
Previously, we flagged the usage of a literal string as a value for the package name, however a package manifest can contain much more string values.
Create a check for all string values in the manifest to ensure they use double-quotes. Collect string values (string, literal, mstring, and mliteral) from array and value scopes for this purpose.
Can you make a meaningful suggestion if a literal string contains characters that must be escaped in a double-quoted string?
Summary#
This concludes the linting we wanted to implement for the fpm package manifest. For a feature-complete linter, the rule set to check for is usually growing with time and might also shift as new rules are adopted. Our linter currently provides only a few rules but has the potential to include more checks as the need arises.
Exercise
Our output is currently in the order of the checks, rather than in the order of reports occurring in the TOML document. The output of the reports might become more intuitive if it was sorted according to the source lines.
Record the first character in the output together with the messages in the logger. Have the logger sort the messages according to their order before printing them.
Important
In this tutorial, you have learned how to report custom error messages in your TOML input data. You can now
report colorized error messages with rich context information
create error messages when reading a TOML data structure
control the details captured in the context describing the TOML document
check a TOML document based on the token information in the context