Refactoring and Finding the Right Data Structure
Recently, before implementing a new feature in pandoc-include-code, a Pandoc filter I have written, I needed to make some changes to the code. The program allowed what should’ve been considered conflicting modes of operation. In this episode I’ll perform the same modification, to show you how the right data structure can guide and improve your implementation.
Show Notes
The purpose of the filter is to allow code snippets to be included from external files, instead of copy-pasting snippets into the Markdown document. By specifying the source code language, and with the include
attribute specifying a file path, the filter will populate the code block with the contents of that file. By using startLine
and endLine
, you can control the range of lines included from the file.
# My Project
```{.haskell include=Example.hs startLine=1 endLine=6}
```
Another way of including a smaller part of a file is to specify a named snippet. This expects you to have two lines in the included file, usually source code comments, specifying the start and end of the snippet.
## Details About Stuff
```{.haskell include=Example.hs snippet=main}
```
Let’s have a look at the Example.hs
file. Lines 1 to 6 are included by the range code block, and 9 to 11 by the snippet code block. These two comments mark the start and end of the “main” snippet.
module Example where
myExample = do
putStrLn "Hello"
putStrLn "..."
putStrLn "world."
-- start snippet main
main = do
myExample
putStrLn "Bye!"
-- end snippet main
Rendering this Markdown document to HTML, you can get something like the following in a web browser.
In the feature request I got, a user asked for the ability to combine the .numberLines
class in Pandoc with pandoc-include-code
and its named snippets, to have included snippets show line numbers starting from their original line number in the source file.
The following HTML shows the expected output. Notice how it starts at 9, not 1.
When I started on adding support for .numberLines
, I quickly realized a flaw in the implementation of pandoc-include-code
. It had three modes of operation:
- including an entire file,
- including a line range from a file, and
- including a named snippet from a file.
The first case was all right, but the second two had some overlap; there was nothing stopping a user from specifying both snippet
and startLine
/endLine
.
```{.haskell include=Example.hs startLine=1 endLine=6 snippet=main}
```
Using both in combination did not make any sense to me, and it made the feature request much harder to implement, so I decided to first do some refactoring, making the implementation explicitly state the modes of operation, and making them mutually exclusive.
Let’s head back to the revision before my refactoring, and see where we end up.
Introducing the InclusionMode Data Type
The InclusionSpec
data type represents the work to be done by the filter; it carries the file to include, the optional snippet name, the optional range of lines, and also a number of columns to dedent. We won’t touch dedent at all, so you can ignore it.
data InclusionSpec = InclusionSpec
{ include :: FilePath
, snippet :: Maybe Text
, range :: Maybe Range
, dedent :: Maybe Int
}
Remember how I said the modes of operation should be mutually exclusive? If you look at this data type, and specifically at the snippet
and range
fields, we are not representating that design decision in a clear way. By using two optional modes, we do allow for them both to be present. Let’s cut them out, and make the mode explicit using a data type.
data InclusionSpec = InclusionSpec
{ include :: FilePath
, mode :: InclusionMode
, dedent :: Maybe Int
}
The InclusionMode
data type will have three constructors – one for each valid mode of operation – snippet mode, range mode, and entire file mode.
data InclusionMode
= SnippetMode Text
| RangeMode Range
| EntireFileMode
deriving (Show, Eq)
We want these to be comparable and printable using show
, and we see from the type error that it requires the Range
type to provide the corresponding instances.
data Range = Range { rangeStart :: Int
, rangeEnd :: Int
}
deriving (Show, Eq)
Following the Type Errors
Jumping to the next error, we see that there is no longer a range
in scope. This is one of the fields we removed from the InclusionSpec
. Let’s see how the filterLineRange
function is actually used.
allSteps :: Inclusion Text
allSteps =
readIncluded
>>= splitLines
>>= filterLineRange
>>= onlySnippet
>>= dedentLines
>>= joinLines
Here we see that the definition for allSteps
reads the included file, splits the lines, filters by the line range, extracts the named snippet, dedents the lines, and finally joins the lines back together. Now that we’re making the modes mutually exclusive, we can combine filterLineRange
and onlySnippet
into a single step. We’ll rename it includeByMode
.
allSteps :: Inclusion Text
allSteps =
readIncluded
>>= splitLines
>>= includeByMode
>>= dedentLines
>>= joinLines
Instead of asking for an optional range
, we’ll ask for and pattern match on the mode
. In case we’re in RangeMode
, we can do the same thing as before. If we’re in EntireFileMode
, we’ll just return the lines unchanged. The pattern match for Just
in onlySnippet
needs to be merged in.
includeByMode :: Lines -> Inclusion Lines
includeByMode ls = asks mode >>= \case
RangeMode range -> return
(take (rangeEnd range - startIndex) (drop startIndex ls))
where startIndex = pred (rangeStart range)
EntireFileMode -> return ls
SnippetMode name ->
return $ drop 1 $ takeWhile (not . isSnippetEnd name) $ dropWhile
(not . isSnippetStart name)
ls
Nice, we now have all three modes covered. Jumping to the next error, we see that the parsing that constructs our InclusionSpec
needs some changes as well. Instead of the optional range
and snippet
, we now want a single mode
. We rename getRange
to parseRangeMode
, and wrap the parsed range with the RangeMode
constructor. In the same way, snippet
becomes snippetMode
, wrapping the parsed snippet name with the SnippetMode
constructor.
...
where
lookupInt name = HM.lookup name attrs >>= readMaybe
snippetMode = SnippetMode . Text.pack <$> HM.lookup "snippet" attrs
dedent = lookupInt "dedent"
parseRangeMode = case (lookupInt "startLine", lookupInt "endLine") of
(Just start, Just end) -> maybe (throwError (InvalidRange start end))
(return . Just . RangeMode)
(mkRange start end)
(Nothing, Just _ ) -> throwError (IncompleteRange Start)
(Just _ , Nothing) -> throwError (IncompleteRange End)
(Nothing, Nothing) -> return Nothing
Selecting a Single Inclusion Mode
We select a single mode by transforming a list of Maybe InclusionMode
values to a list of modes. The catMaybe
function filters out the Nothing
values, and extracts the Just
values. Pattern matching on that resulting list, if we have no modes specified, we default to EntireFileMode
. In case we have a single mode, we use that. In case we have more than one mode specified, we throw an error, describing the conflicting modes.
...
Just include -> do
rangeMode <- parseRangeMode
mode <-
case catMaybes [rangeMode, snippetMode] of
[] -> return EntireFileMode
[m] -> return m
ms -> throwError (ConflictingModes ms)
return (Just InclusionSpec {..})
...
We need to import catMaybes
from the Data.Maybe
module.
import Data.Maybe (catMaybes)
Jumping to the next error, we are reminded that we have yet to define the ConflictingModes
error constructor. It will hold the list of conflicting modes.
data InclusionError
= InvalidRange Int
Int
| IncompleteRange MissingRangePart
| ConflictingModes [InclusionMode]
deriving (Show, Eq)
And in the last error in this exercise, the compiler kindly informs us that the formatError
function is partial. We add the missing case.
...
where
formatError = \case
InvalidRange start end ->
"Invalid range: " ++ show start ++ " to " ++ show end
IncompleteRange Start -> "Incomplete range: \"startLine\" is missing"
IncompleteRange End -> "Incomplete range: \"endLine\" is missing"
ConflictingModes ms -> "Conflicting inclusion modes: " ++ show ms
Summary
Next error? Nope, we’re done. I hope this highlights how a suitable data structure can guide you towards a much more solid program, using very simple Haskell code.
There are some tests that need changes, that I haven’t shown you, and of course the feature request itself that led to this modification, which I’ve left out. If you’re interested, drop a comment on YouTube or on Patreon, and I might do a follow-up video in the same style.
Thanks for watching!