Validation with Smart Constructors

Some viewers have asked me about validation in Haskell. There are multiple kinds of validation; checking that values have a certain structure, which is deterministic, and things like querying a database to see if a user name is already taken, which is effectful and indeterministic. Today, I'll focus on simple deterministic validation using smart constructors.

Show Notes

One way of ensuring a value is valid is to have it broken down into simpler parts, and enforce a correct-by-construction guarantee that way. A nice example of this is the modern-uri package and its URI data type. The individual parts of a URI are required to construct a URI value directly, using the constructor. You can also parse a URI string, but you then need to handle invalid URIs.

Another good example is the NonEmpty list data type in base, which guarantees that there’s at least one element in the list, by keeping the first element and the rest of the list separated.

Example: Serial Numbers

We are going to implement the SerialNumber module, a simple example including some serial number validation rules. We can split the serial number into some smaller pieces, or groups, but the groups are chunks of text that wont be further split up, and instead validated against a set of legal characters. We will instead use a technique called smart constructors, and make it impossible to obtain a SerialNumber value without having it validated.

The SerialNumber datatype is a newtype wrapper around a list of Text values.

module SerialNumber where

import           Data.Text    (Text)
import qualified Data.Text    as Text

newtype SerialNumber =
  SerialNumber [Text]
  deriving (Eq, Show)

makeSerialNumber validates a Text value, and returns either a validation error, or a SerialNumber value. The user of this function can pass a Text value directly, and have the smart constructor deal with proper text splitting.

makeSerialNumber :: Text -> Either ValidationError SerialNumber

The ValidationError data type encodes the validation errors that can occur. We’ll add those as we go along.

data ValidationError
  deriving (Eq, Show)

The crucial part of the smart constructors approach is to not expose the data constructor in the module exports list. We only the SerialNumber type, not the SerialNumber constructor. We do export ValidationError along with all its constructors, and the makeSerialNumber function, which is our smart constructor.

module SerialNumber
  ( SerialNumber
  , ValidationError(..)
  , makeSerialNumber
  ) where

Now, let’s implement makeSerialNumber. We will split the text string into groups, and validate each group individually. We split by a dash separator. If there are 4 valid groups, we have a valid serial number. Otherwise, it’s got the wrong number of groups.

data ValidationError
  = WrongNumberOfGroups Int
  | InvalidGroupLength Int
  deriving (Eq, Show)

makeSerialNumber :: Text -> Either ValidationError SerialNumber
makeSerialNumber t = do
  gs <- mapM validateGroup (Text.splitOn (Text.singleton '-') t)
  if length gs == 4
    then Right (SerialNumber gs)
    else Left (WrongNumberOfGroups (length gs))
-- ...

We need to define the validateGroup function. If the group length is not 4, it’s invalid. We define len in the where block. If the set of invalid characters in the group is not empty, it’s invalid. The invalid characters set is defined as the difference between the provided ones and the set of valid characters.

-- ...
    validateGroup group
      | len /= 4 = Left (InvalidGroupLength len group)
      | not (HashSet.null invalidChars) =
          Left (InvalidCharacters invalidChars)
      | otherwise = Right group
        len = Text.length group
        invalidChars =
            (HashSet.fromList (Text.unpack group))

We need some new imports for working with HashSets.

import           Data.HashSet (HashSet)
import qualified Data.HashSet as HashSet

And we add a constructor for the InvalidCharacters error.

data ValidationError
  = WrongNumberOfGroups Int
  | InvalidGroupLength Int
  | InvalidCharacters (HashSet Char)
  deriving (Eq, Show)

Finally, we will define the valid characters to be a set of uppercase “A” to “Z”, and digits.

validChars :: HashSet Char
validChars = HashSet.fromList (['A' .. 'Z'] ++ ['0' .. '9'])

In case the two invalid patterns in validateGroup weren’t matched, we have a valid group.

Validating in GHCi

Let’s try it in the REPL. We run the module command on SerialNumber to only have access to exported definitions; we can’t use the SerialNumber constructor directly.

*SerialNumber SerialNumber> :module SerialNumber
Prelude SerialNumber>

To construct Text values, we’ll use the OverloadedStrings extension.

Prelude SerialNumber> :set -XOverloadedStrings

This example has invalid characters:

Prelude SerialNumber> makeSerialNumber "1234-aaaa-bbbb-cccc"
Left (InvalidCharacters (fromList "a"))

The next try has an invalid group length.

Prelude SerialNumber> makeSerialNumber "1234-AAAA-BBBB-CCCCC"
Left (InvalidGroupLength 5 "CCCCC")

And this one has has the wrong number of groups.

Prelude SerialNumber> makeSerialNumber "1234-AAAA-BBBB"
Left (WrongNumberOfGroups 3)

Our last attempt is good.

Prelude SerialNumber> makeSerialNumber "1234-AAAA-BBBB-CCCC"
Right (SerialNumber ["1234","AAAA","BBBB","CCCC"])

Using the Validated Serial Number

Other functions can use the validated SerialNumber value with certainty. Let’s create a rendering function that takes us back to a valid serial number text string.

renderSerialNumber :: SerialNumber -> Text
renderSerialNumber (SerialNumber groups) =
  Text.intercalate separator groups

It will join the groups by the separator, which we will extract from the previous code, to instead be defined at the top level.

separator :: Text
separator = Text.singleton '-'

makeSerialNumber :: Text -> Either ValidationError SerialNumber
makeSerialNumber t = do
  gs <- mapM validateGroup (Text.splitOn separator t)

We need to add renderSerialNumber to the exports list.

module SerialNumber
  ( SerialNumber
  , ValidationError(..)
  , makeSerialNumber
  , renderSerialNumber
  ) where

Now we can create a serial number, and map renderSerialNumber over the smart constructor result.

Prelude SerialNumber> renderSerialNumber <$> makeSerialNumber "1234-AAAA-BBBB-CCCC"
Right "1234-AAAA-BBBB-CCCC"


Using smart constructors you can make sure values have been validated, resulting in a sort of correct-by-construction guarantee.

As soon as you have created the value, even from user input, you can pass it around and be confident in its validity. You won’t need to do ad hoc defensive programming. If these serial number groups had some special meanings to other parts of the program, they could be extracted safely. We could even change the newtype to hold four separate Text values instead of a list, to avoid partiality.

Also, if these values are constructed regularly in your own code, like in tests, you can create a QuasiQuoter, using the smart constructor, and have them validated at compile-time. Then you don’t need to handle validation errors at runtime; you’d get the SerialNumber value directly, in our example.

That’s it for this brief introduction to smart constructors.

Source Code

The source code for the full series is available at