Validation with Smart Constructors
Some viewers have asked me about validation in Haskell. There are multiple kinds of validation; checking that values have a certain structure, which is deterministic, and things like querying a database to see if a user name is already taken, which is effectful and indeterministic. Today, I’ll focus on simple deterministic validation using smart constructors.
Show Notes
One way of ensuring a value is valid is to have it broken down into simpler parts, and enforce a correct-by-construction guarantee that way. A nice example of this is the modern-uri package and its URI
data type. The individual parts of a URI
are required to construct a URI
value directly, using the constructor. You can also parse a URI string, but you then need to handle invalid URIs.
Another good example is the NonEmpty
list data type in base
, which guarantees that there’s at least one element in the list, by keeping the first element and the rest of the list separated.
Example: Serial Numbers
We are going to implement the SerialNumber
module, a simple example including some serial number validation rules. We can split the serial number into some smaller pieces, or groups, but the groups are chunks of text that wont be further split up, and instead validated against a set of legal characters. We will instead use a technique called smart constructors, and make it impossible to obtain a SerialNumber
value without having it validated.
The SerialNumber
datatype is a newtype wrapper around a list of Text
values.
module SerialNumber where
import Data.Text (Text)
import qualified Data.Text as Text
newtype SerialNumber =
SerialNumber [Text]
deriving (Eq, Show)
makeSerialNumber
validates a Text
value, and returns either a validation error, or a SerialNumber
value. The user of this function can pass a Text
value directly, and have the smart constructor deal with proper text splitting.
makeSerialNumber :: Text -> Either ValidationError SerialNumber
The ValidationError
data type encodes the validation errors that can occur. We’ll add those as we go along.
data ValidationError
deriving (Eq, Show)
The crucial part of the smart constructors approach is to not expose the data constructor in the module exports list. We only the SerialNumber
type, not the SerialNumber
constructor. We do export ValidationError
along with all its constructors, and the makeSerialNumber
function, which is our smart constructor.
module SerialNumber
( SerialNumber
, ValidationError(..)
, makeSerialNumber
) where
Now, let’s implement makeSerialNumber
. We will split the text string into groups, and validate each group individually. We split by a dash separator. If there are 4 valid groups, we have a valid serial number. Otherwise, it’s got the wrong number of groups.
data ValidationError
= WrongNumberOfGroups Int
| InvalidGroupLength Int
Text
deriving (Eq, Show)
makeSerialNumber :: Text -> Either ValidationError SerialNumber
makeSerialNumber t = do
gs <- mapM validateGroup (Text.splitOn (Text.singleton '-') t)
if length gs == 4
then Right (SerialNumber gs)
else Left (WrongNumberOfGroups (length gs))
-- ...
We need to define the validateGroup
function. If the group length is not 4, it’s invalid. We define len
in the where block. If the set of invalid characters in the group is not empty, it’s invalid. The invalid characters set is defined as the difference between the provided ones and the set of valid characters.
-- ...
where
validateGroup group
| len /= 4 = Left (InvalidGroupLength len group)
| not (HashSet.null invalidChars) =
Left (InvalidCharacters invalidChars)
| otherwise = Right group
where
len = Text.length group
invalidChars =
HashSet.difference
(HashSet.fromList (Text.unpack group))
validChars
We need some new imports for working with HashSet
s.
import Data.HashSet (HashSet)
import qualified Data.HashSet as HashSet
And we add a constructor for the InvalidCharacters
error.
data ValidationError
= WrongNumberOfGroups Int
| InvalidGroupLength Int
Text
| InvalidCharacters (HashSet Char)
deriving (Eq, Show)
Finally, we will define the valid characters to be a set of uppercase “A” to “Z”, and digits.
validChars :: HashSet Char
validChars = HashSet.fromList (['A' .. 'Z'] ++ ['0' .. '9'])
In case the two invalid patterns in validateGroup
weren’t matched, we have a valid group.
Validating in GHCi
Let’s try it in the REPL. We run the module
command on SerialNumber
to only have access to exported definitions; we can’t use the SerialNumber
constructor directly.
*SerialNumber SerialNumber> :module SerialNumber
Prelude SerialNumber>
To construct Text
values, we’ll use the OverloadedStrings
extension.
Prelude SerialNumber> :set -XOverloadedStrings
This example has invalid characters:
Prelude SerialNumber> makeSerialNumber "1234-aaaa-bbbb-cccc"
Left (InvalidCharacters (fromList "a"))
The next try has an invalid group length.
Prelude SerialNumber> makeSerialNumber "1234-AAAA-BBBB-CCCCC"
Left (InvalidGroupLength 5 "CCCCC")
And this one has has the wrong number of groups.
Prelude SerialNumber> makeSerialNumber "1234-AAAA-BBBB"
Left (WrongNumberOfGroups 3)
Our last attempt is good.
Prelude SerialNumber> makeSerialNumber "1234-AAAA-BBBB-CCCC"
Right (SerialNumber ["1234","AAAA","BBBB","CCCC"])
Using the Validated Serial Number
Other functions can use the validated SerialNumber
value with certainty. Let’s create a rendering function that takes us back to a valid serial number text string.
renderSerialNumber :: SerialNumber -> Text
renderSerialNumber (SerialNumber groups) =
Text.intercalate separator groups
It will join the groups by the separator, which we will extract from the previous code, to instead be defined at the top level.
separator :: Text
separator = Text.singleton '-'
makeSerialNumber :: Text -> Either ValidationError SerialNumber
makeSerialNumber t = do
gs <- mapM validateGroup (Text.splitOn separator t)
...
We need to add renderSerialNumber
to the exports list.
module SerialNumber
( SerialNumber
, ValidationError(..)
, makeSerialNumber
, renderSerialNumber
) where
Now we can create a serial number, and map renderSerialNumber
over the smart constructor result.
Prelude SerialNumber> renderSerialNumber <$> makeSerialNumber "1234-AAAA-BBBB-CCCC"
Right "1234-AAAA-BBBB-CCCC"
Summary
Using smart constructors you can make sure values have been validated, resulting in a sort of correct-by-construction guarantee.
As soon as you have created the value, even from user input, you can pass it around and be confident in its validity. You won’t need to do ad hoc defensive programming. If these serial number groups had some special meanings to other parts of the program, they could be extracted safely. We could even change the newtype to hold four separate Text
values instead of a list, to avoid partiality.
Also, if these values are constructed regularly in your own code, like in tests, you can create a QuasiQuoter
, using the smart constructor, and have them validated at compile-time. Then you don’t need to handle validation errors at runtime; you’d get the SerialNumber
value directly, in our example.
That’s it for this brief introduction to smart constructors.
Source Code
The source code for the full series is available at github.com/haskell-at-work/validation-with-smart-constructors.