Validation with Smart Constructors
Some viewers have asked me about validation in Haskell. There are multiple kinds of validation; checking that values have a certain structure, which is deterministic, and things like querying a database to see if a user name is already taken, which is effectful and indeterministic. Today, I’ll focus on simple deterministic validation using smart constructors.
One way of ensuring a value is valid is to have it broken down into simpler parts, and enforce a correct-by-construction guarantee that way. A nice example of this is the modern-uri package and its
URI data type. The individual parts of a
URI are required to construct a
URI value directly, using the constructor. You can also parse a URI string, but you then need to handle invalid URIs.
Another good example is the
NonEmpty list data type in
base, which guarantees that there’s at least one element in the list, by keeping the first element and the rest of the list separated.
Example: Serial Numbers
We are going to implement the
SerialNumber module, a simple example including some serial number validation rules. We can split the serial number into some smaller pieces, or groups, but the groups are chunks of text that wont be further split up, and instead validated against a set of legal characters. We will instead use a technique called smart constructors, and make it impossible to obtain a
SerialNumber value without having it validated.
SerialNumber datatype is a newtype wrapper around a list of
module SerialNumber where import Data.Text (Text) import qualified Data.Text as Text newtype SerialNumber = SerialNumber [Text] deriving (Eq, Show)
makeSerialNumber validates a
Text value, and returns either a validation error, or a
SerialNumber value. The user of this function can pass a
Text value directly, and have the smart constructor deal with proper text splitting.
makeSerialNumber :: Text -> Either ValidationError SerialNumber
ValidationError data type encodes the validation errors that can occur. We’ll add those as we go along.
data ValidationError deriving (Eq, Show)
The crucial part of the smart constructors approach is to not expose the data constructor in the module exports list. We only the
SerialNumber type, not the
SerialNumber constructor. We do export
ValidationError along with all its constructors, and the
makeSerialNumber function, which is our smart constructor.
module SerialNumber ( SerialNumber , ValidationError(..) , makeSerialNumber ) where
Now, let’s implement
makeSerialNumber. We will split the text string into groups, and validate each group individually. We split by a dash separator. If there are 4 valid groups, we have a valid serial number. Otherwise, it’s got the wrong number of groups.
data ValidationError = WrongNumberOfGroups Int | InvalidGroupLength Int Text deriving (Eq, Show) makeSerialNumber :: Text -> Either ValidationError SerialNumber makeSerialNumber t = do gs <- mapM validateGroup (Text.splitOn (Text.singleton '-') t) if length gs == 4 then Right (SerialNumber gs) else Left (WrongNumberOfGroups (length gs)) -- ...
We need to define the
validateGroup function. If the group length is not 4, it’s invalid. We define
len in the where block. If the set of invalid characters in the group is not empty, it’s invalid. The invalid characters set is defined as the difference between the provided ones and the set of valid characters.
-- ... where validateGroup group | len /= 4 = Left (InvalidGroupLength len group) | not (HashSet.null invalidChars) = Left (InvalidCharacters invalidChars) | otherwise = Right group where len = Text.length group invalidChars = HashSet.difference (HashSet.fromList (Text.unpack group)) validChars
We need some new imports for working with
import Data.HashSet (HashSet) import qualified Data.HashSet as HashSet
And we add a constructor for the
data ValidationError = WrongNumberOfGroups Int | InvalidGroupLength Int Text | InvalidCharacters (HashSet Char) deriving (Eq, Show)
Finally, we will define the valid characters to be a set of uppercase “A” to “Z”, and digits.
validChars :: HashSet Char validChars = HashSet.fromList (['A' .. 'Z'] ++ ['0' .. '9'])
In case the two invalid patterns in
validateGroup weren’t matched, we have a valid group.
Validating in GHCi
Let’s try it in the REPL. We run the
module command on
SerialNumber to only have access to exported definitions; we can’t use the
SerialNumber constructor directly.
*SerialNumber SerialNumber> :module SerialNumber Prelude SerialNumber>
Text values, we’ll use the
Prelude SerialNumber> :set -XOverloadedStrings
This example has invalid characters:
Prelude SerialNumber> makeSerialNumber "1234-aaaa-bbbb-cccc" Left (InvalidCharacters (fromList "a"))
The next try has an invalid group length.
Prelude SerialNumber> makeSerialNumber "1234-AAAA-BBBB-CCCCC" Left (InvalidGroupLength 5 "CCCCC")
And this one has has the wrong number of groups.
Prelude SerialNumber> makeSerialNumber "1234-AAAA-BBBB" Left (WrongNumberOfGroups 3)
Our last attempt is good.
Prelude SerialNumber> makeSerialNumber "1234-AAAA-BBBB-CCCC" Right (SerialNumber ["1234","AAAA","BBBB","CCCC"])
Using the Validated Serial Number
Other functions can use the validated
SerialNumber value with certainty. Let’s create a rendering function that takes us back to a valid serial number text string.
renderSerialNumber :: SerialNumber -> Text renderSerialNumber (SerialNumber groups) = Text.intercalate separator groups
It will join the groups by the separator, which we will extract from the previous code, to instead be defined at the top level.
separator :: Text separator = Text.singleton '-' makeSerialNumber :: Text -> Either ValidationError SerialNumber makeSerialNumber t = do gs <- mapM validateGroup (Text.splitOn separator t) ...
We need to add
renderSerialNumber to the exports list.
module SerialNumber ( SerialNumber , ValidationError(..) , makeSerialNumber , renderSerialNumber ) where
Now we can create a serial number, and map
renderSerialNumber over the smart constructor result.
Prelude SerialNumber> renderSerialNumber <$> makeSerialNumber "1234-AAAA-BBBB-CCCC" Right "1234-AAAA-BBBB-CCCC"
Using smart constructors you can make sure values have been validated, resulting in a sort of correct-by-construction guarantee.
As soon as you have created the value, even from user input, you can pass it around and be confident in its validity. You won’t need to do ad hoc defensive programming. If these serial number groups had some special meanings to other parts of the program, they could be extracted safely. We could even change the newtype to hold four separate
Text values instead of a list, to avoid partiality.
Also, if these values are constructed regularly in your own code, like in tests, you can create a
QuasiQuoter, using the smart constructor, and have them validated at compile-time. Then you don’t need to handle validation errors at runtime; you’d get the
SerialNumber value directly, in our example.
That’s it for this brief introduction to smart constructors.
The source code for the full series is available at github.com/haskell-at-work/validation-with-smart-constructors.