Reading the Regex

Why ?

Reading the regex-specification from a String not only provides a familiar link with traditional character-based regexen, it also allows an executable to construct a regex from user-input; so while regexdot's API may provide a more robust way to construct regexen, it isn't always appropriate. It's for this reason that the package "bridge" uses regexdot in this manner.

Match-operators

Like RegExChar.ExtendedRegExChar [1], the module "RegExDot.RegEx" exports three match-operators:

$ ghci
GHCi, version 7.6.3: https://www.haskell.org/ghc/  :? for help
Loading package ghc-prim … linking … done.
Loading package integer-gmp … linking … done.
Loading package base … linking … done.

Prelude> :module + RegExDot.RegEx

Prelude RegExDot.RegEx> :type (=~)
(=~)	:: (Eq a, Show a, Control.DeepSeq.NFData a) => InputData a -> RegExDot.RegExOpts.RegExOpts (ExtendedRegEx a) -> Bool

Prelude RegExDot.RegEx> :type (/~)
(/~)	:: (Eq a, Show a, Control.DeepSeq.NFData a) => InputData a -> RegExDot.RegExOpts.RegExOpts (ExtendedRegEx a) -> Bool

Prelude RegExDot.RegEx> :type (+~)
(+~)	:: (Eq a, Show a, Control.DeepSeq.NFData a) => InputData a -> RegExDot.RegExOpts.RegExOpts (ExtendedRegEx a) -> Result a

Prelude RegExDot.RegEx> :info InputData
type InputData a = [a] -- Defined in RegExDot.RegEx

Prelude RegExDot.RegEx> :info RegExDot.RegExOpts.RegExOpts
data RegExDot.RegExOpts.RegExOpts a = RegExDot.RegExOpts.MkRegExOpts {
	RegExDot.RegExOpts.compilationOptions	:: RegExDot.CompilationOptions.CompilationOptions,
	RegExDot.RegExOpts.executionOptions	:: RegExDot.ExecutionOptions.ExecutionOptions,
	RegExDot.RegExOpts.regEx		:: a
}	-- Defined in RegExDot.RegExOpts
instance Functor RegExDot.RegExOpts.RegExOpts			-- Defined in RegExDot.RegExOpts
instance (Show a) => Show (RegExDot.RegExOpts.RegExOpts a)	-- Defined in RegExDot.RegExOpts

As can be seen from the context of each match-operator, the type-parameter 'a' is, as expected, required to implement the type-classes "Eq" & "Show", & also the type-class "Control.DeepSeq.NFData".

Each binary operator requires parameters of type "RegExDot.RegEx.InputData a" & "RegExDot.RegExOpts.RegExOpts (RegExDot.RegEx.ExtendedRegEx a)", both of which are polymorphic. The former is merely a type-synonym for a list of input-data, whereas the latter aggregates a polymorphic regex, with options which govern both its interpretation & the implementation of the match-process.
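The aggregate reported by ":info" above can be mirrored in a few lines, which also shows what its Functor instance buys: "fmap" transforms the wrapped regex while leaving its options untouched. This is a sketch, not regexdot's source; the two option-types are hypothetical placeholders (the real ones carry settings not described on this page), & "mkRegEx" is assumed, from its use in the examples below, merely to pair a regex with default options.

```haskell
-- Hypothetical placeholders: the real option-types carry settings not shown here.
data CompilationOptions = DefaultCompilationOptions deriving (Eq, Show)
data ExecutionOptions = DefaultExecutionOptions deriving (Eq, Show)

-- Mirrors the shape of RegExDot.RegExOpts.RegExOpts, as reported by ":info".
data RegExOpts a = MkRegExOpts
  { compilationOptions :: CompilationOptions
  , executionOptions   :: ExecutionOptions
  , regEx              :: a
  } deriving Show

-- fmap transforms the wrapped regex (possibly changing its type),
-- while preserving both sets of options.
instance Functor RegExOpts where
  fmap f opts = opts { regEx = f (regEx opts) }

-- Assumed behaviour of a convenience-constructor like mkRegEx: default options.
mkRegEx :: a -> RegExOpts a
mkRegEx = MkRegExOpts DefaultCompilationOptions DefaultExecutionOptions
```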

The first two operators return a value of type "Bool" & merely indicate the success or failure of the match, the second returning the inverse of the first. The last operator returns a value of type "RegExDot.RegEx.Result a", which, after a match, contains the complete mapping of the input-data into a polymorphic tree-structure.
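The relationship between the three operators can be illustrated with a deliberately tiny, hypothetical regex-type: an anchored list of bracket-expressions, each consuming exactly one datum. This is not regexdot's implementation; it merely shows the pattern of deriving the two Bool-valued operators from the one which returns a mapping of the input-data, & its result is a flat list of (bracket-expression, offset, datum) triples rather than a tree.

```haskell
import Data.Maybe (isJust)

-- Hypothetical toy regex: an anchored list of bracket-expressions,
-- each of which consumes exactly one datum.
type BracketExpression a = [a]

-- Like (+~): on success, pair each datum with the bracket-expression which
-- consumed it & its offset into the input-data.
(+~) :: Eq a => [a] -> [BracketExpression a] -> Maybe [(BracketExpression a, Int, a)]
input +~ regex
  | length input /= length regex = Nothing
  | otherwise = sequence (zipWith3 consume regex [0 ..] input)
  where
    consume members offset datum
      | datum `elem` members = Just (members, offset, datum)
      | otherwise = Nothing

-- Like (=~): merely report success.
(=~) :: Eq a => [a] -> [BracketExpression a] -> Bool
input =~ regex = isJust (input +~ regex)

-- Like (/~): the inverse of (=~).
(/~) :: Eq a => [a] -> [BracketExpression a] -> Bool
input /~ regex = not (input =~ regex)
```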

Examples

To demonstrate this regex-engine, one must first choose a suitable value for the type-parameter 'a'. The chosen data-type must satisfy the context, which means it must implement the three required type-classes; but it would be hard to find a data-type which didn't implement both "Eq" & "Show", & the type-class "Control.DeepSeq.NFData" is implemented by most simple built-in data-types.

Type-parameter = Int

Prelude RegExDot.RegEx> :module + Control.DeepSeq
Prelude RegExDot.RegEx Control.DeepSeq> :info Int
data Int = GHC.Types.I# GHC.Prim.Int#	-- Defined in GHC.Types
instance Bounded Int			-- Defined in GHC.Enum
instance Enum Int			-- Defined in GHC.Enum
instance Eq Int				-- Defined in GHC.Base
instance Integral Int			-- Defined in GHC.Real
instance Num Int				-- Defined in GHC.Num
instance Ord Int				-- Defined in GHC.Base
instance Read Int				-- Defined in GHC.Read
instance Real Int				-- Defined in GHC.Real
instance Show Int			-- Defined in GHC.Show
instance NFData Int			-- Defined in Control.DeepSeq

Prelude RegExDot.RegEx Control.DeepSeq> :module - Control.DeepSeq
Prelude RegExDot.RegEx> [1,-2,4,-8,16] =~ RegExDot.RegExOpts.mkRegEx (read "^[.*]$" {-Regex encoded within a String-} :: ExtendedRegEx Int {-Required because of ambiguous type of input-data-})
<interactive>:9:47:
	No instances for (RegExDot.Meta.ShortcutExpander Int, ShortcutExpander Int) arising from a use of `read'
	Possible fix: add an instance declaration for (RegExDot.Meta.ShortcutExpander Int, ShortcutExpander Int)
	In the first argument of `RegExDot.RegExOpts.mkRegEx', namely `(read "^[.*]$" :: ExtendedRegEx Int)'
	In the second argument of `(=~)', namely `RegExDot.RegExOpts.mkRegEx (read "^[.*]$" :: ExtendedRegEx Int)'
	In the expression: [1, -2, 4, -8, ....] =~ RegExDot.RegExOpts.mkRegEx (read "^[.*]$" :: ExtendedRegEx Int)

Prelude RegExDot.RegEx> :info ShortcutExpander
class ShortcutExpander a where
	expand :: Char -> ExtendedRegEx a -- Defined in RegExDot.RegEx

Prelude RegExDot.RegEx> :info RegExDot.Meta.ShortcutExpander
class RegExDot.BracketExpressionMember.ShortcutExpander a => RegExDot.Meta.ShortcutExpander a where
	RegExDot.Meta.expand :: Char -> RegExDot.Meta.Meta a -- Defined in RegExDot.Meta

Prelude RegExDot.RegEx> :info RegExDot.BracketExpressionMember.ShortcutExpander
class RegExDot.BracketExpressionMember.ShortcutExpander a where
	RegExDot.BracketExpressionMember.findPredicate :: Char -> Maybe (RegExDot.ShowablePredicate.ShowablePredicate a) -- Defined in RegExDot.BracketExpressionMember

What went wrong ?!
The data-type "Int" implements the type-classes required according to the match-operator's context, but on closer inspection one can see that it doesn't implement all the type-classes required by the implementation of the method "read". Because we've strayed from the traditional Char-based regex, the well-defined meaning of Perl-style shortcuts has been lost. To re-establish this useful concept, the method "read" requires the implementation of two additional type-classes; & the second of those requires the implementation of a third.

The reason for the first two type-classes is that a shortcut may occur in two different contexts within a regex, depending on whether expansion to a "RegExDot.RegEx.ExtendedRegEx a" or to a "RegExDot.Meta.Meta a" is implied.

RegExDot.RegEx.ShortcutExpander

When RegExDot.RegEx.ExtendedRegEx's definition [2] of the method "read" is called, any character preceded by a '\' [3] is passed to the method "RegExDot.RegEx.expand", for expansion into a "RegExDot.RegEx.ExtendedRegEx a".

RegExDot.Meta.ShortcutExpander

When RegExDot.Meta.Meta's definition of the method "read" is called, which happens whenever an individual meta-datum needs to be read, any character preceded by a '\' [3] is passed to the method "RegExDot.Meta.expand", for expansion into a "RegExDot.Meta.Meta a".

RegExDot.BracketExpressionMember.ShortcutExpander

Since the context for the type-class "RegExDot.Meta.ShortcutExpander" requires one to implement the type-class "RegExDot.BracketExpressionMember.ShortcutExpander", the method "RegExDot.BracketExpressionMember.findPredicate" is available to facilitate the implementation of the method "RegExDot.Meta.expand". If a predicate-function has been defined for a specific shortcut, it can be called to determine whether a specific input-datum matches, though you may not need this facility.

While this seems like a recipe for a pounding headache, it ironically exists only for convenience, & if the user of the regex doesn't need shortcuts, then a default implementation of these type-classes is easy to define, as it has been for the type "Int" in the module "RegExDot.InstanceInt". This module's minimal implementation defines instances of the required type-classes, as needed for compilation, but without actually defining any shortcuts.
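The relationship between these type-classes can be sketched as follows. The class, constructor & method names are hypothetical simplifications rather than regexdot's, & the third class (RegExDot.RegEx.ShortcutExpander, which expands to a whole sub-expression) is omitted. The sketch pairs a minimal instance for Int, which defines no shortcuts, with a Char instance whose "expandMeta" falls back on the superclass-method "findPredicate".

```haskell
-- Hypothetical stand-ins; NOT regexdot's types or class names.
data ShowablePredicate a = MkShowablePredicate String (a -> Bool)

data Meta a = AnyOf [a] | Predicate (ShowablePredicate a)

-- Plays the role of RegExDot.BracketExpressionMember.ShortcutExpander.
class PredicateShortcuts a where
  findPredicate :: Char -> Maybe (ShowablePredicate a)

-- Plays the role of RegExDot.Meta.ShortcutExpander; note the superclass.
class PredicateShortcuts a => MetaShortcuts a where
  expandMeta :: Char -> Meta a

-- Minimal instances, in the spirit of RegExDot.InstanceInt: they satisfy
-- the compiler, but every shortcut is unrecognised.
instance PredicateShortcuts Int where
  findPredicate _ = Nothing

instance MetaShortcuts Int where
  expandMeta c = error ("Unrecognised shortcut " ++ show c)

-- A richer instance, in which expandMeta falls back on the superclass-method.
instance PredicateShortcuts Char where
  findPredicate 'd' = Just (MkShowablePredicate "\\d" (`elem` ['0' .. '9']))
  findPredicate _   = Nothing

instance MetaShortcuts Char where
  expandMeta c = maybe (error ("Unrecognised shortcut " ++ show c)) Predicate (findPredicate c)

-- Whether a datum matches an expanded meta-datum.
matches :: Eq a => Meta a -> a -> Bool
matches (AnyOf members) x = x `elem` members
matches (Predicate (MkShowablePredicate _ p)) x = p x
```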

Prelude RegExDot.RegEx> :module + RegExDot.InstanceInt

Prelude RegExDot.RegEx RegExDot.InstanceInt> [1,-2,4,-8,16] =~ RegExDot.RegExOpts.mkRegEx (read "^[.*]$" :: ExtendedRegEx Int)
Loading package transformers-0.3.0.0 … linking … done.
Loading package array-0.4.0.1 … linking … done.
Loading package mtl-2.1.2 … linking … done.
Loading package deepseq-1.3.0.1 … linking … done.
Loading package containers-0.5.0.0 … linking … done.
Loading package parallel-3.2.0.3 … linking … done.
Loading package bytestring-0.10.0.2 … linking … done.
Loading package text-0.11.3.1 … linking … done.
Loading package parsec-3.1.3 … linking … done.
Loading package old-locale-1.0.0.5 … linking … done.
Loading package time-1.4.0.1 … linking … done.
Loading package random-1.0.1.1 … linking … done.
Loading package QuickCheck-2.6 … linking … done.
Loading package filepath-1.3.0.1 … linking … done.
Loading package unix-2.6.0.1 … linking … done.
Loading package directory-1.2.0.1 … linking … done.
Loading package toolshed-0.16.0.0 … linking … done.
Loading package regexdot-0.11.1.2 … linking … done.
True

You may have been wondering about the curious syntax of the above regex; it appears that I've encoded a /.*/ inside a bracket-expression (which makes no sense), but in this instance the brackets actually delimit a singleton list. Now that we've strayed from the traditional character-based regex, a slightly more verbose syntax is required to represent a regex inside a String. It consists of an explicit list of optionally quantified "RegExDot.Meta.Meta a", sandwiched by optional anchors. One can still define a bracket-expression, but the context in which it appears within the regex is used to distinguish it from the above concatenation. Capture-groups make the syntactic jungle just a shade denser, as will shortly become apparent.
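To make that structure concrete, here's a minimal backtracking sketch of how a list of optionally quantified meta-data, sandwiched by optional anchors, might be matched against polymorphic input-data. The types are hypothetical simplifications, not regexdot's, & capture-groups & alternation are omitted; quantifiers are tried greedily, & the list-comprehension supplies the backtracking.

```haskell
import Data.List (tails)

-- Hypothetical, simplified model; NOT regexdot's API.
data Meta a = Any | Literal a | AnyOf [a]

-- A meta-datum with a minimum & an optional maximum number of repetitions.
data Quantified a = Quantified (Meta a) Int (Maybe Int)

-- Optional bow-anchor, the concatenation, optional stern-anchor.
data RegEx a = RegEx Bool [Quantified a] Bool

matchesMeta :: Eq a => Meta a -> a -> Bool
matchesMeta Any _            = True
matchesMeta (Literal y) x    = x == y
matchesMeta (AnyOf ys) x     = x `elem` ys

-- Every possible remainder of the input after consuming the concatenation,
-- trying longer repetitions first (greedy).
consume :: Eq a => [Quantified a] -> [a] -> [[a]]
consume [] input = [input]
consume (Quantified meta lo hi : rest) input =
  [ remainder
  | n <- reverse [lo .. maybe available (min available) hi]
  , let (taken, after) = splitAt n input
  , all (matchesMeta meta) taken
  , remainder <- consume rest after
  ]
  where available = length input

-- Without a bow-anchor, the match may start at any offset; with a
-- stern-anchor, it must consume the remainder of the input.
(=~) :: Eq a => [a] -> RegEx a -> Bool
input =~ RegEx bow concatenation stern = any matchesFrom starts
  where
    starts = if bow then [input] else tails input
    matchesFrom suffix = any (\remainder -> not stern || null remainder) (consume concatenation suffix)

-- The equivalent of the regex "^[.*]$" used above.
anything :: RegEx a
anything = RegEx True [Quantified Any 0 Nothing] True
```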

Telephone-numbers

As a crude example of the specialisation "RegExDot.RegEx.ExtendedRegEx Int", one could attempt to define a regex to match the format of a UK telephone-number; one could also do this with a traditional character-based regex, though I'm struggling to motivate myself either way.

Prelude RegExDot.RegEx RegExDot.InstanceInt> :{
Prelude RegExDot.RegEx RegExDot.InstanceInt| [0,1,6,3,2,3,4,5,6,7,8] =~ RegExDot.RegExOpts.mkRegEx (
Prelude RegExDot.RegEx RegExDot.InstanceInt| 	read "^[([([4,4]|[0])?,[1,2,3,7,8,9],[0,1,2,3,4,5,6,7,8,9]{1,4}])?,[2,3,4,5,6,7,8,9],[0,1,2,3,4,5,6,7,8,9]{3,7}]$" :: ExtendedRegEx Int
Prelude RegExDot.RegEx RegExDot.InstanceInt| )
Prelude RegExDot.RegEx RegExDot.InstanceInt| :}
True

A multi-line ghci-command was used, to reduce the otherwise excessive line-length.

That's quite a syntactic eyeful. There are two capture-groups delimited by (), the second of which, nested inside the first, contains a pair of alternative sub-expressions, each of which recursively has the same structure as the outer regex: a list (explicitly delimited by []) of optionally quantified "RegExDot.Meta.Meta a", sandwiched by optional anchors. The alternating pairs of nested brackets & parentheses associated with capture-groups can become confusing (& probably already have).
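The recursion can be made explicit by modelling the regex as a small tree & rendering it back into the String-syntax used above. This is a hypothetical AST, not regexdot's data-types; it exists only to show how a capture-group holds alternatives which are themselves complete regexen, & how the telephone-number regex nests.

```haskell
import Data.List (intercalate)

-- Hypothetical, simplified AST; NOT regexdot's actual data-types.
data Meta a = Literal a | AnyOf [a]

-- A capture-group holds alternatives, each of which is itself a regex.
data Pattern a = Require (Meta a) | CaptureGroup [RegEx a]

-- A pattern with a minimum & an optional maximum number of repetitions.
data Repeatable a = Repeatable (Pattern a) (Int, Maybe Int)

-- Optional bow-anchor, the concatenation, optional stern-anchor.
data RegEx a = RegEx Bool [Repeatable a] Bool

once :: Pattern a -> Repeatable a
once p = Repeatable p (1, Just 1)

renderRegEx :: Show a => RegEx a -> String
renderRegEx (RegEx bow body stern) =
  (if bow then "^" else "") ++ "[" ++ intercalate "," (map renderRepeatable body) ++ "]" ++ (if stern then "$" else "")

renderRepeatable :: Show a => Repeatable a -> String
renderRepeatable (Repeatable p bounds) = renderPattern p ++ renderBounds bounds

renderPattern :: Show a => Pattern a -> String
renderPattern (Require (Literal x)) = show x
renderPattern (Require (AnyOf xs))  = "[" ++ intercalate "," (map show xs) ++ "]"
renderPattern (CaptureGroup alts)   = "(" ++ intercalate "|" (map renderRegEx alts) ++ ")"

renderBounds :: (Int, Maybe Int) -> String
renderBounds (1, Just 1)   = ""
renderBounds (0, Just 1)   = "?"
renderBounds (0, Nothing)  = "*"
renderBounds (1, Nothing)  = "+"
renderBounds (lo, Nothing) = "{" ++ show lo ++ ",}"
renderBounds (lo, Just hi)
  | lo == hi  = "{" ++ show lo ++ "}"
  | otherwise = "{" ++ show lo ++ "," ++ show hi ++ "}"

ukTelephoneNumber :: RegEx Int
ukTelephoneNumber = RegEx True
  [ Repeatable (CaptureGroup [areaCode]) (0, Just 1)
  , once (Require (AnyOf [2 .. 9]))
  , Repeatable (Require (AnyOf digits)) (3, Just 7)
  ] True
  where
    digits = [0 .. 9]
    -- The inner capture-group: two unanchored alternatives.
    prefix = CaptureGroup
      [ RegEx False (map (once . Require . Literal) [4, 4]) False
      , RegEx False [once (Require (Literal 0))] False
      ]
    areaCode = RegEx False
      [ Repeatable prefix (0, Just 1)
      , once (Require (AnyOf [1, 2, 3, 7, 8, 9]))
      , Repeatable (Require (AnyOf digits)) (1, Just 4)
      ] False
```

Note that a singleton list & a single-member bracket-expression both render as "[0]", which is precisely the syntactic overload described above.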

Note that while there's a superficial similarity to the corresponding traditional character-based regex, the ','-separated list from which the regex is now composed permits one to quantify directly Ints whose representation spans several Chars, i.e. those outside the range of digits [0 .. 9].

Even though the actual type-parameter "Int" implements the type-class "Enum", as required to construct a list using "..", the context for the formal type-parameter doesn't mandate it, so the members of bracket-expressions must be enumerated manually. One can relieve this tedium by modifying the minimal implementation of the type-class "RegExDot.Meta.ShortcutExpander Int" from the module "RegExDot.InstanceInt", to include a simple Perl-style shortcut which, being tailored for the actual type-parameter, can use any of the type-classes that it implements. Once defined, it can also be used repeatedly, to shorten the above regex further.

$ #Enhance the minimal instance-definition with Perl-style shortcuts.
$ cd 'RegExDot-0.11.1.2/src-lib/RegExDot/'
$ mv 'InstanceInt.hs' 'InstanceInt.hs.old' && cat >'InstanceInt.hs' <<'EOF'
module RegExDot.InstanceInt() where {
	import qualified	RegExDot.BracketExpressionMember	as BracketExpressionMember;
	import qualified	RegExDot.Meta			as Meta;
	import qualified	RegExDot.RegEx			as RegEx;
	import qualified	RegExDot.ShowablePredicate		as ShowablePredicate;

	instance BracketExpressionMember.ShortcutExpander Int where {
		findPredicate 'e'{-ven-}	= Just ShowablePredicate.MkShowablePredicate {ShowablePredicate.name = "\\e", ShowablePredicate.predicate = even};
		findPredicate _		= Nothing
	};

	instance Meta.ShortcutExpander Int where {
		expand 'd'{-igit-}	= Meta.AnyOf $ map BracketExpressionMember.Literal [0 .. 9]; {-Note the use of "..", which requires "instance Enum".-}
		expand 'o'{-dd-}	= Meta.AnyOf . map BracketExpressionMember.Literal $ filter odd [minBound ..]; {-Watch this space !-}
		expand c		= case BracketExpressionMember.findPredicate c of {
			Just showablePredicate	-> Meta.Predicate showablePredicate;
			_			-> error $ "Unrecognised shortcut '" ++ show c ++ "'."
		}
	};

	instance RegEx.ShortcutExpander Int where {
		expand c	= error $ "Unrecognised shortcut '" ++ show c ++ "'."
	}
}
EOF

$ cd .. && ghci -cpp -D'MIN_VERSION_base(major1,major2,minor)=major1 < 4 || major1 == 4 && major2 <  6 || major1 == 4 && major2 == 6 && minor <= 0' -DVERSION_base=4.6.0.1 'RegExDot.RegEx' 'RegExDot.InstanceInt'	#Re-enter ghci with the modified instance-declaration.
GHCi, version 7.6.3: https://www.haskell.org/ghc/  :? for help
Loading package ghc-prim … linking … done.
Loading package integer … linking … done.
Loading package base … linking … done.
[ 1 of 15] Compiling RegExDot.Tree			( RegExDot/Tree.hs, interpreted )
[ 2 of 15] Compiling RegExDot.ShowablePredicate		( RegExDot/ShowablePredicate.hs, interpreted )
[ 3 of 15] Compiling RegExDot.ExecutionOptions		( RegExDot/ExecutionOptions.hs, interpreted )
[ 4 of 15] Compiling RegExDot.ConsumptionBounds		( RegExDot/ConsumptionBounds.hs, interpreted )
[ 5 of 15] Compiling RegExDot.ConsumptionProfile		( RegExDot/ConsumptionProfile.hs, interpreted )
[ 6 of 15] Compiling RegExDot.Consumer			( RegExDot/Consumer.hs, interpreted )
[ 7 of 15] Compiling RegExDot.Repeatable			( RegExDot/Repeatable.hs, interpreted )
[ 8 of 15] Compiling RegExDot.CompilationOptions		( RegExDot/CompilationOptions.hs, interpreted )
[ 9 of 15] Compiling RegExDot.RegExOpts			( RegExDot/RegExOpts.hs, interpreted )
[10 of 15] Compiling RegExDot.BracketExpressionMember	( RegExDot/BracketExpressionMember.hs, interpreted )
[11 of 15] Compiling RegExDot.BracketExpression		( RegExDot/BracketExpression.hs, interpreted )
[12 of 15] Compiling RegExDot.Meta			( RegExDot/Meta.hs, interpreted )
[13 of 15] Compiling RegExDot.Anchor			( RegExDot/Anchor.hs, interpreted )
[14 of 15] Compiling RegExDot.RegEx			( RegExDot/RegEx.hs, interpreted )
[15 of 15] Compiling RegExDot.InstanceInt			( RegExDot/InstanceInt.hs, interpreted )
Ok, modules loaded: RegExDot.Anchor, RegExDot.BracketExpression, RegExDot.BracketExpressionMember, RegExDot.CompilationOptions, RegExDot.Consumer, RegExDot.ConsumptionBounds, RegExDot.ConsumptionProfile, RegExDot.ExecutionOptions, RegExDot.InstanceInt, RegExDot.Meta, RegExDot.RegEx, RegExDot.RegExOpts, RegExDot.Repeatable, RegExDot.ShowablePredicate, RegExDot.Tree.

*RegExDot.RegEx> read "[\\d]" :: ExtendedRegEx Int --Show the expanded shortcut.
Loading package transformers-0.3.0.0 … linking … done.
Loading package array-0.4.0.1 … linking … done.
Loading package mtl-2.1.2 … linking … done.
Loading package deepseq-1.3.0.1 … linking … done.
Loading package containers-0.5.0.0 … linking … done.
Loading package parallel-3.2.0.3 … linking … done.
Loading package bytestring-0.10.0.2 … linking … done.
Loading package text-0.11.3.1 … linking … done.
Loading package parsec-3.1.3 … linking … done.
Loading package old-locale-1.0.0.5 … linking … done.
Loading package time-1.4.0.1 … linking … done.
Loading package random-1.0.1.1 … linking … done.
Loading package QuickCheck-2.6 … linking … done.
Loading package filepath-1.3.0.1 … linking … done.
Loading package unix-2.6.0.1 … linking … done.
Loading package directory-1.2.0.1 … linking … done.
Loading package toolshed-0.16.0.0 … linking … done.
[[0,1,2,3,4,5,6,7,8,9]]

*RegExDot.RegEx> [0,1,6,3,2,3,4,5,6,7,8] =~ RegExDot.RegExOpts.mkRegEx (read "^[([([4,4]|[0])?,[1,2,3,7,8,9],\\d{1,4}])?,[2,3,4,5,6,7,8,9],\\d{3,7}]$" :: ExtendedRegEx Int) --See footnote [4].
True

*RegExDot.RegEx> [0,1,6,3,2,3,4,5,6,7,8] +~ RegExDot.RegExOpts.mkRegEx (read "^[([([4,4]|[0])?,[1,2,3,7,8,9],\\d{1,4}])?,[2,3,4,5,6,7,8,9],\\d{3,7}]$" :: ExtendedRegEx Int)
(Nothing,Just [[[[[(0,0,[0])]],([1,2,3,7,8,9],1,[1]),([0,1,2,3,4,5,6,7,8,9]{1,4},2,[6,3,2,3])]],([2,3,4,5,6,7,8,9],6,[4]),([0,1,2,3,4,5,6,7,8,9]{3,7},7,[5,6,7,8])],Nothing)
[Figure: RegExDot.RegEx.Result-tree]

Note that the Perl-style shortcut, which was expanded when the regex-specification was read from a String, was also displayed in that expanded form in the result; in fact, the manner in which such shortcuts are shown depends on their implementation of the method "RegExDot.Meta.expand".

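Here the complete mapping of the input-data has been revealed. The result is a triple. Its middle component (a Just) holds the match, in which each optionally quantified meta-datum is paired with the offset into the input-data at which it matched & the input-data it consumed, e.g. ([0,1,2,3,4,5,6,7,8,9]{1,4},2,[6,3,2,3]), with capture-groups represented as nested lists. The outer two components, both Nothing here, presumably refer to input-data preceding & following the match, of which there's none since the regex is anchored at both ends.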

Like the definition of the regex, the result suffers from syntactic overload, in that it's difficult to disambiguate the semantics of "[]", which, depending on the context, are used both to construct a Haskell-list & to delimit a bracket-expression. For example, in ([1,2,3,7,8,9],1,[1]), the first pair of brackets delimits a bracket-expression, whereas the last delimits the list of input-data consumed.

Odd & Even

Now let's try the two other shortcuts included in the above modified version of "RegExDot.InstanceInt", which aim, respectively, to define the sets of odd & even numbers.

*RegExDot.RegEx> read "[\\o]" :: ExtendedRegEx Int	--Show the expanded shortcut.
[[-9223372036854775808,-9223372036854775806,-9223372036854775804,-9223372036854775802,-9223372036854775800,-9223372036854775798,^CInterrupted.

*RegExDot.RegEx> [1,3,5] =~ RegExDot.RegExOpts.mkRegEx (read "^[\\o+]$" :: ExtendedRegEx Int) --This may take some time.
^CInterrupted.

*RegExDot.RegEx> read "[\\e]" :: ExtendedRegEx Int	--Show the expanded shortcut.
[\e]

*RegExDot.RegEx> [0,2,4] =~ RegExDot.RegExOpts.mkRegEx (read "^[\\e+]$" :: ExtendedRegEx Int)
True

*RegExDot.RegEx> [1,3,5] =~ RegExDot.RegExOpts.mkRegEx (read "^[[^\\e]+]$" :: ExtendedRegEx Int) --Simulate "\o".
True

There's something odd about the implementation of the shortcut /\o/, & it's not difficult to see what. In contrast to a traditional regex, which operates on the relatively small set of characters, we're now operating on the conceptually infinite [5] set of Ints, which in this naïve implementation results in an infeasibly large bracket-expression.

To avoid the resulting ghastly inefficiency, & the inability to show shortcuts implemented in this manner, a superior approach was used for the implementation of /\e/; the match-operation was defined by a predicate-function. Since functions don't naturally implement the type-class "Show", the shortcut's mnemonic is printed instead.
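The trade-off can be demonstrated without regexdot. The sketch below uses a hypothetical stand-in for a showable predicate (a function paired with the mnemonic to print, since functions have no Show instance), & swaps in Int8 for Int so that the enumerated set stays small enough to build: with Int8 the odd members number 128, whereas with a 64-bit Int they'd number 2^63. The predicate, by contrast, is a constant-size value whatever the type.

```haskell
import Data.Int (Int8)

-- Hypothetical stand-in; NOT regexdot's ShowablePredicate.
data ShowablePredicate a = MkShowablePredicate { name :: String, predicate :: a -> Bool }

-- Show the mnemonic, since the function itself can't be shown.
instance Show (ShowablePredicate a) where
  show = name

-- The enumerated approach, as used for /\o/: feasible only for small types.
oddMembers :: (Bounded a, Integral a) => [a]
oddMembers = filter odd [minBound .. maxBound]

-- The predicate approach, as used for /\e/: works for any Integral type.
evenShortcut :: Integral a => ShowablePredicate a
evenShortcut = MkShowablePredicate { name = "\\e", predicate = even }
```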

One could also modify the minimal definition of the other type-class, "RegExDot.RegEx.ShortcutExpander Int", from the module "RegExDot.InstanceInt", to replace whole sub-expressions with Perl-style shortcuts too, but it's rather more involved, since the return-type is more complex; there are examples of this in "bridge-0.1.0.12/src-lib/Bridge/Tier4/ExtendedRegExBid.hs".

If you actually performed this example, remember to revert to the original unmodified module:

$ cd 'RegExDot/' && mv 'InstanceInt.hs.old' 'InstanceInt.hs'

Type-parameter = Bridge.Tier2.Bid.Bid

[Figure: Contract Bridge Auction.]

You may not be familiar with contract bridge auctions, but briefly, they're composed from bids (typically defining a number of tricks & a trump-suit, but there are also three special non-constructive bids: Pass, Double & Redouble) issued sequentially by each of the four players, with the aim of establishing the number of tricks that must be won should the hand be played with that suit as trumps. It's like an English auction, in that the constructive bids must increase, but one in which the bids can't necessarily be interpreted literally. The intended meaning of individual bids, &, to a greater extent, of the sequences of bids issued by successive players, can only be understood in the context of a set of conventions agreed within each of the two opposing partnerships into which the four players are divided. Distinguishing conventional bidding-sequences from natural ones can be tricky, but when the player is implemented in software rather than wetware, it's an ideal domain in which to deploy a polymorphic regex-engine.

The data-type "Bridge.Tier2.Bid.Bid", was the original reason for the development of the package "regexdot". One can specify regexen composed from them in the same manner as was done above for Ints, but with the additional convenience that Perl-style shortcuts have already been defined in the modules "Bridge.Tier3.MetaBid" & "Bridge.Tier4.ExtendedRegExBid", of the package "bridge". This set of shortcuts isn't complete, but provides an adequate example of the principle.

Shortcuts defined for the data-type "Bridge.Tier2.Bid.Bid"

Mnemonic  RegExDot.Meta.ShortcutExpander    RegExDot.RegEx.ShortcutExpander
1 .. 7    Any bid at the referenced level
C         Any ♣-bid
D         Any ♦-bid
H         Any ♥-bid
S         Any ♠-bid                         Stayman
N         Any bid in No-trumps
M         Any bid in a major suit
m         Any bid in a minor suit
b         Any black suit
r         Any red suit                      Jacoby Red-suit Transfer
q                                           Quantitative Raise

The rather dubious but convenient choice, in the first row of the table, of a single digit as a mnemonic, would normally conflict with the traditional regex-syntax used to denote a back-reference (\1 .. \9), except that regexdot doesn't currently support these; though it does support the associated concept of capture-groups.

Though some mnemonics are overloaded between the two type-classes, they exist in different name-spaces & therefore don't conflict. Whilst such overloading doesn't aid the intelligibility of the regex in which they're embedded, neither would the choice of arbitrary but unique mnemonics, & it's as well to understand that it's permissible.

In contrast to earlier versions, package "bridge-0.1.0.12" exposes a library which must be installed for the following example.

$ ghci
Loading package ghc-prim … linking … done.
Loading package integer-gmp … linking … done.
Loading package base … linking … done.
Prelude> :module + Bridge.Tier2.Bid Bridge.Tier3.MetaBid Bridge.Tier4.ExtendedRegExBid RegExDot.RegEx

Prelude Bridge.Tier2.Bid Bridge.Tier3.MetaBid Bridge.Tier4.ExtendedRegExBid RegExDot.RegEx> read "\\S" {-Regex encoded within a String-} :: ExtendedRegEx Bid -- Define the result-type.
Loading package transformers-0.3.0.0 … linking … done.
Loading package array-0.4.0.1 … linking … done.
Loading package mtl-2.1.2 … linking … done.
Loading package deepseq-1.3.0.1 … linking … done.
Loading package containers-0.5.0.0 … linking … done.
Loading package parallel-3.2.0.3 … linking … done.
Loading package bytestring-0.10.0.2 … linking … done.
Loading package text-0.11.3.1 … linking … done.
Loading package parsec-3.1.3 … linking … done.
Loading package binary-0.7.6.1 … linking … done.
Loading package filepath-1.3.0.1 … linking … done.
Loading package old-locale-1.0.0.5 … linking … done.
Loading package time-1.4.0.1 … linking … done.
Loading package unix-2.6.0.1 … linking … done.
Loading package directory-1.2.0.1 … linking … done.
Loading package pretty-1.1.1.0 … linking … done.
Loading package process-1.1.0.2 … linking … done.
Loading package Cabal-1.22.4.0 … linking … done.
Loading package primes-0.2.1.0 … linking … done.
Loading package random-1.0.1.1 … linking … done.
Loading package QuickCheck-2.6 … linking … done.
Loading package toolshed-0.16.0.0 … linking … done.
Loading package factory-0.2.1.2 … linking … done.
Loading package regexdot-0.11.1.2 … linking … done.
Loading package bridge-0.1.0.12 … linking … done.
^[([([Pass{0,3}]|[Pass{0,2},[1C,1D,1H,1S],([Pass{2}])?]),1NT,Pass,2C]|[Pass{0,3},2NT,Pass,3C])]

Prelude Bridge.Tier2.Bid Bridge.Tier3.MetaBid Bridge.Tier4.ExtendedRegExBid RegExDot.RegEx> read "[1NT,Pass,2C,Pass,2H,Pass,3NT,Pass,4S,Pass,Pass,Pass]" =~ RegExDot.RegExOpts.mkRegEx (read "\\S" :: ExtendedRegEx Bid)
True

Prelude Bridge.Tier2.Bid Bridge.Tier3.MetaBid Bridge.Tier4.ExtendedRegExBid RegExDot.RegEx> Data.List.intersperse Pass [twoNTs,threeClubs,threeSpades,fourSpades,Pass] =~ RegExDot.RegExOpts.mkRegEx (read "\\S")
True
Prelude Bridge.Tier2.Bid Bridge.Tier3.MetaBid Bridge.Tier4.ExtendedRegExBid RegExDot.RegEx> read "[(\\r|^[Pass{0,3},\\r])]" :: ExtendedRegEx Bid -- Define the result-type.
[(^[([([Pass{0,3}]|[Pass{0,2},[1C,1D,1H,1S]]|[[1C,1D,1H,1S],Pass{2}]),1NT,Pass,([2D,.,2H]|[2H,.,2S])]|[Pass{0,3},2NT,Pass,([3D,.,3H]|[3H,.,3S])])]|^[Pass{0,3},\r])]

Continuing the previous ghci-session, two Perl-style shortcuts were referenced which, despite sharing the mnemonic 'r', aren't identical. The first is implemented by the type-class "RegExDot.RegEx.ShortcutExpander", & defines the set of bidding-sequences which can be classified as a Red-suit Transfer (another favourite amongst bridge-players). The second is implemented by the type-class "RegExDot.Meta.ShortcutExpander", & defines a bracket-expression representing a single bid in any red suit. Though they share a mnemonic, their different contexts within the regex, & consequently different name-spaces, prevent the apparent conflict.

Note that because the latter shortcut was implemented using the type-class "RegExDot.BracketExpressionMember.ShortcutExpander", it appears merely as the original mnemonic (\r) when the regex is shown.

This regex possesses another unusual feature: internal anchors. Both alternative sub-expressions in the outer capture-group have been individually bow-anchored, though one of these anchors is implicit in the definition of the shortcut. In general, each alternative sub-expression, being syntactically identical to a regex, can have its own anchors, though this only makes sense when any quantified meta-data specified either before an internal bow-anchor, or after an internal stern-anchor, are capable of consuming nothing; otherwise the regex will always fail to match.

Syntactic Limitations

Regrettably, packing a polymorphic regex into a String, as though it were a traditional character-based regex, presents a problem. Though it works in the above examples, that's only because, by good fortune, the String-representations of these actual type-parameters didn't conflict with the meta-characters used to qualify them, or with the syntax used to delimit them.

If the implementation of the type-class "Read", for the polymorphic data-type, can consume any of the postfix meta-characters used in standard regex-syntax for quantification, it'll starve the regex's parser. Equally, the regex's parser will interpret '(' as the start of a capture-group, & will interpret '[' as the start of a bracket-expression, potentially starving the polymorphic data-type's read-method.

If one can't unambiguously read a polymorphic regex from a String, because the regex's parser competes for data with the specific implementation of the type-class "Read", then one may be able to remove the ambiguity by wrapping the actual type-parameter in a newtype, which re-implements the type-classes "Read" & "Show". Alternatively one can bypass the ambiguity by composing the regex.
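The newtype-approach might look like the sketch below, in which "Token" is a hypothetical wrapper around "Maybe Int", chosen because the standard Show instance for "Maybe Int" emits "Just (-5)", whose '(' a regex-parser would take for the start of a capture-group. The re-implemented Show avoids regex meta-characters, & the matching Read consumes only its own token, leaving any trailing quantifier for the regex's parser. Whether regexdot's parser delegates to "readsPrec" in exactly this way is an assumption; the sketch only shows the principle.

```haskell
import Data.Char (isSpace)

-- Hypothetical wrapper; NOT part of regexdot.
newtype Token = Token (Maybe Int) deriving Eq

-- Re-implemented Show, free of regex meta-characters: "N", "J3", "J-5".
instance Show Token where
  show (Token Nothing)  = "N"
  show (Token (Just n)) = 'J' : show n

-- Matching Read, which consumes only its own token & returns the rest
-- of the String (e.g. a trailing "*") unconsumed.
instance Read Token where
  readsPrec _ s = case dropWhile isSpace s of
    'N' : rest -> [(Token Nothing, rest)]
    'J' : rest -> [(Token (Just n), rest') | (n, rest') <- reads rest]
    _          -> []
```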

Footnotes

  1. Which you should probably read about before attempting this page.
  2. Actually, the module "RegExDot.RegEx", defines several related data-types, each of which separately implements the type-class "Read".
  3. Except '\' itself.
  4. Note the additional '\', required to escape the '\' of \d within a Haskell String-literal.
  5. Merely 2^64 on my machine.