Grecce: a Command-line Utility

Regex-engines designed for character-based input-data, & offering a wide spectrum of features, already exist. The library exported by the package "regexdot" differs from them in two respects: it is polymorphic in the type from which the input-data is composed, & it can discover the complete mapping of each input-datum into the regex. This mapping is a useful aid when debugging regex-syntax, & for this reason the derived library "regexchar" (which inherits the ability) has been linked into an executable, "grecce", which is in other respects a re-implementation of egrep.

Though it exploits parallelism both in the evaluation of any alternative sub-expressions defined in the regex, & in the processing of multiple input-data files (where specified), the performance of grecce is relatively poor, because the underlying polymorphic library cannot exploit character-specific optimisations to read its input-data rapidly. The matching-algorithm is also of theoretically inferior time-complexity to that used by TDFA, but this additional factor is only significant for rather atypical, pathological regexen.

Examples

Words containing all five vowels, in alphabetical order.

$ grecce 'a[[:alpha:]]*e[[:alpha:]]*i[[:alpha:]]*o[[:alpha:]]*u' '/usr/share/dict/words'
abstemious
abstemiously
abstemiousness
abstentious
adenocarcinomatous
adventitious
adventitiously
adventitiousness
amentiferous
androdioecious
andromonoecious
anemophilous
antenniferous
antireligious
arenicolous
argentiferous
arsenious
arteriovenous
asclepiadaceous
autecious
auteciously
bacteriophagous
caesalpiniaceous
cavernicolous
chaetiferous
facetious
facetiously
facetiousness
flagelliferous
garnetiferous
hamamelidaceous
lateritious
parecious
quadrigeminous
sacrilegious
sacrilegiously
sacrilegiousness
sarraceniaceous
supercalifragilisticexpialidocious
ultrareligious
ultraserious
valerianaceous
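
The same pattern can be tried on a small word-list using Python's "re" module, as a rough sketch rather than a substitute for grecce. Two assumptions are made: the POSIX class "[[:alpha:]]", which "re" doesn't support, is approximated by "[A-Za-z]", & the hypothetical list "sample" stands in for the dictionary file.

```python
import re

# Python's re module lacks POSIX classes, so [[:alpha:]] is approximated by [A-Za-z].
VOWELS_IN_ORDER = re.compile(r"a[A-Za-z]*e[A-Za-z]*i[A-Za-z]*o[A-Za-z]*u")


def vowels_in_order(words):
    """Return the words in which a, e, i, o, u occur in that order."""
    # search() is unanchored, like the grecce command above.
    return [w for w in words if VOWELS_IN_ORDER.search(w)]


# Hypothetical sample standing in for /usr/share/dict/words.
sample = ["facetious", "education", "abstemiously", "sequoia", "arsenious"]
print(vowels_in_order(sample))  # ['facetious', 'abstemiously', 'arsenious']
```

Because the pattern is unanchored & allows any letters between the vowels, it also accepts words in which some vowels recur out of order, which is why "adenocarcinomatous" appears in the list above.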

The longest words which can be spelt using only the top row of letters on a typewriter.

$ grecce '^[qwertyuiop]{10,}$' '/usr/share/dict/words'
peppertree
pepperwort
perpetuity
perruquier
pirouetter
prerequire
proprietor
repertoire
rupturewort
typewriter
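
The lower bound in "{10,}" has to be supplied by the user. The following minimal sketch, again using Python's "re" module rather than grecce, parameterises that bound & lists the longest matches first; the function-name "top_row_words" & the list "sample" are hypothetical.

```python
import re


def top_row_words(words, min_length=10):
    """Return words of at least min_length letters, drawn solely from the top row."""
    pattern = re.compile(r"[qwertyuiop]{%d,}" % min_length)
    # fullmatch() plays the role of the ^...$ anchors in the grecce command.
    hits = [w for w in words if pattern.fullmatch(w)]
    # Longest first, then alphabetical, so the extremes are easy to spot.
    return sorted(hits, key=lambda w: (-len(w), w))


# Hypothetical sample standing in for /usr/share/dict/words.
sample = ["typewriter", "proprietor", "pepper", "keyboard", "rupturewort"]
print(top_row_words(sample))  # ['rupturewort', 'proprietor', 'typewriter']
```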

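One can obtain the mapping of input-data into the regex, for any of these, by specifying the "--verbose" flag; each term of the regex is then shown together with the offset at which it matched & the input-data which it consumed.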

$ echo 'A typewriter.' | grecce --verbose '[qwertyuiop]{10,}'
(Just (.*?,0,"A "),Just [(['q','w','e','r','t','y','u','i','o','p']{10,},2,"typewriter")],Just (.*,12,"."))
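
The shape of this output can be imitated for the simplest case, a regex consisting of a single term, using Python's "re" module. This is only a sketch of the idea, not the algorithm used by "regexdot": the function-name "map_match" is hypothetical, & the unmatched prefix & suffix are taken here to correspond to the ".*?" & ".*" terms in the output above.

```python
import re


def map_match(pattern, text):
    """Split text into (prefix, match, suffix), each paired with its offset.

    Returns None when the pattern matches nowhere in the text.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    start, end = m.span()
    return ((0, text[:start]), (start, m.group()), (end, text[end:]))


print(map_match(r"[qwertyuiop]{10,}", "A typewriter."))
# ((0, 'A '), (2, 'typewriter'), (12, '.'))
```

The offsets 0, 2 & 12 agree with those reported by grecce; for a regex composed of several terms, grecce reports one such triple per term, which this sketch doesn't attempt.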