= Regular Expressions in Dyalog APL =
Regular expressions can be used in Dyalog APL through .Net. The following code was prepared with version 11 of Dyalog.

== Introduction ==
Note that this article is not about regular expressions as such: the reader is assumed to be familiar with regexes, their syntax, groupings, etc.

.Net regular expressions are based on those of Perl and are compatible with Perl 5 regular expressions. .Net contains a set of powerful classes that make it even easier to use regular expressions. The following classes live in the System.Text.RegularExpressions namespace:

 Capture:: Represents the result of a single subexpression capture: one substring for a single successful capture.
 CaptureCollection:: Represents a sequence of captured substrings. A !CaptureCollection holds the set of captures made by a single capturing group.
 Group:: Represents the results from a single capturing group. Because of quantifiers, a capturing group can capture zero, one or more strings in a single match, so Group supplies a collection of Capture objects.
 GroupCollection:: Represents a collection of captured groups. A !GroupCollection holds the set of groups captured in a single match.
 Match:: Represents the results from a single regular expression match.
 MatchCollection:: Represents the set of successful matches found by iteratively applying a regular expression pattern to the input string.
 Regex:: Represents an immutable (read-only) regular expression.

== Examples ==
Here are a few examples of how to use them.
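The classes above nest: a Match holds a !GroupCollection, and each Group holds a !CaptureCollection. The following is a minimal sketch (the input 'a_b_c_d' is made up for illustration) showing that a quantified group keeps only its last capture in its Value, while its Captures collection holds every repetition. It assumes `⌷` enumerates a !CaptureCollection the same way it enumerates a !MatchCollection.
{{{
      ⎕USING←'System.Text.RegularExpressions,system.dll'
      ⎕WX ⎕IO←3 0
      m←Regex.Match 'a_b_c_d' '([a-z]+_){3}([a-z]+)'
      m.Groups.Count                  ⍝ group 0 (whole match), 1 and 2
      m.Groups[1].Value               ⍝ only the LAST capture of group 1
      m.Groups[1].Captures.Count      ⍝ group 1 matched 3 times
      (⌷m.Groups[1].Captures).Value   ⍝ every capture of group 1, in order
}}}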
{{{
      ⎕USING←'System.Text.RegularExpressions,system.dll'   ⍝ this is where the Regex class resides
      ⎕WX ⎕IO←3 0
      ⍝ There are 2 matching functions: Match and Matches.
      ⍝ Let's start with the Matches function.
      ⍝ It deals with all the matches, regardless of grouping:
      m←Regex.Matches 'xxababababa' 'aba'   ⍝ matches do not overlap
      m.Count        ⍝ only 2 matches, not 4
2
      m[0 1].Index   ⍝ they start at offsets 2 and 6 (a match at 4 would overlap)
2 6
      (⌷m).Index     ⍝ more succinctly
2 6
      ⍝ Another example
      text←'"tit for tat" said that fat and tall top cat'
      p1←'[ct].{0,3}[pt]'   ⍝ 'c' or 't', then 0 to 3 characters, then 'p' or 't'
      m←Regex.Matches text p1
      m.Count        ⍝ 5 found
5
      ⌷m             ⍝ these are objects, not strings
tit tat that top cat
      DISPLAY ⍕¨⌷m
┌───┬───┬────┬───┬───┐
│tit│tat│that│top│cat│
└───┴───┴────┴───┴───┘
      (⌷m).Index
1 9 19 37 41
      (⌷m).Length
3 3 4 3 3
      ⍝ Now the Match function:
      m←Regex.Match 'xxababababa' 'aba'   ⍝ also non-overlapping
      m.Success
1
      m.Index
2
      m←m.NextMatch
      m.Success
1
      m.Index
6
      m←m.NextMatch
      m.Success
0
      ⍝ Let's capture groups with the Match function.
      ⍝ We're looking for names that have 4 sections separated by _ like a_b_c_d
      text←' a b_c+de_fg_hij÷kl_bnm_iop_good-qq_21_z9_not_this*5'
      ⍝ the word we want is kl_bnm_iop_good
      pattern←'\b([a-z0-9]+_){3}([a-z0-9]+)\b'
      ⍝ group 1 is ([a-z0-9]+_), group 2 is ([a-z0-9]+); group 0 is the entire match
      +m←Regex.Match text pattern
kl_bnm_iop_good
      m.(Index Length)
17 15
      m.Groups.Count   ⍝ groups 0 (all), 1 & 2
3
      m.Groups[2]
good
}}}

== Warning! ==
Some characters are treated in a special way in Dyalog. In particular, the caret used in regexes appears twice in ⎕AV, and care must be taken to use the right one. The Classic version of Dyalog does not offer a way to enter both characters distinctly from the keyboard.
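In the Unicode version, by contrast, the regex caret is the ordinary circumflex (U+005E) while APL's AND is ∧ (U+2227), so the caret can be typed straight into a pattern. A minimal sketch, assuming a Unicode interpreter recent enough to have ⎕UCS (not available in V11); the test strings are made up for illustration:
{{{
      ⎕USING←'System.Text.RegularExpressions,system.dll'
      ⎕WX←3
      ⎕UCS '^∧'                           ⍝ 94 8743: two different characters
      Regex.IsMatch 'ABCF' '^ABC[^DE]'    ⍝ 1: starts with ABC, next character is not D or E
      Regex.IsMatch 'ABCD' '^ABC[^DE]'    ⍝ 0: ABC is followed by D
      Regex.IsMatch 'xABCF' '^ABC[^DE]'   ⍝ 0: ABC is not at the start
}}}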
In a regular expression, apart from meaning "a caret", the caret can mean two things:
 * at the beginning of a pattern it means "the match must start at the beginning of the input" (or of a line, with the Multiline option)
 * as the first character inside a [set] it negates the set

The caret used in regexes for that purpose is found at ⎕AV[235] (with ⎕IO 0). The one used for the APL function AND is found at ⎕AV[167].

Thus, to look for a line starting with ABC and not followed by D or E, you would use the pattern {{{'^ABC[^DE]'}}}, which would be constructed as
{{{
      ⎕AV[235],'ABC[',⎕AV[235],'DE]'
}}}

In the Unicode version the 2 characters are distinct and can be entered directly from the keyboard.

== Options ==
.Net allows searches to be conducted in different ways. The main options are !CultureInvariant, !IgnoreCase, !IgnorePatternWhitespace, Multiline and Singleline.

To use an option, use the !RegexOptions class, as in !RegexOptions.Multiline. To use several options, simply add them up: {{{RegexOptions.(Multiline+IgnoreCase)}}}

== Other examples ==
Looking for an IP address (4 numbers of up to 3 digits separated by dots):
{{{
      text←'Dan, 192.168.1.2, foo-foo'
      z←Regex.Matches text '(\d{1,3}\.){3}\d{1,3}'
      z[0]
192.168.1.2
}}}

Looking for H1 text:
{{{
      text←'<h1>APL is great!</h1>'
      pattern←'<h1>(.*?)</h1>'   ⍝ group 1 contains the text in between
      (Regex.Match text pattern RegexOptions.IgnoreCase).Groups[1]
APL is great!
}}}

Looking for text between '''any''' tag:
{{{
      text←'<h2>APL is powerful!</h2>'
      pattern←'<(\w+)>(.*?)</\1>'   ⍝ group 1 is the tag name, group 2 contains the text in between
      (Regex.Match text pattern RegexOptions.IgnoreCase).Groups[2]
APL is powerful!
}}}

Looking for an APL identifier (including system names):
{{{
     ∇ test;local
[1]    global←1 ⋄ local←2
[2]   label:⎕IO←1
[3]    :If 1 ⍝ ⍺
[4]        ∆special∆←1 ⋄ special⍙←2
[5]        _special←3 ⋄ Áspecial←4
[6]    :EndIf
     ∇
      ok←0≤⎕NC 256 1⍴⎕AV   ⍝ find all name-forming characters: ∆, ⍙, Á, etc.
      r←'a-zA-Z',(ok/⎕AV)~,⎕AV[(⎕AV⍳'Aa')∘.-1-⍳26]
      ⍝ The pattern is any of those characters, followed by 0 or more of the same characters plus digits,
      ⍝ and not preceded by : (to skip the :statements). Quotes and comments are not accounted for here.
      pattern←'((?
 test local global local label ⎕IO ⍺ ∆special∆ special⍙ _special Áspecial
}}}

You can use named groups instead of numbered groups (the default):
{{{
      pattern←'<h1>(?<STR>.*?)</h1>'   ⍝ group 'STR' contains the text in between
      (Regex.Match 'aaaaz<h1>Title</h1>sad ' pattern RegexOptions.IgnoreCase).Groups[⊂'STR']
Title
}}}

The IsMatch function returns 1 if the pattern is found ANYWHERE in the string, so to test the whole string the pattern must be anchored at both ends (^ and $). Example: validate password conditions such as "Password must be from 8 to 20 characters, must contain at least 2 letters and at least 2 digits. It can only contain letters and digits."
{{{
      p←⎕NEW Regex,⊂⊂⎕AV[235],'(?=.*?\d.*?\d)(?=(.*?[a-zA-Z]){2,})[\da-zA-Z]{8,20}$'
      p.IsMatch∘⊂¨'ds' ' 32a ' '0123456789x' '01234abcde56789wxyzp1' '0123aAzZ'
0 0 0 0 1
}}}

You can split a string into substrings using the Split function. This is a sort of complement of Matches: it does not return the matches but everything else. Example: split wherever X is followed by a digit:
{{{
      ⌷m←Regex.Split '1stX1aaaX2bbbX3cccX4ddd' 'X\d'
1st aaa bbb ccc ddd
}}}

== Using regular expressions to replace strings ==
You specify the pattern, the text and the replacement, using $n to denote group n.

=== Example 1 ===
Change "Surname, Name" into "Name Surname" (and account for spaces):
{{{
      pat←⎕NEW Regex (⊂'\s*(\w+)\s*,\s*(\w+)\s*')   ⍝ group 1 is the surname, group 2 the name
      pat.Replace ' Iverson, Ken ' '$2 $1'
Ken Iverson
      ⍝ or, for a one-time replacement:
      Regex.Replace ' Iverson, Ken ' '\s*(\w+)\s*,\s*(\w+)\s*' '$2 $1'
}}}

You can also use named groups instead of numbers:
{{{
      Regex.Replace ' Iverson, Ken ' '\s*(?<Last>\w+)\s*,\s*(?<First>\w+)\s*' '${First} ${Last}'
Ken Iverson
}}}

=== Example 2 ===
If you need special treatment, you can use your own function to perform the replacement via a !MatchEvaluator. You can think of a !MatchEvaluator as an event handler that fires when an "!OnMatch event" occurs.
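A minimal sketch of the mechanism, independent of Example 1: the hypothetical callback `double` receives each Match object and returns the replacement string, here twice the number that was matched. The input text is made up for illustration.
{{{
      ⎕USING←'System.Text.RegularExpressions,system.dll'
      ⎕WX←3
     ∇ r←double m
[1]    ⍝ m is the Match object; the result must be a character vector
[2]    r←⍕2×⍎m.Value
     ∇
      ev←⎕NEW MatchEvaluator (⎕OR'double')
      Regex.Replace 'a1b22c333' '\d+' ev   ⍝ each run of digits is replaced by double its value
}}}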
For example, if you want Example 1 to ensure that only the first letter of each name is capitalized, you can write
{{{
      ⍝ ToUpper and ToLower are not built in; one possible ASCII-only definition:
      lc←'abcdefghijklmnopqrstuvwxyz'
      ToUpper←{(⎕A,⍵)[(lc,⍵)⍳⍵]}
      ToLower←{(lc,⍵)[(⎕A,⍵)⍳⍵]}
     ∇ str←cap arg
[1]    str←(ToUpper arg.Groups[3].Value),ToLower arg.Groups[4].Value
[2]    str,←' ',(ToUpper arg.Groups[1].Value),ToLower arg.Groups[2].Value
     ∇
      capor←⎕NEW MatchEvaluator (⎕OR'cap')
      pat←⎕NEW Regex,⊂⊂'\s*(\w)(\w*)\s*,\s*(\w)(\w*)\s*'   ⍝ groups 1 & 3 hold the first letter of each name
      pat.Replace ' iVErson, kEn ' capor
Ken Iverson
}}}

Author: DanBaronet
----
CategoryRegularExpressions - CategoryDotNet - CategoryDyalogDotNet - CategoryDyalogExamplesDotNet