netUpperLowerCase

LowerCase and UpperCase

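The two functions ToLowercase and ToUppercase convert a character vector to lower or upper case using the .NET culture-invariant rules (String.ToLowerInvariant and String.ToUpperInvariant), so they also handle non-ASCII Unicode characters such as accented or Cyrillic letters: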

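 ToLowercase←{
     (0=1↑0⍴⍵):''                                 ⍝ numeric argument (prototype 0): return ''
     ⎕USING←',mscorlib.dll'
     (⎕NEW System.String(⊂,⍵)).ToLowerInvariant  ⍝ culture-invariant lower case
 }

 ToUppercase←{
     (0=1↑0⍴⍵):''                                 ⍝ numeric argument (prototype 0): return ''
     ⎕USING←',mscorlib.dll'
     (⎕NEW System.String(⊂,⍵)).ToUpperInvariant  ⍝ culture-invariant upper case
 }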

      ToUppercase 'monday'
MONDAY

      ToUppercase¨ 'sunday' 'monday' 'tuesday'
 SUNDAY  MONDAY  TUESDAY

      ToUppercase 'Вторник'
ВТОРНИК
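Because both functions ravel their argument and return a simple character vector, two usage patterns may be worth showing. The session below is a sketch that assumes ToLowercase and ToUppercase, as defined above, are in the workspace; the matrix M is a hypothetical example. It shows a case-insensitive comparison of Cyrillic text, and a row-by-row conversion that keeps the shape of a character matrix.

```apl
      ⍝ Assumes ToLowercase and ToUppercase (defined above) are in the workspace

      ⍝ Case-insensitive comparison: fold both sides to lower case, then match
      (ToLowercase 'ВТОРНИК')≡ToLowercase 'Вторник'
1

      ⍝ ToUppercase ravels its argument, so convert a matrix one row at a time
      ⍝ (M is a hypothetical 2-row character matrix)
      M←2 6⍴'sundaymonday'
      ↑ToUppercase¨↓M
SUNDAY
MONDAY
```

Splitting with ↓ and mixing back with ↑ keeps the original number of rows and columns, which a plain call on the matrix would not.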

Remove the Accents (diacritics)

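The following function removes the accents (diacritics) from Unicode text. This is useful for normalizing user input before a search. It decomposes the string with Normalization Form D, drops every character whose Unicode category is NonSpacingMark, and recomposes the remainder with Normalization Form C.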

 r←RemoveAccents string;str;strFormD;stringBuilder;⎕USING
⍝ Function to remove the accents.
⍝ For example: 'Crème Brûlée' becomes 'Creme Brulee'

⍝ Adapted from the following posts:
⍝ http://www.siao2.com/2005/02/19/376617.aspx
⍝ http://www.siao2.com/2007/05/14/2629747.aspx

 ⎕USING←'System,mscorlib.dll' 'System.Text,mscorlib.dll' 'System.Globalization,mscorlib.dll'

⍝ Ravel the argument so that a single character is also accepted
 str←⎕NEW String(⊂,string)

⍝ Form D splits each accented letter into a base letter plus combining marks
 strFormD←str.Normalize(NormalizationForm.FormD)

 stringBuilder←⎕NEW StringBuilder

⍝ Keep every character that is not a non-spacing (combining) mark
 {UnicodeCategory.NonSpacingMark≠CharUnicodeInfo.GetUnicodeCategory(⍵):{}stringBuilder.Append(⍵)}¨strFormD

 str←⎕NEW String(⊂,stringBuilder.ToString ⍬)

⍝ Form C recombines whatever decomposed characters remain
 r←str.Normalize(NormalizationForm.FormC)
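To see why the filter works, it may help to look at a single accented character before and after the Form D step. The session below is a sketch using the same .NET classes as RemoveAccents: é (U+00E9, code point 233) decomposes into a plain e (U+0065) followed by a combining acute accent (U+0301), and only that second code point is classified as NonSpacingMark, so it is the one the function drops.

```apl
      ⎕USING←'System,mscorlib.dll' 'System.Text,mscorlib.dll' 'System.Globalization,mscorlib.dll'

      ⍝ Decompose é (⎕UCS 233) with Normalization Form D and show the code points
      ⎕UCS (⎕NEW String(⊂,⎕UCS 233)).Normalize(NormalizationForm.FormD)
101 769

      ⍝ The second code point (U+0301) is a combining mark, so the filter removes it
      UnicodeCategory.NonSpacingMark=CharUnicodeInfo.GetUnicodeCategory(⎕UCS 769)
1
```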

Here are some examples of using the function. Note that ð is left unchanged: it has no Unicode decomposition, so there is no accent to strip.

      RemoveAccents 'Crème Brûlée'
Creme Brulee

      RemoveAccents 'âãäåçèéêë ìíîïðñòó ôõöùúûüý'
aaaaceeee iiiiðnoo ooouuuuy
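Since the stated purpose is normalizing user input for searching, one possible combination is to strip accents and then fold case with ToLowercase from the section above. The helper name NormalizeForSearch is hypothetical, and the sketch assumes both RemoveAccents and ToLowercase are defined in the workspace. Normalizing both the search term and the text lets 'BRULEE' match 'Crème Brûlée'.

```apl
      ⍝ Hypothetical helper: strip accents, then fold to lower case
      NormalizeForSearch←{ToLowercase RemoveAccents ⍵}

      NormalizeForSearch 'Crème Brûlée'
creme brulee

      ⍝ Look for the normalized term inside the normalized text (⍷ is Find)
      ∨/(NormalizeForSearch 'BRULEE')⍷NormalizeForSearch 'Crème Brûlée'
1
```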


CategoryDyalog CategoryDyalogDotNet CategoryDyalogExamplesDotNet

netUpperLowerCase (last edited 2015-04-14 20:18:23 by PierreGilbert)