netUpperLowerCase

LowerCase and UpperCase

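The two functions ToLowercase and ToUppercase convert the case of text that may contain any Unicode characters, not only ASCII, by calling the .NET methods String.ToLowerInvariant and String.ToUpperInvariant. A numeric argument returns an empty vector (the guard on the first line of each function):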

 ToLowercase←{
     (0=1↑0⍴⍵):''
     ⎕USING←',mscorlib.dll'
     (⎕NEW System.String(⊂,⍵)).ToLowerInvariant
 }

 ToUppercase←{
     (0=1↑0⍴⍵):''
     ⎕USING←',mscorlib.dll'
     (⎕NEW System.String(⊂,⍵)).ToUpperInvariant
 }

      ToUppercase 'monday'
MONDAY

      ToUppercase¨ 'sunday' 'monday' 'tuesday'
 SUNDAY  MONDAY  TUESDAY

      ToUppercase 'Вторник'
ВТОРНИК
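On Dyalog APL 18.0 or later, the ⎕C system function may be an alternative that does not need .NET at all. As we understand it, 1 ⎕C converts to uppercase and ¯1 ⎕C converts to lowercase; treat this as a sketch under that assumption and check your version's documentation. Its results may also differ from ToUpperInvariant and ToLowerInvariant for some special characters.

```apl
      ⍝ Assumes Dyalog APL 18.0+: 1 ⎕C uppercases, ¯1 ⎕C lowercases
      1 ⎕C 'monday'
MONDAY
      1 ⎕C¨'sunday' 'monday' 'tuesday'
 SUNDAY  MONDAY  TUESDAY
      ¯1 ⎕C 'Вторник'
вторник
```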

Remove the Accents (diacritics)

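The following function removes the accents (diacritics) from Unicode text, which is useful for normalizing user input before using it in a search. It decomposes each accented character into its base letter followed by combining marks (normalization form D), drops every character in the Unicode category NonSpacingMark, and recomposes what remains (normalization form C).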

 r←RemoveAccents string;str;strFormD;stringBuilder;⎕USING
⍝ Remove the accents (diacritics) from a Unicode string.
⍝ For example: 'Crème Brûlée' becomes 'Creme Brulee'

⍝ Adapted from the following posts:
⍝ http://www.siao2.com/2005/02/19/376617.aspx
⍝ http://www.siao2.com/2007/05/14/2629747.aspx

 ⎕USING←'System,mscorlib.dll' 'System.Text,mscorlib.dll' 'System.Globalization,mscorlib.dll'

 str←⎕NEW String(⊂,string)  ⍝ ravel so that a single character is also accepted

 strFormD←str.Normalize(NormalizationForm.FormD)

 stringBuilder←⎕NEW StringBuilder

 {UnicodeCategory.NonSpacingMark≠CharUnicodeInfo.GetUnicodeCategory(⍵):{}stringBuilder.Append(⍵)}¨strFormD

 str←⎕NEW String(⊂stringBuilder.ToString ⍬)

 r←str.Normalize(NormalizationForm.FormC)
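To see what the form D step does, the following session looks at a single character. It assumes the same .NET Framework setup as RemoveAccents and that 'é' is entered as the single precomposed character U+00E9. After normalization, é becomes two code points: 101 is 'e' and 769 is U+0301 COMBINING ACUTE ACCENT, whose category is NonSpacingMark, so RemoveAccents keeps the 'e' and drops the mark.

```apl
      ⎕USING←'System,mscorlib.dll' 'System.Text,mscorlib.dll'
      ⎕UCS 'é'                 ⍝ precomposed: one code point
233
      ⍝ Form D splits it into base letter + combining mark
      ⎕UCS (⎕NEW String(⊂,'é')).Normalize NormalizationForm.FormD
101 769
```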

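Here are some examples of using the function. Characters that have no decomposition into a base letter plus a mark, such as ð (eth) in the second example, are left unchanged: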

      RemoveAccents 'Crème Brûlée'
Creme Brulee

      RemoveAccents 'âãäåçèéêë ìíîïðñòó ôõöùúûüý'
aaaaceeee iiiiðnoo ooouuuuy
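Since the point of RemoveAccents is to normalize search input, it can be combined with ToLowercase so that a search ignores both accents and case. SearchKey below is a hypothetical helper, not part of the original page, and this sketch assumes RemoveAccents and ToLowercase from this page are already defined in the workspace.

```apl
 SearchKey←{ToLowercase RemoveAccents ⍵}  ⍝ hypothetical helper: ignore accents and case

      (SearchKey 'Crème')≡SearchKey 'CREME'
1
      ∨/(SearchKey 'brulee')⍷SearchKey 'Crème Brûlée'
1
```

Both sides are passed through the same function, so a user typing 'brulee' finds 'Crème Brûlée'.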


CategoryDyalog CategoryDyalogDotNet CategoryDyalogExamplesDotNet

netUpperLowerCase (last edited 2015-04-14 20:18:23 by PierreGilbert)