cheapesthostings1

Unicode Regular Expressions

December 11, 2015

I have long been familiar with processing Unicode characters with RegExp (Regular Expressions). I was also aware that RegExp can match characters based on their assigned Unicode character properties. However, I had never actually written a property-based RegExp, so a few days ago I decided to explore this area.

An interesting property, for example, is the script to which a character belongs: \p{Hangul} matches any character that belongs to the Hangul script, the script used to write Korean.

I started with Perl, and here is my simple Perl program:

    #!/usr/bin/perl
    if("노팅엄"=~/^\p{Hangul}+$/){print "korean\n";}else{print "not korean\n";}

...and this code did not work. I know that 노팅엄 is Korean Hangul, but my code disagreed. After much searching I discovered that I needed to include the statement use utf8, which tells Perl that the source file itself is encoded in UTF-8, so that the string literal is read as characters rather than as raw bytes. So my working version of the code is:

    #!/usr/bin/perl
    use utf8;
    if("노...
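To see why the first attempt fails, it may help to compare the same text as a character string and as a byte string. The following is only a minimal sketch of my understanding, not a definitive explanation: the helper is_hangul and the variables $chars and $bytes are names I made up for illustration, and I am assuming that encoding the string to UTF-8 with the core Encode module is a fair stand-in for what Perl sees when use utf8 is missing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                 # the source file is UTF-8, so literals are read as characters
use Encode qw(encode);    # core module, used here only to produce the raw bytes

# Hypothetical helper: true when every character of $text is in the Hangul script.
sub is_hangul {
    my ($text) = @_;
    return $text =~ /^\p{Hangul}+$/ ? 1 : 0;
}

my $chars = "노팅엄";                      # three Hangul characters
my $bytes = encode('UTF-8', $chars);       # assumption: roughly what Perl sees without use utf8

# Only ASCII is printed, so no output encoding layer is needed.
printf "characters: length %d, %s\n", length($chars), is_hangul($chars) ? "korean" : "not korean";
printf "bytes:      length %d, %s\n", length($bytes), is_hangul($bytes) ? "korean" : "not korean";
```

When I think this through, the character string has length 3 and matches, while the byte string has length 9 and does not match, because each individual byte is tested against \p{Hangul} as if it were a character of its own.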