September 17, 2003

Regular Expressions & Text Processing!

It was many years ago that I first got into editors and IDEs which supported RegExp as a language to enhance their Find/Replace features. Back then, though, I rarely used it for complex searches within a text file or across a whole directory structure (file names and so on).

But times have changed! Now I have some great applications that fit the RegExp playground. Recently I had to do some text processing in two of my personal projects, extracting and converting text from one format to another by following a bunch of conversion/extraction rules. One of the options I had was a fully grammar-based structured text processing engine like Chaperon, but in my tests it simply failed as soon as the text strayed even a bit from the defined grammar, which was exactly my case. Besides, Chaperon's main goal (which it is quite good at) is to process and convert fully structured text to XML (so not simple text). RegExp did not have this problem: in the worst case, some of the expressions (my matching rules) matched nothing in the out-of-grammar text and so did no conversion there, which was okay for me. It didn't stop or break the whole conversion process, so a big plus to RegExp for this great flexibility :-)

The implementation I chose was Apache ORO, one of the best and most complete ones in Java. I'm highly satisfied with it! It has a really easy API to work with and it does the job well. All I did was define a list of my matching rules (expressions) together with their corresponding replacements (e.g. find blabla and output it as newnew) in a text file. A main class then reads and interprets the text file, constructs the needed rule classes, and puts them in a pool to be processed by the ORO engine one by one. All wonderful and simple, yet powerful... I will definitely use RegExps more in my future projects, whenever they fit :-)
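To make the idea a bit more concrete, here is a minimal sketch of such a rule-driven converter. It is not my actual code and it does not use ORO: it swaps in the standard java.util.regex package so that it runs without extra libraries. The rule format (one "expression => replacement" pair per line, with # for comments) and the names RuleBasedConverter, Rule, parseRules and convert are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: java.util.regex stands in for Apache ORO,
// and the "regex => replacement" rule format is an assumption.
final class RuleBasedConverter {

    /** One find/replace rule: a compiled expression plus its replacement. */
    static final class Rule {
        final Pattern pattern;
        final String replacement;

        Rule(String regex, String replacement) {
            this.pattern = Pattern.compile(regex);
            this.replacement = replacement;
        }

        /** Returns the input unchanged when the expression matches nothing. */
        String apply(String input) {
            return pattern.matcher(input).replaceAll(replacement);
        }
    }

    /** Builds rules from lines of the assumed form "regex => replacement". */
    static List<Rule> parseRules(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            int sep = line.indexOf(" => ");
            if (sep < 0) {
                continue; // skip lines that are not rules
            }
            rules.add(new Rule(line.substring(0, sep), line.substring(sep + 4)));
        }
        return rules;
    }

    /** Applies every rule in order; each rule sees the previous rule's output. */
    static String convert(String text, List<Rule> rules) {
        String result = text;
        for (Rule rule : rules) {
            result = rule.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        // In a real setup these lines would be read from the rules text file.
        List<Rule> rules = parseRules(Arrays.asList(
            "# sample rules",
            "blabla => newnew",
            "\\*(\\w+)\\* => <b>$1</b>"));
        System.out.println(convert("say blabla and *this*", rules));
        System.out.println(convert("text outside the rules", rules));
    }
}
```

The second call in main shows the flexibility I liked: text that no rule matches simply passes through untouched instead of breaking the conversion.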

Armond

Posted by armond at September 17, 2003 06:13 PM