Posts Tagged ‘regular expressions’

Baseplane Tools: Regular Expressions

Thursday, March 13th, 2008

Regular Expressions are a great tool for programmers on any platform. We aim to highlight the skills that work well for programmers on any system and this is definitely one of them. Regular Expressions are not black magic, they are very helpful and when it comes to templating, highlighting or any kind of searching textual content they are the best tool for the job.

Some Platforms or Systems that use Regex

One of the best regular expression sites is just http://www.regular-expressions.info/ but there are many. If you are new to regular expressions or finally want to start understanding what makes them work

Some samples of common programming tasks:

Example of White Space Trimming

You can easily trim unnecessary whitespace from the start and the end of a string or the lines in a text file by doing a regex search-and-replace. Search for ^[ \t]+ and replace with nothing to delete leading whitespace (spaces and tabs). Search for [ \t]+$ to trim trailing whitespace. Do both by combining the regular expressions into ^[ \t]+|[ \t]+$ . Instead of [ \t] which matches a space or a tab, you can expand the character class into [ \t\r\n] if you also want to strip line breaks. Or you can use the shorthand \s instead.

Grabbing HTML tags or Markup

<TAG\b[^>]*>(.*?)</TAG> matches the opening and closing pair of a specific HTML tag. Anything between the tags is captured into the first backreference. The question mark in the regex makes the star lazy, to make sure it stops before the first closing tag rather than before the last, like a greedy star would do. This regex will not properly match tags nested inside themselves, like in <TAG>one<TAG>two</TAG>one</TAG>.

<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1> will match the opening and closing pair of any HTML tag. Be sure to turn off case sensitivity. The key in this solution is the use of the backreference \1 in the regex. Anything between the tags is captured into the second backreference. This solution will also not match tags nested in themselves.

This is just a few great uses. When you build your regex libraries, they never die, they span all platforms.

Check out ten great reasons to learn and use regular expressions. Also the javascript RegexPal.

Pseudo-code for using regex in a recursive way

When used by themselves, these regular expressions may not have the intended result. If a comment appears inside a string, the comment regex will consider the text inside the string as a comment. The string regex will also match strings inside comments. The solution is to use more than one regular expression, like in this pseudo-code:

GlobalStartPosition := 0;
while GlobalStartPosition < LengthOfText do
  GlobalMatchPosition := LengthOfText;
  MatchedRegEx := NULL;
  foreach RegEx in RegExList do
    RegEx.StartPosition := GlobalStartPosition;
    if RegEx.Match and RegEx.MatchPosition < GlobalMatchPosition then
      MatchedRegEx := RegEx;
      GlobalMatchPosition := RegEx.MatchPosition;
    endif
  endforeach
  if MatchedRegEx <> NULL then
    // At this point, MatchedRegEx indicates which regex matched
    // and you can do whatever processing you want depending on
    // which regex actually matched.
  endif
  GlobalStartPosition := GlobalMatchPosition;
endwhile

Regex makes you a better solution provider, it give the engineer the tools to transcend platforms with ease. They help to create a baseplane solution in technology and a market platform.

Your Ad Here
Your Ad Here


baseplane – technology platforms is proudly powered by WordPress
Entries (RSS) and Comments (RSS).

Unless othewise specified the content in this site is licensed under a Creative Commons License
Your Ad Here Your Ad Here Your Ad Here Your Ad Here