Regular Expressions and VBScript

Regular Expressions provide a far more powerful and efficient way of manipulating strings of text than combining a variety of standard string functions. They have a reputation for being cryptic and difficult to learn, but the basics are actually quite easy to pick up and use.

The RegExp Object

The RegExp object has three properties and three methods. The properties are:

  • Pattern property - holds the regular expression pattern
  • Global property - True or False (default False). If False, matching stops at first match.
  • IgnoreCase property - True or False (default False). If True, allows case-insensitive matching

The methods are:

  • Execute method - executes a match against the specified string. Returns a Matches collection, which contains a Match object for each match. The Match object can also contain a SubMatches collection
  • Replace method - replaces the part of the string found in a match with another string
  • Test method - executes an attempted match and returns True or False

To set up a RegExp object:

Dim re
Set re = New RegExp
With re
    .Pattern = "some_pattern"
    .Global = True
    .IgnoreCase = True
End With

A Pattern can be any string value. For example, if the pattern is "Hello World", the RegExp object will match that in the target string. If IgnoreCase is True, it will match any case, so "hellO wORld" would be matched. If Global is set to True, it will continue to search the string for all instances of "Hello World". If False, it will stop searching after the first instance is found.
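
To see the effect of these two properties, here is a minimal sketch. It assumes the script is run under Windows Script Host (cscript.exe), so it uses WScript.Echo for output; in a classic ASP page, Response.Write would take its place. The target string and the variable names firstOnly and allMatches are made up for illustration.

```vbscript
Option Explicit
Dim re, target, firstOnly, allMatches
Set re = New RegExp
re.Pattern = "Hello World"
re.IgnoreCase = True   ' match regardless of case
target = "Hello World, hellO wORld, HELLO WORLD"

re.Global = False      ' stop after the first match
firstOnly = re.Execute(target).Count
WScript.Echo firstOnly    ' 1

re.Global = True       ' keep searching to the end of the string
allMatches = re.Execute(target).Count
WScript.Echo allMatches   ' 3
```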

Execute method, returning Matches collection

Dim re, targetString, colMatch, objMatch
Set re = New RegExp
With re
  .Pattern = "a"
  .Global = True
  .IgnoreCase = True
End With 
targetString = "The rain in Spain falls mainly in the plain"

Set colMatch = re.Execute(targetString)
For Each objMatch In colMatch
  Response.Write objMatch.Value & "<br />"
Next 

The above will produce a list of 5 letter a's.
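
Besides Value, each Match object also has FirstIndex and Length properties, and the Matches collection has a Count. The following is a small sketch of reading them, assuming the script runs under Windows Script Host (hence WScript.Echo rather than Response.Write). Note that FirstIndex counts from zero.

```vbscript
Option Explicit
Dim re, colMatch, objMatch
Set re = New RegExp
re.Pattern = "ain"
re.Global = True

Set colMatch = re.Execute("The rain in Spain falls mainly in the plain")
WScript.Echo colMatch.Count & " matches found"

For Each objMatch In colMatch
  ' FirstIndex is the zero-based offset of the match in the target string
  WScript.Echo objMatch.Value & " at position " & objMatch.FirstIndex & _
               ", length " & objMatch.Length
Next
```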

Test method, returning True or False

Dim re, targetString
Set re = New RegExp
With re
  .Pattern = "a"
  .Global = False
  .IgnoreCase = False
End With
targetString = "The rain in Spain falls mainly in the plain"

Response.Write re.Test(targetString)

The above will write True, because Test succeeds as soon as it hits the first instance of "a".
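
The Replace method listed earlier works along the same lines. Here is a minimal sketch, assuming Windows Script Host for output (WScript.Echo); the variable names replaced and swapped are illustrative only. The second half shows that $1 and $2 in the replacement text refer back to the parenthesised groups in the pattern.

```vbscript
Option Explicit
Dim re, replaced, swapped
Set re = New RegExp

' Replace every lowercase "ain" with "AIN"
re.Pattern = "ain"
re.Global = True
re.IgnoreCase = False
replaced = re.Replace("The rain in Spain falls mainly in the plain", "AIN")
WScript.Echo replaced   ' The rAIN in SpAIN falls mAINly in the plAIN

' Swap two words using the remembered groups
re.Pattern = "(\w+) (\w+)"
re.Global = False
swapped = re.Replace("Hello World", "$2 $1")
WScript.Echo swapped    ' World Hello
```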

Metacharacters

Metacharacters are special characters that can be combined with literal characters (which is all that have been used so far) to extend the power of Regular Expressions way beyond the simple examples already seen, and are what set Regular Expressions apart from simple string functions.

Character Description
\ Marks the next character as either a special character or a literal. For example, "n" matches the character "n". "\n" matches a newline character. The sequence "\\" matches "\" and "\(" matches "(".
^ Matches the beginning of input.
$ Matches the end of input.
* Matches the preceding character zero or more times. For example, "zo*" matches either "z" or "zoo".
+ Matches the preceding character one or more times. For example, "zo+" matches "zoo" but not "z".
? Matches the preceding character zero or one time. For example, "a?ve?" matches the "ve" in "never".
. Matches any single character except a newline character.
(pattern) Matches pattern and remembers the match. The remembered substrings can be retrieved from the SubMatches collection of each Match object in the resulting Matches collection, using SubMatches(0)...(n). To match parentheses characters ( ), use "\(" or "\)".
x|y Matches either x or y. For example, "z|wood" matches "z" or "wood". "(z|w)oo" matches "zoo" or the "woo" in "wood".
{n} n is a nonnegative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob," but matches the first two o's in "foooood".
{n,} n is a nonnegative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob" and matches all the o's in "foooood." "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".
{n,m} m and n are nonnegative integers. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood." "o{0,1}" is equivalent to "o?".
[xyz] A character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz] A negative character set. Matches any character not enclosed. For example, "[^abc]" matches the "p" in "plain".
[a-z] A range of characters. Matches any character in the specified range. For example, "[a-z]" matches any lowercase alphabetic character in the range "a" through "z".
[^m-z] A negative range of characters. Matches any character not in the specified range. For example, "[^m-z]" matches any character not in the range "m" through "z".
\b Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never" but not the "er" in "verb".
\B Matches a non-word boundary. "ea*r\B" matches the "ear" in "never early".
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\f Matches a form-feed character.
\n Matches a newline character.
\r Matches a carriage return character.
\s Matches any white space including space, tab, form-feed, etc. Equivalent to "[ \f\n\r\t\v]".
\S Matches any nonwhite space character. Equivalent to "[^ \f\n\r\t\v]".
\t Matches a tab character.
\v Matches a vertical tab character.
\w Matches any word character including underscore. Equivalent to "[A-Za-z0-9_]".
\W Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\num Matches num, where num is a positive integer. A reference back to remembered matches. For example, "(.)\1" matches two consecutive identical characters.
\n Matches n, where n is an octal escape value. Octal escape values must be 1, 2, or 3 digits long. For example, "\11" and "\011" both match a tab character. "\0011" is the equivalent of "\001" & "1". Octal escape values must not exceed 256. If they do, only the first two digits comprise the expression. Allows ASCII codes to be used in regular expressions.
\xn Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" & "1". Allows ASCII codes to be used in regular expressions.
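
A couple of the entries above are easier to grasp when run. This is a small sketch, assuming Windows Script Host for output; the variable names endsWord, midWord and doubled, and the test word "committee", are illustrative only. It tries the word boundary \b and the back reference \1.

```vbscript
Option Explicit
Dim re, endsWord, midWord, doubled
Set re = New RegExp

' \b: "er" must sit at the end of a word
re.Pattern = "er\b"
endsWord = re.Test("never")   ' True: "er" is followed by the end of the word
midWord = re.Test("verb")     ' False: "er" is followed by "b"
WScript.Echo "never: " & endsWord & ", verb: " & midWord

' (.)\1: any character followed by the same character again
re.Pattern = "(.)\1"
doubled = re.Execute("committee").Item(0).Value
WScript.Echo doubled          ' mm
```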

Examples

\d+ will match one or more consecutive digits, and is the equivalent of [0-9]+.
<[^>]*> will match any html tag: it looks for an opening "<", followed by anything that isn't a closing ">", followed finally by a closing ">". It uses a "negative character set", [^>].
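
As a sketch of both patterns in use, the following strips the tags from a made-up html fragment and then lists the numbers in it. It assumes Windows Script Host for output; the variable names and the sample fragment are illustrative.

```vbscript
Option Explicit
Dim re, html, plainText, objMatch, numbers
Set re = New RegExp
re.Global = True
html = "<p>Order <b>66745</b> shipped in 3 days</p>"

' Remove every tag by replacing it with an empty string
re.Pattern = "<[^>]*>"
plainText = re.Replace(html, "")
WScript.Echo plainText   ' Order 66745 shipped in 3 days

' Collect each run of digits
re.Pattern = "\d+"
numbers = ""
For Each objMatch In re.Execute(html)
  numbers = numbers & objMatch.Value & " "
Next
WScript.Echo Trim(numbers)   ' 66745 3
```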

Constructing a RegExp pattern

Form input validation is a key area in which regular expressions can be used, and a common task is to validate the structure of an email address. Initially, the task needs to be broken down into its constituent rules:

  • Must have 1 or more letters or numbers
  • Can have underscores, hyphens, dots, apostrophes
  • Must have an "@" sign following this
  • First part of domain name must follow the "@", and must contain at least 3 letters or numbers
  • May contain underscore, dots or hyphen
  • There must be at least one dot, which must be followed by the TLD.

"[\w\-'\.]+@[\w\.\-]{3,}\.[a-z]+" will do it (with IgnoreCase set to True, so that upper case letters in the TLD are accepted), but it can be improved upon depending on how specific you want to be.

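The rules above could be wrapped in a small validation function along the following lines. This is only a sketch: IsPlausibleEmail is a hypothetical name, the pattern follows the rules listed above with ^ and $ added so that the whole input has to match, and output assumes Windows Script Host. Because the rules ask for at least 3 characters in the first part of the domain, an address at a two-letter domain such as "ey.com" would be rejected by this version.

```vbscript
Option Explicit

' Hypothetical helper: True if address fits the rules listed above
Function IsPlausibleEmail(address)
  Dim re
  Set re = New RegExp
  ' ^ and $ anchor the pattern to the whole input
  re.Pattern = "^[\w\-'\.]+@[\w\.\-]{3,}\.[a-z]+$"
  re.IgnoreCase = True
  IsPlausibleEmail = re.Test(address)
End Function

WScript.Echo "mike@example.com: " & IsPlausibleEmail("mike@example.com")   ' True
WScript.Echo "mike@example: " & IsPlausibleEmail("mike@example")           ' False, no dot and TLD
WScript.Echo "text mike@example.com text: " & _
             IsPlausibleEmail("text mike@example.com text")                ' False, extra text
```
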
SubMatches collection

There will be instances where, once a match is found, you want to extract parts of that match for later use. As an example, suppose you have an html page which contains a list of links:

<a href="somepage.asp?id=12345">Company A</a><br />
<a href="somepage.asp?id=45678">Company B</a><br /> 
<a href="somepage.asp?id=66745">Company C</a><br />
<a href="somepage.asp?id=33471">Company D</a><br /> 
<a href="somepage.asp?id=90765">Company E</a><br /> 
...

The required parts are the company name and the id in the querystring. These need to be collected and inserted into a database, for example. The html is fed in as strSearchOn, and the pattern uses parentheses to capture each item: ([0-9]{5}) captures the id, which is a 5 digit number, and ([\w\s]+) captures a series of letters, digits, underscores and spaces, and stops collecting them when the opening angle bracket of the closing tag (</a>) is reached. Each captured item is then available through the SubMatches collection of the Match object, indexed from 0.

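Dim objRegExpr, colMatches, objMatch, id, company, sql
Set objRegExpr = New RegExp
' Escape the dot and the question mark so that both are matched literally
objRegExpr.Pattern = "somepage\.asp\?id=([0-9]{5})" & Chr(34) & ">([\w\s]+)"
objRegExpr.Global = True
objRegExpr.IgnoreCase = True
Set colMatches = objRegExpr.Execute(strSearchOn)
For Each objMatch In colMatches
  id = objMatch.SubMatches(0)       ' first group: the 5 digit id
  company = objMatch.SubMatches(1)  ' second group: the company name
  sql = "Insert Into table (idfield, company) Values (" & id & ",'" & company & "')"
  conn.Execute sql                  ' conn is an already open database connection
Next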
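
To try the SubMatches collection without a database, the same idea can be sketched as a standalone script. It assumes Windows Script Host for output, and the two-line html sample and the results variable are made up for illustration.

```vbscript
Option Explicit
Dim re, html, objMatch, results

' Sample input; doubled quotes ("") stand for a literal quote inside a string
html = "<a href=""somepage.asp?id=12345"">Company A</a><br />" & vbCrLf & _
       "<a href=""somepage.asp?id=45678"">Company B</a><br />"

Set re = New RegExp
re.Pattern = "somepage\.asp\?id=([0-9]{5})"">([\w\s]+)"
re.Global = True
re.IgnoreCase = True

results = ""
For Each objMatch In re.Execute(html)
  ' SubMatches(0) is the id, SubMatches(1) is the company name
  WScript.Echo objMatch.SubMatches(0) & " -> " & objMatch.SubMatches(1)
  results = results & objMatch.SubMatches(0) & "=" & objMatch.SubMatches(1) & ";"
Next
' Prints:
' 12345 -> Company A
' 45678 -> Company B
```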

Date Posted: Monday, April 9, 2007 12:42 PM
Last Updated: Wednesday, January 2, 2013 5:48 AM
Posted by: Mikesdotnetting
Total Views to date: 85926

3 Comments

Wednesday, January 2, 2013 2:15 AM - paul

you have an error in your test method on this page:
http://www.mikesdotnetting.com/Article/24/Regular-Expressions-and-VBScript

you write: Set rs = New RegExp

I think you meant: Set re = New RegExp

Wednesday, January 2, 2013 5:50 AM - Mike

@paul

Thanks - it only took about 6 years for someone to spot that!

Monday, October 21, 2013 10:08 AM - Mike^

One small correction on the email validation: the domain part can be two or more characters, not three or more (as in "ey.com").