Using Regular Expressions

Normally, when you search for a sub-string in a string, the match should be exact. So if you search for a sub-string "abc" then the string being searched should contain these exact letters in the same sequence for a match to be found.

We can extend this kind of search to a case-insensitive search, where the sub-string "abc" will find strings like "Abc", "ABC" and so on. That is, case is ignored but the sequence of the letters must be exactly the same. Sometimes a case-insensitive search is still not enough. For example, if we want to find any numeric digit, we end up searching for each of the ten digits separately. This is where regular expressions help.
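
To make the digit example concrete, here is a minimal sketch in Python. Python's built-in re module is only a stand-in for illustration; it is not the engine Compare It! uses, and the function names are invented for this example.

```python
import re

def has_digit_plain(text: str) -> bool:
    """Plain sub-string search: try each of the ten digits in turn."""
    return any(digit in text for digit in "0123456789")

def has_digit_regex(text: str) -> bool:
    """Regular expression search: a single pattern covers every digit."""
    return re.search("[0-9]", text) is not None

for sample in ("version 3", "no numbers here"):
    print(sample, has_digit_plain(sample), has_digit_regex(sample))
```

Both functions give the same answer; the second one expresses the whole search as one pattern.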

Regular expressions are text patterns used for string matching. They are strings that mix plain text with special characters that indicate what kind of matching to do. Here is a very brief tutorial on regular expression syntax.

Regular Expressions Syntax

All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^", "$" and "\". These characters are literals when preceded by a "\". A literal is a character that matches itself.
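
The effect of the backslash can be sketched as follows. This uses Python's re module as a stand-in, which treats an escaped dot the same way; the helper whole_match is a name made up for this example.

```python
import re

def whole_match(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# Unescaped, the dot is a special character and matches any single character.
unescaped = "1.5"
# Preceded by a backslash, the dot is a literal and matches only ".".
escaped = r"1\.5"

for text in ("1.5", "1x5"):
    print(text, whole_match(unescaped, text), whole_match(escaped, text))
```

The unescaped pattern accepts both strings, while the escaped one accepts only "1.5".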

The dot character "." matches any single character.

A repeat is an expression that is repeated an arbitrary number of times.
An expression followed by * can be repeated any number of times including zero.
An expression followed by + can be repeated any number of times, but at least once.
An expression followed by ? may be repeated zero or one times only.
When it is necessary to specify the minimum and maximum number of repeats explicitly, the bounds operator {} may be used, thus "a{2}" is the letter "a" repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2 and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with no upper limit. Note that there must be no white-space inside the {}, and there is no upper limit on the values of the lower and upper bounds.

	"ba*" will match all of "b", "ba", "baaa" etc.
	"ba+" will match "ba" or "baaaa" for example but not "b".
	"ba?" will match "b" or "ba".
	"ba{2,4}" will match "baa", "baaa" and "baaaa".

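The repeat examples above can be checked with a short sketch. It assumes Python's re module, whose *, +, ? and {} operators behave the same way for these patterns; it is not the engine Compare It! uses, and whole_match is a hypothetical helper.

```python
import re

def whole_match(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

samples = ["b", "ba", "baa", "baaaa", "baaaaa"]
patterns = ["ba*", "ba+", "ba?", "ba{2,4}"]

# For each pattern, collect the samples it matches in full.
table = {p: [s for s in samples if whole_match(p, s)] for p in patterns}

for pattern, matched in table.items():
    print(pattern, matched)
```

Note that "ba{2,4}" rejects "baaaaa" because five repeats exceed the upper bound.
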
Parentheses () are used to group items together into a sub-expression. For example, the expression "(ab)*" would match all of the string "ababab".
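
A small sketch shows why the grouping matters. It uses Python's re module as a stand-in for illustration, with a hypothetical helper whole_match: without parentheses the repeat applies only to the "b", with them it applies to the whole "ab".

```python
import re

def whole_match(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

grouped = "(ab)*"    # repeats the sub-expression "ab"
ungrouped = "ab*"    # repeats only the "b"

for text in ("ababab", "abbb"):
    print(text, whole_match(grouped, text), whole_match(ungrouped, text))
```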

Alternatives occur when the expression can match either one sub-expression or another; the alternatives are separated by a "|". Each alternative is the largest possible previous sub-expression; this is the opposite behaviour from repetition operators, which apply only to the smallest preceding sub-expression.

	"a(b|c)" could match "ab" or "ac".
	"abc|def" could match "abc" or "def".

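The scope of "|" can be sketched like this, again assuming Python's re module as a stand-in and a made-up helper whole_match. The pattern "ab(c|d)ef" is an extra example added here to contrast with "abc|def".

```python
import re

def whole_match(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

wide = "abc|def"        # alternatives are the whole of "abc" and the whole of "def"
narrow = "ab(c|d)ef"    # parentheses limit the alternatives to "c" and "d"

for text in ("abc", "def", "abcef", "abdef"):
    print(text, whole_match(wide, text), whole_match(narrow, text))
```

Without parentheses each alternative extends as far as it can, so "abc|def" does not match "abcef".
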
A set is a set of characters that can match any single character that is a member of the set. Sets are delimited by "[" and "]" and can contain literals, character ranges, and character classes. Set declarations that start with "^" contain the complement of the elements that follow.

	Character literals:
	"[abc]" will match either of "a", "b", or "c".
	"[^abc]" will match any character other than "a", "b", or "c".
	Character ranges:
	"[a-z]" will match any character in the range "a" to "z".
	"[^A-Z]" will match any character other than those in the range "A" to "Z".

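The set examples above can be tried on a short sample string. This is a sketch using Python's re module, which handles these simple sets the same way; the sample text and variable names are invented for illustration.

```python
import re

text = "Cab 42"

# Each call lists every single character in the text that the set matches.
in_abc = re.findall("[abc]", text)        # only lower case a, b, c
not_abc = re.findall("[^abc]", text)      # everything else, including the space
lower_range = re.findall("[a-z]", text)   # any lower case letter
not_upper = re.findall("[^A-Z]", text)    # anything that is not an upper case letter

print(in_abc, not_abc, lower_range, not_upper)
```

Sets are case sensitive here, so the upper case "C" is not matched by "[abc]".
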
Character classes
A character class is a special sequence that stands for a commonly used type of character. Available classes are:

Class   Equivalent    Description
\w      [a-zA-Z0-9_]  Any word character: all alphanumeric characters plus the underscore.
\s                    Any whitespace character (spaces and tabs).
\d      [0-9]         Any digit.
\l      [a-z]         Any lower case character.
\u      [A-Z]         Any upper case character.

The uppercase version of each class negates it. For example, \S matches any non-whitespace character.
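
The classes can be tried out with the sketch below, with some caveats. It uses Python's re module as a stand-in; Python does not support \l and \u, so the bracket sets [a-z] and [A-Z] are used in their place. The re.ASCII flag is an assumption made here to keep \w and \d limited to plain ASCII characters, and Python's \s also matches line breaks, which may differ from Compare It!.

```python
import re

text = "Id_7 ok"

word_chars = re.findall(r"\w", text, re.ASCII)   # letters, digits and underscore
spaces = re.findall(r"\s", text)                 # whitespace
digits = re.findall(r"\d", text, re.ASCII)       # digits
non_spaces = re.findall(r"\S", text)             # upper case form: NOT whitespace
lower = re.findall("[a-z]", text)                # stand-in for \l
upper = re.findall("[A-Z]", text)                # stand-in for \u

print(word_chars, spaces, digits, non_spaces, lower, upper)
```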

Summary of Regular Expressions Syntax Elements

The following table summarizes the syntax elements used in regular expressions.

Character  Description
^          Beginning of the string. The expression "^A" will match an "A" only at the beginning of the string.
[^         The caret (^) immediately following the left bracket ([) has a different meaning. It excludes the remaining characters within the brackets from matching the target string. The expression "[^0-9]" indicates that the target character should not be a digit.
$          The dollar sign ($) will match the end of the string. The expression "abc$" will match the sub-string "abc" only if it is at the end of the string.
|          The alternation character (|) allows the expression on either side of it to match the target string. The expression "a|b" will match "a" as well as "b".
.          The dot (.) will match any single character.
*          The asterisk (*) indicates that the character or sub-expression to its left should match 0 or more times.
+          The plus (+) is similar to the asterisk, but the character or sub-expression to its left must match at least once.
?          The question mark (?) matches the character or sub-expression to its left 0 or 1 times.
()         Parentheses affect the order of pattern evaluation and also serve as a tagged expression that can be used when replacing the matched sub-string with another expression.
[]         Brackets ([ and ]) enclosing a set of characters indicate that any of the enclosed characters may match the target character.
{N}        Repeats the expression exactly N times.
{N,M}      Repeats the expression between N and M times.
{N,}       Repeats the expression N or more times.
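
The anchors and tagged expressions from the table can be sketched as follows. This assumes Python's re module as a stand-in; the \1 and \2 replacement syntax shown is Python's, and the way Compare It! refers to tagged expressions may differ. The sample strings are invented for illustration.

```python
import re

# "^A" matches an "A" only at the beginning of the string.
starts_with_a = re.search("^A", "Apple") is not None
a_in_middle = re.search("^A", "bA") is not None

# "abc$" matches "abc" only at the end of the string.
ends_with_abc = re.search("abc$", "xyzabc") is not None
abc_at_start = re.search("abc$", "abcxyz") is not None

# "[^0-9]" finds the first character that is not a digit.
first_non_digit = re.search("[^0-9]", "12a3").group(0)

# Parentheses tag the two parts so the replacement can swap them.
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "key=value")

print(starts_with_a, a_in_middle, ends_with_abc, abc_at_start, first_non_digit, swapped)
```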


I would like to thank Dr John Maddock for the description and implementation of the regular expressions engine.

© 1996-2009, Grig Software, All Rights Reserved