[closed] Minimum String to match

Post by **Guest** » Fri Jun 17, 2005 9:00 am

Hi Walt,

I believe the use of 'Regular Expressions' (found under Options - Comparison) might help you.

With RE you'll be able to define a match for a paragraph number, and make Compare It ignore all matching strings. The help file is very clear on the use of RE.

RE can also help with your second problem. You'll be able to set an RE to search for where (R1), replace it with were (R2), and select 'Replace' in the drop-down box offered while defining your RE.

Any change from where to were will then automatically be ignored while comparing the documents.

G

Post by **grigsoft** » Fri Jun 17, 2005 12:24 pm

I tend to agree with G.

You can easily create a rule to ignore changes in numbering, and add rules to always treat "then" as equal to "than", etc. However, I'm afraid that a global switch "ignore N non-matching chars" may cause real differences to be missed unexpectedly.
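As an illustration of a numbering rule, here is a minimal sketch in Python. It is not Compare It's own syntax: the pattern, the `strip_numbering` helper and the assumed numbering style (leading "3.2.1 " or "4. ") are hypothetical, and Compare It's regex engine may behave differently from Python's `re` module.

```python
import re

# Assumed numbering style: digits separated by dots at the start of a line,
# optionally followed by a trailing dot, then whitespace (e.g. "3.2.1 " or "4. ").
NUMBERING = re.compile(r"^\s*\d+(?:\.\d+)*\.?\s+")

def strip_numbering(line: str) -> str:
    """Drop a leading paragraph number so renumbering alone is not a difference."""
    return NUMBERING.sub("", line)

old_line = "3.2.1 Scope of work"
new_line = "3.3.1 Scope of work"

# Both lines compare equal once the numbering is ignored.
print(strip_numbering(old_line) == strip_numbering(new_line))
```

Note that a pattern like this would also strip any other leading number, for example the year in a line that starts with "2005 was ...", so the rule has to fit the documents being compared.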

Post by **WaltP** » Thu Jun 23, 2005 2:06 am

Anonymous wrote:Hi Walt,

I believe the use of 'Regular Expressions' (found under Options - Comparison) might help you.

With RE you'll be able to define a match for a paragraph number, and make Compare It ignore all matching strings. The help file is very clear on the use of RE.

I think I can see how this one can be used.

Anonymous wrote: RE can also help with your second problem. You'll be able to set an RE to search for where (R1), replace it with were (R2), and select 'Replace' in the drop-down box offered while defining your RE.

Any change from where to were will then automatically be ignored while comparing the documents.

G

The problem with this is both files will have to be processed because this RE will change every occurrence of "then" to "than" -- even the ones that are correct. I'm assuming this is preprocessing the files before the actual comparison is studied.

grigsoft wrote: You can easily create a rule to ignore changes in numbering, and add rules to always treat "then" as equal to "than", etc. However, I'm afraid that a global switch "ignore N non-matching chars" may cause real differences to be missed unexpectedly.

Yes, I'm aware some small changes can fall through the cracks with this idea -- but that is the idea. If there are worthwhile changes to be verified, 90% of them will involve words and paragraphs, not individual characters. In the rare cases where they involve only a couple of characters (like the recent 50=>200 change we had) this won't work. But for most changes it would save a lot of time, especially if my "New Thought" is considered:

Alternatively, and probably better, handle these changes like "Ignore inserted empty lines", i.e. simply do not add them to the Overview Bar but keep the changes flagged in the text panes.

This will allow the small changes to be flagged in the displays, but only larger changes will be shown in the Overview. And this can easily be turned on and off by changing the minimum number of characters.

Post by **Guest** » Thu Jun 23, 2005 12:39 pm

WaltP wrote:
The problem with this is both files will have to be processed because this RE will change every occurrence of "then" to "than" -- even the ones that are correct. I'm assuming this is preprocessing the files before the actual comparison is studied.

Hi Walt.
This is a "virtual replace statement" and it does not influence your docs at all! I understand it to work like this: when loading both files into memory, CompareIt replaces every occurrence of "then" with "than". After that it compares the files, but it will never highlight a non-matching 'then' versus 'than', since it is not aware of any 'then' being present in the original file. In its memory it only knows 'than'. It is a clever feature, since changing the docs and saving them back to disk leaves any original 'then' intact.
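The behaviour described above can be sketched in Python. This is only a model of the idea as understood in this thread, not CompareIt's actual implementation; `normalize` and `differing_lines` are hypothetical helper names.

```python
def normalize(line: str) -> str:
    # Stand-in for a replace rule with R1 = "then" and R2 = "than".
    # Applied to an in-memory copy only; the original text is never changed.
    return line.replace("then", "than")

def differing_lines(left, right):
    """Return the indices of lines that still differ after normalization."""
    return [
        i for i, (a, b) in enumerate(zip(left, right))
        if normalize(a) != normalize(b)
    ]

left = ["more then enough", "value is 50"]
right = ["more than enough", "value is 200"]

print(differing_lines(left, right))  # [1]: only the 50 => 200 change is reported
print(left[0])                       # the original line is untouched
```

The sketch also shows the trade-off raised earlier in the thread: any 'then'/'than' difference is hidden, including one that is a genuine correction.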

Beware of this: if the two strings do not have the same length, you have to put the longer string in the R1 box and have it replaced with the shorter one (R2 box).

I mainly use CompareIt with numbers, not with text. May I ask, since you work with .doc and .rtf files: why don't you use Word's native ability to compare files? Look for: Tools - Track Changes - Compare Documents...

Gerrit

Post by **Guest** » Thu Jun 23, 2005 1:01 pm

Igor,
The feature Walt asks for made me try the capabilities of RE on this.

I defined an RE like '\S{1,2}'. This should match any run of 1 to 2 non-whitespace characters. Likewise, an RE like '\S{1,5}' should match any string of 5 characters or fewer.

If I use the replace statement to replace this string with a space (blank), and I set the option in the options dialog to ignore blanks, it should filter out any smaller strings as defined.

I cannot get this to work. Even longer strings that are different are skipped.
Maybe you should look into this, since using this RE might be attractive for a lot of users.

Gerrit

Post by **grigsoft** » Thu Jun 23, 2005 2:11 pm

I have not yet tried this, but if you want to skip whole words, you have to add either blanks at both ends or a line start/end marker. Otherwise your regexp will also match any part of a longer string, and if you haven't set "Use once only", the whole non-space content of the line will be "eaten".
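The effect can be reproduced outside Compare It. The following is a minimal sketch using Python's `re` module, whose behaviour may differ from Compare It's regex engine; the sample sentence and the lookaround-based boundaries are assumptions chosen for illustration.

```python
import re

text = "a cat sat on the verandah"

# Unanchored: every longer word is consumed piece by piece in chunks of 1-2
# non-space characters, so nothing but blanks is left.
unanchored = re.sub(r"\S{1,2}", " ", text)

# Bounded: the run must have no non-space character directly before or after it,
# so only whole words of 1-2 characters ("a", "on") are replaced.
bounded = re.sub(r"(?<!\S)\S{1,2}(?!\S)", " ", text)

print(repr(unanchored.strip()))  # ''
print(bounded.split())           # ['cat', 'sat', 'the', 'verandah']
```

This matches the symptom reported above: with the unanchored pattern, even long strings that differ disappear from the comparison.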

Post by **Guest** » Thu Jun 23, 2005 8:02 pm

Oh, yes, of course, how could I have missed that.

More correct would be this search string: \S{1,3}(\s|\.|,|:|;|!|\?)

Look for any match of at most three non-whitespace characters, followed by one character that is either
a white space
a .
a ,
a :
a ;
a ! or
a ?

Add anything that you consider relevant...

This is actually made for text only. Please note that it would malfunction on numbers like $ 123.000 because of the point. You might miss vital changes in your budget with this on...
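To make the warning about numbers concrete, here is a small sketch that runs the search string above through Python's `re` module. This assumes Python's regex behaviour is close enough to Compare It's; the only change to the pattern is a non-capturing group, and `strip_short` is a hypothetical helper that mimics "replace with a blank".

```python
import re

# The search string from this post; (?:...) instead of (...) does not change
# what is matched.
PATTERN = re.compile(r"\S{1,3}(?:\s|\.|,|:|;|!|\?)")

def strip_short(text: str) -> str:
    """Replace every match with a single blank, as the replace rule would."""
    return PATTERN.sub(" ", text)

old_amount = "$ 123.000 "
new_amount = "$ 123.500 "

# Each amount is split into "$ ", "123." and a three-digit tail, all of which match.
print(PATTERN.findall(old_amount))

# After the replacement both amounts are nothing but blanks, so the change is hidden.
print(strip_short(old_amount) == strip_short(new_amount))
```

In Python, at least, the pattern also matches the last three characters of a longer word (for example "get " in "budget "), because nothing restricts what comes before the match. A leading blank or line-start marker, as suggested earlier in the thread, would address that.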

But nice to know!

Gerrit