regex matches when it shouldn't?

 
Hello,

I just discovered that our sender-blacklist regex:
.*@*ebay*

seems to match ANYTHING that has "eba" anywhere before or after the @ sign... disregarding the Y completely.

The goal was to block "anything, followed by an @ sign, followed by anything that includes the letters ebay anywhere after the @ sign".

Is the Y sacred or something? Why does the regex disregard it?

http://img257.imageshack.us/img257/791/ebayhuh.th.jpg
Bryon Humphrey (November 1, 2011)
Oh, wrong screenshot link. Here's the full-size one:
http://imageshack.us/photo/my-images/257/ebayhuh.jpg/
Bryon Humphrey (November 1, 2011)
Actually, the current expression means "any character (any number of repetitions), followed by @ (any number of repetitions), followed by eba, followed by y (any number of repetitions)", where "any number" includes zero. You should try

.*@.*ebay.*

instead.
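
For anyone who wants to check this outside ORF, here is a minimal sketch in Python. It assumes the blacklist test behaves like an unanchored search (Python's re.search), which is consistent with the behavior described above but is not confirmed for ORF. The sample addresses are made up.

```python
import re

# Assumption: the filter behaves like an unanchored search (re.search).
original = re.compile(r".*@*ebay*")
corrected = re.compile(r".*@.*ebay.*")

# Made-up sender addresses for illustration only.
samples = [
    "seller@ebay.com",
    "member@mail.ebay.co.uk",
    "rebate@example.com",
    "ebay@example.com",
    "bob@example.com",
]

def hits(pattern):
    """Return the samples in which the pattern is found."""
    return [s for s in samples if pattern.search(s)]

original_hits = hits(original)
corrected_hits = hits(corrected)

for s in samples:
    print(s, "original:", s in original_hits, "corrected:", s in corrected_hits)
```

Under that assumption, the original expression hits every sample containing "eba", while the corrected one only hits addresses where "ebay" appears after the @ sign.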
Krisztian Fekete (Vamsoft) (November 2, 2011)
I see, I will give that a whirl.

Curious though, why does it match when I don't include a Y in the sample and the eba is left of the @?

bryon (November 2, 2011)
Because * in regular expressions means "any number of repetitions", including zero repetitions.

* means zero or more
+ means one or more
? means zero or one

So even if the @ character is missing, the original expression will match any text that includes "eba".
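
The three quantifiers can be tried out with a short Python sketch. It uses Python's re module as a stand-in, which is an assumption about how closely it mirrors the ORF regex engine, and the helper name full is made up for this example.

```python
import re

def full(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

texts = ("eba", "ebay", "ebayyy")

# In each pattern the quantifier applies only to the preceding "y".
star = [full(r"ebay*", t) for t in texts]  # zero or more y
plus = [full(r"ebay+", t) for t in texts]  # one or more y
opt = [full(r"ebay?", t) for t in texts]   # zero or one y

# The original expression finds a match even without any @ in the text.
no_at_sign = re.search(r".*@*ebay*", "rebate") is not None

print(star, plus, opt, no_at_sign)
```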
Krisztian Fekete (Vamsoft) (November 2, 2011)
So if I understand correctly - and I don't mean to turn this into a regex class, I just like to learn everything I can:

.*@*ebay* would mean anything including nothing, with an @ sign after it, followed by anything including nothing, followed by the characters ebay, followed by anything including nothing.

Doesn't that force "ebay" to be to the right of the @ sign? How does it take the Y out of the equation? Or does the final * modify the Y (and in that case the middle * modifies the @)?

Bryon (November 2, 2011)
The expression can be broken into 4 parts:

.* means any character, any number of repetitions (zero or more) followed by
@* meaning @ character, any number of repetitions (zero or more) followed by
eba, followed by
y* meaning the y character, any number of repetitions (zero or more)

The last part takes the Y out of the equation because the * quantifier (any repetitions, zero or more) always applies to the preceding character. The @ is taken out the same way, so "eba" will match regardless of its position relative to the @ character. It will also match if the @ is absent.
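
The four parts can be made visible by wrapping each one in a capture group. This is a sketch in Python, assuming search-style matching. The groups are added only for illustration and the sample strings are made up.

```python
import re

# The same expression, with each of the four parts in its own group.
parts = re.compile(r"(.*)(@*)(eba)(y*)")

results = {}
for text in ("user@ebay.com", "ebart@example.com", "rebate"):
    m = parts.search(text)
    results[text] = m.groups()
    print(text, "->", m.groups())
```

In the first sample the @ is swallowed by the greedy .* and the @* part matches nothing at all. In the other two samples both the @* and the y* parts are empty, yet the expression still matches.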
Krisztian Fekete (Vamsoft) (November 2, 2011)
Oh I see, so the period turns the asterisk into a general wildcard, but without the period, the asterisk modifies the previous character.

Thanks for taking the time to explain that to me.
Bryon Humphrey (November 3, 2011)
You are welcome :) Yes, basically in regular expressions the dot character is the wildcard for any single character, and the number of repetitions is controlled by the trailing * character. If you want to match the dot character itself in a regex, you should "escape" it with a backslash, like:

.*@vamsoft\.com
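
A small Python sketch of the difference the backslash makes, again assuming search-style matching. Both addresses are made up for illustration.

```python
import re

unescaped = re.compile(r".*@vamsoft.com")   # the dot matches any character
escaped = re.compile(r".*@vamsoft\.com")    # the dot matches only a literal "."

genuine = "info@vamsoft.com"      # hypothetical address
lookalike = "info@vamsoftxcom"    # "x" where the dot should be

print(bool(unescaped.search(genuine)), bool(unescaped.search(lookalike)))
print(bool(escaped.search(genuine)), bool(escaped.search(lookalike)))
```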
Krisztian Fekete (Vamsoft) (November 3, 2011)
Your regexp should be .*@.*ebay.* if you want to filter any string that contains @ebay or @(any number of any characters)ebay.
sungpill Han (6 months ago)
