Basic or extended regex?

Regular expressions (regex) come in several flavours. The two main ones you will come across are:
– Basic Regular Expressions (BRE)
– Extended Regular Expressions (ERE)

From a practical standpoint, the main difference between the two lies in the way they handle special characters.

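These special characters (also called metacharacters) are not matched literally. Some match something other than themselves, such as . which matches any single character, or $ which specifies that a match must be at the end of a line. Others give a special meaning to the character or sequence of characters that precedes them, such as * which allows the preceding character to repeat.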
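To see . and $ in action, here is a quick sketch you can paste into a terminal. It assumes GNU grep and feeds a few made-up sample lines through printf instead of reading a file. These two characters behave the same way in BRE and ERE.

```shell
# . matches any single character, and $ requires the match to end the line.
# Made-up sample lines: cat, cot, cart, cattle
printf 'cat\ncot\ncart\ncattle\n' | grep 'c.t$'
# prints:
# cat
# cot
```

cart is rejected because . stands for exactly one character, and cattle is rejected because of the $ anchor.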

If you have absolutely no clue what this is all about, I have an introductory post on regex coming soon that should get you started.

Basic Regular Expressions

The BRE format is more or less the legacy format, used in the earlier versions of the UNIX grep command. It is also still grep's default syntax when you don't pass the -E option.
For example, here is how you would search a text file for email addresses using BRE:
cat test.txt | grep -oe "[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z.]\{2,\}"

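With the BRE syntax, characters such as ?, +, {, |, (, and ) require a backslash (\?, \+, \{, \|, \(, \)) to act as special characters. Without it, they are treated as ordinary characters. Strictly speaking, \?, \+ and \| are GNU extensions to POSIX BRE, but GNU grep, the version found on most Linux systems, supports them.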

In the example above, you will notice the +, { and } characters are escaped (yes, a backslash is needed before a closing } as well).
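Here is a sketch of both points, assuming GNU grep (which accepts \+ in BRE as an extension) and a couple of made-up sample lines piped through printf instead of a real test.txt. First, the same pattern behaves differently with and without the backslash. Then the email command from above runs on a small hypothetical sample.

```shell
# BRE (grep's default): a bare + is just a plus sign...
printf 'aaa\na+\n' | grep 'a+'       # prints: a+

# ...while \+ means "one or more of the preceding item"
printf 'aaa\na+\n' | grep -o 'a\+'   # prints two lines: aaa and a

# The email search, run on hypothetical sample text instead of test.txt
printf 'Contact: jane.doe@example.com or bob@mail.example.org\nNo address here\n' |
  grep -oe "[a-zA-Z0-9._]\+@[a-zA-Z]\+\.[a-zA-Z.]\{2,\}"
# prints:
# jane.doe@example.com
# bob@mail.example.org
```

The -o option makes grep print each match on its own line instead of the whole matching line, which is why two addresses come out of a single input line.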

Still with me? Good. Now, an unescaped caret (^) only acts as a special character (an anchor specifying that a match needs to be at the beginning of a line) when it appears at the very beginning of the regex, or directly after a \( or \| sequence. Anywhere else inside the regex, it is considered an ordinary character.
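The following sketch, assuming GNU grep and made-up input, contrasts a caret in the middle of a BRE, a caret at the start of a BRE, and a caret in the middle of an ERE, where it is always an anchor.

```shell
# BRE: a caret in the middle of the regex is an ordinary character
printf 'a^b\n' | grep 'a^b'                        # prints: a^b

# BRE: at the very start, ^ anchors the match to the beginning of the line
printf 'abc\ncab\n' | grep '^ab'                   # prints: abc

# ERE, for comparison: ^ is an anchor anywhere, so 'a^b' can never match
printf 'a^b\n' | grep -E 'a^b' || echo "no match"  # prints: no match
```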

Likewise, an unescaped $ only acts as a special character (an anchor specifying that a match needs to be at the end of a line) when it appears at the very end of the regex, or directly before a \) or \| sequence. Anywhere else inside the regex, it is considered an ordinary character.
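The same kind of sketch works for the dollar sign, again assuming GNU grep and made-up input. The single quotes matter here: they stop the shell from treating $b as a variable.

```shell
# BRE: a $ in the middle of the regex is an ordinary character
printf 'a$b\n' | grep 'a$b'                        # prints: a$b

# BRE: at the very end, $ anchors the match to the end of the line
printf 'ab\nabc\n' | grep 'ab$'                    # prints: ab

# ERE, for comparison: $ is an anchor anywhere, so 'a$b' can never match
printf 'a$b\n' | grep -E 'a$b' || echo "no match"  # prints: no match
```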

Finally, an unescaped * that appears at the very beginning of the regex, or directly after \(, \| or an anchoring caret (^), has nothing to repeat, so it is considered an ordinary character and not a repetition operator. Everywhere else, * repeats the preceding item zero or more times, so to match a literal asterisk there, you will need to escape it as such: \*
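Here is a minimal sketch of the three cases, assuming GNU grep and made-up sample lines.

```shell
# At the very start of a BRE, * has nothing to repeat, so it is literal
printf '*star\nstar\n' | grep '*star'   # prints: *star

# Elsewhere, * repeats the preceding item zero or more times
printf 'ac\nabc\nabbc\n' | grep 'ab*c'  # prints all three lines

# Escaping it with a backslash matches a literal asterisk
printf 'a*c\nac\n' | grep 'a\*c'        # prints: a*c
```

Writing \* even at the start of the regex is the more explicit choice, since it reads the same way wherever it appears.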

Extended Regular Expressions

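The ERE format is the more modern of the two, and its handling of special characters is what you will find in tools like https://regex101.com or https://regexr.com (two great resources to check the validity of a regex – I strongly suggest you bookmark them). Strictly speaking, those tools use richer flavours such as PCRE or JavaScript rather than POSIX ERE, but they treat ?, +, {, |, ( and ) the same way.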

You will also use the ERE format with grep commands such as the following:
cat test.txt | grep -oE "[a-zA-Z0-9._]+@[a-zA-Z]+\.[a-zA-Z.]{2,}"

With the ERE syntax, characters such as ?, +, {, |, (, and ) act as special characters by default. Add a backslash if you want to escape these characters and use them as ordinary characters.

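In the example above, notice that the +, { and } characters are not escaped, unlike in the BRE example.
Also notice the -oE flag instead of -oe in the grep command. The uppercase -E is what switches grep to ERE. The lowercase -e in the BRE example does not select a syntax at all: it simply marks the next argument as the pattern, and grep uses BRE by default.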
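To tie this together, the sketch below runs the ERE version of the email search on a made-up sample (it should find the same addresses the BRE command would), then shows escaping in ERE, and finally confirms that the lowercase -e leaves grep in BRE mode. It assumes GNU grep.

```shell
# Hypothetical sample input standing in for test.txt
printf 'Contact: jane.doe@example.com or bob@mail.example.org\nNo address here\n' |
  grep -oE "[a-zA-Z0-9._]+@[a-zA-Z]+\.[a-zA-Z.]{2,}"
# prints:
# jane.doe@example.com
# bob@mail.example.org

# In ERE, a backslash turns a special character back into an ordinary one
printf 'aab\na+b\n' | grep -E 'a+b'     # prints: aab
printf 'aab\na+b\n' | grep -E 'a\+b'    # prints: a+b

# -e only marks the next argument as the pattern, so this is still BRE:
# + is literal and nothing matches
printf 'aaa\n' | grep -oe 'a+' || echo "no match"   # prints: no match
```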

Which one should you use?

For consistency, I generally use the ERE syntax, which is easy to test in https://regex101.com or https://regexr.com.
When I need to use it in the terminal as part of a grep command, I make sure to include the -E option (as in -oE) rather than relying on -e, which leaves grep in BRE mode.

Special thanks to Dana Epp who put me on the right track to research this story. 🙂

Hi! I'm a tech journalist, getting my feet wet in ethical hacking. What you will find here is me taking notes on the tools and techniques I’m learning and offering answers to the questions I had when I first got started not so very long ago.
