
Regex Metacharacters

In this article you will learn about metacharacters in regular expressions and their special meaning, and how they are used to get the desired regexp pattern match.

In a literal search, a character simply matches itself. C will match the C in Cars, e will match the e in pen, and idea will match idea. Now try to search for a parenthesis in your test string.


Let's say your test string is: it is an example of convolutional neural network (cnn).

You want to match (cnn), so you write the regex as

/ (cnn) /

This pattern does not do what you want. The parentheses are not matched as literal characters: in regex they create a group, so the pattern matches only cnn and leaves the parentheses in the test string unmatched. If you write just an opening parenthesis ( on its own, most regex engines raise a pattern error. Why does ( behave this way? Because ( cannot be matched with a plain ( the way it is in a literal search. Some characters are assigned special meanings in regex for searching and matching operations. These characters are called special characters or metacharacters.
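Here is a minimal sketch of how parentheses behave, assuming Python's built-in re module (the article does not name a regex flavor, so the choice of Python is an assumption). It compares the unescaped pattern, the escaped pattern, and an unbalanced opening parenthesis.

```python
import re

text = "it is an example of convolutional neural network (cnn)."

# Unescaped parentheses form a group, so only the letters "cnn" are matched.
grouped = re.search(r"(cnn)", text)
print(grouped.group(0))  # cnn

# Escaped parentheses are treated as literal characters.
literal = re.search(r"\(cnn\)", text)
print(literal.group(0))  # (cnn)

# An unbalanced opening parenthesis cannot even be compiled.
try:
    re.compile(r"(cnn")
    unbalanced_ok = True
except re.error:
    unbalanced_ok = False
print(unbalanced_ok)  # False
```

Other regex engines may word the error differently, but the grouping behavior of ( and ) is common to the mainstream flavors.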

If you try to match 2+4=6 by simply writing

/ 2+4=6 / 

It will not raise an error, but it will say NO MATCH, which may seem a bit strange. The reason is that + is a metacharacter: 2+ means one or more 2s, so the pattern looks for text such as 24=6 or 224=6, not for a literal plus sign. Hence we cannot use + in its literal meaning by simply writing +.

Definition: A metacharacter is a character that has a special meaning instead of its literal meaning.

Metacharacters are used in regular expressions to define the search criteria and text manipulations.

Some metacharacters raise an error when used alone; the others are accepted but produce unexpected results.
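The NO MATCH result can be reproduced with a short sketch, again assuming Python's re module. The extra test string 224=6 is made up for illustration; it shows what the unescaped pattern is really looking for.

```python
import re

# Unescaped: "2+" means "one or more 2s", so this pattern expects text like "24=6".
unescaped = re.compile(r"2+4=6")
no_match = unescaped.search("2+4=6")
print(no_match)  # None

# The same pattern happily matches a run of 2s followed by "4=6".
run_of_twos = unescaped.search("224=6")
print(run_of_twos.group(0))  # 224=6

# Escaping the plus sign restores its literal meaning.
escaped = re.compile(r"2\+4=6")
literal_match = escaped.search("2+4=6")
print(literal_match.group(0))  # 2+4=6
```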

Special care is needed to handle special characters :)

Types of Metacharacters: 

There are two basic types of metacharacters: single character metacharacters and double character metacharacters. Double character metacharacters are also called shorthand character classes. We are going to discuss them one by one.


Single Character Metacharacters:


First come the single character metacharacters. As the name shows, each of these consists of only a single character. They have a special meaning, so they cannot be used alone as literal characters. Some of them raise a pattern error when used alone, and those that don't raise an error produce unexpected results.

Metacharacters-Basics

There are twelve basic metacharacters, or characters with special meanings, which are reserved for special use. These characters are the backslash \, the caret ^, the dot or period ., the pipe symbol |, the dollar sign $, the question mark ?, the asterisk *, the plus sign +, the opening and closing parentheses ( ), the opening square bracket [ and the opening curly bracket {. All of these are special characters. Several of them raise a pattern error when used alone, and the rest silently take on their special meaning.

Different flavors of regex may differ a little in how they interpret these metacharacters. Some of their special meanings and uses are discussed here, while their roles are discussed in detail under separate headings later in this tutorial.
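To see which of the twelve are rejected outright and which are quietly accepted, one option is to try compiling each character by itself. The sketch below assumes Python's re module; compiles_alone is a hypothetical helper name, and the results are specific to that engine.

```python
import re

# The twelve basic metacharacters listed above.
METACHARACTERS = ["\\", "^", ".", "|", "$", "?", "*", "+", "(", ")", "[", "{"]

def compiles_alone(pattern: str) -> bool:
    """Return True if the pattern compiles without a pattern error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

for ch in METACHARACTERS:
    status = "compiles" if compiles_alone(ch) else "pattern error"
    print(f"{ch!r:6} {status}")
```

In Python, the quantifiers, the parentheses, the opening square bracket and a lone backslash are reported as pattern errors, while characters such as the dot, caret and dollar sign compile but do not match themselves literally. Other flavors may draw the line differently.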

Metacharacters for repetition (quantifiers): the three metacharacters ? * + are used as quantifiers in regex. ? means zero or one repetition, hence it is used for making anything optional. * means zero or more repetitions, while the plus sign + means one or more repetitions.

Metacharacters for anchoring: ^ and $ are used as anchors. ^ matches at the start of the string, or at the start of each line in multiline mode. $ matches at the end of the string, or at the end of each line in multiline mode.

The pipe | is used for alternation, which means selecting one out of two or more options.

The dot or period . is a wildcard. It matches any single character, although in most flavors it does not match a newline by default.

Parentheses ( ) are used for creating groups. Square brackets [ ] are used for creating character classes.
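Each of these roles can be tried out with a one-line pattern. The following is only a sketch, assuming Python's re module; the sample strings are invented for illustration.

```python
import re

# ? makes the preceding item optional (zero or one).
optional = re.findall(r"colou?r", "color colour")        # ['color', 'colour']

# * allows zero repetitions, + needs at least one.
star_ok = re.fullmatch(r"ab*", "a") is not None          # True
plus_ok = re.fullmatch(r"ab+", "a") is not None          # False

# ^ anchors at the start; with re.MULTILINE it also matches at each line start.
line_starts = re.findall(r"^\w+", "first line\nsecond line", flags=re.MULTILINE)
# $ anchors at the end of the string when multiline mode is off.
string_end = re.findall(r"\w+$", "first line\nsecond line")

# | selects between alternatives.
pets = re.findall(r"cat|dog", "cat, dog, bird")          # ['cat', 'dog']

# . matches any single character except a newline (by default).
wildcard = re.findall(r"c.t", "cat cot c\nt")            # ['cat', 'cot']

# ( ) group items so that a quantifier applies to the whole group.
laugh = re.search(r"(ha)+", "hahaha").group(0)           # 'hahaha'

# [ ] define a character class: one character out of the listed set.
greys = re.findall(r"gr[ae]y", "gray grey groy")         # ['gray', 'grey']

print(optional, star_ok, plus_ok, line_starts, string_end)
print(pets, wildcard, laugh, greys)
```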

How to Escape a Metacharacter 

This raises a question: how do you match the metacharacters themselves? For example, if you want to match the question mark ?, writing a question mark alone will raise an error, and writing it next to other characters will produce unexpected results. The answer is very simple: put a backslash before it, just like

/ \? /

When any single character metacharacter is used with a backslash just before it, the metacharacter loses its special meaning and behavior and becomes a literal character. Now you can use it to match itself in the literal sense.

What about matching a backslash? As the backslash is itself a metacharacter, we have to put another backslash before it to escape it. Hence to match a backslash use this regex

/ \\ /

One more example here. Test string: 2+2=4 

/ 2\+2=4 /

And for a more practical application

Test string: http://www.regextutorial.org

/ http:\/\/www\.regextutorial\.org /

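So if you want to escape a metacharacter, use a backslash just before the metacharacter. Note that the forward slashes are escaped in this last example only because / is the pattern delimiter in this notation; / is not a metacharacter itself.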
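The escaping examples above can be checked with a short sketch, assuming Python's re module. Python patterns are plain strings without / delimiters, so the slashes need no backslash there. re.escape is a standard library function that adds the backslashes for you; the string http://wwwXregextutorial-org is invented to show what an unescaped dot lets through.

```python
import re

url = "http://www.regextutorial.org"

# Hand-escaped dots: each \. matches only a literal period.
manual = re.compile(r"http://www\.regextutorial\.org")
manual_ok = manual.fullmatch(url) is not None

# re.escape() inserts the backslashes automatically.
automatic = re.compile(re.escape(url))
automatic_ok = automatic.fullmatch(url) is not None

# Without escaping, each dot is a wildcard and accepts other characters too.
loose = re.compile(r"http://www.regextutorial.org")
loose_ok = loose.fullmatch("http://wwwXregextutorial-org") is not None

# Escaped question mark and escaped backslash match themselves literally.
question = re.search(r"\?", "why?").group(0)
backslash = re.search(r"\\", r"C:\temp").group(0)

print(manual_ok, automatic_ok, loose_ok)  # True True True
print(question, backslash)
```

The raw string prefix r"..." is used so that Python itself does not interpret the backslashes before the regex engine sees them.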

Now there are some more special metacharacters.

Metacharacter classes: 

Secondary Metacharacters

You may find it sort of funny that some metacharacters are special on their own and become literal characters when we put a backslash before them, while others work the other way round: they have their special behavior only in the presence of the backslash, and if the backslash is dropped they lose that behavior and become literal characters. These special metacharacters are also called shorthand character classes, because one of them can match a whole group of characters. As for how to escape them, simply leave out the backslash.

\d is used to match a single digit from zero to nine, i.e. any one of 0123456789.

In order to match more than one digit, a quantifier is used with \d.

\D will match any character except a digit. You can call it the negation of \d.

\w is used to match word characters: the upper and lower case letters a to z, the digits matched by \d, and the underscore.

\W is the negation of \w, which means it matches any character that \w does not.

\s is used to match a whitespace character such as a space, tab or newline, while \S is the negation of \s.
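The shorthand classes can be tried on a small sample string. This is a sketch assuming Python's re module, and the sample text is invented. Note that in Python (and some other Unicode-aware flavors) \d and \w also match digits and letters outside the ASCII range unless told otherwise.

```python
import re

sample = "Room_42 costs $7.50\ttoday"

# \d with a quantifier matches runs of digits.
digits = re.findall(r"\d+", sample)        # ['42', '7', '50']

# \D matches everything that is not a digit.
non_digits = re.findall(r"\D+", "a1b22c")  # ['a', 'b', 'c']

# \w covers letters, digits and the underscore.
words = re.findall(r"\w+", sample)         # ['Room_42', 'costs', '7', '50', 'today']

# \W is the negation of \w.
non_words = re.findall(r"\W+", sample)     # [' ', ' $', '.', '\t']

# \s matches whitespace; \S matches everything else.
spaces = re.findall(r"\s", sample)         # [' ', ' ', '\t']
chunks = re.findall(r"\S+", sample)        # ['Room_42', 'costs', '$7.50', 'today']

# Dropping the backslash turns the shorthand back into a literal letter.
literal_d = re.findall(r"d", "add 2 and 3")  # ['d', 'd', 'd']

print(digits, non_digits, words, non_words, spaces, chunks, literal_d)
```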

So this is a brief introduction to different metacharacters, which will be discussed in detail in coming sections of this regex tutorial.



