In very simple terms, we can say that
"a regular expression is a group of characters that defines a pattern", and we use that pattern to find the specific information we need in a piece of text.
So regular expressions are nothing but a group of characters that have special meanings to the regular expression engine, which ships with the .NET Framework and is represented by the System.Text.RegularExpressions.Regex class.
Goal of Article
My goal in this article is to give you a basic understanding of regular expressions in a very short amount of time. I will guide you far enough that you can create and use your own regular expressions in your .NET applications. By the end of this article you will be able to create your own regular expressions to match a standard phone number, social security number, email address, postal code and so on.
You need a basic understanding of the C# language and some basic concepts of OOP (object-oriented programming).
Let’s Jump into Regular Expressions
Before I start discussing the syntax and regular expression engine let me answer some questions that you might have in your mind.
If you are confused and think regular expressions are hard to learn, trust me: they are not that bad. In fact, they are quite easy to learn and use. Once you are up and running, I believe you will be doing more cool and fun things with regex and making full use of the power of the .NET regular expression engine.
What Are Regular Expressions?
As I said earlier, a regular expression is nothing but a string of special characters that defines a pattern. We then use that pattern in our C# program to extract specific information from a large block of text.
Why Do We Need Regular Expressions?
As .NET developers we work on different types of applications, such as web, mobile and desktop apps. One thing is common to all of them: they take input from users. Users might enter wrong input, intentionally or unintentionally, and it is our duty as developers to validate that input before we process it and store it in a database. Wrong input might crash our applications, so we need regular expressions to validate it.
That is only one reason. There are many others, because regular expressions give us a powerful and fast way to manipulate and parse text. A further advantage is that the regular expression syntax is the same for all types of .NET applications.
Uses of Regular Expressions
There are many practical uses of regular expressions. Here are some common ones:
• Regular expressions can be used to manipulate and validate user input.
• Regular expressions can be used to replace, remove and pull out values from text input.
• You can use regular expressions to parse an HTML document and take out specific data to store in a database.
• Regular expressions can be used to find specific words or sentences in a large document instead of reading the whole document.
The following example gives you a basic visual understanding of regular expressions. This is what happens when your regex returns exactly what you want.
Now Let’s Start Learning and Practicing
The best way to learn anything is to start practicing and get your hands dirty before you have learned it completely. In the end, there is always something more to learn.
How Do Regular Expressions Work?
Regular expressions are processed by the regular expression engine that ships with the .NET Framework and is represented by System.Text.RegularExpressions.Regex.
The regular expression engine needs only two things to process text:
1. The regular expression pattern that you define to find text. (Don’t worry, we will learn the regular expression syntax later in this article.)
2. The input text that you want to parse.
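The two ingredients above can be sketched in a few lines. This is only a minimal illustration: the sample text is made up, the pattern is a plain literal because the syntax has not been introduced yet, and the code is written as C# 9+ top-level statements (in an older project the same lines would go inside a Main method).

```csharp
using System;
using System.Text.RegularExpressions;

// 1. The pattern: here just a literal word, no special characters yet.
string pattern = "regex";

// 2. The input text to search (hypothetical sample text).
string input = "Learning regex is fun";

// Hand both to the engine through the Regex class.
Regex regex = new Regex(pattern);
Match first = regex.Match(input);

Console.WriteLine(first.Success); // True
Console.WriteLine(first.Value);   // regex
Console.WriteLine(first.Index);   // 9
```

Match.Index is the zero-based position in the input where the match starts.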
Now it’s time to learn the basic syntax of .NET regular expressions so that we can create and use them in our C# programs.
As I said earlier, regular expressions are a group of special characters with special meanings.
Some of the most commonly used special characters are listed in the table below, which I referenced from MSDN.
|\b ||Represents the position at the beginning or end of a word (a word boundary).|
|\d ||Represents any digit character.|
|\t ||Represents a tab character.|
|\n ||Represents a newline character.|
|\s ||Represents any white-space character.|
|. ||Represents any single character except a newline.|
|\w ||Represents any word character: a letter, a digit or an underscore.|
|^ ||Matches the position at the beginning of the whole string.|
|$ ||Matches the position at the end of the whole string (or just before a final newline).|
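To see the characters from the table in action, here is a small sketch that tries each of them against one sample string. The sample text and variable names are made up for illustration, and the code assumes C# 9+ top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample text; the \t inside this C# string is a real tab character.
string input = "Room 42\tis free";

bool hasTab = Regex.IsMatch(input, @"\t");              // a tab character
bool hasSpace = Regex.IsMatch(input, @"\s");            // any white-space character
bool startsWithRoom = Regex.IsMatch(input, @"^Room");   // ^ anchors to the start of the string
bool endsWithFree = Regex.IsMatch(input, @"free$");     // $ anchors to the end of the string
bool hasWordIs = Regex.IsMatch(input, @"\bis\b");       // "is" as a whole word

string twoDigits = Regex.Match(input, @"\d\d").Value;   // two digits in a row
string fourChars = Regex.Match(input, @"\w\w\w\w").Value; // four word characters in a row
string dotMatch = Regex.Match(input, "R..m").Value;     // . stands for any single character

Console.WriteLine(twoDigits); // 42
Console.WriteLine(fourChars); // Room
Console.WriteLine(dotMatch);  // Room
```

The patterns are written as verbatim strings (the @ prefix) so that the backslashes reach the regular expression engine unchanged.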
Basic Understanding of Regex Class
Regex is a standard .NET class that represents a regular expression. More precisely, it represents an immutable regular expression: the constructor accepts the pattern in the form of a string, and once the Regex object has been created its pattern cannot be changed. This is similar to the String class, which is also immutable in .NET: once we set a value on a string object, we cannot change that value later.
Now that we know the Regex class represents a regular expression, we need to create an instance of it in our program so that we can find matches and do more interesting things with our text input. To create an instance of the Regex class, we use one of its constructors, which takes the regular expression pattern string as an argument.
Regex regex = new Regex(@"\bimportant\b");
Methods of Regex
Here we explain some of the most commonly used methods of the Regex class.
|IsMatch(String) ||Returns a Boolean value that indicates whether the regular expression specified in the Regex instance finds a match in the input text. True means a match was found and false means no match was found.|
|Matches(String) ||Finds all the matches for the specified regular expression and returns them as a MatchCollection object.|
|Replace(String, String) ||Replaces all the matches for the specified regular expression in the input string with the given replacement string.|
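The three methods in the table can be tried together in a short sketch. The input sentence and the replacement word "urgent" are made up for illustration, and the code assumes C# 9+ top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

Regex regex = new Regex(@"\bimportant\b");

// Hypothetical sample text.
string input = "An important note: this is important, not unimportant.";

// IsMatch: is there at least one match?
bool found = regex.IsMatch(input);

// Matches: collect every match.
MatchCollection matches = regex.Matches(input);

// Replace: swap every match for a replacement string.
string replaced = regex.Replace(input, "urgent");

Console.WriteLine(found);         // True
Console.WriteLine(matches.Count); // 2
Console.WriteLine(replaced);      // An urgent note: this is urgent, not unimportant.
```

Replace returns a new string and leaves the original input untouched.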
If you are interested in learning more about the Regex class and want to explore all of its constructors, properties and methods, see the Regex class documentation on MSDN.
Explaining Some Simple Expressions with Examples
1. “important” literally finds ‘important’ exactly as it is written.
The pattern ‘important’ is a very simple form of regular expression. It matches the 9 characters ‘i’, ’m’, ’p’, ’o’, ’r’, ’t’, ’a’, ’n’, ’t’ in exactly that sequence. Because it matches that sequence wherever it occurs, it also finds matches inside longer words such as unimportant, very-important and importanttt, which is usually not what we want.
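A quick sketch makes this weakness visible. The input string is a made-up sample built from the words mentioned above, and the code assumes C# 9+ top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample text containing the words discussed above.
string input = "important unimportant very-important importanttt";

// The plain literal pattern matches the character sequence anywhere.
MatchCollection matches = Regex.Matches(input, "important");

Console.WriteLine(matches.Count); // 4

// Show where each match starts, to see that some are inside longer words.
foreach (Match m in matches)
{
    Console.WriteLine($"{m.Index}: {m.Value}");
}
```

All four words produce a match, even though only the first one is the standalone word.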
The example above shows the weak point of our expression. Now let’s improve it so that we get what we actually want.
2. “\bimportant\b” finds ‘important’ only as a whole word.
We have improved the expression by adding ‘\b’ before and after it. You are already familiar with ‘\b’ from the table above: it tells the regular expression engine that the match must start at the beginning of a word and stop at the end of a word. In simple terms, ‘\b’ represents the position at the beginning or end of a word.
As you can see from the snapshot above, we now get only one result back.
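The whole-word pattern can be checked in code with a small sketch. The input string is a made-up sample, not the text from the snapshot, and the code assumes C# 9+ top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample text.
string input = "important unimportant very-important importanttt";

MatchCollection matches = Regex.Matches(input, @"\bimportant\b");

Console.WriteLine(matches.Count); // 2

foreach (Match m in matches)
{
    Console.WriteLine($"{m.Index}: {m.Value}");
}
```

With this sample the pattern no longer matches inside unimportant or importanttt. It does still match the second half of very-important, because a hyphen is not a word character and so there is a word boundary right after it.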
3. Example of the ‘\s’ character.
This example shows the purpose and use of the ‘\s’ character. As mentioned in the special characters table above, ‘\s’ represents a white-space character in text. Using ‘\s’, we will create a regular expression that replaces the spaces between words with the ’_’ character.
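A minimal sketch of that replacement, using the static Regex.Replace method. The sample strings are made up for illustration, and the code assumes C# 9+ top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample text.
string input = "regular expressions are fun";

// Replace every single white-space character with an underscore.
string result = Regex.Replace(input, @"\s", "_");
Console.WriteLine(result); // regular_expressions_are_fun

// \s replaces each white-space character separately,
// so two spaces in a row become two underscores.
string doubled = Regex.Replace("a  b", @"\s", "_");
Console.WriteLine(doubled); // a__b

// Adding + (one or more) collapses a run of white space into one underscore.
string collapsed = Regex.Replace("a  b", @"\s+", "_");
Console.WriteLine(collapsed); // a_b
```

Which of the two behaviours you want depends on whether repeated spaces in the input are meaningful.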
4. Finding the number of words in a string ("\b\w+\b").
In this example we write an expression that helps us find the number of words in a text input. In any text or document, words are separated by space characters, so the space character helps us find our words. The expression above skips the spaces between words and picks up every word that starts and ends with an alphanumeric character and contains one or more characters.
|\b ||means the match starts at the beginning of a word|
|\b\w ||means the word starts with any alphanumeric character|
|\b\w+ ||means the word starts with any alphanumeric character, and the previous match is repeated one or more times (in simple terms, the word we are going to match must contain at least one character)|
|\b\w+\b ||the \b at the end means the word must also end with an alphanumeric character|
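Putting the pieces of the table together, the word count can be sketched as follows. The sample sentences are made up for illustration, and the code assumes C# 9+ top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample text.
string input = "Regular expressions are easy to learn";

// Every match of \b\w+\b is one word, so the number of matches is the word count.
int wordCount = Regex.Matches(input, @"\b\w+\b").Count;
Console.WriteLine(wordCount); // 6

// Caveat: an apostrophe is not a word character,
// so "it's" is counted as two words ("it" and "s").
int withApostrophe = Regex.Matches("it's fun", @"\b\w+\b").Count;
Console.WriteLine(withApostrophe); // 3
```

For simple text this count is usually good enough. Text with apostrophes or hyphens would need a more specific pattern.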