A regular expression, often shortened to “regex,” is a powerful tool for finding patterns in text. At its simplest, a regex matches a single character or a fixed sequence of characters, but it can also describe more sophisticated patterns that span multiple lines or contain optional or repeated elements.
Regular expression functions in Python
Python supports regular expressions through the built-in re module. Its most commonly used functions are listed below:
Function | Description |
re.compile | Compiles a regular expression pattern into a pattern object. The object's match(), search(), and other methods, which are described below, can then be used to match data. |
re.search | Returns a Match object for the first match found anywhere in the string, or None if there is no match. |
re.match | Returns a Match object only if the pattern matches at the start of the string. |
re.fullmatch | Returns a Match object only if the entire string matches the pattern. |
re.split | Splits the string at each occurrence of the pattern. |
re.findall | Returns a list of all non-overlapping matches; the items are strings, or tuples when the pattern contains more than one group. |
re.finditer | Returns an iterator that yields a Match object for each non-overlapping match of the pattern in the string. |
re.sub | Replaces every match of the pattern with a replacement string and returns the new string. |
re.subn | Same as re.sub, but returns a tuple of the new string and the number of substitutions made. |
re.escape | Escapes the special characters in a string so that it can be used as a literal pattern. |
re.purge | Clears the regular expression cache. |
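As a rough illustration of how these functions behave, the sketch below runs several of them on one sample string. The string and the variable names are made up for the example; the values in the comments follow from the documented behaviour of the re module.

```python
import re

text = "cat, bat; rat"

# re.compile builds a reusable pattern object.
word = re.compile(r"[a-z]+")

# search finds the first match anywhere; match only tries the start.
first = word.search(text).group()        # 'cat'
at_start = re.match(r"bat", text)        # None, because 'bat' is not at the start
whole = re.fullmatch(r"[a-z]+", "cat")   # a Match object: the whole string is letters

# findall returns every non-overlapping match as a list of strings.
words = word.findall(text)               # ['cat', 'bat', 'rat']

# finditer yields Match objects, which also carry positions.
positions = [(m.group(), m.start()) for m in word.finditer(text)]

# split cuts the string wherever the pattern matches.
parts = re.split(r"[,;]\s*", text)       # ['cat', 'bat', 'rat']

# sub replaces matches; subn also reports how many replacements were made.
replaced = re.sub(r"[cbr]at", "hat", text)     # 'hat, hat; hat'
replaced_n = re.subn(r"[cbr]at", "hat", text)  # ('hat, hat; hat', 3)

# escape makes a literal string safe to use as a pattern ('+' is special).
escaped = re.escape("1+1=2")
literal = re.search(escaped, "so 1+1=2 holds")

print(first, at_start, words, positions, parts, replaced_n)
```

Compiling a pattern once with re.compile is mostly a convenience when the same pattern is reused many times; the module-level functions accept the same pattern strings directly.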
Metacharacters
Metacharacters are characters that have a special meaning inside a regular expression instead of matching themselves literally. They are used to describe which character patterns a string should match. The most common metacharacters are listed below:
Character | Description |
[] | A set of characters |
\ | Signals a special sequence, or escapes a special character |
. | Any character except a newline |
^ | String starts with |
$ | String ends with |
* | Zero or more occurrences |
+ | One or more occurrences |
? | Zero or one occurrence |
{} | Exactly the specified number of occurrences |
| | Either or |
() | Capture and group |
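The short sketch below tries each metacharacter on small sample strings. The sample data and variable names are invented for illustration, and the dot is included as the usual "any character except a newline" metacharacter.

```python
import re

# [] matches any one character from the set.
vowels = re.findall(r"[aeiou]", "regex")          # ['e', 'e']

# . matches any single character except a newline.
h_words = re.findall(r"h.t", "hat hot hit")       # ['hat', 'hot', 'hit']

# * allows zero or more repetitions, + requires at least one.
star = re.findall(r"ab*", "a ab abbb")            # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")            # ['ab', 'abbb']

samples = ["color", "colour", "colouur", "The end", "end of The"]

# ? makes the preceding 'u' optional.
optional_u = [s for s in samples if re.fullmatch(r"colou?r", s)]

# ^ anchors the pattern to the start of the string, $ to the end.
starts_with_the = [s for s in samples if re.search(r"^The", s)]
ends_with_the = [s for s in samples if re.search(r"The$", s)]

# {2} demands exactly two repetitions, | chooses between alternatives,
# and () groups the repeated part and captures what it matched.
m = re.search(r"(ab){2}|(xy)+", "zzabab")

# \ escapes a metacharacter so that it matches literally.
price = re.search(r"\$\d+", "cost: $25").group()  # '$25'

print(vowels, h_words, star, plus, optional_u, m.group(), price)
```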
Special Sequences
Special sequences are backslash-prefixed shorthands that match a particular class of character or a particular position in the string. The following special sequences are commonly used:
Character | Description |
\A | Matches only at the start of the string. |
\b | Matches at a word boundary, that is, at the start or end of a word. |
\B | Matches at any position that is not a word boundary. |
\d | Matches any decimal digit (0-9). |
\D | Matches any character that is not a digit. |
\s | Matches any whitespace character (space, tab, newline, and so on). |
\S | Matches any character that is not whitespace. |
\w | Matches any word character (a-z, A-Z, 0-9, and the underscore _). |
\W | Matches any character that is not a word character, that is, anything other than a-z, A-Z, 0-9, and _. |
\Z | Matches only at the end of the string. |
Because each special sequence targets a specific kind of character or position, combining them lets you write more precise regular expressions.
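To make the table concrete, here is a small sketch that applies each special sequence to sample text. The strings and variable names are hypothetical and chosen only for the example.

```python
import re

text = "Order 66 shipped to cat_lady on 2024-01-15"

# \d matches digits, \w matches word characters (letters, digits, underscore).
digits = re.findall(r"\d+", text)   # ['66', '2024', '01', '15']
words = re.findall(r"\w+", text)

# \b only matches at a word boundary, \B only away from one.
whole_word = re.findall(r"\bcat\b", "cat concat cat_lady")  # only the first 'cat'
inside_word = re.findall(r"\Bcat", "cat concat")            # only the 'cat' in 'concat'

# \D removes everything that is not a digit.
only_digits = re.sub(r"\D", "", "(123) 456-7890")           # '1234567890'

# \s matches whitespace, \S matches everything else.
tokens = re.split(r"\s+", "a  b\tc\nd")                     # ['a', 'b', 'c', 'd']
chunks = re.findall(r"\S+", "a  b\tc")                      # ['a', 'b', 'c']

# \W matches non-word characters such as punctuation and spaces.
separators = re.findall(r"\W", "a-b c")                     # ['-', ' ']

# \A and \Z anchor to the very start and very end of the string.
starts_with_order = bool(re.search(r"\AOrder", text))
ends_with_15 = bool(re.search(r"15\Z", text))

print(digits, words, whole_word, inside_word, only_digits, tokens)
```

Note that the underscore counts as a word character, which is why `\bcat\b` does not match inside `cat_lady`.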
Finding the digits in a given string in Python
In Python, regular expressions can be used to find every run of digits in a given text. The example below uses re.findall with the pattern \d+ to extract each group of consecutive digits from a string:

    import re

    string = 'I was born in the year 1999 at 2:06pm on 16th of June.'
    pattern = r'\d+'  # one or more consecutive digits
    result = re.findall(pattern, string)
    print(result)

Output:

    ['1999', '2', '06', '16']
Searching for a keyword in a string
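The search function looks for a pattern anywhere inside a string. It returns a Match object whose span gives the start and end index of the match, or None if nothing matches. The example below checks whether the string starts with "The" and ends with "India":

    import re

    txt = "The summer is hot in India"
    # ^ and $ anchor the pattern to the start and end of the string
    x = re.search(r"^The.*India$", txt)
    print(x)

Output:

    <re.Match object; span=(0, 26), match='The summer is hot in India'>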
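To get the index position of a single keyword rather than the whole string, the Match object's start(), end(), and span() methods can be used. The following is a minimal sketch that reuses the sentence from the example; the keywords "hot" and "cold" are picked only for illustration.

```python
import re

txt = "The summer is hot in India"

# search returns a Match object for the first occurrence of the keyword.
found = re.search(r"hot", txt)
if found:
    print(found.start(), found.end())  # 14 17
    print(found.span())                # (14, 17)
    print(found.group())               # hot

# When the keyword is absent, search returns None instead of a Match.
missing = re.search(r"cold", txt)
print(missing)                         # None
```

Checking the result against None before calling its methods avoids an AttributeError when the keyword is not present.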
Email validation in Python
There are a few different approaches to validating an email address with regular expressions in Python. One method is to compare the address against a pattern that describes the format of a valid email address using the re module. The illustration below scans a block of text and prints each email address that matches such a pattern:

    import re

    emails = '''
    abc@mail.com
    xyz@companymail.com
    a1b2c3@collegemail.edu.in
    '''

    # local part, then @, then a domain containing at least one dot
    pattern = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')
    matches = pattern.finditer(emails)
    for match in matches:
        print(match)
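To check a single address rather than scan a block of text, the same pattern can be applied with fullmatch so that the whole string has to fit. This is only a sketch: the helper name is_valid_email is hypothetical, and the pattern is a simplified approximation that does not cover every address the email standards allow.

```python
import re

# Same simplified pattern as in the example above (an approximation,
# not a complete implementation of the email address standards).
EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")

def is_valid_email(address):
    """Return True if the entire string looks like an email address."""
    # fullmatch rejects strings with extra text before or after the address.
    return EMAIL_RE.fullmatch(address) is not None

for candidate in ["abc@mail.com", "a1b2c3@collegemail.edu.in",
                  "abc@mail", "abc mail.com", "@mail.com"]:
    print(candidate, is_valid_email(candidate))
```

Using fullmatch instead of search matters here: search would accept any string that merely contains something address-like.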
Phone number validation in Python
In Python, you can validate a phone number by checking it against a pattern that describes the format of a legitimate phone number. The example below accepts numbers written as (XXX) XXX-XXXX:

    import re

    def is_valid_phone_number(phone_number):
        # Expected format: (XXX) XXX-XXXX
        pattern = re.compile(r'^\(\d{3}\) \d{3}-\d{4}$')
        match = pattern.search(phone_number)
        return bool(match)

    phone_number = "(123) 456-7890"
    print(is_valid_phone_number(phone_number))  # True

    phone_number = "123-456-7890"
    print(is_valid_phone_number(phone_number))  # False

Output:

    True
    False
The above program applies only to United States phone numbers written in the (XXX) XXX-XXXX format. To validate other formats, change the pattern. For example, a 10-digit Indian phone number can be verified as shown below:

    import re

    def is_valid_indian_phone_number(phone_number):
        # Exactly ten digits and nothing else
        pattern = re.compile(r'^\d{10}$')
        match = pattern.search(phone_number)
        return bool(match)

    phone_number = "9839957123"
    print(is_valid_indian_phone_number(phone_number))

Output:

    True
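The pattern can be tightened further if needed. The sketch below is one possible variant, not part of the original program: it assumes that Indian mobile numbers start with a digit from 6 to 9 and may carry an optional +91 country prefix. The function name is hypothetical.

```python
import re

# Assumptions for this sketch: an optional "+91" prefix, optionally followed
# by a space or hyphen, then ten digits of which the first is 6-9.
INDIAN_MOBILE_RE = re.compile(r"(?:\+91[\s-]?)?[6-9]\d{9}")

def is_valid_indian_mobile(phone_number):
    """Return True if the whole string fits the assumed mobile format."""
    return INDIAN_MOBILE_RE.fullmatch(phone_number) is not None

for number in ["9839957123", "+91 9839957123", "+91-9839957123",
               "1234567890", "98399571234"]:
    print(number, is_valid_indian_mobile(number))
```

The `(?:...)` group is non-capturing, so the optional prefix is grouped for the `?` quantifier without being stored as a captured group.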
Applications of regular expressions
- Regular expressions can be used in text processing to look for certain text patterns and replace them with new text. For instance, you could use a regular expression to extract the URLs from a collection of web pages or to search for all occurrences of a phone number or email address in a document.
- Regular expressions can be used to validate user input, including passwords, phone numbers, and email addresses. This helps ensure that the input follows a particular format and can be properly processed.
- Regular expressions are an effective way to parse log files. They can be used to look for specific text patterns, extract important fields, and summarise the log data.
- In networking, regular expressions can be used to match and filter IP addresses, hostnames, and other network-related data.
- Regular expressions can be used to match and filter data in databases. For instance, they can be used to look for specific patterns in a field or to locate all the entries that satisfy given criteria.
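As one possible illustration of the log-file use case, the sketch below pulls fields out of log lines with named groups. The log format, the sample lines, and the group names (ip, level, message) are all invented for the example and do not correspond to any particular logging tool.

```python
import re

# Hypothetical log format: "<ip> [<LEVEL>] <message>"
LOG_LINE = re.compile(
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "   # dotted IPv4-style address
    r"\[(?P<level>[A-Z]+)\] "             # level in square brackets
    r"(?P<message>.*)"                    # rest of the line
)

log = """192.168.0.1 [INFO] User logged in
10.0.0.7 [ERROR] Disk quota exceeded
not a log line"""

records = []
for line in log.splitlines():
    m = LOG_LINE.fullmatch(line)
    if m:  # lines that do not fit the format are skipped
        records.append(m.groupdict())

# Summarise: keep only the messages of ERROR entries.
errors = [r["message"] for r in records if r["level"] == "ERROR"]
print(records)
print(errors)  # ['Disk quota exceeded']
```

Named groups keep the extraction readable, because each field is accessed by name rather than by position. The address part of the pattern only checks the shape of the address, not that each number is in the 0-255 range.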
Pros and cons
Pros:
- Regular expressions are succinct and expressive: with just a few lines of code, they can match intricate patterns of characters in a string.
- Regular expressions are a widely used and transferable skill, since they are supported by a variety of programming languages and tools.
- Regular expressions can be applied to a variety of tasks, such as checking email addresses and phone numbers, looking for patterns in log files, or extracting data from web pages.
Cons:
- When used to match vast volumes of data or when the pattern is very complicated, regular expressions can be slow to run.
- Regular expressions can be challenging to debug and maintain, because even minor mistakes in the pattern can cause unexpected matches or non-matches.
- If written carelessly, for example with nested quantifiers that cause excessive backtracking, regular expressions are susceptible to ReDoS (Regular Expression Denial of Service) attacks.
- Arbitrarily nested structures, such as nested brackets or balanced delimiters, cannot be matched by the regular expressions of Python's re module.
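The last limitation can be seen in a small sketch. The pattern below handles one pair of parentheses with nothing nested inside, while a plain counter (a hypothetical helper named is_balanced, shown as one common alternative) handles any depth.

```python
import re

# Matches one pair of parentheses with no parentheses inside it.
FLAT_PARENS = re.compile(r"\([^()]*\)")

def is_balanced(text):
    """Check parentheses of arbitrary nesting depth with a counter."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing bracket appeared before its opener
                return False
    return depth == 0

print(bool(FLAT_PARENS.fullmatch("(a)")))     # True
print(bool(FLAT_PARENS.fullmatch("(a(b))")))  # False: the pattern has no notion of depth
print(is_balanced("(a(b))"))                  # True
print(is_balanced("(a))("))                   # False
```

A pattern could be extended to cover two or three fixed levels, but each extra level has to be written out by hand, which is why a counter or a proper parser is usually the better fit.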
Conclusion
In this tutorial, we covered the main functions of Python's re module, the common metacharacters and special sequences, and several usage examples. You can incorporate these techniques into your own projects, especially when working with large bodies of text whose contents you are unsure of. Regular expressions are practical and fairly simple to explore, and regular practice is the most effective way to learn them.