Regular Expressions in Python

Margaret Awojide
Published in CodeX
4 min read · Jul 8, 2022


Credit : Avotrix

What is Regex?

A regular expression (or simply regex) is a sequence of characters that defines a search pattern for strings. Many programming languages support regular expressions, e.g. Python, Java and C#. Regex is a very handy tool in data wrangling when working with strings.

String Operations and Manipulations

Credit : Allika Tech

A string is a sequence of characters enclosed in quotes. Before diving into the rudiments of regular expressions, it is a good idea to be familiar with string operations. Python has built-in string methods for these operations. Some popular ones are:

String Indexing

Indexing is used to select a character from a string, and slicing selects a group of characters. Indexing is zero-based in Python, i.e. positions are counted from zero, not one.

string_1 = "The quick brown fox jumps over the lazy dog"
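A few indexing and slicing operations on string_1 can be sketched as follows. The variable names other than string_1 are chosen here for illustration only.

```python
# Illustrative indexing and slicing on the article's example string.
string_1 = "The quick brown fox jumps over the lazy dog"

first_char = string_1[0]     # position 0 is the first character
last_char = string_1[-1]     # negative positions count from the end
second_word = string_1[4:9]  # slice: positions 4 up to, but not including, 9
last_word = string_1[-3:]    # slice: the last three characters

print(first_char)   # T
print(last_char)    # g
print(second_word)  # quick
print(last_word)    # dog
```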

Check this documentation for more on String Indexing

Stripping Characters

Stripping is used to remove specific characters from the ends of a string. If no argument is given, Python strips whitespace. By default, the strip method removes both leading and trailing whitespace; lstrip removes only leading characters and rstrip only trailing ones. The characters to be stripped can also be passed as an argument, e.g. '$'.
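As a minimal sketch of the stripping behaviour described above, the sample value below is made up for illustration:

```python
# A made-up price string padded with spaces.
raw = "   $45.00   "

both_sides = raw.strip()           # removes leading and trailing whitespace
left_only = raw.lstrip()           # removes leading whitespace only
right_only = raw.rstrip()          # removes trailing whitespace only
no_symbol = both_sides.strip("$")  # removes '$' from either end

print(repr(both_sides))  # '$45.00'
print(repr(left_only))   # '$45.00   '
print(repr(right_only))  # '   $45.00'
print(repr(no_symbol))   # '45.00'
```

Note that the argument to strip is treated as a set of characters to remove from the ends, not as a substring, and characters in the middle of the string are never touched.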

Check this documentation for stripping strings

Finding and Replacing

The index method is used to search for a substring in a string, optionally within a specified range of positions. When the search range is not specified, the entire string is searched. If the substring is present, the position where it starts is returned; if not, a ValueError is raised.

string_1.index("fox")
This returns the position of "fox" in string_1, which is 16.
string_1.index("fox", 0, 10)
This searches only positions 0 up to (but not including) 10. Because "fox" does not occur in that range, a ValueError is raised.
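Because a failed search raises an exception, calls to index are often wrapped in try/except. The sketch below shows one way to do that; it also uses str.find, which is not covered in this article but returns -1 instead of raising.

```python
string_1 = "The quick brown fox jumps over the lazy dog"

# Search the whole string.
position = string_1.index("fox")

# Search positions 0..9 only; "fox" is not there, so ValueError is raised.
try:
    string_1.index("fox", 0, 10)
    found_in_range = True
except ValueError:
    found_in_range = False

# str.find takes the same arguments but returns -1 when nothing is found.
fallback = string_1.find("fox", 0, 10)

print(position)        # 16
print(found_in_range)  # False
print(fallback)        # -1
```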

The replace method is used to replace occurrences of a substring with a new substring. If the number of occurrences to replace is not specified, all occurrences are replaced.

string_1.replace("fox", "monkey")
This replaces all occurrences of "fox" in the string with "monkey".
string_1.replace(" ", "_", 2)
This replaces the first 2 spaces with underscores.
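Run as a script, the two replace calls give the results below. The sketch also shows that replace returns a new string and leaves the original unchanged; the variable names are illustrative.

```python
string_1 = "The quick brown fox jumps over the lazy dog"

all_replaced = string_1.replace("fox", "monkey")  # every occurrence
first_two = string_1.replace(" ", "_", 2)         # only the first 2 spaces

print(all_replaced)  # The quick brown monkey jumps over the lazy dog
print(first_two)     # The_quick_brown fox jumps over the lazy dog
print(string_1)      # The quick brown fox jumps over the lazy dog
```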

String Formatting

String formatting is used to insert a custom string or variable into a predefined text. A pair of braces, {}, is used as a placeholder for the custom string or variable.

Text: "My brother has 2 bags and 8 oranges in each bag"

Example 1:
print("My brother has {} bags and {} oranges in each bag".format(2, 8))

We can also specify the position of the argument (positions are zero-based) in the placeholder, i.e.:
print("My brother has {1} bags and {0} oranges in each bag".format(8, 2))

The number of decimal places for a float value can also be specified.

Example 2:
number_1 = 8/3
number_2 = 2/3
print("8 divided by 3 is {0:.2f} but 2 divided by 3 is {1:.2f}".format(number_1, number_2))

The f-string literal is sometimes preferred:
print(f"8 divided by 3 to 2 decimal places is {number_1:.2f}")

Regular Expressions

Credit : Coderpad

A regular expression uses a combination of metacharacters and special characters to find patterns in a text. It is especially useful for matching patterns in a complex string. Python uses the re module to handle regex. Some commonly used metacharacters and special characters are:

Credit : Author
Credit : Author
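As a rough illustration, the sketch below exercises a handful of widely used metacharacters and special sequences from Python's re module. It is a small selection chosen for this example, not a complete list, and the sample text is made up.

```python
import re

# Made-up sample text, used only to exercise a few common metacharacters.
text = "Order 66 shipped to Box 12A on 2022-07-08."

# \d matches a digit; + means one or more of the preceding item.
digit_runs = re.findall(r"\d+", text)

# \b is a word boundary, [A-Z] a character set, \w a word character,
# and * means zero or more of the preceding item.
capitalized = re.findall(r"\b[A-Z]\w*", text)

# ^ anchors the pattern to the start of the string.
starts_with_order = re.search(r"^Order", text)

# \. is a literal dot; $ anchors the pattern to the end of the string.
ends_with_period = re.search(r"\.$", text)

# {n} means exactly n repetitions of the preceding item.
date = re.search(r"\d{4}-\d{2}-\d{2}", text)

print(digit_runs)    # ['66', '12', '2022', '07', '08']
print(capitalized)   # ['Order', 'Box']
print(date.group())  # 2022-07-08
```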

re.search()

The re.search() function is used to search for a specified regex pattern in a string. It returns a match object for the first match, or None if there is no match. For instance, given a regular expression that follows the standard ZIP code format in the USA, we can apply re.search() to confirm whether a ZIP code is present in a statement.
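A minimal sketch of the ZIP code check might look like this. The pattern assumes a five-digit code with an optional four-digit extension, and the address and variable names are made up for illustration.

```python
import re

# Assumed format: five digits, optionally followed by a hyphen and four digits.
zip_pattern = r"\b\d{5}(?:-\d{4})?\b"

statement = "Send the parcel to 4 Main Street, Springfield, IL 62704."
match = re.search(zip_pattern, statement)

if match:
    print("ZIP code found:", match.group())  # ZIP code found: 62704
else:
    print("No ZIP code found")

# re.search returns None when the pattern does not occur anywhere.
no_match = re.search(zip_pattern, "No postal code here")
print(no_match)  # None
```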

re.sub()

The re.sub() function is used to substitute or replace patterns in a string with a specified value. For example, an accountant might want to remove the currency symbols from a text by replacing them with whitespace, to make quantitative analysis easier.
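The currency example could be sketched as follows. The text and the set of symbols are made up, and this sketch replaces each symbol with an empty string so the numbers can be parsed directly; passing " " as the replacement would substitute whitespace instead.

```python
import re

# Made-up expense text with three different currency symbols.
expenses = "Rent: $1200, Food: €300, Transport: £150"

# [$€£] is a character set; inside a set, $ is a literal dollar sign.
cleaned = re.sub(r"[$€£]", "", expenses)
print(cleaned)  # Rent: 1200, Food: 300, Transport: 150

# With the symbols gone, the amounts can be extracted and summed.
amounts = [int(n) for n in re.findall(r"\d+", cleaned)]
print(sum(amounts))  # 1650
```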

re.split()

The re.split() function splits a string at every match of a pattern and returns the resulting substrings as a list.
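For instance, a string with inconsistent separators can be split in one call. The sample text is made up, and the separator set (commas, semicolons and whitespace) is an assumption for this sketch.

```python
import re

# Made-up list with mixed separators.
fruits = "apples, oranges;bananas  grapes"

# Split on any run of commas, semicolons or whitespace.
parts = re.split(r"[,;\s]+", fruits)
print(parts)  # ['apples', 'oranges', 'bananas', 'grapes']

# maxsplit limits the number of splits performed.
head_and_rest = re.split(r"[,;\s]+", fruits, maxsplit=1)
print(head_and_rest)  # ['apples', 'oranges;bananas  grapes']
```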

re.findall()

The re.findall() function searches a string for all non-overlapping matches of a pattern and returns a list of the substrings that follow the pattern. For example, we can apply findall() to extract phone numbers from a string.
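A small sketch of the phone number extraction, assuming numbers written as three digits, three digits and four digits separated by hyphens. The names and numbers are fictional.

```python
import re

# Fictional student records.
records = "Ada: 555-123-4567, Ben: 555-987-6543, office ext. 42"

# Assumed phone format: 3 digits, hyphen, 3 digits, hyphen, 4 digits.
phone_numbers = re.findall(r"\d{3}-\d{3}-\d{4}", records)
print(phone_numbers)  # ['555-123-4567', '555-987-6543']
```

The extension "42" is not returned because it does not fit the full pattern.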

Sometimes, it is important to capture groups in regex. A group is a part of a regular expression pattern enclosed in parentheses. Using the phone number example above, we can use capturing groups to select the names of the students as well.
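One way capturing groups might be used for this is sketched below. It assumes each record is written as a name, a colon, a space and a hyphenated phone number; the records are fictional.

```python
import re

# Fictional student records in an assumed "Name: number" layout.
records = "Ada: 555-123-4567, Ben: 555-987-6543"

# Group 1 captures the name, group 2 captures the phone number.
pattern = r"(\w+): (\d{3}-\d{3}-\d{4})"

# With more than one group, findall returns a list of tuples.
pairs = re.findall(pattern, records)
print(pairs)  # [('Ada', '555-123-4567'), ('Ben', '555-987-6543')]

names = [name for name, _ in pairs]
print(names)  # ['Ada', 'Ben']

# On a match object, group(n) returns the text captured by group n.
first = re.search(pattern, records)
print(first.group(1))  # Ada
print(first.group(2))  # 555-123-4567
```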

Regular expressions and string operations are useful in string manipulation and data wrangling. In this article, we have gone through the basic terminology of string manipulation and regular expressions. We have also gone through some string methods and regex functions, with applications in Python.
