# Python Regex Superpower [Full Tutorial]

What’s the best-kept productivity secret of code masters?


Congratulations – you’re about to become a regular expression master. I’ve not only written the most comprehensive free regular expression tutorial on the web (16812 words) but also added a lot of tutorial videos wherever I saw fit.

So take your cup of coffee, scroll through the tutorial, and enjoy your brain cells getting active!

If you need to brush up your Python skills, feel free to read my Python Crash Course first.

Note that I use both terms “regular expression” and the more concise “regex” in this tutorial.

## Regex Methods Overview

Python’s re module comes with a number of regular expression methods that help you achieve more with less.

Think of those methods as the framework connecting regular expressions with the Python programming language. Every programming language comes with its own way of handling regular expressions. For example, the Perl programming language has many built-in mechanisms for regular expressions—you don’t need to import a regular expression library—while the Java programming language provides regular expressions only within a library. This is also the approach of Python.

These are the most important regular expression methods of Python’s re module:

• re.findall(pattern, string): Checks if the string matches the pattern and returns all occurrences of the matched pattern as a list of strings.
• re.search(pattern, string): Checks if the regex pattern matches anywhere in the string and returns only the first match as a match object (or None if nothing matches). The match object is just that: an object that stores meta information about the match such as the matching position and the matched substring.
• re.match(pattern, string): Checks if the regex pattern matches a prefix of the string, i.e., at its very beginning, and returns a match object (or None).
• re.fullmatch(pattern, string): Checks if the whole string matches the regex pattern and returns a match object (or None).
• re.compile(pattern): Creates a regular expression object from the pattern to speed up the matching if you want to use the regex pattern multiple times.
• re.split(pattern, string): Splits the string wherever the pattern regex matches and returns a list of strings. For example, you can split a string into a list of words by using whitespace characters as separators.
• re.sub(pattern, repl, string): Replaces (substitutes) all occurrences of the regex pattern with the replacement string repl and returns a new string. Pass count=1 to replace only the first occurrence.

Example: Let’s have a look at some examples of all the above functions:

```python
import re

text = '''
LADY CAPULET

    Alack the day, she's dead, she's dead, she's dead!

CAPULET

    Ha! let me see her: out, alas! she's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely frost
    Upon the sweetest flower of all the field.

Nurse

    O lamentable day!
'''

print(re.findall('she', text))
'''
Finds the pattern 'she' four times in the text:

['she', 'she', 'she', 'she']
'''

print(re.search('she', text))
'''
Finds the first match of 'she' in the text:

<re.Match object; span=(34, 37), match='she'>

The match object contains important information
such as the matched position.
'''

print(re.match('she', text))
'''
Tries to match any string prefix -- but nothing found:

None
'''

print(re.fullmatch('she', text))
'''
Fails to match the whole string with the pattern 'she':

None
'''

print(re.split('\n', text))
'''
Splits the whole string on the new line delimiter '\n':

['', 'LADY CAPULET', '',
"    Alack the day, she's dead, she's dead, she's dead!",
'', 'CAPULET', '',
"    Ha! let me see her: out, alas! she's cold:",
'    Her blood is settled, and her joints are stiff;',
'    Life and these lips have long been separated:',
'    Death lies on her like an untimely frost',
'    Upon the sweetest flower of all the field.', '',
'Nurse', '', '    O lamentable day!', '']
'''

print(re.sub('she', 'he', text))
'''
Replaces all occurrences of 'she' with 'he':

LADY CAPULET

    Alack the day, he's dead, he's dead, he's dead!

CAPULET

    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely frost
    Upon the sweetest flower of all the field.

Nurse

    O lamentable day!
'''
```
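The examples above don't exercise re.compile(), the count parameter of re.sub(), or splitting on whitespace. The following minimal sketch covers those three, using a short made-up sentence rather than the play excerpt: it compiles a pattern once and reuses it, contrasts replacing every match with replacing only the first, and splits on runs of whitespace.

```python
import re

text = "she said she's dead"

# Compile the pattern once and reuse the resulting regex object.
she = re.compile('she')

all_matches = she.findall(text)                # every non-overlapping match
first_span = she.search(text).span()           # (start, end) of the first match
replaced_all = she.sub('he', text)             # count=0 (default): replace every match
replaced_first = she.sub('he', text, count=1)  # replace only the first match

# Split on one or more whitespace characters (spaces, tabs, newlines).
words = re.split(r'\s+', 'split  these\twords')

print(all_matches)     # ['she', 'she']
print(first_span)      # (0, 3)
print(replaced_all)    # he said he's dead
print(replaced_first)  # he said she's dead
print(words)           # ['split', 'these', 'words']
```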


Now, you know the most important regular expression functions. You know how to apply regular expressions to strings. But you don’t know how to write your regex patterns in the first place. Let’s dive into regular expressions and fix this once and for all!

## Basic Regex Operations

A regular expression is a decades-old concept in computer science. The mathematician Stephen Cole Kleene invented regular expressions in the 1950s, and decades of evolution have brought a huge variety of operations. Collecting all operations and writing up a comprehensive list would result in a very thick and unreadable book by itself.

Fortunately, you don’t have to learn all regular expressions before you can start using them in your practical code projects. Next, you’ll get a quick and dirty overview of the most important regex operations and how to use them in Python. In follow-up chapters, you’ll then study them in detail — with many practical applications and code puzzles.

Here are the most important regex operators:

• . The wild-card operator (‘dot’) matches any character in a string except the newline character ‘\n’. For example, the regex ‘...’ matches any three consecutive characters (other than newlines), such as ‘abc’, ‘cat’, and ‘dog’.
• * The zero-or-more asterisk operator matches an arbitrary number of occurrences (including zero occurrences) of the immediately preceding regex. For example, the regex ‘cat*’ matches the strings ‘ca’, ‘cat’, ‘catt’, ‘cattt’, and ‘catttttttt’.
• ? The zero-or-one operator matches (as the name suggests) either zero or one occurrence of the immediately preceding regex. For example, the regex ‘cat?’ fully matches both strings ‘ca’ and ‘cat’, but not ‘catt’, ‘cattt’, or ‘catttttttt’.
• + The at-least-one operator matches one or more occurrences of the immediately preceding regex. For example, the regex ‘cat+’ does not match the string ‘ca’ but matches all strings with at least one trailing character ‘t’ such as ‘cat’, ‘catt’, and ‘cattt’.
• ^ The start-of-string operator matches the beginning of a string. For example, the regex ‘^p’ would match the strings ‘python’ and ‘programming’ but not ‘lisp’ and ‘spying’ where the character ‘p’ does not occur at the start of the string.
• $ The end-of-string operator matches the end of a string. For example, the regex ‘py$’ would match the strings ‘main.py’ and ‘pypy’ but not the strings ‘python’ and ‘pypi’.
• A|B The OR operator matches either the regex A or the regex B. Note that the intuition is quite different from the standard interpretation of the or operator that can also satisfy both conditions. For example, the regex ‘(hello)|(hi)’ matches strings ‘hello world’ and ‘hi python’. It wouldn’t make sense to try to match both of them at the same time.
• AB  The AND operator matches first the regex A and second the regex B, in this sequence. We’ve already seen it trivially in the regex ‘ca’ that matches first regex ‘c’ and second regex ‘a’.

Note that I gave the above operators some more meaningful names so that you can immediately grasp the purpose of each regex. For example, the ‘^’ operator is usually called the ‘caret’ operator. Such names are not descriptive, so I came up with more kindergarten-like words such as the “start-of-string” operator.
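To check the operator descriptions above, here is a small sketch that reuses their example strings (plus a made-up sentence for the dot and OR operators). It uses re.fullmatch() for the quantifiers so that a pattern only counts when it matches the whole string, and re.search() for the anchors.

```python
import re

# re.fullmatch() requires the WHOLE string to match, which makes the
# quantifiers' behavior easy to compare side by side.
candidates = ['ca', 'cat', 'catt', 'cattt']
quantifiers = {}
for regex in ['cat*', 'cat?', 'cat+']:
    quantifiers[regex] = [s for s in candidates if re.fullmatch(regex, s)]
    print(regex, '->', quantifiers[regex])

# The dot matches any character except a newline, including spaces.
dots = re.findall('...', 'abc cat dog')
print(dots)  # ['abc', ' ca', 't d']

# Anchors: '^p' needs 'p' at the start, 'py$' needs 'py' at the end.
starts = [w for w in ['python', 'programming', 'lisp', 'spying'] if re.search('^p', w)]
ends = [w for w in ['main.py', 'pypy', 'python', 'pypi'] if re.search('py$', w)]
print(starts, ends)

# Alternation: either side may match at each position.
greetings = re.findall('hello|hi', 'hello world, hi python')
print(greetings)  # ['hello', 'hi']
```

Note how the dot happily matches spaces: ‘...’ means "any three characters", not "any three-letter word".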

We’ve already seen many examples but let’s dive into even more!

```python
import re

text = '''
Ha! let me see her: out, alas! he's cold:
Her blood is settled, and her joints are stiff;
Life and these lips have long been separated:
Death lies on her like an untimely frost
Upon the sweetest flower of all the field.
'''

print(re.findall('.a!', text))
'''
Finds all occurrences of an arbitrary character that is
followed by the character sequence 'a!'.
['Ha!']
'''

print(re.findall('is.*and', text))
'''
Finds all occurrences of the word 'is',
followed by an arbitrary number of characters
and the word 'and'.
['is settled, and']
'''

print(re.findall('her:?', text))
'''
Finds all occurrences of the word 'her',
followed by zero or one occurrences of the colon ':'.
['her:', 'her', 'her']
'''

print(re.findall('her:+', text))
'''
Finds all occurrences of the word 'her',
followed by one or more occurrences of the colon ':'.
['her:']
'''

print(re.findall('^Ha.*', text))
'''
Finds all occurrences where the string starts with
the character sequence 'Ha', followed by an arbitrary
number of characters except for the new-line character.
Can you figure out why Python doesn't find any?
[]
'''
```
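If you want to check your answer to the puzzle: the text starts with a newline character, so the start of the string is followed by '\n' rather than 'Ha'. A minimal sketch with a shortened copy of the text shows two ways to make the pattern match (the re.MULTILINE flag is discussed in detail later in this tutorial).

```python
import re

text = '''
Ha! let me see her: out, alas! he's cold:
Her blood is settled, and her joints are stiff;
'''

print(repr(text[0]))              # '\n': the string does not start with 'Ha'
print(re.findall('^Ha.*', text))  # []

# Option 1: strip the leading newline before matching.
stripped = re.findall('^Ha.*', text.lstrip('\n'))
# Option 2: let '^' match at the start of every line.
per_line = re.findall('^Ha.*', text, re.MULTILINE)

print(stripped)
print(per_line)
```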

## The Regex Start-of-String (^) and End-of-String ($) Operators

This article is all about the start-of-line ^ and end-of-line $ regular expressions in Python’s re library. These two regexes are fundamental to all regular expressions, even outside the Python world. So invest 5 minutes now and master them once and for all!

### Python Re Start-of-String (^) Regex

You can use the caret operator ^ to match the beginning of the string. For example, this is useful if you want to ensure that a pattern appears at the beginning of a string. Here’s an example:

```python
>>> import re
>>> re.findall('^PYTHON', 'PYTHON is fun.')
['PYTHON']
```

The findall(pattern, string) method finds all occurrences of the pattern in the string. The caret at the beginning of the pattern ‘^PYTHON’ ensures that you match the word PYTHON only at the beginning of the string. In the previous example, this doesn’t make any difference. But in the next example, it does:

```python
>>> re.findall('^PYTHON', 'PYTHON! PYTHON is fun')
['PYTHON']
```

Although there are two occurrences of the substring ‘PYTHON’, there’s only one matching substring: the one at the beginning of the string.

But what if you want to match not only at the beginning of the string but at the beginning of each line in a multi-line string?

#### Python Re Start-of-Line (^) Regex

By default, the caret operator only applies to the start of a string. So if you’ve got a multi-line string (for example, when reading a text file), it will still match only once: at the beginning of the string.

However, you may want to match at the beginning of each line. For example, you may want to find all lines that start with ‘Python’ in a given string.

You can specify that the caret operator matches the beginning of each line via the re.MULTILINE flag. Here’s an example showing both usages, without and with the re.MULTILINE flag:

```python
>>> import re
>>> text = '''
Python is great.
Python is the fastest growing major programming language in the world.
Pythonistas thrive.'''
>>> re.findall('^Python', text)
[]
>>> re.findall('^Python', text, re.MULTILINE)
['Python', 'Python', 'Python']
```

The first output is the empty list because the string ‘Python’ does not appear at the beginning of the string. The second output is the list of three matching substrings because the string ‘Python’ appears three times at the beginning of a line.

#### Python re.sub()

The re.sub(pattern, repl, string, count=0, flags=0) method returns a new string where all occurrences of the pattern in the old string are replaced by repl. You can use the caret operator to substitute wherever some pattern appears at the beginning of the string:

```python
>>> import re
>>> re.sub('^Python', 'Code', 'Python is \nPython')
'Code is \nPython'
```

Only the beginning of the string matches the regex pattern, so you get only one substitution. Again, you can use the re.MULTILINE flag to match the beginning of each line with the caret operator:

```python
>>> re.sub('^Python', 'Code', 'Python is \nPython', flags=re.MULTILINE)
'Code is \nCode'
```

Now, you replace both appearances of the string ‘Python’.

#### Python re.match(), re.search(), re.findall(), and re.fullmatch()

Let’s quickly recap the most important regex methods in Python:

• The re.findall(pattern, string, flags=0) method returns a list of string matches.
• The re.search(pattern, string, flags=0) method returns a match object of the first match.
• The re.match(pattern, string, flags=0) method returns a match object if the regex matches at the beginning of the string.
• The re.fullmatch(pattern, string, flags=0) method returns a match object if the regex matches the whole string.

You can see that all four methods search for a pattern in a given string. You can use the caret operator ^ within each pattern to match the beginning of the string. Here’s one example per method:

```python
>>> import re
>>> text = 'Python is Python'
>>> re.findall('^Python', text)
['Python']
>>> re.search('^Python', text)
<re.Match object; span=(0, 6), match='Python'>
>>> re.match('^Python', text)
<re.Match object; span=(0, 6), match='Python'>
>>> re.fullmatch('^Python', text)
>>>
```

So you can use the caret operator to match at the beginning of the string. However, note that it doesn’t make a lot of sense to use it with the match() and fullmatch() methods because they, by definition, start by trying to match the first character of the string.

You can also use the re.MULTILINE flag to match the beginning of each line (rather than only the beginning of the string):

```python
>>> text = '''Python is
Python'''
>>> re.findall('^Python', text, flags=re.MULTILINE)
['Python', 'Python']
>>> re.search('^Python', text, flags=re.MULTILINE)
<re.Match object; span=(0, 6), match='Python'>
>>> re.match('^Python', text, flags=re.MULTILINE)
<re.Match object; span=(0, 6), match='Python'>
>>> re.fullmatch('^Python', text, flags=re.MULTILINE)
>>>
```

Again, it’s questionable whether this makes sense for the re.match() and re.fullmatch() methods, as they only look for a match at the beginning of the string (even with the re.MULTILINE flag set).

### Python Re End of String ($) Regex

Similarly, you can use the dollar-sign operator $ to match the end of the string. Here’s an example:

```python
>>> import re
>>> re.findall('fun$', 'PYTHON is fun')
['fun']
```

The findall() method finds all occurrences of the pattern in the string, although the trailing dollar sign $ ensures that the regex matches only at the end of the string. This can significantly alter the meaning of your regex, as you can see in the next example:

```python
>>> re.findall('fun$', 'fun fun fun')
['fun']
```

Although there are three occurrences of the substring ‘fun’, there’s only one matching substring: the one at the end of the string.

But what if you want to match not only at the end of the string but at the end of each line in a multi-line string?

Again, you can set the re.MULTILINE flag. Suppose text is a multi-line string in which three lines end with ‘fun’ while the string as a whole does not:

```python
>>> re.findall('fun$', text)
[]
>>> re.findall('fun$', text, flags=re.MULTILINE)
['fun', 'fun', 'fun']
```

The first output is the empty list because the string ‘fun’ does not appear at the end of the string. The second output is the list of three matching substrings because the string ‘fun’ appears three times at the end of a line.

#### Python re.sub()

The re.sub(pattern, repl, string, count=0, flags=0) method returns a new string where all occurrences of the pattern in the old string are replaced by repl. You can use the dollar-sign operator to substitute wherever some pattern appears at the end of the string:

```python
>>> import re
>>> re.sub('Python$', 'Code', 'Is Python\nPython')
'Is Python\nCode'
```

Only the end of the string matches the regex pattern so there’s only one substitution.

Again, you can use the re.MULTILINE flag to match the end of each line with the dollar-sign operator:

```python
>>> re.sub('Python$', 'Code', 'Is Python\nPython', flags=re.MULTILINE)
'Is Code\nCode'
```

Now, you replace both appearances of the string ‘Python’.

#### Python re.match(), re.search(), re.findall(), and re.fullmatch()

All four methods (re.findall(), re.search(), re.match(), and re.fullmatch()) search for a pattern in a given string. You can use the dollar-sign operator $ within each pattern to match the end of the string. Here’s one example per method:

```python
>>> import re
>>> text = 'Python is Python'
>>> re.findall('Python$', text)
['Python']
>>> re.search('Python$', text)
<re.Match object; span=(10, 16), match='Python'>
>>> re.match('Python$', text)
>>> re.fullmatch('Python$', text)
>>>
```

So you can use the dollar-sign operator to match at the end of the string. However, note that it doesn’t make a lot of sense to use it with the fullmatch() method because fullmatch(), by definition, already requires that the last character of the string is part of the matching substring.

You can also use the re.MULTILINE flag to match the end of each line (rather than only the end of the whole string):

```python
>>> text = '''Is Python
Python'''
>>> re.findall('Python$', text, flags=re.MULTILINE)
['Python', 'Python']
>>> re.search('Python$', text, flags=re.MULTILINE)
<re.Match object; span=(3, 9), match='Python'>
>>> re.match('Python$', text, flags=re.MULTILINE)
>>> re.fullmatch('Python$', text, flags=re.MULTILINE)
>>>
```

As the pattern doesn’t match at the beginning of the string, both re.match() and re.fullmatch() return None (which the interactive shell doesn’t print).
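To see the end-of-line behavior end to end, here is a self-contained sketch. The multi-line string is made up for illustration. It combines the $ anchor with re.MULTILINE for findall(), search(), and sub().

```python
import re

# Made-up sample text: three lines end with 'fun', the last line doesn't.
text = 'Coding is fun\nPython is fun\nGames are fun\nAgreed?'

without_flag = re.findall('fun$', text)             # '$' only at the very end
with_flag = re.findall('fun$', text, re.MULTILINE)  # '$' also before each '\n'
first_line_end = re.search('fun$', text, re.MULTILINE).span()
replaced = re.sub('fun$', 'great', text, flags=re.MULTILINE)

print(without_flag)    # []
print(with_flag)       # ['fun', 'fun', 'fun']
print(first_line_end)  # (10, 13)
print(replaced)
```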

### How to Match the Caret (^) or Dollar ($) Symbols in Your Regex?

You know that the caret and dollar symbols have a special meaning in Python’s regular expression module: they match the beginning or end of each string/line. But what if you search for the caret (^) or dollar ($) symbols themselves? How can you match them in a string?

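The answer is simple: escape the caret or dollar symbols in your regular expression using the backslash. In particular, use ‘\^’ instead of ‘^’ and ‘\$’ instead of ‘$’. Write such patterns as raw strings (for example, r'\^'), because recent Python versions warn about invalid escape sequences like '\^' in regular string literals. Here’s an example: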

```python
>>> import re
>>> text = 'The product ^^^ costs $3 today.'
>>> re.findall(r'\^', text)
['^', '^', '^']
>>> re.findall(r'\$', text)
['$']
```

By escaping the special symbols ^ and $, you tell the regex engine to ignore their special meaning.
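Escaping isn't the only option. Inside a character class, $ is always literal and ^ is literal unless it comes first, so [$^] matches either symbol. The sketch below uses the same sample text and also shows an escaped $ combined with \d+ as a simple (assumed, deliberately naive) price matcher.

```python
import re

text = 'The product ^^^ costs $3 today.'

# In a character class, '$' is literal and '^' is literal when it isn't first.
symbols = re.findall(r'[$^]', text)
# An escaped '$' followed by one or more digits.
prices = re.findall(r'\$\d+', text)

print(symbols)  # ['^', '^', '^', '$']
print(prices)   # ['$3']
```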

### Summary

You’ve learned everything you need to know about the caret operator ^ and the dollar-sign operator $ in this regex tutorial. The caret operator ^ matches at the beginning of a string. The dollar-sign operator $ matches at the end of a string. If you want to match at the beginning or end of each line in a multi-line string, set the re.MULTILINE flag in the relevant re methods.
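As a slightly more practical sketch, the two anchors combined with re.MULTILINE can validate a whole line at a time. The config-style text and the key=value format below are made up for illustration.

```python
import re

# Made-up config text: only lines that consist entirely of key=value count.
config = '''name=demo
# a comment line
debug=true
  indented=ignored
port=8080'''

# '^' and '$' anchor each line (re.MULTILINE); the groups capture key and value.
pairs = re.findall(r'^(\w+)=(\S+)$', config, re.MULTILINE)
print(pairs)  # [('name', 'demo'), ('debug', 'true'), ('port', '8080')]
```

The indented line is skipped because ^ pins the match to the line start, where \w+ can't match a space.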

## The Regex Or Operator |

This article is all about the or | operator of Python’s re library.

### What’s the Python Regex Or | Operator?

Suppose you’re given a string and your goal is to find all substrings that match either the string ‘iPhone’ or the string ‘iPad’. How can you achieve this?

The easiest way to achieve this is the Python or operator | using the regular expression pattern (iPhone|iPad).

Here’s an example:

```python
>>> import re
>>> text = 'Buy now: iPhone only $399 with free iPad'
>>> re.findall('(iPhone|iPad)', text)
['iPhone', 'iPad']
```

You have the (salesy) text that contains both strings ‘iPhone’ and ‘iPad’. You use the re.findall() method. In case you don’t know it, here’s the definition from the Finxter blog article:

The re.findall(pattern, string) method finds all occurrences of the pattern in the string and returns a list of all matching substrings.

Please consult the blog article to learn everything you need to know about this fundamental Python method.

The first argument is the pattern (iPhone|iPad). It either matches the first part right in front of the or symbol |, which is iPhone, or the second part after it, which is iPad. The second argument is the text ‘Buy now: iPhone only $399 with free iPad’ which you want to search for the pattern.

The result shows that there are two matching substrings in the text: ‘iPhone’ and ‘iPad’.

### Python Regex Or: Examples

Let’s study some more examples to teach you all the possible uses and border cases—one after another.

You start with the previous example:

```python
>>> import re
>>> text = 'Buy now: iPhone only $399 with free iPad'
>>> re.findall('(iPhone|iPad)', text)
['iPhone', 'iPad']
```

What happens if you don’t use the parentheses?

```python
>>> text = 'iPhone iPhone iPhone iPadiPad'
>>> re.findall('(iPhone|iPad)', text)
['iPhone', 'iPhone', 'iPhone', 'iPad', 'iPad']
>>> re.findall('iPhone|iPad', text)
['iPhone', 'iPhone', 'iPhone', 'iPad', 'iPad']
```

In the second call, you omit the parentheses, using the regex pattern iPhone|iPad rather than (iPhone|iPad). No problem: it still works and generates the exact same output!

But what happens if you leave one side of the or operation empty?

```python
>>> re.findall('iPhone|', text)
['iPhone', '', 'iPhone', '', 'iPhone', '', '', '', '', '', '', '', '', '', '']
```

The output is not as strange as it seems. The or operator allows for empty operands, in which case it first tries to match the non-empty alternative. If this is not possible, it matches the empty string (so every position yields a match).

The previous example also shows that it still tries to match the non-empty string if possible. But what if the trivial empty match is on the left side of the or operand?

```python
>>> re.findall('|iPhone', text)
['', 'iPhone', '', '', 'iPhone', '', '', 'iPhone', '', '', '', '', '', '', '', '', '', '']
```

This shows some subtleties of the regex engine. First of all, it still matches the non-empty string if possible! But more importantly, you can see that the regex engine matches from left to right. It first tries to match the left regex (which it does on every single position in the text). An empty string that’s already matched will not be considered anymore. Only then, it tries to match the regex on the right side of the or operator.

Think of it this way: the regex engine moves from the left to the right—one position at a time. It matches the empty string every single time. Then it moves over the empty string and in some cases, it can still match the non-empty string. Each match “consumes” a substring and cannot be matched anymore. But an empty string cannot be consumed. That’s why you see the first match is the empty string and the second match is the substring ‘iPhone’.
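A related consequence of this left-to-right behavior: at each position, the engine takes the first alternative that succeeds, not the longest one. The sketch below illustrates this with a made-up sentence.

```python
import re

text = 'iPad Pro and iPad'

# 'iPad' is tried first and succeeds, so 'iPad Pro' never gets a chance.
short_first = re.findall('iPad|iPad Pro', text)
# Putting the longer alternative first lets it win where it applies.
long_first = re.findall('iPad Pro|iPad', text)

print(short_first)  # ['iPad', 'iPad']
print(long_first)   # ['iPad Pro', 'iPad']
```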

### How to Nest the Python Regex Or Operator?

Okay, you’re not easily satisfied, are you? Let’s try nesting the Python regex or operator |.

```python
>>> text = 'xxx iii zzz iii ii xxx'
>>> re.findall('xxx|iii|zzz', text)
['xxx', 'iii', 'zzz', 'iii', 'xxx']
```

So you can use multiple or operators in a row. Of course, you can also use the grouping (parentheses) operator to nest an arbitrarily complicated construct of or operations:

```python
>>> re.findall('x(i|(zz|ii|(x| )))', text)
[('x', 'x', 'x'), (' ', ' ', ' '), ('x', 'x', 'x')]
```

But this seldom leads to clean and readable code, and it can usually be avoided easily by putting a bit of thought into your regex design.
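One way to keep such constructs readable, sketched here on the same text, is to combine a non-capturing group (?:...), so findall() returns whole matches instead of tuples, with the re.VERBOSE flag, which permits whitespace and comments inside the pattern.

```python
import re

text = 'xxx iii zzz iii ii xxx'

pattern = re.compile(r'''
    x               # a literal 'x' ...
    (?:             # ... followed by one of these alternatives (non-capturing):
        i | zz | ii | x | [ ]   # in VERBOSE mode a literal space is written as [ ]
    )
''', re.VERBOSE)

matches = pattern.findall(text)
print(matches)  # ['xx', 'x ', 'xx']
```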

### Python Regex Or: Character Class

If you only want to match a single character out of a set of characters, the character class is a much better way of doing it:

```python
>>> import re
>>> text = 'hello world'
>>> re.findall('[abcdefghijklmnopqrstuvwxyz]+', text)
['hello', 'world']
```

A shorter and more concise version would be to use the range operator within character classes:

```python
>>> re.findall('[a-z]+', text)
['hello', 'world']
```

The character class is enclosed in the bracket notation [ ] and it literally means “match exactly one of the symbols in the class”. Thus, it carries the same semantics as the or operator: |. However, if you try to do something along those lines…

```python
>>> re.findall('(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)+', text)
['o', 'd']
```

… you’ll first write much less concise code and, second, risk getting confused by the output. The reason is that the parentheses form a capturing group: they capture the position and substring that match the regex inside them. If the pattern contains a group, the findall() method returns the group’s content instead of the whole match, and a group that repeats (because of the trailing +) keeps only its last match. This turns out to be the last character of the word ‘hello’ and the last character of the word ‘world’.
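To see that the repeated capturing group, not the alternation itself, causes this output, here is a sketch that builds the same a|b|...|z alternation from string.ascii_lowercase and compares a capturing group, a non-capturing group (?:...), and the character class.

```python
import re
import string

text = 'hello world'
alternation = '|'.join(string.ascii_lowercase)  # 'a|b|c|...|z'

captured = re.findall('(' + alternation + ')+', text)         # group keeps its last repetition only
non_capturing = re.findall('(?:' + alternation + ')+', text)  # no group: whole matches returned
char_class = re.findall('[a-z]+', text)

print(captured)       # ['o', 'd']
print(non_capturing)  # ['hello', 'world']
print(char_class)     # ['hello', 'world']
```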

### How to Match the Or Character (Vertical Line ‘|’)?

So if the character ‘|’ stands for the or character in a given regex, the question arises how to match the vertical line symbol ‘|’ itself?

The answer is simple: escape the or character in your regular expression using the backslash. In particular, use ‘A\|B’ instead of ‘A|B’ to match the string ‘A|B’ itself. Here’s an example:

```python
>>> import re
>>> re.findall('A|B', 'AAAA|BBBB')
['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
>>> re.findall(r'A\|B', 'AAAA|BBBB')
['A|B']
```

Do you really understand the outputs of this code snippet? In the first example, you’re searching for either character ‘A’ or character ‘B’. In the second example, you’re searching for the string ‘A|B’ (which contains the ‘|’ character).
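When the alternatives come from data, for example a list of literal strings that may contain | or other special characters, escaping them by hand is error-prone. Here is a minimal sketch using re.escape(); the option strings are made up for illustration.

```python
import re

# Made-up literal strings that contain regex metacharacters.
options = ['A|B', 'C.D']

# Escape each literal, then join the escaped pieces with the real or operator.
pattern = '|'.join(re.escape(option) for option in options)
found = re.findall(pattern, 'xA|By C.D CxD')

print(pattern)  # A\|B|C\.D
print(found)    # ['A|B', 'C.D']
```

Because the dot in ‘C.D’ is escaped, the string ‘CxD’ is not matched.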

### Python Regex Not

How can you search a string for substrings that do NOT match a given pattern? In other words, what’s the “negative pattern” in Python regular expressions?

The answer is two-fold:

• If you want to match all characters except a set of specific characters, you can use the negative character class [^…].
• If you want to match all substrings except the ones that match a regex pattern, you can use the feature of negative lookahead (?!…).

Here’s an example for the negative character class:

```python
>>> import re
>>> re.findall('[^a-m]', 'aaabbbaababmmmnoopmmaa')
['n', 'o', 'o', 'p']
```

And here’s an example for the negative lookahead pattern to match all “words that are not followed by words”:

```python
>>> re.findall('[a-z]+(?![a-z]+)', 'hello world')
['hello', 'world']
```

The negative lookahead (?![a-z]+) doesn’t consume (match) any character. It just checks whether the pattern [a-z]+ does NOT match at a given position. The only positions where this happens are just before the space and at the end of the string.
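Negative lookahead is also handy for excluding one specific word. The sketch below (made-up text) matches every word except the exact word ‘cat’, using \b word boundaries so that ‘catalog’ is still allowed.

```python
import re

text = 'cat dog catalog cat bird'

# At each word start (\b), reject the position if the exact word 'cat' follows,
# then consume the word with \w+.
not_cat = re.findall(r'\b(?!cat\b)\w+', text)
print(not_cat)  # ['dog', 'catalog', 'bird']
```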

### Summary

You’ve learned everything you need to know about the Python Regex Or Operator.

Suppose you’re given a string and your goal is to find all substrings that match either the string ‘iPhone’ or the string ‘iPad’. How can you achieve this?

The easiest way to achieve this is the Python or operator | using the regular expression pattern (iPhone|iPad).

## The Regex And Operator

This tutorial is all about the AND operator of Python’s re library. You may ask: what? (And rightly so.)

Sure, there’s the OR operator (example: ‘iPhone|iPad’). But what’s the meaning of matching one regular expression AND another?

There are different interpretations for the AND operator in a regular expression (regex):

• Ordered: Match one regex pattern after another. In other words, you first match pattern A AND then you match pattern B. Here the answer is simple: you use the pattern AB to match both.
• Unordered: Match multiple patterns in a string but in no particular order. In this case, you’ll use a bag-of-words approach.

I’ll discuss both in the following.

### Ordered Python Regex AND Operator

Suppose you’re given a string and your goal is to find all substrings that match the string ‘iPhone’, followed by the string ‘iPad’. You can view this as the AND operator of two regular expressions. How can you achieve this?

The straightforward AND operation of both strings is the regular expression pattern iPhoneiPad.

In the following example, you want to match pattern ‘aaa’ and pattern ‘bbb’—in this order.

```python
>>> import re
>>> text = 'aaabaaaabbb'
>>> A = 'aaa'
>>> B = 'bbb'
>>> re.findall(A+B, text)
['aaabbb']
>>>
```

You use the re.findall() method. The first argument is the pattern A+B, which evaluates to ‘aaabbb’ (here + is Python’s string concatenation, not the regex quantifier). There’s nothing fancy about this: each time you write a string consisting of more than one character, you essentially use the ordered AND operator.

The second argument is the text ‘aaabaaaabbb’ which you want to search for the pattern.

The result shows that there’s a matching substring in the text: ‘aaabbb’.
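One caveat when you build the ordered AND by concatenating pattern strings: if a part contains the or operator, plain concatenation changes its meaning, because | has the lowest precedence of all regex operators. Wrapping each part in a non-capturing group (?:...) keeps the parts intact. A sketch with made-up patterns:

```python
import re

text = 'ac bc'
A = 'a|b'  # made-up first pattern: 'a' or 'b'
B = 'c'    # made-up second pattern

naive = re.findall(A + B, text)                           # 'a|bc' means 'a' OR 'bc'
grouped = re.findall('(?:' + A + ')(?:' + B + ')', text)  # ('a' or 'b') THEN 'c'

print(naive)    # ['a', 'bc']
print(grouped)  # ['ac', 'bc']
```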

### Unordered Python Regex AND Operator

But what if you want to search a given text for pattern A AND pattern B—but in no particular order? In other words: if both patterns appear anywhere in the string, the whole string should be returned as a match.

Now this is a bit more complicated because any regular expression pattern is ordered from left to right. A simple solution is to use the lookahead assertion (?=.*A) to check whether regex A appears anywhere in the string. (Note that we assume a single-line string because the .* pattern doesn’t match the newline character by default.)

Let’s first have a look at the minimal solution to check for two patterns anywhere in the string (say, patterns ‘hi’ AND ‘you’).

```python
>>> import re
>>> pattern = '(?=.*hi)(?=.*you)'
>>> re.findall(pattern, 'hi how are yo?')
[]
>>> re.findall(pattern, 'hi how are you?')
['']
```

In the first example, the word ‘you’ doesn’t appear (only ‘yo’), so there’s no match. In the second example, both words appear.

But how does the lookahead assertion work? You must know that any other regex pattern “consumes” the matched substring. The consumed substring cannot be matched by any other part of the regex.

Think of the lookahead assertion as a non-consuming pattern match. The regex engine goes from the left to the right—searching for the pattern. At each point, it has one “current” position to check if this position is the first position of the remaining match. In other words, the regex engine tries to “consume” the next character as a (partial) match of the pattern.

The advantage of the lookahead expression is that it doesn’t consume anything. It just “looks ahead” starting from the current position whether what follows would theoretically match the lookahead pattern. If it doesn’t, the regex engine cannot move on.

[Figure: A simple example of lookahead. The regular expression engine matches (“consumes”) the string partially. Then it checks whether the remaining pattern could be matched without actually matching it.]
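A small sketch of the difference between consuming and looking ahead, using a made-up list of words: \w+, consumes the comma and includes it in each match, whereas \w+(?=,) only checks that a comma follows.

```python
import re

text = 'apples, pears, plums'

consuming = re.findall(r'\w+,', text)    # the comma is part of each match
looking = re.findall(r'\w+(?=,)', text)  # the comma is checked but not consumed

print(consuming)  # ['apples,', 'pears,']
print(looking)    # ['apples', 'pears']
```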

Let’s go back to the expression (?=.*hi)(?=.*you) to match strings that contain both ‘hi’ and ‘you’. Why does it work?

The reason is that the lookahead expressions don’t consume anything. You first search for an arbitrary number of characters .*, followed by the word hi. But because the regex engine hasn’t consumed anything, it’s still at the same position at the beginning of the string. So, you can repeat the same for the word you.

Note that this method doesn’t care about the order of the two words:

```python
>>> import re
>>> pattern = '(?=.*hi)(?=.*you)'
>>> re.findall(pattern, 'hi how are you?')
['']
>>> re.findall(pattern, 'you are how? hi!')
['']
```

No matter which word “hi” or “you” appears first in the text, the regex engine finds both.

You may ask: why’s the output the empty string? The reason is that the regex engine hasn’t consumed any character. It just checked the lookaheads. So the easy fix is to consume all characters as follows:

```python
>>> import re
>>> pattern = '(?=.*hi)(?=.*you).*'
>>> re.findall(pattern, 'you fly high')
['you fly high']
```

Now, the whole string is a match because after checking the lookahead with ‘(?=.*hi)(?=.*you)’, you also consume the whole string ‘.*’.
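To generalize this to any number of words, here is a sketch of a hypothetical helper, contains_all(), that builds one lookahead per word. It escapes each word with re.escape() and passes re.DOTALL so that . also matches newlines, which lifts the single-line assumption mentioned above. For comparison, the hypothetical contains_all_separately() follows the bag-of-words idea and simply runs one re.search() per word.

```python
import re

def contains_all(text, words):
    """Hypothetical helper: True if every word occurs somewhere in text."""
    # One lookahead per word; none of them consumes characters, so each
    # one scans forward from the start of the string independently.
    pattern = ''.join('(?=.*' + re.escape(word) + ')' for word in words)
    # re.match() tries position 0 only; re.DOTALL lets '.' cross newlines.
    return re.match(pattern, text, re.DOTALL) is not None

def contains_all_separately(text, words):
    """Hypothetical bag-of-words alternative: one independent search per word."""
    return all(re.search(re.escape(word), text) for word in words)

print(contains_all('you fly high', ['hi', 'you']))              # True
print(contains_all('hi how are yo?', ['hi', 'you']))            # False
print(contains_all('hi\nthere you', ['hi', 'you']))             # True (thanks to re.DOTALL)
print(contains_all_separately('hi\nthere you', ['hi', 'you']))  # True
```

Using re.match() instead of re.findall() avoids re-checking the lookaheads at every later position once the answer is known.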

### Summary

There are different interpretations for the AND operator in a regular expression (regex):

• Ordered: Match one regex pattern after another. In other words, you first match pattern A AND then you match pattern B. Here the answer is simple: you use the pattern AB to match both.

• Unordered: Match multiple patterns in a string but in no particular order. In this case, you’ll use a bag-of-words approach.

## Where to Go From Here

Wow. You’ve spent a lot of time learning everything you need to know about Python regular expressions. Thanks for your time!

At this point, I know you have skills. But do you actually leverage those skills in the most effective way? In other words: do you earn money with Python?


