1.4 Regular Expression
This post explains how to use regular expressions to find and modify text.
- Regular expression library
- Identifiers in Regex
- Quantifiers in Regex
- Groups in Regex search
- Or operator |
- Wildcard characters
- Starts with and ends with
- Exclusion
- Removing the punctuation
import re
text = "The phone number given in the helpline is 408-999-4567"
pattern = 'phone'
re.search(pattern, text)
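If a match is found, re.search returns a match object that records where the match is located; if nothing matches, it returns None. Note: it reports only the first occurrence in the text.
The span is the pair of start and end indices of the match. Indexing starts at zero, and the end index is exclusive (it points one past the last matched character).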
match=re.search(pattern, text)
match
.span() gives the span of the match as a (start, end) tuple, .start() gives the start index, and .end() gives the end index.
match.span()
match.start()
match.end()
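Because re.search returns None when nothing matches, calling .span() directly on the result would raise an AttributeError in that case. A minimal sketch of a guarded lookup follows; the helper name find_first and the pattern "tablet" are made up for illustration and are not part of the re module or the original example.

```python
import re

text = "The phone number given in the helpline is 408-999-4567"

def find_first(pattern, text):
    """Return (start, end) of the first match, or None if nothing matches.

    find_first is a hypothetical helper, not part of the re module.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.span()

found = find_first("phone", text)     # pattern occurs in the text
missing = find_first("tablet", text)  # pattern does not occur

print(found)    # (4, 9)
print(missing)  # None
```

Slicing the text with the returned indices, text[4:9], gives back "phone", which shows that the end index is exclusive.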
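To find every occurrence rather than only the first, use re.findall, which returns a list of the matched strings, or re.finditer, which yields one match object per occurrence.
text1 = "My phone is a hi-tech phone. The phone is dual band, with the latest phone-tech processor"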
matches = re.findall("phone", text1)
matches
len(matches)
for match in re.finditer('phone', text1):
    print(match.span())
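The difference between the two calls can be sketched side by side: re.findall is enough when only the matched strings or their count matter, while re.finditer keeps the position of each match. The variable names words and spans are chosen here for illustration.

```python
import re

text1 = ("My phone is a hi-tech phone. The phone is dual band, "
         "with the latest phone-tech processor")

# findall returns the matched strings themselves
words = re.findall("phone", text1)

# finditer yields match objects, so the positions are available too
spans = [m.span() for m in re.finditer("phone", text1)]

print(len(words))  # 4
print(spans[0])    # (3, 8)
```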
To get the text that was matched, use the .group() method.
match.group()
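.group() becomes more useful once the pattern is not a literal word, because the matched text is then not known in advance. The sketch below is an illustration under that assumption: it uses the \d identifier (one digit, described in the table that follows) to pull the phone number out of the earlier example text.

```python
import re

text = "The phone number given in the helpline is 408-999-4567"

# Three digits, a hyphen, three digits, a hyphen, four digits
match = re.search(r"\d\d\d-\d\d\d-\d\d\d\d", text)

number = match.group()
print(number)  # 408-999-4567

# group() is the same text that slicing with the span would give
same_as_slice = text[match.start():match.end()] == number
```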
Character | Description | Example Pattern Code | Example Match |
---|---|---|---|
\d | A digit | file_\d\d | file_25 |
\w | Alphanumeric or underscore | \w-\w\w\w | A-b_1 |
\s | White space | a\sb\sc | a b c |
\D | A non digit | \D\D\D | ABC |
\W | Non-alphanumeric | \W\W\W\W\W | *-+=) |
\S | Non-whitespace | \S\S\S\S | Yoyo |
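The rows of the table can be checked directly. The sketch below pairs each example pattern with its example match and tests it with re.fullmatch, which succeeds only if the whole string matches the pattern; the names examples and results are chosen here for illustration.

```python
import re

# (pattern, example match) pairs taken from the table above
examples = [
    (r"file_\d\d", "file_25"),
    (r"\w-\w\w\w", "A-b_1"),
    (r"a\sb\sc", "a b c"),
    (r"\D\D\D", "ABC"),
    (r"\W\W\W\W\W", "*-+=)"),
    (r"\S\S\S\S", "Yoyo"),
]

# True for every pair whose sample is matched in full by its pattern
results = {p: re.fullmatch(p, s) is not None for p, s in examples}
for pattern, ok in results.items():
    print(pattern, ok)

# \w also accepts the underscore, which is why "A-b_1" matches
underscore_is_word = re.fullmatch(r"\w", "_") is not None
```

Writing the patterns as raw strings (the r prefix) keeps Python from interpreting the backslashes before the regex engine sees them.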
Character | Description | Example Pattern Code | Example Match |
---|---|---|---|
+ | Occurs one or more times | Version \w-\w+ | Version A-b1_1 |
{3} | Occurs exactly 3 times | \D{3} | abc |
{2,4} | Occurs 2 to 4 times | \d{2,4} | 123 |
{3,} | Occurs 3 or more times | \w{3,} | anycharacters |
\* | Occurs zero or more times | A\*B\*C* | AAACC |
? | Occurs once or not at all | plurals? | plural |
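As a rough check of the quantifier table, the sketch below tests each example with re.fullmatch and then rewrites a phone-number pattern with {n} counts. The word list and variable names are assumptions made for illustration.

```python
import re

# (pattern, example match) pairs taken from the table above
examples = [
    (r"Version \w-\w+", "Version A-b1_1"),
    (r"\D{3}", "abc"),
    (r"\d{2,4}", "123"),
    (r"\w{3,}", "anycharacters"),
    (r"A*B*C*", "AAACC"),
    (r"plurals?", "plural"),
]
all_match = all(re.fullmatch(p, s) is not None for p, s in examples)

# ? makes the preceding character optional, so both forms are accepted
optional = [w for w in ["plural", "plurals", "pluralss"]
            if re.fullmatch(r"plurals?", w)]
print(optional)  # ['plural', 'plurals']

# * allows zero occurrences, so A*B*C* even matches the empty string
matches_empty = re.fullmatch(r"A*B*C*", "") is not None

# Quantifiers shorten repeated identifiers: \d\d\d becomes \d{3}
text = "The phone number given in the helpline is 408-999-4567"
number = re.search(r"\d{3}-\d{3}-\d{4}", text).group()
print(number)  # 408-999-4567
```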