1.4 Regular Expressions
This post explains how to use regular expressions to find and modify text.
- The regular expression library `re`
- Identifiers in Regex
- Quantifiers in Regex
- Groups in Regex search
- Or operator |
- Wildcard characters
- Starts with and ends with
- Exclusion
- Removing punctuation
```python
import re

text = "The phone number given in the helpline is 408-999-4567"
pattern = 'phone'
re.search(pattern, text)  # <re.Match object; span=(4, 9), match='phone'>
```
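If a match is found, `re.search()` returns a match object describing where the match occurs; if there is no match, it returns `None`. Note: it only finds the first occurrence in the text.
The span is the pair of start and end indices of the match. Indices start at zero, and the end index is exclusive (it points one past the last matched character).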
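
Because `re.search()` returns `None` when nothing matches, calling `.span()` directly on its result can raise an `AttributeError`. A minimal sketch of guarding against that; the helper name `find_first` is hypothetical and not part of `re`:

```python
import re

def find_first(pattern, text):
    """Return the (start, end) span of the first match, or None if there is none."""
    match = re.search(pattern, text)
    if match is None:
        # No match: re.search returned None, so there is no span to report
        return None
    return match.span()

text = "The phone number given in the helpline is 408-999-4567"
print(find_first("phone", text))  # (4, 9)
print(find_first("email", text))  # None
```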
```python
match = re.search(pattern, text)
match
```
`.span()` gives the span of the match, `.start()` gives the start index, and `.end()` gives the end index.
```python
match.span()   # (4, 9)
match.start()  # 4
match.end()    # 9
```
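
Since the end index is exclusive, the span can be plugged straight into a string slice to recover the matched text. A small sketch using the same `text` as above:

```python
import re

text = "The phone number given in the helpline is 408-999-4567"
match = re.search("phone", text)

start, end = match.span()
# The end index is exclusive, so slicing with it recovers exactly the matched text
print(text[start:end])                                # phone
print(end - start == len("phone"))                    # True
# .span() is simply the pair (.start(), .end())
print(match.span() == (match.start(), match.end()))  # True
```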
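To find every occurrence rather than just the first, use `re.findall()`, which returns all non-overlapping matches as a list of strings:
```python
text1 = "My phone is a hi-tech phone. The phone is dual band, with the latest phone-tech processor"
matches = re.findall("phone", text1)
matches       # ['phone', 'phone', 'phone', 'phone']
len(matches)  # 4
```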
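To get the location of each occurrence, use `re.finditer()`, which yields a match object per occurrence:
```python
# finditer yields a match object for every occurrence, in order
for match in re.finditer('phone', text1):
    print(match.span())
```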
To get the text that was matched, use the `.group()` method (here `match` is the last match object from the loop):
```python
match.group()  # 'phone'
```
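
Each match object yielded by `re.finditer()` carries both its span and its matched text, so the loop above can collect every occurrence in one pass. A minimal sketch, assuming a list of `(span, text)` pairs is the shape you want; the name `occurrences` is illustrative:

```python
import re

text1 = "My phone is a hi-tech phone. The phone is dual band, with the latest phone-tech processor"

# Collect (span, matched text) for every non-overlapping occurrence
occurrences = [(m.span(), m.group()) for m in re.finditer("phone", text1)]

for span, word in occurrences:
    print(span, word)

# finditer and findall see the same matches; findall just returns the strings
print(len(occurrences) == len(re.findall("phone", text1)))  # True
```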

Identifiers match a single character of a given type:

| Character | Description | Example Pattern Code | Example Match |
|---|---|---|---|
| `\d` | A digit | `file_\d\d` | `file_25` |
| `\w` | A word character (letter, digit, or underscore) | `\w-\w\w\w` | `A-b_1` |
| `\s` | A whitespace character | `a\sb\sc` | `a b c` |
| `\D` | A non-digit | `\D\D\D` | `ABC` |
| `\W` | A non-word character | `\W\W\W\W\W` | `*-+=)` |
| `\S` | A non-whitespace character | `\S\S\S\S` | `Yoyo` |

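
One way to sanity-check the identifier table is to run each pattern against its example with `re.fullmatch()`, which succeeds only if the pattern covers the whole string. The sketch below does that, then uses `\d` to pull the phone number out of the earlier `text`. Patterns are written as raw strings (`r"..."`) so Python passes the backslashes to `re` unchanged; the list name `identifier_examples` is just for illustration:

```python
import re

# (pattern, example) pairs taken from the identifier table
identifier_examples = [
    (r"file_\d\d", "file_25"),
    (r"\w-\w\w\w", "A-b_1"),
    (r"a\sb\sc", "a b c"),
    (r"\D\D\D", "ABC"),
    (r"\W\W\W\W\W", "*-+=)"),
    (r"\S\S\S\S", "Yoyo"),
]

for pattern, example in identifier_examples:
    # fullmatch requires the pattern to match the entire example string
    print(pattern, bool(re.fullmatch(pattern, example)))

text = "The phone number given in the helpline is 408-999-4567"
# Three digits, dash, three digits, dash, four digits
phone = re.search(r"\d\d\d-\d\d\d-\d\d\d\d", text)
print(phone.group())  # 408-999-4567
```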

Quantifiers control how many times the preceding character must occur:

| Character | Description | Example Pattern Code | Example Match |
|---|---|---|---|
| `+` | Occurs one or more times | `Version \w-\w+` | `Version A-b1_1` |
| `{3}` | Occurs exactly 3 times | `\D{3}` | `abc` |
| `{2,4}` | Occurs 2 to 4 times | `\d{2,4}` | `123` |
| `{3,}` | Occurs 3 or more times | `\w{3,}` | `anycharacters` |
| `*` | Occurs zero or more times | `A*B*C*` | `AAACC` |
| `?` | Occurs once or not at all | `plurals?` | `plural` |
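
The same `re.fullmatch()` check works for the quantifier table. The sketch below also shows that `?` accepts the optional character when it is present, and that quantifiers shorten the phone-number pattern; the list name `quantifier_examples` is illustrative:

```python
import re

# (pattern, example) pairs taken from the quantifier table
quantifier_examples = [
    (r"Version \w-\w+", "Version A-b1_1"),
    (r"\D{3}", "abc"),
    (r"\d{2,4}", "123"),
    (r"\w{3,}", "anycharacters"),
    (r"A*B*C*", "AAACC"),
    (r"plurals?", "plural"),
]

for pattern, example in quantifier_examples:
    print(pattern, bool(re.fullmatch(pattern, example)))

# ? makes the preceding "s" optional, so the plural form matches too
print(bool(re.fullmatch(r"plurals?", "plurals")))  # True

# {n} replaces repeated identifiers: \d{3} is equivalent to \d\d\d
text = "The phone number given in the helpline is 408-999-4567"
phone = re.search(r"\d{3}-\d{3}-\d{4}", text)
print(phone.group())  # 408-999-4567
```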