f-Strings

1. Print an f-string that displays "NLP stands for Natural Language Processing" using the variables provided.

abbr = 'NLP'
full_text = 'Natural Language Processing'

### Enter your code here:
NLP stands for Natural Language Processing

Solution

abbr = 'NLP'
full_text = 'Natural Language Processing'
print(f'{abbr} stands for {full_text}')
NLP stands for Natural Language Processing
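
The braces in an f-string accept any expression, not just a variable name. The sketch below reuses the two variables from the exercise; the names `shout` and `quoted` are introduced here purely for illustration.

```python
abbr = 'NLP'
full_text = 'Natural Language Processing'

# Any expression can go inside the braces: method calls, len(), arithmetic
shout = f'{abbr.lower()} has {len(full_text.split())} words'

# The !r conversion inserts repr(), which keeps the quotes visible
quoted = f'{abbr!r} stands for {full_text!r}'

print(shout)
print(quoted)
```

The `!r` form is handy when debugging, because it shows stray whitespace or newlines that a plain `{abbr}` would hide.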

Files

2. Create a file in the current working directory called contacts.txt by running the cell below:

%%writefile contacts.txt
First_Name Last_Name, Title, Extension, Email
Overwriting contacts.txt

Solution

%%writefile contacts.txt
First_Name Last_Name, Title, Extension, Email
Overwriting contacts.txt

3. Open the file and use .read() to save the contents of the file to a string called fields. Make sure the file is closed at the end.

### Write your code here:
    
### Run fields to see the contents of contacts.txt:
fields
'First_Name Last_Name, Title, Extension, Email'

Solution

with open("contacts.txt") as text:
    fields = text.read()
    
fields
'First_Name Last_Name, Title, Extension, Email\n'
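
The `with` statement closes the file as soon as the block exits, and `.read()` returns the contents verbatim, including any trailing newline. A minimal sketch of both points, using a throwaway temporary file (the name `demo.txt` is hypothetical) so that contacts.txt is left alone:

```python
import os
import tempfile

# Write a throwaway file so the sketch does not touch contacts.txt
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical file name
with open(path, 'w') as handle:
    handle.write('First_Name Last_Name, Title, Extension, Email\n')

with open(path) as handle:
    fields = handle.read()

print(handle.closed)    # True: the with block closed the file on exit
print(repr(fields))     # the trailing newline is part of the string
print(fields.strip())   # strip() drops it if it is unwanted
```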

Working with PDF Files

4. Use PyPDF2 to open the file Business_Proposal.pdf. Extract the text of page 2.

# Open the file as a binary object

# Use PyPDF2 to read the text of the file

# Get the text from page 2 (CHALLENGE: Do this in one step!)
page_two_text = 

# Close the file

# Print the contents of page_two_text
print(page_two_text)
AUTHORS:
 
Amy Baker, Finance Chair, x345, abaker@ourcompany.com
 
Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com
 
Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com
 

Solution

import PyPDF2

# Open the file as a binary object
pdf1 = open("data_files/Business_Proposal.pdf", 'rb')

# Use PyPDF2 to read the text of the file
# (PyPDF2 3.0 removed PdfFileReader, getPage and extractText; older releases use those names)
pdf_reader = PyPDF2.PdfReader(pdf1)

# Get the text from page 2 (CHALLENGE: Do this in one step!)
page_two_text = pdf_reader.pages[1].extract_text()

# Close the file
pdf1.close()

# Print the contents of page_two_text
print(page_two_text)
AUTHORS:
 

Amy Baker, Finance Chair, x345, abaker@ourcompany.com
  

Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com
  

Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com
  
import re

# Strip the literal heading (square brackets would define a character class instead)
re.sub(r'AUTHORS:', '', page_two_text)
'\n \n\nAmy Baker, Finance Chair, x345, abaker@ourcompany.com\n  \n\nChris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n  \n\nErin Freeman, Sr. VP, x879, efreeman@ourcompany.com\n  '
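
One pitfall when removing a fixed word with a regex: square brackets define a character class, so a pattern such as `[^(AUTHORS:)]` matches any single character outside that set rather than skipping the word. A minimal sketch of the difference, using a made-up `sample` string as a stand-in for the extracted page text:

```python
import re

sample = 'AUTHORS:\nAmy Baker'  # made-up stand-in for the extracted page text

# Character class: matches one character at a time and skips
# every A, U, T, H, O, R, S, ':', '(' and ')' anywhere in the text
per_char = re.findall(r'[^(AUTHORS:)]', sample)

# Literal pattern: removes the whole word in one substitution
cleaned = re.sub(r'AUTHORS:', '', sample)

print(per_char)
print(repr(cleaned))
```

Note that the character class also drops the capital "A" in "Amy", which is why names lose letters in this kind of output.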

5. Open the file contacts.txt in append mode. Add the text of page 2 from above to contacts.txt.

CHALLENGE: See if you can remove the word "AUTHORS:"


First_Name Last_Name, Title, Extension, EmailAUTHORS:
 
Amy Baker, Finance Chair, x345, abaker@ourcompany.com
 
Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com
 
Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com
 

Simple Solution

myfile = open('contacts.txt', 'a+')
myfile.seek(0)
print(myfile.read())
First_Name Last_Name, Title, Extension, Email

myfile.write(page_two_text)
myfile.seek(0)
print(myfile.read())
First_Name Last_Name, Title, Extension, Email
AUTHORS:
 

Amy Baker, Finance Chair, x345, abaker@ourcompany.com
  

Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com
  

Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com
  
myfile.close()

First_Name Last_Name, Title, Extension, Email
 
Amy Baker, Finance Chair, x345, abaker@ourcompany.com
 
Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com
 
Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com
 

Challenge Solution

with open('contacts.txt','a+') as c:
    c.write(page_two_text[8:])
    c.seek(0)
    print(c.read())
First_Name Last_Name, Title, Extension, Email
AUTHORS:
 

Amy Baker, Finance Chair, x345, abaker@ourcompany.com
  

Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com
  

Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com
  
 

Amy Baker, Finance Chair, x345, abaker@ourcompany.com
  

Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com
  

Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com
  
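
Slicing at a fixed index works only while the heading is exactly eight characters long. The sketch below is one possible alternative, not the notebook's own method: it checks for the heading before removing it, then appends in `'a+'` mode. The names `page_text`, `header`, and `contacts_demo.txt` are assumptions made for illustration, and a temporary file is used so that contacts.txt is not modified.

```python
import os
import tempfile

page_text = 'AUTHORS:\nAmy Baker, Finance Chair\n'  # made-up stand-in for page_two_text
header = 'AUTHORS:'

# Drop the heading only if it is actually there, instead of slicing at a fixed index
body = page_text[len(header):] if page_text.startswith(header) else page_text

path = os.path.join(tempfile.mkdtemp(), 'contacts_demo.txt')  # hypothetical file name
with open(path, 'w') as f:
    f.write('First_Name Last_Name, Title, Extension, Email')

with open(path, 'a+') as f:
    f.write(body)       # in append mode, writes always go to the end of the file
    f.seek(0)           # move the read position back to the start
    contents = f.read()

print(contents)
```

Because append mode never truncates, running the append step twice adds the text twice, which is what the repeated entries in the output above show.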

Regular Expressions

6. Using the page_two_text variable created above, extract any email addresses that were contained in the file Business_Proposal.pdf.

import re

# Enter your regex pattern here. This may take several tries!
pattern = 

re.findall(pattern, page_two_text)
['abaker@ourcompany.com',
 'cdonaldson@ourcompany.com',
 'efreeman@ourcompany.com']

Solution

import re

pattern = r'\w+@\w+\.com'  # escape the dot so it matches a literal '.'
re.findall(pattern, page_two_text)
['abaker@ourcompany.com',
 'cdonaldson@ourcompany.com',
 'efreeman@ourcompany.com']
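
In a regex, an unescaped `.` matches any character, so whether the dot before `com` is escaped changes what counts as a match. A small sketch with a made-up `text` string that contains one well-formed address and one malformed one:

```python
import re

text = 'abaker@ourcompany.com, bad@ourcompanyxcom'  # made-up sample

loose = re.findall(r'\w+@\w+.com', text)     # '.' matches any character
strict = re.findall(r'\w+@\w+\.com', text)   # '\.' matches a literal dot only

print(loose)
print(strict)
```

On the three addresses in page 2 both patterns return the same list; the difference only appears on malformed input like the second entry here.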