For this assessment we'll use the short story "An Occurrence at Owl Creek Bridge" by Ambrose Bierce (1890).
The story is in the public domain; the text file was obtained from Project Gutenberg.

import spacy
nlp = spacy.load('en_core_web_sm')

1. Create a Doc object from the file owlcreek.txt

HINT: Use with open('../TextFiles/owlcreek.txt') as f:


doc[:36]
AN OCCURRENCE AT OWL CREEK BRIDGE

by Ambrose Bierce

I

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  

Solution

with open('data_files/owlcreek.txt') as f:
    doc = nlp(f.read())

print(doc[:36])
AN OCCURRENCE AT OWL CREEK BRIDGE

by Ambrose Bierce

I

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  

2. How many tokens are contained in the file?

Solution

len(doc)
4835

3. How many sentences are contained in the file?
HINT: You'll want to build a list first!


211

Solution

sentences = []
for sent in doc.sents:
    sentences.append(sent)
   

len(sentences)
205
# Equivalent one-liner using a list comprehension:
sents = [sent for sent in doc.sents]
len(sents)
205
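
The hint about building a list exists because doc.sents is a generator, which has no length of its own. Below is a minimal sketch of that idea. It assumes spaCy v3 and swaps in a blank English pipeline with the rule-based sentencizer in place of en_core_web_sm, so the sample text and names such as nlp_demo are illustrative only and say nothing about owlcreek.txt.

```python
import spacy

# Assumption: a blank pipeline plus the rule-based sentencizer stands in for
# en_core_web_sm so the sketch runs without downloading a model.
nlp_demo = spacy.blank("en")
nlp_demo.add_pipe("sentencizer")

demo_doc = nlp_demo("This is one. This is two.")

# doc.sents is a generator, so it is materialized into a list before len().
demo_sentences = list(demo_doc.sents)
sentence_count = len(demo_sentences)
print(sentence_count)
```

Sentence segmentation depends on the pipeline and its version, which is one possible reason the expected value shown above (211) and the solution output (205) disagree.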

4. Print the second sentence in the document
HINT: Indexing starts at zero, and the title counts as the first sentence.

Solution

sentences[0].text
'AN OCCURRENCE AT OWL CREEK BRIDGE\n\nby Ambrose Bierce\n\nI\n\nA man stood upon a railroad bridge in northern Alabama, looking down\ninto the swift water twenty feet below.'

5. For each token in the sentence above, print its text, POS tag, dep tag and lemma
CHALLENGE: Have values line up in columns in the print output.


A DET det a
man NOUN nsubj man
stood VERB ROOT stand
upon ADP prep upon
a DET det a
railroad NOUN compound railroad
bridge NOUN pobj bridge
in ADP prep in
northern ADJ amod northern
Alabama PROPN pobj alabama
, PUNCT punct ,
looking VERB advcl look
down PART prt down

 SPACE  

into ADP prep into
the DET det the
swift ADJ amod swift
water NOUN pobj water
twenty NUM nummod twenty
feet NOUN npadvmod foot
below ADV advmod below
. PUNCT punct .
  SPACE   

A               DET   det        a              
man             NOUN  nsubj      man            
stood           VERB  ROOT       stand          
upon            ADP   prep       upon           
a               DET   det        a              
railroad        NOUN  compound   railroad       
bridge          NOUN  pobj       bridge         
in              ADP   prep       in             
northern        ADJ   amod       northern       
Alabama         PROPN pobj       alabama        
,               PUNCT punct      ,              
looking         VERB  advcl      look           
down            PART  prt        down           

               SPACE            
              
into            ADP   prep       into           
the             DET   det        the            
swift           ADJ   amod       swift          
water           NOUN  pobj       water          
twenty          NUM   nummod     twenty         
feet            NOUN  npadvmod   foot           
below           ADV   advmod     below          
.               PUNCT punct      .              
                SPACE                           
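
Solution

for token in sentences[0]:
    # Unaligned version: print(token.text, token.pos_, token.dep_, token.lemma_)
    # Each field is right-aligned in a 10-character column, so a 10-character
    # value such as 'occurrence' touches the column before it.
    print(f'{token.text:>{10}}{token.pos_:>{10}}{token.dep_:>{10}}{token.lemma_:>{10}}')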
        AN       DET       det        an
OCCURRENCE      NOUN     nsubjoccurrence
        AT       ADP      prep        at
       OWL     PROPN  compound       OWL
     CREEK      VERB      amod     CREEK
    BRIDGE     PROPN  compound    BRIDGE
        

     SPACE       dep        


        by       ADP      prep        by
   Ambrose     PROPN  compound   Ambrose
    Bierce     PROPN      pobj    Bierce
        

     SPACE       dep        


         I      PRON     nsubj         I
        

     SPACE       dep        


         A       DET       det         a
       man      NOUN     nsubj       man
     stood      VERB      ROOT     stand
      upon     SCONJ      prep      upon
         a       DET       det         a
  railroad      NOUN  compound  railroad
    bridge      NOUN      pobj    bridge
        in       ADP      prep        in
  northern       ADJ      amod  northern
   Alabama     PROPN      pobj   Alabama
         ,     PUNCT     punct         ,
   looking      VERB     advcl      look
      down       ADV    advmod      down
         
     SPACE       dep         

      into       ADP      prep      into
       the       DET       det       the
     swift       ADJ      amod     swift
     water      NOUN      pobj     water
    twenty       NUM    nummod    twenty
      feet      NOUN  npadvmod      foot
     below       ADV    advmod     below
         .     PUNCT     punct         .
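
For the CHALLENGE, fixed column widths work only while every value is shorter than the width. One alternative, sketched here in plain Python on a few rows copied from the output above, is to compute each column width from the data and left-align the fields. The helper name format_row is hypothetical, and the rows are hard-coded so the sketch runs without spaCy.

```python
# A few (text, pos, dep, lemma) rows copied from the output above.
rows = [
    ("AN", "DET", "det", "an"),
    ("OCCURRENCE", "NOUN", "nsubj", "occurrence"),
    ("stood", "VERB", "ROOT", "stand"),
]

def format_row(row, widths):
    """Left-align each value in its column and separate columns by one space."""
    cells = (f"{value:<{width}}" for value, width in zip(row, widths))
    return " ".join(cells).rstrip()

# Each column is as wide as its longest value.
widths = [max(len(row[i]) for row in rows) for i in range(4)]
lines = [format_row(row, widths) for row in rows]
print("\n".join(lines))
```

With a real Doc, the rows would come from (token.text, token.pos_, token.dep_, token.lemma_). Whitespace tokens that contain newlines will still break the layout unless they are skipped or printed with repr().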

6. Write a matcher called 'Swimming' that finds both occurrences of the phrase "swimming vigorously" in the text
HINT: You should include an 'IS_SPACE': True pattern between the two words!

from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)
 
 
[(12881893835109366681, 1274, 1277), (12881893835109366681, 3607, 3610)]

Solution

pattern = [{'LOWER': 'swimming'}, {'IS_SPACE': True, 'OP': '*'}, {'LOWER': 'vigorously'}]

# spaCy v3 expects a list of patterns as the second argument. The older v2 call,
# matcher.add('Swimming', None, pattern), raises
# "TypeError: add() takes exactly 2 positional arguments (3 given)".
matcher.add('Swimming', [pattern])
found_matches = matcher(doc)
print(found_matches)
[(12881893835109366681, 1274, 1277), (12881893835109366681, 3609, 3612)]
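
Each tuple in the matcher output is (match_id, start, end), where start and end are token offsets into the Doc. The sketch below shows how those values can be turned back into a label and a span. It assumes spaCy v3 and a blank English pipeline (tokenizer only), and the one-line sample text and the names nlp_demo, demo_doc and demo_matcher are illustrative, not part of the assessment.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: spaCy v3 and a blank English pipeline (tokenizer only).
nlp_demo = spacy.blank("en")
demo_doc = nlp_demo("he was now swimming\nvigorously with the current")

demo_matcher = Matcher(nlp_demo.vocab)
pattern = [{"LOWER": "swimming"}, {"IS_SPACE": True, "OP": "*"}, {"LOWER": "vigorously"}]
demo_matcher.add("Swimming", [pattern])

# Resolve the hashed match_id to its string name and slice the doc by offsets.
results = []
for match_id, start, end in demo_matcher(demo_doc):
    label = nlp_demo.vocab.strings[match_id]
    results.append((label, start, end, demo_doc[start:end].text))
print(results)
```

The optional IS_SPACE token is what lets the pattern bridge the line break between the two words.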

7. Print the text surrounding each found match

Solution

print(doc[1265:1290])
By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home
print(doc[3600:3615])
all this over his shoulder; he was now swimming
vigorously with the current
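
The slices in the solution above were picked by hand. A possible alternative is to derive the window from the match offsets themselves, as in this sketch. The helper context_bounds and the window size of 8 tokens are assumptions made for illustration. The offsets and token count are the ones printed earlier.

```python
def context_bounds(start, end, doc_len, window=8):
    """Pad a (start, end) token range by `window` tokens, clamped to the doc."""
    return max(0, start - window), min(doc_len, end + window)

# Match offsets and token count taken from the outputs above.
matches = [(1274, 1277), (3609, 3612)]
doc_len = 4835

bounds = [context_bounds(start, end, doc_len) for start, end in matches]
print(bounds)
# With a spaCy Doc named `doc`, each window would be printed as doc[left:right].
```

Clamping keeps the slice valid when a match sits near the start or end of the document.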

EXTRA CREDIT:
Print the sentence that contains each found match


By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.  

The hunted man saw all this over his shoulder; he was now swimming
vigorously with the current.  

Solution

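# Sentences are in document order, so the first one that ends after the
# match start is the sentence containing the match.
for sent in sentences:
    if found_matches[0][1] < sent.end:
        print(sent)
        break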
 By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.
# Same search for the second match.
for sent in sentences:
    if found_matches[1][1] < sent.end:
        print(sent)
        break

The hunted man saw all this over his shoulder; he was now swimming
vigorously with the current.
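
As a possible shortcut for the extra credit, a Span exposes the sentence that contains it through its .sent attribute, so no loop over the sentence list is needed. This is a sketch under stated assumptions: spaCy v3, a blank English pipeline with the rule-based sentencizer instead of en_core_web_sm, and a made-up three-sentence sample text.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: blank pipeline + sentencizer replaces en_core_web_sm for the demo.
nlp_demo = spacy.blank("en")
nlp_demo.add_pipe("sentencizer")

# Made-up sample text; only the middle sentence contains the phrase.
demo_doc = nlp_demo(
    "The soldiers fired. He was now swimming\nvigorously with the current. "
    "He reached the bank."
)

demo_matcher = Matcher(nlp_demo.vocab)
pattern = [{"LOWER": "swimming"}, {"IS_SPACE": True, "OP": "*"}, {"LOWER": "vigorously"}]
demo_matcher.add("Swimming", [pattern])

# Span.sent returns the sentence span that contains the matched tokens.
matched_sentences = [
    demo_doc[start:end].sent.text for _, start, end in demo_matcher(demo_doc)
]
for text in matched_sentences:
    print(text)
```

Span.sent needs sentence boundaries to be set, which the parser in en_core_web_sm or the sentencizer used here both provide.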