The challenge of correctly identifying parts of speech is summed up nicely in the spaCy docs:

> Processing raw text intelligently is difficult: most words are rare, and it's common for words that look completely different to mean almost the same thing. The same words in a different order can mean something completely different. Even splitting text into useful word-like units can be difficult in many languages. While it's possible to solve some problems starting from only the raw characters, it's usually better to use linguistic knowledge to add useful information. That's exactly what spaCy is designed to do: you put in raw text, and get back a **Doc** object, that comes with a variety of annotations.

In this section we'll take a closer look at coarse POS tags (noun, verb, adjective) and fine-grained tags (plural noun, past-tense verb, superlative adjective).
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp(u"The quick brown fox jumped over the lazy dog's back.")

View token tags

Recall that you can obtain a particular token by its index position.

  • To view the coarse POS tag use token.pos_
  • To view the fine-grained tag use token.tag_
  • To view the description of either type of tag use spacy.explain(tag)
Note that `token.pos` and `token.tag` return integer hash values; by adding the underscores we get the text equivalent that lives in **doc.vocab**.
print(doc.text)
The quick brown fox jumped over the lazy dog's back.
print(doc[4].text, doc[4].pos_, doc[4].tag_, spacy.explain(doc[4].tag_))
jumped VERB VBD verb, past tense

We can apply this technique to the entire Doc object:

for token in doc:
    print(f'{token.text:{10}} {token.pos_:{8}} {token.tag_:{6}} {spacy.explain(token.tag_)}')
The        DET      DT     determiner
quick      ADJ      JJ     adjective (English), other noun-modifier (Chinese)
brown      ADJ      JJ     adjective (English), other noun-modifier (Chinese)
fox        NOUN     NN     noun, singular or mass
jumped     VERB     VBD    verb, past tense
over       ADP      IN     conjunction, subordinating or preposition
the        DET      DT     determiner
lazy       ADJ      JJ     adjective (English), other noun-modifier (Chinese)
dog        NOUN     NN     noun, singular or mass
's         PART     POS    possessive ending
back       NOUN     NN     noun, singular or mass
.          PUNCT    .      punctuation mark, sentence closer
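
Once every token carries a tag, a natural next step is to collect words by tag. The sketch below shows one way to do that. `Tok` is a hypothetical stand-in for spaCy tokens (it only has `text`, `pos_` and `tag_`), filled with the rows printed above so the code runs without loading a model. Since the function only reads `token.text` and `token.pos_`, it should work the same way on a real Doc.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for spaCy tokens; not part of spaCy itself.
Tok = namedtuple("Tok", ["text", "pos_", "tag_"])

# Rows copied from the output printed above.
tokens = [
    Tok("The", "DET", "DT"), Tok("quick", "ADJ", "JJ"), Tok("brown", "ADJ", "JJ"),
    Tok("fox", "NOUN", "NN"), Tok("jumped", "VERB", "VBD"), Tok("over", "ADP", "IN"),
    Tok("the", "DET", "DT"), Tok("lazy", "ADJ", "JJ"), Tok("dog", "NOUN", "NN"),
    Tok("'s", "PART", "POS"), Tok("back", "NOUN", "NN"), Tok(".", "PUNCT", "."),
]

def group_by_pos(tokens):
    """Map each coarse POS tag to the list of token texts that carry it."""
    groups = defaultdict(list)
    for tok in tokens:
        groups[tok.pos_].append(tok.text)
    return dict(groups)

groups = group_by_pos(tokens)
print(groups["ADJ"])   # ['quick', 'brown', 'lazy']
print(groups["NOUN"])  # ['fox', 'dog', 'back']
```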

Coarse-grained Part-of-speech Tags

Every token is assigned a POS tag from the following list:

| POS | DESCRIPTION | EXAMPLES |
|---|---|---|
| ADJ | adjective | *big, old, green, incomprehensible, first* |
| ADP | adposition | *in, to, during* |
| ADV | adverb | *very, tomorrow, down, where, there* |
| AUX | auxiliary | *is, has (done), will (do), should (do)* |
| CONJ | conjunction | *and, or, but* |
| CCONJ | coordinating conjunction | *and, or, but* |
| DET | determiner | *a, an, the* |
| INTJ | interjection | *psst, ouch, bravo, hello* |
| NOUN | noun | *girl, cat, tree, air, beauty* |
| NUM | numeral | *1, 2017, one, seventy-seven, IV, MMXIV* |
| PART | particle | *'s, not* |
| PRON | pronoun | *I, you, he, she, myself, themselves, somebody* |
| PROPN | proper noun | *Mary, John, London, NATO, HBO* |
| PUNCT | punctuation | *., (, ), ?* |
| SCONJ | subordinating conjunction | *if, while, that* |
| SYM | symbol | *$, %, §, ©, +, −, ×, ÷, =, :), 😝* |
| VERB | verb | *run, runs, running, eat, ate, eating* |
| X | other | *sfpksdpsxmsa* |
| SPACE | space | |

Fine-grained Part-of-speech Tags

Tokens are subsequently given a fine-grained tag as determined by morphology:

| POS | POS Description | Fine-grained Tag | Tag Description | Morphology |
|---|---|---|---|---|
| ADJ | adjective | AFX | affix | Hyph=yes |
| ADJ | | JJ | adjective | Degree=pos |
| ADJ | | JJR | adjective, comparative | Degree=comp |
| ADJ | | JJS | adjective, superlative | Degree=sup |
| ADJ | | PDT | predeterminer | AdjType=pdt PronType=prn |
| ADJ | | PRP\$ | pronoun, possessive | PronType=prs Poss=yes |
| ADJ | | WDT | wh-determiner | PronType=int rel |
| ADJ | | WP\$ | wh-pronoun, possessive | Poss=yes PronType=int rel |
| ADP | adposition | IN | conjunction, subordinating or preposition | |
| ADV | adverb | EX | existential there | AdvType=ex |
| ADV | | RB | adverb | Degree=pos |
| ADV | | RBR | adverb, comparative | Degree=comp |
| ADV | | RBS | adverb, superlative | Degree=sup |
| ADV | | WRB | wh-adverb | PronType=int rel |
| CONJ | conjunction | CC | conjunction, coordinating | ConjType=coor |
| DET | determiner | DT | determiner | |
| INTJ | interjection | UH | interjection | |
| NOUN | noun | NN | noun, singular or mass | Number=sing |
| NOUN | | NNS | noun, plural | Number=plur |
| NOUN | | WP | wh-pronoun, personal | PronType=int rel |
| NUM | numeral | CD | cardinal number | NumType=card |
| PART | particle | POS | possessive ending | Poss=yes |
| PART | | RP | adverb, particle | |
| PART | | TO | infinitival to | PartType=inf VerbForm=inf |
| PRON | pronoun | PRP | pronoun, personal | PronType=prs |
| PROPN | proper noun | NNP | noun, proper singular | NounType=prop Number=sing |
| PROPN | | NNPS | noun, proper plural | NounType=prop Number=plur |
| PUNCT | punctuation | -LRB- | left round bracket | PunctType=brck PunctSide=ini |
| PUNCT | | -RRB- | right round bracket | PunctType=brck PunctSide=fin |
| PUNCT | | , | punctuation mark, comma | PunctType=comm |
| PUNCT | | : | punctuation mark, colon or ellipsis | |
| PUNCT | | . | punctuation mark, sentence closer | PunctType=peri |
| PUNCT | | '' | closing quotation mark | PunctType=quot PunctSide=fin |
| PUNCT | | "" | closing quotation mark | PunctType=quot PunctSide=fin |
| PUNCT | | `` | opening quotation mark | PunctType=quot PunctSide=ini |
| PUNCT | | HYPH | punctuation mark, hyphen | PunctType=dash |
| PUNCT | | LS | list item marker | NumType=ord |
| PUNCT | | NFP | superfluous punctuation | |
| SYM | symbol | # | symbol, number sign | SymType=numbersign |
| SYM | | \$ | symbol, currency | SymType=currency |
| SYM | | SYM | symbol | |
| VERB | verb | BES | auxiliary "be" | |
| VERB | | HVS | forms of "have" | |
| VERB | | MD | verb, modal auxiliary | VerbType=mod |
| VERB | | VB | verb, base form | VerbForm=inf |
| VERB | | VBD | verb, past tense | VerbForm=fin Tense=past |
| VERB | | VBG | verb, gerund or present participle | VerbForm=part Tense=pres Aspect=prog |
| VERB | | VBN | verb, past participle | VerbForm=part Tense=past Aspect=perf |
| VERB | | VBP | verb, non-3rd person singular present | VerbForm=fin Tense=pres |
| VERB | | VBZ | verb, 3rd person singular present | VerbForm=fin Tense=pres Number=sing Person=3 |
| X | other | ADD | email | |
| X | | FW | foreign word | Foreign=yes |
| X | | GW | additional word in multi-word expression | |
| X | | XX | unknown | |
| SPACE | space | _SP | space | |
| | | NIL | missing tag | |

For a current list of tags for all languages visit https://spacy.io/api/annotation#pos-tagging
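
To make the relationship between the two tag levels concrete, here is a minimal sketch that stores a few rows of the fine-grained table as a dictionary and looks up which fine-grained tags share a coarse tag. `FINE_TO_COARSE` and `fine_tags_for` are hypothetical names introduced for illustration, and the dictionary is only an excerpt of the table. It illustrates the table's structure and is not a description of how spaCy computes `token.pos_` internally.

```python
# A few rows copied from the fine-grained table above; not the full tag set.
FINE_TO_COARSE = {
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "NN": "NOUN", "NNS": "NOUN",
    "NNP": "PROPN", "NNPS": "PROPN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
}

def fine_tags_for(coarse):
    """Return the fine-grained tags in the excerpt that belong to one coarse tag."""
    return sorted(tag for tag, pos in FINE_TO_COARSE.items() if pos == coarse)

print(FINE_TO_COARSE["JJS"])  # ADJ
print(fine_tags_for("VERB"))  # ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']
```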

Working with POS Tags

In the English language, the same string of characters can have different meanings, even within the same sentence. For this reason, morphology is important. spaCy uses machine learning algorithms to best predict the use of a token in a sentence. Is "I read books on NLP" present or past tense? Is wind a verb or a noun?

doc = nlp(u'I read books on NLP.')
r = doc[1]

print(f'{r.text:{10}} {r.pos_:{8}} {r.tag_:{6}} {spacy.explain(r.tag_)}')
read       VERB     VBP    verb, non-3rd person singular present
doc = nlp(u'I read a book on NLP.')
r = doc[1]

print(f'{r.text:{10}} {r.pos_:{8}} {r.tag_:{6}} {spacy.explain(r.tag_)}')
read       VERB     VBD    verb, past tense

In the first example, with no other cues to work from, spaCy assumed that "read" was present tense.
In the second example, the present-tense form would be "I am reading a book", so spaCy assigned the past tense.

Counting POS Tags

The Doc.count_by() method accepts a specific token attribute as its argument, and returns a frequency count of the given attribute as a dictionary object. Keys in the dictionary are the integer values of the given attribute ID, and values are the frequency. Counts of zero are not included.

doc = nlp(u"The quick brown fox jumped over the lazy dog's back.")

# Count the frequencies of different coarse-grained POS tags:
POS_counts = doc.count_by(spacy.attrs.POS)
POS_counts
{90: 2, 84: 3, 92: 3, 100: 1, 85: 1, 94: 1, 97: 1}

This isn't very helpful until you decode the attribute ID:

doc.vocab[84].text
'ADJ'

Create a frequency list of POS tags from the entire document

Since POS_counts is a dictionary, we can obtain its (ID, count) pairs with POS_counts.items().
By sorting these pairs we have access to each tag and its count, in order of ID.

POS_counts.items()
dict_items([(90, 2), (84, 3), (92, 3), (100, 1), (85, 1), (94, 1), (97, 1)])
for k,v in sorted(POS_counts.items()):
    print(f'{k}. {doc.vocab[k].text:{5}}: {v}')
84. ADJ  : 3
85. ADP  : 1
90. DET  : 2
92. NOUN : 3
94. PART : 1
97. PUNCT: 1
100. VERB : 1
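
When the integer IDs are not needed, an alternative is to count the string labels directly with `collections.Counter`, which skips the decoding step. This is a different technique from `Doc.count_by()`, shown here as a minimal sketch. `Tok` is a hypothetical stand-in for spaCy tokens, filled with the coarse tags printed earlier so the code runs without a model; with a real Doc the counting line would read `Counter(token.pos_ for token in doc)`.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for spaCy tokens; only .text and .pos_ are modelled.
Tok = namedtuple("Tok", ["text", "pos_"])

tokens = [
    Tok("The", "DET"), Tok("quick", "ADJ"), Tok("brown", "ADJ"),
    Tok("fox", "NOUN"), Tok("jumped", "VERB"), Tok("over", "ADP"),
    Tok("the", "DET"), Tok("lazy", "ADJ"), Tok("dog", "NOUN"),
    Tok("'s", "PART"), Tok("back", "NOUN"), Tok(".", "PUNCT"),
]

# Keys are the readable labels; labels that never occur are simply absent.
pos_counts = Counter(tok.pos_ for tok in tokens)

for label, n in sorted(pos_counts.items()):
    print(f"{label:{5}}: {n}")
```

As with `Doc.count_by()`, tags with a count of zero do not appear as keys.
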
TAG_counts = doc.count_by(spacy.attrs.TAG)

for k,v in sorted(TAG_counts.items()):
    print(f'{k}. {doc.vocab[k].text:{4}}: {v}')
74. POS : 1
1292078113972184607. IN  : 1
10554686591937588953. JJ  : 3
12646065887601541794. .   : 1
15267657372422890137. DT  : 2
15308085513773655218. NN  : 3
17109001835818727656. VBD : 1
**Why did the ID numbers get so big?** In spaCy, certain text values are hardcoded into `Doc.vocab` and take up the first several hundred ID numbers. Strings like 'NOUN' and 'VERB' are used frequently by internal operations. Others, like fine-grained tags, are assigned hash values as needed.
**Why don't SPACE tags appear?** In spaCy, only strings of spaces (two or more) are assigned tokens. Single spaces are not.
DEP_counts = doc.count_by(spacy.attrs.DEP)

for k,v in sorted(DEP_counts.items()):
    print(f'{k}. {doc.vocab[k].text:{4}}: {v}')
402. amod: 3
415. det : 2
429. nsubj: 1
439. pobj: 1
440. poss: 1
443. prep: 1
445. punct: 1
8110129090154140942. case: 1
8206900633647566924. ROOT: 1

Here we've shown spacy.attrs.POS, spacy.attrs.TAG and spacy.attrs.DEP.
Refer back to the Vocabulary and Matching lecture from the previous section for a table of other token attributes.

Fine-grained POS Tag Examples


These are some grammatical examples (shown in bold) of specific fine-grained tags. We've removed punctuation and rarely used tags:

| POS | TAG | DESCRIPTION | EXAMPLE |
|---|---|---|---|
| ADJ | AFX | affix | The Flintstones were a **pre**-historic family. |
| ADJ | JJ | adjective | This is a **good** sentence. |
| ADJ | JJR | adjective, comparative | This is a **better** sentence. |
| ADJ | JJS | adjective, superlative | This is the **best** sentence. |
| ADJ | PDT | predeterminer | Waking up is **half** the battle. |
| ADJ | PRP\$ | pronoun, possessive | **His** arm hurts. |
| ADJ | WDT | wh-determiner | It's blue, **which** is odd. |
| ADJ | WP\$ | wh-pronoun, possessive | We don't know **whose** it is. |
| ADP | IN | conjunction, subordinating or preposition | It arrived **in** a box. |
| ADV | EX | existential there | **There** is cake. |
| ADV | RB | adverb | He ran **quickly**. |
| ADV | RBR | adverb, comparative | He ran **quicker**. |
| ADV | RBS | adverb, superlative | He ran **fastest**. |
| ADV | WRB | wh-adverb | **When** was that? |
| CONJ | CC | conjunction, coordinating | The balloon popped **and** everyone jumped. |
| DET | DT | determiner | **This** is **a** sentence. |
| INTJ | UH | interjection | **Um**, I don't know. |
| NOUN | NN | noun, singular or mass | This is a **sentence**. |
| NOUN | NNS | noun, plural | These are **words**. |
| NOUN | WP | wh-pronoun, personal | **Who** was that? |
| NUM | CD | cardinal number | I want **three** things. |
| PART | POS | possessive ending | Fred**'s** name is short. |
| PART | RP | adverb, particle | Put it **back**! |
| PART | TO | infinitival to | I want **to** go. |
| PRON | PRP | pronoun, personal | **I** want **you** to go. |
| PROPN | NNP | noun, proper singular | **Kilroy** was here. |
| PROPN | NNPS | noun, proper plural | The **Flintstones** were a pre-historic family. |
| VERB | MD | verb, modal auxiliary | This **could** work. |
| VERB | VB | verb, base form | I want to **go**. |
| VERB | VBD | verb, past tense | This **was** a sentence. |
| VERB | VBG | verb, gerund or present participle | I am **going**. |
| VERB | VBN | verb, past participle | The treasure was **lost**. |
| VERB | VBP | verb, non-3rd person singular present | I **want** to go. |
| VERB | VBZ | verb, 3rd person singular present | He **wants** to go. |
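
Because related fine-grained tags share a prefix (`VB`, `NN`, `JJ`), one simple way to pick out a grammatical family is to filter on the start of `tag_`. The sketch below assumes a hypothetical `Tok` stand-in for spaCy tokens, filled with the tags printed for the fox sentence earlier in this section, and a hypothetical helper `with_tag_prefix`. Note that the prefix `NN` would also match the proper-noun tags `NNP` and `NNPS`.

```python
from collections import namedtuple

# Hypothetical stand-in for spaCy tokens; only .text and .tag_ are modelled.
Tok = namedtuple("Tok", ["text", "tag_"])

tokens = [
    Tok("The", "DT"), Tok("quick", "JJ"), Tok("brown", "JJ"),
    Tok("fox", "NN"), Tok("jumped", "VBD"), Tok("over", "IN"),
    Tok("the", "DT"), Tok("lazy", "JJ"), Tok("dog", "NN"),
    Tok("'s", "POS"), Tok("back", "NN"), Tok(".", "."),
]

def with_tag_prefix(tokens, prefix):
    """Return the texts of tokens whose fine-grained tag starts with prefix."""
    return [tok.text for tok in tokens if tok.tag_.startswith(prefix)]

print(with_tag_prefix(tokens, "VB"))  # ['jumped']
print(with_tag_prefix(tokens, "NN"))  # ['fox', 'dog', 'back']
print(with_tag_prefix(tokens, "JJ"))  # ['quick', 'brown', 'lazy']
```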