Words like "a" and "the" appear so frequently that they don't require tagging as thoroughly as nouns, verbs and modifiers. We call these stop words, and they can be filtered from the text to be processed. spaCy ships with a built-in set of English stop words (326 in the version used here; the exact count varies between releases).

import spacy
nlp = spacy.load('en_core_web_sm')

Print the set of spaCy's default stop words

print(nlp.Defaults.stop_words)

{'whereafter', 'her', 'although', 'across', 'make', 'be', 'wherein', 'hereby', 'latterly', 'last', 'ours', '‘ll', 'twelve', 'behind', 'they', 'did', 'herself', 'anyhow', 'namely', 'him', 'always', 'however', 'or', 'ever', 'ten', 'again', 'almost', 'has', 'formerly', 'hereafter', 'do', 'anyway', 'cannot', 'due', 'via', 'more', 'whatever', 'my', 'by', 'get', "'d", 'some', 'were', 'name', 'seeming', 'mostly', 'same', 'often', 'wherever', 'six', 'yours', 'keep', 'everywhere', 'here', 'thus', 'used', 'part', 'anything', 'over', 'for', 'upon', 'until', 'seem', 'onto', 'can', 'above', 'this', 'not', 'down', 'yet', 'something', 'further', 'amongst', 'whether', 'she', 'someone', 'if', 'beforehand', 'should', 'all', 'whence', 'becomes', 'one', '‘m', 'while', 'another', "'m", '‘ve', 'before', '‘s', 'anywhere', 'go', 'first', 'sixty', 'whereupon', 'does', 'your', 'nothing', 'least', 'but', 'various', 'anyone', 'been', 'might', 'though', 'why', 'say', 'see', 'both', 'beyond', 'somewhere', 'its', 'five', 'afterwards', 'seems', 'these', 'even', 'ourselves', 'still', 'nowhere', 'next', 'top', 'those', 'became', 'quite', 'yourself', 'well', 'fifteen', 'he', 'amount', 'moreover', 'other', "'re", 'much', 'nevertheless', 'themselves', 'must', 'mine', 'an', 'whose', 'thereupon', 'made', 'therein', 'own', 'whenever', 'except', 'to', '’ll', 'at', 'under', 'elsewhere', 'alone', 'eight', 'now', 'without', 'otherwise', 'empty', 'four', 'us', "'ve", 'then', 'eleven', 'too', 'n‘t', "'ll", 'thence', 'full', 'as', 'their', 'in', 'out', 'along', '’re', '’s', 'whither', 'whoever', 'put', 'since', 'itself', 'never', 'around', 'hence', 'during', 'everything', 'about', 'using', 'himself', 'yourselves', 'doing', 'against', 'also', 'being', 'which', 'we', 'with', 'side', 'besides', 're', 'very', 'forty', 'through', 'sometimes', 'unless', 'together', 'serious', 'sometime', 'move', 'up', 'throughout', 'would', 'latter', 'two', 'meanwhile', 'therefore', 'such', 'where', 'whereby', 'who', 'many', 'how', 
'thereby', 'twenty', 'from', 'back', 'into', 'the', 'else', 'herein', 'becoming', 'myself', 'nor', 'am', 'what', 'done', 'just', 'is', 'seemed', 'every', '’m', 'already', 'that', "n't", 'hundred', 'within', 'on', 'call', 'beside', 'whole', 'among', 'please', 'others', 'ca', '’ve', 'me', 'n’t', "'s", 'none', 'several', 'few', 'indeed', 'so', 'regarding', 'most', 'whom', 'whereas', '’d', 'nine', 'may', 'between', 'once', 'i', 'will', 'no', 'only', 'either', 'towards', 'was', 'our', 'a', 'three', 'had', 'thereafter', 'there', 'are', 'when', 'front', 'than', 'less', '‘d', 'nobody', 'per', 'take', 'neither', 'could', 'it', 'enough', 'after', 'off', 'hers', 'former', 'below', 'bottom', 'become', 'hereupon', 'noone', 'show', 'give', 'third', 'everyone', 'somehow', 'them', 'each', 'any', 'because', 'have', 'rather', 'really', 'and', 'you', 'of', '‘re', 'toward', 'his', 'perhaps', 'thru', 'fifty'}

len(nlp.Defaults.stop_words)

326

Check whether a word is a stop word

nlp.vocab['myself'].is_stop

True

nlp.vocab['elephant'].is_stop

False
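
The same `is_stop` flag is also available on each token, which is how stop words are usually filtered out of a text. The snippet below is a minimal sketch, not the only way to do it. It assumes `spacy.blank("en")` (a tokenizer-only English pipeline, chosen here so that no model download is needed) and an example sentence made up for illustration.

```python
import spacy

# Assumption: a blank English pipeline is enough here, because stop-word
# flags come from the language defaults rather than from a trained model.
nlp = spacy.blank("en")

# Illustrative sentence (not from the original notebook).
doc = nlp("Elephants often walk across the river at night.")

# Keep only tokens that are neither stop words nor punctuation.
content_words = [
    token.text for token in doc
    if not token.is_stop and not token.is_punct
]

print(content_words)
```

Filtering on `token.is_stop` rather than checking membership in `nlp.Defaults.stop_words` directly means any flags set by hand on individual lexemes are respected as well.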

Adding a stop word to the default list.

Step 1 - Add the word to the set of stop words. Use lowercase!

nlp.Defaults.stop_words.add('btw')

Step 2 - Set the stop word tag on the lexeme

nlp.vocab['btw'].is_stop = True

len(nlp.Defaults.stop_words)

327

nlp.vocab['btw'].is_stop

True

When adding stop words, always use lowercase. spaCy lowercases a token's text before checking it against the stop word set, so an entry containing uppercase letters would never match.

Removing a stop word from the default list.

Step 1 - Remove the word from the set of stop words

nlp.Defaults.stop_words.remove('beyond')

Step 2 - Remove the stop word tag from the lexeme

nlp.vocab['beyond'].is_stop = False

len(nlp.Defaults.stop_words)

326

nlp.vocab['beyond'].is_stop

False
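
Because adding and removing both take two steps (updating the set and updating the lexeme flag), it can be convenient to wrap them in one function so the two never drift apart. The helper below is a hypothetical sketch: `set_stop_word` is not part of spaCy, and `spacy.blank("en")` is used only as an assumption to keep the example free of a model download.

```python
import spacy

# Assumption: a blank English pipeline; the same calls work on a loaded model.
nlp = spacy.blank("en")


def set_stop_word(nlp, word, is_stop=True):
    """Hypothetical helper: keep the default set and the lexeme flag in sync."""
    word = word.lower()  # stop words are stored in lowercase
    if is_stop:
        nlp.Defaults.stop_words.add(word)
    else:
        # discard() does not raise if the word is already absent
        nlp.Defaults.stop_words.discard(word)
    # Update the flag on the lexeme itself as well
    nlp.vocab[word].is_stop = is_stop


set_stop_word(nlp, "BTW")                    # add, lowercased to 'btw'
set_stop_word(nlp, "beyond", is_stop=False)  # remove

added = nlp.vocab["btw"].is_stop
removed = nlp.vocab["beyond"].is_stop
print(added, removed)
```

Note that `nlp.Defaults.stop_words` is shared by the language class, so changes made this way affect every English pipeline created in the same Python session.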