Installing the PyPDF2 library

pip install PyPDF2
Collecting PyPDF2
  Downloading PyPDF2-2.4.0-py3-none-any.whl (197 kB)
Requirement already satisfied: typing-extensions in c:\users\vicky.crasto\anaconda3\lib\site-packages (from PyPDF2) (3.10.0.2)
Installing collected packages: PyPDF2
Successfully installed PyPDF2-2.4.0
Note: you may need to restart the kernel to use updated packages.
import PyPDF2

Working with PyPDF2

Reading a pdf

pdf1 = open("data_files/US_Declaration.pdf", 'rb')

Note: the mode 'rb' opens the file for reading in binary mode. PyPDF2 needs the raw bytes of the PDF, not decoded text.
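To see what binary mode gives us, the sketch below reads the first few bytes of a file and compares them with the header that well-formed PDF files normally start with (%PDF-). It uses only the standard library. The names looks_like_pdf and PDF_MAGIC are made up for this illustration and are not part of PyPDF2.

```python
# Bytes that a well-formed PDF file normally begins with.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF header bytes.

    Hypothetical helper for illustration; it only checks the header and
    does not validate the rest of the file.
    """
    # 'rb' returns bytes objects, so we compare against a bytes literal.
    with open(path, "rb") as handle:
        return handle.read(len(PDF_MAGIC)) == PDF_MAGIC
```

Calling looks_like_pdf("data_files/US_Declaration.pdf") should return True for the file used in this notebook, assuming it is a regular PDF.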

Creating a pdf reader instance

pdf_reader = PyPDF2.PdfFileReader(pdf1)

Number of pages in the pdf

pdf_reader.numPages
5

Extracting text from a page

page_one = pdf_reader.getPage(0)
page_one_text = page_one.extractText()
page_one_text
" Declaration of Independence\nIN CONGRESS, July 4, 1776.  \nThe unanimous Declaration of the thirteen united States of America, \nWhen in the Course of human events, it becomes necessary for one people to dissolve the\npolitical bands which have connected them with another, and to assume among the powers of the\nearth, the separate and equal station to which the Laws of Nature and of Nature's God entitle\nthem, a decent respect to the opinions of mankind requires that they should declare the causes\nwhich impel them to the separation. \nWe hold these truths to be self-evident, that all men are created equal, that they are endowed by\ntheir Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit\nof Happiness.— \x14That to secure these rights, Governments are instituted among Men, deriving\ntheir just powers from the consent of the governed,—  \x14That whenever any Form of Government\nbecomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to\ninstitute new Government, laying its foundation on such principles and organizing its powers in\nsuch form, as to them shall seem most likely to effect their Safety and Happiness. Prudence,\nindeed, will dictate that Governments long established should not be changed for light and\ntransient causes; and accordingly all experience hath shewn, that mankind are more disposed to\nsuffer, while evils are sufferable, than to right themselves by abolishing the forms to which they\nare accustomed. But when a long train of abuses and usurpations, pursuing invariably the same\nObject evinces a design to reduce them under absolute Despotism, it is their right, it is their duty,\nto throw off such Government, and to provide new Guards for their future security.— \x14Such has\nbeen the patient sufferance of these Colonies; and such is now the necessity which constrains\nthem to alter their former Systems of Government. 
The history of the present King of Great\nBritain is a history of repeated injuries and usurpations, all having in direct object the\nestablishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a\ncandid world. \nHe has refused his Assent to Laws, the most wholesome and necessary for the\npublic good.\nHe has forbidden his Governors to pass Laws of immediate and pressing\nimportance, unless suspended in their operation till his Assent should be obtained;\nand when so suspended, he has utterly neglected to attend to them.\nHe has refused to pass other Laws for the accommodation of large districts of\npeople, unless those people would relinquish the right of Representation in the\nLegislature, a right inestimable to them and formidable to tyrants only. \nHe has called together legislative bodies at places unusual, uncomfortable, and distant\nfrom the depository of their public Records, for the sole purpose of fatiguing them into\ncompliance with his measures."
pdf1.close()

Adding pages to a pdf file

Opening the pdf and extracting the first page

pdf2 = open("data_files/US_Declaration.pdf",'rb')
pdf_reader = PyPDF2.PdfFileReader(pdf2)
first_page = pdf_reader.getPage(0)

Creating a writer object

pdf_writer = PyPDF2.PdfFileWriter()
pdf_writer.addPage(first_page)
pdf_output = open("New_doc.pdf", 'wb')
pdf_writer.write(pdf_output)
pdf_output.close()
pdf2.close()

Checking the newly created document

pdf3 = open("New_doc.pdf", 'rb')
pdf_reader = PyPDF2.PdfFileReader(pdf3)
pdf_reader.numPages
1
pdf3.close()