In this short tutorial, I will walk you through how to split and merge PDF files using Python.

I once received a 20-page PDF bank statement and needed to forward just 3 of the pages to another party. I didn’t want to send the whole file because some pages contained personal information that I’m not comfortable sharing. Adobe Acrobat Pro DC allows you to split and merge PDF files, but at a cost of around $200 USD/year. No thanks!

As usual, I turned to Python for this situation. Who doesn’t love a free solution?

Install Python library and load a PDF file into Python

To work with PDF files, we’ll use the PyPDF4 library; run pip install PyPDF4 to get it. We’ll instantiate (read: create) a PdfFileReader object to represent the PDF file. Later, we’ll need to instantiate a PdfFileWriter object to save PDF files. To read files sitting on my computer, I like to use a raw string (r-string) because of its simple syntax: the backslashes in a Windows path don’t need to be escaped.

    from PyPDF4 import PdfFileReader, PdfFileWriter
    pdf = PdfFileReader(r'C:\Users\JZ\Desktop\PythonInOffice\split_and_merge_pdf\data.pdf')

Now we have an object called pdf that represents the actual PDF file, and we can access the information contained in it. In this example, I’m using the same WHO Covid report that I used in another tutorial (convert PDF to Excel using Python). Feel free to download the PDF to follow along.

Let’s check some basic info about this PDF file:

    pdf.getDocumentInfo()

(Figure: Use Python to extract basic PDF file info)

It looks like the author used MS Word to create this 12-page document and then converted it into PDF.

For demonstration, I’m going to pick some random pages to extract from the file. Let’s say I want to get only pages 1-3, 5, 6, and 11-12. We can construct a list to store the page numbers. A heads-up: we’ll have to slightly modify this list later on. We can use pdf.getPage() to get a specific page from the pdf object.
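As a rough sketch of where this is heading, the snippet below stores the pages mentioned above (1-3, 5, 6, and 11-12) in a list, shifts them to zero-based indices, and copies them into a new file. The helper names to_zero_based and extract_pages are made up for illustration. I’m also assuming that the list tweak hinted at above refers to getPage() counting pages from 0 rather than 1.

```python
def to_zero_based(pages_to_keep):
    """Convert 1-based page numbers (as a reader sees them) to 0-based indices."""
    return [n - 1 for n in pages_to_keep]


def extract_pages(src_path, dst_path, pages_to_keep):
    """Copy the selected pages of src_path into a new PDF at dst_path.

    Hypothetical helper, not part of PyPDF4 itself.
    """
    # Imported here so to_zero_based() can be tried without PyPDF4 installed.
    from PyPDF4 import PdfFileReader, PdfFileWriter

    with open(src_path, "rb") as src:
        pdf = PdfFileReader(src)
        writer = PdfFileWriter()
        for index in to_zero_based(pages_to_keep):
            # getPage() expects a 0-based index (assumption stated above).
            writer.addPage(pdf.getPage(index))
        # Write while the source file is still open, since pages are read lazily.
        with open(dst_path, "wb") as dst:
            writer.write(dst)


# Pages 1-3, 5, 6, and 11-12, numbered the way a reader would count them.
pages_to_keep = [1, 2, 3, 5, 6, 11, 12]
print(to_zero_based(pages_to_keep))  # [0, 1, 2, 4, 5, 10, 11]
```

To try it on the report, call extract_pages() with your own source and output paths plus the pages_to_keep list. Keeping the list in reader-friendly numbering and converting at the last moment makes it easier to check against what you see in a PDF viewer.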