Extract PDF to Text File With Python

Today we are going to create a PDF-to-text extractor.

As always, first we need to install and import the package. We are going to use the PyPDF2 package. Text extraction is all we need here, but the package can do more: it can also be used for generating, decrypting and merging PDF files.

To install this package, run the following command in your terminal.

pip install PyPDF2

# importing required modules (written for PyPDF2 3.x: PdfFileReader, numPages,
# getPage() and extractText() were removed in 3.0 in favour of the names below)
import PyPDF2

# opening example.pdf in binary mode as pdfFileObj; the with block closes it automatically
with open('example.pdf', 'rb') as pdfFileObj:
    # creating a PDF reader object
    pdfReader = PyPDF2.PdfReader(pdfFileObj)

    # printing the number of pages in the PDF file
    print(len(pdfReader.pages))

    # getting a PageObject for the first page (pages are indexed from 0)
    pageObj = pdfReader.pages[0]

    # extracting the text from the page with extract_text()
    pageText = pageObj.extract_text()
    print(pageText)

# saving the extracted text to a .txt file; UTF-8 avoids encoding errors on non-ASCII text
with open('output_file.txt', 'w', encoding='utf-8') as the_file:
    the_file.write(pageText)
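The script above only extracts the first page. Below is a minimal sketch that writes the text of every page to a single file. It assumes PyPDF2 3.x (the `PdfReader`, `pages` and `extract_text()` API). The helper names `join_page_texts` and `pdf_to_txt` are hypothetical, and separating pages with a blank line is an arbitrary choice.

```python
from pathlib import Path


def join_page_texts(page_texts, separator="\n\n"):
    """Join per-page strings into one document, keeping page order.

    Pages without a text layer (for example scanned images) usually yield an
    empty string; `text or ""` also guards against a missing (None) result.
    """
    return separator.join(text or "" for text in page_texts)


def pdf_to_txt(pdf_path, txt_path):
    """Extract the text of every page in pdf_path and save it to txt_path.

    Assumes PyPDF2 >= 3.0. Returns the number of pages processed.
    """
    # Imported inside the function so join_page_texts() can be used
    # even where PyPDF2 is not installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    text = join_page_texts(page.extract_text() for page in reader.pages)
    # Write as UTF-8 so non-ASCII characters do not fail on platforms
    # whose default encoding is not UTF-8.
    Path(txt_path).write_text(text, encoding="utf-8")
    return len(reader.pages)
```

For example, `pdf_to_txt('example.pdf', 'output_file.txt')` would write all pages of `example.pdf` to `output_file.txt` and return the page count. Keeping the joining logic in its own function makes it easy to change the page separator (for instance to a form feed, `"\f"`) without touching the PDF-reading code.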
