Extract PDF to Text File With Python

Today we are going to create a PDF to Text extractor.

As always, we first need to install and import the package. We are going to use the PyPDF2 package. It does more than we need here: besides text extraction, it can also be used for generating, decrypting, and merging PDF files.

To install this package, run the command below in the terminal.

pip install PyPDF2

# importing required modules
import PyPDF2

# opening example.pdf in binary mode; the file object is stored as pdfFileObj
pdfFileObj = open('example.pdf', 'rb')

# creating a pdf reader object
# (PdfReader replaces PdfFileReader, which was removed in PyPDF2 3.0)
pdfReader = PyPDF2.PdfReader(pdfFileObj)

# printing the number of pages in the pdf file
print(len(pdfReader.pages))

# getting the first page (index 0) as a PageObject
pageObj = pdfReader.pages[0]

# extracting text from the page with the extract_text() method
text = pageObj.extract_text()
print(text)

# saving the extracted text to a .txt file
with open('output_file.txt', 'w', encoding='utf-8') as the_file:
    the_file.write(text)

# closing the pdf file object
pdfFileObj.close()
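
The script above only reads the first page. If you want the text of the whole document, one possible approach is to loop over every page and join the results. The sketch below assumes PyPDF2 2.0 or later (the PdfReader, pages, and extract_text() names). The helper names pages_to_text and pdf_to_text_file are made up for this example and are not part of PyPDF2.

```python
from pathlib import Path


def pages_to_text(pages):
    """Join the text of every page, separating pages with a blank line."""
    # Pages without a text layer (for example scanned images) may yield
    # an empty string, so fall back to "" to keep the join safe.
    return "\n\n".join((page.extract_text() or "") for page in pages)


def pdf_to_text_file(pdf_path, txt_path):
    """Write the text of all pages in pdf_path to txt_path.

    Returns the number of pages that were read.
    """
    # Imported here so that pages_to_text() can be used on its own
    # even when PyPDF2 is not installed.
    import PyPDF2

    # The with-block closes the PDF file for us, even if an error occurs.
    with open(pdf_path, "rb") as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        text = pages_to_text(reader.pages)
        page_count = len(reader.pages)

    Path(txt_path).write_text(text, encoding="utf-8")
    return page_count
```

With this in place, calling pdf_to_text_file('example.pdf', 'output_file.txt') should write all pages to output_file.txt and return the page count. Keep in mind that the quality of the extracted text depends on how the PDF was produced, so treat this as a starting point rather than a finished tool.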
