Convert Speech to Text with Python Source Code for Beginners

A few weeks ago, we created a Text to Speech project in Python. Today we are going to do the opposite and create a Speech to Text script.
This script converts the spoken language in an audio file into written text, which makes it useful for dictation, transcription, and voice-controlled applications. It relies on the SpeechRecognition library, whose recognize_google() method sends the audio to Google's Web Speech API, so an internet connection is required. The script is easy to set up, customize, and integrate into your projects, and it supports multiple languages through the language argument of recognize_google() (the default is "en-US").

To get started, let’s install the required modules:

pip3 install SpeechRecognition pydub

Make sure you have a WAV audio file named speech.wav in the current directory, since that is the file name the script below opens.
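Before running the script, it can help to confirm that the file really is a readable WAV. The following is a minimal sketch that uses only Python's standard-library wave module. The helper name describe_wav is an assumption made for illustration and is not part of SpeechRecognition.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    # wave.open raises wave.Error if the file is not an uncompressed PCM WAV
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav_file.getsampwidth(),
            "duration_seconds": frames / rate,
        }


if __name__ == "__main__":
    # "speech.wav" is the file name used by the script in this tutorial
    try:
        print(describe_wav("speech.wav"))
    except (FileNotFoundError, wave.Error) as error:
        print("Cannot use speech.wav:", error)
```

If the file is missing or is not a PCM WAV, the sketch prints the reason instead of failing later inside the recognizer.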

# Speech to Text Converter with Python

import speech_recognition as sr


filename = "speech.wav"

# initialize the recognizer
r = sr.Recognizer()

# open the file
with sr.AudioFile(filename) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    print(text)

You can use this script for small or medium-sized audio files. For larger files, use the following script, which splits the audio wherever it finds silence and transcribes each chunk separately:

# importing libraries 
import speech_recognition as sr 
import os 
from pydub import AudioSegment
from pydub.silence import split_on_silence

# create a speech recognition object
r = sr.Recognizer()

# a function that splits the audio file into chunks
# and applies speech recognition
def get_large_audio_transcription(path):
    """
    Splitting the large audio file into chunks
    and apply speech recognition on each of these chunks
    """
    # open the audio file using pydub
    sound = AudioSegment.from_wav(path)  
    # split the audio where silence lasts 500 milliseconds or more and get chunks
    chunks = split_on_silence(sound,
        # experiment with this value for your target audio file
        min_silence_len = 500,
        # adjust this per requirement
        silence_thresh = sound.dBFS-14,
        # keep 500 ms of silence at each end of a chunk, adjustable as well
        keep_silence=500,
    )
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk 
    for i, audio_chunk in enumerate(chunks, start=1):
        # export audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        with sr.AudioFile(chunk_filename) as source:
            audio_listened = r.record(source)
            # try converting it to text
            try:
                text = r.recognize_google(audio_listened)
            except sr.UnknownValueError:
                # raised when no speech could be recognized in this chunk
                print("Could not understand", chunk_filename)
            else:
                text = f"{text.capitalize()}. "
                print(chunk_filename, ":", text)
                whole_text += text
    # return the text for all chunks detected
    return whole_text
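
The function above only returns the text, so a caller still has to invoke it and decide what to do with the result. As a rough sketch, the transcript could be written to a text file next to the audio. The helper save_transcript and the file name long-speech.wav are assumptions made for illustration and do not come from the original script.

```python
from pathlib import Path


def save_transcript(text, audio_path):
    """Write `text` to a .txt file next to `audio_path` (hypothetical helper)."""
    out_path = Path(audio_path).with_suffix(".txt")
    # strip the trailing space left after the last chunk and end with a newline
    out_path.write_text(text.strip() + "\n", encoding="utf-8")
    return out_path


# Assumed usage, with get_large_audio_transcription() from the script above
# already defined and "long-speech.wav" standing in for your own file:
#
#     text = get_large_audio_transcription("long-speech.wav")
#     print("Saved to", save_transcript(text, "long-speech.wav"))
```

Keeping the saving step separate from the recognition step means the transcription function stays unchanged and the output location is easy to adjust.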
