DIY Voice-Controlled GPT Chat: A Step-by-Step Guide for Beginners

8 min read · Jun 7, 2023

Effortless Voice-to-Text Communication with GPT Chat on Telegram

TL;DR: Feel free to copy this Pipedream workflow by simply clicking on this URL.

Imagine the convenience of opening Telegram, accessing a chat, and simply dictating voice messages to receive instant text responses from a GPT chat. This intelligent chatbot even remembers your conversation until you decide to reset the message history using a special command. In this article, Kirill Markin, founder of ozma.io, provides a step-by-step guide to building this experience.

Using a voice-controlled GPT chat goes beyond casual conversations. It is particularly useful for tasks like drafting articles or composing emails, especially when working in a non-native language. Just express the core message through dictation, and with a few iterations, the GPT chat will help you enhance and refine the text to achieve a polished final result.

Step 1: Creating a Telegram Bot and Setting Up Pipedream as the Data Source

To receive responses from a GPT chat when sending a voice message on Telegram, we need to establish a “brain” for our Telegram bot. This “brain” should be capable of processing our message, converting it to text, sending it to the GPT chat, incorporating our conversation history, and relaying the GPT chat’s response. Pipedream, a powerful tool for creating automated workflows, will assist us in achieving this automation.

Our first task is to set up a new data source, called “Source,” in Pipedream. This Source will trigger the automation each time our Telegram bot receives a new message. By configuring this Source in Pipedream, we lay the foundation for a smooth and user-friendly voice-to-text communication system within Telegram, leveraging the advanced capabilities of GPT chat technology.
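To make the trigger data concrete, here is a rough sketch of what a voice-message update from Telegram looks like once it reaches the workflow. The field names follow the Telegram Bot API, but every value is made up, and classifyUpdate is a hypothetical helper added only for illustration. It is not part of the workflow itself.

```javascript
// Illustrative Telegram update for a voice message.
// Field names follow the Telegram Bot API; all values are invented.
const sampleUpdate = {
  update_id: 1,
  message: {
    message_id: 10,
    date: 1686130000,
    chat: { id: 12345, type: "private" },
    voice: {
      file_id: "EXAMPLE_FILE_ID",
      duration: 4,
      mime_type: "audio/ogg",
    },
  },
};

// Hypothetical helper: sort an update into the cases the workflow cares about.
function classifyUpdate(update) {
  const message = update.message;
  if (!message) return "ignored"; // e.g. edited messages
  if (typeof message.text === "string" && message.text.startsWith("/")) {
    return "command";
  }
  if (message.voice && message.voice.file_id) return "voice";
  return "ignored";
}

console.log(classifyUpdate(sampleUpdate)); // prints "voice"
```

In Pipedream, this object is what the later steps read as steps.trigger.event.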

Step 2: Configuring a Pipedream Workflow for Audio-to-Text Conversion

In Step 2, our objective is to automate the conversion of voice messages into text. To accomplish this, we will create a new workflow in Pipedream triggered by the previously configured Source whenever a new message is received.

// stop_if_not_file_or_command

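export default defineComponent({
  async run({ steps, $ }) {
    const message = steps.trigger.event.message;
    // Some updates (for example, edited messages) carry no `message` field
    if (!message) {
      return $.flow.exit("update has no message");
    }
    // Check for commands first: a text command never contains a voice recording
    if (typeof message.text === "string" && message.text.startsWith("/")) {
      return $.flow.exit("message.text is a command");
    }
    if (!message.voice || !message.voice.file_id) {
      return $.flow.exit("message.voice.file_id does not exist");
    }
  },
})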

First, we add a small piece of code that stops the workflow early when a message is a command or does not contain a voice recording, so the remaining steps run only when they are needed. Next, we download the audio file sent on Telegram. The trigger gives us only the file ID, so we must retrieve the actual file before proceeding.

// download_the_file

import fs from "fs";

import TelegramBot from "node-telegram-bot-api";

export default defineComponent({
  props: {
    telegram_bot_api: {
      type: "app",
      app: "telegram_bot_api",
    }
  },
  async run({ steps, $ }) {
    // The bot token (issued by @BotFather) comes from the connected Telegram Bot account
    const token = this.telegram_bot_api.$auth.token;
    const fileId = steps.trigger.event.message.voice.file_id;

    const bot = new TelegramBot(token);

    // Save the voice message under /tmp, which workflow steps can write to
    const fileLocalPath = `/tmp/${fileId}.oga`;
    const fileWriter = fs.createWriteStream(fileLocalPath); // stream for writing to the file

    // Streams are event-based, so wrap the download in a promise to use await
    const downloadToFile = () => {
      return new Promise((resolve, reject) => {
        const stream = bot.getFileStream(fileId); // stream of the file's bytes
        stream.on('error', reject);
        fileWriter.on('error', reject);
        // 'finish' fires only after all data has been flushed to the file
        fileWriter.on('finish', resolve);
        stream.pipe(fileWriter); // copy chunk by chunk, then close the writer
      })
    }
    console.log('Start file downloading and saving');
    await downloadToFile();
    console.log('File saved');

    return fileLocalPath
  },
})

Once we have the audio file, we convert it from OGA to MP3, since OpenAI’s transcription service did not accept OGA files at the time of writing. With the MP3 file ready, the final step in this process is to send it to OpenAI, which returns a text transcript of the voice message.

// oga_to_mp3

import ffmpeg from "fluent-ffmpeg"; 
import ffmpegInstaller from "@ffmpeg-installer/ffmpeg"; 

export default defineComponent({ 
  async run({ steps, $ }) { 
    // Set up ffmpeg with the installed package 
    ffmpeg.setFfmpegPath(ffmpegInstaller.path); 
    
    const inputPath = steps.download_the_file.$return_value
    const outputPath = "/tmp/output.mp3"; 
    
    // Convert the OGA file to MP3 using ffmpeg 
    await new Promise((resolve, reject) => { 
      ffmpeg(inputPath) 
        .output(outputPath) 
        .on("end", resolve) 
        .on("error", reject) 
        .run(); 
    }); 
    
    // Return the path to the saved MP3 file 
    return outputPath; 
  }, 
});
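The transcription step itself is not listed in this article. As a rough sketch of what it does under the hood, the function below builds the HTTP request for OpenAI's audio transcription endpoint with the whisper-1 model. It assumes Node.js 18 or newer (for the global FormData and Blob), and buildTranscriptionRequest is a hypothetical name. The actual workflow may rely on a prebuilt Pipedream action instead.

```javascript
// Sketch: build the request for OpenAI's audio transcription endpoint.
// Assumes Node.js 18+ where FormData and Blob are available globally.
function buildTranscriptionRequest(audioBytes, apiKey) {
  const form = new FormData();
  // The file name matters: the API uses its extension to detect the format
  form.append("file", new Blob([audioBytes], { type: "audio/mpeg" }), "output.mp3");
  form.append("model", "whisper-1");
  return {
    url: "https://api.openai.com/v1/audio/transcriptions",
    options: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}

// Possible usage inside a step (not executed here):
//   const bytes = await fs.promises.readFile("/tmp/output.mp3");
//   const { url, options } = buildTranscriptionRequest(bytes, apiKey);
//   const { text } = await (await fetch(url, options)).json();
```

Keeping the request construction separate from the network call makes the step easy to check locally without spending API credits.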

By the end of Step 2, our Pipedream workflow will have seamlessly converted the voice message into text, laying the foundation for further integration of the GPT chat in the subsequent steps.

Step 3: Enhancing the Workflow for GPT Chat Integration and Data Storage

Step 3 focuses on integrating the GPT chat into our workflow and retrieving the conversation history to generate context-aware responses. We begin by accessing our conversation history stored in Pipedream’s Data Stores, ensuring that our GPT chat has the necessary context to provide meaningful replies.
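The get_history step is not listed in this article, so here is a minimal sketch of how such a lookup could work. It assumes the history is stored as an array of { role, content } objects under a key derived from the chat ID. Both the key format and the helper names are assumptions for illustration. The store argument is anything with async get and set methods, which is the shape a Pipedream data store prop exposes.

```javascript
// Sketch of a history loader. `store` is any object with an async get(key).
// The "history:<chatId>" key format is an assumption for this example.
async function loadHistory(store, chatId) {
  const saved = await store.get(`history:${chatId}`);
  // A chat with no saved history yet starts from an empty conversation
  return Array.isArray(saved) ? saved : [];
}

// In-memory stand-in for a data store, for local experiments only.
function createMemoryStore() {
  const data = new Map();
  return {
    get: async (key) => data.get(key),
    set: async (key, value) => {
      data.set(key, value);
    },
  };
}
```

Returning an empty array for unknown chats means the next step can always prepend the hidden prompt without checking for missing data.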

Next, we add a hidden prompt to our dialogue to improve the flow of the conversation. If the conversation becomes too long and exceeds the GPT chat’s memory limit, we will need to reduce the number of messages following the prompt to maintain optimal performance.

// add_hidden_start_of_conversation

const prompt = (
`Act as assistant
Your name is Donna
You are female
You should be friendly
You should not use official tone
Your answers should be simple, and laconic but informative
Before providing an answer check information above one more time
Try to solve tasks step by step
I will send you questions or topics to discuss and you will answer me
`);

export default defineComponent({
  async run({ steps, $ }) {
    const messages = steps.get_history.$return_value;

    // Define new messages
    const newMessages = [
      {
        role: 'user', 
        content: prompt,
      },
    ];

    // Prepend new messages to the existing messages array
    return [...newMessages, ...messages];
  },
})

Before sending anything, we trim the history so that it stays under a fixed character budget (15,000 characters in the code below), dropping older messages one at a time. Once the conversation is properly formatted, we send it to the GPT chat, which generates a response based on the input and conversation history. Finally, we receive the GPT chat’s response and send it back to Telegram, completing the integration of the voice-controlled GPT chat.

// decrease_history_tokens

const maxCharsForDialog = 15000;

const charsForDialog = (messages) => {
  let result_string = ''; // start from an empty string so the length is not off by one
  for (let i = 0; i < messages.length; i++) {
    result_string += messages[i]["content"] + ' ';
  }
  return result_string.length;
};

export default defineComponent({
  async run({ steps, $ }) {
    const messages = steps.add_hidden_start_of_conversation.$return_value;
    let messagesLen = charsForDialog(messages);
    console.log("texts from raw messages length: ", messagesLen);
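    // Index 0 is the hidden prompt. Stop once nothing is left to remove at index 2,
    // otherwise splice() would do nothing and the loop would never end
    while (messagesLen > maxCharsForDialog && messages.length > 2) {
      // Remove 1 message
      messages.splice(2, 1);
      messagesLen = charsForDialog(messages);
    }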
    console.log("texts from cleaned messages length: ", charsForDialog(messages));
    return messages;
  },
})

This step ensures that our GPT chat provides relevant and contextually appropriate responses, enhancing the overall user experience.
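The send-and-reply part of this step is not listed in the article. The sketch below shows roughly what the two requests look like: one to OpenAI's chat completions endpoint and one to Telegram's sendMessage method. The function names and the default model name are assumptions, and the real workflow may use prebuilt Pipedream actions for both calls.

```javascript
// Append the freshly transcribed user message to the trimmed history.
function withNewUserMessage(messages, transcript) {
  return [...messages, { role: "user", content: transcript }];
}

// Sketch: request for OpenAI's chat completions endpoint.
// The default model name is an assumption; use whichever model you have access to.
function buildChatRequest(messages, apiKey, model = "gpt-3.5-turbo") {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Sketch: request for the Telegram Bot API sendMessage method.
function buildTelegramReply(botToken, chatId, text) {
  return {
    url: `https://api.telegram.org/bot${botToken}/sendMessage`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ chat_id: chatId, text }),
    },
  };
}
```

The assistant's reply arrives in the chat completions response at choices[0].message.content, and that string is what gets passed on as the text of the Telegram reply.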

Step 4: Storing New Messages in Pipedream’s Data Stores for Future Conversations

In the final step of our workflow, we focus on storing the latest messages in our conversation history. This ensures that future GPT chat responses are informed by these exchanges. To achieve this, we save both the user’s message and the GPT chat’s response.

First, we convert each message into a string using two small code steps, user_message_to_str and assistant_message_to_str. This ensures that our conversation data is in a suitable format for storage and future use.

// user_message_to_str

export default defineComponent({
  async run({ steps, $ }) {
    return JSON.stringify(steps.create_transcription.$return_value.transcription)
  },
})

Next, we add the converted messages to the appropriate conversation history in Pipedream’s Data Stores. By doing this, we enable the GPT chat to reference these messages when generating new responses, resulting in more contextually accurate replies.

With these steps completed, our voice-controlled GPT chat workflow is now fully functional and ready for use. It delivers a seamless and efficient user experience.

// assistant_message_to_str

export default defineComponent({
  async run({ steps, $ }) {
    return JSON.stringify(steps.chat.$return_value.generated_message.content)
  },
})
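The step that writes to the Data Store is not listed above. As a rough sketch, saving one exchange and handling the reset command could look like the code below. It assumes the history is kept as an array of { role, content } objects under a per-chat key. That layout, the key format, and the function names are assumptions for illustration, and the store argument is any object with async get and set methods, such as a Pipedream data store prop.

```javascript
// Sketch: append one user/assistant exchange to the stored history.
// The "history:<chatId>" key format is an assumption for this example.
async function appendExchange(store, chatId, userText, assistantText) {
  const key = `history:${chatId}`;
  const saved = await store.get(key);
  const history = Array.isArray(saved) ? saved : [];
  history.push(
    { role: "user", content: userText },
    { role: "assistant", content: assistantText }
  );
  await store.set(key, history);
  return history;
}

// Sketch: clear the history, e.g. when the user sends the reset command.
async function resetHistory(store, chatId) {
  await store.set(`history:${chatId}`, []);
}
```

The hidden prompt is not stored here, because the workflow prepends it each time the history is read.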

Step 5: Enjoying the Results of Your Voice-Controlled GPT Chat

As we conclude this step-by-step guide, let’s reflect on what we have achieved. We have successfully created a seamless and efficient voice-controlled GPT chat integration within Telegram using Pipedream to automate the entire process. This powerful tool can assist us in various ways, from drafting articles to composing emails.

Real-life use cases for our voice-controlled GPT chat include writing articles and crafting emails through an iterative process. By providing initial input and offering feedback for necessary revisions, we can efficiently create well-crafted messages with ease. Although attempts to conduct strategic marketing sessions using this bot have not yet yielded significant results, the potential for easy, voice-based communication is evident.

In conclusion, we encourage you to set up your own voice-controlled GPT chat using this guide. If you encounter any difficulties or need further assistance, please feel free to reach out through my website or the provided contact information. Give it a try and explore the possibilities of enhancing your communication experience with this innovative tool!

Feel free to copy this Pipedream workflow by simply clicking on this URL.