Saturday, May 16, 2020

Creating a simple assistant: how can I use SpeechRecognition in Python to provide voice input for the example below?

This is a loaded question, but I have tried to make it as detailed as possible. Any help is greatly appreciated.

This is the older code, which used Google speech recognition from Python...

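import speech_recognition as sr

# speak() (text-to-speech output) is defined elsewhere in the script.
def takeCommand():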
    r = sr.Recognizer()

    with sr.Microphone() as source:

        print("Listening...")
        r.pause_threshold = 0.5  # seconds of silence that end a phrase
        audio = r.listen(source)

    try:
        print("Recognizing...")
        query = r.recognize_google(audio, language='en-us')
        print(f"User said: {query}\n")

    except Exception as e:
        print(e)
        speak("I can't hear you sir.")
        print("I can't hear you sir.")
        return "None"

    return query

These are some commands...

        query = takeCommand().lower()

        # Every command spoken by the user is
        # stored in 'query' and converted to
        # lower case so that commands are
        # easy to match
        if 'wikipedia' in query:
            speak('Searching Wikipedia...')
            query = query.replace("wikipedia", "")
            results = wikipedia.summary(query, sentences=3)
            speak("According to Wikipedia")
            print(results)
            speak(results)

        elif 'open youtube' in query:
            speak("Here you go to Youtube\n")
            webbrowser.open("https://youtube.com")

        elif 'open google' in query:
            speak("Here you go to Google\n")
            webbrowser.open("https://google.com")

        elif 'open stackoverflow' in query:
            speak("Here is Stack Over Flow. Be sure to give me an upgrade!")
            webbrowser.open("https://stackoverflow.com")

Now I am using IBM Cloud for a more advanced voice. Here is my current code, which speaks to me when run from the PyCharm IDE...

from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson import TextToSpeechV1
import vlc
import time

authenticator = IAMAuthenticator("API KEY")
text_to_speech = TextToSpeechV1(
    authenticator=authenticator
)

text_to_speech.set_service_url(
    'URL HERE')

with open('hello_world.wav', 'wb') as audio_file:
    audio_file.write(
        text_to_speech.synthesize(
            'Hello world',
            voice='en-US_AllisonV3Voice',
            accept='audio/wav'
        ).get_result().content)

# The with block has ended, so the wav file is closed and fully written
# before it is played.
instance = vlc.Instance()

# Create a MediaPlayer with the default instance
player = instance.media_player_new()

# Load the media file
media = instance.media_new('hello_world.wav')

# Add the media to the player
player.set_media(media)

# Play for 10 seconds then exit
player.play()
time.sleep(10)

It plays the audio file generated by IBM from within the IDE, then sleeps for 10 seconds so the audio can play before the script exits.
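
The fixed time.sleep(10) cuts off longer clips and wastes time on shorter ones. A possible alternative, sketched here under assumptions, is to poll the player until it stops. wait_for_playback is a hypothetical helper, not part of any library; it relies only on the player's is_playing() method, which python-vlc's MediaPlayer provides, and the timeout values are arbitrary.

```python
import time


def wait_for_playback(player, start_timeout=2.0, max_duration=60.0, poll=0.1):
    """Block until player stops playing; return True if playback was seen to start.

    player is any object with an is_playing() method, such as a python-vlc
    MediaPlayer on which play() has already been called.
    """
    # play() returns before audio actually starts, so first wait for it to begin.
    deadline = time.monotonic() + start_timeout
    while not player.is_playing():
        if time.monotonic() >= deadline:
            return False  # playback never started (or the clip was extremely short)
        time.sleep(poll)

    # Then wait until the player reports that it is no longer playing,
    # giving up after max_duration seconds.
    deadline = time.monotonic() + max_duration
    while player.is_playing() and time.monotonic() < deadline:
        time.sleep(poll)
    return True
```

With this in place, the last two lines of the script above could become player.play() followed by wait_for_playback(player) instead of time.sleep(10).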

I think I need to do something like this...

with open('hello_world.wav', 'wb') as audio_file:
    audio_file.write(
        text_to_speech.synthesize(

Then a voice input would trigger that code with a different wav file name for each command. For example, if "" is said, run the IBM text to speech and play the corresponding audio file.

How can I build a system like the one above so that, when I speak a specific command, it generates and plays a different audio file? All the code needed to reproduce this is above. Thank you to anyone who can help me structure this better.
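
One way this could be structured, as a rough sketch rather than a definitive answer: keep a table that maps each trigger phrase to the text to speak and a wav file name, and look the recognized query up in that table. The names RESPONSES, find_response, and respond are hypothetical, and the phrases and file names in the table are made up for illustration. synthesize and play stand in for the IBM text_to_speech.synthesize(...) call and the VLC playback code; they are passed in as plain functions so the lookup logic can be tried without a microphone or an API key.

```python
import os

# Hypothetical table: trigger phrase -> (text to speak, wav file name).
RESPONSES = {
    "hello": ("Hello, how can I help?", "hello.wav"),
    "open youtube": ("Here you go to YouTube", "open_youtube.wav"),
    "open google": ("Here you go to Google", "open_google.wav"),
}


def find_response(query, responses=RESPONSES):
    """Return (text, filename) for the first trigger phrase found in query, else None."""
    query = query.lower()
    for phrase, response in responses.items():
        if phrase in query:
            return response
    return None


def respond(query, synthesize, play, responses=RESPONSES):
    """Generate (if needed) and play the audio for a recognized query.

    synthesize(text) must return wav bytes; play(filename) must play the file.
    Returns the wav file name used, or None if no phrase matched.
    """
    match = find_response(query, responses)
    if match is None:
        return None
    text, filename = match
    # Only call the text-to-speech service if the file was not generated before.
    if not os.path.exists(filename):
        with open(filename, "wb") as audio_file:
            audio_file.write(synthesize(text))
    # The file is closed at this point, so it is safe to hand it to the player.
    play(filename)
    return filename
```

With the code from the question, synthesize could wrap text_to_speech.synthesize(text, voice='en-US_AllisonV3Voice', accept='audio/wav').get_result().content, play could wrap the VLC lines, and the main loop would call respond(takeCommand(), synthesize, play). Caching by file name means each phrase is sent to IBM only once.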
