Speech Engineer

Job Category: Software Development
Job Type: Full Time
Job Location: Bengaluru

Gnani.ai aims to empower enterprises with AI-based speech technology.

Gnani.ai is an AI-based speech recognition and NLP startup working on voice-based solutions for large businesses. AI is the biggest innovation disrupting the market, and we are at the heart of this disruption. We are funded by one of the largest global conglomerates and backed by a number of market leaders in the tech industry.

We are working with some of the largest companies in the banking, insurance, e-commerce, and financial services sectors and we are not slowing down. With aggressive expansion plans, Gnani.ai aims to be the leader in the global market for voice-based solutions.

Gnani.ai is building the future of voice-based business solutions. If you are fascinated by AI and would like to work on the latest AI technologies in a high-intensity, fast-growing, and flexible work environment with immense growth opportunities, come and join us. We are looking for hard workers who are ready to take on big challenges.

Speech Engineer

Gnani.ai is looking to hire Speech Engineers with 2 to 4 years of experience.


• Develop ASR engines using frameworks such as ESPnet, FairSeq, Athena, or Deep Speech, built on PyTorch, TensorFlow, or Kaldi.
• Work on speech technologies such as multilingual ASR, contextual biasing, text-to-speech, voice biometrics, and speaker separation.
• Help define the technology required beyond the core speech engine and design the integration of these technologies.
• Improve the adaptation of models to multiple domains and channels.

Desired experience:

•  Freshers should have projects that align with our domain.
•  Good understanding of signal processing and machine learning (ML) tools.
•  Should be well versed in classical speech processing methodologies such as hidden Markov models (HMMs), Gaussian mixture models (GMMs), artificial neural networks (ANNs), and language modeling.
•  Experience with low-latency processing and optimization techniques.
•  Understanding of traditional speech decoders.
•  Hands-on experience with current deep learning (DL) techniques used for speech processing, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory (LSTM), connectionist temporal classification (CTC), and Transformers, is essential.
•  The candidate should have hands-on experience with at least one end-to-end ASR toolkit such as ESPnet, FairSeq, Athena, or Deep Speech.
•  Hands-on experience with PyTorch, TensorFlow, and Kaldi is desirable.
•  Experience with techniques for resolving accuracy issues and handling multiple types of noise.
•  Ability to implement recipes using scripting languages such as Bash.
•  Ability to develop applications using languages such as Python and C++.
•  MLOps and basic Docker knowledge are good to have.


•  ML, knowledge of speech recognition frameworks such as ESPnet, FairSeq, Athena, or Deep Speech, and hands-on experience with deep learning (DL) techniques such as CNNs, RNNs, and LSTMs.
•  Grapheme-to-phoneme (G2P) conversion, both DL-based and non-DL-based, is good to have.
•  Far-field speech technology (speech recognition, voice biometrics) using beamforming techniques.
