- Aishwarya Ahuja
- Palak Jhamnani
- Paridhi Jain
- Shruti Chouhan
- Shoilayee Chaudhuri
- Vidhi Patidar
We aimed to develop an AI-based interviewer capable of asking topic-specific questions, analyzing the user's audio responses, and providing feedback. Additionally, it corrects grammatical errors and offers a reference answer to the user.
We scraped PDFs and websites containing interview questions and answers across various domains. This process produced a dataset of 3,048 data points, structured as follows:
We explored several text-generation models, including Alpaca Large, Mixtral-8x7B (Mistral AI), Llama, and Google Gemma. Because of computational limitations and model sizes, we chose Llama-2-7b-chat-hf. This choice was driven by its:
- Demonstrably good performance
- Efficient training times
- Ability to deliver results quickly
The script imports necessary libraries such as os, tqdm, torch, librosa, and gradio for various tasks including file operations, progress tracking, numerical computations, audio processing, and visualization.
- Launching Gradio with Wav2Vec2 Model: Gradio was launched with a Wav2Vec2 model from Hugging Face for audio recording.
- Loading Pre-trained Whisper Model: We loaded a pre-trained Whisper model, which is a speech recognition model. This model transcribes audio files later in the script.
- Calculating Speaking Pace: Speaking pace (words per minute) was calculated based on the duration of each audio file.
- Providing Feedback: After processing all audio files, the script prints the collected speaking paces and offers feedback to the user by comparing them with the ideal speaking pace (140-160 words per minute).
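The pace calculation and feedback steps above can be sketched as follows. This is a minimal illustration, not the project's actual script: the function names `words_per_minute` and `pace_feedback` are hypothetical, the transcript is assumed to come from the speech-to-text step, and the duration in seconds is assumed to have been measured already (for example with librosa).

```python
def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Return the speaking pace in words per minute for one recording."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Whitespace splitting is a simple word-count approximation.
    word_count = len(transcript.split())
    return word_count * 60.0 / duration_seconds


def pace_feedback(wpm: float, low: float = 140.0, high: float = 160.0) -> str:
    """Compare a pace against the ideal 140-160 words-per-minute range."""
    if wpm < low:
        return f"{wpm:.0f} wpm: slower than the ideal range, try speaking a little faster."
    if wpm > high:
        return f"{wpm:.0f} wpm: faster than the ideal range, try slowing down."
    return f"{wpm:.0f} wpm: within the ideal range."


# Hypothetical example: a 30-word answer spoken in 12 seconds.
sample_transcript = " ".join(["word"] * 30)
print(pace_feedback(words_per_minute(sample_transcript, 12.0)))
```

Keeping the range bounds as parameters makes it easy to adjust the target pace without touching the comparison logic.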
We used Whisper, a state-of-the-art speech-to-text conversion tool, to transcribe spoken responses into text format. Whisper provides high accuracy and robustness, meeting our project's needs and enabling the system to process spoken inputs for analysis.
Following speech-to-text conversion, we used LanguageTool, a grammar checking tool, to identify grammatical errors in the transcribed text. LanguageTool detects various types of mistakes, including punctuation errors, spelling errors, and syntactical inconsistencies. Integrating LanguageTool into the pipeline lets us flag these errors and show users a grammatically corrected version of their response.
We integrated Gemini AI, an advanced natural language processing engine, into our system to provide meaningful feedback to users. Gemini AI analyzes transcribed text, identifies strengths and weaknesses, and generates personalized feedback. Leveraging Gemini AI's capabilities, we offer actionable insights for improvement.
- Positives: Highlighting strengths or positive aspects.
- Negatives: Identifying areas for improvement or weaknesses.
- Suggestions for Improvement: Offering constructive recommendations.
- Action Words: Recommending impactful words for clarity and effectiveness in interviews.
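One way to obtain feedback in these four categories is to request them explicitly in the prompt sent to the language model. The sketch below only builds such a prompt string; the function name `build_feedback_prompt` and the wording of the prompt are assumptions for illustration, not the project's actual prompt, and the call to Gemini itself is left out.

```python
# The four feedback categories listed above.
FEEDBACK_SECTIONS = (
    "Positives",
    "Negatives",
    "Suggestions for Improvement",
    "Action Words",
)


def build_feedback_prompt(question: str, transcript: str) -> str:
    """Assemble a prompt asking for feedback under each category heading."""
    section_lines = "\n".join(f"- {name}" for name in FEEDBACK_SECTIONS)
    return (
        "You are reviewing a candidate's spoken interview answer.\n"
        f"Interview question: {question}\n"
        f"Transcribed answer: {transcript}\n"
        "Give feedback under exactly these headings:\n"
        f"{section_lines}\n"
    )


# Hypothetical example input.
prompt = build_feedback_prompt(
    "What is overfitting?",
    "Overfitting is when the model memorizes the training data.",
)
print(prompt)
```

Fixing the headings in the prompt keeps the model's output in a predictable shape, which makes it simpler to display each category separately.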
- BLEU Score: 0.47
- F1 Score: 0.20
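The source does not state how the F1 score was computed. For generated text, one common definition is token-overlap F1 between the generated answer and the reference answer; the sketch below assumes that definition, and the function name `token_f1` is hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example pair.
print(token_f1("the cat sat", "the cat"))
```

Under this definition the score rewards shared vocabulary regardless of word order, which is one reason it can be much lower or higher than BLEU on the same outputs.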
Before using this project, ensure that you have the following dependencies installed:
- tqdm
- TensorFlow
- PyTorch
- Gradio
- LanguageTool
- Gemini AI
- fastapi
- uvicorn
- pydantic
- scikit-learn
- requests
- os
- librosa
- whisper
- numpy
- pytube
- pathlib
- textwrap
- google.generativeai
- IPython
- transformers
- peft
- trl
- locale
- LoRA
- QLoRA
You can install the required dependencies by running:
pip install -r requirements.txt