LLM Interview Simulator (Agentic Simulation)

A fascinating experiment to evaluate how different Large Language Models (LLMs) perform in simulated job interviews. This project creates a controlled environment where multiple LLM models play the role of job candidates while being interviewed by a consistent interviewer model.

πŸ“‹ Table of Contents

πŸ”„ Process Flow

Overall Flow

```mermaid
flowchart TD
    Start([Start Simulation]) --> Config[Initialize Configuration]
    Config --> |Set Job Title| Setup[Setup Interview Environment]
    
    subgraph InitSetup [Initialization]
        Setup --> InitInterviewer[Initialize Interviewer Agent<br/>llama-3.1-8b-instant]
        Setup --> InitCandidates[Initialize Candidate Models]
    end
    
    InitCandidates --> |Model 1| C1[Candidate 1<br/>llama-3.1-8b-instant]
    InitCandidates --> |Model 2| C2[Candidate 2<br/>llama3-8b-8192]
    InitCandidates --> |Model 3| C3[Candidate 3<br/>mixtral-8x7b-32768]
    InitCandidates --> |Model 4| C4[Candidate 4<br/>gemma-7b-it]
    
    subgraph InterviewLoop [Interview Process]
        StartInt[Start Interview] --> GenQ[Generate Question]
        GenQ --> |Dynamic Question| GetResp[Get Candidate Response]
        GetResp --> |Store Response| UpdateHist[Update Interview History]
        UpdateHist --> |Check Questions| CheckCount{More Questions?}
        CheckCount --> |Yes| GenQ
        CheckCount --> |No| Evaluate
    end
    
    C1 & C2 & C3 & C4 --> StartInt
    
    subgraph Evaluation [Evaluation Process]
        Evaluate[Evaluate Candidate] --> GenScore[Generate Scores]
        GenScore --> GenStrength[Identify Strengths]
        GenStrength --> GenWeakness[Identify Weaknesses]
        GenWeakness --> GenTips[Generate Interview Tips]
        GenTips --> FinalEval[Final Evaluation Report]
    end
    
    subgraph Analysis [Comparative Analysis]
        FinalEval --> CompAnalysis[Compare All Candidates]
        CompAnalysis --> RankModels[Rank Model Performance]
        RankModels --> PatternAnalysis[Analyze Response Patterns]
        PatternAnalysis --> Recommendations[Generate Recommendations]
    end
    
    Recommendations --> SaveResults[Save Results to JSON]
    SaveResults --> End([End Simulation])
    
    style InitSetup fill:#e1f5fe,stroke:#01579b
    style InterviewLoop fill:#f3e5f5,stroke:#4a148c
    style Evaluation fill:#f1f8e9,stroke:#33691e
    style Analysis fill:#fff3e0,stroke:#e65100
    
    classDef process fill:#fff,stroke:#333,stroke-width:2px
    classDef decision fill:#fffde7,stroke:#f57f17,stroke-width:2px
    classDef endpoint fill:#006064,color:#fff,stroke:#00838f
    
    class Start,End endpoint
    class CheckCount decision
    class GenQ,GetResp,Evaluate,CompAnalysis process
```
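
As a rough illustration of the loop above, here is a minimal Python sketch of one interview: the interviewer model generates each question, the candidate model answers, and the transcript is kept for the evaluation step. This is not the repository's actual code; the `ask()`/`run_interview()` helpers, the prompts, and the five-round default are assumptions, and it calls Groq's OpenAI-compatible Python SDK directly rather than the CrewAI agents the project is built on.

```python
import os
from groq import Groq  # Groq's OpenAI-compatible Python SDK: pip install groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

# Groq model ids here omit the "groq/" routing prefix used elsewhere in this README.
INTERVIEWER_MODEL = "llama-3.1-8b-instant"

def ask(model: str, system: str, prompt: str) -> str:
    """One chat-completion call against a Groq-hosted model."""
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

def run_interview(candidate_model: str, job_title: str, rounds: int = 5) -> list[dict]:
    """Run a fixed number of question/answer rounds and return the transcript."""
    transcript: list[dict] = []
    for _ in range(rounds):
        # The interviewer sees the transcript so far and asks the next question.
        question = ask(
            INTERVIEWER_MODEL,
            f"You are a co-founder/CEO interviewing candidates for a {job_title} role.",
            f"Interview so far: {transcript}\nAsk the next question.",
        )
        # The candidate model answers that question.
        answer = ask(
            candidate_model,
            f"You are a job candidate applying for a {job_title} role.",
            question,
        )
        transcript.append({"question": question, "answer": answer})
    return transcript  # later handed to the evaluation step
```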

1. System Setup Flow

```mermaid
flowchart LR
    A[Start] -->|Initialize| B[System Setup]
    B --> C[Load Models]
    B --> D[Configure Job Roles]
    C --> E[Setup Interviewer]
    C --> F[Setup Candidates]
    
    style A fill:#4CAF50,color:white
    style B fill:#2196F3,color:white
    style C fill:#9C27B0,color:white
    style D fill:#FF9800,color:white
    style E fill:#E91E63,color:white
    style F fill:#673AB7,color:white
```

2. Interview Process Flow

```mermaid
flowchart LR
    A[Start Interview] -->|Generate| B[Questions]
    B -->|Collect| C[Responses]
    C -->|Analyze| D[Feedback]
    D -->|Next Question| B
    D -->|Complete| E[Evaluation]
    
    style A fill:#4CAF50,color:white
    style B fill:#2196F3,color:white
    style C fill:#9C27B0,color:white
    style D fill:#FF9800,color:white
    style E fill:#E91E63,color:white
```

3. Evaluation Flow

```mermaid
flowchart LR
    A[Evaluation] -->|Generate| B[Scores]
    B -->|Identify| C[Strengths]
    B -->|Identify| D[Weaknesses]
    C --> E[Final Report]
    D --> E
    
    style A fill:#4CAF50,color:white
    style B fill:#2196F3,color:white
    style C fill:#9C27B0,color:white
    style D fill:#FF9800,color:white
    style E fill:#E91E63,color:white
```
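
In the same illustrative style, the evaluation step could ask the interviewer model for a structured verdict and parse it as JSON. This is a sketch under assumptions, not the project's tasks.py: the prompt wording, the key names, and the expectation of well-formed JSON output are all placeholders.

```python
import json
import os
from groq import Groq  # same OpenAI-compatible SDK as in the earlier sketch

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def evaluate(transcript: list[dict], job_title: str) -> dict:
    """Ask the interviewer model for a structured verdict on one candidate."""
    reply = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[
            {"role": "system",
             "content": f"You are the CEO who just interviewed a {job_title} candidate."},
            {"role": "user",
             "content": "Return only JSON with keys: decision, score, strengths, "
                        "weaknesses, tips, reasoning.\n"
                        f"Transcript: {json.dumps(transcript)}"},
        ],
    )
    # Assumes the model complies and returns valid JSON; real code would validate this.
    return json.loads(reply.choices[0].message.content)
```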

🎯 Project Overview

This project simulates job interviews using different LLM models as candidates, while maintaining a consistent interviewer (Co-founder/CEO) model. The simulation:

  • πŸ“ Conducts structured interviews for various job positions
  • πŸ€– Uses different LLM models to simulate candidate responses
  • πŸ“Š Provides comprehensive evaluation and feedback
  • πŸ“ˆ Generates comparative analysis across different models
  • πŸ’‘ Offers practical interview improvement suggestions

πŸ€– Models Used

Interviewer

Model: groq/llama-3.1-8b-instant
Role: Co-founder and CEO
Purpose: Consistent evaluation across all interviews

Candidates

| Model |
| --- |
| groq/llama-3.1-8b-instant |
| groq/llama3-8b-8192 |
| groq/mixtral-8x7b-32768 |
| gemma-7b-it |
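
Because the project is built with CrewAI (see Acknowledgments), the fixed interviewer and the four candidate agents can be wired up roughly as below. This is a hypothetical sketch, not the repository's agents.py: it assumes a recent CrewAI release that exposes `Agent` and `LLM`, the goal/backstory strings are invented, and the `groq/` prefix on gemma-7b-it is added only for consistency with the other entries.

```python
from crewai import Agent, LLM  # assumes a recent crewai release

# Single, fixed interviewer model used for every interview.
interviewer = Agent(
    role="Co-founder and CEO",
    goal="Interview candidates and evaluate them consistently",
    backstory="You run a fast-growing startup and conduct all final-round interviews.",
    llm=LLM(model="groq/llama-3.1-8b-instant"),
)

# The candidate models being compared (as listed above).
CANDIDATE_MODELS = [
    "groq/llama-3.1-8b-instant",
    "groq/llama3-8b-8192",
    "groq/mixtral-8x7b-32768",
    "groq/gemma-7b-it",  # provider prefix assumed for illustration
]

candidates = [
    Agent(
        role="Job Candidate",
        goal="Answer interview questions convincingly",
        backstory="You are applying for the advertised position.",
        llm=LLM(model=model),
    )
    for model in CANDIDATE_MODELS
]
```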

πŸ”§ Installation

  1. Clone the repository:

```bash
git clone https://github.com/yourusername/llm-interview-simulator.git
cd llm-interview-simulator
```

  2. Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

  3. Install dependencies:

```bash
pip install -r requirements.txt
```

  4. Set up environment variables:

```bash
echo "GROQ_API_KEY=your_api_key_here" > .env
```

πŸ’» Usage

  1. Run the simulation:

```bash
python main.py
```

  2. Choose a job title:

```python
job_titles = [
    "Marketing Associate",
    "Business Development Representative",
    "Product Manager",
    "Customer Success Representative",
    "Data Analyst",
    "AI Engineer"
]
```
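
A minimal way main.py could prompt for one of these titles (illustrative only; the actual selection logic in the repository may differ):

```python
def choose_job_title(job_titles: list[str]) -> str:
    """Print the available roles and return the one the user picks."""
    for number, title in enumerate(job_titles, start=1):
        print(f"{number}. {title}")
    choice = int(input("Select a job title by number: "))
    return job_titles[choice - 1]
```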

πŸ“Š Features

Interview Process

  • πŸ”„ Dynamic question generation
  • πŸ’¬ Natural conversation flow
  • 🎯 Technical skill assessment
  • 🀝 Cultural fit evaluation

Evaluation Metrics

| Metric | Description |
| --- | --- |
| Decision | Pass/Fail outcome |
| Score | 0-100 numerical rating |
| Strengths | Key positive attributes |
| Improvements | Areas for development |
| Tips | Interview improvement suggestions |
| Reasoning | Detailed evaluation logic |
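
One convenient way to hold these metrics per candidate is a small dataclass; the field names below simply mirror the table and are illustrative rather than the exact schema used in the code:

```python
from dataclasses import dataclass, field

@dataclass
class Evaluation:
    decision: str                                       # "Pass" or "Fail"
    score: int                                          # 0-100 numerical rating
    strengths: list[str] = field(default_factory=list)  # key positive attributes
    improvements: list[str] = field(default_factory=list)
    tips: list[str] = field(default_factory=list)       # interview improvement suggestions
    reasoning: str = ""                                 # detailed evaluation logic
```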

πŸ“ Project Structure

```
llm-interview-simulator/
├── 📄 agents.py                # Agent definitions
├── 📄 tasks.py                 # Interview tasks
├── 📄 interview_simulation.py  # Core logic
├── 📄 main.py                  # Entry point
├── 📄 requirements.txt         # Dependencies
└── 📄 README.md                # Documentation
```

Output Format

```json
{
    "job_title": "AI Engineer",
    "interview_date": "2024-03-31 14:30:22",
    "candidates": {
        "candidate1": {
            "model": "groq/llama-3.1-8b-instant",
            "score": 85,
            "evaluation": "..."
        }
    },
    "comparative_analysis": "..."
}
```
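
Persisting a record in that shape is straightforward with the standard library. A sketch: the timestamp format matches the example above, while the function name and default file name are assumptions.

```python
import json
from datetime import datetime

def save_results(results: dict, path: str = "interview_results.json") -> None:
    """Stamp the run with a timestamp and persist it as JSON."""
    results["interview_date"] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(results, fh, indent=4, ensure_ascii=False)
```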

🀝 Contributing

We welcome contributions! Areas for improvement:

  • πŸ“ New job positions
  • πŸ€– Additional LLM models
  • πŸ“Š Enhanced metrics
  • πŸ’‘ Feature additions

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • πŸ› οΈ Built with CrewAI framework
  • πŸ€– Powered by Groq's LLM models
  • πŸ’Ό Inspired by real-world interviews

⚠️ Disclaimer

This is an experimental project for research and educational purposes. The simulations should not be used as the sole basis for actual hiring decisions.

Key metrics displayed:

  • ⭐ Response quality scores
  • 🀝 Cultural fit assessment
  • 🧠 Technical knowledge evaluation
  • πŸ’¬ Communication style analysis

Made with ❀️ by KT
