Llama2-CodeGen

Fine-Tuning Llama2-7B for Code Generation

Overview

Llama2-CodeGen leverages the Llama2-7B model, fine-tuned to generate context-aware Python code from natural language descriptions. This project utilized a custom dataset created by scraping over 25 full-fledged GitHub repositories. The model was initially trained for 100 epochs on a dataset containing 1000 samples, achieving a training accuracy of 89%. With further training and optimization, the model's accuracy was significantly improved.

The project was deployed using Vercel, with a React-based frontend accessible at projectcodebeing.vercel.app.

Key Features

Custom Dataset: Scraped from over 25 GitHub repositories.
Training: Initial training on 1000 samples for 100 epochs, achieving 89% accuracy.
Model Optimization: Gradual increase in epochs to enhance model performance.
Code Formatting: Integrated black formatting for cleaner output.
Deployment: Hosted on Vercel with a user-friendly React-based frontend.

Dataset

The dataset used for training is available on Hugging Face:

Converted CodeGen Dataset

Training Details

Parameter	Value
Model	Llama2-7B Chat-HF
Initial Epochs	100
Initial Training Samples	1000
Initial Accuracy	89%
Optimizations	Increased Epochs, Black Formatting
Final Deployment	Vercel

Model Performance

The model showed significant improvements with extended training:

Initial Training Accuracy: 89% after 100 epochs.
Enhanced Performance: With increased epochs, the model's accuracy was greatly improved (assume further specific metrics if needed).

Training Loss and Accuracy Over Epochs

Figure: Training loss and accuracy curve over multiple epochs.

Code Formatting

To ensure the code output is clean and maintainable, black formatting was applied to the generated code. This standardizes the style, making it more readable and consistent.

Deployment

The project is deployed using Vercel, providing a seamless user experience through a React-based frontend. You can access the live demo here: ProjectCodeBeing.

How to Use

Clone the Repository:

git clone https://github.com/yourusername/Llama2-CodeGen.git

Install Dependencies:
```
pip install -r requirements.txt
```

Run the Model:

python generate_code.py --input "Natural language description here"

Deploy: Instructions for setting up the Vercel deployment can be found in the deploy/ directory.

Future Enhancements

Expand Dataset: Include more diverse repositories to enhance the model's versatility.
Fine-Tune for Other Languages: Adapt the model to support languages other than Python.
Interactive Frontend: Implement an interactive playground on the frontend for live code generation.

Contributing

Contributions are welcome! Please refer to the CONTRIBUTING.md file for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
Fine_tuning_a_Llama_2_to_code_project_codebeings_(3)_copy.ipynb		Fine_tuning_a_Llama_2_to_code_project_codebeings_(3)_copy.ipynb
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Llama2-CodeGen

Overview

Key Features

Dataset

Training Details

Model Performance

Training Loss and Accuracy Over Epochs

Code Formatting

Deployment

How to Use

Future Enhancements

Contributing

License

About

Releases

Packages

Languages

License

abhiverse01/Llama2-CodeGen

Folders and files

Latest commit

History

Repository files navigation

Llama2-CodeGen

Overview

Key Features

Dataset

Training Details

Model Performance

Training Loss and Accuracy Over Epochs

Code Formatting

Deployment

How to Use

Future Enhancements

Contributing

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages