GitAI

The goal of GitAI is to extract knowledge from Git repositories using AI and LLMs (Large Language Models).

Motivation

Large organizations need to deal with a massive number of Git repositories (both internal and external). Those repositories can be hosted on different platforms (such as GitHub and GitLab).

It is very difficult or even impossible to review all those repositories manually, especially if one needs to perform an exploratory search, not knowing the exact keywords that should be used.

Because of that, reusing the knowledge (and code) hidden in the repositories is a constant challenge.

Solution

We propose the GitAI framework written in R.

It is applicable to multiple use cases related to extracting knowledge from Git repositories. At the same time, it is IT infrastructure agnostic. It is designed to work with different backends, LLMs, embedding models, and vector databases. Adapting to a particular backend may require implementing new classes, but the core functionality stays the same.

Workflow

A typical GitAI workflow looks like this:

1. Set up your project.
   1. Set up your project scope (Git repositories).
   2. Select content type of interest (files and file types).
   3. Choose your LLM backend.
   4. Define the LLM prompts.
   5. (Optional) Choose embedding model and vector database provider.
2. Process content of all repositories with a single function call.
   1. (Optional) If a vector database is set up, the results will be stored there.
3. Use the information extracted from the file content of the Git repositories.
   1. (Optional) If results are stored in a vector database, they can be searched using semantic search or used as part of a RAG (Retrieval Augmented Generation) prompt.
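
The optional semantic search and RAG step can be illustrated with a minimal base R sketch. This is not GitAI's API: the `embed()` helper, the toy vocabulary, and the stored records are hypothetical stand-ins for a real embedding model and a vector database. The sketch ranks stored summaries by cosine similarity to a query and pastes the best match into a RAG-style prompt.

```r
# Toy "embedding": counts of vocabulary words found in a text.
# Hypothetical stand-in for a real embedding model.
vocab <- c("git", "statistics", "synthetic", "data", "llm", "repositories")

embed <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  vapply(vocab, function(v) sum(words == v), numeric(1))
}

# Cosine similarity between two numeric vectors (0 if either is all zeros).
cosine <- function(a, b) {
  denom <- sqrt(sum(a^2)) * sqrt(sum(b^2))
  if (denom == 0) return(0)
  sum(a * b) / denom
}

# Hypothetical stored records (in practice: LLM summaries kept in a vector database).
records <- c(
  repo_a = "Statistics about git repositories",
  repo_b = "Generates synthetic data for testing",
  repo_c = "Uses an llm to summarise git repositories"
)
record_vectors <- lapply(records, embed)

# Return the top_k records most similar to the query, with their scores.
semantic_search <- function(query, top_k = 2) {
  q <- embed(query)
  scores <- vapply(record_vectors, cosine, numeric(1), b = q)
  head(sort(scores, decreasing = TRUE), top_k)
}

hits <- semantic_search("synthetic data generation", top_k = 1)

# Assemble a RAG-style prompt from the retrieved records.
rag_prompt <- paste0(
  "Answer using only this context:\n",
  paste(records[names(hits)], collapse = "\n"),
  "\n\nQuestion: Which project generates synthetic data?"
)
cat(rag_prompt)
```

A real setup would replace the word counts with dense vectors from an embedding model and the in-memory list with a vector database, but the ranking and prompt-assembly logic would follow the same shape.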

Installation

You can install the development version of GitAI from GitHub with:

# install.packages("pak")
pak::pak("r-world-devs/GitAI")

Simplified example (without vector database usage)

library(GitAI)

Let’s set up a project, fascinating_project, that will extract summaries from the content of the README.md files in a few selected Git repositories.

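# Request timeout for ellmer, in seconds.
options(ellmer_timeout_s = 120)
# Turn off verbose output.
verbose_off()
my_project <- initialize_project("fascinating_project") |>
  # Project scope: the repositories to process.
  set_github_repos(
    repos = c(
      "r-world-devs/GitStats",
      "r-world-devs/GitAI",
      "openpharma/DataFakeR"
    )
  ) |>
  # Content of interest: README.md from each repository.
  add_files(files = "README.md") |>
  # LLM backend, using the package defaults.
  set_llm() |>
  # Prompt applied to the content of each repository.
  set_prompt("Write one-sentence summary for a project based on given input.")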

Now, let’s get the results and print them.

results <- process_repos(my_project)

purrr::walk(results, function(result) {
  result$text |> stringr::str_wrap(width = 80) |> cat("\n\n")
})
#> GitStats is an experimental R package that facilitates the extraction
#> and analysis of git data from GitHub and GitLab, providing insights into
#> repositories, commits, users, and R package usage in a structured format. 
#> 
#> GitAI is an R package that leverages AI and Large Language Models to extract
#> insights from GitHub or GitLab repositories, allowing users to define project
#> scopes, select relevant content, and process repositories efficiently in a
#> tidyverse-compliant manner. 
#> 
#> DataFakeR is an R package that enables users to generate synthetic datasets
#> while maintaining specified assumptions about the original data structure,
#> facilitating data simulation for testing and analysis.
