rcoding.qmd

---
title: "chatGPT"
subtitle: "R 코딩개발"
author:
  - name: 이광춘
    url: https://www.linkedin.com/in/kwangchunlee/
    affiliation: 한국 R 사용자회
    affiliation-url: https://github.com/bit2r
title-block-banner: true
#title-block-banner: "#562457"
format:
  html:
    css: css/quarto.css
    theme: flatly
    code-fold: true
    code-overflow: wrap
    toc: true
    toc-depth: 3
    toc-title: 목차
    number-sections: true
    highlight-style: github    
    self-contained: false
filters:
   - lightbox
   - custom-callout.lua   
lightbox: auto
link-citations: yes
knitr:
  opts_chunk: 
    message: false
    warning: false
    collapse: true
    comment: "#>" 
    R.options:
      knitr.graphics.auto_pdf: true
editor_options: 
  chunk_output_type: console
---

# 코덱스(Codex)

[[Low-code and GPT-3: easier said than done with OpenAI Codex](https://medium.com/data-reply-it-datatech/low-code-and-gpt-3-easier-said-than-done-with-openai-codex-d3c1d4aebc8b)]{.aside}

**코덱스(Codex)**는 OpenAI에서 개발한 획기적인 AI 기술로, 사용자가 입력한 질문에 응답하여 사람과 유사한 텍스트를 생성할 수 있다. 
딥러닝 알고리즘을 사용하여 책, 기사 및 기타 형태의 서면 콘텐츠를 포함한 방대한 양의 텍스트 데이터를 분석하여 언어와 문맥에 대한 포괄적인 이해를 바탕으로 텍스트를 생성한다.코덱스는 챗봇, 가상 비서, 글쓰기 도구 등 다양한 애플리케이션과 통합하여 사용자에게 자연스럽고 유창한 텍스트 응답을 제공한다. 

코덱스의 프로그래밍 기능은 가장 강력한 애플리케이션 중 하나로 방대한 양의 코드와 문서를 분석하여 사용자가 제공한 
지시명령어(Prompt)에 응답하여 코드 스니펫(Snippet)을 생성할 수 있으므로 코드를 보다 효율적이고 정확하게 작성해야 하는 개발자에게 매우 유용한 도구로 자리잡아가고 있다.

코덱스는 파이썬, 자바, C++, R 등 다양한 프로그래밍 언어와 함께 사용할 수 있으며,
현재 코드 맥락(Context)에 따라 전체 함수나 클래스를 생성할 수 있으며, 구문적으로 정확하고 모범 사례(Best Practice)를 
따르는 코드를 제안함으로써 코드 작성에 필요한 시간과 노력을 줄여 개발자가 새로운 기능을 설계하거나 기존 기능을 개선하는 등 보다 복잡한 작업에 집중할 수 있게 한다.

코덱스의 프로그래밍 기능의 주요 이점 중 하나는 생성하는 코드의 의미와 목적을 이해한다는 점이다. 
즉, 작동할 뿐만 아니라 구조적으로 강건하고 읽기 쉬운 코드를 제안하여 오류를 줄이고 
코드베이스의 전반적인 품질을 개선하는 데 도움을 준다. 전반적으로 코덱스의 프로그래밍 기능은 개발자의 코드 작성 방식에 혁신을 가져올 잠재력을 가지고 있다. 

- 주석을 코드로 전환
- 맥락을 보고 다음 코드를 자동 작성
- 라이브러리, API 등 추천을 통해 새로운 지식 전달
- 주석 자동 추가
- 동일한 기능을 갖지면 효율성 좋은 코드로 변환


# `gpttools` 패키지

[`gpttools`](https://github.com/JamesHWade/gpttools)의 목표는 R 패키지 개발자가 초거대 언어 모델(LLM)을 프로젝트 
작업흐름에 보다 쉽게 통합할 수 있도록 [gptstudio](https://github.com/MichelNivard/gptstudio)를 확장하는 것이다. 
초거대 언어 모델은 지식 작업에 텍스트를 사용하는 데 있어 한 걸음 더 나아간 것으로 보이지만, 
초거대 언어 모델을 사용하는 데 따른 윤리적 영향도 신중하게 고려해야 한다. 
LLM(근간 모형, Foundation Model)의 윤리는 매우 활발하게 논의되고 있는 분야 중 하나다.

## 설치

[gpttools](https://github.com/JamesHWade/gpttools) GitHub 저장소에서 바로 설치한다.

``` r
require(remotes)
remotes::install_github("JamesHWade/gpttools")
```


# 사용방법

민감한 개인정보나 기타 데이터를 전송하지 않도록 주의한다. 

사용방법은 텍스트 혹은 코드를 커서로 선택하여 add-in 명령어를 사용하여 
OpenAI로 보내면 Codex가 이를 해석하여 최선의 답을 다시 전달해주는 간단한 구조로 되어 있다.


[See OpenAI's Terms of Use at <https://openai.com/terms> for more
details.]{.aside}

# 사전준비사항

1.   OpenAI 계정생성

2.  [OpenAI API key](https://beta.openai.com/account/api-keys) 생성

3.  RStudio 에서 `OPENAI_API_KEY` 값을 접근할 수 있도록 설정

직접적으로 추천하지는 않지만, 개발용도로 한번 쓰고 버릴 때 적합.

```{r}
#| eval: false
Sys.setenv(OPENAI_API_KEY = "<OPENAI_API_KEY>")
```

`.Renviron` 파일에 `OPENAI_API_KEY` 환경변수값을 참조하여 사용하는 방법을 추천.

```{r}
#| eval: false
require(usethis)
edit_r_environ(scope = "project")
```

GitHub/Gitlab을 사용할 때, `.Renviron` 파일 정보를 `.gitignore`에 추가하는 것을 잊지 않는다.

# 사용법

`gpttools`는 다음 4가지 addins 기능을 포함:

-   주석 달리(Comment code): uses code-davinci-edit-001 model from OpenAI to add
    comments to your code with the prompt: "add comments to each line of
    code, explaining what the code does"

-   roxygen 추가: uses text-davinci-003 model from OpenAI to add and fill
    out a roxygen skeleton to your highlight code (should be a function)
    with the prompt: "insert roxygen skeleton to document this function"

-   스크립트를 함수로 전환: uses code-davinci-edit-001 model from
    OpenAI to convert a highlighted script into a function with the
    prompt: "convert this R code into an R function"

-   `testthat` 단위테스트 작성: uses
    text-davinci-003 model from OpenAI to suggest a unit test for a
    selected function with the prompt: "Suggest a unit text for this
    function using the testthat package"

-   A freeform addins that let's you specify the prompt using the "edit"
    functionality of ChatGPT

`gpttools` addins은 `CMD/CTRL+SHIFT+P` 를 클릭하여 선택가능.


![](images/gpttools/gpttools-addins.png)

:::{.panel-tabset}

## 주석달기

<video src="https://user-images.githubusercontent.com/6314313/209890944-3d6a00fa-2d8c-4df7-8a11-f5a5ec3a1391.mov" data-canonical-src="https://user-images.githubusercontent.com/6314313/209890944-3d6a00fa-2d8c-4df7-8a11-f5a5ec3a1391.mov" controls="controls" muted="muted" class="d-block rounded-bottom-2 width-fit" style="max-height:640px;">

</video>

## Roxygen 추가

<video src="https://user-images.githubusercontent.com/6314313/209890939-ebd7afea-7d68-40b4-b482-b3fe51485ab1.mov" data-canonical-src="https://user-images.githubusercontent.com/6314313/209890939-ebd7afea-7d68-40b4-b482-b3fe51485ab1.mov" controls="controls" muted="muted" class="d-block rounded-bottom-2 width-fit" style="max-height:640px;">

</video>

## 스크립트 &rarr; 함수

<video src="https://user-images.githubusercontent.com/6314313/209890949-4da2bdd7-bcac-4769-9b11-7759b4abb760.mov" data-canonical-src="https://user-images.githubusercontent.com/6314313/209890949-4da2bdd7-bcac-4769-9b11-7759b4abb760.mov" controls="controls" muted="muted" class="d-block rounded-bottom-2 width-fit" style="max-height:640px;">

</video>

## 함수에 단위 테스트 추천

<video src="https://user-images.githubusercontent.com/6314313/209890959-fca623d9-5e8e-463c-ac64-80f3db9875d9.mov" data-canonical-src="https://user-images.githubusercontent.com/6314313/209890959-fca623d9-5e8e-463c-ac64-80f3db9875d9.mov" controls="controls" muted="muted" class="d-block rounded-bottom-2 width-fit" style="max-height:640px;">

</video>

:::

# 예측모형

펭귄데이터셋 암수성별을 분류예측모형을 개발하도록 지시명령어를 자세히 준비하여 chatGPT 
실행을 시킨다.

```{r}
#| eval: false
library(openai)

penguins_classification_instruction <- 
  glue::glue("# R language\n",
             "Build sex classification machine learning model withe palmer penguin datatset\n",
             "Use palmer penguins data package for dataset\n",
             "Use tidymodels framework\n",
             "Use random forest model\n",
             "Include evaluation metrics including accruacy, precision, reall")

build_model <- create_completion(
    model="code-davinci-002",
    prompt = penguins_classification_instruction,
    max_tokens=1024,
    openai_api_key = Sys.getenv("OPEN_AI_KEY")
)

parsed_code <- str_split(build_model$choices$text, "\n")[[1]]

write_lines(parsed_code, "palmer_penguins.Rmd")
```