diff --git a/README.md b/README.md index d95c7430f9b..d7747585287 100644 --- a/README.md +++ b/README.md @@ -28,11 +28,11 @@ AutoGen is a framework that enables the development of LLM applications using mu ![AutoGen Overview](https://github.com/microsoft/autogen/blob/main/website/static/img/autogen_agentchat.png) -- AutoGen enables building next-gen LLM applications based on **multi-agent conversations** with minimal effort. It simplifies the orchestration, automation, and optimization of a complex LLM workflow. It maximizes the performance of LLM models and overcomes their weaknesses. -- It supports **diverse conversation patterns** for complex workflows. With customizable and conversable agents, developers can use AutoGen to build a wide range of conversation patterns concerning conversation autonomy, +- AutoGen enables building next-gen LLM applications based on [multi-agent conversations](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat) with minimal effort. It simplifies the orchestration, automation, and optimization of a complex LLM workflow. It maximizes the performance of LLM models and overcomes their weaknesses. +- It supports [diverse conversation patterns](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat#supporting-diverse-conversation-patterns) for complex workflows. With customizable and conversable agents, developers can use AutoGen to build a wide range of conversation patterns concerning conversation autonomy, the number of agents, and agent conversation topology. -- It provides a collection of working systems with different complexities. These systems span a **wide range of applications** from various domains and complexities. This demonstrates how AutoGen can easily support diverse conversation patterns. -- AutoGen provides **enhanced LLM inference**. 
It offers easy performance tuning, plus utilities like API unification and caching, and advanced usage patterns, such as error handling, multi-config inference, context programming, etc.
+- It provides a collection of working systems with different complexities. These systems span a [wide range of applications](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat#diverse-applications-implemented-with-autogen) from various domains and complexities. This demonstrates how AutoGen can easily support diverse conversation patterns.
+- AutoGen provides [enhanced LLM inference](https://microsoft.github.io/autogen/docs/Use-Cases/enhanced_inference#api-unification). It offers utilities like API unification and caching, and advanced usage patterns, such as error handling, multi-config inference, context programming, etc.
 
 AutoGen is powered by collaborative [research studies](https://microsoft.github.io/autogen/docs/Research) from Microsoft, Penn State University, and the University of Washington.
@@ -42,14 +42,14 @@ The easiest way to start playing is
 [![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/microsoft/autogen?quickstart=1)
-  2. Copy OAI_CONFIG_LIST_sample to /notebook folder, name to OAI_CONFIG_LIST, and set the correct config.
+  2. Copy OAI_CONFIG_LIST_sample to the ./notebook folder, rename it to OAI_CONFIG_LIST, and set the correct config.
  3. Start playing with the notebooks!
 
 ## Installation
 
-AutoGen requires **Python version >= 3.8**. It can be installed from pip:
+AutoGen requires **Python version >= 3.8, < 3.12**. It can be installed from pip:
 
 ```bash
 pip install pyautogen
@@ -72,7 +72,7 @@ For LLM inference configurations, check the [FAQs](https://microsoft.github.io/a
 
 ## Multi-Agent Conversation Framework
 
-Autogen enables the next-gen LLM applications with a generic multi-agent conversation framework. It offers customizable and conversable agents that integrate LLMs, tools, and humans.
+Autogen enables the next-gen LLM applications with a generic [multi-agent conversation](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat) framework. It offers customizable and conversable agents that integrate LLMs, tools, and humans.
 By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code.
 
 Features of this use case include:
@@ -106,11 +106,13 @@ After the repo is cloned.
 The figure below shows an example conversation flow with AutoGen.
 ![Agent Chat Example](https://github.com/microsoft/autogen/blob/main/website/static/img/chat_example.png)
 
-Please find more [code examples](https://microsoft.github.io/autogen/docs/Examples/AutoGen-AgentChat) for this feature.
+Please find more [code examples](https://microsoft.github.io/autogen/docs/Examples/AgentChat) for this feature.
 
 ## Enhanced LLM Inferences
 
-Autogen also helps maximize the utility out of the expensive LLMs such as ChatGPT and GPT-4. It offers enhanced LLM inference with powerful functionalities like tuning, caching, error handling, and templating. For example, you can optimize generations by LLM with your own tuning data, success metrics, and budgets.
+Autogen also helps maximize the utility of expensive LLMs such as ChatGPT and GPT-4. It offers [enhanced LLM inference](https://microsoft.github.io/autogen/docs/Use-Cases/enhanced_inference#api-unification) with powerful functionalities like caching, error handling, multi-config inference, and templating.
+ + ## Documentation diff --git a/autogen/oai/client.py b/autogen/oai/client.py index 35705f2b0fc..b6035162104 100644 --- a/autogen/oai/client.py +++ b/autogen/oai/client.py @@ -17,7 +17,7 @@ ERROR = None except ImportError: - ERROR = ImportError("please install openai>=1 and diskcache to use the autogen.oai subpackage.") + ERROR = ImportError("Please install openai>=1 and diskcache to use autogen.OpenAIWrapper.") OpenAI = object logger = logging.getLogger(__name__) if not logger.handlers: diff --git a/autogen/oai/completion.py b/autogen/oai/completion.py index 5f990e54d7c..88d53bca4c0 100644 --- a/autogen/oai/completion.py +++ b/autogen/oai/completion.py @@ -26,7 +26,10 @@ ERROR = None except ImportError: - ERROR = ImportError("please install openai and diskcache to use the autogen.oai subpackage.") + ERROR = ImportError( + "(Deprecated) The autogen.Completion class requires openai<1 and diskcache. " + "Please switch to autogen.OpenAIWrapper for openai>=1." + ) openai_Completion = object logger = logging.getLogger(__name__) if not logger.handlers: @@ -567,6 +570,10 @@ def eval_func(responses, **data): dict: The optimized hyperparameter setting. tune.ExperimentAnalysis: The tuning results. """ + logger.warning( + "tuning via Completion.tune is deprecated in pyautogen v0.2 and openai>=1. " + "flaml.tune supports tuning more generically." + ) if ERROR: raise ERROR space = cls.default_search_space.copy() @@ -775,6 +782,11 @@ def yes_or_no_filter(context, config, response): - `config_id`: the index of the config in the config_list that is used to generate the response. - `pass_filter`: whether the response passes the filter function. None if no filter is provided. """ + logger.warning( + "Completion.create is deprecated in pyautogen v0.2 and openai>=1. " + "The new openai requires initiating a client for inference. 
" + "Please refer to https://microsoft.github.io/autogen/docs/Use-Cases/enhanced_inference#api-unification" + ) if ERROR: raise ERROR @@ -1159,6 +1171,12 @@ def start_logging( while the compact history dict has a linear size. reset_counter (bool): whether to reset the counter of the number of API calls. """ + logger.warning( + "logging via Completion.start_logging is deprecated in pyautogen v0.2. " + "logging via OpenAIWrapper will be added back in a future release." + ) + if ERROR: + raise ERROR cls._history_dict = {} if history_dict is None else history_dict cls._history_compact = compact cls._count_create = 0 if reset_counter or cls._count_create is None else cls._count_create diff --git a/notebook/agentchat_qdrant_RetrieveChat.ipynb b/notebook/agentchat_qdrant_RetrieveChat.ipynb index 3a97007c5d9..b05848c1c5d 100644 --- a/notebook/agentchat_qdrant_RetrieveChat.ipynb +++ b/notebook/agentchat_qdrant_RetrieveChat.ipynb @@ -95,14 +95,14 @@ " {\n", " 'model': 'gpt-4',\n", " 'api_key': '',\n", - " 'api_base': '',\n", + " 'base_url': '',\n", " 'api_type': 'azure',\n", " 'api_version': '2023-06-01-preview',\n", " },\n", " {\n", " 'model': 'gpt-3.5-turbo',\n", " 'api_key': '',\n", - " 'api_base': '',\n", + " 'base_url': '',\n", " 'api_type': 'azure',\n", " 'api_version': '2023-06-01-preview',\n", " },\n", diff --git a/notebook/oai_openai_utils.ipynb b/notebook/oai_openai_utils.ipynb index 82dc865ef8a..24973e3d908 100644 --- a/notebook/oai_openai_utils.ipynb +++ b/notebook/oai_openai_utils.ipynb @@ -38,7 +38,7 @@ "assistant = AssistantAgent(\n", " name=\"assistant\",\n", " llm_config={\n", - " \"request_timeout\": 600,\n", + " \"timeout\": 600,\n", " \"seed\": 42,\n", " \"config_list\": config_list,\n", " \"temperature\": 0,\n", diff --git a/website/blog/2023-10-26-TeachableAgent/index.mdx b/website/blog/2023-10-26-TeachableAgent/index.mdx index 434e0946b54..51c2e56a38b 100644 --- a/website/blog/2023-10-26-TeachableAgent/index.mdx +++ 
b/website/blog/2023-10-26-TeachableAgent/index.mdx
@@ -51,7 +51,7 @@ from autogen.agentchat.contrib.teachable_agent import TeachableAgent
 # and OAI_CONFIG_LIST_sample
 filter_dict = {"model": ["gpt-4"]}  # GPT-3.5 is less reliable than GPT-4 at learning from user feedback.
 config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST", filter_dict=filter_dict)
-llm_config={"config_list": config_list, "request_timeout": 120}
+llm_config={"config_list": config_list, "timeout": 120}
 ```
 
 4. Create the agents
diff --git a/website/docs/Examples/AutoGen-AgentChat.md b/website/docs/Examples/AgentChat.md
similarity index 95%
rename from website/docs/Examples/AutoGen-AgentChat.md
rename to website/docs/Examples/AgentChat.md
index a9a813ae6c1..961c44d5836 100644
--- a/website/docs/Examples/AutoGen-AgentChat.md
+++ b/website/docs/Examples/AgentChat.md
@@ -1,4 +1,4 @@
-# AutoGen - Automated Multi Agent Chat
+# Automated Multi Agent Chat
 
 AutoGen offers conversable agents powered by LLM, tool or human, which can be used to perform tasks collectively via automated chat. This framework allows tool use and human participation via multi-agent conversation.
 Please find documentation about this feature [here](/docs/Use-Cases/agent_chat).
@@ -25,7 +25,7 @@ Links to notebook examples:
   - Automated Chess Game Playing & Chitchatting by GPT-4 Agents - [View Notebook](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_chess.ipynb)
   - Automated Continual Learning from New Data - [View Notebook](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_stream.ipynb)
-  - [OptiGuide](https://github.com/microsoft/optiguide) - Large Language Models for Supply Chain Optimization.
+  - [OptiGuide](https://github.com/microsoft/optiguide) - Coding, Tool Using, Safeguarding & Question Answering for Supply Chain Optimization
 4. 
**Tool Use** diff --git a/website/docs/Examples/AutoGen-Inference.md b/website/docs/Examples/Inference.md similarity index 96% rename from website/docs/Examples/AutoGen-Inference.md rename to website/docs/Examples/Inference.md index d68504a1c7c..ad608985ec4 100644 --- a/website/docs/Examples/AutoGen-Inference.md +++ b/website/docs/Examples/Inference.md @@ -1,4 +1,4 @@ -# AutoGen - Tune GPT Models +# Tune GPT Models AutoGen also offers a cost-effective hyperparameter optimization technique [EcoOptiGen](https://arxiv.org/abs/2303.04673) for tuning Large Language Models. The research study finds that tuning hyperparameters can significantly improve the utility of them. Please find documentation about this feature [here](/docs/Use-Cases/enhanced_inference). diff --git a/website/docs/Getting-Started.md b/website/docs/Getting-Started.md index fb16de2242a..63fc52f9455 100644 --- a/website/docs/Getting-Started.md +++ b/website/docs/Getting-Started.md @@ -8,11 +8,11 @@ AutoGen is a framework that enables development of LLM applications using multip ### Main Features -* AutoGen enables building next-gen LLM applications based on **multi-agent conversations** with minimal effort. It simplifies the orchestration, automation and optimization of a complex LLM workflow. It maximizes the performance of LLM models and overcome their weaknesses. -* It supports **diverse conversation patterns** for complex workflows. With customizable and conversable agents, developers can use AutoGen to build a wide range of conversation patterns concerning conversation autonomy, +- AutoGen enables building next-gen LLM applications based on [multi-agent conversations](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat) with minimal effort. It simplifies the orchestration, automation, and optimization of a complex LLM workflow. It maximizes the performance of LLM models and overcomes their weaknesses. 
+- It supports [diverse conversation patterns](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat#supporting-diverse-conversation-patterns) for complex workflows. With customizable and conversable agents, developers can use AutoGen to build a wide range of conversation patterns concerning conversation autonomy, the number of agents, and agent conversation topology. -* It provides a collection of working systems with different complexities. These systems span a **wide range of applications** from various domains and complexities. They demonstrate how AutoGen can easily support different conversation patterns. -* AutoGen provides **enhanced LLM inference**. It offers easy performance tuning, plus utilities like API unification & caching, and advanced usage patterns, such as error handling, multi-config inference, context programming etc. +- It provides a collection of working systems with different complexities. These systems span a [wide range of applications](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat#diverse-applications-implemented-with-autogen) from various domains and complexities. This demonstrates how AutoGen can easily support diverse conversation patterns. +- AutoGen provides [enhanced LLM inference](https://microsoft.github.io/autogen/docs/Use-Cases/enhanced_inference#api-unification). It offers utilities like API unification and caching, and advanced usage patterns, such as error handling, multi-config inference, context programming, etc. AutoGen is powered by collaborative [research studies](/docs/Research) from Microsoft, Penn State University, and University of Washington. @@ -40,7 +40,7 @@ user_proxy.initiate_chat(assistant, message="Plot a chart of NVDA and TESLA stoc The figure below shows an example conversation flow with AutoGen. ![Agent Chat Example](/img/chat_example.png) -* [Code examples](/docs/Examples/AutoGen-AgentChat). +* [Code examples](/docs/Examples/AgentChat). * [Documentation](/docs/Use-Cases/agent_chat). 
#### Enhanced LLM Inferences
@@ -60,13 +60,13 @@ config, analysis = autogen.Completion.tune(
 response = autogen.Completion.create(context=test_instance, **config)
 ```
 
-* [Code examples](/docs/Examples/AutoGen-Inference).
+* [Code examples](/docs/Examples/Inference).
 * [Documentation](/docs/Use-Cases/enhanced_inference).
 
 ### Where to Go Next ?
 
 * Understand the use cases for [multi-agent conversation](/docs/Use-Cases/agent_chat) and [enhanced LLM inference](/docs/Use-Cases/enhanced_inference).
-* Find [code examples](/docs/Examples/AutoGen-AgentChat).
+* Find [code examples](/docs/Examples/AgentChat).
 * Read [SDK](/docs/reference/agentchat/conversable_agent/).
 * Learn about [research](/docs/Research) around AutoGen.
 * [Roadmap](https://github.com/orgs/microsoft/projects/989/views/3)
diff --git a/website/docs/Installation.md b/website/docs/Installation.md
index 2cacceda2c0..b9d892edb54 100644
--- a/website/docs/Installation.md
+++ b/website/docs/Installation.md
@@ -35,7 +35,7 @@ Now, you're ready to install AutoGen in the virtual environment you've just crea
 
 ## Python
 
-AutoGen requires **Python version >= 3.8**. It can be installed from pip:
+AutoGen requires **Python version >= 3.8, < 3.12**. It can be installed from pip:
 
 ```bash
 pip install pyautogen
@@ -49,6 +49,24 @@ or conda:
 conda install pyautogen -c conda-forge
 ```
 -->
+### Migration guide to v0.2
+
+openai v1 is a total rewrite of the library with many breaking changes. For example, inference now requires instantiating a client instead of calling a global class method.
+Therefore, some changes are required for users of `pyautogen<0.2`.
+
+- `api_base` -> `base_url`, `request_timeout` -> `timeout` in `llm_config` and `config_list`. `max_retry_period` and `retry_wait_time` are deprecated. `max_retries` can be set for each client.
+- MathChat, RetrieveChat, and TeachableAgent are unsupported until they are tested in a future release.
+- `autogen.Completion` and `autogen.ChatCompletion` are deprecated. 
The essential functionalities are moved to `autogen.OpenAIWrapper`:
+```python
+from autogen import OpenAIWrapper
+client = OpenAIWrapper(config_list=config_list)
+response = client.create(messages=[{"role": "user", "content": "2+2="}])
+print(client.extract_text_or_function_call(response))
+```
+- Inference parameter tuning and inference logging features are currently unavailable in `OpenAIWrapper`. Logging will be added in a future release.
+Inference parameter tuning can be done via [`flaml.tune`](https://microsoft.github.io/FLAML/docs/Use-Cases/Tune-User-Defined-Function).
+- `use_cache` is removed as a kwarg in `OpenAIWrapper.create()`; caching is now controlled automatically by `seed: int | None`.
+
 ### Optional Dependencies
 * docker
 
 ```bash
 pip install docker
 ```
 
 * blendsearch
 
-AutoGen offers a cost-effective hyperparameter optimization technique [EcoOptiGen](https://arxiv.org/abs/2303.04673) for tuning Large Language Models. Please install with the [blendsearch] option to use it.
+`pyautogen<0.2` offers a cost-effective hyperparameter optimization technique [EcoOptiGen](https://arxiv.org/abs/2303.04673) for tuning Large Language Models. Please install with the [blendsearch] option to use it.
 
 ```bash
-pip install "pyautogen[blendsearch]"
+pip install "pyautogen[blendsearch]<0.2"
 ```
 
 Example notebooks:
@@ -72,9 +90,9 @@ Example notebooks:
 
 * retrievechat
 
-AutoGen supports retrieval-augmented generation tasks such as question answering and code generation with RAG agents. Please install with the [retrievechat] option to use it.
+`pyautogen<0.2` supports retrieval-augmented generation tasks such as question answering and code generation with RAG agents. Please install with the [retrievechat] option to use it.
 
 ```bash
-pip install "pyautogen[retrievechat]"
+pip install "pyautogen[retrievechat]<0.2"
 ```
 
 Example notebooks:
@@ -83,9 +101,9 @@ Example notebooks:
 
 * mathchat
 
-AutoGen offers an experimental agent for math problem solving. 
Please install with the [mathchat] option to use it.
+`pyautogen<0.2` offers an experimental agent for math problem solving. Please install with the [mathchat] option to use it.
 
 ```bash
-pip install "pyautogen[mathchat]"
+pip install "pyautogen[mathchat]<0.2"
 ```
 
 Example notebooks:
diff --git a/website/docs/Use-Cases/agent_chat.md b/website/docs/Use-Cases/agent_chat.md
index 9062c1b3690..d834b7f3248 100644
--- a/website/docs/Use-Cases/agent_chat.md
+++ b/website/docs/Use-Cases/agent_chat.md
@@ -99,7 +99,7 @@ The figure below shows six examples of applications built using AutoGen.
   - Automated Chess Game Playing & Chitchatting by GPT-4 Agents - [View Notebook](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_chess.ipynb)
   - Automated Continual Learning from New Data - [View Notebook](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_stream.ipynb)
-  - [OptiGuide](https://github.com/microsoft/optiguide) - Large Language Models for Supply Chain Optimization.
+  - [OptiGuide](https://github.com/microsoft/optiguide) - Coding, Tool Using, Safeguarding & Question Answering for Supply Chain Optimization
 
 4. **Tool Use**
diff --git a/website/docs/Use-Cases/enhanced_inference.md b/website/docs/Use-Cases/enhanced_inference.md
index 21ce42c3ca1..5b5a1e81101 100644
--- a/website/docs/Use-Cases/enhanced_inference.md
+++ b/website/docs/Use-Cases/enhanced_inference.md
@@ -114,6 +114,23 @@ When chat models are used and `prompt` is given as the input to `autogen.Complet
 
 `autogen.OpenAIWrapper.create()` can be used to create completions for both chat and non-chat models, and both OpenAI API and Azure OpenAI API.
+```python
+from autogen import OpenAIWrapper
+# OpenAI endpoint
+client = OpenAIWrapper()
+# ChatCompletion
+response = client.create(messages=[{"role": "user", "content": "2+2="}], model="gpt-3.5-turbo")
+# extract the response text
+print(client.extract_text_or_function_call(response))
+# Azure OpenAI endpoint
+client = OpenAIWrapper(api_key=..., base_url=..., api_version=..., api_type="azure")
+# Completion
+response = client.create(prompt="2+2=", model="gpt-3.5-turbo-instruct")
+# extract the response text
+print(client.extract_text_or_function_call(response))
+
+```
+
 For local LLMs, one can spin up an endpoint using a package like [FastChat](https://github.com/lm-sys/FastChat), and then use the same API to send a request. See [here](/blog/2023/07/14/Local-LLMs) for examples on how to make inference with local LLMs.
 
@@ -122,6 +139,18 @@ For local LLMs, one can spin up an endpoint using a package like [FastChat](http
 
 API call results are cached locally and reused when the same request is issued. This is useful when repeating or continuing experiments for reproducibility and cost saving. It still allows controlled randomness by setting the "seed" specified in `OpenAIWrapper.create()` or the constructor of `OpenAIWrapper`.
 
+```python
+client = OpenAIWrapper(seed=...)
+client.create(...)
+```
+
+```python
+client = OpenAIWrapper()
+client.create(seed=..., ...)
+```
+
+Caching is enabled by default with seed 41. To disable it, please set `seed` to None.
+
 ## Error handling
 
 ### Runtime error
@@ -133,7 +162,7 @@ API call results are cached locally and reused when the same request is issued.
 
- `retry_wait_time` (int): the time interval to wait (in seconds) before retrying a failed request.
Moreover,
-->
 
-One can pass a list of configurations of different models/endpoints to mitigate the rate limits. For example,
+One can pass a list of configurations of different models/endpoints to mitigate the rate limits and other runtime errors. 
For example, ```python client = OpenAIWrapper( @@ -158,7 +187,7 @@ client = OpenAIWrapper( ) ``` -It will try querying Azure OpenAI gpt-4, OpenAI gpt-3.5-turbo, and a locally hosted llama2-chat-7B one by one, +`client.create()` will try querying Azure OpenAI gpt-4, OpenAI gpt-3.5-turbo, and a locally hosted llama2-chat-7B one by one, until a valid result is returned. This can speed up the development process where the rate limit is a bottleneck. An error will be raised if the last choice fails. So make sure the last choice in the list has the best availability. For convenience, we provide a number of utility functions to load config lists. @@ -184,8 +213,10 @@ def valid_json_filter(response, **_): pass return False -response = client.create( +client = OpenAIWrapper( config_list=[{"model": "text-ada-001"}, {"model": "gpt-3.5-turbo-instruct"}, {"model": "text-davinci-003"}], +) +response = client.create( prompt="How to construct a json request to Bing API to search for 'latest AI news'? Return the JSON request.", filter_func=valid_json_filter, )