Ollama is an awesome piece of llama software that allows running AI models locally and interacting with them via an API. This API is wrapped nicely in this library.
Originally based on the Ollama API docs - commit
- It is a wrapper around fetch that makes the Ollama API easier to use.
- When prompting, it manages the context vector for continuous conversations.
- Use the stream or non-stream version of the prompt or chat method to get the response.
The only dependency (for node) is node-fetch.
If you use node v17.5.0+ with the --experimental-fetch flag (node --experimental-fetch filename.js), you no longer need to install fetch (finally).
IIFE & ES6 modules are available for node and the browser.
This library is written with ES6 modules. There is no require. Use this library in NodeJS by either using the .mjs file extension or "type": "module" in your package.json file.
npm install ollama-js-client
<script src="https://cdn.jsdelivr.net/npm/ollama-js-client/dist/browser/iife/ollama-js.global.js"></script>
<script
type="module"
src="https://cdn.jsdelivr.net/npm/ollama-js-client/dist/browser/index.js"
></script>
// use in the browser as a type="module" or in node with modules enabled (mjs)
import Ollama from "ollama-js-client";
// use global in the browser with script tag
const Ollama = window.OllamaJS;
- Prompt with a text string.
- "Chat" with an array of messages (like the OpenAI GPT API).
Prompt is a simplified version of chat that operates on a context vector, which can be kept between calls (this library manages this automatically).
Chat has moved away from the context vector and now operates on an array of messages. This is a nicer way to interact with the API, allowing roles (system, assistant, user) and the management of conversations in a more structured way.
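Because chat works on a plain array of messages, the conversation state can live in your own code. Here is a minimal sketch that assumes nothing beyond the role/content message shape used in this README. `addMessage` and `ROLES` are hypothetical names for illustration and are not part of this library.

```javascript
// Hypothetical helper (not part of ollama-js-client): builds a chat history
// as an array of { role, content } messages.
const ROLES = ["system", "assistant", "user"];

function addMessage(history, role, content) {
  if (!ROLES.includes(role)) {
    throw new Error(`Unknown role: ${role}`);
  }
  // Return a new array so earlier snapshots of the history stay untouched.
  return [...history, { role, content }];
}

let history = [];
history = addMessage(history, "system", "You are a llama AI");
history = addMessage(history, "user", "Hello my ai friend");
history = addMessage(history, "assistant", "Hello, I am your llama AI friend.");

console.log(history.map((message) => message.role).join(","));
// system,user,assistant
```

An array built this way is the kind of value the chat methods take as their first argument.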
- Non-streaming - The whole response is returned at once.
- Streaming - The response is returned in chunks.
Depending on the use case, you can use the stream or non-stream version of the prompt and chat method. If you don't mind waiting for the whole response, use the non-stream version. If you want to get the response in chunks, use the stream version.
Stream is useful when building UI, so the user can see the responses as they come in.
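For intuition about what streaming involves underneath: Ollama's HTTP API streams newline-delimited JSON objects, each carrying a `done` flag and, for prompt-style requests, a `response` text fragment. The sketch below reassembles such a stream from arbitrarily split text pieces. `createNdjsonReader` is a hypothetical name, and this is not necessarily how this library handles it internally.

```javascript
// Hypothetical sketch: turn raw text pieces into parsed NDJSON objects.
// Pieces may end in the middle of a line, so incomplete lines are buffered.
function createNdjsonReader(onObject) {
  let buffer = "";
  return function feed(text) {
    buffer += text;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // the last element is an unfinished line (or "")
    for (const line of lines) {
      if (line.trim() !== "") onObject(JSON.parse(line));
    }
  };
}

const parts = [];
const feed = createNdjsonReader((chunk) => {
  if (!chunk.done) parts.push(chunk.response);
});

// Simulated network pieces; the second object is split across two pieces.
feed('{"response":"Hel","done":false}\n{"respo');
feed('nse":"lo","done":false}\n');
feed('{"response":"","done":true}\n');

const streamedText = parts.join("");
console.log(streamedText); // Hello
```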
Using the chat or chat_request method and the OpenAI GPT API message format.
import Ollama from "ollama-js-client";
// default stream version
const ollama_instance = new Ollama({
model: "llama3",
url: "http://127.0.0.1:11434/api/",
});
ollama_instance.chat([
{ role: "system", content: "You are a llama AI" },
{ role: "assistant", content: "Hello, I am your llama AI friend." },
{ role: "user", content: "That's funny" },
], (error, response) => {
if (error) {
console.error(error);
} else {
console.log(response);
}
});
// non-stream
const response = await new Ollama({
model: "llama3",
url: "http://127.0.0.1:11434/api/",
}).chat_request([
{ role: "user", content: "Hello my ai friend" },
{ role: "assistant", content: "Hello, I am your llama AI friend." },
{ role: "user", content: "That's funny" },
]);
Using the prompt or prompt_stream method
Each prompt is a fetch request. The response (last response if stream) contains a context vector. The context vector is used to keep the conversation going. The context is sent with the next prompt request. The context is cleared with the clear method.
If you want to keep the conversation going with the context, you simply have to keep the reference to the instance.
import Ollama from "ollama-js-client";
const ollama_instance = new Ollama({
model: "llama3",
url: "http://127.0.0.1:11434/api/",
});
ollama_instance.prompt("Hello my llama AI friend");
// instance will keep the context vector
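The context hand-off described above can be sketched in a few lines. This is an illustration under assumptions, not the library's actual code: `ContextKeeper` and its injectable `send` function are hypothetical, and the fake transport stands in for a real request so the sketch runs without a server.

```javascript
// Hypothetical sketch (not the library's implementation): remember the
// context from each reply and send it along with the next prompt.
class ContextKeeper {
  constructor(send) {
    this.send = send; // async (body) => parsed JSON reply
    this.context = null;
  }

  async prompt(text) {
    const body = { prompt: text };
    if (this.context) body.context = this.context;
    const reply = await this.send(body);
    if (reply.context) this.context = reply.context;
    return reply.response;
  }

  clear() {
    this.context = null;
  }
}

async function demoContext() {
  const sent = [];
  // Fake transport: records the context it was given and returns a fixed one.
  const fakeSend = async (body) => {
    sent.push(body.context ?? null);
    return { response: `echo: ${body.prompt}`, context: [1, 2, 3], done: true };
  };
  const keeper = new ContextKeeper(fakeSend);
  await keeper.prompt("Hello my llama AI friend"); // no context yet
  await keeper.prompt("That's funny"); // sends [1, 2, 3]
  keeper.clear();
  await keeper.prompt("Start over"); // context was reset
  return sent;
}

demoContext().then((sent) => console.log(JSON.stringify(sent)));
// [null,[1,2,3],null]
```

A real `send` could POST the body, plus a model name, to Ollama's generate endpoint and return the parsed JSON reply.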
If you don't want to keep the context vector, you can just create a new instance for each prompt or clear the context vector with the .clear() method.
import Ollama from "ollama-js-client";
const response = await new Ollama({
model: "llama3",
url: "http://127.0.0.1:11434/api/",
}).prompt("Hello my ai friend");
// context vector is lost
// or clear the instance context vector
const ollama_instance = new Ollama({
model: "llama3",
url: "http://127.0.0.1:11434/api/",
});
await ollama_instance.prompt("Hello my llama AI friend");
ollama_instance.clear();
// context vector is lost/reset
This is the shortest way to use the module. It will create a new Ollama instance and call prompt on it. It will not carry the context vector between calls, as the "reference" to the instance is lost.
<model>@<url>
const response = await Ollama.from("llama3@http://127.0.0.1:11434/api/").prompt(
"Hello my ai friend"
);
console.log(response); // "I am not your friend, buddy."
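To make the `<model>@<url>` format concrete, here is one way such a string could be split. `parseModelSpec` is a hypothetical function written for illustration; how `Ollama.from` actually parses its argument is not shown in this README.

```javascript
// Hypothetical parser for "<model>@<url>" strings (illustration only).
function parseModelSpec(spec) {
  // Split on the first "@" so any later "@" stays part of the URL.
  const at = spec.indexOf("@");
  if (at <= 0 || at === spec.length - 1) {
    throw new Error(`Expected "<model>@<url>", got: ${spec}`);
  }
  return { model: spec.slice(0, at), url: spec.slice(at + 1) };
}

const parsedSpec = parseModelSpec("llama3@http://127.0.0.1:11434/api/");
console.log(parsedSpec.model); // llama3
console.log(parsedSpec.url); // http://127.0.0.1:11434/api/
```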
Add more options to the constructor to get more control over the instance.
import Ollama from "ollama-js-client";
const response = await new Ollama({
model: "llama3",
url: "http://127.0.0.1:11434/api/",
options: {
temperature: 1,
top_p: 0.9,
top_k: 5,
system: "system prompt (overrides what is defined in the Modelfile)",
template:
"the full prompt or prompt template (overrides what is defined in the Modelfile)",
},
}).prompt("Hello my ai friend");
Keep the reference to the instance to carry the conversation vector (context) between calls.
import Ollama from "ollama-js-client";
const llama = new Ollama({
  model: "llama3",
  url: "http://127.0.0.1:11434/api/",
});
const response = await llama.prompt("Hello my ai friend");
// next prompt will include the previous context vector
const next_response = await llama.prompt("That's funny");
// clear the context vector
llama.clear();
Use the stream version of the prompt method to get the response in chunks.
import Ollama from "ollama-js-client";
const llama = new Ollama({
  model: "llama3",
  url: "http://127.0.0.1:11434/api/",
});
const responded = [];
// callback will be called with response chunks until error or response.done is true
const on_response = (error, response) => {
  if (error) {
    console.error(error);
  } else if (response.done) {
    console.log(responded.join("\n"));
  } else {
    responded.push(response);
  }
};
llama.prompt_stream("Hello", on_response);
Abort the request by calling the abort method on the instance.
import Ollama from "ollama-js-client";
const llama = new Ollama({
  model: "llama3",
  url: "http://127.0.0.1:11434/api/",
});
let responded = "";
// callback will be called with response chunks until error or response.done is true
const on_response = (error, response) => {
  if (error) {
    console.error(error);
  } else if (response.done) {
    console.log(responded);
  } else {
    responded += response;
    if (responded.length > 10) {
      // abort the request on silly grounds
      llama.abort();
    }
  }
};
await llama.prompt_stream("Hello", on_response);
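Cancelling a fetch-based request is commonly done with the standard AbortController. The sketch below shows that general pattern and assumes, without confirmation from this README, that something similar sits behind the abort method. `AbortableRequest` and `fakeFetch` are hypothetical names, and the fake stands in for fetch so the example runs offline.

```javascript
// Hypothetical sketch of the AbortController pattern (not the library's code).
class AbortableRequest {
  constructor(fetchFn) {
    this.fetchFn = fetchFn; // anything with a fetch-like (url, init) signature
    this.controller = null;
  }

  start(url, init = {}) {
    this.controller = new AbortController();
    return this.fetchFn(url, { ...init, signal: this.controller.signal });
  }

  abort() {
    if (this.controller) this.controller.abort();
  }
}

// Stand-in for fetch: resolves after one second unless the signal aborts first.
const fakeFetch = (url, { signal }) =>
  new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve("finished"), 1000);
    signal.addEventListener(
      "abort",
      () => {
        clearTimeout(timer);
        reject(new Error("aborted"));
      },
      { once: true }
    );
  });

async function demoAbort() {
  const request = new AbortableRequest(fakeFetch);
  const pending = request.start("http://127.0.0.1:11434/api/generate");
  request.abort();
  try {
    return await pending;
  } catch (error) {
    return error.message;
  }
}

demoAbort().then((outcome) => console.log(outcome)); // aborted
```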
<head>
<script src="https://cdn.jsdelivr.net/npm/ollama-js-client/browser/iife/ollama-js.global.js"></script>
</head>
<body>
<div id="output"></div>
<textarea id="input"></textarea>
<script>
const Ollama = window.OllamaJS;
function setup() {
const input = document.getElementById("input")
const output = document.getElementById("output")
const llama = new Ollama({
model:"llama3",
url:"http://127.0.0.1:11434/api/",
})
const on_response = (error,response) => {
if (error) {
console.error(error)
}
else if (response.done) {
// done!
}
else {
output.innerHTML += response
}
}
input.addEventListener("keyup",async (event) => {
if (event.key === "Enter") {
await llama.prompt_stream(input.value,on_response)
input.value = ""
}
})
}
setup()
</script>
</body>
Getting JSON in return is awesome, but the response is not always pure JSON. The JSONparser works in two steps: first it tries to parse the whole response as JSON; if that fails, it looks for the first valid JSON line and parses from there to the end of a valid JSON object.
import Ollama from "ollama-js-client";
import JSONparser from 'ollama-js-client/JSONparser';
const llama = new Ollama({
  model: "llama3",
  url: "http://127.0.0.1:11434/api/",
});
const response = await llama.prompt("Analyse this sentence and output the result in JSON: Hello there! It's nice to meet you. Is there anything I can help you with or would you like to chat? Please let me know if there's anything specific you'd like to talk about, and I'll do my best to assist you.")
const json = JSONparser(response.response)
console.log(json)
/*
{
type: 'response',
text: "Hello there! It's nice to meet you. Is there anything I can help you with or would you like to chat? Please let me know if there's anything specific you'd like to talk about, and I'll do my best to assist you.",
speaker: 'Assistant',
tone: 'neutral',
intent: {
helpfulness: 100,
positivity: 100,
respectfulness: 100,
honesty: 100
}
}
*/
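The two-step idea (strict parse first, then a fallback that digs a JSON object out of surrounding text) can be illustrated with a small stand-alone function. This is a sketch of the general technique using a balanced-brace scan, not the JSONparser source. `lenientJsonParse` is a hypothetical name.

```javascript
// Hypothetical sketch of a two-step lenient JSON parse (illustration only).
function lenientJsonParse(text) {
  // Step 1: the whole text may already be valid JSON.
  try {
    return JSON.parse(text);
  } catch (_) {
    // fall through to step 2
  }
  // Step 2: try each "{" as a start and scan to its matching "}".
  for (let start = text.indexOf("{"); start !== -1; start = text.indexOf("{", start + 1)) {
    let depth = 0;
    let inString = false;
    let escaped = false;
    for (let i = start; i < text.length; i++) {
      const ch = text[i];
      if (inString) {
        // Braces inside strings must not affect the depth count.
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
      } else if (ch === '"') {
        inString = true;
      } else if (ch === "{") {
        depth++;
      } else if (ch === "}") {
        depth--;
        if (depth === 0) {
          try {
            return JSON.parse(text.slice(start, i + 1));
          } catch (_) {
            break; // not valid JSON here, try the next "{"
          }
        }
      }
    }
  }
  return null; // nothing parseable found
}

const chatty = 'Sure! Here is the result:\n{"tone": "neutral", "score": 100}\nHope that helps.';
console.log(lenientJsonParse(chatty).tone); // neutral
console.log(lenientJsonParse("no json here")); // null
```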
The JSONstore is a simple way to store and retrieve data from a file in node or from localStorage in the browser. It covers the basic storage needs when exploring the Ollama API and keeping some state.
import JSONstore from 'ollama-js-client/JSONstore';
const store = new JSONstore("store.json");
store.set("key", "value");
store.get("key"); // "value"
store.has("key"); // true
store.delete("key");
// append to appendable values
// arrays can be appended to
// objects can be merged
// strings can be concatenated
// numbers can be added
store.append("my_array", ["initial value"]);
store.append("my_array", ["next value"]);
console.log(store.get("my_array")); // ["initial value", "next value"]
store.append("my_object", { key: "initial value" });
store.append("my_object", { another_key: "next value" });
console.log(store.get("my_object")); // { key: "initial value", another_key: "next value" }
store.append("my_string", "initial value");
store.append("my_string", "next value");
console.log(store.get("my_string")); // "initial valuenext value"
store.append("my_number", 1);
store.append("my_number", 2);
console.log(store.get("my_number")); // 3
// always save the store to persist changes
await store.save();
// in the browser; download the store as a file
store.download();
// clear the store in memory
store.clear();
// destroy the store file (truncates)
await store.destory();
In node it is just a JSON file on disk; in the browser it is stored in localStorage and can be downloaded as a JSON file via store.download().
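The append rules listed in the comments above (arrays are appended, objects merged, strings concatenated, numbers added) can be written down as one small function. This is a sketch of those rules only. `appendValue` is a hypothetical helper, and its handling of mismatched types (throwing) is an assumption rather than documented JSONstore behaviour.

```javascript
// Hypothetical sketch of the append rules (not the JSONstore source).
const isPlainObject = (value) =>
  value !== null && typeof value === "object" && !Array.isArray(value);

function appendValue(existing, incoming) {
  if (existing === undefined) return incoming; // first write for this key
  if (Array.isArray(existing) && Array.isArray(incoming)) {
    return [...existing, ...incoming];
  }
  if (isPlainObject(existing) && isPlainObject(incoming)) {
    return { ...existing, ...incoming };
  }
  if (typeof existing === "string" && typeof incoming === "string") {
    return existing + incoming;
  }
  if (typeof existing === "number" && typeof incoming === "number") {
    return existing + incoming;
  }
  // Assumption: mixing types is treated as an error in this sketch.
  throw new TypeError("Cannot append values of different types");
}

console.log(appendValue(["initial value"], ["next value"]));
// [ 'initial value', 'next value' ]
console.log(appendValue("initial value", "next value")); // initial valuenext value
console.log(appendValue(1, 2)); // 3
```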
Happy llamaing!