add you.com integration #514

Merged
merged 13 commits into from
Oct 26, 2023
1 change: 1 addition & 0 deletions .env
Original file line number Diff line number Diff line change
@@ -10,6 +10,7 @@ HF_ACCESS_TOKEN=#hf_<token> from https://huggingface.co/settings/token
HF_API_ROOT=https://api-inference.huggingface.co/models

# used to activate search with web functionality. disabled if none are defined. choose one of the following:
YDC_API_KEY=#your docs.you.com api key here
SERPER_API_KEY=#your serper.dev api key here
SERPAPI_KEY=#your serpapi key here

3 changes: 2 additions & 1 deletion .gitignore
@@ -9,4 +9,5 @@ node_modules
!.env.template
vite.config.js.timestamp-*
vite.config.ts.timestamp-*
SECRET_CONFIG
SECRET_CONFIG
.idea
8 changes: 4 additions & 4 deletions README.md
@@ -76,8 +76,8 @@ npm run dev

Chat UI features a powerful Web Search feature. It works by:

1. Generating an appropriate Google query from the user prompt.
2. Performing Google search and extracting content from webpages.
1. Generating an appropriate search query from the user prompt.
2. Performing web search and extracting content from webpages.
3. Creating embeddings from texts using [transformers.js](https://huggingface.co/docs/transformers.js). Specifically, using [Xenova/gte-small](https://huggingface.co/Xenova/gte-small) model.
4. From these embeddings, find the ones that are closest to the user query using vector similarity search. Specifically, we use `inner product` distance.
5. Get the corresponding texts to those closest embeddings and perform [Retrieval-Augmented Generation](https://huggingface.co/papers/2005.11401) (i.e. expand user prompt by adding those texts so that a LLM can use this information).
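
The similarity step in the list above can be sketched as follows. This is a minimal illustration with toy vectors, not the code in `sentenceSimilarity.ts`; `innerProduct` and `topKByInnerProduct` are hypothetical helper names, and real embeddings would come from the `Xenova/gte-small` model rather than being written by hand.

```typescript
// Rank text chunks against a query embedding by inner product (higher means closer).
type Embedding = number[];

function innerProduct(a: Embedding, b: Embedding): number {
	if (a.length !== b.length) throw new Error("Embedding dimensions differ");
	return a.reduce((sum, value, i) => sum + value * (b[i] ?? 0), 0);
}

// Returns the indices of the k candidates with the largest inner product.
function topKByInnerProduct(query: Embedding, candidates: Embedding[], k: number): number[] {
	return candidates
		.map((vec, index) => ({ index, score: innerProduct(query, vec) }))
		.sort((x, y) => y.score - x.score)
		.slice(0, k)
		.map(({ index }) => index);
}

// Toy 3-dimensional vectors standing in for real embeddings.
const queryEmbedding: Embedding = [1, 0, 0];
const chunkEmbeddings: Embedding[] = [
	[0.1, 0.9, 0],
	[0.8, 0.1, 0.1],
	[0.5, 0.5, 0],
];
const closest = topKByInnerProduct(queryEmbedding, chunkEmbeddings, 2);
console.log(closest); // indices of the two closest chunks
```

The texts at the returned indices are the ones that would be appended to the user prompt in step 5.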
@@ -122,7 +122,7 @@ PUBLIC_APP_DISCLAIMER=

### Web Search config

You can enable the web search by adding either `SERPER_API_KEY` ([serper.dev](https://serper.dev/)) or `SERPAPI_KEY` ([serpapi.com](https://serpapi.com/)) to your `.env.local`.
You can enable web search by adding one of `YDC_API_KEY` ([docs.you.com](https://docs.you.com)), `SERPER_API_KEY` ([serper.dev](https://serper.dev/)), or `SERPAPI_KEY` ([serpapi.com](https://serpapi.com/)) to your `.env.local`.

### Custom models

@@ -209,7 +209,7 @@ The following is the default `webSearchQueryPromptTemplate`.
```prompt
{{userMessageToken}}
My question is: {{message.content}}.
Based on the conversation history (my previous questions are: {{previousMessages}}), give me an appropriate query to answer my question for google search. You should not say more than query. You should not say any words except the query. For the context, today is {{currentDate}}
Based on the conversation history (my previous questions are: {{previousMessages}}), give me an appropriate query to answer my question for web search. You should not say more than query. You should not say any words except the query. For the context, today is {{currentDate}}
{{userMessageEndToken}}
{{assistantMessageToken}}
```
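
To make the placeholders in the template above concrete, here is a rough sketch of how `{{...}}` slots could be filled. It is only an illustration: `renderTemplate` is a hypothetical helper based on a plain regular expression, the values are passed as a flat map keyed by the placeholder text, and the project's actual template engine may behave differently.

```typescript
// Replace each {{name}} placeholder with the matching value; unknown
// placeholders are left untouched so missing data is easy to spot.
function renderTemplate(template: string, values: Record<string, string>): string {
	return template.replace(
		/\{\{\s*([\w.]+)\s*\}\}/g,
		(placeholder: string, key: string) => values[key] ?? placeholder
	);
}

const rendered = renderTemplate(
	"My question is: {{message.content}}. For the context, today is {{currentDate}}",
	{
		"message.content": "latest SvelteKit release",
		currentDate: "2023-10-26",
	}
);
console.log(rendered);
```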
24 changes: 14 additions & 10 deletions src/lib/server/websearch/runWebSearch.ts
@@ -10,6 +10,7 @@ import {
} from "$lib/server/websearch/sentenceSimilarity";
import type { Conversation } from "$lib/types/Conversation";
import type { MessageUpdate } from "$lib/types/MessageUpdate";
import { getWebSearchProvider } from "./searchWeb";

const MAX_N_PAGES_SCRAPE = 10 as const;
const MAX_N_PAGES_EMBED = 5 as const;
@@ -39,14 +40,15 @@ export async function runWebSearch(

try {
webSearch.searchQuery = await generateQuery(messages);
appendUpdate("Searching Google", [webSearch.searchQuery]);
const searchProvider = getWebSearchProvider();
appendUpdate(`Searching ${searchProvider}`, [webSearch.searchQuery]);
const results = await searchWeb(webSearch.searchQuery);
webSearch.results =
(results.organic_results &&
results.organic_results.map((el: { title: string; link: string }) => {
const { title, link } = el;
results.organic_results.map((el: { title: string; link: string; text?: string }) => {
const { title, link, text } = el;
const { hostname } = new URL(link);
return { title, link, hostname };
return { title, link, hostname, text };
})) ??
[];
webSearch.results = webSearch.results
@@ -58,12 +60,14 @@ export async function runWebSearch(
appendUpdate("Browsing results");
const promises = webSearch.results.map(async (result) => {
const { link } = result;
let text = "";
try {
text = await parseWeb(link);
appendUpdate("Browsing webpage", [link]);
} catch (e) {
// ignore errors
let text = result.text ?? "";
if (!text) {
try {
text = await parseWeb(link);
appendUpdate("Browsing webpage", [link]);
} catch (e) {
// ignore errors
}
}
const MAX_N_CHUNKS = 100;
const texts = chunk(text, CHUNK_CAR_LEN).slice(0, MAX_N_CHUNKS);
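
The change in this hunk amounts to "use the text the search provider already returned, and scrape the page only when there is none". A small sketch of that decision, under the assumption that results carry an optional `text` field as in the diff; `splitByProvidedText` and the sample links are hypothetical and not part of the PR.

```typescript
interface SearchResult {
	link: string;
	text?: string; // present when the provider returns page text or snippets
}

// Separate results that can be embedded right away from those that still
// need a page fetch. An empty string counts as "no text", as in the diff.
function splitByProvidedText(results: SearchResult[]): {
	ready: SearchResult[];
	toScrape: SearchResult[];
} {
	const ready: SearchResult[] = [];
	const toScrape: SearchResult[] = [];
	for (const result of results) {
		if (result.text) {
			ready.push(result);
		} else {
			toScrape.push(result);
		}
	}
	return { ready, toScrape };
}

const { ready, toScrape } = splitByProvidedText([
	{ link: "https://a.example", text: "snippet from the provider" },
	{ link: "https://b.example" },
	{ link: "https://c.example", text: "" },
]);
console.log(ready.length, toScrape.length);
```

With a provider that always returns snippets, `toScrape` stays empty and no page fetches are needed.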
43 changes: 40 additions & 3 deletions src/lib/server/websearch/searchWeb.ts
@@ -1,17 +1,26 @@
import { SERPAPI_KEY, SERPER_API_KEY } from "$env/static/private";

import type { YouWebSearch } from "../../types/WebSearch";
import { WebSearchProvider } from "../../types/WebSearch";
import { SERPAPI_KEY, SERPER_API_KEY, YDC_API_KEY } from "$env/static/private";
import { getJson } from "serpapi";
import type { GoogleParameters } from "serpapi";

// get which SERP api is providing web results
export function getWebSearchProvider() {
	return YDC_API_KEY ? WebSearchProvider.YOU : WebSearchProvider.GOOGLE;
}

// Show result as JSON
export async function searchWeb(query: string) {
	if (SERPER_API_KEY) {
		return await searchWebSerper(query);
	}
	if (YDC_API_KEY) {
		return await searchWebYouApi(query);
	}
	if (SERPAPI_KEY) {
		return await searchWebSerpApi(query);
	}
	throw new Error("No Serper.dev or SerpAPI key found");
	throw new Error("No You.com or Serper.dev or SerpAPI key found");
}
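
In the diff, `searchWeb` checks `SERPER_API_KEY` first, while `getWebSearchProvider` reports You.com whenever `YDC_API_KEY` is set, so with both keys defined the status label and the backend that actually runs can differ. One possible way to keep them in step is to derive both from a single precedence list. The sketch below assumes that approach; `pickProvider`, `SearchKeys`, and `ProviderName` are hypothetical names, not part of the PR.

```typescript
type ProviderName = "Serper" | "You.com" | "SerpAPI";

interface SearchKeys {
	serperApiKey?: string;
	ydcApiKey?: string;
	serpApiKey?: string;
}

// Same order as the `if` chain in `searchWeb`: Serper, then You.com, then SerpAPI.
function pickProvider(keys: SearchKeys): ProviderName {
	const precedence: Array<[ProviderName, string | undefined]> = [
		["Serper", keys.serperApiKey],
		["You.com", keys.ydcApiKey],
		["SerpAPI", keys.serpApiKey],
	];
	for (const [name, key] of precedence) {
		if (key) return name; // first non-empty key wins
	}
	throw new Error("No You.com or Serper.dev or SerpAPI key found");
}

console.log(pickProvider({ serperApiKey: "s", ydcApiKey: "y" }));
console.log(pickProvider({ ydcApiKey: "y" }));
```

Both the dispatch and the label shown to the user could then read from the same `pickProvider` result.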

export async function searchWebSerper(query: string) {
@@ -59,3 +68,31 @@ export async function searchWebSerpApi(query: string) {

return response;
}

export async function searchWebYouApi(query: string) {
	// Encode the query so that spaces, "&", "#" and similar characters cannot break the URL.
	const response = await fetch(
		`https://api.ydc-index.io/search?query=${encodeURIComponent(query)}`,
		{
			method: "GET",
			headers: {
				"X-API-Key": YDC_API_KEY,
				"Content-type": "application/json; charset=UTF-8",
			},
		}
	);

	if (!response.ok) {
		throw new Error(`You.com API returned error code ${response.status} - ${response.statusText}`);
	}

	const data = (await response.json()) as YouWebSearch;
	// Map You.com hits onto the `organic_results` shape used by the other providers.
	const formattedResultsWithSnippets = data.hits
		.map(({ title, url, snippets }) => ({
			title,
			link: url,
			text: snippets?.join("\n") || "",
			hostname: new URL(url).hostname,
		}))
		.sort((a, b) => b.text.length - a.text.length); // desc order by text length

	return {
		organic_results: formattedResultsWithSnippets,
	};
}
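
The two pure steps of `searchWebYouApi`, building the request URL and reshaping the hits, can be tried without a network call. The following is a sketch under assumptions: `buildSearchUrl` and `toOrganicResults` are hypothetical helpers, the sample hits are invented, and `URLSearchParams` is used here as one way to encode the query.

```typescript
interface Hit {
	url: string;
	title: string;
	snippets?: string[];
}

interface OrganicResult {
	title: string;
	link: string;
	text: string;
	hostname: string;
}

// URLSearchParams percent-encodes reserved characters and writes spaces as "+".
function buildSearchUrl(query: string): string {
	const params = new URLSearchParams({ query });
	return `https://api.ydc-index.io/search?${params.toString()}`;
}

// Join snippets into one text per hit and put the longest texts first.
function toOrganicResults(hits: Hit[]): OrganicResult[] {
	return hits
		.map(({ title, url, snippets }) => ({
			title,
			link: url,
			text: snippets?.join("\n") || "",
			hostname: new URL(url).hostname,
		}))
		.sort((a, b) => b.text.length - a.text.length);
}

const sampleResults = toOrganicResults([
	{ url: "https://a.example/x", title: "A", snippets: ["short"] },
	{ url: "https://b.example/y", title: "B", snippets: ["first", "second"] },
	{ url: "https://c.example/z", title: "C" },
]);
console.log(buildSearchUrl("cats & dogs"));
console.log(sampleResults.map((r) => r.title));
```

Sorting by text length means the hits with the most snippet text are the ones most likely to survive the later page limits.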
19 changes: 19 additions & 0 deletions src/lib/types/WebSearch.ts
@@ -18,9 +18,28 @@ export interface WebSearchSource {
title: string;
link: string;
hostname: string;
text?: string; // You.com provides text of webpage right away
}

export type WebSearchMessageSources = {
type: "sources";
sources: WebSearchSource[];
};

export interface YouWebSearch {
	hits: YouSearchHit[];
	latency: number;
}

interface YouSearchHit {
	url: string;
	title: string;
	description: string;
	snippets: string[];
}

// eslint-disable-next-line no-shadow
export enum WebSearchProvider {
	GOOGLE = "Google",
	YOU = "You.com",
}
9 changes: 7 additions & 2 deletions src/routes/+layout.server.ts
@@ -6,7 +6,12 @@ import { UrlDependency } from "$lib/types/UrlDependency";
import { defaultModel, models, oldModels, validateModel } from "$lib/server/models";
import { authCondition, requiresUser } from "$lib/server/auth";
import { DEFAULT_SETTINGS } from "$lib/types/Settings";
import { SERPAPI_KEY, SERPER_API_KEY, MESSAGES_BEFORE_LOGIN } from "$env/static/private";
import {
	SERPAPI_KEY,
	SERPER_API_KEY,
	MESSAGES_BEFORE_LOGIN,
	YDC_API_KEY,
} from "$env/static/private";

export const load: LayoutServerLoad = async ({ locals, depends, url }) => {
const { conversations } = collections;
@@ -82,7 +87,7 @@ export const load: LayoutServerLoad = async ({ locals, depends, url }) => {
ethicsModalAcceptedAt: settings?.ethicsModalAcceptedAt ?? null,
activeModel: settings?.activeModel ?? DEFAULT_SETTINGS.activeModel,
hideEmojiOnSidebar: settings?.hideEmojiOnSidebar ?? false,
searchEnabled: !!(SERPAPI_KEY || SERPER_API_KEY),
searchEnabled: !!(SERPAPI_KEY || SERPER_API_KEY || YDC_API_KEY),
customPrompts: settings?.customPrompts ?? {},
},
models: models.map((model) => ({