diff --git a/AudioQnA/deprecated/README.md b/AudioQnA/deprecated/README.md
deleted file mode 100644
index a910c1d51..000000000
--- a/AudioQnA/deprecated/README.md
+++ /dev/null
@@ -1,272 +0,0 @@
-# AudioQnA
-
-![audioqna](./assets/img/audioqna.jpg)
-
-In this example, we will show you how to build an Audio Question-Answering application (AudioQnA). AudioQnA serves as a talking bot, enabling LLMs to talk with users: it accepts users' audio input, converts it to text and feeds it to the LLM, then converts the text answer back to audio output.
-
-What AudioQnA delivers and why it stands out:
-
-- Fast, optimized ASR/TTS inference as microservices on Intel Xeon CPUs
-- Multilingual zero-shot voice cloning across languages, with customizable voices
-- Fast LLM inference on Intel Gaudi through TGI, with RAG and other features supported
-
-There are four folders under the current example:
-
-`front_end/`: the UI users interact with
-`serving/`: TGI LLM service endpoint
-`langchain/`: pipelines the flow of text input -> RAG -> TGI LLM service -> text output
-`audio/`: pipelines the flow of audio-to-text service -> langchain -> text-to-audio service -> UI
-
-## Start the Audio services
-
-### Build ASR and TTS services
-
-```shell
-cd audio/docker
-
-# Build ASR Docker service
-docker build . --build-arg http_proxy=${http_proxy} --build-arg https_proxy=${https_proxy} -f Dockerfile_asr -t intel/gen-ai-examples:audioqna-asr
-# Build TTS Docker service
-docker build . --build-arg http_proxy=${http_proxy} --build-arg https_proxy=${https_proxy} -f Dockerfile_tts -t intel/gen-ai-examples:audioqna-tts
-```
-
-### Usage
-
-```shell
-# Start ASR service
-docker run -d -e http_proxy=$http_proxy -e https_proxy=$https_proxy -p 8008:8008 intel/gen-ai-examples:audioqna-asr
-
-# Test ASR
-wget https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav
-http_proxy= curl -F 'file=@sample.wav' http://localhost:8008/v1/audio/transcriptions
-
-# Start TTS service
-# Pre-download the local models and map them into the container
-git clone https://huggingface.co/lj1995/GPT-SoVITS pretrained_tts_models
-docker run -d -v ./pretrained_tts_models:/GPT-SoVITS/GPT_SoVITS/pretrained_models -e http_proxy=$http_proxy -e https_proxy=$https_proxy -p 9880:9880 intel/gen-ai-examples:audioqna-tts --default_refer_path /GPT-SoVITS/sample.wav --default_refer_text="Who is Pat Gelsinger?" --default_refer_language="en" --bf16 --return_text_stream
-
-# Upload/change the reference audio
-# http_proxy= curl --location 'localhost:9880/upload_as_default' \
-# --form 'default_refer_file=@"sample.wav"' \
-# --form 'default_refer_text="Who is Pat Gelsinger?"' \
-# --form 'default_refer_language="en"'
-
-# Test TTS
-http_proxy= curl --location 'localhost:9880/v1/audio/speech' \
---header 'Content-Type: application/json' \
---data '{
-    "text": "You can have a look, but you should not touch this item.",
-    "text_language": "en"
-}' \
---output output.wav
-```
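-
-For quick programmatic checks of both audio services, here is a minimal Python sketch. It is an illustrative addition, not part of the shipped scripts: it assumes the containers run locally on the default ports above and that the `requests` package is installed; the `asr_result` field is the one returned by `asr_server.py` in this example.
-
-```python
-import requests
-
-# ASR: upload a wav file and read back the transcription.
-with open("sample.wav", "rb") as f:
-    resp = requests.post("http://localhost:8008/v1/audio/transcriptions", files={"file": f})
-resp.raise_for_status()
-print(resp.json()["asr_result"])
-
-# TTS: synthesize speech and save the returned bytes.
-# Note: if the service was started with --return_text_stream (as above), the
-# response is a base64 text/event-stream rather than raw audio bytes.
-resp = requests.post(
-    "http://localhost:9880/v1/audio/speech",
-    json={"text": "You can have a look, but you should not touch this item.", "text_language": "en"},
-)
-resp.raise_for_status()
-with open("output.wav", "wb") as out:
-    out.write(resp.content)
-```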
-## Prepare TGI Docker
-
-Getting started is straightforward with the official Docker container. Simply pull the image using:
-
-```bash
-docker pull ghcr.io/huggingface/tgi-gaudi:1.2.1
-```
-
-Alternatively, you can build the Docker image yourself from the latest [TGI-Gaudi](https://github.com/huggingface/tgi-gaudi) code with the command below:
-
-```bash
-bash ./serving/tgi_gaudi/build_docker.sh
-```
-
-## Launch TGI Gaudi Service
-
-### Launch a local server instance on 1 Gaudi card:
-
-```bash
-bash ./serving/tgi_gaudi/launch_tgi_service.sh
-```
-
-For gated models such as `LLAMA-2`, you will have to pass `-e HUGGING_FACE_HUB_TOKEN=<token>` to the `docker run` command above, using a valid Hugging Face Hub read token.
-
-Please follow this link [huggingface token](https://huggingface.co/docs/hub/security-tokens) to get an access token, then export the `HUGGINGFACEHUB_API_TOKEN` environment variable with it:
-
-```bash
-export HUGGINGFACEHUB_API_TOKEN=<token>
-```
-
-### Launch a local server instance on 8 Gaudi cards:
-
-```bash
-bash ./serving/tgi_gaudi/launch_tgi_service.sh 8
-```
-
-You can then make a request like the one below to check the service status:
-
-```bash
-curl 127.0.0.1:8080/generate \
-  -X POST \
-  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":32}}' \
-  -H 'Content-Type: application/json'
-```
-
-### Customize TGI Gaudi Service
-
-The `./serving/tgi_gaudi/launch_tgi_service.sh` script accepts three parameters:
-
-- num_cards: The number of Gaudi cards to be utilized, ranging from 1 to 8. The default is set to 1.
-- port_number: The port number assigned to the TGI Gaudi endpoint, with the default being 8080.
-- model_name: The model name utilized for the LLM, with the default set to "Intel/neural-chat-7b-v3-3".
-
-You have the flexibility to customize these parameters according to your specific needs. Additionally, you can set the TGI Gaudi endpoint by exporting the environment variable `TGI_LLM_ENDPOINT`:
-
-```bash
-export TGI_LLM_ENDPOINT="http://xxx.xxx.xxx.xxx:8080"
-```
-
-## Enable TEI for embedding model
-
-Text Embeddings Inference (TEI) is a toolkit designed for deploying and serving open-source text embeddings and sequence classification models efficiently. With TEI, users can extract high-performance features using various popular models. It supports token-based dynamic batching for enhanced performance.
-
-To launch the TEI service, you can use the following commands:
-
-```bash
-model=BAAI/bge-large-en-v1.5
-revision=refs/pr/5
-volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
-docker run -p 9090:80 -v $volume:/data -e http_proxy=$http_proxy -e https_proxy=$https_proxy --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 --model-id $model --revision $revision
-export TEI_ENDPOINT="http://xxx.xxx.xxx.xxx:9090"
-```
-
-You can then make a request like the one below to check the service status:
-
-```bash
-curl 127.0.0.1:9090/embed \
-  -X POST \
-  -d '{"inputs":"What is Deep Learning?"}' \
-  -H 'Content-Type: application/json'
-```
-
-Note: If you want to integrate the TEI service into the LangChain application, you'll need to restart the LangChain backend service after launching the TEI service.
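-
-The same check from Python, as a small sketch (assuming TEI listens on port 9090 as above and `requests` is installed). TEI's `/embed` route returns a JSON list with one embedding vector per input:
-
-```python
-import requests
-
-resp = requests.post(
-    "http://127.0.0.1:9090/embed",
-    json={"inputs": "What is Deep Learning?"},
-)
-resp.raise_for_status()
-vectors = resp.json()   # list of embedding vectors, one per input
-print(len(vectors[0]))  # embedding dimension (1024 for bge-large-en-v1.5)
-```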
-
-## Launch Redis and LangChain Backend Service
-
-Update the `HUGGINGFACEHUB_API_TOKEN` environment variable with your Hugging Face token in `docker-compose.yml`:
-
-```bash
-cd langchain/docker
-docker compose -f docker-compose.yml up -d
-cd ../../
-```
-
-> [!NOTE]
-> If you have modified any files and want that change to be introduced in this step, add `--build` to the end of the command to build the container image instead of pulling it from Docker Hub.
-
-## Ingest data into Redis (Optional)
-
-Each time the Redis container is launched, data should be ingested into it using the following commands:
-
-```bash
-docker exec -it qna-rag-redis-server bash
-cd /ws
-python ingest.py
-exit
-```
-
-Note: `ingest.py` will download the embedding model. Please set the proxy if necessary.
-
-# Start LangChain Server
-
-## Enable GuardRails using Meta's Llama Guard model (Optional)
-
-We offer content moderation support utilizing Meta's [Llama Guard](https://huggingface.co/meta-llama/LlamaGuard-7b) model. To activate GuardRails, follow the instructions below to deploy the Llama Guard model on TGI Gaudi.
-
-```bash
-volume=$PWD/data
-model_id="meta-llama/LlamaGuard-7b"
-docker run -p 8088:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HUGGING_FACE_HUB_TOKEN=<token> -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$http_proxy tgi_gaudi --model-id $model_id
-export SAFETY_GUARD_ENDPOINT="http://xxx.xxx.xxx.xxx:8088"
-```
-
-You can then make a request like the one below to check the service status:
-
-```bash
-curl 127.0.0.1:8088/generate \
-  -X POST \
-  -d '{"inputs":"How do you buy a tiger in the US?","parameters":{"max_new_tokens":32}}' \
-  -H 'Content-Type: application/json'
-```
-
-## Start the Backend Service
-
-Make sure the TGI Gaudi service is running and the data has been ingested into Redis, then launch the backend service:
-
-```bash
-docker exec -it qna-rag-redis-server bash
-nohup python app/server.py &
-```
-
-The LangChain backend service listens on port 8000; you can customize it by changing the code in `docker/qna-app/app/server.py`.
-
-You can then make requests like the ones below to check the LangChain backend service status (a Python equivalent of these checks is sketched at the end of this section):
-
-```bash
-# non-streaming endpoint
-curl 127.0.0.1:8000/v1/rag/chat \
-  -X POST \
-  -d '{"query":"What is the total revenue of Nike in 2023?"}' \
-  -H 'Content-Type: application/json'
-```
-
-```bash
-# streaming endpoint
-curl 127.0.0.1:8000/v1/rag/chat_stream \
-  -X POST \
-  -d '{"query":"What is the total revenue of Nike in 2023?"}' \
-  -H 'Content-Type: application/json'
-```
-
-## Start the Frontend Service
-
-Please refer to the frontend [README](./front_end/README.md).
-
-## Enable TGI Gaudi FP8 for higher throughput (Optional)
-
-TGI Gaudi uses BFLOAT16 optimization as the default setting. If you aim to achieve higher throughput, you can enable FP8 quantization on TGI Gaudi. Note that currently only Llama2-series and Mistral-series models support FP8 quantization. Please follow the steps below to enable FP8 quantization.
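-
-Before moving on, here is the Python client sketch referenced in the backend section above. It is an illustrative addition, assuming the backend listens on port 8000 and `requests` is installed; the response schema is whatever `server.py` returns, so it is printed as-is.
-
-```python
-import requests
-
-query = {"query": "What is the total revenue of Nike in 2023?"}
-
-# Non-streaming endpoint: a single response per request.
-resp = requests.post("http://127.0.0.1:8000/v1/rag/chat", json=query)
-resp.raise_for_status()
-print(resp.text)
-
-# Streaming endpoint: consume chunks as the server emits them.
-with requests.post("http://127.0.0.1:8000/v1/rag/chat_stream", json=query, stream=True) as resp:
-    resp.raise_for_status()
-    for chunk in resp.iter_lines(decode_unicode=True):
-        if chunk:
-            print(chunk)
-```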
-
-### Prepare Metadata for FP8 Quantization
-
-Enter the TGI Gaudi Docker container, and then run the commands below:
-
-```bash
-pip install git+https://github.com/huggingface/optimum-habana.git
-git clone https://github.com/huggingface/optimum-habana.git
-cd optimum-habana/examples/text-generation
-pip install -r requirements_lm_eval.txt
-QUANT_CONFIG=./quantization_config/maxabs_measure.json python ../gaudi_spawn.py run_lm_eval.py -o acc_7b_bs1_measure.txt --model_name_or_path Intel/neural-chat-7b-v3-3 --attn_softmax_bf16 --use_hpu_graphs --trim_logits --use_kv_cache --reuse_cache --bf16 --batch_size 1
-QUANT_CONFIG=./quantization_config/maxabs_quant.json python ../gaudi_spawn.py run_lm_eval.py -o acc_7b_bs1_quant.txt --model_name_or_path Intel/neural-chat-7b-v3-3 --attn_softmax_bf16 --use_hpu_graphs --trim_logits --use_kv_cache --reuse_cache --bf16 --batch_size 1 --fp8
-```
-
-After the above commands finish, the quantization metadata will have been generated. Move the metadata directory `./hqt_output/` and copy the quantization JSON file to the host (under `…/data`). Please adapt the commands to your own container ID and directory path.
-
-```bash
-docker cp 262e04bbe466:/usr/src/optimum-habana/examples/text-generation/hqt_output data/
-docker cp 262e04bbe466:/usr/src/optimum-habana/examples/text-generation/quantization_config/maxabs_quant.json data/
-```
-
-Then modify `dump_stats_path` to "/data/hqt_output/measure" and update `dump_stats_xlsx_path` to "/data/hqt_output/measure/fp8stats.xlsx" in the `maxabs_quant.json` file.
-
-### Restart the TGI Gaudi server with all the metadata mapped
-
-```bash
-docker run -p 8080:80 -e QUANT_CONFIG=/data/maxabs_quant.json -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id Intel/neural-chat-7b-v3-3
-```
-
-Now TGI Gaudi will launch the FP8 model by default, and you can make a request like the one below to check the service status:
-
-```bash
-curl 127.0.0.1:8080/generate \
-  -X POST \
-  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":32}}' \
-  -H 'Content-Type: application/json'
-```
-
-#
-
-SCRIPT USAGE NOTICE: By downloading and using any script file included with the associated software package (such as files with .bat, .cmd, or .JS extensions, Docker files, or any other type of file that, when executed, automatically downloads and/or installs files onto your system) (the “Script File”), it is your obligation to review the Script File to understand what files (e.g., other software, AI models, AI Datasets) the Script File will download to your system (“Downloaded Files”). Furthermore, by downloading and using the Downloaded Files, even if they are installed through a silent install, you agree to any and all terms and conditions associated with such files, including but not limited to, license terms, notices, or disclaimers.
diff --git a/AudioQnA/deprecated/docker/Dockerfile_asr b/AudioQnA/deprecated/docker/Dockerfile_asr deleted file mode 100644 index 51bbf8c24..000000000 --- a/AudioQnA/deprecated/docker/Dockerfile_asr +++ /dev/null @@ -1,15 +0,0 @@ -FROM python:3.11-slim - -ENV LANG=C.UTF-8 - -# Install system dependencies -RUN apt-get update \ - && apt-get install -y ffmpeg - -COPY ./asr /asr -RUN pip install --no-cache-dir -r /asr/requirements.txt - -WORKDIR /asr - -ENTRYPOINT ["python", "asr_server.py"] - diff --git a/AudioQnA/deprecated/docker/Dockerfile_tts b/AudioQnA/deprecated/docker/Dockerfile_tts deleted file mode 100644 index 8edfc822e..000000000 --- a/AudioQnA/deprecated/docker/Dockerfile_tts +++ /dev/null @@ -1,37 +0,0 @@ -FROM python:3.9-slim - -ENV LANG=C.UTF-8 -ENV PYTHONPATH=/home/user:/GPT-SoVITS/GPT_SoVITS - -# Install system dependencies -RUN apt-get update \ - && apt-get install -y ffmpeg \ - && apt-get install -y build-essential wget numactl git \ - && apt-get install -y libomp-dev google-perftools - -ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libiomp5.so:/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 -ENV MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000" -ENV OMP_NUM_THREADS=56 - - -RUN git clone https://github.com/RVC-Boss/GPT-SoVITS.git /GPT-SoVITS -b main - -RUN pip install --no-cache-dir -r /GPT-SoVITS/requirements.txt - -COPY ./tts/tts_server.py /GPT-SoVITS/ -COPY ./tts/config.py /GPT-SoVITS/ - -# Download the sample ref wav -RUN wget https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav -P /GPT-SoVITS -RUN wget https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/welcome_cn.wav -P /GPT-SoVITS - - -#RUN useradd -m -s /bin/bash user && \ -# mkdir -p /home/user && \ -# chown -R user /home/user/ - -#USER user - -WORKDIR /GPT-SoVITS - -ENTRYPOINT ["python", "tts_server.py"] diff --git a/AudioQnA/deprecated/docker/asr/asr.py b/AudioQnA/deprecated/docker/asr/asr.py deleted file mode 100644 index 17b4f456c..000000000 --- a/AudioQnA/deprecated/docker/asr/asr.py +++ /dev/null @@ -1,124 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import contextlib -import os -import time -import urllib.request - -import numpy as np -import torch -from datasets import Audio, Dataset -from pydub import AudioSegment -from transformers import WhisperForConditionalGeneration, WhisperProcessor - - -class AudioSpeechRecognition: - """Convert audio to text.""" - - def __init__(self, model_name_or_path="openai/whisper-small", bf16=False, language="english", device="cpu"): - if device == "hpu": - # Explicitly link HPU with Torch - from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi - - adapt_transformers_to_gaudi() - - self.device = device - asr_model_name_or_path = os.environ.get("ASR_MODEL_PATH", model_name_or_path) - print("Downloading model: {}".format(asr_model_name_or_path)) - self.model = WhisperForConditionalGeneration.from_pretrained(asr_model_name_or_path).to(self.device) - self.processor = WhisperProcessor.from_pretrained(asr_model_name_or_path) - self.model.eval() - self.bf16 = bf16 - if self.bf16: - import intel_extension_for_pytorch as ipex - - self.model = ipex.optimize(self.model, dtype=torch.bfloat16) - self.language = language - - if device == "hpu": 
- # do hpu graph warmup with a long enough input audio - # whisper has a receptive field of 30 seconds - # here we select a relatively long audio (~15 sec) to quickly warmup - self._warmup_whisper_hpu_graph("https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/labixiaoxin.wav") - - def _audiosegment_to_librosawav(self, audiosegment): - # https://github.com/jiaaro/pydub/blob/master/API.markdown#audiosegmentget_array_of_samples - # This way is faster than librosa.load or HuggingFace Dataset wrapper - channel_sounds = audiosegment.split_to_mono()[:1] # only select the first channel - samples = [s.get_array_of_samples() for s in channel_sounds] - - fp_arr = np.array(samples).T.astype(np.float32) - fp_arr /= np.iinfo(samples[0].typecode).max - fp_arr = fp_arr.reshape(-1) - - return fp_arr - - def _warmup_whisper_hpu_graph(self, url): - print("[ASR] fetch warmup audio...") - urllib.request.urlretrieve( - url, - "warmup.wav", - ) - print("[ASR] warmup...") - waveform = AudioSegment.from_file("warmup.wav").set_frame_rate(16000) - waveform = self._audiosegment_to_librosawav(waveform) - # pylint: disable=E1101 - inputs = self.processor.feature_extractor( - waveform, return_tensors="pt", sampling_rate=16_000 - ).input_features.to(self.device) - _ = self.model.generate(inputs, language="chinese") - - def audio2text(self, audio_path): - """Convert audio to text. - - audio_path: the path to the input audio, e.g. ~/xxx.mp3 - """ - start = time.time() - - try: - waveform = AudioSegment.from_file(audio_path).set_frame_rate(16000) - waveform = self._audiosegment_to_librosawav(waveform) - except Exception as e: - print(f"[ASR] audiosegment to librosa wave fail: {e}") - audio_dataset = Dataset.from_dict({"audio": [audio_path]}).cast_column("audio", Audio(sampling_rate=16000)) - waveform = audio_dataset[0]["audio"]["array"] - - # pylint: disable=E1101 - inputs = self.processor.feature_extractor( - waveform, return_tensors="pt", sampling_rate=16_000 - ).input_features.to(self.device) - with torch.cpu.amp.autocast() if self.bf16 else contextlib.nullcontext(): - predicted_ids = self.model.generate(inputs, language=self.language) - # pylint: disable=E1101 - result = self.processor.tokenizer.batch_decode(predicted_ids, skip_special_tokens=True, normalize=True)[0] - if self.language in ["chinese", "mandarin"]: - from zhconv import convert - - result = convert(result, "zh-cn") - print(f"generated text in {time.time() - start} seconds, and the result is: {result}") - return result - - -if __name__ == "__main__": - asr = AudioSpeechRecognition(language="english") - - # Test multilanguage asr - urllib.request.urlretrieve( - "https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/labixiaoxin.wav", - "sample.wav", - ) - asr.language = "chinese" - text = asr.audio2text("sample.wav") - - urllib.request.urlretrieve( - "https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav", - "sample.wav", - ) - text = asr.audio2text("sample.wav") - - os.remove("sample.wav") diff --git a/AudioQnA/deprecated/docker/asr/asr_server.py b/AudioQnA/deprecated/docker/asr/asr_server.py deleted file mode 100644 index 4eadb1c8e..000000000 --- a/AudioQnA/deprecated/docker/asr/asr_server.py +++ /dev/null @@ -1,69 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import argparse -import os - -import uvicorn -from asr import AudioSpeechRecognition -from fastapi import FastAPI, 
File, UploadFile
-from fastapi.responses import Response
-from pydub import AudioSegment
-from starlette.middleware.cors import CORSMiddleware
-
-app = FastAPI()
-asr = None
-
-app.add_middleware(
-    CORSMiddleware, allow_origins=["*"], allow_credentials=True, allow_methods=["*"], allow_headers=["*"]
-)
-
-
-@app.get("/v1/health")
-async def health() -> Response:
-    """Health check."""
-    return Response(status_code=200)
-
-
-@app.post("/v1/audio/transcriptions")
-async def audio_to_text(file: UploadFile = File(...)):
-    file_name = file.filename
-    print(f"Received file: {file_name}")
-    with open("tmp_audio_bytes", "wb") as fout:
-        content = await file.read()
-        fout.write(content)
-    audio = AudioSegment.from_file("tmp_audio_bytes")
-    audio = audio.set_frame_rate(16000)
-    # bytes to wav
-    file_name = file_name + ".wav"
-    audio.export(f"{file_name}", format="wav")
-    try:
-        asr_result = asr.audio2text(file_name)
-    except Exception as e:
-        print(e)
-        # Return the error message as text; the exception object itself is not JSON-serializable
-        asr_result = str(e)
-    finally:
-        os.remove(file_name)
-        os.remove("tmp_audio_bytes")
-    return {"asr_result": asr_result}
-
-
-if __name__ == "__main__":
-    parser = argparse.ArgumentParser()
-    parser.add_argument("--host", type=str, default="0.0.0.0")
-    parser.add_argument("--port", type=int, default=8008)
-    parser.add_argument("--model_name_or_path", type=str, default="openai/whisper-tiny")
-    parser.add_argument("--bf16", default=False, action="store_true")
-    parser.add_argument("--language", type=str, default="english")
-    parser.add_argument("--device", type=str, default="cpu")
-
-    args = parser.parse_args()
-    asr = AudioSpeechRecognition(
-        model_name_or_path=args.model_name_or_path, bf16=args.bf16, language=args.language, device=args.device
-    )
-
-    uvicorn.run(app, host=args.host, port=args.port)
diff --git a/AudioQnA/deprecated/docker/asr/requirements.txt b/AudioQnA/deprecated/docker/asr/requirements.txt
deleted file mode 100644
index 3f75cf82e..000000000
--- a/AudioQnA/deprecated/docker/asr/requirements.txt
+++ /dev/null
@@ -1,11 +0,0 @@
---extra-index-url https://download.pytorch.org/whl/cpu
-datasets
-fastapi
-ffmpeg-python
-numpy
-pydub
-python-multipart
-torch==2.2.0
-transformers
-uvicorn
-zhconv
diff --git a/AudioQnA/deprecated/docker/tts/config.py b/AudioQnA/deprecated/docker/tts/config.py
deleted file mode 100644
index 6530f2017..000000000
--- a/AudioQnA/deprecated/docker/tts/config.py
+++ /dev/null
@@ -1,101 +0,0 @@
-#!/usr/bin/env python
-# -*- coding: utf-8 -*-
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-#
-#
-# This script is adapted from
-# https://github.com/RVC-Boss/GPT-SoVITS/blob/main/api.py
-# which is under the MIT license
-#
-# Copyright (c) 2024 RVC-Boss
-#
-# Permission is hereby granted, free of charge, to any person obtaining a copy
-# of this software and associated documentation files (the "Software"), to deal
-# in the Software without restriction, including without limitation the rights
-# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-# copies of the Software, and to permit persons to whom the Software is
-# furnished to do so, subject to the following conditions:
-#
-# The above copyright notice and this permission notice shall be included in all
-# copies or substantial portions of the Software.
-#
-# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE -# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -# SOFTWARE. - -import os -import sys - -import torch - -sovits_path = "" -gpt_path = "" -is_half_str = os.environ.get("is_half", "True") -is_half = True if is_half_str.lower() == "true" else False -is_share_str = os.environ.get("is_share", "False") -is_share = True if is_share_str.lower() == "true" else False - -cnhubert_path = "GPT_SoVITS/pretrained_models/chinese-hubert-base" -bert_path = "GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large" -pretrained_sovits_path = "GPT_SoVITS/pretrained_models/s2G488k.pth" -pretrained_gpt_path = "GPT_SoVITS/pretrained_models/s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt" - -exp_root = "logs" -python_exec = sys.executable or "python" -if torch.cuda.is_available(): - infer_device = "cuda" -else: - infer_device = "cpu" - -webui_port_main = 9874 -webui_port_uvr5 = 9873 -webui_port_infer_tts = 9872 -webui_port_subfix = 9871 - -api_port = 9880 - -if infer_device == "cuda": - gpu_name = torch.cuda.get_device_name(0) - if ( - ("16" in gpu_name and "V100" not in gpu_name.upper()) - or "P40" in gpu_name.upper() - or "P10" in gpu_name.upper() - or "1060" in gpu_name - or "1070" in gpu_name - or "1080" in gpu_name - ): - is_half = False - -if infer_device == "cpu": - is_half = False - use_bf16 = False - - -class Config: - def __init__(self): - self.sovits_path = sovits_path - self.gpt_path = gpt_path - self.is_half = is_half - self.use_bf16 = use_bf16 - - self.cnhubert_path = cnhubert_path - self.bert_path = bert_path - self.pretrained_sovits_path = pretrained_sovits_path - self.pretrained_gpt_path = pretrained_gpt_path - - self.exp_root = exp_root - self.python_exec = python_exec - self.infer_device = infer_device - - self.webui_port_main = webui_port_main - self.webui_port_uvr5 = webui_port_uvr5 - self.webui_port_infer_tts = webui_port_infer_tts - self.webui_port_subfix = webui_port_subfix - - self.api_port = api_port diff --git a/AudioQnA/deprecated/docker/tts/requirements.txt b/AudioQnA/deprecated/docker/tts/requirements.txt deleted file mode 100644 index 92fe063e9..000000000 --- a/AudioQnA/deprecated/docker/tts/requirements.txt +++ /dev/null @@ -1,28 +0,0 @@ -chardet -# funasr==1.0.0 -cn2an -# gradio==3.38.0 -# gradio_client==0.8.1 -ffmpeg-python -g2p_en -jieba -jieba_fast -LangSegment>=0.2.0 -# tensorboard -librosa==0.9.2 -numba==0.56.4 -numpy -psutil -pyopenjtalk -pypinyin -pytorch-lightning -PyYAML -scipy -# modelscope==1.10.0 -sentencepiece -torchaudio -# onnxruntime -tqdm -transformers -# Faster_Whisper -wordsegment diff --git a/AudioQnA/deprecated/docker/tts/tts_server.py b/AudioQnA/deprecated/docker/tts/tts_server.py deleted file mode 100644 index 6a3014867..000000000 --- a/AudioQnA/deprecated/docker/tts/tts_server.py +++ /dev/null @@ -1,741 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# -# -# This script is adapted from -# https://github.com/RVC-Boss/GPT-SoVITS/blob/main/api.py -# which is under the MIT license -# -# Copyright (c) 2024 RVC-Boss -# -# Permission is hereby granted, free of charge, to any person obtaining a copy -# of this software and associated documentation files (the "Software"), to deal -# in the Software without restriction, including without limitation the rights -# to use, 
copy, modify, merge, publish, distribute, sublicense, and/or sell -# copies of the Software, and to permit persons to whom the Software is -# furnished to do so, subject to the following conditions: -# -# The above copyright notice and this permission notice shall be included in all -# copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -# SOFTWARE. - -import argparse -import base64 -import contextlib -import logging -import os -import re -import signal -import subprocess -import sys -from io import BytesIO -from time import time as ttime - -import config as global_config -import LangSegment -import librosa -import numpy as np -import soundfile as sf -import torch -import uvicorn -from AR.models.t2s_lightning_module import Text2SemanticLightningModule -from fastapi import FastAPI, File, Form, HTTPException, Request, UploadFile -from fastapi.responses import JSONResponse, StreamingResponse -from feature_extractor import cnhubert -from module.mel_processing import spectrogram_torch -from module.models import SynthesizerTrn -from my_utils import load_audio -from starlette.middleware.cors import CORSMiddleware -from text import cleaned_text_to_sequence -from text.cleaner import clean_text -from transformers import AutoModelForMaskedLM, AutoTokenizer - - -class DefaultRefer: - def __init__(self, path, text, language): - self.path = args.default_refer_path - self.text = args.default_refer_text - self.language = args.default_refer_language - - def is_ready(self) -> bool: - return is_full(self.path, self.text, self.language) - - -def is_empty(*items): - for item in items: - if item is not None and item != "": - return False - return True - - -def is_full(*items): - for item in items: - if item is None or item == "": - return False - return True - - -def change_sovits_weights(sovits_path): - global vq_model, hps - dict_s2 = torch.load(sovits_path, map_location="cpu") - hps = dict_s2["config"] - hps = DictToAttrRecursive(hps) - hps.model.semantic_frame_rate = "25hz" - vq_model = SynthesizerTrn( - hps.data.filter_length // 2 + 1, - hps.train.segment_size // hps.data.hop_length, - n_speakers=hps.data.n_speakers, - **hps.model, - ) - if "pretrained" not in sovits_path: - del vq_model.enc_q - if is_half: - vq_model = vq_model.half().to(device) - else: - vq_model = vq_model.to(device) - vq_model.eval() - vq_model.load_state_dict(dict_s2["weight"], strict=False) - - -def change_gpt_weights(gpt_path): - global hz, max_sec, t2s_model, config - hz = 50 - dict_s1 = torch.load(gpt_path, map_location="cpu") - config = dict_s1["config"] - max_sec = config["data"]["max_sec"] - t2s_model = Text2SemanticLightningModule(config, "****", is_train=False) - t2s_model.load_state_dict(dict_s1["weight"]) - if is_half: - t2s_model = t2s_model.half() - t2s_model = t2s_model.to(device) - t2s_model.eval() - total = sum([param.nelement() for param in t2s_model.parameters()]) - logger.info("Number of parameter: %.2fM" % (total / 1e6)) - - -def get_bert_feature(text, word2ph): - with torch.no_grad(): - inputs = tokenizer(text, return_tensors="pt") - for i in inputs: - 
inputs[i] = inputs[i].to(device) - res = bert_model(**inputs, output_hidden_states=True) - res = torch.cat(res["hidden_states"][-3:-2], -1)[0].cpu()[1:-1] - assert len(word2ph) == len(text) - phone_level_feature = [] - for i in range(len(word2ph)): - repeat_feature = res[i].repeat(word2ph[i], 1) - phone_level_feature.append(repeat_feature) - phone_level_feature = torch.cat(phone_level_feature, dim=0) - return phone_level_feature.T - - -def clean_text_inf(text, language): - phones, word2ph, norm_text = clean_text(text, language) - phones = cleaned_text_to_sequence(phones) - return phones, word2ph, norm_text - - -def get_bert_inf(phones, word2ph, norm_text, language): - language = language.replace("all_", "") - if language == "zh": - bert = get_bert_feature(norm_text, word2ph).to(device) - else: - bert = torch.zeros( - (1024, len(phones)), - dtype=torch.float16 if is_half else torch.float32, - ).to(device) - - return bert - - -def get_phones_and_bert(text, language): - if language in {"en", "all_zh", "all_ja"}: - language = language.replace("all_", "") - if language == "en": - LangSegment.setfilters(["en"]) - formattext = " ".join(tmp["text"] for tmp in LangSegment.getTexts(text)) - else: - formattext = text - while " " in formattext: - formattext = formattext.replace(" ", " ") - phones, word2ph, norm_text = clean_text_inf(formattext, language) - if language == "zh": - bert = get_bert_feature(norm_text, word2ph).to(device) - else: - bert = torch.zeros( - (1024, len(phones)), - dtype=torch.float16 if is_half else torch.float32, - ).to(device) - elif language in {"zh", "ja", "auto"}: - textlist = [] - langlist = [] - LangSegment.setfilters(["zh", "ja", "en", "ko"]) - if language == "auto": - for tmp in LangSegment.getTexts(text): - if tmp["lang"] == "ko": - langlist.append("zh") - textlist.append(tmp["text"]) - else: - langlist.append(tmp["lang"]) - textlist.append(tmp["text"]) - else: - for tmp in LangSegment.getTexts(text): - if tmp["lang"] == "en": - langlist.append(tmp["lang"]) - else: - langlist.append(language) - textlist.append(tmp["text"]) - - phones_list = [] - bert_list = [] - norm_text_list = [] - for i in range(len(textlist)): - lang = langlist[i] - phones, word2ph, norm_text = clean_text_inf(textlist[i], lang) - bert = get_bert_inf(phones, word2ph, norm_text, lang) - phones_list.append(phones) - norm_text_list.append(norm_text) - bert_list.append(bert) - bert = torch.cat(bert_list, dim=1) - phones = sum(phones_list, []) - norm_text = "".join(norm_text_list) - - return phones, bert.to(torch.float16 if is_half else torch.float32), norm_text - - -class DictToAttrRecursive: - def __init__(self, input_dict): - for key, value in input_dict.items(): - if isinstance(value, dict): - setattr(self, key, DictToAttrRecursive(value)) - else: - setattr(self, key, value) - - -def get_spepc(hps, filename): - audio = load_audio(filename, int(hps.data.sampling_rate)) - audio = torch.FloatTensor(audio) - audio_norm = audio - audio_norm = audio_norm.unsqueeze(0) - spec = spectrogram_torch( - audio_norm, - hps.data.filter_length, - hps.data.sampling_rate, - hps.data.hop_length, - hps.data.win_length, - center=False, - ) - return spec - - -def pack_audio(audio_bytes, data, rate): - if media_type == "ogg": - audio_bytes = pack_ogg(audio_bytes, data, rate) - elif media_type == "aac": - audio_bytes = pack_aac(audio_bytes, data, rate) - else: - audio_bytes = pack_raw(audio_bytes, data, rate) - - return audio_bytes - - -def pack_ogg(audio_bytes, data, rate): - with sf.SoundFile(audio_bytes, mode="w", 
samplerate=rate, channels=1, format="ogg") as audio_file: - audio_file.write(data) - - return audio_bytes - - -def pack_raw(audio_bytes, data, rate): - audio_bytes.write(data.tobytes()) - - return audio_bytes - - -def pack_wav(audio_bytes, rate): - data = np.frombuffer(audio_bytes.getvalue(), dtype=np.int16) - wav_bytes = BytesIO() - sf.write(wav_bytes, data, rate, format="wav") - - return wav_bytes - - -def pack_aac(audio_bytes, data, rate): - process = subprocess.Popen( - [ - "ffmpeg", - "-f", - "s16le", - "-ar", - str(rate), - "-ac", - "1", - "-i", - "pipe:0", - "-c:a", - "aac", - "-b:a", - "192k", - "-vn", - "-f", - "adts", - "pipe:1", - ], - stdin=subprocess.PIPE, - stdout=subprocess.PIPE, - stderr=subprocess.PIPE, - ) - out, _ = process.communicate(input=data.tobytes()) - audio_bytes.write(out) - - return audio_bytes - - -def read_clean_buffer(audio_bytes): - audio_chunk = audio_bytes.getvalue() - audio_bytes.truncate(0) - audio_bytes.seek(0) - - return audio_bytes, audio_chunk - - -def cut_text(text, punc): - text = re.escape(text) - punc_list = [",", ".", ";", "?", "!", "、", ",", "。", "?", "!", ";", ":", "…"] - if len(punc_list) > 0: - punds = r"[" + "".join(punc_list) + r"]" - text = text.strip("\n") - items = re.split(f"({punds})", text) - mergeitems = ["".join(group) for group in zip(items[::2], items[1::2])] - if len(items) % 2 == 1: - mergeitems.append(items[-1]) - text = "\n".join(mergeitems) - - while "\n\n" in text: - text = text.replace("\n\n", "\n") - - return text - - -def only_punc(text): - return not any(t.isalnum() or t.isalpha() for t in text) - - -def get_tts_wav(ref_wav_path, prompt_text, prompt_language, text, text_language): - t0 = ttime() - prompt_text = prompt_text.strip("\n") - prompt_language, text = prompt_language, text.strip("\n") - zero_wav = np.zeros(int(hps.data.sampling_rate * 0.3), dtype=np.float16 if is_half else np.float32) - with torch.no_grad(): - wav16k, sr = librosa.load(ref_wav_path, sr=16000) - wav16k = torch.from_numpy(wav16k) - zero_wav_torch = torch.from_numpy(zero_wav) - if is_half: - wav16k = wav16k.half().to(device) - zero_wav_torch = zero_wav_torch.half().to(device) - else: - wav16k = wav16k.to(device) - zero_wav_torch = zero_wav_torch.to(device) - wav16k = torch.cat([wav16k, zero_wav_torch]) - ssl_content = ssl_model.model(wav16k.unsqueeze(0))["last_hidden_state"].transpose(1, 2) # .float() - codes = vq_model.extract_latent(ssl_content) - prompt_semantic = codes[0, 0] - t1 = ttime() - prompt_language = dict_language[prompt_language.lower()] - text_language = dict_language[text_language.lower()] - phones1, bert1, norm_text1 = get_phones_and_bert(prompt_text, prompt_language) - texts = text.split("\n") - audio_bytes = BytesIO() - - for text in texts: - if only_punc(text): - continue - - audio_opt = [] - phones2, bert2, norm_text2 = get_phones_and_bert(text, text_language) - bert = torch.cat([bert1, bert2], 1) - - all_phoneme_ids = torch.LongTensor(phones1 + phones2).to(device).unsqueeze(0) - bert = bert.to(device).unsqueeze(0) - all_phoneme_len = torch.tensor([all_phoneme_ids.shape[-1]]).to(device) - prompt = prompt_semantic.unsqueeze(0).to(device) - # import intel_extension_for_pytorch as ipex - # ipex.optimize(t2s_model.model) - # from torch import profiler - t2 = ttime() - with torch.no_grad(): - # with profiler.profile(record_shapes=True) as prof: - # with profiler.record_function("model_inference"): - with ( - torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16, cache_enabled=True) - if use_bf16 - else contextlib.nullcontext() 
-            ):
-                pred_semantic, idx = t2s_model.model.infer_panel(
-                    all_phoneme_ids,
-                    all_phoneme_len,
-                    prompt,
-                    bert,
-                    # prompt_phone_len=ph_offset,
-                    top_k=config["inference"]["top_k"],
-                    early_stop_num=hz * max_sec,
-                )
-            # print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
-        t3 = ttime()
-        pred_semantic = pred_semantic[:, -idx:].unsqueeze(0)
-        refer = get_spepc(hps, ref_wav_path)
-        if is_half:
-            refer = refer.half().to(device)
-        else:
-            refer = refer.to(device)
-        audio = (
-            vq_model.decode(pred_semantic, torch.LongTensor(phones2).to(device).unsqueeze(0), refer)
-            .detach()
-            .cpu()
-            .numpy()[0, 0]
-        )
-        audio_opt.append(audio)
-        audio_opt.append(zero_wav)
-        t4 = ttime()
-        audio_bytes = pack_audio(
-            audio_bytes, (np.concatenate(audio_opt, 0) * 32768).astype(np.int16), hps.data.sampling_rate
-        )
-        logger.info("%.3f\t%.3f\t%.3f\t%.3f" % (t1 - t0, t2 - t1, t3 - t2, t4 - t3))
-        if stream_mode == "normal":
-            audio_bytes, audio_chunk = read_clean_buffer(audio_bytes)
-            yield audio_chunk
-
-    if not stream_mode == "normal":
-        if media_type == "wav":
-            audio_bytes = pack_wav(audio_bytes, hps.data.sampling_rate)
-        yield audio_bytes.getvalue()
-
-
-def handle_control(command):
-    if command == "restart":
-        os.execl(g_config.python_exec, g_config.python_exec, *sys.argv)
-    elif command == "exit":
-        os.kill(os.getpid(), signal.SIGTERM)
-        exit(0)
-
-
-def handle_change(path, text, language):
-    if is_empty(path, text, language):
-        return JSONResponse(
-            {"code": 400, "message": 'missing any of the following parameters: "path", "text", "language"'},
-            status_code=400,
-        )
-
-    # Only overwrite fields that were actually provided
-    if path is not None and path != "":
-        default_refer.path = path
-    if text is not None and text != "":
-        default_refer.text = text
-    if language is not None and language != "":
-        default_refer.language = language
-
-    logger.info(f"current default reference audio path: {default_refer.path}")
-    logger.info(f"current default reference audio text: {default_refer.text}")
-    logger.info(f"current default reference audio language: {default_refer.language}")
-    logger.info(f"is_ready: {default_refer.is_ready()}")
-
-    return JSONResponse({"code": 0, "message": "Success"}, status_code=200)
-
-
-def text_stream_generator(result):
-    """Embed the unicode byte values to base64 and yield the text stream with data prefix.
- - Accepts a generator of bytes - Returns a generator of string - """ - for bytes in result: - data = base64.b64encode(bytes) - yield f"data: {data}\n\n" - yield "data: [DONE]\n\n" - - -def handle(refer_wav_path, prompt_text, prompt_language, text, text_language, cut_punc): - if ( - refer_wav_path == "" - or refer_wav_path is None - or prompt_text == "" - or prompt_text is None - or prompt_language == "" - or prompt_language is None - ): - refer_wav_path, prompt_text, prompt_language = ( - default_refer.path, - default_refer.text, - default_refer.language, - ) - if not default_refer.is_ready(): - return JSONResponse({"code": 400, "message": "unspecified refer audio!"}, status_code=400) - - if cut_punc is None: - text = cut_text(text, default_cut_punc) - else: - text = cut_text(text, cut_punc) - - if not return_text_stream: - return StreamingResponse( - get_tts_wav(refer_wav_path, prompt_text, prompt_language, text, text_language), - media_type="audio/" + media_type, - ) - else: - result = get_tts_wav(refer_wav_path, prompt_text, prompt_language, text, text_language) - - return StreamingResponse(text_stream_generator(result), media_type="text/event-stream") - - -# -------------------------------- -# Initialization part -# -------------------------------- -now_dir = os.getcwd() -sys.path.append(now_dir) -sys.path.append("%s/GPT_SoVITS" % (now_dir)) - -dict_language = { - "中文": "all_zh", - "英文": "en", - "日文": "all_ja", - "中英混合": "zh", - "日英混合": "ja", - "多语种混合": "auto", - "all_zh": "all_zh", - "en": "en", - "all_ja": "all_ja", - "zh": "zh", - "ja": "ja", - "auto": "auto", -} - -logging.config.dictConfig(uvicorn.config.LOGGING_CONFIG) -logger = logging.getLogger("uvicorn") - -g_config = global_config.Config() - -parser = argparse.ArgumentParser(description="GPT-SoVITS api") - -parser.add_argument("-s", "--sovits_path", type=str, default=g_config.sovits_path, help="SoVITS model path") -parser.add_argument("-g", "--gpt_path", type=str, default=g_config.gpt_path, help="GPT model path") -parser.add_argument("-dr", "--default_refer_path", type=str, default="", help="default reference audio path") -parser.add_argument("-dt", "--default_refer_text", type=str, default="", help="default reference audio text") -parser.add_argument("-dl", "--default_refer_language", type=str, default="", help="default reference audio language") -parser.add_argument("-d", "--device", type=str, default=g_config.infer_device, help="cuda / cpu") -parser.add_argument("-a", "--bind_addr", type=str, default="0.0.0.0", help="default: 0.0.0.0") -parser.add_argument("-p", "--port", type=int, default=g_config.api_port, help="default: 9880") -parser.add_argument( - "-fp", "--full_precision", action="store_true", default=False, help="overwrite config.is_half, use fp32" -) -parser.add_argument( - "-hp", "--half_precision", action="store_true", default=False, help="overwrite config.is_half, use fp16" -) -# Here add an argument for specifying torch.bfloat16 inference on Xeon CPU -parser.add_argument("-bf16", "--bf16", action="store_true", default=False, help="use bfloat16") -parser.add_argument( - "-sm", "--stream_mode", type=str, default="close", help="streaming response, close / normal / keepalive" -) -parser.add_argument("-mt", "--media_type", type=str, default="wav", help="media type, wav / ogg / aac") -parser.add_argument("-cp", "--cut_punc", type=str, default="", help="text splitter, among ,.;?!、,。?!;:…") -parser.add_argument( - "-hb", "--hubert_path", type=str, default=g_config.cnhubert_path, help="overwrite config.cnhubert_path" 
-) -parser.add_argument("-b", "--bert_path", type=str, default=g_config.bert_path, help="overwrite config.bert_path") -# Here add an argument to decide whether to return text/event-stream base64 encoded bytes to frontend -# rather than audio bytes -parser.add_argument( - "-rts", - "--return_text_stream", - action="store_true", - default=False, - help="whether to return text/event-stream base64 encoded bytes to frontend", -) - -args = parser.parse_args() -sovits_path = args.sovits_path -gpt_path = args.gpt_path -device = args.device -port = args.port -host = args.bind_addr -cnhubert_base_path = args.hubert_path -bert_path = args.bert_path -default_cut_punc = args.cut_punc -return_text_stream = args.return_text_stream - -# Set default reference configuration -default_refer = DefaultRefer(args.default_refer_path, args.default_refer_text, args.default_refer_language) - -# Check model paths -if sovits_path == "": - sovits_path = g_config.pretrained_sovits_path - logger.warn(f"Unspecified SOVITS model path, fallback to current path: {sovits_path}") -if gpt_path == "": - gpt_path = g_config.pretrained_gpt_path - logger.warn(f"Unspecified GPT model path, fallback to current path: {gpt_path}") - -if default_refer.path == "" or default_refer.text == "" or default_refer.language == "": - default_refer.path, default_refer.text, default_refer.language = "", "", "" - logger.info("Unspecified default refer audio") -else: - logger.info(f"default refer audio path: {default_refer.path}") - logger.info(f"default refer audio text: {default_refer.text}") - logger.info(f"default refer audio language: {default_refer.language}") - -# deal with half precision -if device == "cuda": - is_half = g_config.is_half - use_bf16 = False - if args.full_precision: - is_half = False - if args.half_precision: - is_half = True - if args.full_precision and args.half_precision: - is_half = g_config.is_half # fallback to fp32 - logger.info(f"fp16 half: {is_half}") -else: - is_half = False - use_bf16 = g_config.use_bf16 - if args.full_precision: - use_bf16 = False - elif args.bf16: - use_bf16 = True - - logger.info(f"bf16 half: {use_bf16}") - -# stream response mode -if args.stream_mode.lower() in ["normal", "n"]: - stream_mode = "normal" - logger.info("stream response mode enabled") -else: - stream_mode = "close" - -# media type -if args.media_type.lower() in ["aac", "ogg"]: - media_type = args.media_type.lower() -elif stream_mode == "close": - media_type = "wav" -else: - media_type = "ogg" -logger.info(f"media type: {media_type}") - -# Initialize the model -cnhubert.cnhubert_base_path = cnhubert_base_path -tokenizer = AutoTokenizer.from_pretrained(bert_path) -bert_model = AutoModelForMaskedLM.from_pretrained(bert_path) -ssl_model = cnhubert.get_model() -if is_half: - bert_model = bert_model.half().to(device) - ssl_model = ssl_model.half().to(device) -else: - bert_model = bert_model.to(device) - ssl_model = ssl_model.to(device) -change_sovits_weights(sovits_path) -change_gpt_weights(gpt_path) - - -# -------------------------------- -# APIs -# -------------------------------- -app = FastAPI() - -app.add_middleware( - CORSMiddleware, allow_origins=["*"], allow_credentials=True, allow_methods=["*"], allow_headers=["*"] -) - - -@app.post("/set_model") -async def set_model(request: Request): - json_post_raw = await request.json() - global gpt_path - gpt_path = json_post_raw.get("gpt_model_path") - global sovits_path - sovits_path = json_post_raw.get("sovits_model_path") - logger.info("gptpath" + gpt_path + ";vitspath" + sovits_path) - 
change_sovits_weights(sovits_path) - change_gpt_weights(gpt_path) - return "ok" - - -@app.post("/control") -async def control_req(request: Request): - json_post_raw = await request.json() - return handle_control(json_post_raw.get("command")) - - -@app.get("/control") -async def control(command: str = None): - return handle_control(command) - - -@app.post("/change_refer") -async def change_refer_req(request: Request): - json_post_raw = await request.json() - return handle_change( - json_post_raw.get("refer_wav_path"), json_post_raw.get("prompt_text"), json_post_raw.get("prompt_language") - ) - - -@app.get("/change_refer") -async def change_refer(refer_wav_path: str = None, prompt_text: str = None, prompt_language: str = None): - return handle_change(refer_wav_path, prompt_text, prompt_language) - - -@app.post("/v1/audio/speech") -async def tts_endpoint_req(request: Request): - json_post_raw = await request.json() - return handle( - json_post_raw.get("refer_wav_path"), - json_post_raw.get("prompt_text"), - json_post_raw.get("prompt_language"), - json_post_raw.get("text"), - json_post_raw.get("text_language"), - json_post_raw.get("cut_punc"), - ) - - -@app.get("/v1/audio/speech") -async def tts_endpoint( - refer_wav_path: str = None, - prompt_text: str = None, - prompt_language: str = None, - text: str = None, - text_language: str = None, - cut_punc: str = None, -): - return handle(refer_wav_path, prompt_text, prompt_language, text, text_language, cut_punc) - - -@app.post("/upload_as_default") -async def upload_audio( - default_refer_file: UploadFile = File(...), - default_refer_text: str = Form(...), - default_refer_language: str = Form(...), -): - if not default_refer_file or not default_refer_file or not default_refer_language: - return JSONResponse( - {"code": 400, "message": "reference audio, text and language must be provided!"}, status_code=400 - ) - name = default_refer_file.filename - - if name.endswith(".mp3") or name.endswith(".wav"): - # temp file location - tmp_file_location = f"/tmp/{name}" - with open(tmp_file_location, "wb+") as f: - f.write(default_refer_file.file.read()) - logger.info(f"reference audio saved at {tmp_file_location}!") - return handle_change(path=tmp_file_location, text=default_refer_text, language=default_refer_language) - else: - return JSONResponse({"code": 400, "message": "audio name invalid!"}, status_code=400) - - -if __name__ == "__main__": - uvicorn.run(app, host=host, port=port, workers=1) diff --git a/AudioQnA/deprecated/langchain/docker/Dockerfile b/AudioQnA/deprecated/langchain/docker/Dockerfile deleted file mode 100644 index df06c732f..000000000 --- a/AudioQnA/deprecated/langchain/docker/Dockerfile +++ /dev/null @@ -1,38 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# SCRIPT USAGE NOTICE: By downloading and using any script file included -# with the associated software package (such as files with .bat, .cmd, or -# .JS extensions, Docker files, or any other type of file that, when executed, -# automatically downloads and/or installs files onto your system) (the “Script File”), -# it is your obligation to review the Script File to understand what files (e.g., -# other software, AI models, AI Datasets) the Script File will download to your system -# (“Downloaded Files”). 
Furthermore, by downloading and using the Downloaded Files, -# even if they are installed through a silent install, you agree to any and all -# terms and conditions associated with such files, including but not limited to, -# license terms, notices, or disclaimers. - -FROM langchain/langchain:latest - -RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \ - libgl1-mesa-glx \ - libjemalloc-dev - -# RUN useradd -m -s /bin/bash user && \ -# mkdir -p /home/user && \ -# chown -R user /home/user/ - -# USER user - -COPY requirements.txt /tmp/requirements.txt - -RUN pip install --no-cache-dir --upgrade pip && \ - pip install --no-cache-dir -r /tmp/requirements.txt - -ENV PYTHONPATH=$PYTHONPATH:/ws:/home/user:/home/user/qna-app/app - -WORKDIR /home/user/qna-app -COPY qna-app /home/user/qna-app - -ENTRYPOINT ["/usr/bin/sleep", "infinity"] diff --git a/AudioQnA/deprecated/langchain/docker/docker-compose.yml b/AudioQnA/deprecated/langchain/docker/docker-compose.yml deleted file mode 100644 index be552a2a8..000000000 --- a/AudioQnA/deprecated/langchain/docker/docker-compose.yml +++ /dev/null @@ -1,32 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -services: - redis-vector-db: - image: redis/redis-stack:7.2.0-v9 - container_name: redis-vector-db - ports: - - "6379:6379" - - "8001:8001" - qna-rag-redis-server: - build: - args: - https_proxy: ${https_proxy} - dockerfile: Dockerfile - image: intel/gen-ai-examples:qna-rag-redis-server - container_name: qna-rag-redis-server - environment: - - https_proxy=${https_proxy} - - HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} - - "REDIS_PORT=6379" - - "EMBED_MODEL=BAAI/bge-base-en-v1.5" - - "REDIS_SCHEMA=schema_dim_768.yml" - ulimits: - memlock: - soft: -1 # Set memlock to unlimited (no soft or hard limit) - hard: -1 - volumes: - - ../redis:/ws - - ../test:/test - network_mode: "host" diff --git a/AudioQnA/deprecated/langchain/docker/qna-app/Dockerfile b/AudioQnA/deprecated/langchain/docker/qna-app/Dockerfile deleted file mode 100644 index caac655e2..000000000 --- a/AudioQnA/deprecated/langchain/docker/qna-app/Dockerfile +++ /dev/null @@ -1,25 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -FROM python:3.11-slim - -RUN pip install --no-cache-dir poetry==1.6.1 - -RUN poetry config virtualenvs.create false - -WORKDIR /code - -COPY ./pyproject.toml ./README.md ./poetry.lock* ./ - -COPY ./package[s] ./packages - -RUN poetry install --no-interaction --no-ansi --no-root - -COPY ./app ./app - -RUN poetry install --no-interaction --no-ansi - -EXPOSE 8080 - -CMD ["uvicorn", "app.server:app", "--host", "0.0.0.0", "--port", "8080"] diff --git a/AudioQnA/deprecated/langchain/docker/qna-app/README.md b/AudioQnA/deprecated/langchain/docker/qna-app/README.md deleted file mode 100644 index c76e0d1af..000000000 --- a/AudioQnA/deprecated/langchain/docker/qna-app/README.md +++ /dev/null @@ -1,79 +0,0 @@ -# my-app - -## Installation - -Install the LangChain CLI if you haven't yet - -```bash -pip install -U langchain-cli -``` - -## Adding packages - -```bash -# adding packages from -# https://github.com/langchain-ai/langchain/tree/master/templates -langchain app add $PROJECT_NAME - -# adding custom GitHub repo packages -langchain app add --repo $OWNER/$REPO -# or with whole git string (supports other git providers): -# langchain app add git+https://github.com/hwchase17/chain-of-verification - -# with a custom api mount point (defaults to `/{package_name}`) 
-langchain app add $PROJECT_NAME --api_path=/my/custom/path/rag -``` - -Note: you remove packages by their api path - -```bash -langchain app remove my/custom/path/rag -``` - -## Setup LangSmith (Optional) - -LangSmith will help us trace, monitor and debug LangChain applications. -LangSmith is currently in private beta, you can sign up [here](https://smith.langchain.com/). -If you don't have access, you can skip this section - -```shell -export LANGCHAIN_TRACING_V2=true -export LANGCHAIN_API_KEY= -export LANGCHAIN_PROJECT= # if not specified, defaults to "default" -``` - -## Launch LangServe - -```bash -langchain serve -``` - -## Running in Docker - -This project folder includes a Dockerfile that allows you to easily build and host your LangServe app. - -### Building the Image - -To build the image, you simply: - -```shell -docker build . -t my-langserve-app -``` - -If you tag your image with something other than `my-langserve-app`, -note it for use in the next step. - -### Running the Image Locally - -To run the image, you'll need to include any environment variables -necessary for your application. - -In the below example, we inject the `OPENAI_API_KEY` environment -variable with the value set in my local environment -(`$OPENAI_API_KEY`) - -We also expose port 8080 with the `-p 8080:8080` option. - -```shell -docker run -e OPENAI_API_KEY=$OPENAI_API_KEY -p 8080:8080 my-langserve-app -``` diff --git a/AudioQnA/deprecated/langchain/docker/qna-app/app/__init__.py b/AudioQnA/deprecated/langchain/docker/qna-app/app/__init__.py deleted file mode 100644 index c495d1896..000000000 --- a/AudioQnA/deprecated/langchain/docker/qna-app/app/__init__.py +++ /dev/null @@ -1,6 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# diff --git a/AudioQnA/deprecated/langchain/docker/qna-app/app/guardrails.py b/AudioQnA/deprecated/langchain/docker/qna-app/app/guardrails.py deleted file mode 100644 index f3a0b614a..000000000 --- a/AudioQnA/deprecated/langchain/docker/qna-app/app/guardrails.py +++ /dev/null @@ -1,90 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -unsafe_categories = """O1: Violence and Hate. -Should not -- Help people plan or engage in violence. -- Encourage people to engage in violence. -- Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics like their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability. -- Encourage discrimination against people based on those characteristics and should not use slurs or create insults based on those characteristics. -Can -- Provide information on violence and discrimination. -- Discuss the topics of hate and violence, and can discuss historical events involving violence. -O2: Sexual Content. -Should not -- Engage in sexually explicit (i.e., erotic) conversations and should not encourage people to engage in sexual activities. -Can -- Discuss topics related to sexuality and sexual education. -- Provide general information regarding sexual activities. -- Provide guidance on topics related to sexual health. -O3: Criminal Planning. -Should not -- Help people plan or engage in criminal activities like theft, kidnapping, or financial crimes, especially when the person indicates that they may act on the information provided to them. 
-- Encourage people to engage in criminal activities or condone criminal behavior. -Can -- Help people understand how crimes are committed, provided that the response is not so detailed that it could be used to provide a roadmap for future crimes. -- Help people understand how systems work that are used to prevent crimes, provided those descriptions do not provide detailed information on how to defeat those systems. -O4: Guns and Illegal Weapons. -Should not -- Assist in the planning or execution of firearm or illegal weapons crimes, especially when the person indicates that they may act on the information provided to them. -- Encourage people to engage in firearm or illegal weapons crimes or condone such behavior. -Can -- Discuss firearms and the arguments for and against firearm ownership. -O5: Regulated or Controlled Substances. -Should not -- Encourage or assist people in the trafficking, consumption, or use of regulated or controlled substances, including but not limited to illegal drugs, tobacco, alcohol, and cannabis. -- Assist or encourage people to create such substances when it is illegal to do so. -Can -- Discuss the use of regulated or controlled substances, and can provide general information on those substances (ex: what their chemical composition is and what their effects are). -- Provide information on how those substances are manufactured, provided that the information is not so detailed as to provide step-by-step instructions for their creation. -O6: Self-Harm. -Should not -- Encourage people to harm themselves. -- Romanticize or condone self-harm or suicide. -- Provide information regarding the methods of suicide or self-harm. -- Provide any information or advice, other than information regarding appropriate health resources, to people who express or imply that they wish to harm themselves. -Can -- Provide general information regarding self-harm and suicide (ex: statistics, the mental health challenges faced by those who take these actions). -Should -- Provide information regarding appropriate health resources to people who imply or express the desire to harm themselves.""" - -unsafe_dict = { - "O1": "Violence and Hate", - "O2": "Sexual Content", - "O3": "Criminal Planning", - "O4": "Guns and Illegal Weapons", - "O5": "Regulated or Controlled Substances", - "O6": "Self-Harm", -} - - -def moderation_prompt_for_chat(chat): - # For simplicity, we assume the chat dictionary correctly alternates "user" and "assistant" roles - # Feel free to include error checking if needed - conversation = [turn["content"] for turn in chat] - is_agent = len(conversation) % 2 == 0 - role = "Agent" if len(conversation) % 2 == 0 else "User" - prompt = f"""[INST] Task: Check if there is unsafe content in '{role}' messages in conversations according to our safety policy with the below categories. - -<BEGIN UNSAFE CONTENT CATEGORIES> -{unsafe_categories} -<END UNSAFE CONTENT CATEGORIES> - -<BEGIN CONVERSATION> - -""" - - # Alternate User/Agent turns, inserting 2 newlines between each - for i, m in enumerate(conversation): - role = "User" if i % 2 == 0 else "Agent" - prompt += f"{role}: {m}\n\n" - prompt += f"""<END CONVERSATION> - -Provide your safety assessment for {role} in the above conversation: -- First line must read 'safe' or 'unsafe'. -- If unsafe, a second line must include a comma-separated list of violated categories. 
[/INST]""" - return prompt diff --git a/AudioQnA/deprecated/langchain/docker/qna-app/app/prompts.py b/AudioQnA/deprecated/langchain/docker/qna-app/app/prompts.py deleted file mode 100644 index a798f3437..000000000 --- a/AudioQnA/deprecated/langchain/docker/qna-app/app/prompts.py +++ /dev/null @@ -1,54 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder - -# ========= Raw Q&A template prompt ========= -template = """### System:\n\n - You are an assistant chatbot. You answer questions. \ - If you don't know the answer, just say that you don't know. \ - Use three sentences maximum and keep the answer concise.\ -### User:\n{question}\n### Assistant:\n""" -prompt = ChatPromptTemplate.from_template(template) - - -# ========= contextualize prompt ========= -contextualize_q_system_prompt = """Given a chat history and the latest user question \ -which might reference context in the chat history, formulate a standalone question \ -which can be understood without the chat history. Do NOT answer the question, \ -just reformulate it if needed and otherwise return it as is.""" -contextualize_q_prompt = ChatPromptTemplate.from_messages( - [ - ("system", contextualize_q_system_prompt), - MessagesPlaceholder(variable_name="chat_history"), - ("human", "{question}"), - ] -) - - -# ========= Q&A with history prompt ========= -# qa_system_prompt = """You are an assistant for question-answering tasks. \ -# Use the following pieces of retrieved context to answer the question. \ -# If you don't know the answer, just say that you don't know. \ -# Use three sentences maximum and keep the answer concise.\ - -# {context}""" -# qa_prompt = ChatPromptTemplate.from_messages( -# [ -# ("system", qa_system_prompt), -# MessagesPlaceholder(variable_name="chat_history"), -# ("human", "{question}"), -# ] -# ) -template = """### System:\n\n - You are an assistant chatbot. You answer questions. \ -Use the following pieces of retrieved context to answer the question. \ -If you don't know the answer, just say that you don't know. 
\ -Use three sentences maximum and keep the answer concise.\ -{context} -### User:\n{question}\n### Assistant:\n""" -qa_prompt = ChatPromptTemplate.from_template(template) diff --git a/AudioQnA/deprecated/langchain/docker/qna-app/app/server.py b/AudioQnA/deprecated/langchain/docker/qna-app/app/server.py deleted file mode 100644 index 5c6ce047c..000000000 --- a/AudioQnA/deprecated/langchain/docker/qna-app/app/server.py +++ /dev/null @@ -1,322 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import os - -from fastapi import APIRouter, FastAPI, File, Request, UploadFile -from fastapi.responses import JSONResponse, RedirectResponse, StreamingResponse -from guardrails import moderation_prompt_for_chat, unsafe_dict -from langchain.globals import set_debug, set_verbose -from langchain_community.embeddings import HuggingFaceBgeEmbeddings, HuggingFaceHubEmbeddings -from langchain_community.llms import HuggingFaceEndpoint -from langchain_community.vectorstores import Redis -from langchain_core.messages import HumanMessage -from langchain_core.output_parsers import StrOutputParser -from langchain_core.runnables import RunnablePassthrough -from langserve import add_routes -from prompts import contextualize_q_prompt, prompt, qa_prompt -from rag_redis.config import EMBED_MODEL, INDEX_NAME, INDEX_SCHEMA, REDIS_URL -from starlette.middleware.cors import CORSMiddleware -from utils import ( - create_kb_folder, - create_retriever_from_files, - create_retriever_from_links, - get_current_beijing_time, - post_process_text, - reload_retriever, -) - -set_verbose(True) -set_debug(True) - - -app = FastAPI() - -app.add_middleware( - CORSMiddleware, allow_origins=["*"], allow_credentials=True, allow_methods=["*"], allow_headers=["*"] -) - - -class RAGAPIRouter(APIRouter): - - def __init__(self, upload_dir, entrypoint, safety_guard_endpoint, tei_endpoint=None) -> None: - super().__init__() - self.upload_dir = upload_dir - self.entrypoint = entrypoint - self.safety_guard_endpoint = safety_guard_endpoint - print( - f"[rag - router] Initializing API Router, params:\n \ - upload_dir={upload_dir}, entrypoint={entrypoint}" - ) - - # Define LLM - self.llm = HuggingFaceEndpoint( - endpoint_url=entrypoint, - max_new_tokens=1024, - top_k=10, - top_p=0.95, - typical_p=0.95, - temperature=0.01, - repetition_penalty=1.03, - streaming=True, - ) - # for NeuralChatEndpoint: - """ - self.llm = NeuralChatEndpoint( - endpoint_url=entrypoint, - max_new_tokens=1024, - top_k=10, - top_p=0.95, - typical_p=0.95, - temperature=0.01, - repetition_penalty=1.03, - streaming=True, - ) - """ - if self.safety_guard_endpoint: - self.llm_guard = HuggingFaceEndpoint( - endpoint_url=safety_guard_endpoint, - max_new_tokens=100, - top_k=1, - top_p=0.95, - typical_p=0.95, - temperature=0.01, - repetition_penalty=1.03, - ) - print("[rag - router] LLM initialized.") - - # Define LLM Chain - if tei_endpoint: - # create embeddings using TEI endpoint service - self.embeddings = HuggingFaceHubEmbeddings(model=tei_endpoint) - else: - # create embeddings using local embedding model - self.embeddings = HuggingFaceBgeEmbeddings(model_name=EMBED_MODEL) - - try: - rds = Redis.from_existing_index( - self.embeddings, - index_name=INDEX_NAME, - redis_url=REDIS_URL, - schema=INDEX_SCHEMA, - ) - retriever = rds.as_retriever(search_type="mmr") - except Exception as e: - print( - "[rag - chat] Initializing Redis RAG failure, will skip RAG and fallback to normal chat in the chain!" 
- ) - retriever = None - # Define contextualize chain - # self.contextualize_q_chain = contextualize_q_prompt | self.llm | StrOutputParser() - self.contextualize_q_chain = prompt | self.llm | StrOutputParser() - - # Define LLM chain - if retriever: - self.llm_chain = ( - RunnablePassthrough.assign(context=self.contextualized_question | retriever) | qa_prompt | self.llm - ) - else: - self.llm_chain = RunnablePassthrough.assign(context=self.contextualized_question) | prompt | self.llm - print("[rag - router] LLM chain initialized.") - - # Define chat history - self.chat_history = [] - - def contextualized_question(self, input: dict): - if input.get("chat_history"): - return self.contextualize_q_chain - else: - return input["question"] - - def handle_rag_chat(self, query: str): - response = self.llm_chain.invoke({"question": query, "chat_history": self.chat_history}) - # response = self.llm_chain.invoke({"question": query}) - result = response.split("</s>")[0]  # truncate at the end-of-sequence token ("</s>" assumed) - self.chat_history.extend([HumanMessage(content=query), response]) - # output guardrails - if self.safety_guard_endpoint: - response_output_guard = self.llm_guard( - moderation_prompt_for_chat("Agent", f"User: {query}\n Agent: {response}") - ) - if "unsafe" in response_output_guard: - policy_violation_level = response_output_guard.split("\n")[1].strip() - policy_violations = unsafe_dict[policy_violation_level] - print(f"Violated policies: {policy_violations}") - return policy_violations + " are found in the output" - else: - return result.lstrip() - return result.lstrip() - - -upload_dir = os.getenv("RAG_UPLOAD_DIR", "./upload_dir") -tgi_llm_endpoint = os.getenv("TGI_LLM_ENDPOINT", "http://localhost:8080") -safety_guard_endpoint = os.getenv("SAFETY_GUARD_ENDPOINT") -tei_embedding_endpoint = os.getenv("TEI_ENDPOINT") -router = RAGAPIRouter(upload_dir, tgi_llm_endpoint, safety_guard_endpoint, tei_embedding_endpoint) - - -@router.post("/v1/rag/chat") -async def rag_chat(request: Request): - params = await request.json() - print(f"[rag - chat] POST request: /v1/rag/chat, params:{params}") - query = params["query"] - kb_id = params.get("knowledge_base_id", "default") - print(f"[rag - chat] history: {router.chat_history}") - - # prompt guardrails - if router.safety_guard_endpoint: - response_input_guard = router.llm_guard(moderation_prompt_for_chat("User", query)) - if "unsafe" in response_input_guard: - policy_violation_level = response_input_guard.split("\n")[1].strip() - policy_violations = unsafe_dict[policy_violation_level] - print(f"Violated policies: {policy_violations}") - return f"Violated policies: {policy_violations}, please check your input."
- - if kb_id == "default": - print("[rag - chat] use default knowledge base") - new_index_name = INDEX_NAME - elif kb_id.startswith("kb"): - new_index_name = INDEX_NAME + kb_id - print(f"[rag - chat] use knowledge base {kb_id}, index name is {new_index_name}") - else: - return JSONResponse(status_code=400, content={"message": "Wrong knowledge base id."}) - - try: - retriever = reload_retriever(router.embeddings, new_index_name) - router.llm_chain = ( - RunnablePassthrough.assign(context=router.contextualized_question | retriever) | qa_prompt | router.llm - ) - except Exception as e: - print("[rag - chat] Initializing Redis RAG failure, will skip RAG and fallback to normal chat in the chain!") - return router.handle_rag_chat(query=query) - - -@router.post("/v1/rag/chat_stream") -async def rag_chat_stream(request: Request): - params = await request.json() - print(f"[rag - chat_stream] POST request: /v1/rag/chat_stream, params:{params}") - query = params["query"] - kb_id = params.get("knowledge_base_id", "default") - print(f"[rag - chat_stream] history: {router.chat_history}") - - # prompt guardrails - if router.safety_guard_endpoint: - response_input_guard = router.llm_guard(moderation_prompt_for_chat("User", query)) - if "unsafe" in response_input_guard: - policy_violation_level = response_input_guard.split("\n")[1].strip() - policy_violations = unsafe_dict[policy_violation_level] - print(f"Violated policies: {policy_violations}") - - def generate_content(): - content = f"Violated policies: {policy_violations}, please check your input." - yield f"data: {content}\n\n" - yield "data: [DONE]\n\n" - - return StreamingResponse(generate_content(), media_type="text/event-stream") - - if kb_id == "default": - print("[rag - chat] use default knowledge base") - new_index_name = INDEX_NAME - elif kb_id.startswith("kb"): - new_index_name = INDEX_NAME + kb_id - print(f"[rag - chat] use knowledge base {kb_id}, index name is {new_index_name}") - else: - return JSONResponse(status_code=400, content={"message": "Wrong knowledge base id."}) - - try: - retriever = reload_retriever(router.embeddings, new_index_name) - router.llm_chain = ( - RunnablePassthrough.assign(context=router.contextualized_question | retriever) | qa_prompt | router.llm - ) - except Exception as e: - print("[rag - chat] Initializing Redis RAG failure, will skip RAG and fallback to normal chat in the chain!") - - def stream_generator(): - chat_response = "" - for text in router.llm_chain.stream({"question": query, "chat_history": router.chat_history}): - # for text in router.llm_chain.stream({"question": query}): - chat_response += text - processed_text = post_process_text(text) - if processed_text is not None: - yield processed_text - chat_response = chat_response.split("</s>")[0]  # truncate at the end-of-sequence token ("</s>" assumed) - print(f"[rag - chat_stream] stream response: {chat_response}") - router.chat_history.extend([HumanMessage(content=query), chat_response]) - yield "data: [DONE]\n\n" - - return StreamingResponse(stream_generator(), media_type="text/event-stream") - - -@router.post("/v1/rag/create") -async def rag_create(file: UploadFile = File(...)): - filename = file.filename - if "/" in filename: - filename = filename.split("/")[-1] - print(f"[rag - create] POST request: /v1/rag/create, filename:{filename}") - - kb_id, user_upload_dir, user_persist_dir = create_kb_folder(router.upload_dir) - # save file to local path - cur_time = get_current_beijing_time() - save_file_name = str(user_upload_dir) + "/" + cur_time + "-" + filename - with open(save_file_name, "wb") as fout: - content = await 
file.read() - fout.write(content) - print(f"[rag - create] file saved to local path: {save_file_name}") - - # create new retriever - try: - # get retrieval instance and reload db with new knowledge base - print("[rag - create] starting to create local db...") - index_name = INDEX_NAME + kb_id - retriever = create_retriever_from_files(save_file_name, router.embeddings, index_name) - router.llm_chain = ( - RunnablePassthrough.assign(context=router.contextualized_question | retriever) | qa_prompt | router.llm - ) - print("[rag - create] kb created successfully") - except Exception as e: - print(f"[rag - create] create knowledge base failed! {e}") - return JSONResponse(status_code=500, content={"message": "Failed to create new knowledge base."}) - return {"knowledge_base_id": kb_id} - - -@router.post("/v1/rag/upload_link") -async def rag_upload_link(request: Request): - params = await request.json() - link_list = params["link_list"] - print(f"[rag - upload_link] POST request: /v1/rag/upload_link, link list:{link_list}") - - kb_id, user_upload_dir, user_persist_dir = create_kb_folder(router.upload_dir) - - # create new retriever - try: - print("[rag - upload_link] starting to create local db...") - index_name = INDEX_NAME + kb_id - retriever = create_retriever_from_links(router.embeddings, link_list, index_name) - router.llm_chain = ( - RunnablePassthrough.assign(context=router.contextualized_question | retriever) | qa_prompt | router.llm - ) - print("[rag - upload_link] kb created successfully") - except Exception as e: - print(f"[rag - upload_link] create knowledge base failed! {e}") - return JSONResponse(status_code=500, content={"message": "Failed to create new knowledge base."}) - return {"knowledge_base_id": kb_id} - - -app.include_router(router) - - -@app.get("/") -async def redirect_root_to_docs(): - return RedirectResponse("/docs") - - -add_routes(app, router.llm_chain, path="/rag-redis") - -if __name__ == "__main__": - import uvicorn - - uvicorn.run(app, host="0.0.0.0", port=8000) diff --git a/AudioQnA/deprecated/langchain/docker/qna-app/app/utils.py b/AudioQnA/deprecated/langchain/docker/qna-app/app/utils.py deleted file mode 100644 index 71b26ee8d..000000000 --- a/AudioQnA/deprecated/langchain/docker/qna-app/app/utils.py +++ /dev/null @@ -1,342 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import multiprocessing -import os -import re -import unicodedata -import uuid -from datetime import datetime, timedelta, timezone -from pathlib import Path -from urllib.parse import urlparse, urlunparse - -import requests -from bs4 import BeautifulSoup -from langchain.text_splitter import RecursiveCharacterTextSplitter -from langchain_community.document_loaders import UnstructuredFileLoader -from langchain_community.vectorstores import Redis -from langchain_core.documents import Document -from rag_redis.config import INDEX_SCHEMA, REDIS_URL - - -def get_current_beijing_time(): - SHA_TZ = timezone(timedelta(hours=8), name="Asia/Shanghai") - utc_now = datetime.utcnow().replace(tzinfo=timezone.utc) - beijing_time = utc_now.astimezone(SHA_TZ).strftime("%Y-%m-%d-%H:%M:%S") - return beijing_time - - -def create_kb_folder(upload_dir): - kb_id = f"kb_{str(uuid.uuid1())[:8]}" - path_prefix = upload_dir - - # create local folder for retrieval - cur_path = Path(path_prefix) / kb_id - os.makedirs(path_prefix, exist_ok=True) - cur_path.mkdir(parents=True, exist_ok=True) - user_upload_dir = Path(path_prefix) / 
f"{kb_id}/upload_dir" - user_persist_dir = Path(path_prefix) / f"{kb_id}/persist_dir" - user_upload_dir.mkdir(parents=True, exist_ok=True) - user_persist_dir.mkdir(parents=True, exist_ok=True) - print(f"[rag - create kb folder] upload path: {user_upload_dir}, persist path: {user_persist_dir}") - return kb_id, str(user_upload_dir), str(user_persist_dir) - - -class Crawler: - - def __init__(self, pool=None): - if pool: - assert isinstance(pool, (str, list, tuple)), "url pool should be str, list or tuple" - self.pool = pool - self.headers = { - "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng, \ - */*;q=0.8,application/signed-exchange;v=b3;q=0.7", - "Accept-Encoding": "gzip, deflate, br", - "Accept-Language": "en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7", - "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, \ - like Gecko) Chrome/113.0.0.0 Safari/537.36", - } - self.fetched_pool = set() - - def get_sublinks(self, soup): - sublinks = [] - for links in soup.find_all("a"): - sublinks.append(str(links.get("href"))) - return sublinks - - def get_hyperlink(self, soup, base_url): - sublinks = [] - for links in soup.find_all("a"): - link = str(links.get("href")) - if link is None or link == "None" or link.startswith("#"): - continue - suffix = link.split("/")[-1] - if "." in suffix and suffix.split(".")[-1] not in ["html", "htmld"]: - continue - link_parse = urlparse(link) - base_url_parse = urlparse(base_url) - if link_parse.path == "": - continue - if link_parse.netloc != "": - # keep the crawler within the same domain - if link_parse.netloc != base_url_parse.netloc: - continue - sublinks.append(link) - else: - sublinks.append( - urlunparse( - ( - base_url_parse.scheme, - base_url_parse.netloc, - link_parse.path, - link_parse.params, - link_parse.query, - link_parse.fragment, - ) - ) - ) - return sublinks - - def fetch(self, url, headers=None, max_times=5): - if not headers: - headers = self.headers - while max_times: - # prepend a scheme only when the URL has neither http:// nor https:// - if not url.startswith("http"): - url = "http://" + url - print("start fetch %s..." % url) - try: - response = requests.get(url, headers=headers, verify=True) - if response.status_code != 200: - print("fail to fetch %s, response status code: %s" % (url, response.status_code)) - else: - return response - except Exception as e: - print("fail to fetch %s, caused by %s" % (url, e)) - raise Exception(e) - max_times -= 1 - return None - - def process_work(self, sub_url, work): - response = self.fetch(sub_url) - if response is None: - return [] - self.fetched_pool.add(sub_url) - soup = self.parse(response.text) - base_url = self.get_base_url(sub_url) - sublinks = self.get_hyperlink(soup, base_url) - if work: - work(sub_url, soup) - return sublinks - - def crawl(self, pool, work=None, max_depth=10, workers=10): - url_pool = set() - for url in pool: - base_url = self.get_base_url(url) - response = self.fetch(url) - soup = self.parse(response.text) - sublinks = self.get_hyperlink(soup, base_url) - self.fetched_pool.add(url) - url_pool.update(sublinks) - depth = 0 - while len(url_pool) > 0 and depth < max_depth: - print("current depth %s..." % depth) - mp = multiprocessing.Pool(processes=workers) - results = [] - for sub_url in url_pool: - if sub_url not in self.fetched_pool: - results.append(mp.apply_async(self.process_work, (sub_url, work))) - mp.close() - mp.join() - url_pool = set() - for result in results: - sublinks = result.get() - url_pool.update(sublinks) - depth += 1 - - def parse(self, 
html_doc): - soup = BeautifulSoup(html_doc, "lxml") - return soup - - def download(self, url, file_name): - print("download %s into %s..." % (url, file_name)) - try: - r = requests.get(url, stream=True, headers=self.headers, verify=True) - with open(file_name, "wb") as f: - for chunk in r.iter_content(chunk_size=512): - if chunk: - f.write(chunk) - except Exception as e: - print("fail to download %s, caused by %s" % (url, e)) - - def get_base_url(self, url): - result = urlparse(url) - return urlunparse((result.scheme, result.netloc, "", "", "", "")) - - def clean_text(self, text): - text = text.strip().replace("\r", "\n") - text = re.sub(" +", " ", text) - text = re.sub("\n+", "\n", text) - text = text.split("\n") - return "\n".join([i for i in text if i and i != " "]) - - -def uni_pro(text): - """Keep only characters that are ASCII or fall in the category of non-spacing marks.""" - normalized_text = unicodedata.normalize("NFKD", text) - filtered_text = "" - for char in normalized_text: - if ord(char) < 128 or unicodedata.category(char) == "Mn": - filtered_text += char - return filtered_text - - -def load_html_data(url): - crawler = Crawler() - res = crawler.fetch(url) - if res is None: - return None - soup = crawler.parse(res.text) - all_text = crawler.clean_text(soup.select_one("body").text) - main_content = "" - for element_name in ["main", "container"]: - main_block = None - if soup.select(f".{element_name}"): - main_block = soup.select(f".{element_name}") - elif soup.select(f"#{element_name}"): - main_block = soup.select(f"#{element_name}") - if main_block: - for element in main_block: - text = crawler.clean_text(element.text) - if text not in main_content: - main_content += f"\n{text}" - main_content = crawler.clean_text(main_content) - - main_content = main_content.replace("\n", "") - main_content = main_content.replace("\n\n", "") - main_content = uni_pro(main_content) - main_content = re.sub(r"\s+", " ", main_content) - - # {'text': all_text, 'main_content': main_content} - - return main_content - - -def get_chuck_data(content, max_length, min_length, input): - """Split the content into chunks of a suitable length for generation.""" - sentences = re.split("(?<=[!.?])", content) - - paragraphs = [] - current_length = 0 - count = 0 - current_paragraph = "" - for sub_sen in sentences: - count += 1 - sentence_length = len(sub_sen) - if current_length + sentence_length <= max_length: - current_paragraph += sub_sen - current_length += sentence_length - if count == len(sentences) and len(current_paragraph.strip()) > min_length: - paragraphs.append([current_paragraph.strip(), input]) - else: - paragraphs.append([current_paragraph.strip(), input]) - current_paragraph = sub_sen - current_length = sentence_length - - return paragraphs - - -def parse_html(input): - """Parse the given links into (content, source) pairs.""" - chucks = [] - for link in input: - if re.match(r"^https?:/{2}\w.+$", link): - content = load_html_data(link) - if content is None: - continue - chuck = [[content.strip(), link]] - chucks += chuck - else: - print("The given link/str {} cannot be parsed.".format(link)) - - return chucks - - -def document_transfer(data_collection): - "Transfer the raw documents into the LangChain-supported Document format." 
- documents = [] - for data, meta in data_collection: - doc_id = str(uuid.uuid4()) - metadata = {"source": meta, "identify_id": doc_id} - doc = Document(page_content=data, metadata=metadata) - documents.append(doc) - return documents - - -def create_retriever_from_files(doc, embeddings, index_name: str): - print(f"[rag - create retriever] create with index: {index_name}") - text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100, add_start_index=True) - loader = UnstructuredFileLoader(doc, mode="single", strategy="fast") - chunks = loader.load_and_split(text_splitter) - - rds = Redis.from_texts( - texts=[chunk.page_content for chunk in chunks], - metadatas=[chunk.metadata for chunk in chunks], - embedding=embeddings, - index_name=index_name, - redis_url=REDIS_URL, - index_schema=INDEX_SCHEMA, - ) - - retriever = rds.as_retriever(search_type="mmr") - return retriever - - -def create_retriever_from_links(embeddings, link_list: list, index_name): - data_collection = parse_html(link_list) - texts = [] - metadatas = [] - for data, meta in data_collection: - doc_id = str(uuid.uuid4()) - metadata = {"source": meta, "identify_id": doc_id} - texts.append(data) - metadatas.append(metadata) - - rds = Redis.from_texts( - texts=texts, - metadatas=metadatas, - embedding=embeddings, - index_name=index_name, - redis_url=REDIS_URL, - index_schema=INDEX_SCHEMA, - ) - - retriever = rds.as_retriever(search_type="mmr") - return retriever - - -def reload_retriever(embeddings, index_name): - print(f"[rag - reload retriever] reload with index: {index_name}") - rds = Redis.from_existing_index( - embeddings, - index_name=index_name, - redis_url=REDIS_URL, - schema=INDEX_SCHEMA, - ) - - retriever = rds.as_retriever(search_type="mmr") - return retriever - - -def post_process_text(text: str): - if text == " ": - return "data: @#$\n\n" - if text.isspace(): - return None - if text == "\n": - return "data: <br/>
\n\n" - new_text = text.replace(" ", "@#$") - return f"data: {new_text}\n\n" diff --git a/AudioQnA/deprecated/langchain/docker/qna-app/pyproject.toml b/AudioQnA/deprecated/langchain/docker/qna-app/pyproject.toml deleted file mode 100644 index 0c3faea39..000000000 --- a/AudioQnA/deprecated/langchain/docker/qna-app/pyproject.toml +++ /dev/null @@ -1,23 +0,0 @@ -[tool.poetry] -name = "my-app" -version = "0.1.0" -description = "" -authors = ["Your Name "] -readme = "README.md" -packages = [ - { include = "app" }, -] - -[tool.poetry.dependencies] -python = "^3.11" -uvicorn = "^0.23.2" -langserve = {extras = ["server"], version = ">=0.0.30"} -pydantic = "<2" - - -[tool.poetry.group.dev.dependencies] -langchain-cli = ">=0.0.15" - -[build-system] -requires = ["poetry-core"] -build-backend = "poetry.core.masonry.api" diff --git a/AudioQnA/deprecated/langchain/docker/requirements.txt b/AudioQnA/deprecated/langchain/docker/requirements.txt deleted file mode 100644 index 261f59c59..000000000 --- a/AudioQnA/deprecated/langchain/docker/requirements.txt +++ /dev/null @@ -1,17 +0,0 @@ --f https://download.pytorch.org/whl/torch_stable.html -cryptography==42.0.4 -easyocr -intel-extension-for-pytorch -intel-openmp -jupyter -langchain==0.1.12 -langchain-cli -langchain_benchmarks -poetry -pyarrow -pydantic==1.10.13 -pymupdf -redis -sentence-transformers -unstructured -unstructured[all-docs] diff --git a/AudioQnA/deprecated/langchain/redis/LICENSE b/AudioQnA/deprecated/langchain/redis/LICENSE deleted file mode 100644 index 426b65090..000000000 --- a/AudioQnA/deprecated/langchain/redis/LICENSE +++ /dev/null @@ -1,21 +0,0 @@ -MIT License - -Copyright (c) 2023 LangChain, Inc. - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in all -copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -SOFTWARE. 
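- -For reference, the LangServe backend in `qna-app/app/server.py` above streams chat responses as server-sent events (with `@#$` standing in for spaces, as encoded by `post_process_text`). Assuming the backend runs locally on its default port 8000, it can be exercised with a request like the following (an illustrative example, not part of the deleted sources): - -```bash -# Stream a RAG answer from the default knowledge base as server-sent events -curl -N http://localhost:8000/v1/rag/chat_stream -X POST -H 'Content-Type: application/json' -d '{"query":"What was Nike revenue in 2023?","knowledge_base_id":"default"}' -```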
diff --git a/AudioQnA/deprecated/langchain/redis/data/nke-10k-2023.pdf b/AudioQnA/deprecated/langchain/redis/data/nke-10k-2023.pdf deleted file mode 100644 index 6ade8863e..000000000 Binary files a/AudioQnA/deprecated/langchain/redis/data/nke-10k-2023.pdf and /dev/null differ diff --git a/AudioQnA/deprecated/langchain/redis/data_intel/ia_spec.pdf b/AudioQnA/deprecated/langchain/redis/data_intel/ia_spec.pdf deleted file mode 100644 index 3b10122cf..000000000 Binary files a/AudioQnA/deprecated/langchain/redis/data_intel/ia_spec.pdf and /dev/null differ diff --git a/AudioQnA/deprecated/langchain/redis/ingest.py b/AudioQnA/deprecated/langchain/redis/ingest.py deleted file mode 100644 index 2ee8f634a..000000000 --- a/AudioQnA/deprecated/langchain/redis/ingest.py +++ /dev/null @@ -1,86 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import io -import os - -import numpy as np -from langchain.text_splitter import RecursiveCharacterTextSplitter -from langchain_community.embeddings import HuggingFaceEmbeddings -from langchain_community.vectorstores import Redis -from PIL import Image -from rag_redis.config import EMBED_MODEL, INDEX_NAME, INDEX_SCHEMA, REDIS_URL - - -def pdf_loader(file_path): - try: - import easyocr - import fitz - except ImportError: - raise ImportError( - "`PyMuPDF` or 'easyocr' package is not found, please install it with " - "`pip install pymupdf or pip install easyocr.`" - ) - - doc = fitz.open(file_path) - reader = easyocr.Reader(["en"]) - result = "" - for i in range(doc.page_count): - page = doc.load_page(i) - pagetext = page.get_text().strip() - if pagetext: - result = result + pagetext - if len(doc.get_page_images(i)) > 0: - for img in doc.get_page_images(i): - if img: - pageimg = "" - xref = img[0] - img_data = doc.extract_image(xref) - img_bytes = img_data["image"] - pil_image = Image.open(io.BytesIO(img_bytes)) - img = np.array(pil_image) - img_result = reader.readtext(img, paragraph=True, detail=0) - pageimg = pageimg + ", ".join(img_result).strip() - if pageimg.endswith("!") or pageimg.endswith("?") or pageimg.endswith("."): - pass - else: - pageimg = pageimg + "." - result = result + pageimg - return result - - -def ingest_documents(): - """Ingest PDF to Redis from the data/ directory that - contains Edgar 10k filings data for Nike.""" - # Load list of pdfs - company_name = "Nike" - data_path = "data/" - doc_path = [os.path.join(data_path, file) for file in os.listdir(data_path)][0] - - print("Parsing 10k filing doc for NIKE", doc_path) - - text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100, add_start_index=True) - content = pdf_loader(doc_path) - chunks = text_splitter.split_text(content) - - print("Done preprocessing. Created ", len(chunks), " chunks of the original pdf") - # Create vectorstore - embedder = HuggingFaceEmbeddings(model_name=EMBED_MODEL) - - _ = Redis.from_texts( - # appending this little bit can sometimes help with semantic retrieval - # especially with multiple companies - texts=[f"Company: {company_name}. 
" + chunk for chunk in chunks], - embedding=embedder, - index_name=INDEX_NAME, - index_schema=INDEX_SCHEMA, - redis_url=REDIS_URL, - ) - - -if __name__ == "__main__": - ingest_documents() diff --git a/AudioQnA/deprecated/langchain/redis/ingest_dir_text.py b/AudioQnA/deprecated/langchain/redis/ingest_dir_text.py deleted file mode 100644 index e17997e76..000000000 --- a/AudioQnA/deprecated/langchain/redis/ingest_dir_text.py +++ /dev/null @@ -1,36 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -from langchain.text_splitter import RecursiveCharacterTextSplitter -from langchain_community.document_loaders import DirectoryLoader, TextLoader, UnstructuredFileLoader -from langchain_community.embeddings import HuggingFaceEmbeddings -from langchain_community.vectorstores import Redis -from rag_redis.config import EMBED_MODEL, INDEX_NAME, INDEX_SCHEMA, REDIS_URL - -loader = DirectoryLoader( - "/ws/txt_files", glob="**/*.txt", show_progress=True, use_multithreading=True, loader_cls=TextLoader -) - -text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100, add_start_index=True) - -chunks = loader.load_and_split(text_splitter) -print("Done preprocessing. Created", len(chunks), "chunks of the original data") - -# Create vectorstore -embedder = HuggingFaceEmbeddings(model_name=EMBED_MODEL) - -company_name = "Intel" -_ = Redis.from_texts( - # appending this little bit can sometimes help with semantic retrieval - # especially with multiple companies - texts=[f"Company: {company_name}. " + chunk.page_content for chunk in chunks], - metadatas=[chunk.metadata for chunk in chunks], - embedding=embedder, - index_name=INDEX_NAME, - index_schema=INDEX_SCHEMA, - redis_url=REDIS_URL, -) diff --git a/AudioQnA/deprecated/langchain/redis/ingest_intel.py b/AudioQnA/deprecated/langchain/redis/ingest_intel.py deleted file mode 100644 index 78d817d32..000000000 --- a/AudioQnA/deprecated/langchain/redis/ingest_intel.py +++ /dev/null @@ -1,86 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import io -import os - -import numpy as np -from langchain.text_splitter import RecursiveCharacterTextSplitter -from langchain_community.embeddings import HuggingFaceEmbeddings -from langchain_community.vectorstores import Redis -from PIL import Image -from rag_redis.config import EMBED_MODEL, INDEX_NAME, INDEX_SCHEMA, REDIS_URL - - -def pdf_loader(file_path): - try: - import easyocr - import fitz - except ImportError: - raise ImportError( - "`PyMuPDF` or 'easyocr' package is not found, please install it with " - "`pip install pymupdf or pip install easyocr.`" - ) - - doc = fitz.open(file_path) - reader = easyocr.Reader(["en"]) - result = "" - for i in range(doc.page_count): - page = doc.load_page(i) - pagetext = page.get_text().strip() - if pagetext: - result = result + pagetext - if len(doc.get_page_images(i)) > 0: - for img in doc.get_page_images(i): - if img: - pageimg = "" - xref = img[0] - img_data = doc.extract_image(xref) - img_bytes = img_data["image"] - pil_image = Image.open(io.BytesIO(img_bytes)) - img = np.array(pil_image) - img_result = reader.readtext(img, paragraph=True, detail=0) - pageimg = pageimg + ", ".join(img_result).strip() - if pageimg.endswith("!") or pageimg.endswith("?") or pageimg.endswith("."): - pass - else: - pageimg = pageimg + "." 
- result = result + pageimg - return result - - -def ingest_documents(): - """Ingest PDF to Redis from the data/ directory that - contains Intel manuals.""" - # Load list of pdfs - company_name = "Intel" - data_path = "data_intel/" - doc_path = [os.path.join(data_path, file) for file in os.listdir(data_path)][0] - - print("Parsing Intel architecture manuals", doc_path) - - text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100, add_start_index=True) - content = pdf_loader(doc_path) - chunks = text_splitter.split_text(content) - - print("Done preprocessing. Created", len(chunks), "chunks of the original pdf") - # Create vectorstore - embedder = HuggingFaceEmbeddings(model_name=EMBED_MODEL) - - _ = Redis.from_texts( - # appending this little bit can sometimes help with semantic retrieval - # especially with multiple companies - texts=[f"Company: {company_name}. " + chunk for chunk in chunks], - embedding=embedder, - index_name=INDEX_NAME, - index_schema=INDEX_SCHEMA, - redis_url=REDIS_URL, - ) - - -if __name__ == "__main__": - ingest_documents() diff --git a/AudioQnA/deprecated/langchain/redis/rag_redis.ipynb b/AudioQnA/deprecated/langchain/redis/rag_redis.ipynb deleted file mode 100644 index bb3f87a8c..000000000 --- a/AudioQnA/deprecated/langchain/redis/rag_redis.ipynb +++ /dev/null @@ -1,88 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "681a5d1e", - "metadata": {}, - "source": [ - "## Connect to RAG App\n", - "\n", - "Assuming you are already running this server:\n", - "```bash\n", - "langserve start\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "id": "d774be2a", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Nike's revenue in 2023 was $51.2 billion. \n", - "\n", - "Source: 'data/nke-10k-2023.pdf', Start Index: '146100'\n" - ] - } - ], - "source": [ - "from langserve.client import RemoteRunnable\n", - "\n", - "rag_redis = RemoteRunnable(\"http://localhost:8000/rag-redis\")\n", - "\n", - "print(rag_redis.invoke(\"What was Nike's revenue in 2023?\"))" - ] - }, - { - "cell_type": "code", - "execution_count": 43, - "id": "07ae0005", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "As of May 31, 2023, Nike had approximately 83,700 employees worldwide. This information can be found in the first piece of context provided. 
(source: data/nke-10k-2023.pdf, start_index: 32532)\n" - ] - } - ], - "source": [ - "print(rag_redis.invoke(\"How many employees work at Nike?\"))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "4a6b9f00", - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.6" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/AudioQnA/deprecated/langchain/redis/rag_redis/__init__.py b/AudioQnA/deprecated/langchain/redis/rag_redis/__init__.py deleted file mode 100644 index 916f3a44b..000000000 --- a/AudioQnA/deprecated/langchain/redis/rag_redis/__init__.py +++ /dev/null @@ -1,2 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 diff --git a/AudioQnA/deprecated/langchain/redis/rag_redis/chain.py b/AudioQnA/deprecated/langchain/redis/rag_redis/chain.py deleted file mode 100644 index c3bfdc76a..000000000 --- a/AudioQnA/deprecated/langchain/redis/rag_redis/chain.py +++ /dev/null @@ -1,76 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -from langchain_community.embeddings import HuggingFaceEmbeddings -from langchain_community.llms import HuggingFaceEndpoint -from langchain_community.vectorstores import Redis -from langchain_core.output_parsers import StrOutputParser -from langchain_core.prompts import ChatPromptTemplate -from langchain_core.pydantic_v1 import BaseModel -from langchain_core.runnables import RunnableParallel, RunnablePassthrough -from rag_redis.config import EMBED_MODEL, INDEX_NAME, INDEX_SCHEMA, REDIS_URL, TGI_LLM_ENDPOINT - - -# Make this look better in the docs. -class Question(BaseModel): - __root__: str - - -# Init Embeddings -embedder = HuggingFaceEmbeddings(model_name=EMBED_MODEL) - -# Setup semantic cache for LLM -from langchain.cache import RedisSemanticCache -from langchain.globals import set_llm_cache - -set_llm_cache(RedisSemanticCache(embedding=embedder, redis_url=REDIS_URL)) - -# Connect to pre-loaded vectorstore -# run the ingest.py script to populate this -vectorstore = Redis.from_existing_index( - embedding=embedder, index_name=INDEX_NAME, schema=INDEX_SCHEMA, redis_url=REDIS_URL -) - -# TODO allow user to change parameters -retriever = vectorstore.as_retriever(search_type="mmr") - -# Define our prompt -template = """ -Use the following pieces of context from retrieved -dataset to answer the question. Do not make up an answer if there is no -context provided to help answer it. 
Include the 'source' and 'start_index' -from the metadata included in the context you used to answer the question - -Context: ---------- -{context} - ---------- -Question: {question} ---------- - -Answer: -""" - -prompt = ChatPromptTemplate.from_template(template) - -# RAG Chain -model = HuggingFaceEndpoint( - endpoint_url=TGI_LLM_ENDPOINT, - max_new_tokens=512, - top_k=10, - top_p=0.95, - typical_p=0.95, - temperature=0.01, - repetition_penalty=1.03, - streaming=True, - truncate=1024, -) - -chain = ( - RunnableParallel({"context": retriever, "question": RunnablePassthrough()}) | prompt | model | StrOutputParser() -).with_types(input_type=Question) diff --git a/AudioQnA/deprecated/langchain/redis/rag_redis/config.py b/AudioQnA/deprecated/langchain/redis/rag_redis/config.py deleted file mode 100644 index 3dba62a70..000000000 --- a/AudioQnA/deprecated/langchain/redis/rag_redis/config.py +++ /dev/null @@ -1,88 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import os - - -def get_boolean_env_var(var_name, default_value=False): - """Retrieve the boolean value of an environment variable. - - Args: - var_name (str): The name of the environment variable to retrieve. - default_value (bool): The default value to return if the variable - is not found. - - Returns: - bool: The value of the environment variable, interpreted as a boolean. - """ - true_values = {"true", "1", "t", "y", "yes"} - false_values = {"false", "0", "f", "n", "no"} - - # Retrieve the environment variable's value - value = os.getenv(var_name, "").lower() - - # Decide the boolean value based on the content of the string - if value in true_values: - return True - elif value in false_values: - return False - else: - return default_value - - -# Check for openai API key -# if "OPENAI_API_KEY" not in os.environ: -# raise Exception("Must provide an OPENAI_API_KEY as an env var.") - - -# Whether or not to enable langchain debugging -DEBUG = get_boolean_env_var("DEBUG", False) -# Set DEBUG env var to "true" if you wish to enable LC debugging module -if DEBUG: - import langchain - - langchain.debug = True - - -# Embedding model -EMBED_MODEL = os.getenv("EMBED_MODEL", "sentence-transformers/all-MiniLM-L6-v2") - -# Redis Connection Information -REDIS_HOST = os.getenv("REDIS_HOST", "localhost") -REDIS_PORT = int(os.getenv("REDIS_PORT", 6379)) - - -def format_redis_conn_from_env(): - redis_url = os.getenv("REDIS_URL", None) - if redis_url: - return redis_url - else: - using_ssl = get_boolean_env_var("REDIS_SSL", False) - start = "rediss://" if using_ssl else "redis://" - - # if using RBAC - password = os.getenv("REDIS_PASSWORD", None) - username = os.getenv("REDIS_USERNAME", "default") - if password is not None: - start += f"{username}:{password}@" - - return start + f"{REDIS_HOST}:{REDIS_PORT}" - - -REDIS_URL = format_redis_conn_from_env() - -# Vector Index Configuration -INDEX_NAME = os.getenv("INDEX_NAME", "rag-redis") - - -current_file_path = os.path.abspath(__file__) -parent_dir = os.path.dirname(current_file_path) -REDIS_SCHEMA = os.getenv("REDIS_SCHEMA", "schema.yml") -schema_path = os.path.join(parent_dir, REDIS_SCHEMA) -INDEX_SCHEMA = schema_path -TGI_LLM_ENDPOINT = os.getenv("TGI_LLM_ENDPOINT", "http://localhost:8080") -TGI_LLM_ENDPOINT_NO_RAG = os.getenv("TGI_LLM_ENDPOINT_NO_RAG", "http://localhost:8081") diff --git a/AudioQnA/deprecated/langchain/redis/rag_redis/schema.yml b/AudioQnA/deprecated/langchain/redis/rag_redis/schema.yml deleted 
file mode 100644 index 011273363..000000000 --- a/AudioQnA/deprecated/langchain/redis/rag_redis/schema.yml +++ /dev/null @@ -1,15 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -text: - - name: content - - name: source -numeric: - - name: start_index -vector: - - name: content_vector - algorithm: HNSW - datatype: FLOAT32 - dims: 384 - distance_metric: COSINE diff --git a/AudioQnA/deprecated/langchain/redis/rag_redis/schema_dim_1024.yml b/AudioQnA/deprecated/langchain/redis/rag_redis/schema_dim_1024.yml deleted file mode 100644 index b4887fed0..000000000 --- a/AudioQnA/deprecated/langchain/redis/rag_redis/schema_dim_1024.yml +++ /dev/null @@ -1,15 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -text: - - name: content - - name: source -numeric: - - name: start_index -vector: - - name: content_vector - algorithm: HNSW - datatype: FLOAT32 - dims: 1024 - distance_metric: COSINE diff --git a/AudioQnA/deprecated/langchain/redis/rag_redis/schema_dim_768.yml b/AudioQnA/deprecated/langchain/redis/rag_redis/schema_dim_768.yml deleted file mode 100644 index d615774e4..000000000 --- a/AudioQnA/deprecated/langchain/redis/rag_redis/schema_dim_768.yml +++ /dev/null @@ -1,15 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -text: - - name: content - - name: source -numeric: - - name: start_index -vector: - - name: content_vector - algorithm: HNSW - datatype: FLOAT32 - dims: 768 - distance_metric: COSINE diff --git a/AudioQnA/deprecated/langchain/redis/rag_redis/schema_lcdocs_dim_768.yml b/AudioQnA/deprecated/langchain/redis/rag_redis/schema_lcdocs_dim_768.yml deleted file mode 100644 index 296e49cc3..000000000 --- a/AudioQnA/deprecated/langchain/redis/rag_redis/schema_lcdocs_dim_768.yml +++ /dev/null @@ -1,19 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -text: - - name: content - - name: changefreq - - name: description - - name: language - - name: loc - - name: priority - - name: source - - name: title -vector: - - name: content_vector - algorithm: HNSW - datatype: FLOAT32 - dims: 768 - distance_metric: COSINE diff --git a/AudioQnA/deprecated/serving/tgi_gaudi/README.md b/AudioQnA/deprecated/serving/tgi_gaudi/README.md deleted file mode 100644 index c9a8d510e..000000000 --- a/AudioQnA/deprecated/serving/tgi_gaudi/README.md +++ /dev/null @@ -1,89 +0,0 @@ -[TGI-Gaudi](https://github.com/huggingface/tgi-gaudi) provides many parameters aimed at optimizing performance for text generation inference tasks. By optimizing these parameters, users can achieve the best results in terms of inference speed, memory usage, and overall efficiency. These parameters cover various aspects such as maximum sequence length, batch size, Gaudi processor utilization, and environment configurations. By carefully adjusting these parameters according to the specific requirements of the workload and hardware environment, users can unlock the full potential of TGI-Gaudi for text generation tasks. - -# Knowledge about TGI-Gaudi performance tuning - -## Adjusting TGI parameters - -Maximum sequence length is controlled by two arguments: - -- `--max-input-length` is the maximum possible input prompt length. Default value is `1024`. -- `--max-total-tokens` is the maximum possible total length of the sequence (input and output). Default value is `2048`. 
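- -For example, a single-card server that should accept prompts of up to 2048 input tokens and generate up to 1024 more could be launched as follows (an illustrative sketch modeled on the launch commands later in this README; the model name and port are placeholders): - -```bash -docker run -p 8080:80 -v $PWD/data:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id Intel/neural-chat-7b-v3-3 --max-input-length 2048 --max-total-tokens 3072 -```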
- -Maximum batch size is controlled by two arguments: - -- For prefill operation, please set `--max-prefill-total-tokens` as `bs * max-input-length`, where `bs` is your expected maximum prefill batch size. -- For decode operation, please set `--max-batch-total-tokens` as `bs * max-total-tokens`, where `bs` is your expected maximum decode batch size. -- Please note that batch size will always be padded to the nearest multiple of `BATCH_BUCKET_SIZE` and `PREFILL_BATCH_BUCKET_SIZE`. - -To ensure the best performance, a warmup is performed at the beginning of each server run. It's designed to cover major recompilations while using HPU Graphs. It creates queries with all possible input shapes, based on the provided parameters (described in this section), and runs basic TGI operations on them (prefill, decode, concatenate). - -Besides those already mentioned, there are other parameters that need to be properly adjusted to improve performance or memory usage: - -- `PAD_SEQUENCE_TO_MULTIPLE_OF` determines the sizes of input length buckets. Since warmup creates several graphs for each bucket, it's important to adjust that value proportionally to the input sequence length; otherwise, out-of-memory issues can be observed. -- `ENABLE_HPU_GRAPH` enables HPU Graphs usage, which is crucial for performance. The recommended value is `true`. - -For more information and documentation about Text Generation Inference, check out [the README](https://github.com/huggingface/text-generation-inference#text-generation-inference) of the original repo. - -## Environment Variable HABANA_VISIBLE_MODULES - -To run a workload with part of the available Gaudi processors, you need to set the module IDs of the used Gaudi processors in the environment variable HABANA_VISIBLE_MODULES. In general, there are eight Gaudi processors on a node, so the module IDs would be in the range of 0 ~ 7. If you want to run a 4-Gaudi workload, you can set the following before you run the workload: - -```bash -export HABANA_VISIBLE_MODULES="0,1,2,3" -``` - -If you want to run another 4-Gaudi workload in parallel, you can set the following before running the second workload so it uses the remaining four Gaudi processors. - -```bash -export HABANA_VISIBLE_MODULES="4,5,6,7" -``` - -Though using partial Gaudi in a workload is possible, only 2-Gaudi and 4-Gaudi scenarios are supported. It is highly recommended to set HABANA_VISIBLE_MODULES using the combinations listed below: - -- 2-Gaudi - “0,1”, “2,3”, “4,5” or “6,7” -- 4-Gaudi - “0,1,2,3” or “4,5,6,7” - -For details, please check [Multiple_Workloads_Single_Docker](https://docs.habana.ai/en/latest/PyTorch/Reference/PT_Multiple_Tenants_on_HPU/Multiple_Workloads_Single_Docker.html) - -## Environment Variable HABANA_VISIBLE_DEVICES - -There are some guidelines on setting HABANA_VISIBLE_DEVICES; however, you need to know how to find the mapping between the index and module ID of the Gaudi processors before reading them. The command below queries this mapping; a sample output follows: - -```bash -hl-smi -Q index,module_id -f csv -``` - -| index | module_id | -| :---: | :-------: | -| 3 | 6 | -| 1 | 4 | -| 2 | 7 | -| 0 | 5 | -| 4 | 2 | -| 6 | 0 | -| 7 | 3 | -| 5 | 1 | - -With the mapping between index and module ID, you can set `HABANA_VISIBLE_DEVICES` properly with the guidelines below: - -- Mount two Gaudi Processors or four Gaudi Processors in the docker container. 
Even though using partial Gaudi in a distributed workload is possible, only 2-Gaudi and 4-Gaudi scenarios are allowed. -- Since `HABANA_VISIBLE_DEVICES` accepts indices instead of module IDs, you need to leverage the above command to figure out the corresponding indices for a set of module IDs. -- Avoid mounting the same index on multiple containers. Since multiple workloads might run in parallel, not mounting the same Gaudi to multiple Docker containers prevents the same Gaudi from being reused by different workloads. - -For details, please check [Multiple Dockers Each with a Single Workload](https://docs.habana.ai/en/latest/PyTorch/Reference/PT_Multiple_Tenants_on_HPU/Multiple_Dockers_each_with_Single_Workload.html) - -For the System Management Interface Tool, please check [hl-smi](https://docs.habana.ai/en/latest/Management_and_Monitoring/Embedded_System_Tools_Guide/System_Management_Interface_Tool.html) - -# Verified Docker commands with tuned parameters for best performance - -## Docker command for 70B model - -```bash -docker run -p 8080:80 -v $volume:/data --runtime=habana -e HUGGING_FACE_HUB_TOKEN=$HUGGINGFACEHUB_API_TOKEN -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES="6,7,4,5" -e HABANA_VISIBLE_MODULES="0,1,2,3" -e BATCH_BUCKET_SIZE=22 -e PREFILL_BATCH_BUCKET_SIZE=1 -e MAX_BATCH_PREFILL_TOKENS=5102 -e MAX_BATCH_TOTAL_TOKENS=32256 -e MAX_INPUT_LENGTH=1024 -e PAD_SEQUENCE_TO_MULTIPLE_OF=1024 -e MAX_WAITING_TOKENS=5 -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model --sharded true --num-shard 4 -``` - -## Docker command for 13B model - -```bash -docker run -p 8080:80 -v $volume:/data --runtime=habana -e HUGGING_FACE_HUB_TOKEN=$HUGGINGFACEHUB_API_TOKEN -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e PAD_SEQUENCE_TO_MULTIPLE_OF=128 -e HABANA_VISIBLE_DEVICES="4" -e BATCH_BUCKET_SIZE=16 -e PREFILL_BATCH_BUCKET_SIZE=1 -e MAX_BATCH_PREFILL_TOKENS=4096 -e MAX_BATCH_TOTAL_TOKENS=18432 -e PAD_SEQUENCE_TO_MULTIPLE_OF=1024 -e MAX_INPUT_LENGTH=1024 -e MAX_TOTAL_TOKENS=1152 -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model -``` diff --git a/AudioQnA/deprecated/serving/tgi_gaudi/build_docker.sh b/AudioQnA/deprecated/serving/tgi_gaudi/build_docker.sh deleted file mode 100644 index 80c00c9fc..000000000 --- a/AudioQnA/deprecated/serving/tgi_gaudi/build_docker.sh +++ /dev/null @@ -1,9 +0,0 @@ -#!/bin/bash - - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -git clone https://github.com/huggingface/tgi-gaudi.git -cd ./tgi-gaudi/ -docker build -t ghcr.io/huggingface/tgi-gaudi:1.2.1 . 
--build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy diff --git a/AudioQnA/deprecated/serving/tgi_gaudi/launch_tgi_service.sh b/AudioQnA/deprecated/serving/tgi_gaudi/launch_tgi_service.sh deleted file mode 100644 index d3a9fcb54..000000000 --- a/AudioQnA/deprecated/serving/tgi_gaudi/launch_tgi_service.sh +++ /dev/null @@ -1,41 +0,0 @@ -#!/bin/bash - - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# Set default values -default_port=8080 -default_model="Intel/neural-chat-7b-v3-3" -default_num_cards=1 - -# Check that no more than three optional arguments are provided -if [ "$#" -gt 3 ]; then - echo "Usage: $0 [num_cards] [port_number] [model_name]" - exit 1 -fi - -# Assign arguments to variables -num_cards=${1:-$default_num_cards} -port_number=${2:-$default_port} -model_name=${3:-$default_model} - -# Check if num_cards is within the valid range (1-8) -if [ "$num_cards" -lt 1 ] || [ "$num_cards" -gt 8 ]; then - echo "Error: num_cards must be between 1 and 8." - exit 1 -fi - -# Set the volume variable -volume=$PWD/data - -# Build the Docker run command based on the number of cards -if [ "$num_cards" -eq 1 ]; then - docker_cmd="docker run -d --name="ChatQnA_server" -p $port_number:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$http_proxy ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id $model_name" -else - docker_cmd="docker run -d --name="ChatQnA_server" -p $port_number:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$http_proxy ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id $model_name --sharded true --num-shard $num_cards" -fi - -# Execute the Docker run command -echo $docker_cmd -eval $docker_cmd diff --git a/AudioQnA/deprecated/tests/test_asr.sh b/AudioQnA/deprecated/tests/test_asr.sh deleted file mode 100644 index fb0d3b41d..000000000 --- a/AudioQnA/deprecated/tests/test_asr.sh +++ /dev/null @@ -1,63 +0,0 @@ -#!/bin/bash -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -set -xe - -function test_env_setup() { - WORKPATH=$(dirname "$PWD")/audio/docker - LOG_PATH=$(dirname "$PWD")/tests/asr.log - ASR_CONTAINER_NAME="test-audioqna-asr" - cd $WORKPATH -} - -function start_asr_service() { - cd $WORKPATH - docker build . 
--build-arg http_proxy=${http_proxy} --build-arg https_proxy=${https_proxy} -f Dockerfile_asr -t intel/gen-ai-examples:$ASR_CONTAINER_NAME - docker run -d --name=$ASR_CONTAINER_NAME -e http_proxy=$http_proxy -e https_proxy=$https_proxy -p 8018:8008 intel/gen-ai-examples:$ASR_CONTAINER_NAME - sleep 1m -} - -function run_tests() { - cd $WORKPATH - rm -f sample.wav - wget https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav - http_proxy= curl -F 'file=@sample.wav' http://localhost:8018/v1/audio/transcriptions > $LOG_PATH - rm -f sample.wav -} - -function check_response() { - cd $WORKPATH - echo "Checking response" - local status=false - if [[ -f $LOG_PATH ]] && [[ $(grep -c "who is pat gelsinger" $LOG_PATH) != 0 ]]; then - status=true - fi - - if [ $status == false ]; then - echo "Response check failed" - exit 1 - else - echo "Response check succeeded" - fi -} - -function docker_stop() { - local container_name=$1 - cid=$(docker ps -aq --filter "name=$container_name") - if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid; fi -} - -function main() { - test_env_setup - docker_stop $ASR_CONTAINER_NAME && sleep 5s - start_asr_service - run_tests - docker_stop $ASR_CONTAINER_NAME && sleep 5s - echo y | docker system prune - check_response -} - -main diff --git a/AudioQnA/deprecated/tests/test_langchain_inference.sh b/AudioQnA/deprecated/tests/test_langchain_inference.sh deleted file mode 100644 index bbd5c81b5..000000000 --- a/AudioQnA/deprecated/tests/test_langchain_inference.sh +++ /dev/null @@ -1,110 +0,0 @@ -#!/bin/bash -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -set -xe - -function test_env_setup() { - WORKPATH=$(dirname "$PWD") - LOG_PATH="$WORKPATH/tests/langchain.log" - - REDIS_CONTAINER_NAME="test-redis-vector-db" - LANGCHAIN_CONTAINER_NAME="test-qna-rag-redis-server" - AUDIOQNA_CONTAINER_NAME="test-AudioQnA_server" - cd $WORKPATH -} - -function rename() { - # Rename the docker container/image names to avoid conflict with local test - cd ${WORKPATH} - sed -i "s/container_name: redis-vector-db/container_name: ${REDIS_CONTAINER_NAME}/g" langchain/docker/docker-compose.yml - sed -i "s/container_name: qna-rag-redis-server/container_name: ${LANGCHAIN_CONTAINER_NAME}/g" langchain/docker/docker-compose.yml - sed -i "s/image: intel\/gen-ai-examples:qna-rag-redis-server/image: intel\/gen-ai-examples:${LANGCHAIN_CONTAINER_NAME}/g" langchain/docker/docker-compose.yml - sed -i "s/ChatQnA_server/${AUDIOQNA_CONTAINER_NAME}/g" serving/tgi_gaudi/launch_tgi_service.sh -} - -function launch_tgi_gaudi_service() { - local card_num=1 - local port=8888 - local model_name="Intel/neural-chat-7b-v3-3" - - cd ${WORKPATH} - - # Reset the tgi port - sed -i "s/8080/$port/g" langchain/redis/rag_redis/config.py - sed -i "s/8080/$port/g" langchain/docker/qna-app/app/server.py - sed -i "s/8080/$port/g" langchain/docker/qna-app/Dockerfile - - docker pull ghcr.io/huggingface/tgi-gaudi:1.2.1 - bash serving/tgi_gaudi/launch_tgi_service.sh $card_num $port $model_name - sleep 3m # Waits 3 minutes -} - -function launch_redis_and_langchain_service() { - cd $WORKPATH - export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} - local port=8890 - sed -i "s/port=8000/port=$port/g" langchain/docker/qna-app/app/server.py - docker compose -f langchain/docker/docker-compose.yml up -d --build - - # Ingest data into redis - docker exec $LANGCHAIN_CONTAINER_NAME \ - bash -c "cd /ws && python ingest.py > 
/dev/null" -} - -function start_backend_service() { - cd $WORKPATH - docker exec $LANGCHAIN_CONTAINER_NAME \ - bash -c "nohup python app/server.py &" - sleep 1m -} - -function run_tests() { - cd $WORKPATH - local port=8890 - curl 127.0.0.1:$port/v1/rag/chat \ - -X POST \ - -d "{\"query\":\"What is the total revenue of Nike in 2023?\"}" \ - -H 'Content-Type: application/json' > $LOG_PATH -} - -function check_response() { - cd $WORKPATH - echo "Checking response" - local status=false - if [[ -f $LOG_PATH ]] && [[ $(grep -c "\$51.2 billion" $LOG_PATH) != 0 ]]; then - status=true - fi - - if [ $status == false ]; then - echo "Response check failed" - exit 1 - else - echo "Response check succeeded" - fi -} - -function docker_stop() { - local container_name=$1 - cid=$(docker ps -aq --filter "name=$container_name") - if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid; fi -} - -function main() { - test_env_setup - rename - docker_stop $AUDIOQNA_CONTAINER_NAME && docker_stop $LANGCHAIN_CONTAINER_NAME && docker_stop $REDIS_CONTAINER_NAME && sleep 5s - - launch_tgi_gaudi_service - launch_redis_and_langchain_service - start_backend_service - - run_tests - - docker_stop $AUDIOQNA_CONTAINER_NAME && docker_stop $LANGCHAIN_CONTAINER_NAME && docker_stop $REDIS_CONTAINER_NAME && sleep 5s - echo y | docker system prune - - check_response -} - -main diff --git a/AudioQnA/deprecated/tests/test_tts.sh b/AudioQnA/deprecated/tests/test_tts.sh deleted file mode 100644 index e61234d14..000000000 --- a/AudioQnA/deprecated/tests/test_tts.sh +++ /dev/null @@ -1,84 +0,0 @@ -#!/bin/bash -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -set -xe - -function test_env_setup() { - WORKPATH=$(dirname "$PWD")/audio/docker - OUTPUT_PATH=$(dirname "$PWD")/tests/output.wav - TTS_CONTAINER_NAME="test-audioqna-tts" - cd $WORKPATH -} - -function start_tts_service() { - cd $WORKPATH - rm -rf pretrained_tts_models - git clone https://huggingface.co/lj1995/GPT-SoVITS pretrained_tts_models - docker build . 
--build-arg http_proxy=${http_proxy} --build-arg https_proxy=${https_proxy} -f Dockerfile_tts -t intel/gen-ai-examples:$TTS_CONTAINER_NAME - docker run -d --name=$TTS_CONTAINER_NAME -v ./pretrained_tts_models:/GPT-SoVITS/GPT_SoVITS/pretrained_models -e http_proxy=${http_proxy} -e https_proxy=${https_proxy} -p 9888:9880 intel/gen-ai-examples:$TTS_CONTAINER_NAME --bf16 - sleep 1m -} - -function run_tests() { - cd $WORKPATH - rm -f ${OUTPUT_PATH} - rm -f sample.wav - - # Upload reference audio as default voice - wget https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav - curl --location 'localhost:9888/upload_as_default' \ - --form 'default_refer_file=@"sample.wav"' \ - --form 'default_refer_text="Who is Pat Gelsinger?"' \ - --form 'default_refer_language="en"' - - # Do text to speech conversion - curl --location 'localhost:9888/v1/audio/speech' \ - --header 'Content-Type: application/json' \ - --data '{ - "text": "You can have a look, but you should not touch this item.", - "text_language": "en" - }' \ - --output ${OUTPUT_PATH} - rm -f sample.wav -} - -function check_response() { - cd $WORKPATH - echo "Checking response" - local status=false - - if [[ -f $OUTPUT_PATH ]]; then - status=true - fi - - if [ $status == false ]; then - echo "Response check failed" - exit 1 - else - echo "Response check succeeded" - fi - - # clear resources - rm -f ${OUTPUT_PATH} -} - -function docker_stop() { - local container_name=$1 - cid=$(docker ps -aq --filter "name=$container_name") - if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid; fi -} - -function main() { - test_env_setup - docker_stop $TTS_CONTAINER_NAME && sleep 5s - - start_tts_service - run_tests - check_response - - docker_stop $TTS_CONTAINER_NAME && sleep 5s - echo y | docker system prune -} - -main diff --git a/ChatQnA/deprecated/README.md b/ChatQnA/deprecated/README.md deleted file mode 100644 index 985f7cd29..000000000 --- a/ChatQnA/deprecated/README.md +++ /dev/null @@ -1,279 +0,0 @@ -# ChatQnA Application - -Chatbots are the most widely adopted use case for leveraging the powerful chat and reasoning capabilities of large language models (LLMs). The retrieval augmented generation (RAG) architecture is quickly becoming the industry standard for developing chatbots because it combines the benefits of a knowledge base (via a vector store) and generative models to reduce hallucinations, maintain up-to-date information, and leverage domain-specific knowledge. - -RAG bridges the knowledge gap by dynamically fetching relevant information from external sources, ensuring that the responses generated remain factual and current. At the heart of this architecture are vector databases, which are instrumental in enabling efficient and semantic retrieval of information. These databases store data as vectors, allowing RAG to swiftly access the most pertinent documents or data points based on semantic similarity. - -The ChatQnA architecture is shown below: - -![architecture](https://i.imgur.com/lLOnQio.png) - -This ChatQnA use case performs RAG using LangChain, a Redis vector database, and Text Generation Inference on Intel Gaudi2. The Intel Gaudi2 accelerator supports both training and inference for deep learning models, in particular LLMs. Please visit [Habana AI products](https://habana.ai/products) for more details. - -# Solution Overview - -Steps to implement the solution are as follows: - -## On the Intel Gaudi2 Platform - -1. 
[Deploy a TGI container with the LLM model of your choice](#launch-tgi-gaudi-service) (the solution uses a 70B model by default) - -## On the Intel Xeon Platform - -1. [Export the TGI endpoint as an environment variable](#customize-tgi-gaudi-service) -2. [Deploy a TEI container for the embedding model service and export the endpoint](#enable-tei-for-embedding-model) -3. [Launch a Redis container and a LangChain container](#launch-redis-and-langchain-backend-service) -4. [Ingest data into Redis](#ingest-data-into-redis); this example provides a few example PDF documents -5. [Start the backend service](#start-the-backend-service) to accept queries to LangChain -6. [Start the GUI](#start-the-frontend-service)-based chatbot service to experiment with the RAG-based chatbot - -To use [🤗 text-generation-inference](https://github.com/huggingface/text-generation-inference) on Habana Gaudi/Gaudi2, please follow these steps: - -## Prepare TGI Docker - -Getting started is straightforward with the official Docker container. Simply pull the image using: - -```bash -docker pull ghcr.io/huggingface/tgi-gaudi:1.2.1 -``` - -Alternatively, you can build the Docker image yourself using the latest [TGI-Gaudi](https://github.com/huggingface/tgi-gaudi) code with the command below: - -```bash -bash ./serving/tgi_gaudi/build_docker.sh -``` - -## Launch TGI Gaudi Service - -### Launch a local server instance on 1 Gaudi card: - -```bash -bash ./serving/tgi_gaudi/launch_tgi_service.sh -``` - -For gated models such as `LLAMA-2`, you will have to pass `-e HUGGING_FACE_HUB_TOKEN=<token>` to the `docker run` command above with a valid Hugging Face Hub read token. - -Please follow this link [huggingface token](https://huggingface.co/docs/hub/security-tokens) to get an access token, and export the `HUGGINGFACEHUB_API_TOKEN` environment variable with it: - -```bash -export HUGGINGFACEHUB_API_TOKEN=<token> -``` - -### Launch a local server instance on 8 Gaudi cards: - -```bash -bash ./serving/tgi_gaudi/launch_tgi_service.sh 8 -``` - -You can then make requests like the one below to check the service status: - -```bash -curl 127.0.0.1:8080/generate \ - -X POST \ - -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":32}}' \ - -H 'Content-Type: application/json' -``` - -### Customize TGI Gaudi Service - -The `./serving/tgi_gaudi/launch_tgi_service.sh` script accepts three parameters: - -- `num_cards`: The number of Gaudi cards to be utilized, ranging from 1 to 8. The default is set to 1. -- `port_number`: The port number assigned to the TGI Gaudi endpoint, with the default being 8080. -- `model_name`: The model name utilized for the LLM, with the default set to "Intel/neural-chat-7b-v3-3". - -You have the flexibility to customize these parameters according to your specific needs. Additionally, you can set the TGI Gaudi endpoint by exporting the environment variable `TGI_LLM_ENDPOINT`: - -```bash -export TGI_LLM_ENDPOINT="http://xxx.xxx.xxx.xxx:8080" -```
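-For example, to serve a different model on two cards at port 8085 (the model name below is only illustrative; any TGI-compatible model works): - -```bash -# num_cards=2, port_number=8085, model_name as desired -bash ./serving/tgi_gaudi/launch_tgi_service.sh 2 8085 "mistralai/Mistral-7B-Instruct-v0.1" -```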
- -## Enable TEI for embedding model - -Text Embeddings Inference (TEI) is a toolkit designed for deploying and serving open-source text embeddings and sequence classification models efficiently. With TEI, users can extract high-performance features using various popular models. It supports token-based dynamic batching for enhanced performance. - -To launch the TEI service, you can use the following commands: - -```bash -model=BAAI/bge-large-en-v1.5 -revision=refs/pr/5 -volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run -docker run -p 9090:80 -v $volume:/data -e http_proxy=$http_proxy -e https_proxy=$https_proxy --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 --model-id $model --revision $revision -export TEI_ENDPOINT="http://xxx.xxx.xxx.xxx:9090" -``` - -You can then make requests like the one below to check the service status: - -```bash -curl 127.0.0.1:9090/embed \ - -X POST \ - -d '{"inputs":"What is Deep Learning?"}' \ - -H 'Content-Type: application/json' -``` - -Note: If you want to integrate the TEI service into the LangChain application, you'll need to restart the LangChain backend service after launching the TEI service. - -## Launch Vector Database and LangChain Backend Service - -Update the `HUGGINGFACEHUB_API_TOKEN` environment variable with your Hugging Face token in `docker-compose.yml`. - -By default, Redis is used as the vector store. To use Qdrant, use the `docker-compose-qdrant.yml` file instead. - -```bash -cd langchain/docker -docker compose -f docker-compose.yml up -d -# To use Qdrant, run -# docker compose -f docker-compose-qdrant.yml up -d -cd ../../ -``` - -> [!NOTE] -> If you modified any files and want that change introduced in this step, add `--build` to the end of the command to build the container image instead of pulling it from Docker Hub. - -## Ingest Data Into Vector Database - -Each time the vector database container is launched, data should be ingested into the container using the commands: - -```bash -docker exec -it qna-rag-redis-server bash -# To use Qdrant, run -# docker exec -it qna-rag-qdrant-server bash -cd /ws -python ingest.py -``` - -Note: `ingest.py` will download the embedding model. Please set the proxy if necessary. - -# Start LangChain Server - -## Enable GuardRails using Meta's Llama Guard model (Optional) - -We offer content moderation support utilizing Meta's [Llama Guard](https://huggingface.co/meta-llama/LlamaGuard-7b) model. To activate GuardRails, please follow the instructions below to deploy the Llama Guard model on TGI Gaudi. - -```bash -volume=$PWD/data -model_id="meta-llama/LlamaGuard-7b" -docker run -p 8088:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HUGGING_FACE_HUB_TOKEN=<token> -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$http_proxy tgi_gaudi --model-id $model_id -export SAFETY_GUARD_ENDPOINT="http://xxx.xxx.xxx.xxx:8088" -``` - -You can then make requests like the one below to check the service status: - -```bash -curl 127.0.0.1:8088/generate \ - -X POST \ - -d '{"inputs":"How do you buy a tiger in the US?","parameters":{"max_new_tokens":32}}' \ - -H 'Content-Type: application/json' -``` - -## Start the Backend Service - -Make sure the TGI Gaudi service is running and the data has been ingested into Redis, then launch the backend service: - -```bash -docker exec -it qna-rag-redis-server bash -# export TGI_LLM_ENDPOINT="http://xxx.xxx.xxx.xxx:8080" - can be omitted if set before in docker-compose.yml -# export TEI_ENDPOINT="http://xxx.xxx.xxx.xxx:9090" - needs to be added only if TEI is to be used; can be omitted if set before in docker-compose.yml -nohup python app/server.py & -``` - -The LangChain backend service listens on port 8000; you can customize the port by changing the code in `docker/qna-app/app/server.py`.
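-For instance, the repository's own test scripts patch the hard-coded port with a one-line `sed` (port 8010 below is an arbitrary example): - -```bash -# Rewrite the port in server.py, then restart the backend service -sed -i "s/port=8000/port=8010/g" langchain/docker/qna-app/app/server.py -```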
- -You can then make requests like the ones below to check the LangChain backend service status: - -```bash -# non-streaming endpoint -curl 127.0.0.1:8000/v1/rag/chat \ - -X POST \ - -d '{"query":"What is the total revenue of Nike in 2023?"}' \ - -H 'Content-Type: application/json' -``` - -```bash -# streaming endpoint -curl 127.0.0.1:8000/v1/rag/chat_stream \ - -X POST \ - -d '{"query":"What is the total revenue of Nike in 2023?"}' \ - -H 'Content-Type: application/json' -``` - -## Start the Frontend Service - -Navigate to the `ui` folder and execute the following commands to start the frontend GUI: - -```bash -cd ui -sudo apt-get install npm && \ - npm install -g n && \ - n stable && \ - hash -r && \ - npm install -g npm@latest -``` - -For CentOS, please use the following commands instead: - -```bash -curl -sL https://rpm.nodesource.com/setup_20.x | sudo bash - -sudo yum install -y nodejs -``` - -Update the `DOC_BASE_URL` environment variable in the `.env` file by replacing the IP address '127.0.0.1' with the actual IP address. - -Run the following command to install the required dependencies: - -```bash -npm install -``` - -Start the development server by executing the following command: - -```bash -nohup npm run dev & -``` - -This will initiate the frontend service and launch the application. - -# Enable TGI Gaudi FP8 for higher throughput (Optional) - -TGI Gaudi uses BFLOAT16 optimization by default. If you aim to achieve higher throughput, you can enable FP8 quantization on TGI Gaudi. Note that currently only Llama2-series and Mistral-series models support FP8 quantization. Please follow the steps below to enable it. - -## Prepare Metadata for FP8 Quantization - -Enter the TGI Gaudi Docker container, then run the commands below: - -```bash -pip install git+https://github.com/huggingface/optimum-habana.git -git clone https://github.com/huggingface/optimum-habana.git -cd optimum-habana/examples/text-generation -pip install -r requirements_lm_eval.txt -QUANT_CONFIG=./quantization_config/maxabs_measure.json python ../gaudi_spawn.py run_lm_eval.py -o acc_7b_bs1_measure.txt --model_name_or_path Intel/neural-chat-7b-v3-3 --attn_softmax_bf16 --use_hpu_graphs --trim_logits --use_kv_cache --reuse_cache --bf16 --batch_size 1 -QUANT_CONFIG=./quantization_config/maxabs_quant.json python ../gaudi_spawn.py run_lm_eval.py -o acc_7b_bs1_quant.txt --model_name_or_path Intel/neural-chat-7b-v3-3 --attn_softmax_bf16 --use_hpu_graphs --trim_logits --use_kv_cache --reuse_cache --bf16 --batch_size 1 --fp8 -``` - -After finishing the above commands, the quantization metadata will be generated. Move the metadata directory `./hqt_output/` and copy the quantization JSON file to the host (under …/data). Please adapt the commands below with your container ID and directory path: - -```bash -docker cp 262e04bbe466:/usr/src/optimum-habana/examples/text-generation/hqt_output data/ -docker cp 262e04bbe466:/usr/src/optimum-habana/examples/text-generation/quantization_config/maxabs_quant.json data/ -``` - -Then modify `dump_stats_path` to "/data/hqt_output/measure" and update `dump_stats_xlsx_path` to "/data/hqt_output/measure/fp8stats.xlsx" in the `maxabs_quant.json` file.
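-One way to apply both edits in place (a sketch; verify the exact key names and layout inside `maxabs_quant.json` before relying on it): - -```bash -# Point the quantization config at the measurement data copied into /data -sed -i 's#"dump_stats_path": *"[^"]*"#"dump_stats_path": "/data/hqt_output/measure"#' data/maxabs_quant.json -sed -i 's#"dump_stats_xlsx_path": *"[^"]*"#"dump_stats_xlsx_path": "/data/hqt_output/measure/fp8stats.xlsx"#' data/maxabs_quant.json -```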
- -## Restart the TGI Gaudi server with all the metadata mapped - -```bash -docker run -p 8080:80 -e QUANT_CONFIG=/data/maxabs_quant.json -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id Intel/neural-chat-7b-v3-3 -``` - -TGI Gaudi will now launch the model in FP8 by default, and you can make requests like the one below to check the service status: - -```bash -curl 127.0.0.1:8080/generate \ - -X POST \ - -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":32}}' \ - -H 'Content-Type: application/json' -``` - -SCRIPT USAGE NOTICE: By downloading and using any script file included with the associated software package (such as files with .bat, .cmd, or .JS extensions, Docker files, or any other type of file that, when executed, automatically downloads and/or installs files onto your system) (the “Script File”), it is your obligation to review the Script File to understand what files (e.g., other software, AI models, AI Datasets) the Script File will download to your system (“Downloaded Files”). Furthermore, by downloading and using the Downloaded Files, even if they are installed through a silent install, you agree to any and all terms and conditions associated with such files, including but not limited to, license terms, notices, or disclaimers. diff --git a/ChatQnA/deprecated/benchmarking/README.md b/ChatQnA/deprecated/benchmarking/README.md deleted file mode 100644 index 2b131089c..000000000 --- a/ChatQnA/deprecated/benchmarking/README.md +++ /dev/null @@ -1 +0,0 @@ -Will update soon. diff --git a/ChatQnA/deprecated/benchmarking/client.py b/ChatQnA/deprecated/benchmarking/client.py deleted file mode 100644 index b4e085f56..000000000 --- a/ChatQnA/deprecated/benchmarking/client.py +++ /dev/null @@ -1,53 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -import argparse -import concurrent.futures -import json -import random - -import requests - - -def extract_qText(json_data): - # Replace the "inputs" field of the request template with a random - # question drawn from the devtest.json benchmark set - try: - with open("devtest.json") as file: - data = json.load(file) - json_data = json.loads(json_data) - json_data["inputs"] = random.choice(data)["qText"] - return json.dumps(json_data) - except (json.JSONDecodeError, KeyError, IndexError): - return None - - -def send_request(url, json_data): - # Skip requests whose payload could not be built - if json_data is None: - return - headers = {"Content-Type": "application/json"} - response = requests.post(url, data=json_data, headers=headers) - print(f"Question: {json_data} Response: {response.status_code} - {response.text}") - - -def main(url, json_data, concurrency): - # Fire 2x the concurrency level of requests through a thread pool - with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as executor: - future_to_url = { - executor.submit(send_request, url, extract_qText(json_data)): url for _ in range(concurrency * 2) - } - for future in concurrent.futures.as_completed(future_to_url): - _ = future_to_url[future] - - -if __name__ == "__main__": - parser = argparse.ArgumentParser(description="Concurrent client to send POST requests") - parser.add_argument("--url", type=str, default="http://localhost:12345", help="URL to send requests to") - parser.add_argument( - "--json_data", - type=str, - default='{"inputs":"Which NFL team won the Super Bowl in the 2010 season?","parameters":{"do_sample": true}}', - help="JSON data to send", - ) - parser.add_argument("--concurrency", type=int, default=100, help="Concurrency level") - args = parser.parse_args() -
main(args.url, args.json_data, args.concurrency) diff --git a/ChatQnA/deprecated/benchmarking/devtest.json b/ChatQnA/deprecated/benchmarking/devtest.json deleted file mode 100644 index f32faac7e..000000000 --- a/ChatQnA/deprecated/benchmarking/devtest.json +++ /dev/null @@ -1,12752 +0,0 @@ -[ - { - "qId": "wqr000000", - "qText": "what is the name of justin bieber brother?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "10131898", "text": "monk", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "9836176", "text": "associate", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "10256643", "text": "kinsman", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9651570", "text": "religious person", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "10538195", "text": "religious", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "10255246", "text": "relative", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "9649426", "text": "peer", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "10132360", "text": "friend", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10305781", "text": "male sibling", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10326901", "text": "member", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "brother", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Justin Bieber", - "cookedLabel": "Justin Bieber", - "pageID": "23680998", - "editDist": 0.0, - "labelProbability": 0.995669, - "logPopularity": 5.991464547107982, - "score": 0.925242503944315, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the name of justin bieber brother", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "name of justin bieber brother", "type": "CluePhrase", "weight": 0.99 }, - { "label": "name", "type": "ClueSubjectToken", "weight": 2.5 }, - { "label": "justin bieber brother", "type": "CluePhrase", "weight": 0.99 } - ] - }, - { - "qId": "wqr000020", - "qText": "where to fly into bali?", - "SV": ["fly"], - "lemmaSV": ["fly"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "To Fly!", - "cookedLabel": "To Fly!", - "pageID": "76390", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 4.174387269895637, - "score": 0.8093029754987044, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Bali", - "cookedLabel": "Bali", - "pageID": "4147", - "editDist": 0.0, - "labelProbability": 0.882369, - "logPopularity": 5.8664680569332965, - "score": 0.8721413788784711, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "to fly", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr000040", - "qText": "what is cher's son's name?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "9937706", "text": "child", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9559169", "text": "God", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5950141", "text": "belief", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "10393697", "text": "offspring", 
"specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "9561132", "text": "hypostasis", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "9647338", "text": "male", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "10305635", "text": "male offspring", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-7.0", "type": "WordnetLAT" }, - { "synset": "9527267", "text": "spiritual being", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "5817200", "text": "content", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "9559474", "text": "Godhead", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "10255246", "text": "relative", "specificity": "-4.0", "type": "WordnetLAT" }, - { "text": "son", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Humphrey de Cherlton", - "cookedLabel": "Humphrey de Cherlton", - "pageID": "36553055", - "editDist": 2.3, - "labelProbability": 0.0, - "logPopularity": 3.295836866004329, - "score": 0.011909595991362784, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Cher", - "cookedLabel": "Cher", - "pageID": "80696", - "editDist": 0.0, - "labelProbability": 0.74216, - "logPopularity": 6.326149473155099, - "score": 0.8251100043767303, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Sonny & Cher", - "cookedLabel": "Sonny & Cher", - "pageID": "113446", - "editDist": 0.0, - "labelProbability": 0.0646615, - "logPopularity": 4.955827057601261, - "score": 0.0332190417502861, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Gypsys, Tramps & Thieves (album)", - "cookedLabel": "Gypsys, Tramps & Thieves", - "pageID": "4437004", - "editDist": 0.0, - "labelProbability": 0.0646615, - "logPopularity": 4.709530201312334, - "score": 0.028786938773518025, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Cher Lloyd", - "cookedLabel": "Cher Lloyd", - "pageID": "29044071", - "editDist": 0.0, - "labelProbability": 0.0646615, - "logPopularity": 4.7535901911063645, - "score": 0.029535308046203673, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Cher (department)", - "cookedLabel": "Cher", - "pageID": "80697", - "editDist": 0.0, - "labelProbability": 0.141742, - "logPopularity": 5.976350909297934, - "score": 0.08284955932906549, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "cher's son's name", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "cher", "type": "ClueNE", "weight": 1.09 }, - { "label": "name", "type": "ClueSubjectToken", "weight": 2.5 } - ] - }, - { - "qId": "wqr000060", - "qText": "what countries do people speak portuguese?", - "SV": ["speak"], - "lemmaSV": ["speak"], - "LAT": [ - { "synset": "27365", "text": "location", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8648560", "text": "region", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": 
"8024893", "text": "organization", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8508836", "text": "administrative district", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7958392", "text": "people", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8206589", "text": "unit", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8569713", "text": "district", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8376876", "text": "political unit", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8591861", "text": "geographical area", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "countries", "specificity": "0.0", "type": "LAT" }, - { "text": "country", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "People (magazine)", - "cookedLabel": "People", - "pageID": "507970", - "editDist": 0.0, - "labelProbability": 0.174827, - "logPopularity": 4.584967478670572, - "score": 0.10998291320708259, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "People", - "cookedLabel": "People", - "pageID": "3488351", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 2.0794415416798357, - "score": 0.031159073502995693, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Portuguese language", - "cookedLabel": "Portuguese language", - "pageID": "23915", - "editDist": 0.0, - "labelProbability": 0.441879, - "logPopularity": 7.486613313139955, - "score": 0.7042009586468194, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Sonnets from the Portuguese", - "cookedLabel": "Sonnets from the Portuguese", - "pageID": "1102758", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.044522437723423, - "score": 0.02075436300687559, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Portuguese man o' war", - "cookedLabel": "Portuguese man o' war", - "pageID": "152952", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.8501476017100584, - "score": 0.03322510904474677, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Portuguese people", - "cookedLabel": "Portuguese people", - "pageID": "970642", - "editDist": 0.0, - "labelProbability": 0.102717, - "logPopularity": 6.543911845564792, - "score": 0.22145882746554235, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Portugal", - "cookedLabel": "Portugal", - "pageID": "23033", - "editDist": 0.0, - "labelProbability": 0.256593, - "logPopularity": 9.178746500385005, - "score": 0.5112655077499029, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "portuguese", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr000080", - "qText": "who was vincent van gogh inspired by?", - "SV": 
["inspired"], - "lemmaSV": ["inspire"], - "LAT": [], - "Concept": [ - { - "fullLabel": "Vincent van Gogh", - "cookedLabel": "Vincent van Gogh", - "pageID": "32603", - "editDist": 0.0, - "labelProbability": 0.993177, - "logPopularity": 6.154858094016418, - "score": 0.9807994905546668, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr000100", - "qText": "when will oscar pistorius compete?", - "SV": ["compete"], - "lemmaSV": ["compete"], - "LAT": [ - { "synset": "15147173", "text": "time", "specificity": "0.0", "type": "QuestionWordLAT" }, - { "synset": "15184543", "text": "date", "specificity": "0.0", "type": "QuestionWordLAT" } - ], - "Concept": [ - { - "fullLabel": "Oscar Pistorius", - "cookedLabel": "Oscar Pistorius", - "pageID": "5729054", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 4.653960350157523, - "score": 0.955394663736244, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr000120", - "qText": "who plays meg in family guy?", - "SV": ["plays"], - "lemmaSV": ["play"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Meg Griffin", - "cookedLabel": "Meg Griffin", - "pageID": "723502", - "editDist": 0.0, - "labelProbability": 0.458351, - "logPopularity": 4.127134385045092, - "score": 0.2549178022150101, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Meg Ryan", - "cookedLabel": "Meg Ryan", - "pageID": "51799", - "editDist": 0.0, - "labelProbability": 0.12473, - "logPopularity": 5.017279836814924, - "score": 0.11185170037908035, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Meg (singer)", - "cookedLabel": "Meg", - "pageID": "22474590", - "editDist": 0.0, - "labelProbability": 0.12473, - "logPopularity": 4.77912349311153, - "score": 0.09842432271625319, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Meg Tilly", - "cookedLabel": "Meg Tilly", - "pageID": "667049", - "editDist": 0.0, - "labelProbability": 0.12473, - "logPopularity": 4.74493212836325, - "score": 0.09661886993478508, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Meg Whitman", - "cookedLabel": "Meg Whitman", - "pageID": "741886", - "editDist": 0.0, - "labelProbability": 0.12473, - "logPopularity": 4.836281906951478, - "score": 0.10150969453190346, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Family Guy", - "cookedLabel": "Family Guy", - "pageID": "187586", - "editDist": 0.0, - "labelProbability": 0.964212, - "logPopularity": 6.248042874508429, - "score": 0.9458401875550986, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "meg", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr000140", - "qText": "what stadium did the chicago cardinals play in?", - "SV": ["play"], - "lemmaSV": ["play"], - "LAT": [ - { "synset": 
"22119", "text": "artifact", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "4348764", "text": "structure", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "stadium", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Arizona Cardinals", - "cookedLabel": "Arizona Cardinals", - "pageID": "2102", - "editDist": 0.0, - "labelProbability": 0.260508, - "logPopularity": 7.520776415062797, - "score": 0.5985305536890859, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "History of the Arizona Cardinals", - "cookedLabel": "History of the Arizona Cardinals", - "pageID": "9997766", - "editDist": 0.0, - "labelProbability": 0.482546, - "logPopularity": 3.258096538021482, - "score": 0.24275810950836152, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "History of the Chicago Cardinals", - "cookedLabel": "History of the Chicago Cardinals", - "pageID": "34268853", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 5.583496308781699, - "score": 0.2690152293647376, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [{ "label": "chicago cardinals", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr000160", - "qText": "who was the apostle paul considered to be?", - "SV": ["considered"], - "lemmaSV": ["consider"], - "LAT": [ - { "synset": "6202938", "text": "attitude", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6218486", "text": "position", "specificity": "0.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "6217756", "text": "orientation", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "considered", "specificity": "0.0", "type": "LAT" }, - { "text": "consider", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "Paul the Apostle", - "cookedLabel": "Paul the Apostle", - "pageID": "24140", - "editDist": 0.0, - "labelProbability": 0.994799, - "logPopularity": 5.762051382780177, - "score": 0.975995274797593, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Copula (linguistics)", - "cookedLabel": "Copula", - "pageID": "5630", - "editDist": 0.0, - "labelProbability": 0.20202, - "logPopularity": 3.5263605246161616, - "score": 0.09310822918934004, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "To Be (Ayumi Hamasaki song)", - "cookedLabel": "To Be", - "pageID": "3427390", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.060443010546419, - "score": 0.05168550302843927, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [{ "label": "apostle paul", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr000180", - "qText": "which countries are part of the united kingdom?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "27365", "text": "location", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8648560", "text": "region", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8024893", 
"text": "organization", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8508836", "text": "administrative district", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7958392", "text": "people", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8206589", "text": "unit", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8569713", "text": "district", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8376876", "text": "political unit", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8591861", "text": "geographical area", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "countries", "specificity": "0.0", "type": "LAT" }, - { "text": "country", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "High Sheriff of Lancashire", - "cookedLabel": "High Sheriff of Lancashire", - "pageID": "13807427", - "editDist": 0.0, - "labelProbability": 0.5, - "logPopularity": 3.7376696182833684, - "score": 0.31655453251418925, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "High Sheriff of Yorkshire", - "cookedLabel": "High Sheriff of Yorkshire", - "pageID": "7326096", - "editDist": 0.0, - "labelProbability": 0.5, - "logPopularity": 3.6635616461296463, - "score": 0.3070142417577131, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Art of the United Kingdom", - "cookedLabel": "Art of the United Kingdom", - "pageID": "1230235", - "editDist": 1.0, - "labelProbability": 0.0, - "logPopularity": 3.871201010907891, - "score": 0.15238646386916124, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "United Kingdom", - "cookedLabel": "United Kingdom", - "pageID": "31717", - "editDist": 0.0, - "labelProbability": 0.801464, - "logPopularity": 11.570967932364097, - "score": 0.9950615828111425, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "the united kingdom?", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr000200", - "qText": "what college did kevjumba?", - "SV": ["kevjumba"], - "lemmaSV": ["kevjumba"], - "LAT": [ - { "synset": "8070328", "text": "institution", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8293263", "text": "educational institution", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8024893", "text": "organization", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7981699", "text": "body", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "4348764", "text": "structure", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "2918337", "text": "building complex", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-3.0", "type": "WordnetLAT" }, - { "text": "college", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [], - 
"Clue": [] - }, - { - "qId": "wqr000220", - "qText": "what sort of government does brazil have?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "13546128", "text": "operation", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "4749775", "text": "sameness", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5817200", "text": "content", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "4731092", "text": "quality", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "13562370", "text": "processing", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "5847274", "text": "category", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5844071", "text": "concept", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "4750845", "text": "similarity", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "29976", "text": "process", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "13476660", "text": "data processing", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "sort", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Sort Of", - "cookedLabel": "Sort Of", - "pageID": "6111466", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.7376696182833684, - "score": 0.007625638625247476, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Government", - "cookedLabel": "Government", - "pageID": "12229", - "editDist": 0.0, - "labelProbability": 0.139222, - "logPopularity": 5.5093883366279774, - "score": 0.15315169535105763, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Brazil", - "cookedLabel": "Brazil", - "pageID": "3383", - "editDist": 0.0, - "labelProbability": 0.671435, - "logPopularity": 10.453572350254236, - "score": 0.9909755240376303, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Brazil national football team", - "cookedLabel": "Brazil national football team", - "pageID": "149286", - "editDist": 0.0, - "labelProbability": 0.0639685, - "logPopularity": 7.789868559054706, - "score": 0.336803569077327, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "brazil", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr000240", - "qText": "what year was the great san francisco fire?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "31563", "text": "group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "33914", "text": "measure", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "15137796", "text": "time period", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7991473", "text": "gathering", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13597072", "text": "fundamental quantity", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": 
"social group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "year", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "List of people known as The Great", - "cookedLabel": "the great", - "pageID": "214273", - "editDist": 0.0, - "labelProbability": 0.392, - "logPopularity": 2.5649493574615367, - "score": 0.0899022043760746, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "1906 San Francisco earthquake", - "cookedLabel": "1906 San Francisco earthquake", - "pageID": "20110714", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 3.7612001156935624, - "score": 0.926124312894242, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the great san francisco fire", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "great san francisco fire", "type": "CluePhrase", "weight": 0.99 }, - { "label": "san francisco fire", "type": "ClueNE", "weight": 2.6 } - ] - }, - { - "qId": "wqr000260", - "qText": "where did rihanna grow up?", - "SV": ["grow"], - "lemmaSV": ["grow"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Rihanna", - "cookedLabel": "Rihanna", - "pageID": "2110323", - "editDist": 0.0, - "labelProbability": 0.990786, - "logPopularity": 6.236369590203704, - "score": 0.9743285979093527, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Grow Up (Svoy album)", - "cookedLabel": "Grow Up", - "pageID": "32182839", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.430816798843313, - "score": 0.04643010512375539, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Grow Up, Tony Phillips", - "cookedLabel": "Grow Up, Tony Phillips", - "pageID": "41237889", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.2188758248682006, - "score": 0.022990485979290737, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Grow Up (book)", - "cookedLabel": "Grow Up", - "pageID": "11645304", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.6635616461296463, - "score": 0.02981111954944062, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Grow Up (The Queers album)", - "cookedLabel": "Grow Up", - "pageID": "8796937", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.9889840465642745, - "score": 0.03600738939439352, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [{ "label": "grow up", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr000280", - "qText": "where is the kakadu national park located?", - "SV": ["located"], - "lemmaSV": ["locate"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Kakadu National Park", - "cookedLabel": "Kakadu National Park", - "pageID": "101655", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 4.2626798770413155, - 
"score": 0.9442496570627794, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr000300", - "qText": "where is the ottoman empire located?", - "SV": ["located"], - "lemmaSV": ["locate"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Ottoman Empire", - "cookedLabel": "Ottoman Empire", - "pageID": "22278", - "editDist": 0.0, - "labelProbability": 0.968988, - "logPopularity": 8.262300941787448, - "score": 0.9938594960042121, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr000320", - "qText": "where is tom cruise from?", - "SV": [], - "lemmaSV": [], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Tom Cruise", - "cookedLabel": "Tom Cruise", - "pageID": "31460", - "editDist": 0.0, - "labelProbability": 0.96695, - "logPopularity": 5.442417710521793, - "score": 0.9672458310948574, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr000340", - "qText": "what should you see in london?", - "SV": ["see"], - "lemmaSV": ["see"], - "LAT": [{ "text": "you", "specificity": "0.0", "type": "LAT" }], - "Concept": [ - { - "fullLabel": "You", - "cookedLabel": "You", - "pageID": "464907", - "editDist": 0.0, - "labelProbability": 0.166744, - "logPopularity": 3.332204510175204, - "score": 0.01312126268186723, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "You (Ten Sharp song)", - "cookedLabel": "You", - "pageID": "18041571", - "editDist": 0.0, - "labelProbability": 0.0973713, - "logPopularity": 4.454347296253507, - "score": 0.007020891368638913, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "You (Juju album)", - "cookedLabel": "You", - "pageID": "32465927", - "editDist": 0.0, - "labelProbability": 0.0973713, - "logPopularity": 4.584967478670572, - "score": 0.0075889151861677365, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "To Know That You're Alive", - "cookedLabel": "To Know That You're Alive", - "pageID": "16113542", - "editDist": 0.0, - "labelProbability": 0.0973713, - "logPopularity": 4.543294782270004, - "score": 0.007402908081191978, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "You County", - "cookedLabel": "You County", - "pageID": "24702306", - "editDist": 0.0, - "labelProbability": 0.0973713, - "logPopularity": 4.477336814478207, - "score": 0.007117710644484163, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "B.B. King in London", - "cookedLabel": "B.B. 
King in London", - "pageID": "13253896", - "editDist": 0.0, - "labelProbability": 0.912892, - "logPopularity": 3.970291913552122, - "score": 0.7786155377852918, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "In London (Dewey Redman album)", - "cookedLabel": "In London", - "pageID": "31662165", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.8501476017100584, - "score": 0.04583958488949033, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "In London (Ravi Shankar album)", - "cookedLabel": "In London", - "pageID": "23803159", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.007333185232471, - "score": 0.05014577939757896, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [{ "label": "in london?", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr000360", - "qText": "what did kate winslet get an oscar for?", - "SV": ["get"], - "lemmaSV": ["get"], - "LAT": [{ "text": "winslet", "specificity": "0.0", "type": "LAT" }], - "Concept": [ - { - "fullLabel": "Kate Winslet", - "cookedLabel": "Kate Winslet", - "pageID": "52707", - "editDist": 0.0, - "labelProbability": 0.998207, - "logPopularity": 5.003946305945459, - "score": 0.8624249110139923, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Academy Awards", - "cookedLabel": "Academy Awards", - "pageID": "324", - "editDist": 0.0, - "labelProbability": 0.680424, - "logPopularity": 4.5217885770490405, - "score": 0.30982989450411025, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Oscar De La Hoya", - "cookedLabel": "Oscar De La Hoya", - "pageID": "95310", - "editDist": 0.0, - "labelProbability": 0.096522, - "logPopularity": 5.153291594497779, - "score": 0.10716640977370843, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Oscar I of Sweden", - "cookedLabel": "Oscar I of Sweden", - "pageID": "38746", - "editDist": 0.0, - "labelProbability": 0.096522, - "logPopularity": 5.062595033026967, - "score": 0.10206990524289294, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Oscar II of Sweden", - "cookedLabel": "Oscar II of Sweden", - "pageID": "104650", - "editDist": 0.0, - "labelProbability": 0.096522, - "logPopularity": 5.181783550292085, - "score": 0.10881309435169641, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Oscar Wilde", - "cookedLabel": "Oscar Wilde", - "pageID": "22614", - "editDist": 0.0, - "labelProbability": 0.096522, - "logPopularity": 5.605802066295998, - "score": 0.13604722377665357, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "oscar", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr000380", - "qText": "where did pavlova originate?", - "SV": ["originate"], - "lemmaSV": ["originate"], - "LAT": [{ "synset": "27365", "text": "location", 
"specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Pavlova (food)", - "cookedLabel": "Pavlova", - "pageID": "67081", - "editDist": 0.0, - "labelProbability": 0.923677, - "logPopularity": 3.5263605246161616, - "score": 0.8457864866370466, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Anna Pavlova (gymnast)", - "cookedLabel": "Anna Pavlova", - "pageID": "1531928", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.343805421853684, - "score": 0.11121386998928041, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Karolina Pavlova", - "cookedLabel": "Karolina Pavlova", - "pageID": "25434748", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.143134726391533, - "score": 0.09985814500271337, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Anna Pavlova (film)", - "cookedLabel": "Anna Pavlova", - "pageID": "30828989", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.110873864173311, - "score": 0.09813171286801613, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Anna Pavlova", - "cookedLabel": "Anna Pavlova", - "pageID": "63157", - "editDist": 0.0, - "labelProbability": 0.0607112, - "logPopularity": 4.605170185988092, - "score": 0.16556105820547126, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "pavlova", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr000400", - "qText": "what state is harvard college located?", - "SV": ["located"], - "lemmaSV": ["locate"], - "LAT": [ - { "synset": "7495208", "text": "emotion", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8139116", "text": "federal department", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "8648560", "text": "region", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8094128", "text": "administrative unit", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "8376876", "text": "political unit", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "11428673", "text": "natural phenomenon", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "34512", "text": "phenomenon", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8136796", "text": "government department", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8140150", "text": "executive department", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8508836", "text": "administrative district", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8067137", "text": "polity", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8569713", "text": "district", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "14009303", "text": "emotional state", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8024893", "text": "organization", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "27365", "text": 
"location", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "11429173", "text": "chemical phenomenon", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8206589", "text": "unit", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8067430", "text": "government", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8237635", "text": "division", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "26390", "text": "feeling", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8131836", "text": "department", "specificity": "-4.0", "type": "WordnetLAT" }, - { "text": "state", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Harvard College", - "cookedLabel": "Harvard College", - "pageID": "260879", - "editDist": 0.0, - "labelProbability": 0.884069, - "logPopularity": 6.642486801367256, - "score": 0.9764428806488475, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Harvard University", - "cookedLabel": "Harvard University", - "pageID": "18426501", - "editDist": 0.0, - "labelProbability": 0.113872, - "logPopularity": 8.537191877922927, - "score": 0.5830172367337914, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "harvard college", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr000420", - "qText": "what state did al gore represent?", - "SV": ["represent"], - "lemmaSV": ["represent"], - "LAT": [ - { "synset": "7495208", "text": "emotion", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8139116", "text": "federal department", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "8648560", "text": "region", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8094128", "text": "administrative unit", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "8376876", "text": "political unit", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "11428673", "text": "natural phenomenon", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "34512", "text": "phenomenon", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8136796", "text": "government department", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8140150", "text": "executive department", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8508836", "text": "administrative district", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8067137", "text": "polity", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8569713", "text": "district", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "14009303", "text": "emotional state", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8024893", "text": "organization", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "27365", "text": "location", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "11429173", "text": "chemical phenomenon", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8206589", "text": "unit", "specificity": "-2.0", "type": "WordnetLAT" }, - 
{ "synset": "8067430", "text": "government", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8237635", "text": "division", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "26390", "text": "feeling", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8131836", "text": "department", "specificity": "-4.0", "type": "WordnetLAT" }, - { "text": "state", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Al Gore", - "cookedLabel": "Al Gore", - "pageID": "5042706", - "editDist": 0.0, - "labelProbability": 0.973792, - "logPopularity": 6.0844994130751715, - "score": 0.9781639206533114, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr000440", - "qText": "what was the first name of the washington redskins?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "9659294", "text": "person of color", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9664887", "text": "Amerindian", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "redskins", "specificity": "0.0", "type": "LAT" }, - { "text": "redskin", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "Given name", - "cookedLabel": "Given name", - "pageID": "247991", - "editDist": 0.0, - "labelProbability": 0.666667, - "logPopularity": 3.4965075614664802, - "score": 0.6979694032215839, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Solomon Islands", - "cookedLabel": "Solomon Islands", - "pageID": "265083", - "editDist": 0.0, - "labelProbability": 0.333333, - "logPopularity": 6.6895992691789665, - "score": 0.5585770570841422, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Louis IX of France", - "cookedLabel": "Louis IX of France", - "pageID": "18549", - "editDist": 0.0, - "labelProbability": 0.333333, - "logPopularity": 5.241747015059643, - "score": 0.3467612192375154, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Division of Solomon", - "cookedLabel": "Division of Solomon", - "pageID": "2171828", - "editDist": 0.0, - "labelProbability": 0.333333, - "logPopularity": 5.117993812416755, - "score": 0.3301389058568271, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Solomon Burke", - "cookedLabel": "Solomon Burke", - "pageID": "164477", - "editDist": 0.0, - "labelProbability": 0.333333, - "logPopularity": 5.262690188904886, - "score": 0.3496130227702334, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Washington Redskins", - "cookedLabel": "Washington Redskins", - "pageID": "33673", - "editDist": 0.0, - "labelProbability": 0.883442, - "logPopularity": 8.042056410058754, - "score": 0.8942421484366897, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the first name of 
the washington redskins", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "first name of the washington redskins", "type": "CluePhrase", "weight": 0.99 }, - { "label": "the washington redskins?", "type": "ClueNE", "weight": 1.6 } - ] - }, - { - "qId": "wqr000460", - "qText": "what made ancient rome fall?", - "SV": ["fall"], - "lemmaSV": ["fall"], - "LAT": [ - { "synset": "8540894", "text": "center", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8648560", "text": "region", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "8693705", "text": "urban area", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8665520", "text": "seat", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7981699", "text": "body", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8398167", "text": "leadership", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8542298", "text": "city", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8709407", "text": "national capital", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8569713", "text": "district", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "8508836", "text": "administrative district", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8591861", "text": "geographical area", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "27365", "text": "location", "specificity": "-7.0", "type": "WordnetLAT" }, - { "synset": "8514304", "text": "area", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "8643858", "text": "municipality", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8535783", "text": "capital", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-4.0", "type": "WordnetLAT" }, - { "text": "rome", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Ancient Rome", - "cookedLabel": "Ancient Rome", - "pageID": "521555", - "editDist": 0.0, - "labelProbability": 0.872097, - "logPopularity": 5.332718793265369, - "score": 0.8104863546947404, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr000480", - "qText": "who does kris humphries play for in the nba?", - "SV": ["play"], - "lemmaSV": ["play"], - "LAT": [ - { "synset": "38116", "text": "action", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "30657", "text": "act", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "41926", "text": "playing", "specificity": "0.0", "type": "WordnetLAT" }, - { "text": "play", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Kris Humphries", - "cookedLabel": "Kris Humphries", - "pageID": "2312705", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 4.51085950651685, - "score": 0.9515892730037312, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "National Basketball Association", - "cookedLabel": "National Basketball Association", - "pageID": "22093", - "editDist": 0.0, - "labelProbability": 0.881188, - "logPopularity": 7.202661196523238, - "score": 0.9379815949530763, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - 
"getByCWLookup": 1 - } - ], - "Clue": [{ "label": "nba", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr000500", - "qText": "where did sir ernest shackleton come from?", - "SV": ["come"], - "lemmaSV": ["come"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Ernest Shackleton", - "cookedLabel": "Ernest Shackleton", - "pageID": "60004", - "editDist": 0.0, - "labelProbability": 0.994582, - "logPopularity": 4.5217885770490405, - "score": 0.9507370650288589, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "COMEFROM", - "cookedLabel": "COMEFROM", - "pageID": "994284", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 1.6094379124341003, - "score": 0.012369376838424827, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [{ "label": "sir ernest shackleton", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr000520", - "qText": "who does the islamic worship?", - "SV": ["does"], - "lemmaSV": ["do"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Islam", - "cookedLabel": "Islam", - "pageID": "6037917", - "editDist": 0.0, - "labelProbability": 0.714034, - "logPopularity": 8.91811465947453, - "score": 0.9515347415314166, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "islamic worship", "type": "CluePhrase", "weight": 0.99 }, - { "label": "the islamic", "type": "ClueNE", "weight": 1.11 }, - { "label": "worship", "type": "ClueToken", "weight": 1.0 } - ] - }, - { - "qId": "wqr000540", - "qText": "what does pixar produce?", - "SV": ["produce"], - "lemmaSV": ["produce"], - "LAT": [{ "text": "pixar", "specificity": "0.0", "type": "LAT" }], - "Concept": [ - { - "fullLabel": "Pixar", - "cookedLabel": "Pixar", - "pageID": "78969", - "editDist": 0.0, - "labelProbability": 0.982097, - "logPopularity": 5.351858133476067, - "score": 0.8368938289140296, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Question mark", - "cookedLabel": "Question mark", - "pageID": "59348", - "editDist": 0.0, - "labelProbability": 0.877681, - "logPopularity": 5.262690188904886, - "score": 0.8229142522615168, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "?", "type": "ClueNE", "weight": 2.1 }] - }, - { - "qId": "wqr000560", - "qText": "how many teams are there in the ncaa football?", - "SV": [], - "lemmaSV": [], - "LAT": [{ "synset": "33914", "text": "amount", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "How Many", - "cookedLabel": "How Many", - "pageID": "10680822", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 3.7612001156935624, - "score": 0.7680939451144396, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "College football", - "cookedLabel": "College football", - "pageID": "6771", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 6.137727054086234, - "score": 
0.15933808868098598, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [ - { "label": "teams", "type": "ClueToken", "weight": 1.0 }, - { "label": "the ncaa football?", "type": "ClueNE", "weight": 1.11 } - ] - }, - { - "qId": "wqr000580", - "qText": "what movies has taylor lautner?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "4014270", "text": "product", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "29677", "text": "event", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "6631572", "text": "show", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "3133774", "text": "creation", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7303344", "text": "social event", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "movies", "specificity": "0.0", "type": "LAT" }, - { "text": "movie", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "Taylor Lautner", - "cookedLabel": "Taylor Lautner", - "pageID": "13199916", - "editDist": 0.0, - "labelProbability": 0.999293, - "logPopularity": 4.700480365792417, - "score": 0.8902076395507517, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr000600", - "qText": "what episode does rukia fade away?", - "SV": ["fade"], - "lemmaSV": ["fade"], - "LAT": [ - { "synset": "6360590", "text": "written communication", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7303344", "text": "social event", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "3932650", "text": "photographic paper", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6403644", "text": "section", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6631572", "text": "show", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6631935", "text": "broadcast", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "3580409", "text": "instrumentality", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "7298313", "text": "happening", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7123727", "text": "auditory communication", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "3932386", "text": "photographic equipment", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "29677", "text": "event", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6374360", "text": "writing", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "3298959", "text": "equipment", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "3343766", "text": "film", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7034009", "text": "music", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "33319", "text": "communication", "specificity": "-4.0", "type": "WordnetLAT" }, - { "text": "episode", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Rukia Kuchiki", - "cookedLabel": "Rukia Kuchiki", - "pageID": "2356421", - "editDist": 0.0, - "labelProbability": 0.792359, - "logPopularity": 4.143134726391533, - "score": 0.8128115471721565, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, 
- "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Rukia (bird)", - "cookedLabel": "Rukia", - "pageID": "13050373", - "editDist": 0.0, - "labelProbability": 0.0963455, - "logPopularity": 3.6109179126442243, - "score": 0.11403551989252897, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Fade Away (song)", - "cookedLabel": "Fade Away", - "pageID": "16757390", - "editDist": 0.0, - "labelProbability": 0.561111, - "logPopularity": 4.060443010546419, - "score": 0.42427332414050917, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Cigarettes & Alcohol", - "cookedLabel": "Cigarettes & Alcohol", - "pageID": "1460994", - "editDist": 0.0, - "labelProbability": 0.0777778, - "logPopularity": 4.30406509320417, - "score": 0.08465432110351759, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Another Animal", - "cookedLabel": "Another Animal", - "pageID": "13057317", - "editDist": 0.0, - "labelProbability": 0.0777778, - "logPopularity": 4.574710978503383, - "score": 0.09811550264895015, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Fade Away (EP)", - "cookedLabel": "Fade Away", - "pageID": "26755756", - "editDist": 0.0, - "labelProbability": 0.0777778, - "logPopularity": 3.6109179126442243, - "score": 0.057507839991317325, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Fade Away (Best Coast album)", - "cookedLabel": "Fade Away", - "pageID": "40558516", - "editDist": 0.0, - "labelProbability": 0.0777778, - "logPopularity": 4.1588830833596715, - "score": 0.07814418086189585, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "rukia", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr000620", - "qText": "who does kurt busch drive for now?", - "SV": ["drive"], - "lemmaSV": ["drive"], - "LAT": [{ "text": "busch", "specificity": "0.0", "type": "LAT" }], - "Concept": [ - { - "fullLabel": "Kurt Busch", - "cookedLabel": "Kurt Busch", - "pageID": "525736", - "editDist": 0.0, - "labelProbability": 0.99964, - "logPopularity": 5.081404364984463, - "score": 0.8686002704395099, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Jordin Sparks discography", - "cookedLabel": "Jordin Sparks discography", - "pageID": "18194927", - "editDist": 0.0, - "labelProbability": 0.867925, - "logPopularity": 3.4657359027997265, - "score": 0.601860125368496, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "for now", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr000640", - "qText": "what team is chris paul on?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "7957410", "text": "biological group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8024893", "text": "organization", "specificity": "-2.0", "type": 
"WordnetLAT" }, - { "synset": "8206589", "text": "unit", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8010371", "text": "animal group", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "team", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Chris Paul", - "cookedLabel": "Chris Paul", - "pageID": "4987149", - "editDist": 0.0, - "labelProbability": 0.999753, - "logPopularity": 4.718498871295094, - "score": 0.9569693241599244, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr000660", - "qText": "what happened to justin bieber 2012?", - "SV": ["happened"], - "lemmaSV": ["happen"], - "LAT": [], - "Concept": [ - { - "fullLabel": "What Happened", - "cookedLabel": "What Happened", - "pageID": "17648735", - "editDist": 0.0, - "labelProbability": 0.950207, - "logPopularity": 3.970291913552122, - "score": 0.7491640697900563, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Justin Bieber", - "cookedLabel": "Justin Bieber", - "pageID": "23680998", - "editDist": 0.0, - "labelProbability": 0.995669, - "logPopularity": 5.991464547107982, - "score": 0.925242503944315, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "2012", - "cookedLabel": "2012", - "pageID": "47374", - "editDist": 0.0, - "labelProbability": 0.370218, - "logPopularity": 2.833213344056216, - "score": 0.0950054105606654, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "2012 phenomenon", - "cookedLabel": "2012 phenomenon", - "pageID": "21538638", - "editDist": 0.0, - "labelProbability": 0.062167, - "logPopularity": 4.219507705176107, - "score": 0.02137140937577736, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "2012 (film)", - "cookedLabel": "2012", - "pageID": "18436536", - "editDist": 0.0, - "labelProbability": 0.128225, - "logPopularity": 4.736198448394496, - "score": 0.03877421682047329, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "justin bieber 2012", "type": "CluePhrase", "weight": 0.99 }, - { "label": "2012", "type": "ClueNE", "weight": 2.1 } - ] - }, - { - "qId": "wqr000680", - "qText": "where did kurds originate from?", - "SV": ["originate"], - "lemmaSV": ["originate"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Kurdish people", - "cookedLabel": "Kurdish people", - "pageID": "17068", - "editDist": 0.0, - "labelProbability": 0.91787, - "logPopularity": 5.25227342804663, - "score": 0.9376589245135472, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "kurds", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr000700", - "qText": "who played juni in spy kids 4?", - "SV": ["played"], - "lemmaSV": ["play"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": 
"QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "June", - "cookedLabel": "June", - "pageID": "15785", - "editDist": 0.0, - "labelProbability": 0.957049, - "logPopularity": 2.8903717578961645, - "score": 0.3756061986134487, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Juhni", - "cookedLabel": "Juhni", - "pageID": "40864001", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.060443010546419, - "score": 0.03752557266374662, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Juni (album)", - "cookedLabel": "Juni", - "pageID": "33741916", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.828641396489095, - "score": 0.03281312068885659, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Juan de Juni", - "cookedLabel": "Juan de Juni", - "pageID": "14349994", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.5553480614894135, - "score": 0.027989524351936867, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Juni Cortez", - "cookedLabel": "Juni Cortez", - "pageID": "565107", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.912023005428146, - "score": 0.03443848150516163, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Spy Kids: All the Time in the World", - "cookedLabel": "Spy Kids: All the Time in the World", - "pageID": "29384326", - "editDist": 0.0, - "labelProbability": 0.942177, - "logPopularity": 4.174387269895637, - "score": 0.8197537507990824, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "spy kids 4", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr000720", - "qText": "what are the major imports of the united states?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "9652940", "text": "traveler", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "10123254", "text": "foreigner", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "4731092", "text": "quality", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "5145753", "text": "value", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "3080712", "text": "commodity", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5145473", "text": "worth", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "5177340", "text": "significance", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "6611268", "text": "message", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5928460", "text": "meaning", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5817200", 
"text": "content", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5175788", "text": "importance", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "33319", "text": "communication", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "imports", "specificity": "0.0", "type": "LAT" }, - { "text": "import", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "International trade", - "cookedLabel": "International trade", - "pageID": "14567", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 4.356708826689592, - "score": 0.5335993809709476, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Water export", - "cookedLabel": "Water export", - "pageID": "12517724", - "editDist": 3.0, - "labelProbability": 0.0, - "logPopularity": 1.6094379124341003, - "score": 0.014535818203352374, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Major", - "cookedLabel": "Major", - "pageID": "201920", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 6.82001636467413, - "score": 0.16955685053201597, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "The Major", - "cookedLabel": "The Major", - "pageID": "9600545", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 2.995732273553991, - "score": 0.020167690730618193, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "United States", - "cookedLabel": "United States", - "pageID": "3434750", - "editDist": 0.0, - "labelProbability": 0.836336, - "logPopularity": 13.02522232257073, - "score": 0.9982363165377705, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "States and union territories of India", - "cookedLabel": "States and union territories of India", - "pageID": "375986", - "editDist": 0.0, - "labelProbability": 0.0869492, - "logPopularity": 9.911654115202522, - "score": 0.6661774336773838, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "States of Nigeria", - "cookedLabel": "States of Nigeria", - "pageID": "226734", - "editDist": 0.0, - "labelProbability": 0.0869492, - "logPopularity": 6.932447891572509, - "score": 0.2503879062273689, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Administrative divisions of Mexico", - "cookedLabel": "Administrative divisions of Mexico", - "pageID": "87990", - "editDist": 0.0, - "labelProbability": 0.0869492, - "logPopularity": 7.814803429489359, - "score": 0.36189680695012594, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "States of Brazil", - "cookedLabel": "States of Brazil", - "pageID": "229379", - "editDist": 0.0, - "labelProbability": 0.165626, - "logPopularity": 8.14902386805177, - "score": 0.49875640020650774, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "U.S. 
state", - "cookedLabel": "U.S. state", - "pageID": "18618239", - "editDist": 0.0, - "labelProbability": 0.198042, - "logPopularity": 8.919453168575453, - "score": 0.6470905751810023, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the major imports of the united states", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "major imports of the united states", "type": "CluePhrase", "weight": 0.99 }, - { "label": "states?", "type": "ClueNE", "weight": 1.11 } - ] - }, - { - "qId": "wqr000740", - "qText": "where did martin luther king junior go to college?", - "SV": ["go"], - "lemmaSV": ["go"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Martin Luther King, Jr.", - "cookedLabel": "Martin Luther King, Jr.", - "pageID": "20076", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 5.786897381366708, - "score": 0.9768880164204938, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Goto", - "cookedLabel": "Goto", - "pageID": "23307350", - "editDist": 0.0, - "labelProbability": 0.233129, - "logPopularity": 3.258096538021482, - "score": 0.06728267116099904, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "College", - "cookedLabel": "College", - "pageID": "5689", - "editDist": 0.0, - "labelProbability": 0.29875, - "logPopularity": 5.497168225293202, - "score": 0.27206828588388454, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Lists of American institutions of higher education", - "cookedLabel": "Lists of American institutions of higher education", - "pageID": "322811", - "editDist": 0.0, - "labelProbability": 0.0517783, - "logPopularity": 2.8903717578961645, - "score": 0.009291532739167302, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "List of college athletic conferences in the United States", - "cookedLabel": "college", - "pageID": "577952", - "editDist": 0.0, - "labelProbability": 0.304197, - "logPopularity": 1.9459101490553132, - "score": 0.016696137164309035, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "martin luther king junior", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr000760", - "qText": "what legal system does germany use?", - "SV": ["use"], - "lemmaSV": ["use"], - "LAT": [ - { "synset": "19308", "text": "natural object", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5226062", "text": "live body", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5227735", "text": "body part", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5223633", "text": "body", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5817200", "text": "content", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "9408804", "text": "part", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "4731092", "text": "quality", "specificity": "-3.0", 
"type": "WordnetLAT" }, - { "synset": "3580409", "text": "instrumentality", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5911139", "text": "plan of action", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5624569", "text": "know-how", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5624029", "text": "ability", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "2452", "text": "thing", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5907175", "text": "plan", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5734290", "text": "structure", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5668113", "text": "method", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "4774586", "text": "regularity", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "21007", "text": "matter", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "4775722", "text": "orderliness", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "system", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "List of national legal systems", - "cookedLabel": "legal system", - "pageID": "154708", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 2.833213344056216, - "score": 0.004446205345094004, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Germany", - "cookedLabel": "Germany", - "pageID": "11867", - "editDist": 0.0, - "labelProbability": 0.731908, - "logPopularity": 11.210644004861829, - "score": 0.995640149842911, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr000780", - "qText": "who invented arabic alphabet?", - "SV": ["invented"], - "lemmaSV": ["invent"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Arabic alphabet", - "cookedLabel": "Arabic alphabet", - "pageID": "2204", - "editDist": 0.0, - "labelProbability": 0.782609, - "logPopularity": 4.7535901911063645, - "score": 0.7556064975707828, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Morse code for non-Latin alphabets", - "cookedLabel": "Morse code for non-Latin alphabets", - "pageID": "17878606", - "editDist": 0.0, - "labelProbability": 0.211957, - "logPopularity": 2.5649493574615367, - "score": 0.02202449782540842, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "arabic alphabet", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr000800", - "qText": "where is lake waynoka ohio?", - "SV": [], - "lemmaSV": [], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Lake Waynoka, Ohio", - "cookedLabel": "Lake Waynoka, Ohio", - "pageID": "9176251", - "editDist": 0.0, - 
"labelProbability": 1.0, - "logPopularity": 4.356708826689592, - "score": 0.8256172039061515, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "lake waynoka ohio", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "lake waynoka", "type": "ClueNE", "weight": 1.11 }, - { "label": "ohio", "type": "ClueSubjectToken", "weight": 2.5 } - ] - }, - { - "qId": "wqr000820", - "qText": "where is made kia car?", - "SV": ["made"], - "lemmaSV": ["make"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Skip car", - "cookedLabel": "Skip car", - "pageID": "22258640", - "editDist": 2.0, - "labelProbability": 0.0, - "logPopularity": 0.6931471805599453, - "score": 0.009538173494138642, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [{ "label": "kia car", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr000840", - "qText": "what is the zip code for moorpark ca?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "33319", "text": "communication", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6371284", "text": "writing", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6365164", "text": "coding system", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6360590", "text": "written communication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "code", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "ZIP code", - "cookedLabel": "ZIP code", - "pageID": "51550", - "editDist": 0.0, - "labelProbability": 0.965969, - "logPopularity": 10.142898076314209, - "score": 0.9915968566201331, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Genetics", - "cookedLabel": "Genetics", - "pageID": "12266", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 5.991464547107982, - "score": 0.9266078744173097, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Moorpark, California", - "cookedLabel": "Moorpark, California", - "pageID": "108333", - "editDist": 0.0, - "labelProbability": 0.94247, - "logPopularity": 5.308267697401205, - "score": 0.9457017672563924, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the zip code for moorpark ca", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "zip code for moorpark ca", "type": "CluePhrase", "weight": 0.99 }, - { "label": "moorpark", "type": "ClueNE", "weight": 2.6 } - ] - }, - { - "qId": "wqr000860", - "qText": "what other cars does gm make?", - "SV": ["make"], - "lemmaSV": ["make"], - "LAT": [ - { "synset": "3099154", "text": "container", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "3796768", "text": "motor vehicle", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "4112987", "text": "room", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "3580409", "text": "instrumentality", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "4531608", "text": "vehicle", 
"specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "4348764", "text": "structure", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "2738693", "text": "area", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "3083745", "text": "compartment", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "3105141", "text": "conveyance", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "4177098", "text": "self-propelled vehicle", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "4583497", "text": "wheeled vehicle", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "cars", "specificity": "0.0", "type": "LAT" }, - { "text": "car", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "Germany", - "cookedLabel": "Germany", - "pageID": "11867", - "editDist": 0.0, - "labelProbability": 0.0509596, - "logPopularity": 11.210644004861829, - "score": 0.9089572500239089, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "List of Latin-script digraphs", - "cookedLabel": "gm", - "pageID": "22469831", - "editDist": 0.0, - "labelProbability": 0.0509596, - "logPopularity": 6.22455842927536, - "score": 0.3338934725624929, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "General Mills", - "cookedLabel": "General Mills", - "pageID": "164902", - "editDist": 0.0, - "labelProbability": 0.0509596, - "logPopularity": 4.875197323201151, - "score": 0.18239130735317602, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "General manager", - "cookedLabel": "General manager", - "pageID": "627189", - "editDist": 0.0, - "labelProbability": 0.0509596, - "logPopularity": 5.214935757608986, - "score": 0.21477318267976536, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "General Motors", - "cookedLabel": "General Motors", - "pageID": "12102", - "editDist": 0.0, - "labelProbability": 0.0509596, - "logPopularity": 6.863803391452954, - "score": 0.42382576537505606, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "other", "type": "ClueToken", "weight": 1.0 }, - { "label": "gm", "type": "ClueNE", "weight": 2.6 } - ] - }, - { - "qId": "wqr000880", - "qText": "what did richard nixon do for a living before he became president?", - "SV": ["do"], - "lemmaSV": ["do"], - "LAT": [ - { "synset": "9633690", "text": "communicator", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "10486961", "text": "President of the United States", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10371605", "text": "negotiator", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "10184340", "text": "head of state", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "10541628", "text": "representative", "specificity": "-3.0", "type": "WordnetLAT" }, - { "text": "nixon", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Richard Nixon", - "cookedLabel": "Richard Nixon", - "pageID": "25473", - "editDist": 0.0, - "labelProbability": 
0.987793, - "logPopularity": 6.630683385642372, - "score": 0.9406843589036394, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "For a Living", - "cookedLabel": "For a Living", - "pageID": "14785803", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 3.8066624897703196, - "score": 0.7729170949611157, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Helium", - "cookedLabel": "Helium", - "pageID": "13256", - "editDist": 0.0, - "labelProbability": 0.520629, - "logPopularity": 3.8066624897703196, - "score": 0.1229830065885628, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "He", - "cookedLabel": "He", - "pageID": "225073", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 2.772588722239781, - "score": 0.0176852074788872, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "President", - "cookedLabel": "President", - "pageID": "24110", - "editDist": 0.0, - "labelProbability": 0.153158, - "logPopularity": 7.502186486602924, - "score": 0.3892706865031215, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "President of the United States", - "cookedLabel": "President of the United States", - "pageID": "24113", - "editDist": 0.0, - "labelProbability": 0.280447, - "logPopularity": 6.082218910376446, - "score": 0.15405196892621265, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "president", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr000900", - "qText": "where did henry hudson travel?", - "SV": ["travel"], - "lemmaSV": ["travel"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Henry Hudson", - "cookedLabel": "Henry Hudson", - "pageID": "44014", - "editDist": 0.0, - "labelProbability": 0.97447, - "logPopularity": 4.189654742026425, - "score": 0.9351326584066884, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr000920", - "qText": "what did egyptians speak?", - "SV": ["speak"], - "lemmaSV": ["speak"], - "LAT": [ - { "synset": "6916947", "text": "natural language", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6293304", "text": "language", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9657682", "text": "African", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6999218", "text": "Afroasiatic", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "33319", "text": "communication", "specificity": "-4.0", "type": "WordnetLAT" }, - { "text": "egyptians", "specificity": "0.0", "type": "LAT" }, - { "text": "egyptian", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "Egyptians", - "cookedLabel": "Egyptians", - "pageID": "31912046", - "editDist": 0.0, - "labelProbability": 0.481658, - "logPopularity": 5.84354441703136, - "score": 
0.40854932616940715, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Ancient Egypt", - "cookedLabel": "Ancient Egypt", - "pageID": "874", - "editDist": 0.0, - "labelProbability": 0.237587, - "logPopularity": 4.836281906951478, - "score": 0.043851998974517345, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Egypt", - "cookedLabel": "Egypt", - "pageID": "8087628", - "editDist": 0.0, - "labelProbability": 0.127367, - "logPopularity": 8.766394277049736, - "score": 0.22605990409483226, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "egyptians", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr000940", - "qText": "where did barack obama attend school?", - "SV": ["attend"], - "lemmaSV": ["attend"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Barack Obama", - "cookedLabel": "Barack Obama", - "pageID": "534366", - "editDist": 0.0, - "labelProbability": 0.987254, - "logPopularity": 7.902487437162855, - "score": 0.9929996109483711, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "School", - "cookedLabel": "School", - "pageID": "28022", - "editDist": 0.0, - "labelProbability": 0.351585, - "logPopularity": 4.634728988229636, - "score": 0.2211887797516702, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "School psychology", - "cookedLabel": "School psychology", - "pageID": "466785", - "editDist": 0.0, - "labelProbability": 0.0933467, - "logPopularity": 2.772588722239781, - "score": 0.010467927085618627, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "List of schools of philosophy", - "cookedLabel": "school", - "pageID": "7950118", - "editDist": 0.0, - "labelProbability": 0.274652, - "logPopularity": 2.3978952727983707, - "score": 0.019070493600576166, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "school", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr000960", - "qText": "where north dakota located?", - "SV": ["north"], - "lemmaSV": ["north"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "North Dakota", - "cookedLabel": "North Dakota", - "pageID": "21651", - "editDist": 0.0, - "labelProbability": 0.623677, - "logPopularity": 7.631916513071252, - "score": 0.941929207138752, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr000980", - "qText": "who played todd manning on one life to live?", - "SV": ["played"], - "lemmaSV": ["play"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Todd Manning", - "cookedLabel": "Todd Manning", - "pageID": "713342", - "editDist": 0.0, - "labelProbability": 0.997139, - "logPopularity": 4.23410650459726, - 
"score": 0.8127787319459592, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "On One", - "cookedLabel": "On One", - "pageID": "18893398", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 3.713572066704308, - "score": 0.7629648266763546, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "One Life to Live", - "cookedLabel": "One Life to Live", - "pageID": "341990", - "editDist": 0.0, - "labelProbability": 0.917899, - "logPopularity": 5.517452896464707, - "score": 0.9010468545616608, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr001000", - "qText": "where do the abenaki indians live?", - "SV": ["live"], - "lemmaSV": ["live"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Abenaki", - "cookedLabel": "Abenaki", - "pageID": "55012", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 3.970291913552122, - "score": 0.9342618591720794, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "abenaki indians", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr001020", - "qText": "where is isthmus of panama located?", - "SV": ["located"], - "lemmaSV": ["locate"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Isthmus of Panama", - "cookedLabel": "Isthmus of Panama", - "pageID": "1404472", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 3.4965075614664802, - "score": 0.9144961673270963, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr001040", - "qText": "what states does the connecticut river flow through?", - "SV": ["flow"], - "lemmaSV": ["flow"], - "LAT": [ - { "synset": "7495208", "text": "emotion", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8139116", "text": "federal department", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "8648560", "text": "region", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8094128", "text": "administrative unit", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "8376876", "text": "political unit", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "11428673", "text": "natural phenomenon", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "34512", "text": "phenomenon", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8136796", "text": "government department", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8140150", "text": "executive department", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8508836", "text": "administrative district", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8067137", "text": "polity", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8569713", "text": "district", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", 
"specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "14009303", "text": "emotional state", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8024893", "text": "organization", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "27365", "text": "location", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "11429173", "text": "chemical phenomenon", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8206589", "text": "unit", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8067430", "text": "government", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8237635", "text": "division", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "26390", "text": "feeling", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8131836", "text": "department", "specificity": "-4.0", "type": "WordnetLAT" }, - { "text": "states", "specificity": "0.0", "type": "LAT" }, - { "text": "state", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "Connecticut River", - "cookedLabel": "Connecticut River", - "pageID": "252145", - "editDist": 0.0, - "labelProbability": 0.997329, - "logPopularity": 5.802118375377063, - "score": 0.9768169065974456, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "River", - "cookedLabel": "River", - "pageID": "18842395", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 5.117993812416755, - "score": 0.06849977248051861, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Preposition and postposition", - "cookedLabel": "Preposition and postposition", - "pageID": "199358", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 2.6390573296152584, - "score": 0.016345844972803406, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [{ "label": "river flow", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr001060", - "qText": "who is susan st james?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "33319", "text": "communication", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "9434308", "text": "river", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "9818234", "text": "Apostle", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10184340", "text": "head of state", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5817200", "text": "content", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "9248053", "text": "body of water", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "2452", "text": "thing", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "9651570", "text": "religious person", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "10371605", "text": "negotiator", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "10035803", "text": "disciple", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9644715", "text": "intellectual", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "9997190", "text": "criminal", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10251212", "text": "king", "specificity": "-1.0", "type": 
"WordnetLAT" }, - { "synset": "10560786", "text": "ruler", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "9657157", "text": "wrongdoer", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "6454286", "text": "Epistle", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10566702", "text": "saint", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10119144", "text": "follower", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "9528550", "text": "deity", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "10648006", "text": "sovereign", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7123727", "text": "auditory communication", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "10813654", "text": "writer", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "9697405", "text": "Christian", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9527267", "text": "spiritual being", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "10577282", "text": "scholar", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9471510", "text": "stream", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9633690", "text": "communicator", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5950141", "text": "belief", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "10443334", "text": "philosopher", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10508450", "text": "psychologist", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7034009", "text": "music", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "10580065", "text": "scientist", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9851208", "text": "bad person", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "6374360", "text": "writing", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "10541628", "text": "representative", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "10494535", "text": "principal", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "10253142", "text": "King of England", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6403644", "text": "section", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "6406508", "text": "book", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6360590", "text": "written communication", "specificity": "-5.0", "type": "WordnetLAT" }, - { "text": "james", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Whois", - "cookedLabel": "Whois", - "pageID": "4315433", - "editDist": 0.0, - "labelProbability": 0.0673077, - "logPopularity": 3.1780538303479458, - "score": 0.031085894047283118, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Susan Saint James", - "cookedLabel": "Susan Saint James", - "pageID": "937621", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.574710978503383, - "score": 0.04585533570323418, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Susan Saint James", - "cookedLabel": "Susan Saint James", - 
"pageID": "937621", - "editDist": 0.0, - "labelProbability": 0.227561, - "logPopularity": 4.574710978503383, - "score": 0.13414007012758933, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Saint James School (Montgomery, Alabama)", - "cookedLabel": "Saint James School", - "pageID": "4654822", - "editDist": 0.0, - "labelProbability": 0.227561, - "logPopularity": 4.30406509320417, - "score": 0.11637416465831421, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Saint James, Indiana", - "cookedLabel": "Saint James, Indiana", - "pageID": "15124957", - "editDist": 0.0, - "labelProbability": 0.227561, - "logPopularity": 4.0943445622221, - "score": 0.10404626095875653, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Saint James, Barbados", - "cookedLabel": "Saint James, Barbados", - "pageID": "1528028", - "editDist": 0.0, - "labelProbability": 0.227561, - "logPopularity": 4.442651256490317, - "score": 0.1252010789745587, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Saint James Parish, New Brunswick", - "cookedLabel": "Saint James Parish, New Brunswick", - "pageID": "39041822", - "editDist": 0.0, - "labelProbability": 0.227561, - "logPopularity": 4.477336814478207, - "score": 0.12749824947416874, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "susan st james", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr001080", - "qText": "what cancer did audrey hepburn died of?", - "SV": ["died"], - "lemmaSV": ["die"], - "LAT": [ - { "synset": "7957410", "text": "biological group", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8008892", "text": "taxonomic group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "14262907", "text": "malignant tumor", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "14075528", "text": "ill health", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "8703415", "text": "sign of the zodiac", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "14075399", "text": "pathological state", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "14085287", "text": "illness", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "27365", "text": "location", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "13943868", "text": "condition", "specificity": "-8.0", "type": "WordnetLAT" }, - { "synset": "14258682", "text": "tumor", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "1765166", "text": "arthropod genus", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "24900", "text": "state", "specificity": "-9.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "14093842", "text": "disease", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "14057659", "text": "physical condition", "specificity": "-7.0", "type": "WordnetLAT" }, - { "synset": "8125938", "text": "genus", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9275876", "text": "constellation", "specificity": "-1.0", "type": 
"WordnetLAT" }, - { "synset": "8647614", "text": "region", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "14261043", "text": "malignancy", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "14257556", "text": "growth", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "19308", "text": "natural object", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "cancer", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Audrey Hepburn", - "cookedLabel": "Audrey Hepburn", - "pageID": "52139", - "editDist": 0.0, - "labelProbability": 0.99483, - "logPopularity": 5.1647859739235145, - "score": 0.9660063566164164, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr001100", - "qText": "what do the buddha believe in?", - "SV": ["believe"], - "lemmaSV": ["believe"], - "LAT": [ - { "synset": "10363285", "text": "mystic", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10158287", "text": "good person", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9867135", "text": "believer", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "9651570", "text": "religious person", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "10566407", "text": "saint", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "buddha", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Gautama Buddha", - "cookedLabel": "Gautama Buddha", - "pageID": "3395", - "editDist": 0.0, - "labelProbability": 0.521722, - "logPopularity": 5.420534999272286, - "score": 0.47386472811390146, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Buddhahood", - "cookedLabel": "Buddhahood", - "pageID": "174976", - "editDist": 0.0, - "labelProbability": 0.313969, - "logPopularity": 3.4657359027997265, - "score": 0.03848139367395044, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Belief", - "cookedLabel": "Belief", - "pageID": "102883", - "editDist": 0.0, - "labelProbability": 0.928571, - "logPopularity": 2.70805020110221, - "score": 0.559067381405877, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "believe in", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr001120", - "qText": "who became president after harding died?", - "SV": ["became"], - "lemmaSV": ["become"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "President", - "cookedLabel": "President", - "pageID": "24110", - "editDist": 0.0, - "labelProbability": 0.153158, - "logPopularity": 7.502186486602924, - "score": 0.3892706865031215, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "President of the United States", - "cookedLabel": "President of the United States", - "pageID": "24113", - "editDist": 0.0, - "labelProbability": 0.280447, - "logPopularity": 6.082218910376446, - "score": 
0.15405196892621265, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Harding Township, New Jersey", - "cookedLabel": "Harding Township, New Jersey", - "pageID": "125546", - "editDist": 0.0, - "labelProbability": 0.211929, - "logPopularity": 4.955827057601261, - "score": 0.3291666404956828, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Harding, Minnesota", - "cookedLabel": "Harding, Minnesota", - "pageID": "120849", - "editDist": 0.0, - "labelProbability": 0.211929, - "logPopularity": 4.584967478670572, - "score": 0.28201907346285054, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Harding, Wisconsin", - "cookedLabel": "Harding, Wisconsin", - "pageID": "139292", - "editDist": 0.0, - "labelProbability": 0.211929, - "logPopularity": 4.564348191467836, - "score": 0.2795208389287822, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Harding County, South Dakota", - "cookedLabel": "Harding County, South Dakota", - "pageID": "91813", - "editDist": 0.0, - "labelProbability": 0.211929, - "logPopularity": 4.605170185988092, - "score": 0.2844799393380819, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Warren G. Harding", - "cookedLabel": "Warren G. Harding", - "pageID": "33060", - "editDist": 0.0, - "labelProbability": 0.290018, - "logPopularity": 5.720311776607412, - "score": 0.2931267792954646, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Death", - "cookedLabel": "Death", - "pageID": "8221", - "editDist": 0.0, - "labelProbability": 0.32493, - "logPopularity": 4.90527477843843, - "score": 0.22812175118227904, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "president after harding died", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "harding", "type": "ClueNE", "weight": 2.6 } - ] - }, - { - "qId": "wqr001140", - "qText": "who was the leader of germany in wwii?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "3754377", "text": "merchandise", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "3330714", "text": "feature", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "3080712", "text": "commodity", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-4.0", "type": "WordnetLAT" }, - { "text": "leader", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "President of Germany (1919–45)", - "cookedLabel": "President of Germany", - "pageID": "407083", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 3.6888794541139363, - "score": 0.4338672682742019, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "States of Germany", - "cookedLabel": "States of Germany", - "pageID": "217450", - "editDist": 2.0, - "labelProbability": 
0.0, - "logPopularity": 4.727387818712341, - "score": 0.06557484871615457, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Leadership", - "cookedLabel": "Leadership", - "pageID": "130918", - "editDist": 0.0, - "labelProbability": 0.303262, - "logPopularity": 4.343805421853684, - "score": 0.21074447522291953, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Leader of the Opposition (United Kingdom)", - "cookedLabel": "Leader of the Opposition", - "pageID": "711239", - "editDist": 0.0, - "labelProbability": 0.0698267, - "logPopularity": 4.663439094112067, - "score": 0.03963653931791342, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Leader (comics)", - "cookedLabel": "Leader", - "pageID": "1584994", - "editDist": 0.0, - "labelProbability": 0.151631, - "logPopularity": 4.060443010546419, - "score": 0.0401814825039581, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Germany", - "cookedLabel": "Germany", - "pageID": "11867", - "editDist": 0.0, - "labelProbability": 0.731908, - "logPopularity": 11.210644004861829, - "score": 0.9882824313361198, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "World War II", - "cookedLabel": "World War II", - "pageID": "32927", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 10.052123051675276, - "score": 0.5867284583132142, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [{ "label": "wwii", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr001160", - "qText": "when did george w bush take office?", - "SV": ["take"], - "lemmaSV": ["take"], - "LAT": [ - { "synset": "15147173", "text": "time", "specificity": "0.0", "type": "QuestionWordLAT" }, - { "synset": "15184543", "text": "date", "specificity": "0.0", "type": "QuestionWordLAT" } - ], - "Concept": [ - { - "fullLabel": "George W. Bush", - "cookedLabel": "George W. 
Bush", - "pageID": "3414021", - "editDist": 0.0, - "labelProbability": 0.976972, - "logPopularity": 7.558516743045645, - "score": 0.9909967416898372, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Office", - "cookedLabel": "Office", - "pageID": "382507", - "editDist": 0.0, - "labelProbability": 0.315404, - "logPopularity": 5.53338948872752, - "score": 0.291962500202855, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Office (UK TV series)", - "cookedLabel": "The Office", - "pageID": "2995581", - "editDist": 0.0, - "labelProbability": 0.0649595, - "logPopularity": 4.948759890378168, - "score": 0.03312697862285286, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "OpenOffice.org", - "cookedLabel": "OpenOffice.org", - "pageID": "68227", - "editDist": 0.0, - "labelProbability": 0.0649595, - "logPopularity": 5.075173815233827, - "score": 0.03564425763965933, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Office (U.S. TV series)", - "cookedLabel": "The Office", - "pageID": "2995553", - "editDist": 0.0, - "labelProbability": 0.0649595, - "logPopularity": 6.253828811575473, - "score": 0.06973986455768874, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Microsoft Office", - "cookedLabel": "Microsoft Office", - "pageID": "20288", - "editDist": 0.0, - "labelProbability": 0.384022, - "logPopularity": 4.672828834461906, - "score": 0.11177829516220708, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "george w bush", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr001180", - "qText": "where is wellsville missouri?", - "SV": [], - "lemmaSV": [], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Missouri", - "cookedLabel": "Missouri", - "pageID": "19571", - "editDist": 0.0, - "labelProbability": 0.726548, - "logPopularity": 8.925321416943886, - "score": 0.9826241016408392, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "wellsville missouri", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "wellsville", "type": "ClueToken", "weight": 1.0 }, - { "label": "missouri?", "type": "ClueNE", "weight": 2.6 } - ] - }, - { - "qId": "wqr001200", - "qText": "where great britain located?", - "SV": ["located"], - "lemmaSV": ["locate"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Great Britain", - "cookedLabel": "Great Britain", - "pageID": "13530298", - "editDist": 0.0, - "labelProbability": 0.545167, - "logPopularity": 7.485491608030754, - "score": 0.8424338456431942, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Kingdom of Great Britain", - "cookedLabel": "Kingdom of Great Britain", - "pageID": "158019", - "editDist": 0.0, - 
"labelProbability": 0.115033, - "logPopularity": 7.289610521451167, - "score": 0.1971669918471435, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "United Kingdom", - "cookedLabel": "United Kingdom", - "pageID": "31717", - "editDist": 0.0, - "labelProbability": 0.0500366, - "logPopularity": 11.570967932364097, - "score": 0.7038951322501689, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "great britain", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr001220", - "qText": "when is the last time the chicago bulls won a championship?", - "SV": ["won"], - "lemmaSV": ["win"], - "LAT": [ - { "synset": "15147173", "text": "time", "specificity": "0.0", "type": "QuestionWordLAT" }, - { "synset": "15184543", "text": "date", "specificity": "0.0", "type": "QuestionWordLAT" } - ], - "Concept": [ - { - "fullLabel": "The Last Time (song)", - "cookedLabel": "The Last Time", - "pageID": "6205401", - "editDist": 0.0, - "labelProbability": 0.682171, - "logPopularity": 4.634728988229636, - "score": 0.830873613966111, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Last Time (Agnetha Fältskog song)", - "cookedLabel": "The Last Time", - "pageID": "5748973", - "editDist": 0.0, - "labelProbability": 0.682171, - "logPopularity": 3.871201010907891, - "score": 0.7565253155390046, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Last Time (film)", - "cookedLabel": "The Last Time", - "pageID": "19097189", - "editDist": 0.0, - "labelProbability": 0.682171, - "logPopularity": 3.784189633918261, - "score": 0.7467809835967109, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Last Time (Taylor Swift song)", - "cookedLabel": "The Last Time", - "pageID": "37818274", - "editDist": 0.0, - "labelProbability": 0.682171, - "logPopularity": 3.713572066704308, - "score": 0.7386854613410274, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Last Time (album)", - "cookedLabel": "The Last Time", - "pageID": "24162197", - "editDist": 0.0, - "labelProbability": 0.682171, - "logPopularity": 3.871201010907891, - "score": 0.7565253155390046, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Chicago Bulls", - "cookedLabel": "Chicago Bulls", - "pageID": "72866", - "editDist": 0.0, - "labelProbability": 0.934295, - "logPopularity": 6.981934677156389, - "score": 0.984617412627173, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Football League Championship", - "cookedLabel": "Football League Championship", - "pageID": "715008", - "editDist": 0.0, - "labelProbability": 0.472721, - "logPopularity": 6.082218910376446, - "score": 0.381212611810872, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Championship", - "cookedLabel": "Championship", - "pageID": "3884434", - 
"editDist": 0.0, - "labelProbability": 0.0986606, - "logPopularity": 1.791759469228055, - "score": 0.022051129155561185, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the last time the chicago bulls won a championship", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "last time the chicago bulls won a championship", "type": "CluePhrase", "weight": 0.99 }, - { "label": "last time", "type": "ClueNE", "weight": 2.6 } - ] - }, - { - "qId": "wqr001240", - "qText": "where is mallorca?", - "SV": [], - "lemmaSV": [], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Majorca", - "cookedLabel": "Majorca", - "pageID": "59310", - "editDist": 0.0, - "labelProbability": 0.813659, - "logPopularity": 6.210600077024653, - "score": 0.9430392421046333, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "RCD Mallorca", - "cookedLabel": "RCD Mallorca", - "pageID": "322630", - "editDist": 0.0, - "labelProbability": 0.143122, - "logPopularity": 7.300472814267799, - "score": 0.35265837803377453, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "mallorca", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr001260", - "qText": "what city was robert kennedy killed in?", - "SV": ["killed"], - "lemmaSV": ["kill"], - "LAT": [ - { "synset": "27365", "text": "location", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8693705", "text": "urban area", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8591861", "text": "geographical area", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8508836", "text": "administrative district", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7991473", "text": "gathering", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8569713", "text": "district", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8242502", "text": "municipality", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8643858", "text": "municipality", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8648560", "text": "region", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-4.0", "type": "WordnetLAT" }, - { "text": "city", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Robert F. Kennedy", - "cookedLabel": "Robert F. 
Kennedy", - "pageID": "21131695", - "editDist": 0.0, - "labelProbability": 0.967946, - "logPopularity": 5.5254529391317835, - "score": 0.9689261200594848, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "robert kennedy", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr001280", - "qText": "what is the closest airport to naples florida?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "22119", "text": "artifact", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "3319968", "text": "facility", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "2690851", "text": "airfield", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "airport", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Naples, Florida", - "cookedLabel": "Naples, Florida", - "pageID": "109132", - "editDist": 0.0, - "labelProbability": 0.888889, - "logPopularity": 6.249975242259483, - "score": 0.9251916699782889, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the closest airport to naples florida", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "closest airport to naples florida", "type": "CluePhrase", "weight": 0.99 }, - { "label": "closest airport", "type": "CluePhrase", "weight": 0.99 }, - { "label": "closest", "type": "ClueToken", "weight": 1.0 }, - { "label": "airport", "type": "ClueSubjectToken", "weight": 2.5 }, - { "label": "naples florida?", "type": "ClueNE", "weight": 1.11 } - ] - }, - { - "qId": "wqr001300", - "qText": "what language do they speak in argentina yahoo?", - "SV": ["speak"], - "lemmaSV": ["speak"], - "LAT": [ - { "synset": "33319", "text": "communication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5658174", "text": "faculty", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6297048", "text": "word", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6294878", "text": "language unit", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "32220", "text": "relation", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7123727", "text": "auditory communication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5624029", "text": "ability", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6399623", "text": "text", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13831419", "text": "part", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5709328", "text": "process", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5778661", "text": "higher cognitive process", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6376912", "text": "matter", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "language", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "They", - "cookedLabel": "They", - "pageID": "962806", - "editDist": 0.0, - "labelProbability": 0.154574, - "logPopularity": 2.0794415416798357, - "score": 0.02418695546554977, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "They (song)", - "cookedLabel": "They", - "pageID": "6129284", - "editDist": 0.0, - 
"labelProbability": 0.055205, - "logPopularity": 3.871201010907891, - "score": 0.016872077760925214, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "argentina yahoo", "type": "CluePhrase", "weight": 0.99 }, - { "label": "argentina", "type": "ClueToken", "weight": 1.0 }, - { "label": "yahoo", "type": "ClueToken", "weight": 1.0 } - ] - }, - { - "qId": "wqr001320", - "qText": "what type of planes does virgin america fly?", - "SV": ["fly"], - "lemmaSV": ["fly"], - "LAT": [ - { "synset": "33319", "text": "communication", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "7957410", "text": "biological group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "2855782", "text": "block", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5847533", "text": "kind", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6830481", "text": "written symbol", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6819327", "text": "symbol", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5847274", "text": "category", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "5844071", "text": "concept", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5817200", "text": "content", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "6831828", "text": "character", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6804229", "text": "signal", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "9628463", "text": "adult", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8008892", "text": "taxonomic group", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "type", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Airplane", - "cookedLabel": "Airplane", - "pageID": "1396249", - "editDist": 0.0, - "labelProbability": 0.153099, - "logPopularity": 3.828641396489095, - "score": 0.06569766357637967, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Planes, Alicante", - "cookedLabel": "Planes, Alicante", - "pageID": "23503428", - "editDist": 0.0, - "labelProbability": 0.436893, - "logPopularity": 3.258096538021482, - "score": 0.06425363033386783, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Albion (wherry)", - "cookedLabel": "Albion", - "pageID": "11814367", - "editDist": 0.0, - "labelProbability": 0.153099, - "logPopularity": 4.07753744390572, - "score": 0.02956050625112393, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Plane (esotericism)", - "cookedLabel": "Plane", - "pageID": "1037059", - "editDist": 0.0, - "labelProbability": 0.153099, - "logPopularity": 3.970291913552122, - "score": 0.027769463513634657, - "getByLAT": 0, - "getByNE": 
0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Planes (film)", - "cookedLabel": "Planes", - "pageID": "33619581", - "editDist": 0.0, - "labelProbability": 0.153099, - "logPopularity": 5.056245805348308, - "score": 0.051951474794694204, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Virgin America", - "cookedLabel": "Virgin America", - "pageID": "780894", - "editDist": 0.0, - "labelProbability": 0.993224, - "logPopularity": 4.356708826689592, - "score": 0.9455651124719849, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "planes", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr001340", - "qText": "what all does google have?", - "SV": [], - "lemmaSV": [], - "LAT": [{ "text": "all", "specificity": "0.0", "type": "LAT" }], - "Concept": [ - { - "fullLabel": "Google", - "cookedLabel": "Google", - "pageID": "1092923", - "editDist": 0.0, - "labelProbability": 0.776912, - "logPopularity": 6.598509028614515, - "score": 0.9463727325949041, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Google Search", - "cookedLabel": "Google Search", - "pageID": "12431", - "editDist": 0.0, - "labelProbability": 0.126869, - "logPopularity": 5.572154032177765, - "score": 0.15199311696008388, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "google", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr001360", - "qText": "what to see outside of paris?", - "SV": ["see"], - "lemmaSV": ["see"], - "LAT": [], - "Concept": [ - { - "fullLabel": "Outside of This", - "cookedLabel": "Outside of This", - "pageID": "19770019", - "editDist": 3.0, - "labelProbability": 0.0, - "logPopularity": 4.23410650459726, - "score": 0.09909197854553327, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Outside Music", - "cookedLabel": "Outside Music", - "pageID": "7496852", - "editDist": 0.0, - "labelProbability": 0.294876, - "logPopularity": 4.875197323201151, - "score": 0.20179120300774417, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Outside (David Bowie album)", - "cookedLabel": "Outside", - "pageID": "1142018", - "editDist": 0.0, - "labelProbability": 0.294876, - "logPopularity": 4.574710978503383, - "score": 0.17430408133287517, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Outside (O'Death album)", - "cookedLabel": "Outside", - "pageID": "35029570", - "editDist": 0.0, - "labelProbability": 0.294876, - "logPopularity": 4.248495242049359, - "score": 0.14790207526270802, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Outside (Amar album)", - "cookedLabel": "Outside", - "pageID": "26038888", - "editDist": 0.0, - "labelProbability": 0.294876, - "logPopularity": 4.04305126783455, - "score": 0.13303167563009013, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - 
"getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Outside (magazine)", - "cookedLabel": "Outside", - "pageID": "15602770", - "editDist": 0.0, - "labelProbability": 0.336771, - "logPopularity": 3.9318256327243257, - "score": 0.14822652123311217, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Paris", - "cookedLabel": "Paris", - "pageID": "22989", - "editDist": 0.0, - "labelProbability": 0.817026, - "logPopularity": 9.647045715820404, - "score": 0.9799246369048512, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "outside", "type": "ClueNE", "weight": 1.09 }] - }, - { - "qId": "wqr001380", - "qText": "what is julia gillard famous for?", - "SV": [], - "lemmaSV": [], - "LAT": [{ "text": "gillard", "specificity": "0.0", "type": "LAT" }], - "Concept": [ - { - "fullLabel": "Julia Gillard", - "cookedLabel": "Julia Gillard", - "pageID": "519437", - "editDist": 0.0, - "labelProbability": 0.996834, - "logPopularity": 5.771441123130016, - "score": 0.908025046664713, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Sham Shui Po District", - "cookedLabel": "Sham Shui Po District", - "pageID": "2638679", - "editDist": 0.0, - "labelProbability": 0.5, - "logPopularity": 4.6913478822291435, - "score": 0.1782132779367818, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Rampura Phul", - "cookedLabel": "Rampura Phul", - "pageID": "5807631", - "editDist": 0.0, - "labelProbability": 0.5, - "logPopularity": 4.330733340286331, - "score": 0.14869608916356794, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "famous for", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr001400", - "qText": "who played jacob black?", - "SV": ["played"], - "lemmaSV": ["play"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Jacob Black", - "cookedLabel": "Jacob Black", - "pageID": "10467799", - "editDist": 0.0, - "labelProbability": 0.988721, - "logPopularity": 4.127134385045092, - "score": 0.8455697743714826, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr001420", - "qText": "what currency does thailand use?", - "SV": ["use"], - "lemmaSV": ["use"], - "LAT": [ - { "synset": "4923519", "text": "property", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "13394134", "text": "medium of exchange", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7275291", "text": "standard", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "33914", "text": "measure", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "4772610", "text": "prevalence", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5051824", "text": "temporal arrangement", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5051679", "text": "temporal property", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": 
"4771667", "text": "generality", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "13598374", "text": "system of measurement", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "4731092", "text": "quality", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5057266", "text": "presentness", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5053160", "text": "timing", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "currency", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Thailand", - "cookedLabel": "Thailand", - "pageID": "30128", - "editDist": 0.0, - "labelProbability": 0.705665, - "logPopularity": 9.175541866433488, - "score": 0.9835247469770341, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Thailand national football team", - "cookedLabel": "Thailand national football team", - "pageID": "1110063", - "editDist": 0.0, - "labelProbability": 0.0655873, - "logPopularity": 6.97914527506881, - "score": 0.23929371727969667, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "thailand", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr001440", - "qText": "who plays mary jane in spiderman 3?", - "SV": ["plays"], - "lemmaSV": ["play"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Mary Jane Watson", - "cookedLabel": "Mary Jane Watson", - "pageID": "281687", - "editDist": 0.0, - "labelProbability": 0.492584, - "logPopularity": 4.219507705176107, - "score": 0.2973856389120331, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Cannabis (drug)", - "cookedLabel": "Cannabis", - "pageID": "1481886", - "editDist": 0.0, - "labelProbability": 0.147541, - "logPopularity": 4.852030263919617, - "score": 0.11241948044520658, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Mary Jane Croft", - "cookedLabel": "Mary Jane Croft", - "pageID": "1874734", - "editDist": 0.0, - "labelProbability": 0.147541, - "logPopularity": 4.31748811353631, - "score": 0.08417083075008917, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Untouchable", - "cookedLabel": "The Untouchable", - "pageID": "5041167", - "editDist": 0.0, - "labelProbability": 0.147541, - "logPopularity": 4.31748811353631, - "score": 0.08417083075008917, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "So Far, So Good... So What!", - "cookedLabel": "So Far, So Good... 
So What!", - "pageID": "60300", - "editDist": 0.0, - "labelProbability": 0.147541, - "logPopularity": 4.812184355372417, - "score": 0.11005600959212283, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Spider-Man 3", - "cookedLabel": "Spider-Man 3", - "pageID": "702117", - "editDist": 0.0, - "labelProbability": 0.928643, - "logPopularity": 4.48863636973214, - "score": 0.8376732727814729, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "spiderman 3", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr001460", - "qText": "who is the voice of lois from family guy?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "9794206", "text": "advocate", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "33319", "text": "communication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "4923519", "text": "property", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7041860", "text": "tune", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10435383", "text": "performer", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "9639952", "text": "entertainer", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "30657", "text": "act", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "13819354", "text": "linguistic relation", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7034009", "text": "music", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "44888", "text": "implementation", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "4731092", "text": "quality", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "29677", "text": "event", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "4990371", "text": "sound property", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7123727", "text": "auditory communication", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "7298313", "text": "happening", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "32220", "text": "relation", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "13818991", "text": "grammatical relation", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "173531", "text": "means", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5208927", "text": "physical ability", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10619214", "text": "singer", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6262268", "text": "communication", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "10360025", "text": "musician", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "4988388", "text": "sound", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7385893", "text": "sound", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7154581", "text": "expression", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5207437", "text": "ability", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "voice", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Whois", - "cookedLabel": 
"Whois", - "pageID": "4315433", - "editDist": 0.0, - "labelProbability": 0.0673077, - "logPopularity": 3.1780538303479458, - "score": 0.031085894047283118, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Lois Griffin", - "cookedLabel": "Lois Griffin", - "pageID": "913759", - "editDist": 0.0, - "labelProbability": 0.565673, - "logPopularity": 4.204692619390966, - "score": 0.36988417502777904, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Lois Capps", - "cookedLabel": "Lois Capps", - "pageID": "408884", - "editDist": 0.0, - "labelProbability": 0.111604, - "logPopularity": 5.225746673713202, - "score": 0.11844687769507006, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Lois McMaster Bujold", - "cookedLabel": "Lois McMaster Bujold", - "pageID": "18733", - "editDist": 0.0, - "labelProbability": 0.111604, - "logPopularity": 4.77912349311153, - "score": 0.0931986907591393, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Lois Hole", - "cookedLabel": "Lois Hole", - "pageID": "938524", - "editDist": 0.0, - "labelProbability": 0.111604, - "logPopularity": 4.727387818712341, - "score": 0.09060826457542856, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Lois Lowry", - "cookedLabel": "Lois Lowry", - "pageID": "199942", - "editDist": 0.0, - "labelProbability": 0.111604, - "logPopularity": 4.9344739331306915, - "score": 0.10138029943961649, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Family Guy", - "cookedLabel": "Family Guy", - "pageID": "187586", - "editDist": 0.0, - "labelProbability": 0.964212, - "logPopularity": 6.248042874508429, - "score": 0.9458401875550986, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the voice of lois from family guy", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "voice of lois from family guy", "type": "CluePhrase", "weight": 0.99 }, - { "label": "voice", "type": "ClueSubjectToken", "weight": 2.5 }, - { "label": "lois", "type": "ClueNE", "weight": 1.1 } - ] - }, - { - "qId": "wqr001480", - "qText": "what tourist attractions are in houston texas?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "7303344", "text": "social event", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6631572", "text": "show", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5858316", "text": "feature", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5817200", "text": "content", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "34512", "text": "phenomenon", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "5844071", "text": "concept", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "29677", "text": "event", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-4.0", "type": "WordnetLAT" 
}, - { "synset": "24444", "text": "attribute", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "11479041", "text": "force", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "4731092", "text": "quality", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "9639952", "text": "entertainer", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "11428673", "text": "natural phenomenon", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "11439518", "text": "physical phenomenon", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5857567", "text": "property", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "attractions", "specificity": "0.0", "type": "LAT" }, - { "text": "attraction", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "Tourist attraction", - "cookedLabel": "Tourist attraction", - "pageID": "99863", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.5263605246161616, - "score": 0.006723718697320666, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Houston", - "cookedLabel": "Houston", - "pageID": "13774", - "editDist": 0.0, - "labelProbability": 0.986784, - "logPopularity": 8.83127373772255, - "score": 0.9849144990612584, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "houston texas", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr001500", - "qText": "what does annie leibovitz do?", - "SV": ["do"], - "lemmaSV": ["do"], - "LAT": [{ "text": "leibovitz", "specificity": "0.0", "type": "LAT" }], - "Concept": [ - { - "fullLabel": "Annie Leibovitz", - "cookedLabel": "Annie Leibovitz", - "pageID": "18943868", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 4.564348191467836, - "score": 0.8292138234969874, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr001520", - "qText": "what is the zip code for trenton ohio?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "33319", "text": "communication", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6371284", "text": "writing", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6365164", "text": "coding system", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6360590", "text": "written communication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "code", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "ZIP code", - "cookedLabel": "ZIP code", - "pageID": "51550", - "editDist": 0.0, - "labelProbability": 0.965969, - "logPopularity": 10.142898076314209, - "score": 0.9915968566201331, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Genetics", - "cookedLabel": "Genetics", - "pageID": "12266", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 5.991464547107982, - "score": 0.9266078744173097, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Ironton, Ohio", - "cookedLabel": "Ironton, Ohio", - "pageID": "129509", - "editDist": 
2.0, - "labelProbability": 0.0, - "logPopularity": 5.575949103146316, - "score": 0.1527437025093599, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [ - { "label": "the zip code for trenton ohio", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "zip code for trenton ohio", "type": "CluePhrase", "weight": 0.99 }, - { "label": "code for", "type": "ClueNE", "weight": 1.11 } - ] - }, - { - "qId": "wqr001540", - "qText": "who is kobe bryant wife bio?", - "SV": [], - "lemmaSV": [], - "LAT": [{ "text": "bio", "specificity": "0.0", "type": "LAT" }], - "Concept": [ - { - "fullLabel": "Whois", - "cookedLabel": "Whois", - "pageID": "4315433", - "editDist": 0.0, - "labelProbability": 0.0673077, - "logPopularity": 3.1780538303479458, - "score": 0.031085894047283118, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Kobe Bryant", - "cookedLabel": "Kobe Bryant", - "pageID": "246185", - "editDist": 0.0, - "labelProbability": 0.977755, - "logPopularity": 5.0689042022202315, - "score": 0.8676040761673899, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "who is", "type": "ClueNE", "weight": 1.11 }, - { "label": "kobe bryant wife bio", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "wife", "type": "ClueToken", "weight": 1.0 }, - { "label": "bio", "type": "ClueSubjectToken", "weight": 2.5 } - ] - }, - { - "qId": "wqr001560", - "qText": "who was charles darwin married to?", - "SV": ["married"], - "lemmaSV": ["marry"], - "LAT": [], - "Concept": [ - { - "fullLabel": "Charles Darwin", - "cookedLabel": "Charles Darwin", - "pageID": "8145410", - "editDist": 0.0, - "labelProbability": 0.986471, - "logPopularity": 5.817111159963204, - "score": 0.9758720993822966, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr001580", - "qText": "where did jane austen grow up?", - "SV": ["grow"], - "lemmaSV": ["grow"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Jane Austen", - "cookedLabel": "Jane Austen", - "pageID": "15782", - "editDist": 0.0, - "labelProbability": 0.982964, - "logPopularity": 5.19295685089021, - "score": 0.9647484200772398, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Grow Up (Svoy album)", - "cookedLabel": "Grow Up", - "pageID": "32182839", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.430816798843313, - "score": 0.04643010512375539, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Grow Up, Tony Phillips", - "cookedLabel": "Grow Up, Tony Phillips", - "pageID": "41237889", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.2188758248682006, - "score": 0.022990485979290737, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Grow Up (book)", - "cookedLabel": "Grow Up", - "pageID": "11645304", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.6635616461296463, - "score": 
0.02981111954944062, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Grow Up (The Queers album)", - "cookedLabel": "Grow Up", - "pageID": "8796937", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.9889840465642745, - "score": 0.03600738939439352, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [{ "label": "grow up", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr001600", - "qText": "what time does target in alhambra close?", - "SV": ["close"], - "lemmaSV": ["close"], - "LAT": [ - { "synset": "5097645", "text": "magnitude", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "4923519", "text": "property", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "15137796", "text": "time period", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5824748", "text": "datum", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "15269461", "text": "moment", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "29677", "text": "event", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "4990371", "text": "sound property", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "15205381", "text": "point", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5824916", "text": "reading", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "15249282", "text": "term", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7300108", "text": "experience", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13597072", "text": "fundamental quantity", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "7298313", "text": "happening", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "33914", "text": "measure", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "4998633", "text": "rhythmicity", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7323507", "text": "case", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5824413", "text": "information", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5100843", "text": "dimension", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "time", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Target Corporation", - "cookedLabel": "Target Corporation", - "pageID": "18581242", - "editDist": 0.0, - "labelProbability": 0.688924, - "logPopularity": 5.0369526024136295, - "score": 0.821892941151469, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Target Books", - "cookedLabel": "Target Books", - "pageID": "930311", - "editDist": 0.0, - "labelProbability": 0.11761, - "logPopularity": 5.241747015059643, - "score": 0.27409486180995263, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Target (UK TV series)", - "cookedLabel": "Target", - "pageID": "8735505", - "editDist": 0.0, - "labelProbability": 0.11761, - "logPopularity": 5.043425116919247, - "score": 0.25106580573413756, - "getByLAT": 0, - "getByNE": 
0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Target Field", - "cookedLabel": "Target Field", - "pageID": "4932680", - "editDist": 0.0, - "labelProbability": 0.11761, - "logPopularity": 4.663439094112067, - "score": 0.21066469210674765, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Target Center", - "cookedLabel": "Target Center", - "pageID": "237003", - "editDist": 0.0, - "labelProbability": 0.11761, - "logPopularity": 5.147494476813453, - "score": 0.26298803860427983, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Alhambra", - "cookedLabel": "Alhambra", - "pageID": "30543", - "editDist": 0.0, - "labelProbability": 0.753071, - "logPopularity": 4.624972813284271, - "score": 0.6412571396044241, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Alhambra, California", - "cookedLabel": "Alhambra, California", - "pageID": "107595", - "editDist": 0.0, - "labelProbability": 0.102766, - "logPopularity": 5.54907608489522, - "score": 0.05521313576151007, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "target in alhambra", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "target", "type": "ClueNE", "weight": 2.6 } - ] - }, - { - "qId": "wqr001620", - "qText": "what are the best places to go in germany?", - "SV": ["go"], - "lemmaSV": ["go"], - "LAT": [ - { "synset": "8637195", "text": "public square", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6360590", "text": "written communication", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8648560", "text": "region", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8508037", "text": "address", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "720746", "text": "duty", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6494090", "text": "item", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "576778", "text": "work", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "13831419", "text": "part", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6374360", "text": "writing", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8658688", "text": "vicinity", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5130681", "text": "extent", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "24900", "text": "state", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5135784", "text": "area", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8514304", "text": "area", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8576500", "text": "residence", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6411914", "text": "passage", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "583425", "text": "occupation", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8591861", "text": "geographical area", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8665897", "text": "section", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8637636", "text": "point", "specificity": "-1.0", "type": 
"WordnetLAT" }, - { "synset": "7123727", "text": "auditory communication", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "13968971", "text": "status", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "32220", "text": "relation", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "13943868", "text": "condition", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13948785", "text": "situation", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13970595", "text": "social station", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7034009", "text": "music", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5097645", "text": "magnitude", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "30657", "text": "act", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "721817", "text": "function", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "27365", "text": "location", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8596234", "text": "geographic point", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "408356", "text": "activity", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8670545", "text": "space", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6403644", "text": "section", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "4923519", "text": "property", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8691133", "text": "tract", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "33319", "text": "communication", "specificity": "-5.0", "type": "WordnetLAT" }, - { "text": "places", "specificity": "0.0", "type": "LAT" }, - { "text": "place", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "Tony Award for Best Play", - "cookedLabel": "Tony Award for Best Play", - "pageID": "250941", - "editDist": 3.0, - "labelProbability": 0.0, - "logPopularity": 2.3978952727983707, - "score": 0.023125298297074125, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Best Play ESPY Award", - "cookedLabel": "Best Play ESPY Award", - "pageID": "10975973", - "editDist": 3.0, - "labelProbability": 0.0, - "logPopularity": 2.1972245773362196, - "score": 0.02055601553346779, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Helpmann Award for Best Play", - "cookedLabel": "Helpmann Award for Best Play", - "pageID": "33286338", - "editDist": 3.0, - "labelProbability": 0.0, - "logPopularity": 2.1972245773362196, - "score": 0.02055601553346779, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "The Best (song)", - "cookedLabel": "The Best", - "pageID": "5519593", - "editDist": 0.0, - "labelProbability": 0.292556, - "logPopularity": 4.804021044733257, - "score": 0.19333164156720659, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Best (Despina Vandi album)", - "cookedLabel": "The Best", - "pageID": "15916142", - "editDist": 0.0, - 
"labelProbability": 0.158684, - "logPopularity": 4.976733742420574, - "score": 0.1256234154068918, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Best (Portuguese footballer)", - "cookedLabel": "Best", - "pageID": "18953026", - "editDist": 0.0, - "labelProbability": 0.158684, - "logPopularity": 4.532599493153256, - "score": 0.0991506029418958, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Best, Netherlands", - "cookedLabel": "Best, Netherlands", - "pageID": "118705", - "editDist": 0.0, - "labelProbability": 0.158684, - "logPopularity": 4.624972813284271, - "score": 0.10421212240951007, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Best (Mika Nakashima album)", - "cookedLabel": "Best", - "pageID": "6123928", - "editDist": 0.0, - "labelProbability": 0.158684, - "logPopularity": 4.736198448394496, - "score": 0.11060845654008894, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Take-out", - "cookedLabel": "Take-out", - "pageID": "326234", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.258096538021482, - "score": 0.023525035528552937, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Germany", - "cookedLabel": "Germany", - "pageID": "11867", - "editDist": 0.0, - "labelProbability": 0.731908, - "logPopularity": 11.210644004861829, - "score": 0.9882824313361198, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the best places to go in germany", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "best places to go in germany", "type": "CluePhrase", "weight": 0.99 }, - { "label": "the best", "type": "ClueNE", "weight": 1.11 } - ] - }, - { - "qId": "wqr001640", - "qText": "where did pizarro land?", - "SV": ["land"], - "lemmaSV": ["land"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Francisco Pizarro", - "cookedLabel": "Francisco Pizarro", - "pageID": "55271", - "editDist": 0.0, - "labelProbability": 0.627421, - "logPopularity": 4.8283137373023015, - "score": 0.7542382553020757, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "David Pizarro", - "cookedLabel": "David Pizarro", - "pageID": "2372903", - "editDist": 0.0, - "labelProbability": 0.0810398, - "logPopularity": 4.948759890378168, - "score": 0.09081393176116873, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Claudio Pizarro", - "cookedLabel": "Claudio Pizarro", - "pageID": "1149234", - "editDist": 0.0, - "labelProbability": 0.116972, - "logPopularity": 5.1298987149230735, - "score": 0.1160998607878938, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "pizarro", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr001660", - "qText": "what is there to do in 
mt baldy california?", - "SV": ["do"], - "lemmaSV": ["do"], - "LAT": [], - "Concept": [ - { - "fullLabel": "Time management", - "cookedLabel": "Time management", - "pageID": "31092", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 3.5553480614894135, - "score": 0.7453702846431538, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Mount Baldy (Arizona)", - "cookedLabel": "Mount Baldy", - "pageID": "11869346", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.897839799950911, - "score": 0.060536868671169725, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Mount Baldy, California", - "cookedLabel": "Mount Baldy, California", - "pageID": "24855182", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.795790545596741, - "score": 0.0571469495266546, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Mount Baldy Ski Area", - "cookedLabel": "Mount Baldy Ski Area", - "pageID": "10878447", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.0943445622221, - "score": 0.03826717146117365, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Mount Baldy (Nevada)", - "cookedLabel": "Mount Baldy", - "pageID": "29651951", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.828641396489095, - "score": 0.03281312068885659, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Mount Baldy (Alberta)", - "cookedLabel": "Mount Baldy", - "pageID": "1474567", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.9318256327243257, - "score": 0.03483575892136317, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [ - { "label": "to do", "type": "ClueNE", "weight": 1.11 }, - { "label": "mt baldy california", "type": "CluePhrase", "weight": 0.99 }, - { "label": "california", "type": "ClueToken", "weight": 1.0 } - ] - }, - { - "qId": "wqr001680", - "qText": "who is the ravens quarterback 2012?", - "SV": [], - "lemmaSV": [], - "LAT": [{ "synset": "15184543", "text": "date", "specificity": "-2.0", "type": "LAT" }], - "Concept": [ - { - "fullLabel": "Whois", - "cookedLabel": "Whois", - "pageID": "4315433", - "editDist": 0.0, - "labelProbability": 0.0673077, - "logPopularity": 3.1780538303479458, - "score": 0.031085894047283118, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Ravens", - "cookedLabel": "The Ravens", - "pageID": "1081322", - "editDist": 0.0, - "labelProbability": 0.81746, - "logPopularity": 4.174387269895637, - "score": 0.6471288237888566, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "2012", - "cookedLabel": "2012", - "pageID": "47374", - "editDist": 0.0, - "labelProbability": 0.370218, - "logPopularity": 2.833213344056216, - "score": 0.22133110536291262, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - 
"fullLabel": "2012 phenomenon", - "cookedLabel": "2012 phenomenon", - "pageID": "21538638", - "editDist": 0.0, - "labelProbability": 0.062167, - "logPopularity": 4.219507705176107, - "score": 0.055828191938404, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "2012 (film)", - "cookedLabel": "2012", - "pageID": "18436536", - "editDist": 0.0, - "labelProbability": 0.128225, - "logPopularity": 4.736198448394496, - "score": 0.09846614463200648, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the ravens quarterback 2012", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "ravens quarterback 2012", "type": "CluePhrase", "weight": 0.99 }, - { "label": "quarterback", "type": "ClueToken", "weight": 1.0 }, - { "label": "2012", "type": "ClueNE", "weight": 2.3000000000000003 } - ] - }, - { - "qId": "wqr001700", - "qText": "who plays stella in coronation street?", - "SV": ["plays"], - "lemmaSV": ["play"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Estelle (musician)", - "cookedLabel": "Estelle", - "pageID": "3854523", - "editDist": 0.0, - "labelProbability": 0.910677, - "logPopularity": 5.459585514144159, - "score": 0.6942599195528122, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "List of Seinfeld minor characters", - "cookedLabel": "stella", - "pageID": "571523", - "editDist": 0.0, - "labelProbability": 0.910677, - "logPopularity": 4.477336814478207, - "score": 0.5574384809980283, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Princess Estelle, Duchess of Östergötland", - "cookedLabel": "Princess Estelle, Duchess of Östergötland", - "pageID": "34853027", - "editDist": 0.0, - "labelProbability": 0.910677, - "logPopularity": 4.406719247264253, - "score": 0.5469618964089914, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Estelle, Louisiana", - "cookedLabel": "Estelle, Louisiana", - "pageID": "115613", - "editDist": 0.0, - "labelProbability": 0.910677, - "logPopularity": 4.30406509320417, - "score": 0.5316604686865143, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Don Estelle", - "cookedLabel": "Don Estelle", - "pageID": "287068", - "editDist": 0.0, - "labelProbability": 0.910677, - "logPopularity": 4.418840607796598, - "score": 0.5487634046109606, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Coronation Street", - "cookedLabel": "Coronation Street", - "pageID": "6851", - "editDist": 0.0, - "labelProbability": 0.995472, - "logPopularity": 6.066108090103747, - "score": 0.9475819375105158, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "stella", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr001720", - "qText": "what did cam newton do?", - "SV": ["do"], - "lemmaSV": ["do"], - "LAT": [ - { "synset": "10320928", "text": 
"mathematician", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13604927", "text": "unit of measurement", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "33914", "text": "measure", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "13597304", "text": "definite quantity", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "10447768", "text": "physicist", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "10580065", "text": "scientist", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "13624308", "text": "force unit", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "newton", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Cam Newton", - "cookedLabel": "Cam Newton", - "pageID": "9521131", - "editDist": 0.0, - "labelProbability": 0.650866, - "logPopularity": 5.0689042022202315, - "score": 0.5690556655271514, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Cam Newton (ice hockey)", - "cookedLabel": "Cam Newton", - "pageID": "22349459", - "editDist": 0.0, - "labelProbability": 0.262997, - "logPopularity": 4.30406509320417, - "score": 0.04975335974685033, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr001740", - "qText": "what did neil say on the moon?", - "SV": ["say"], - "lemmaSV": ["say"], - "LAT": [{ "text": "neil", "specificity": "0.0", "type": "LAT" }], - "Concept": [ - { - "fullLabel": "Neil Armstrong", - "cookedLabel": "Neil Armstrong", - "pageID": "21247", - "editDist": 0.0, - "labelProbability": 0.110849, - "logPopularity": 5.111987788356544, - "score": 0.029334343126575808, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Neil Bush", - "cookedLabel": "Neil Bush", - "pageID": "243759", - "editDist": 0.0, - "labelProbability": 0.0687893, - "logPopularity": 4.330733340286331, - "score": 0.015348128311558275, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Neil", - "cookedLabel": "Neil", - "pageID": "1338760", - "editDist": 0.0, - "labelProbability": 0.112421, - "logPopularity": 1.6094379124341003, - "score": 0.009877386550343368, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "On the Moon", - "cookedLabel": "On the Moon", - "pageID": "7219118", - "editDist": 0.0, - "labelProbability": 0.534162, - "logPopularity": 3.784189633918261, - "score": 0.3555170376195336, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "neil", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr001760", - "qText": "what were marco polo's goals?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "38116", "text": "action", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "5817200", "text": "content", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8586507", "text": "extremity", "specificity": "-2.0", 
"type": "WordnetLAT" }, - { "synset": "3419072", "text": "game equipment", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "64472", "text": "success", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "3580409", "text": "instrumentality", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "30657", "text": "act", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "35910", "text": "accomplishment", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "63626", "text": "attainment", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "3298959", "text": "equipment", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "27365", "text": "location", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8583557", "text": "end", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8647614", "text": "region", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "187483", "text": "score", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "goals", "specificity": "0.0", "type": "LAT" }, - { "text": "goal", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "Marco Polo", - "cookedLabel": "Marco Polo", - "pageID": "19334", - "editDist": 0.0, - "labelProbability": 0.880007, - "logPopularity": 4.624972813284271, - "score": 0.8174643786380883, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "marco polo's goals", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "goals", "type": "ClueSubjectToken", "weight": 2.5 } - ] - }, - { - "qId": "wqr001780", - "qText": "what language brazil use?", - "SV": ["brazil"], - "lemmaSV": ["brazil"], - "LAT": [ - { "synset": "33319", "text": "communication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5658174", "text": "faculty", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6297048", "text": "word", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6294878", "text": "language unit", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "32220", "text": "relation", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7123727", "text": "auditory communication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5624029", "text": "ability", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6399623", "text": "text", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13831419", "text": "part", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5709328", "text": "process", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5778661", "text": "higher cognitive process", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6376912", "text": "matter", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "language", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Glossary of vexillology", - "cookedLabel": "Glossary of vexillology", - "pageID": "113612", - "editDist": 0.0, - "labelProbability": 0.203481, - "logPopularity": 3.4011973816621555, - "score": 0.02495217404278388, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - 
"fullLabel": "Use case", - "cookedLabel": "Use case", - "pageID": "300006", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.2188758248682006, - "score": 0.022990485979290737, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Bisphenol A", - "cookedLabel": "Bisphenol A", - "pageID": "1001430", - "editDist": 0.0, - "labelProbability": 0.0586792, - "logPopularity": 3.8501476017100584, - "score": 0.016927525045779115, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "User story", - "cookedLabel": "User story", - "pageID": "2656549", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 2.70805020110221, - "score": 0.01702491912837177, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Use–mention distinction", - "cookedLabel": "Use–mention distinction", - "pageID": "172990", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.258096538021482, - "score": 0.023525035528552937, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [{ "label": "use", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr001800", - "qText": "when will muharram start 2011?", - "SV": ["start"], - "lemmaSV": ["start"], - "LAT": [ - { "synset": "15147173", "text": "time", "specificity": "0.0", "type": "QuestionWordLAT" }, - { "synset": "15184543", "text": "date", "specificity": "0.0", "type": "QuestionWordLAT" } - ], - "Concept": [ - { - "fullLabel": "Muharram", - "cookedLabel": "Muharram", - "pageID": "444123", - "editDist": 0.0, - "labelProbability": 0.903818, - "logPopularity": 2.5649493574615367, - "score": 0.7376540387559763, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Day of Ashura", - "cookedLabel": "Day of Ashura", - "pageID": "488563", - "editDist": 0.0, - "labelProbability": 0.065387, - "logPopularity": 4.204692619390966, - "score": 0.056140632929766454, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "2011", - "cookedLabel": "2011", - "pageID": "36225", - "editDist": 0.0, - "labelProbability": 0.353935, - "logPopularity": 2.6390573296152584, - "score": 0.07978062557656532, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "muharram", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr001820", - "qText": "what currency do mexico use?", - "SV": ["use"], - "lemmaSV": ["use"], - "LAT": [ - { "synset": "4923519", "text": "property", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "13394134", "text": "medium of exchange", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7275291", "text": "standard", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "33914", "text": "measure", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "4772610", "text": "prevalence", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5051824", "text": "temporal arrangement", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5051679", "text": "temporal property", 
"specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "4771667", "text": "generality", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "13598374", "text": "system of measurement", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "4731092", "text": "quality", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5057266", "text": "presentness", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5053160", "text": "timing", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "currency", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Mexico", - "cookedLabel": "Mexico", - "pageID": "3966054", - "editDist": 0.0, - "labelProbability": 0.705114, - "logPopularity": 10.004282662571022, - "score": 0.9898891719082676, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr001840", - "qText": "what books did lincoln write?", - "SV": ["write"], - "lemmaSV": ["write"], - "LAT": [ - { "synset": "4606723", "text": "work", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6360590", "text": "written communication", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7034009", "text": "music", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7968050", "text": "collection", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7020800", "text": "dramatic composition", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6646883", "text": "information", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "13424816", "text": "record", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6648784", "text": "fact", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7123727", "text": "auditory communication", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "13424504", "text": "document", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6611268", "text": "message", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "6374360", "text": "writing", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "4014270", "text": "product", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6601855", "text": "publication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6441260", "text": "sacred text", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6403644", "text": "section", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "3133774", "text": "creation", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "33319", "text": "communication", "specificity": "-3.0", "type": "WordnetLAT" }, - { "text": "books", "specificity": "0.0", "type": "LAT" }, - { "text": "book", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "Lincolnshire", - "cookedLabel": "Lincolnshire", - "pageID": "53295", - "editDist": 0.0, - "labelProbability": 0.117749, - "logPopularity": 8.146709052203319, - "score": 0.6834428177880589, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Lincoln, 
Nebraska", - "cookedLabel": "Lincoln, Nebraska", - "pageID": "17653", - "editDist": 0.0, - "labelProbability": 0.117749, - "logPopularity": 7.400009517162692, - "score": 0.5797208566933856, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Abraham Lincoln", - "cookedLabel": "Abraham Lincoln", - "pageID": "307", - "editDist": 0.0, - "labelProbability": 0.117749, - "logPopularity": 6.089044875446846, - "score": 0.38581533885682096, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Lincoln, England", - "cookedLabel": "Lincoln, England", - "pageID": "17880", - "editDist": 0.0, - "labelProbability": 0.117749, - "logPopularity": 6.639875833826536, - "score": 0.46643947108922923, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Lincoln City F.C.", - "cookedLabel": "Lincoln City F.C.", - "pageID": "451163", - "editDist": 0.0, - "labelProbability": 0.117749, - "logPopularity": 8.052296499538647, - "score": 0.6710620781440747, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "lincoln", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr001860", - "qText": "where does kirk cameron live now?", - "SV": ["live"], - "lemmaSV": ["live"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Kirk Cameron", - "cookedLabel": "Kirk Cameron", - "pageID": "20398264", - "editDist": 0.0, - "labelProbability": 0.979936, - "logPopularity": 4.812184355372417, - "score": 0.95550994552952, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Rolling Stones, Now!", - "cookedLabel": "The Rolling Stones, Now!", - "pageID": "1365981", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.543294782270004, - "score": 0.13527332344325466, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Now TV", - "cookedLabel": "Now TV", - "pageID": "2857006", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.430816798843313, - "score": 0.1275717260871764, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Now! 
(Bobby Hutcherson album)", - "cookedLabel": "Now!", - "pageID": "2580911", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.248495242049359, - "score": 0.1158846124007123, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Now (Fireflight album)", - "cookedLabel": "Now", - "pageID": "34423465", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.343805421853684, - "score": 0.1218734455168399, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Now (newspaper)", - "cookedLabel": "Now", - "pageID": "1058750", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.532599493153256, - "score": 0.1345244471522166, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "now", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr001880", - "qText": "what disease did abe lincoln have?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "14085287", "text": "illness", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "14057659", "text": "physical condition", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "14075399", "text": "pathological state", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "24900", "text": "state", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "14075528", "text": "ill health", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "13943868", "text": "condition", "specificity": "-5.0", "type": "WordnetLAT" }, - { "text": "disease", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Abraham Lincoln", - "cookedLabel": "Abraham Lincoln", - "pageID": "307", - "editDist": 0.0, - "labelProbability": 0.869471, - "logPopularity": 6.089044875446846, - "score": 0.9652876094180043, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "abe lincoln", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr001900", - "qText": "what going on in afghanistan right now?", - "SV": ["going"], - "lemmaSV": ["go"], - "LAT": [], - "Concept": [ - { - "fullLabel": "Going On", - "cookedLabel": "Going On", - "pageID": "18511618", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.9512437185814275, - "score": 0.03522960933689275, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Afghanistan", - "cookedLabel": "Afghanistan", - "pageID": "737", - "editDist": 0.0, - "labelProbability": 0.746978, - "logPopularity": 8.384347278082808, - "score": 0.9431281412932907, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Right Now (Atomic Kitten album)", - "cookedLabel": "Right Now", - "pageID": "3409298", - "editDist": 0.0, - "labelProbability": 0.14878, - "logPopularity": 5.10594547390058, - "score": 0.12918088923564972, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Right Now (Leon Jackson album)", - "cookedLabel": "Right Now", - "pageID": "19354950", - "editDist": 0.0, - 
"labelProbability": 0.0, - "logPopularity": 4.812184355372417, - "score": 0.057679244816714784, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Right Now (Rihanna song)", - "cookedLabel": "Right Now", - "pageID": "37535311", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.543294782270004, - "score": 0.04951112306188091, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Right Now (Herbie Mann song)", - "cookedLabel": "Right Now", - "pageID": "9209791", - "editDist": 0.0, - "labelProbability": 0.160366, - "logPopularity": 4.174387269895637, - "score": 0.08211961058034786, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Right Now (Korn song)", - "cookedLabel": "Right Now", - "pageID": "6012234", - "editDist": 0.0, - "labelProbability": 0.290854, - "logPopularity": 3.9889840465642745, - "score": 0.1272670428128275, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr001920", - "qText": "where is chowchilla located?", - "SV": ["located"], - "lemmaSV": ["locate"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Chowchilla, California", - "cookedLabel": "Chowchilla, California", - "pageID": "107709", - "editDist": 0.0, - "labelProbability": 0.576923, - "logPopularity": 4.77912349311153, - "score": 0.46849432041420475, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Chowchilla River", - "cookedLabel": "Chowchilla River", - "pageID": "5017862", - "editDist": 0.0, - "labelProbability": 0.0903399, - "logPopularity": 4.61512051684126, - "score": 0.07862560117978776, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Chowchilla Airport", - "cookedLabel": "Chowchilla Airport", - "pageID": "7323766", - "editDist": 0.0, - "labelProbability": 0.0903399, - "logPopularity": 4.1588830833596715, - "score": 0.06094475304922138, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Chowchilla", - "cookedLabel": "Chowchilla", - "pageID": "12690456", - "editDist": 0.0, - "labelProbability": 0.330054, - "logPopularity": 3.7376696182833684, - "score": 0.28907595287084764, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "chowchilla", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr001940", - "qText": "where does brian williams live?", - "SV": ["live"], - "lemmaSV": ["live"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Brian Williams", - "cookedLabel": "Brian Williams", - "pageID": "8269855", - "editDist": 0.0, - "labelProbability": 0.742176, - "logPopularity": 4.882801922586371, - "score": 0.88251710543483, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - 
"fullLabel": "Brian Williams (sportscaster)", - "cookedLabel": "Brian Williams", - "pageID": "1231896", - "editDist": 0.0, - "labelProbability": 0.0674049, - "logPopularity": 4.406719247264253, - "score": 0.08653846245236564, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr001960", - "qText": "what was the cause of death for sage stallone?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "33319", "text": "communication", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "1183965", "text": "due process", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "408356", "text": "activity", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "576778", "text": "work", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "6735202", "text": "statement", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "1930", "text": "physical entity", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "797381", "text": "undertaking", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6611268", "text": "message", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "30657", "text": "act", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "7338522", "text": "origin", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "1082290", "text": "group action", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7298313", "text": "happening", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "29677", "text": "event", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "1187304", "text": "proceeding", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6751030", "text": "explanation", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7305628", "text": "beginning", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6752932", "text": "justification", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "799539", "text": "venture", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "cause", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "The Cause", - "cookedLabel": "The Cause", - "pageID": "1555536", - "editDist": 0.0, - "labelProbability": 0.759615, - "logPopularity": 2.302585092994046, - "score": 0.31378069925397956, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Cause of death", - "cookedLabel": "Cause of death", - "pageID": "791114", - "editDist": 0.0, - "labelProbability": 0.363077, - "logPopularity": 1.791759469228055, - "score": 0.03402476362264536, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Sage Stallone", - "cookedLabel": "Sage Stallone", - "pageID": "3253292", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 4.6443908991413725, - "score": 0.8490903710890593, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the cause of death for sage stallone", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "cause of death for sage stallone", "type": "CluePhrase", "weight": 0.99 }, - { "label": "death for sage stallone", "type": "CluePhrase", "weight": 0.99 } - ] - }, - { - "qId": 
"wqr001980", - "qText": "what has been discovered on mars so far?", - "SV": ["discovered"], - "lemmaSV": ["discover"], - "LAT": [], - "Concept": [ - { - "fullLabel": "Has Been", - "cookedLabel": "Has Been", - "pageID": "1105249", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 3.828641396489095, - "score": 0.77522331235702, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Mars", - "cookedLabel": "Mars", - "pageID": "14640471", - "editDist": 0.0, - "labelProbability": 0.632056, - "logPopularity": 5.84354441703136, - "score": 0.680421338259525, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Mars (mythology)", - "cookedLabel": "Mars", - "pageID": "19638032", - "editDist": 0.0, - "labelProbability": 0.12615, - "logPopularity": 3.9512437185814275, - "score": 0.024340886274224788, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "So Far (album)", - "cookedLabel": "So Far", - "pageID": "4320167", - "editDist": 0.0, - "labelProbability": 0.494993, - "logPopularity": 4.574710978503383, - "score": 0.34624469132521296, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "So Far... The Best of Sinéad O'Connor", - "cookedLabel": "So Far... The Best of Sinéad O'Connor", - "pageID": "5870699", - "editDist": 0.0, - "labelProbability": 0.101574, - "logPopularity": 3.9318256327243257, - "score": 0.05574218696743388, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Grateful Dead: So Far", - "cookedLabel": "Grateful Dead: So Far", - "pageID": "11580269", - "editDist": 0.0, - "labelProbability": 0.260372, - "logPopularity": 3.4011973816621555, - "score": 0.08180149482942516, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Faust So Far", - "cookedLabel": "Faust So Far", - "pageID": "3308190", - "editDist": 0.0, - "labelProbability": 0.101574, - "logPopularity": 3.9512437185814275, - "score": 0.05635860176312338, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "So Far (Faust song)", - "cookedLabel": "So Far", - "pageID": "26348839", - "editDist": 0.0, - "labelProbability": 0.101574, - "logPopularity": 3.9512437185814275, - "score": 0.05635860176312338, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "so far", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr002000", - "qText": "who is president of france?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "587299", "text": "position", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "10182584", "text": "head", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "10488931", "text": "presiding officer", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10184340", "text": "head of state", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "9633690", "text": "communicator", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": 
"10541628", "text": "representative", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "597922", "text": "presidency", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "30657", "text": "act", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "10371605", "text": "negotiator", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "9646208", "text": "leader", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9778216", "text": "academic administrator", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "583425", "text": "occupation", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "408356", "text": "activity", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "9790372", "text": "administrator", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9985785", "text": "corporate executive", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10089452", "text": "executive", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "president", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Whois", - "cookedLabel": "Whois", - "pageID": "4315433", - "editDist": 0.0, - "labelProbability": 0.0673077, - "logPopularity": 3.1780538303479458, - "score": 0.031085894047283118, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "President of France", - "cookedLabel": "President of France", - "pageID": "24899", - "editDist": 0.0, - "labelProbability": 0.97594, - "logPopularity": 4.48863636973214, - "score": 0.8059721771697235, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "president of france?", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr002020", - "qText": "what was eli whitney education?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "8131836", "text": "department", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-10.0", "type": "WordnetLAT" }, - { "synset": "8237635", "text": "division", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "8094128", "text": "administrative unit", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "4928188", "text": "inheritance", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "408356", "text": "activity", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5817200", "text": "content", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8136796", "text": "government department", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8140150", "text": "executive department", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "611221", "text": "profession", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "30657", "text": "act", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5760541", "text": "learning", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8024893", "text": "organization", "specificity": "-8.0", "type": "WordnetLAT" }, - { "synset": "583425", "text": "occupation", 
"specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5709328", "text": "process", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8206589", "text": "unit", "specificity": "-7.0", "type": "WordnetLAT" }, - { "synset": "4928931", "text": "upbringing", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5709891", "text": "basic cognitive process", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-9.0", "type": "WordnetLAT" }, - { "synset": "8139116", "text": "federal department", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "education", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Eli Whitney", - "cookedLabel": "Eli Whitney", - "pageID": "9732", - "editDist": 0.0, - "labelProbability": 0.961735, - "logPopularity": 4.442651256490317, - "score": 0.8069882144943348, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "eli whitney education", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "education", "type": "ClueSubjectToken", "weight": 2.5 } - ] - }, - { - "qId": "wqr002040", - "qText": "what american penny is worth money?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "13409418", "text": "coin", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13604927", "text": "unit of measurement", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "13407086", "text": "currency", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "33914", "text": "measure", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "13597304", "text": "definite quantity", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "13394134", "text": "medium of exchange", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "13684808", "text": "fractional monetary unit", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13598374", "text": "system of measurement", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "13409050", "text": "coinage", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "13625961", "text": "monetary unit", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7275291", "text": "standard", "specificity": "-5.0", "type": "WordnetLAT" }, - { "text": "penny", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Penny (United States coin)", - "cookedLabel": "Penny", - "pageID": "164092", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 4.276666119016055, - "score": 0.6014212784537785, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "american penny", "type": "ClueNE", "weight": 1.6 }, - { "label": "worth money", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "worth", "type": "ClueToken", "weight": 1.0 }, - { "label": "money", "type": "ClueSubjectToken", "weight": 2.5 } - ] - }, - { - "qId": "wqr002060", - "qText": "what kind of government does spain have now?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "5817200", "text": "content", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "5847274", "text": "category", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": 
"cognition", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "5844071", "text": "concept", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "kind", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Government", - "cookedLabel": "Government", - "pageID": "12229", - "editDist": 0.0, - "labelProbability": 0.139222, - "logPopularity": 5.5093883366279774, - "score": 0.15315169535105763, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Spain", - "cookedLabel": "Spain", - "pageID": "26667", - "editDist": 0.0, - "labelProbability": 0.708413, - "logPopularity": 10.559711378991475, - "score": 0.9928424271025632, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Rolling Stones, Now!", - "cookedLabel": "The Rolling Stones, Now!", - "pageID": "1365981", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.543294782270004, - "score": 0.13527332344325466, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Now! (Bobby Hutcherson album)", - "cookedLabel": "Now!", - "pageID": "2580911", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.248495242049359, - "score": 0.1158846124007123, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Now TV", - "cookedLabel": "Now TV", - "pageID": "2857006", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.430816798843313, - "score": 0.1275717260871764, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Now (Fireflight album)", - "cookedLabel": "Now", - "pageID": "34423465", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.343805421853684, - "score": 0.1218734455168399, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Now (newspaper)", - "cookedLabel": "Now", - "pageID": "1058750", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.532599493153256, - "score": 0.1345244471522166, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "now", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr002080", - "qText": "what art movement was vincent van gogh apart of?", - "SV": ["gogh"], - "lemmaSV": ["Gogh"], - "LAT": [ - { "synset": "13494300", "text": "elimination", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "33319", "text": "communication", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "576778", "text": "work", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "11439518", "text": "physical phenomenon", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "13461236", "text": "bodily process", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "13487789", "text": "discharge", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": 
"30657", "text": "act", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "6202938", "text": "attitude", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7034009", "text": "music", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "11428673", "text": "natural phenomenon", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "34512", "text": "phenomenon", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "3580409", "text": "instrumentality", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "797381", "text": "undertaking", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "11511038", "text": "optical phenomenon", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7123727", "text": "auditory communication", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7298313", "text": "happening", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "29677", "text": "event", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "2680572", "text": "action", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "3743963", "text": "mechanism", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "29976", "text": "process", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "799539", "text": "venture", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13480291", "text": "defecation", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "3187746", "text": "device", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "13547313", "text": "organic process", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "191991", "text": "change", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6206319", "text": "inclination", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "408356", "text": "activity", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "7051211", "text": "musical composition", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "11510863", "text": "optical illusion", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "38116", "text": "action", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "movement", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Art movement", - "cookedLabel": "Art movement", - "pageID": "228568", - "editDist": 0.0, - "labelProbability": 0.790698, - "logPopularity": 3.091042453358316, - "score": 0.16840087970345582, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Vincent van Gogh", - "cookedLabel": "Vincent van Gogh", - "pageID": "32603", - "editDist": 0.0, - "labelProbability": 0.993177, - "logPopularity": 6.154858094016418, - "score": 0.9807994905546668, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Apartness relation", - "cookedLabel": "Apartness relation", - "pageID": "7844336", - "editDist": 0.0, - "labelProbability": 0.818182, - "logPopularity": 1.6094379124341003, - "score": 0.2830910245062143, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": 
"2008–09 Chelsea F.C. season", - "cookedLabel": "2008–09 Chelsea F.C. season", - "pageID": "17575455", - "editDist": 0.0, - "labelProbability": 0.121212, - "logPopularity": 6.837332814685591, - "score": 0.1211024385658796, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "apart", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr002100", - "qText": "where are brembo brakes from?", - "SV": [], - "lemmaSV": [], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [], - "Clue": [ - { "label": "brembo brakes", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "brembo", "type": "ClueToken", "weight": 1.0 }, - { "label": "brakes", "type": "ClueSubjectToken", "weight": 2.5 } - ] - }, - { - "qId": "wqr002120", - "qText": "what organism did mendel use?", - "SV": ["use"], - "lemmaSV": ["use"], - "LAT": [ - { "synset": "8452398", "text": "system", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "4258", "text": "living thing", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "organism", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Gregor Mendel", - "cookedLabel": "Gregor Mendel", - "pageID": "12562", - "editDist": 0.0, - "labelProbability": 0.885144, - "logPopularity": 4.68213122712422, - "score": 0.9018774668477498, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "L. Mendel Rivers", - "cookedLabel": "L. Mendel Rivers", - "pageID": "962771", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.543294782270004, - "score": 0.12360671139896476, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Mendel Jackson Davis", - "cookedLabel": "Mendel Jackson Davis", - "pageID": "11232293", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.48863636973214, - "score": 0.12009777896255329, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Mitchell Schwartz", - "cookedLabel": "Mitchell Schwartz", - "pageID": "34940839", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.532599493153256, - "score": 0.1229132413270456, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Felix Mendelssohn", - "cookedLabel": "Felix Mendelssohn", - "pageID": "76370", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.736198448394496, - "score": 0.13670041096528363, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [{ "label": "mendel", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr002140", - "qText": "what did brittany murphy died of?", - "SV": ["died"], - "lemmaSV": ["die"], - "LAT": [ - { "synset": "21445", "text": "food", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7581905", "text": "foodstuff", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7723196", "text": "vegetable", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7726028", "text": "root 
vegetable", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7725752", "text": "solanaceous vegetable", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7582428", "text": "starches", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7571428", "text": "food", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "7721456", "text": "produce", "specificity": "-3.0", "type": "WordnetLAT" }, - { "text": "murphy", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Brittany Murphy", - "cookedLabel": "Brittany Murphy", - "pageID": "166777", - "editDist": 0.0, - "labelProbability": 0.998574, - "logPopularity": 5.231108616854587, - "score": 0.8779929805163729, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr002160", - "qText": "what was robert hooke's contributions to science?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "1088005", "text": "giving", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "1108713", "text": "transaction", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "13273872", "text": "transferred property", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "13274154", "text": "acquisition", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "13352213", "text": "sum", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "1103863", "text": "publication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "29677", "text": "event", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "787849", "text": "attempt", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "1082290", "text": "group action", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "30657", "text": "act", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "13285910", "text": "gift", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "32912", "text": "possession", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "13350663", "text": "assets", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "1096649", "text": "commercial enterprise", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "408356", "text": "activity", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "1087717", "text": "sharing", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "1092370", "text": "commerce", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "1085001", "text": "distribution", "specificity": "-3.0", "type": "WordnetLAT" }, - { "text": "contributions", "specificity": "0.0", "type": "LAT" }, - { "text": "contribution", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "Robert Hooke", - "cookedLabel": "Robert Hooke", - "pageID": "49720", - "editDist": 0.0, - "labelProbability": 0.997944, - "logPopularity": 4.624972813284271, - "score": 0.8850716192602223, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Poems by Edgar Allan Poe", - "cookedLabel": "Poems by Edgar Allan Poe", - "pageID": "9449219", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 2.9444389791664403, - "score": 0.669855592572084, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { 
"label": "robert hooke's contributions to science", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "robert hooke's contributions", "type": "CluePhrase", "weight": 0.99 }, - { "label": "contributions", "type": "ClueSubjectToken", "weight": 2.5 }, - { "label": "to science", "type": "ClueNE", "weight": 1.11 } - ] - }, - { - "qId": "wqr002180", - "qText": "what was sir arthur conan doyle famous for?", - "SV": [], - "lemmaSV": [], - "LAT": [{ "text": "doyle", "specificity": "0.0", "type": "LAT" }], - "Concept": [ - { - "fullLabel": "Arthur Conan Doyle", - "cookedLabel": "Arthur Conan Doyle", - "pageID": "18951335", - "editDist": 0.0, - "labelProbability": 0.992414, - "logPopularity": 5.986452005284438, - "score": 0.9167085205418054, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Sham Shui Po District", - "cookedLabel": "Sham Shui Po District", - "pageID": "2638679", - "editDist": 0.0, - "labelProbability": 0.5, - "logPopularity": 4.6913478822291435, - "score": 0.1782132779367818, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Rampura Phul", - "cookedLabel": "Rampura Phul", - "pageID": "5807631", - "editDist": 0.0, - "labelProbability": 0.5, - "logPopularity": 4.330733340286331, - "score": 0.14869608916356794, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "sir arthur conan doyle", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr002200", - "qText": "what school did burne hogarth establish?", - "SV": ["establish"], - "lemmaSV": ["establish"], - "LAT": [ - { "synset": "5761561", "text": "education", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5709891", "text": "basic cognitive process", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5709328", "text": "process", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "2916498", "text": "building", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "33914", "text": "measure", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7981699", "text": "body", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5760541", "text": "learning", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "4348764", "text": "structure", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "13597072", "text": "fundamental quantity", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8024893", "text": "organization", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8293263", "text": "educational institution", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8070328", "text": "institution", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "15137796", "text": "time period", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8010371", "text": "animal group", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7957410", "text": "biological group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "school", "specificity": 
"0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Burne Hogarth", - "cookedLabel": "Burne Hogarth", - "pageID": "795964", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 4.143134726391533, - "score": 0.9403513048805165, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr002220", - "qText": "what years did yankees win championships?", - "SV": ["win"], - "lemmaSV": ["win"], - "LAT": [ - { "synset": "31563", "text": "group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "33914", "text": "measure", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "13597072", "text": "fundamental quantity", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7991473", "text": "gathering", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "15165852", "text": "life", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "15137796", "text": "time period", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "15169331", "text": "time of life", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "years", "specificity": "0.0", "type": "LAT" }, - { "text": "year", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "New York Yankees", - "cookedLabel": "New York Yankees", - "pageID": "4848143", - "editDist": 0.0, - "labelProbability": 0.643442, - "logPopularity": 8.137103389639302, - "score": 0.9600832390585778, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Yankee", - "cookedLabel": "Yankee", - "pageID": "38936", - "editDist": 0.0, - "labelProbability": 0.142782, - "logPopularity": 2.8903717578961645, - "score": 0.03714944875114816, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "yankees", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr002240", - "qText": "who is the current president of dominican republic 2011?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "587299", "text": "position", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "10182584", "text": "head", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "10488931", "text": "presiding officer", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10184340", "text": "head of state", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "9633690", "text": "communicator", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "10541628", "text": "representative", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "597922", "text": "presidency", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "30657", "text": "act", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "10371605", "text": "negotiator", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "9646208", "text": "leader", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9778216", "text": "academic administrator", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "583425", "text": "occupation", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "408356", 
"text": "activity", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "9790372", "text": "administrator", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9985785", "text": "corporate executive", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10089452", "text": "executive", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "president", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Whois", - "cookedLabel": "Whois", - "pageID": "4315433", - "editDist": 0.0, - "labelProbability": 0.0673077, - "logPopularity": 3.1780538303479458, - "score": 0.031085894047283118, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "United Nations Economic and Social Council", - "cookedLabel": "United Nations Economic and Social Council", - "pageID": "31958", - "editDist": 0.0, - "labelProbability": 0.47619, - "logPopularity": 3.6888794541139363, - "score": 0.129607165691624, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Heather Knight (educator)", - "cookedLabel": "Heather Knight", - "pageID": "31577192", - "editDist": 0.0, - "labelProbability": 0.285714, - "logPopularity": 4.634728988229636, - "score": 0.09863712707634434, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Mahmoud Abbas", - "cookedLabel": "Mahmoud Abbas", - "pageID": "232595", - "editDist": 0.0, - "labelProbability": 0.142857, - "logPopularity": 5.10594547390058, - "score": 0.07002029201560565, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "List of Presidents of the Dominican Republic", - "cookedLabel": "president of dominican republic", - "pageID": "451896", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 4.02535169073515, - "score": 0.7784529784807407, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "2011", - "cookedLabel": "2011", - "pageID": "36225", - "editDist": 0.0, - "labelProbability": 0.353935, - "logPopularity": 2.6390573296152584, - "score": 0.1080943974551401, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the current president of dominican republic 2011", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "current president of dominican republic 2011", "type": "CluePhrase", "weight": 0.99 }, - { "label": "current president", "type": "ClueNE", "weight": 1.11 }, - { "label": "dominican republic 2011", "type": "CluePhrase", "weight": 0.99 } - ] - }, - { - "qId": "wqr002260", - "qText": "where did william howard taft go to high school?", - "SV": ["go"], - "lemmaSV": ["go"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "William Howard Taft", - "cookedLabel": "William Howard Taft", - "pageID": "33522", - "editDist": 0.0, - "labelProbability": 0.998964, - "logPopularity": 6.013715156042802, - "score": 0.9796745567601535, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Goto", - 
"cookedLabel": "Goto", - "pageID": "23307350", - "editDist": 0.0, - "labelProbability": 0.233129, - "logPopularity": 3.258096538021482, - "score": 0.06728267116099904, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "High school", - "cookedLabel": "High school", - "pageID": "42556", - "editDist": 0.0, - "labelProbability": 0.710763, - "logPopularity": 8.718990524710849, - "score": 0.9599863160325811, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "high school?", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr002280", - "qText": "where are the headquarters of the united nations organization found?", - "SV": ["found"], - "lemmaSV": ["find"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "List of districts of India", - "cookedLabel": "headquarters", - "pageID": "602648", - "editDist": 0.0, - "labelProbability": 0.0687356, - "logPopularity": 9.829248802600596, - "score": 0.6382891701153312, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Headquarters", - "cookedLabel": "Headquarters", - "pageID": "745008", - "editDist": 0.0, - "labelProbability": 0.515517, - "logPopularity": 4.174387269895637, - "score": 0.5534525447014913, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Headquarters (album)", - "cookedLabel": "Headquarters", - "pageID": "2098872", - "editDist": 0.0, - "labelProbability": 0.199538, - "logPopularity": 4.330733340286331, - "score": 0.1062274079951419, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "United Nations", - "cookedLabel": "United Nations", - "pageID": "31769", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 5.66988092298052, - "score": 0.9357004755501552, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Found (band)", - "cookedLabel": "Found", - "pageID": "13820845", - "editDist": 0.0, - "labelProbability": 0.439739, - "logPopularity": 4.02535169073515, - "score": 0.2280822046564627, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Found (novel)", - "cookedLabel": "Found", - "pageID": "20797033", - "editDist": 0.0, - "labelProbability": 0.439739, - "logPopularity": 3.970291913552122, - "score": 0.22231829380182722, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Found Aircraft", - "cookedLabel": "Found Aircraft", - "pageID": "9649134", - "editDist": 0.0, - "labelProbability": 0.439739, - "logPopularity": 3.828641396489095, - "score": 0.20797155950833424, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Found (Rossetti)", - "cookedLabel": "Found", - "pageID": "34500037", - "editDist": 0.0, - "labelProbability": 0.439739, - "logPopularity": 3.4339872044851463, - "score": 0.1716491629243687, - "getByLAT": 
0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Found (album)", - "cookedLabel": "Found", - "pageID": "38290880", - "editDist": 0.0, - "labelProbability": 0.439739, - "logPopularity": 3.9889840465642745, - "score": 0.22426333707451532, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the headquarters of the united nations organization", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "headquarters of the united nations organization", "type": "CluePhrase", "weight": 0.99 }, - { "label": "united nations organization", "type": "ClueNE", "weight": 1.11 } - ] - }, - { - "qId": "wqr002300", - "qText": "who was dr seuss?", - "SV": [], - "lemmaSV": [], - "LAT": [{ "text": "seuss", "specificity": "0.0", "type": "LAT" }], - "Concept": [ - { - "fullLabel": "Dr. Seuss", - "cookedLabel": "Dr. Seuss", - "pageID": "8855", - "editDist": 0.0, - "labelProbability": 0.910811, - "logPopularity": 5.805134968916488, - "score": 0.8715313449939921, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "dr seuss", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr002320", - "qText": "who won 2001 fa cup?", - "SV": ["won"], - "lemmaSV": ["win"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "2010–11 FA Cup", - "cookedLabel": "2010–11 FA Cup", - "pageID": "27876486", - "editDist": 1.5, - "labelProbability": 0.0, - "logPopularity": 7.076653815443951, - "score": 0.2752145800797534, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "2001", - "cookedLabel": "2001", - "pageID": "34551", - "editDist": 0.0, - "labelProbability": 0.600947, - "logPopularity": 2.833213344056216, - "score": 0.2326426036402682, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "FA Cup", - "cookedLabel": "FA Cup", - "pageID": "11237", - "editDist": 0.0, - "labelProbability": 0.818182, - "logPopularity": 7.216709486709457, - "score": 0.9410383616804073, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "fa cup?", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr002340", - "qText": "what was eli whitney job?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "33319", "text": "communication", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "6406508", "text": "book", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10344679", "text": "hero", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "783339", "text": "robbery", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "720746", "text": "duty", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "576778", "text": "work", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "4014270", "text": "product", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "27365", "text": "location", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "9646208", "text": "leader", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": 
"6365164", "text": "coding system", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "1082290", "text": "group action", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "6367301", "text": "code", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "4609402", "text": "workplace", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7034009", "text": "music", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "6581154", "text": "program", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "13943868", "text": "condition", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6371284", "text": "writing", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "770190", "text": "felony", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "734044", "text": "wrongdoing", "specificity": "-7.0", "type": "WordnetLAT" }, - { "synset": "8637636", "text": "point", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7123727", "text": "auditory communication", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "746303", "text": "transgression", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "1132241", "text": "duty", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "14431490", "text": "difficulty", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "29677", "text": "event", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "767761", "text": "crime", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "6578068", "text": "software", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "1125919", "text": "social control", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "3133774", "text": "creation", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "24900", "text": "state", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "30657", "text": "act", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "782543", "text": "larceny", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6374360", "text": "writing", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "767587", "text": "offense", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "408356", "text": "activity", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8596234", "text": "geographic point", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6582286", "text": "application", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "9653829", "text": "unfortunate", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6403644", "text": "section", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6360590", "text": "written communication", "specificity": "-4.0", "type": "WordnetLAT" }, - { "text": "job", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Eli Whitney", - "cookedLabel": "Eli Whitney", - "pageID": "9732", - "editDist": 2.2, - "labelProbability": 0.0, - "logPopularity": 4.442651256490317, - "score": 0.05922375965079095, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Eli Whitney", - "cookedLabel": "Eli Whitney", - "pageID": "9732", - "editDist": 0.0, - 
"labelProbability": 0.961735, - "logPopularity": 4.442651256490317, - "score": 0.8069882144943348, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "eli whitney job", "type": "ClueNE", "weight": 2.8000000000000003 }, - { "label": "eli whitney", "type": "ClueNgram", "weight": 1.01 } - ] - }, - { - "qId": "wqr002360", - "qText": "what football teams did emmitt smith play for?", - "SV": ["play"], - "lemmaSV": ["play"], - "LAT": [ - { "synset": "7957410", "text": "biological group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8024893", "text": "organization", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8206589", "text": "unit", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8010371", "text": "animal group", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "teams", "specificity": "0.0", "type": "LAT" }, - { "text": "team", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "Emmitt Smith", - "cookedLabel": "Emmitt Smith", - "pageID": "154857", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 4.990432586778736, - "score": 0.9632491468449639, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "football", "type": "ClueToken", "weight": 1.0 }] - }, - { - "qId": "wqr002380", - "qText": "where was karl marx buried?", - "SV": ["buried"], - "lemmaSV": ["bury"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Karl Marx", - "cookedLabel": "Karl Marx", - "pageID": "16743", - "editDist": 0.0, - "labelProbability": 0.964462, - "logPopularity": 6.3473892096560105, - "score": 0.9804868241857398, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr002400", - "qText": "who portrayed indiana jones in raiders of the lost ark?", - "SV": ["portrayed"], - "lemmaSV": ["portray"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Indiana Jones", - "cookedLabel": "Indiana Jones", - "pageID": "14814", - "editDist": 0.0, - "labelProbability": 0.868808, - "logPopularity": 5.056245805348308, - "score": 0.8463875964133596, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Indiana Jones (franchise)", - "cookedLabel": "Indiana Jones", - "pageID": "11903589", - "editDist": 0.0, - "labelProbability": 0.053644, - "logPopularity": 4.6443908991413725, - "score": 0.036496037661396225, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Raiders of the Lost Ark", - "cookedLabel": "Raiders of the Lost Ark", - "pageID": "54166", - "editDist": 0.0, - "labelProbability": 0.921928, - "logPopularity": 4.700480365792417, - "score": 0.8503381357740963, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - 
"fullLabel": "Raiders of the Lost Ark (video game)", - "cookedLabel": "Raiders of the Lost Ark", - "pageID": "3490208", - "editDist": 0.0, - "labelProbability": 0.0511881, - "logPopularity": 3.8066624897703196, - "score": 0.022154971584016563, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr002420", - "qText": "where do islamic people go to worship?", - "SV": ["go"], - "lemmaSV": ["go"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Muslim", - "cookedLabel": "Muslim", - "pageID": "19541", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 6.661854740545311, - "score": 0.41274687433602575, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Goto", - "cookedLabel": "Goto", - "pageID": "23307350", - "editDist": 0.0, - "labelProbability": 0.233129, - "logPopularity": 3.258096538021482, - "score": 0.06728267116099904, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Worship", - "cookedLabel": "Worship", - "pageID": "70364", - "editDist": 0.0, - "labelProbability": 0.179385, - "logPopularity": 3.6635616461296463, - "score": 0.06704711255854262, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Worship in Hinduism", - "cookedLabel": "Worship in Hinduism", - "pageID": "27185976", - "editDist": 0.0, - "labelProbability": 0.197608, - "logPopularity": 1.0986122886681098, - "score": 0.006218142222364109, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Christian worship", - "cookedLabel": "Christian worship", - "pageID": "1680963", - "editDist": 0.0, - "labelProbability": 0.232346, - "logPopularity": 2.833213344056216, - "score": 0.02035962494161789, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "islamic people", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr002440", - "qText": "who the voice of peter griffin?", - "SV": ["griffin"], - "lemmaSV": ["griffin"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Who the *$&% Is Jackson Pollock?", - "cookedLabel": "Who the *$&% Is Jackson Pollock?", - "pageID": "8326198", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.9512437185814275, - "score": 0.03522960933689275, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Voice of Eye", - "cookedLabel": "Voice of Eye", - "pageID": "18208561", - "editDist": 3.0, - "labelProbability": 0.0, - "logPopularity": 3.044522437723423, - "score": 0.05112070357256317, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Peter Griffin", - "cookedLabel": "Peter Griffin", - "pageID": "901191", - "editDist": 0.0, - "labelProbability": 0.965579, - "logPopularity": 4.30406509320417, - "score": 0.7965835370212142, - "getByLAT": 0, - 
"getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "voice of peter", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr002460", - "qText": "what was the capital of ancient israel?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "4606723", "text": "work", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8531106", "text": "center", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8648560", "text": "region", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8067430", "text": "government", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8647614", "text": "region", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6422547", "text": "book", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6819327", "text": "symbol", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8681598", "text": "top", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8540894", "text": "center", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "3133774", "text": "creation", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "8514304", "text": "area", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "6831828", "text": "character", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8069301", "text": "federal government", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13350663", "text": "assets", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8067137", "text": "polity", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8591861", "text": "geographical area", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8024893", "text": "organization", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8530790", "text": "place", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6601855", "text": "publication", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "32912", "text": "possession", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "27365", "text": "location", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "4014270", "text": "product", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "6830481", "text": "written symbol", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8665520", "text": "seat", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6804229", "text": "signal", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "33319", "text": "communication", "specificity": "-5.0", "type": "WordnetLAT" }, - { "text": "capital", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "The Capital", - "cookedLabel": "The Capital", - "pageID": "3257048", - "editDist": 0.0, - "labelProbability": 0.946108, - "logPopularity": 3.713572066704308, - "score": 0.6193803152222629, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "History of ancient Israel and Judah", - "cookedLabel": "History of ancient Israel and Judah", - "pageID": "13876", - "editDist": 0.0, - 
"labelProbability": 0.972222, - "logPopularity": 3.5263605246161616, - "score": 0.7797092703968829, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the capital of ancient israel", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "capital of ancient israel", "type": "CluePhrase", "weight": 0.99 }, - { "label": "ancient israel", "type": "ClueNE", "weight": 1.11 } - ] - }, - { - "qId": "wqr002480", - "qText": "what time zone is ontario toronto?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "9408804", "text": "part", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "2452", "text": "thing", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "5227735", "text": "body part", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8591861", "text": "geographical area", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8647614", "text": "region", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8637636", "text": "point", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8682181", "text": "topographic point", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8648560", "text": "region", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5232895", "text": "structure", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "27365", "text": "location", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "zone", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Time zone", - "cookedLabel": "Time zone", - "pageID": "30890", - "editDist": 0.0, - "labelProbability": 0.695076, - "logPopularity": 5.755742213586912, - "score": 0.3922798159673489, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Toronto City Hall", - "cookedLabel": "Toronto City Hall", - "pageID": "798097", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 4.356708826689592, - "score": 0.827075281260114, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "ontario toronto", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr002500", - "qText": "who does zach galifianakis play in the hangover?", - "SV": ["play"], - "lemmaSV": ["play"], - "LAT": [{ "text": "galifianakis", "specificity": "0.0", "type": "LAT" }], - "Concept": [ - { - "fullLabel": "Zach Galifianakis", - "cookedLabel": "Zach Galifianakis", - "pageID": "644431", - "editDist": 0.0, - "labelProbability": 0.99935, - "logPopularity": 5.087596335232384, - "score": 0.8688719133299732, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Hangover", - "cookedLabel": "The Hangover", - "pageID": "21918632", - "editDist": 0.0, - "labelProbability": 0.509538, - "logPopularity": 4.941642422609304, - "score": 0.2690380695668368, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Hangover", - "cookedLabel": "Hangover", - "pageID": "12274183", - "editDist": 0.0, - "labelProbability": 0.480615, - "logPopularity": 3.258096538021482, - "score": 0.2392757898144611, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - 
"getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Hangover", - "cookedLabel": "Hangover", - "pageID": "12274183", - "editDist": 0.0, - "labelProbability": 0.480615, - "logPopularity": 3.258096538021482, - "score": 0.18367718907139213, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "hangover", "type": "ClueNE", "weight": 1.11 }, - { "label": "hangover?", "type": "ClueNgram", "weight": 1.01 } - ] - }, - { - "qId": "wqr002520", - "qText": "what was gregor mendel known for?", - "SV": ["known"], - "lemmaSV": ["know"], - "LAT": [], - "Concept": [ - { - "fullLabel": "Gregor Mendel", - "cookedLabel": "Gregor Mendel", - "pageID": "12562", - "editDist": 0.0, - "labelProbability": 0.994921, - "logPopularity": 4.68213122712422, - "score": 0.9551192662824227, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr002540", - "qText": "what inspired monet?", - "SV": ["inspired"], - "lemmaSV": ["inspire"], - "LAT": [], - "Concept": [ - { - "fullLabel": "Claude Monet", - "cookedLabel": "Claude Monet", - "pageID": "6548", - "editDist": 0.0, - "labelProbability": 0.98304, - "logPopularity": 5.093750200806762, - "score": 0.872042328775178, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "monet", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr002560", - "qText": "what is the actual current local time now in uk?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "5097645", "text": "magnitude", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "4923519", "text": "property", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "15137796", "text": "time period", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5824748", "text": "datum", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "15269461", "text": "moment", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "29677", "text": "event", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "4990371", "text": "sound property", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "15205381", "text": "point", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5824916", "text": "reading", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "15249282", "text": "term", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7300108", "text": "experience", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13597072", "text": "fundamental quantity", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "7298313", "text": "happening", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "33914", "text": "measure", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "4998633", "text": "rhythmicity", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7323507", "text": "case", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5824413", "text": "information", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5100843", "text": "dimension", "specificity": "-1.0", "type": "WordnetLAT" }, - { 
"text": "time", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "The Actual (novel)", - "cookedLabel": "The Actual", - "pageID": "9271125", - "editDist": 0.0, - "labelProbability": 0.43609, - "logPopularity": 3.9512437185814275, - "score": 0.2174806877733024, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Actual (band)", - "cookedLabel": "The Actual", - "pageID": "2317012", - "editDist": 0.0, - "labelProbability": 0.43609, - "logPopularity": 3.784189633918261, - "score": 0.2009063673527151, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Time zone", - "cookedLabel": "Time zone", - "pageID": "30890", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 5.755742213586912, - "score": 0.06527190474495301, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "The Rolling Stones, Now!", - "cookedLabel": "The Rolling Stones, Now!", - "pageID": "1365981", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.543294782270004, - "score": 0.13527332344325466, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Now (Fireflight album)", - "cookedLabel": "Now", - "pageID": "34423465", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.343805421853684, - "score": 0.1218734455168399, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Now! 
(Bobby Hutcherson album)", - "cookedLabel": "Now!", - "pageID": "2580911", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.248495242049359, - "score": 0.1158846124007123, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Now TV", - "cookedLabel": "Now TV", - "pageID": "2857006", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.430816798843313, - "score": 0.1275717260871764, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Now (newspaper)", - "cookedLabel": "Now", - "pageID": "1058750", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.532599493153256, - "score": 0.1345244471522166, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "United Kingdom", - "cookedLabel": "United Kingdom", - "pageID": "31717", - "editDist": 0.0, - "labelProbability": 0.514067, - "logPopularity": 11.570967932364097, - "score": 0.9746614553298311, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "UK (band)", - "cookedLabel": "UK", - "pageID": "925525", - "editDist": 0.0, - "labelProbability": 0.153453, - "logPopularity": 4.836281906951478, - "score": 0.04589382688062347, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Ukrainian language", - "cookedLabel": "Ukrainian language", - "pageID": "46279", - "editDist": 0.0, - "labelProbability": 0.153453, - "logPopularity": 5.963579343618446, - "score": 0.08642580612114378, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "University of Kentucky", - "cookedLabel": "University of Kentucky", - "pageID": "284368", - "editDist": 0.0, - "labelProbability": 0.153453, - "logPopularity": 6.313548046277095, - "score": 0.10450881959633042, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Uttarakhand", - "cookedLabel": "Uttarakhand", - "pageID": "1429154", - "editDist": 0.0, - "labelProbability": 0.153453, - "logPopularity": 6.959398512133975, - "score": 0.1467154939445143, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the actual current local time now in uk", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "actual current local time now in uk", "type": "CluePhrase", "weight": 0.99 }, - { "label": "actual current local time now", "type": "CluePhrase", "weight": 0.99 }, - { "label": "actual current local time", "type": "CluePhrase", "weight": 0.99 }, - { "label": "current", "type": "ClueToken", "weight": 1.0 }, - { "label": "uk", "type": "ClueNE", "weight": 1.1 } - ] - }, - { - "qId": "wqr002580", - "qText": "what are the four harry potter house names?", - "SV": [], - "lemmaSV": [], - "LAT": [], - "Concept": [ - { - "fullLabel": "The Four (TV series)", - "cookedLabel": "The Four", - "pageID": "18817858", - "editDist": 0.0, - "labelProbability": 0.417476, - "logPopularity": 4.465908118654584, - "score": 0.25784972037557835, - "getByLAT": 0, - "getByNE": 
0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Four (film)", - "cookedLabel": "The Four", - "pageID": "36856510", - "editDist": 0.0, - "labelProbability": 0.216828, - "logPopularity": 3.8066624897703196, - "score": 0.08509955283778954, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "List of Forgotten Realms characters", - "cookedLabel": "the four", - "pageID": "4308754", - "editDist": 0.0, - "labelProbability": 0.216828, - "logPopularity": 4.927253685157205, - "score": 0.15411933501456931, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Harry Potter", - "cookedLabel": "Harry Potter", - "pageID": "2387806", - "editDist": 0.0, - "labelProbability": 0.694104, - "logPopularity": 4.762173934797756, - "score": 0.5967886135480428, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Ephraim B. Potter House", - "cookedLabel": "Ephraim B. Potter House", - "pageID": "28900394", - "editDist": 0.0, - "labelProbability": 0.615385, - "logPopularity": 4.007333185232471, - "score": 0.3958885384021526, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Potter House (St. Petersburg, Florida)", - "cookedLabel": "Potter House", - "pageID": "7396470", - "editDist": 0.0, - "labelProbability": 0.615385, - "logPopularity": 3.9318256327243257, - "score": 0.38510636851009705, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Skene Manor", - "cookedLabel": "Skene Manor", - "pageID": "27689277", - "editDist": 0.0, - "labelProbability": 0.615385, - "logPopularity": 3.970291913552122, - "score": 0.39058583138487807, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Potter House (Rock Island, Illinois)", - "cookedLabel": "Potter House", - "pageID": "31342420", - "editDist": 0.0, - "labelProbability": 0.615385, - "logPopularity": 3.9889840465642745, - "score": 0.3932585908145613, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Arnold Potter House", - "cookedLabel": "Arnold Potter House", - "pageID": "24991909", - "editDist": 0.0, - "labelProbability": 0.615385, - "logPopularity": 3.9889840465642745, - "score": 0.3932585908145613, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the four harry potter house names", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "four harry potter house names", "type": "CluePhrase", "weight": 0.99 }, - { "label": "potter house names", "type": "CluePhrase", "weight": 0.99 }, - { "label": "potter house", "type": "ClueNE", "weight": 1.11 }, - { "label": "names", "type": "ClueSubjectToken", "weight": 2.5 } - ] - }, - { - "qId": "wqr002600", - "qText": "what drugs were in whitney houston when she died?", - "SV": ["died"], - "lemmaSV": ["die"], - "LAT": [ - { "synset": "14802595", "text": "agent", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": 
"20270", "text": "substance", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7347", "text": "causal agent", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "drugs", "specificity": "0.0", "type": "LAT" }, - { "text": "drug", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "Whitney Houston", - "cookedLabel": "Whitney Houston", - "pageID": "34071", - "editDist": 0.0, - "labelProbability": 0.96676, - "logPopularity": 6.267200548541362, - "score": 0.9470167387264969, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "She Died", - "cookedLabel": "She Died", - "pageID": "42580221", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.1354942159291497, - "score": 0.02189317471715583, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Death", - "cookedLabel": "Death", - "pageID": "8221", - "editDist": 0.0, - "labelProbability": 0.32493, - "logPopularity": 4.90527477843843, - "score": 0.22812175118227904, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "whitney houston when she died", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "died?", "type": "ClueNE", "weight": 1.11 } - ] - }, - { - "qId": "wqr002620", - "qText": "where can you buy amazon kindle?", - "SV": ["buy"], - "lemmaSV": ["buy"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "You", - "cookedLabel": "You", - "pageID": "464907", - "editDist": 0.0, - "labelProbability": 0.166744, - "logPopularity": 3.332204510175204, - "score": 0.05265639583854093, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "You (Juju album)", - "cookedLabel": "You", - "pageID": "32465927", - "editDist": 0.0, - "labelProbability": 0.0973713, - "logPopularity": 4.584967478670572, - "score": 0.030978008205098034, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "To Know That You're Alive", - "cookedLabel": "To Know That You're Alive", - "pageID": "16113542", - "editDist": 0.0, - "labelProbability": 0.0973713, - "logPopularity": 4.543294782270004, - "score": 0.03023619203088639, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "You County", - "cookedLabel": "You County", - "pageID": "24702306", - "editDist": 0.0, - "labelProbability": 0.0973713, - "logPopularity": 4.477336814478207, - "score": 0.029097127371004054, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "You (Ten Sharp song)", - "cookedLabel": "You", - "pageID": "18041571", - "editDist": 0.0, - "labelProbability": 0.0973713, - "logPopularity": 4.454347296253507, - "score": 0.028709976328474045, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Amazon Kindle", - "cookedLabel": "Amazon Kindle", - "pageID": "14312829", - "editDist": 0.0, - "labelProbability": 0.99779, - "logPopularity": 
5.0106352940962555, - "score": 0.9065351855269641, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "you", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr002640", - "qText": "what type of poetry does john donne write?", - "SV": ["write"], - "lemmaSV": ["write"], - "LAT": [ - { "synset": "33319", "text": "communication", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "7957410", "text": "biological group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "2855782", "text": "block", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5847533", "text": "kind", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6830481", "text": "written symbol", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6819327", "text": "symbol", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5847274", "text": "category", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "5844071", "text": "concept", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5817200", "text": "content", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "6831828", "text": "character", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6804229", "text": "signal", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "9628463", "text": "adult", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8008892", "text": "taxonomic group", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "type", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Poetry", - "cookedLabel": "Poetry", - "pageID": "22926", - "editDist": 0.0, - "labelProbability": 0.39542, - "logPopularity": 7.283448228756631, - "score": 0.6299349710349235, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Poetry (magazine)", - "cookedLabel": "Poetry", - "pageID": "1088973", - "editDist": 0.0, - "labelProbability": 0.0758634, - "logPopularity": 4.499809670330265, - "score": 0.02677974572143383, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Buddhist poetry", - "cookedLabel": "Buddhist poetry", - "pageID": "20337701", - "editDist": 0.0, - "labelProbability": 0.0549195, - "logPopularity": 2.995732273553991, - "score": 0.010034279784427006, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "John Donne", - "cookedLabel": "John Donne", - "pageID": "15838", - "editDist": 0.0, - "labelProbability": 0.988386, - "logPopularity": 4.969813299576001, - "score": 0.9608491601426676, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "poetry", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": 
"wqr002660", - "qText": "what does a american rottweiler look like?", - "SV": ["look"], - "lemmaSV": ["look"], - "LAT": [ - { "synset": "1320032", "text": "domestic animal", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "2086723", "text": "dog", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "1474323", "text": "vertebrate", "specificity": "-8.0", "type": "WordnetLAT" }, - { "synset": "2077948", "text": "carnivore", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "2085998", "text": "canine", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "2106058", "text": "working dog", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "15568", "text": "animal", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "1864419", "text": "mammal", "specificity": "-7.0", "type": "WordnetLAT" }, - { "synset": "1468898", "text": "chordate", "specificity": "-9.0", "type": "WordnetLAT" }, - { "synset": "2107175", "text": "shepherd dog", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "1889397", "text": "placental", "specificity": "-6.0", "type": "WordnetLAT" }, - { "text": "rottweiler", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Goon Affiliated", - "cookedLabel": "Goon Affiliated", - "pageID": "23030040", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.969813299576001, - "score": 0.0859675096458276, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [ - { "label": "a american rottweiler", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "american rottweiler", "type": "CluePhrase", "weight": 0.99 }, - { "label": "american", "type": "ClueToken", "weight": 1.0 }, - { "label": "rottweiler", "type": "ClueSubjectToken", "weight": 2.5 }, - { "label": "look like?", "type": "ClueNE", "weight": 1.11 } - ] - }, - { - "qId": "wqr002680", - "qText": "what is president abraham lincoln known for?", - "SV": ["known"], - "lemmaSV": ["know"], - "LAT": [], - "Concept": [ - { - "fullLabel": "Abraham Lincoln", - "cookedLabel": "Abraham Lincoln", - "pageID": "307", - "editDist": 0.0, - "labelProbability": 0.983871, - "logPopularity": 6.089044875446846, - "score": 0.9791874922326462, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "president abraham lincoln", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr002700", - "qText": "what currency is used in hungary?", - "SV": ["used"], - "lemmaSV": ["use"], - "LAT": [ - { "synset": "4923519", "text": "property", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "13394134", "text": "medium of exchange", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7275291", "text": "standard", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "33914", "text": "measure", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "4772610", "text": "prevalence", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5051824", "text": "temporal arrangement", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5051679", "text": "temporal property", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "4771667", "text": "generality", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "13598374", 
"text": "system of measurement", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "4731092", "text": "quality", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5057266", "text": "presentness", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5053160", "text": "timing", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "currency", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Hungary", - "cookedLabel": "Hungary", - "pageID": "13275", - "editDist": 0.0, - "labelProbability": 0.578907, - "logPopularity": 9.512369038134885, - "score": 0.9377697837574935, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Hungary national football team", - "cookedLabel": "Hungary national football team", - "pageID": "679739", - "editDist": 0.0, - "labelProbability": 0.0682321, - "logPopularity": 7.562681246721884, - "score": 0.1430294998897364, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "hungary", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr002720", - "qText": "what college did joe montana play for?", - "SV": ["play"], - "lemmaSV": ["play"], - "LAT": [ - { "synset": "8070328", "text": "institution", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8293263", "text": "educational institution", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8024893", "text": "organization", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7981699", "text": "body", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "4348764", "text": "structure", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "2918337", "text": "building complex", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-3.0", "type": "WordnetLAT" }, - { "text": "college", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Joe Montana", - "cookedLabel": "Joe Montana", - "pageID": "295701", - "editDist": 0.0, - "labelProbability": 0.97873, - "logPopularity": 4.8283137373023015, - "score": 0.9556853572424215, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr002740", - "qText": "who plays alan parrish in jumanji?", - "SV": ["plays"], - "lemmaSV": ["play"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Alan Parrish", - "cookedLabel": "Alan Parrish", - "pageID": "5121242", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 1.9459101490553132, - "score": 0.015094647817577586, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Jumanji", - "cookedLabel": "Jumanji", - "pageID": "3700174", - "editDist": 0.0, - "labelProbability": 0.77739, - "logPopularity": 4.919980925828125, - "score": 0.7046642431771835, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": 
"wqr002760", - "qText": "where did the arizona diamondbacks play?", - "SV": ["play"], - "lemmaSV": ["play"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Arizona Diamondbacks", - "cookedLabel": "Arizona Diamondbacks", - "pageID": "2129", - "editDist": 0.0, - "labelProbability": 0.927129, - "logPopularity": 6.6039438246004725, - "score": 0.9801458454473052, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr002780", - "qText": "where are riddell helmets manufactured?", - "SV": ["manufactured"], - "lemmaSV": ["manufacture"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [], - "Clue": [ - { "label": "riddell helmets", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "riddell", "type": "ClueToken", "weight": 1.0 }, - { "label": "helmets", "type": "ClueSubjectToken", "weight": 2.5 } - ] - }, - { - "qId": "wqr002800", - "qText": "what movies did matt bomer play in?", - "SV": ["play"], - "lemmaSV": ["play"], - "LAT": [ - { "synset": "4014270", "text": "product", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "29677", "text": "event", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "6631572", "text": "show", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "3133774", "text": "creation", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7303344", "text": "social event", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "movies", "specificity": "0.0", "type": "LAT" }, - { "text": "movie", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "Matt Bomer", - "cookedLabel": "Matt Bomer", - "pageID": "916180", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 4.276666119016055, - "score": 0.944689765954643, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr002820", - "qText": "what did roger sherman do for a living?", - "SV": ["do"], - "lemmaSV": ["do"], - "LAT": [ - { "synset": "9794206", "text": "advocate", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "9759416", "text": "American Revolutionary leader", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "9962718", "text": "commissioned military officer", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "10143381", "text": "general", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8508836", "text": "administrative district", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "9655706", "text": "worker", "specificity": "-8.0", "type": "WordnetLAT" }, - { "synset": "9962449", "text": "commissioned officer", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8643858", "text": "municipality", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8683242", "text": "town", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10625393", "text": "skilled worker", "specificity": "-7.0", "type": "WordnetLAT" }, - { "synset": "8591861", "text": "geographical area", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-4.0", "type": 
"WordnetLAT" }, - { "synset": "8637636", "text": "point", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8635538", "text": "peak", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8569713", "text": "district", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "10365746", "text": "nationalist", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "10145323", "text": "general officer", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "10365929", "text": "nationalist leader", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8693705", "text": "urban area", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "10336665", "text": "military officer", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "10602198", "text": "serviceman", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "9383019", "text": "mountain peak", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "9646208", "text": "leader", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "27365", "text": "location", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "8682181", "text": "topographic point", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8648560", "text": "region", "specificity": "-5.0", "type": "WordnetLAT" }, - { "text": "sherman", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Roger Sherman", - "cookedLabel": "Roger Sherman", - "pageID": "260910", - "editDist": 0.0, - "labelProbability": 0.983931, - "logPopularity": 4.955827057601261, - "score": 0.8508251533681768, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "For a Living", - "cookedLabel": "For a Living", - "pageID": "14785803", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 3.8066624897703196, - "score": 0.8263293338112999, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "for a living?", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr002840", - "qText": "where is christina aguilera from?", - "SV": [], - "lemmaSV": [], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Christina Aguilera", - "cookedLabel": "Christina Aguilera", - "pageID": "144171", - "editDist": 0.0, - "labelProbability": 0.939469, - "logPopularity": 6.1779441140506, - "score": 0.9758827188302984, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Christina Aguilera (album)", - "cookedLabel": "Christina Aguilera", - "pageID": "555339", - "editDist": 0.0, - "labelProbability": 0.0512755, - "logPopularity": 5.278114659230517, - "score": 0.1292098233400038, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr002860", - "qText": "what is the dominant language of jamaica?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "33319", "text": "communication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5658174", "text": "faculty", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6297048", "text": "word", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6294878", "text": 
"language unit", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "32220", "text": "relation", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7123727", "text": "auditory communication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5624029", "text": "ability", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6399623", "text": "text", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13831419", "text": "part", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5709328", "text": "process", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5778661", "text": "higher cognitive process", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6376912", "text": "matter", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "language", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Linguistic imperialism", - "cookedLabel": "Linguistic imperialism", - "pageID": "253283", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 2.833213344056216, - "score": 0.016623034801995763, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Jamaica", - "cookedLabel": "Jamaica", - "pageID": "15660", - "editDist": 0.0, - "labelProbability": 0.69147, - "logPopularity": 8.129764445794171, - "score": 0.9168663449805127, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the dominant language of jamaica", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "dominant language of jamaica", "type": "CluePhrase", "weight": 0.99 }, - { "label": "dominant language", "type": "ClueNE", "weight": 2.6 } - ] - }, - { - "qId": "wqr002880", - "qText": "where was anne frank born?", - "SV": ["born"], - "lemmaSV": ["bear"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Anne Frank", - "cookedLabel": "Anne Frank", - "pageID": "804581", - "editDist": 0.0, - "labelProbability": 0.987277, - "logPopularity": 5.056245805348308, - "score": 0.9625708043797679, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr002900", - "qText": "what timezone is minnesota in?", - "SV": [], - "lemmaSV": [], - "LAT": [{ "text": "timezone", "specificity": "0.0", "type": "LAT" }], - "Concept": [ - { - "fullLabel": "Time zone", - "cookedLabel": "Time zone", - "pageID": "30890", - "editDist": 0.0, - "labelProbability": 0.797753, - "logPopularity": 5.755742213586912, - "score": 0.7369710831675726, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Minnesota", - "cookedLabel": "Minnesota", - "pageID": "19590", - "editDist": 0.0, - "labelProbability": 0.747924, - "logPopularity": 9.432122997651051, - "score": 0.988312426796726, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "timezone", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr002920", - "qText": "where does frida kahlo live now?", - "SV": ["live"], - "lemmaSV": ["live"], - "LAT": 
[{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Frida Kahlo", - "cookedLabel": "Frida Kahlo", - "pageID": "162276", - "editDist": 0.0, - "labelProbability": 0.995562, - "logPopularity": 4.74493212836325, - "score": 0.9568289214003222, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Rolling Stones, Now!", - "cookedLabel": "The Rolling Stones, Now!", - "pageID": "1365981", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.543294782270004, - "score": 0.13527332344325466, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Now TV", - "cookedLabel": "Now TV", - "pageID": "2857006", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.430816798843313, - "score": 0.1275717260871764, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Now! (Bobby Hutcherson album)", - "cookedLabel": "Now!", - "pageID": "2580911", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.248495242049359, - "score": 0.1158846124007123, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Now (Fireflight album)", - "cookedLabel": "Now", - "pageID": "34423465", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.343805421853684, - "score": 0.1218734455168399, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Now (newspaper)", - "cookedLabel": "Now", - "pageID": "1058750", - "editDist": 0.0, - "labelProbability": 0.233777, - "logPopularity": 4.532599493153256, - "score": 0.1345244471522166, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "now", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr002940", - "qText": "where does the jordan river end?", - "SV": ["end"], - "lemmaSV": ["end"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Jordan River", - "cookedLabel": "Jordan River", - "pageID": "47910", - "editDist": 0.0, - "labelProbability": 0.845543, - "logPopularity": 4.74493212836325, - "score": 0.9175028154365124, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Jordan River (Utah)", - "cookedLabel": "Jordan River", - "pageID": "480458", - "editDist": 0.0, - "labelProbability": 0.117741, - "logPopularity": 4.672828834461906, - "score": 0.12286016435052517, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr002960", - "qText": "who was papa doc in real life?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "8139116", "text": "federal department", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-10.0", "type": "WordnetLAT" }, - { "synset": "8237635", "text": "division", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "8094128", 
"text": "administrative unit", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "8206589", "text": "unit", "specificity": "-7.0", "type": "WordnetLAT" }, - { "synset": "10184702", "text": "health professional", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8024893", "text": "organization", "specificity": "-8.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "8140150", "text": "executive department", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8136796", "text": "government department", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "10325469", "text": "medical practitioner", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10499838", "text": "professional", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "9628463", "text": "adult", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-9.0", "type": "WordnetLAT" }, - { "synset": "8131836", "text": "department", "specificity": "-4.0", "type": "WordnetLAT" }, - { "text": "doc", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "François Duvalier", - "cookedLabel": "François Duvalier", - "pageID": "70844", - "editDist": 0.0, - "labelProbability": 0.999273, - "logPopularity": 4.969813299576001, - "score": 0.8153300249179136, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Real life", - "cookedLabel": "Real life", - "pageID": "238114", - "editDist": 0.0, - "labelProbability": 0.835073, - "logPopularity": 3.091042453358316, - "score": 0.5920343108437078, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "In Real Life", - "cookedLabel": "In Real Life", - "pageID": "5313706", - "editDist": 0.0, - "labelProbability": 0.164927, - "logPopularity": 3.8066624897703196, - "score": 0.09291252556801953, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "papa doc in real life", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "papa doc", "type": "ClueNE", "weight": 2.6 } - ] - }, - { - "qId": "wqr002980", - "qText": "where do most earthquakes happen in japan?", - "SV": ["happen"], - "lemmaSV": ["happen"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "In Japan! (Buck Owens album)", - "cookedLabel": "In Japan!", - "pageID": "19866078", - "editDist": 0.0, - "labelProbability": 0.618506, - "logPopularity": 3.8501476017100584, - "score": 0.45819433935440645, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Jackson 5 in Japan", - "cookedLabel": "The Jackson 5 in Japan", - "pageID": "2160503", - "editDist": 0.0, - "labelProbability": 0.618506, - "logPopularity": 3.6635616461296463, - "score": 0.4305607973886944, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "In Japan (Mr. 
Big album)", - "cookedLabel": "In Japan", - "pageID": "983201", - "editDist": 0.0, - "labelProbability": 0.618506, - "logPopularity": 3.713572066704308, - "score": 0.43793234745286186, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "most earthquakes", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "most", "type": "ClueToken", "weight": 1.0 }, - { "label": "earthquakes", "type": "ClueSubjectToken", "weight": 2.5 }, - { "label": "in japan?", "type": "ClueNE", "weight": 1.11 } - ] - }, - { - "qId": "wqr003000", - "qText": "what kind of language do china speak?", - "SV": ["speak"], - "lemmaSV": ["speak"], - "LAT": [ - { "synset": "5817200", "text": "content", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "5847274", "text": "category", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "5844071", "text": "concept", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "kind", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Language", - "cookedLabel": "Language", - "pageID": "17524", - "editDist": 0.0, - "labelProbability": 0.25619, - "logPopularity": 5.493061443340548, - "score": 0.23464873939729444, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "China", - "cookedLabel": "China", - "pageID": "5405", - "editDist": 0.0, - "labelProbability": 0.498325, - "logPopularity": 10.34663372761198, - "score": 0.9789357204017793, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr003020", - "qText": "what language do irish people speak?", - "SV": ["speak"], - "lemmaSV": ["speak"], - "LAT": [ - { "synset": "33319", "text": "communication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5658174", "text": "faculty", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6297048", "text": "word", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6294878", "text": "language unit", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "32220", "text": "relation", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7123727", "text": "auditory communication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5624029", "text": "ability", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6399623", "text": "text", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13831419", "text": "part", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5709328", "text": "process", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5778661", "text": "higher cognitive process", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6376912", "text": "matter", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "language", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Irish people", - "cookedLabel": "Irish people", - "pageID": "775859", - "editDist": 0.0, - "labelProbability": 0.712121, - "logPopularity": 7.537962659768208, - "score": 
0.9698631076524369, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "List of Irish people", - "cookedLabel": "irish people", - "pageID": "68826", - "editDist": 0.0, - "labelProbability": 0.136364, - "logPopularity": 1.0986122886681098, - "score": 0.01755838013404058, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Sunday People", - "cookedLabel": "The Sunday People", - "pageID": "689457", - "editDist": 0.0, - "labelProbability": 0.0757576, - "logPopularity": 3.4339872044851463, - "score": 0.052060329004864195, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "irish people", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr003040", - "qText": "where do people speak burmese?", - "SV": ["speak"], - "lemmaSV": ["speak"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "People (magazine)", - "cookedLabel": "People", - "pageID": "507970", - "editDist": 0.0, - "labelProbability": 0.174827, - "logPopularity": 4.584967478670572, - "score": 0.10998291320708259, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "People", - "cookedLabel": "People", - "pageID": "3488351", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 2.0794415416798357, - "score": 0.031159073502995693, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Burmese language", - "cookedLabel": "Burmese language", - "pageID": "338207", - "editDist": 0.0, - "labelProbability": 0.39045, - "logPopularity": 5.736572297479192, - "score": 0.39675240447609883, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Mon script", - "cookedLabel": "Mon script", - "pageID": "28769803", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.6375861597263857, - "score": 0.02936365156468993, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Burmese python", - "cookedLabel": "Burmese python", - "pageID": "819149", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.8066624897703196, - "score": 0.03239717786085767, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Burmese (horse)", - "cookedLabel": "Burmese", - "pageID": "4473528", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 2.9444389791664403, - "score": 0.019568435709475668, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Burmese (cat)", - "cookedLabel": "Burmese", - "pageID": "261787", - "editDist": 0.0, - "labelProbability": 0.0608696, - "logPopularity": 3.332204510175204, - "score": 0.03303719981396769, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "burmese?", "type": "ClueNE", "weight": 
1.11 }] - }, - { - "qId": "wqr003060", - "qText": "where does the band metallica live?", - "SV": ["live"], - "lemmaSV": ["live"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "The Band", - "cookedLabel": "The Band", - "pageID": "30965", - "editDist": 0.0, - "labelProbability": 0.194761, - "logPopularity": 5.811140992976701, - "score": 0.2811461575746737, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Musical ensemble", - "cookedLabel": "Musical ensemble", - "pageID": "20180", - "editDist": 0.0, - "labelProbability": 0.247375, - "logPopularity": 4.5217885770490405, - "score": 0.18686048882274314, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Rede Bandeirantes", - "cookedLabel": "Rede Bandeirantes", - "pageID": "7673602", - "editDist": 0.0, - "labelProbability": 0.194761, - "logPopularity": 4.584967478670572, - "score": 0.15782858823186155, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Band, Mureș", - "cookedLabel": "Band, Mureș", - "pageID": "12023460", - "editDist": 0.0, - "labelProbability": 0.194761, - "logPopularity": 4.6913478822291435, - "score": 0.16649881397835484, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The Band (album)", - "cookedLabel": "The Band", - "pageID": "197540", - "editDist": 0.0, - "labelProbability": 0.194761, - "logPopularity": 4.736198448394496, - "score": 0.17026686952351894, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the band metallica", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "band metallica", "type": "CluePhrase", "weight": 0.99 }, - { "label": "band", "type": "ClueNE", "weight": 1.11 }, - { "label": "metallica", "type": "ClueSubjectToken", "weight": 2.5 } - ] - }, - { - "qId": "wqr003080", - "qText": "when did the arab israeli war start?", - "SV": ["start"], - "lemmaSV": ["start"], - "LAT": [ - { "synset": "15147173", "text": "time", "specificity": "0.0", "type": "QuestionWordLAT" }, - { "synset": "15184543", "text": "date", "specificity": "0.0", "type": "QuestionWordLAT" } - ], - "Concept": [ - { - "fullLabel": "1948 Arab–Israeli War", - "cookedLabel": "1948 Arab–Israeli War", - "pageID": "36197", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 5.552959584921617, - "score": 0.9319954088284126, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Arab–Israeli conflict", - "cookedLabel": "Arab–Israeli conflict", - "pageID": "7960202", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.7535901911063645, - "score": 0.18278984644239052, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [{ "label": "arab israeli war", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr003100", - "qText": "what is the dominican republic part of?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "6005806", "text": "discipline", "specificity": 
"-5.0", "type": "WordnetLAT" }, - { "synset": "5677778", "text": "cognitive state", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "3593583", "text": "item", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "720746", "text": "duty", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "576778", "text": "work", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5817200", "text": "content", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "2452", "text": "thing", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "32912", "text": "possession", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "787849", "text": "attempt", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "24900", "text": "state", "specificity": "-7.0", "type": "WordnetLAT" }, - { "synset": "7034009", "text": "music", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8610818", "text": "line", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "3553", "text": "whole", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "13943868", "text": "condition", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "32220", "text": "relation", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13350663", "text": "assets", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5690411", "text": "curiosity", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7123727", "text": "auditory communication", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "14396987", "text": "psychological state", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "5844071", "text": "concept", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5678554", "text": "concern", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "549839", "text": "portrayal", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6167042", "text": "performing arts", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "549363", "text": "acting", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6008444", "text": "knowledge domain", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "30657", "text": "act", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-8.0", "type": "WordnetLAT" }, - { "synset": "5690773", "text": "interest", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "27365", "text": "location", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "408356", "text": "activity", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6163352", "text": "humanistic discipline", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "7041860", "text": "tune", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "2684", "text": "object", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "33319", "text": "communication", "specificity": "-4.0", "type": "WordnetLAT" }, - { "text": "part", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Dominican Republic", - "cookedLabel": "Dominican Republic", - "pageID": "8060", - "editDist": 0.0, - "labelProbability": 0.671348, - "logPopularity": 8.183676582620658, - "score": 0.935555978041983, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - 
"getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the dominican republic part of", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "dominican republic part of", "type": "CluePhrase", "weight": 0.99 }, - { "label": "dominican republic part", "type": "CluePhrase", "weight": 0.99 }, - { "label": "part", "type": "ClueSubjectToken", "weight": 2.5 } - ] - }, - { - "qId": "wqr003120", - "qText": "who plays harold saxon in doctor who?", - "SV": ["plays"], - "lemmaSV": ["play"], - "LAT": [ - { "synset": "10184702", "text": "health professional", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9644715", "text": "intellectual", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "427931", "text": "diversion", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "10499838", "text": "professional", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "432833", "text": "play", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "408356", "text": "activity", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "10325469", "text": "medical practitioner", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "30657", "text": "act", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "10725264", "text": "theologian", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "9628463", "text": "adult", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "10577282", "text": "scholar", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "doctor", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Story arcs in Doctor Who", - "cookedLabel": "Story arcs in Doctor Who", - "pageID": "5600732", - "editDist": 0.0, - "labelProbability": 0.5, - "logPopularity": 3.5553480614894135, - "score": 0.2912742714975, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Master (Doctor Who)", - "cookedLabel": "Master", - "pageID": "155600", - "editDist": 0.0, - "labelProbability": 0.469388, - "logPopularity": 4.23410650459726, - "score": 0.16678700446241362, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Doctor Who", - "cookedLabel": "Doctor Who", - "pageID": "8209", - "editDist": 0.0, - "labelProbability": 0.571429, - "logPopularity": 6.124683390894205, - "score": 0.3894211731510105, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "doctor who?", "type": "ClueNE", "weight": 1.6 }] - }, - { - "qId": "wqr003140", - "qText": "what kind of political system does iran have?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "5817200", "text": "content", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "5847274", "text": "category", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "5844071", "text": "concept", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "kind", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Coal Run Village, Kentucky", - "cookedLabel": "Coal Run Village, Kentucky", - 
"pageID": "115405", - "editDist": 0.0, - "labelProbability": 0.142857, - "logPopularity": 4.718498871295094, - "score": 0.05631447464088684, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Political system", - "cookedLabel": "Political system", - "pageID": "258724", - "editDist": 0.0, - "labelProbability": 0.238095, - "logPopularity": 2.1972245773362196, - "score": 0.05176398693827355, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Politics of Germany", - "cookedLabel": "Politics of Germany", - "pageID": "11935", - "editDist": 0.0, - "labelProbability": 0.190476, - "logPopularity": 4.1588830833596715, - "score": 0.050415590449092426, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Iran", - "cookedLabel": "Iran", - "pageID": "14653", - "editDist": 0.0, - "labelProbability": 0.726237, - "logPopularity": 11.200212722793202, - "score": 0.995497558213468, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "political system", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr003160", - "qText": "what language do brazil speak?", - "SV": ["speak"], - "lemmaSV": ["speak"], - "LAT": [ - { "synset": "33319", "text": "communication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5658174", "text": "faculty", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6297048", "text": "word", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6294878", "text": "language unit", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "32220", "text": "relation", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7123727", "text": "auditory communication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5624029", "text": "ability", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6399623", "text": "text", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13831419", "text": "part", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5709328", "text": "process", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5778661", "text": "higher cognitive process", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6376912", "text": "matter", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "language", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Brazil", - "cookedLabel": "Brazil", - "pageID": "3383", - "editDist": 0.0, - "labelProbability": 0.671435, - "logPopularity": 10.453572350254236, - "score": 0.9909755240376303, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Brazil national football team", - "cookedLabel": "Brazil national football team", - "pageID": "149286", - "editDist": 0.0, - "labelProbability": 0.0639685, - "logPopularity": 7.789868559054706, - "score": 0.336803569077327, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "brazil", "type": "ClueNE", "weight": 2.6 }] - }, - { 
- "qId": "wqr003180", - "qText": "what type of political system is spain?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "33319", "text": "communication", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "7957410", "text": "biological group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "2855782", "text": "block", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5847533", "text": "kind", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6830481", "text": "written symbol", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6819327", "text": "symbol", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5847274", "text": "category", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "5844071", "text": "concept", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5817200", "text": "content", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "6831828", "text": "character", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6804229", "text": "signal", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "9628463", "text": "adult", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8008892", "text": "taxonomic group", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "type", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Coal Run Village, Kentucky", - "cookedLabel": "Coal Run Village, Kentucky", - "pageID": "115405", - "editDist": 0.0, - "labelProbability": 0.142857, - "logPopularity": 4.718498871295094, - "score": 0.05631447464088684, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Political system", - "cookedLabel": "Political system", - "pageID": "258724", - "editDist": 0.0, - "labelProbability": 0.238095, - "logPopularity": 2.1972245773362196, - "score": 0.05176398693827355, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Politics of Germany", - "cookedLabel": "Politics of Germany", - "pageID": "11935", - "editDist": 0.0, - "labelProbability": 0.190476, - "logPopularity": 4.1588830833596715, - "score": 0.050415590449092426, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Spain", - "cookedLabel": "Spain", - "pageID": "26667", - "editDist": 0.0, - "labelProbability": 0.708413, - "logPopularity": 10.559711378991475, - "score": 0.9928424271025632, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "political system", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr003200", - "qText": "who is the 2011 heisman trophy winner?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "10138501", "text": "gambler", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "9636221", "text": 
"contestant", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "winner", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Whois", - "cookedLabel": "Whois", - "pageID": "4315433", - "editDist": 0.0, - "labelProbability": 0.0673077, - "logPopularity": 3.1780538303479458, - "score": 0.031085894047283118, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "2011", - "cookedLabel": "2011", - "pageID": "36225", - "editDist": 0.0, - "labelProbability": 0.353935, - "logPopularity": 2.6390573296152584, - "score": 0.1901151086570081, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Heisman Trophy", - "cookedLabel": "Heisman Trophy", - "pageID": "288191", - "editDist": 0.0, - "labelProbability": 0.957449, - "logPopularity": 3.6109179126442243, - "score": 0.7133737462450445, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "who is", "type": "ClueNE", "weight": 1.11 }, - { "label": "the 2011 heisman trophy winner", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "2011 heisman trophy winner", "type": "CluePhrase", "weight": 0.99 }, - { "label": "winner", "type": "ClueSubjectToken", "weight": 2.5 } - ] - }, - { - "qId": "wqr003220", - "qText": "what is michael buble's style of music?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "5757616", "text": "taste", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "19308", "text": "natural object", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "5477841", "text": "process", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5817200", "text": "content", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "5847274", "text": "category", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5755999", "text": "discrimination", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "11696293", "text": "reproductive structure", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "4459089", "text": "tool", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5709328", "text": "process", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "4731092", "text": "quality", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "3580409", "text": "instrumentality", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "6611268", "text": "message", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5844071", "text": "concept", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "13108385", "text": "plant organ", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5709891", "text": "basic cognitive process", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "6799486", "text": "direction", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "4819517", "text": "elegance", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13107668", 
"text": "plant part", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "3569147", "text": "implement", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5847533", "text": "kind", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "4923519", "text": "property", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "33319", "text": "communication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "style", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Michael Bublé", - "cookedLabel": "Michael Bublé", - "pageID": "621503", - "editDist": 0.0, - "labelProbability": 0.987684, - "logPopularity": 5.641907070938114, - "score": 0.931140929939333, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "michael buble's style of music", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "michael buble's style", "type": "CluePhrase", "weight": 0.99 }, - { "label": "michael buble", "type": "ClueNE", "weight": 1.11 }, - { "label": "style", "type": "ClueSubjectToken", "weight": 2.5 }, - { "label": "music", "type": "ClueToken", "weight": 1.0 } - ] - }, - { - "qId": "wqr003240", - "qText": "what country did ponce de leon live in?", - "SV": ["live"], - "lemmaSV": ["live"], - "LAT": [ - { "synset": "27365", "text": "location", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8648560", "text": "region", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8024893", "text": "organization", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8508836", "text": "administrative district", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7958392", "text": "people", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8206589", "text": "unit", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8569713", "text": "district", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8376876", "text": "political unit", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8591861", "text": "geographical area", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "country", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Juan Ponce de León", - "cookedLabel": "Juan Ponce de León", - "pageID": "143363", - "editDist": 0.0, - "labelProbability": 0.654321, - "logPopularity": 4.8283137373023015, - "score": 0.8291999303509269, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Daniel Ponce de León", - "cookedLabel": "Daniel Ponce de León", - "pageID": "6145533", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.406719247264253, - "score": 0.15372514941192528, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Fernando Ponce de León", - "cookedLabel": "Fernando Ponce de León", - "pageID": "27239457", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.219507705176107, - "score": 0.13967369476450198, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, 
- "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Nicolás Ponce de León", - "cookedLabel": "Nicolás Ponce de León", - "pageID": "27977311", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.248495242049359, - "score": 0.1417767546521046, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Juan Ponce de León II", - "cookedLabel": "Juan Ponce de León II", - "pageID": "690491", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.454347296253507, - "score": 0.15747964371107007, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [{ "label": "ponce de leon", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr003260", - "qText": "what kind of government does vietnam have?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "5817200", "text": "content", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "5847274", "text": "category", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "5844071", "text": "concept", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "kind", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Government", - "cookedLabel": "Government", - "pageID": "12229", - "editDist": 0.0, - "labelProbability": 0.139222, - "logPopularity": 5.5093883366279774, - "score": 0.15315169535105763, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Vietnam", - "cookedLabel": "Vietnam", - "pageID": "202354", - "editDist": 0.0, - "labelProbability": 0.661886, - "logPopularity": 8.475120414994329, - "score": 0.9697577991806455, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Vietnam national football team", - "cookedLabel": "Vietnam national football team", - "pageID": "1145542", - "editDist": 0.0, - "labelProbability": 0.0791402, - "logPopularity": 6.386879319362645, - "score": 0.19006117366778638, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Vietnam War", - "cookedLabel": "Vietnam War", - "pageID": "32611", - "editDist": 0.0, - "labelProbability": 0.0640296, - "logPopularity": 8.210939733379021, - "score": 0.3954040036925538, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "vietnam", "type": "ClueNE", "weight": 2.6 }] - }, - { - "qId": "wqr003280", - "qText": "who owns the steelers football team?", - "SV": ["owns"], - "lemmaSV": ["own"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Pittsburgh Steelers", - "cookedLabel": "Pittsburgh Steelers", - "pageID": "23338", - "editDist": 0.0, - "labelProbability": 0.875, - "logPopularity": 7.58629630715272, - "score": 0.9487306991039992, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": 
"Football team", - "cookedLabel": "Football team", - "pageID": "10830", - "editDist": 0.0, - "labelProbability": 0.2, - "logPopularity": 2.833213344056216, - "score": 0.04580817491713056, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "steelers football team", "type": "CluePhrase", "weight": 0.99 }, - { "label": "the steelers", "type": "ClueNE", "weight": 1.11 } - ] - }, - { - "qId": "wqr003300", - "qText": "where is rihanna from ethnically?", - "SV": [], - "lemmaSV": [], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Rihanna", - "cookedLabel": "Rihanna", - "pageID": "2110323", - "editDist": 0.0, - "labelProbability": 0.990786, - "logPopularity": 6.236369590203704, - "score": 0.9743285979093527, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Ethnic group", - "cookedLabel": "Ethnic group", - "pageID": "105004", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 5.19295685089021, - "score": 0.8866146234875648, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "ethnically", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr003320", - "qText": "who was the owner of kfc?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "9901459", "text": "businessman", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "9902168", "text": "businessperson", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "9632262", "text": "capitalist", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "owner", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Ownership", - "cookedLabel": "Ownership", - "pageID": "213897", - "editDist": 0.0, - "labelProbability": 0.858809, - "logPopularity": 4.787491742782046, - "score": 0.7436382102020456, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "List of professional sports team owners", - "cookedLabel": "owner", - "pageID": "7964652", - "editDist": 0.0, - "labelProbability": 0.111424, - "logPopularity": 0.6931471805599453, - "score": 0.0029799632449238897, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "KFC", - "cookedLabel": "KFC", - "pageID": "37404", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 4.219507705176107, - "score": 0.8134460615072249, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the owner of kfc", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "owner of kfc", "type": "CluePhrase", "weight": 0.99 }, - { "label": "owner", "type": "ClueNE", "weight": 2.6 } - ] - }, - { - "qId": "wqr003340", - "qText": "what to do in chicago this weekend with kids?", - "SV": ["do"], - "lemmaSV": ["do"], - "LAT": [], - "Concept": [ - { - "fullLabel": "OK Go (album)", - "cookedLabel": "OK Go", - "pageID": "2145907", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.532599493153256, - "score": 
0.0674698807859964, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Chicago", - "cookedLabel": "Chicago", - "pageID": "6886", - "editDist": 0.0, - "labelProbability": 0.760141, - "logPopularity": 9.834887461043872, - "score": 0.9767792272076851, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "This Weekend", - "cookedLabel": "This Weekend", - "pageID": "6458704", - "editDist": 0.0, - "labelProbability": 1.0, - "logPopularity": 1.791759469228055, - "score": 0.5868389547717006, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Kids (film)", - "cookedLabel": "Kids", - "pageID": "615418", - "editDist": 0.0, - "labelProbability": 0.395721, - "logPopularity": 4.634728988229636, - "score": 0.25809738507671115, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Kid Ory", - "cookedLabel": "Kid Ory", - "pageID": "97641", - "editDist": 0.0, - "labelProbability": 0.127217, - "logPopularity": 4.787491742782046, - "score": 0.09989393113722553, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Kid Gleason", - "cookedLabel": "Kid Gleason", - "pageID": "673658", - "editDist": 0.0, - "labelProbability": 0.127217, - "logPopularity": 4.836281906951478, - "score": 0.10255706926785661, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Kid Cudi", - "cookedLabel": "Kid Cudi", - "pageID": "19583036", - "editDist": 0.0, - "labelProbability": 0.127217, - "logPopularity": 5.68697535633982, - "score": 0.15993343942047092, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Kid Rock", - "cookedLabel": "Kid Rock", - "pageID": "17396", - "editDist": 0.0, - "labelProbability": 0.127217, - "logPopularity": 6.042632833682381, - "score": 0.19072088769920859, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "kids", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr003360", - "qText": "what kind of legal system does australia have?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "5817200", "text": "content", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "5847274", "text": "category", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "5844071", "text": "concept", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "kind", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Judicial system of Turkey", - "cookedLabel": "Judicial system of Turkey", - "pageID": "2963429", - "editDist": 0.0, - "labelProbability": 0.984375, - "logPopularity": 3.044522437723423, - "score": 0.5112052886391106, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "List of 
national legal systems", - "cookedLabel": "legal system", - "pageID": "154708", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 2.833213344056216, - "score": 0.025435790265438976, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Australia", - "cookedLabel": "Australia", - "pageID": "4689264", - "editDist": 0.0, - "labelProbability": 0.747368, - "logPopularity": 10.900768235614668, - "score": 0.9951120776945014, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "legal system", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr003380", - "qText": "what currency do the ukraine use?", - "SV": ["use"], - "lemmaSV": ["use"], - "LAT": [ - { "synset": "4923519", "text": "property", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "13394134", "text": "medium of exchange", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7275291", "text": "standard", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "33914", "text": "measure", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "4772610", "text": "prevalence", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5051824", "text": "temporal arrangement", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5051679", "text": "temporal property", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "4771667", "text": "generality", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "13598374", "text": "system of measurement", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "4731092", "text": "quality", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5057266", "text": "presentness", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5053160", "text": "timing", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "currency", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Ukraine", - "cookedLabel": "Ukraine", - "pageID": "31750", - "editDist": 0.0, - "labelProbability": 0.607005, - "logPopularity": 9.140239744296693, - "score": 0.9811013965429601, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Ukrainian Soviet Socialist Republic", - "cookedLabel": "Ukrainian Soviet Socialist Republic", - "pageID": "376732", - "editDist": 0.0, - "labelProbability": 0.0990504, - "logPopularity": 8.187021067343505, - "score": 0.5142325907595652, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "ukraine", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr003400", - "qText": "who plays stewie griffin on family guy?", - "SV": ["plays"], - "lemmaSV": ["play"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Stewie Griffin", - "cookedLabel": "Stewie Griffin", - "pageID": "530189", - "editDist": 0.0, - "labelProbability": 0.945701, - "logPopularity": 4.0943445622221, - "score": 0.8150031749808776, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { 
- "fullLabel": "Family Guy", - "cookedLabel": "Family Guy", - "pageID": "187586", - "editDist": 0.0, - "labelProbability": 0.964212, - "logPopularity": 6.248042874508429, - "score": 0.9258868087401481, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr003420", - "qText": "which airport is closest to barcelona port?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "22119", "text": "artifact", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "3319968", "text": "facility", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "2690851", "text": "airfield", "specificity": "-1.0", "type": "WordnetLAT" }, - { "text": "airport", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Wang Yangming", - "cookedLabel": "Wang Yangming", - "pageID": "619526", - "editDist": 0.0, - "labelProbability": 0.571429, - "logPopularity": 4.430816798843313, - "score": 0.41085711020826166, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Boards of Canada", - "cookedLabel": "Boards of Canada", - "pageID": "101580", - "editDist": 1.0, - "labelProbability": 0.0, - "logPopularity": 4.9344739331306915, - "score": 0.1957567557938664, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "List of airports in South Dakota", - "cookedLabel": "closest", - "pageID": "5689688", - "editDist": 0.0, - "labelProbability": 0.142857, - "logPopularity": 2.0794415416798357, - "score": 0.023176803324641126, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "List of airports in Nebraska", - "cookedLabel": "closest", - "pageID": "5544226", - "editDist": 0.0, - "labelProbability": 0.142857, - "logPopularity": 2.0794415416798357, - "score": 0.023176803324641126, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Port of Barcelona", - "cookedLabel": "Port of Barcelona", - "pageID": "3526295", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 3.258096538021482, - "score": 0.032580826260036944, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [ - { "label": "closest to barcelona port", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "closest", "type": "ClueNE", "weight": 2.6 } - ] - }, - { - "qId": "wqr003440", - "qText": "what should i visit in venice?", - "SV": ["visit"], - "lemmaSV": ["visit"], - "LAT": [ - { "synset": "33319", "text": "communication", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "14647071", "text": "chemical element", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "14928812", "text": "halogen", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "13597304", "text": "definite quantity", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "6831828", "text": "character", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "19793", "text": "substance", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "13763162", "text": "digit", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "33914", "text": "measure", 
"specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "13750609", "text": "integer", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6819327", "text": "symbol", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "6830481", "text": "written symbol", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "6841868", "text": "letter", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6804229", "text": "signal", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "13603216", "text": "number", "specificity": "-3.0", "type": "WordnetLAT" }, - { "text": "i", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Should I Stay or Should I Go", - "cookedLabel": "Should I Stay or Should I Go", - "pageID": "3629521", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.31748811353631, - "score": 0.028619481308881617, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Venice", - "cookedLabel": "Venice", - "pageID": "32616", - "editDist": 0.0, - "labelProbability": 0.741286, - "logPopularity": 7.216709486709457, - "score": 0.8891086358339383, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Venice, Los Angeles", - "cookedLabel": "Venice, Los Angeles", - "pageID": "32579", - "editDist": 0.0, - "labelProbability": 0.0568927, - "logPopularity": 5.883322388488279, - "score": 0.054677647877007574, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Republic of Venice", - "cookedLabel": "Republic of Venice", - "pageID": "613492", - "editDist": 0.0, - "labelProbability": 0.074145, - "logPopularity": 6.073044534100405, - "score": 0.06556257763924893, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "venice", "type": "ClueNE", "weight": 1.1 }] - }, - { - "qId": "wqr003460", - "qText": "what country is beside france?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "27365", "text": "location", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8648560", "text": "region", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8024893", "text": "organization", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8508836", "text": "administrative district", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7958392", "text": "people", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8206589", "text": "unit", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8569713", "text": "district", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8376876", "text": "political unit", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8591861", "text": "geographical area", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "country", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Country Is", - "cookedLabel": "Country Is", - "pageID": "25433212", - "editDist": 0.0, - 
"labelProbability": 0.0, - "logPopularity": 3.8501476017100584, - "score": 0.008153680212262971, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "France", - "cookedLabel": "France", - "pageID": "5843419", - "editDist": 3.0, - "labelProbability": 0.0, - "logPopularity": 11.423164857762606, - "score": 0.95697806051839, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [{ "label": "beside france", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr003480", - "qText": "who started mary kay?", - "SV": ["started"], - "lemmaSV": ["start"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Mary Kay", - "cookedLabel": "Mary Kay", - "pageID": "1583427", - "editDist": 0.0, - "labelProbability": 0.90873, - "logPopularity": 4.110873864173311, - "score": 0.7896527480944822, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr003500", - "qText": "who was the voice of simba?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "9794206", "text": "advocate", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "33319", "text": "communication", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "4923519", "text": "property", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "7041860", "text": "tune", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10435383", "text": "performer", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "9639952", "text": "entertainer", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "30657", "text": "act", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "13819354", "text": "linguistic relation", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7034009", "text": "music", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "44888", "text": "implementation", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "4731092", "text": "quality", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "29677", "text": "event", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "4990371", "text": "sound property", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7123727", "text": "auditory communication", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "7298313", "text": "happening", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "32220", "text": "relation", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "13818991", "text": "grammatical relation", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "173531", "text": "means", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5208927", "text": "physical ability", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "10619214", "text": "singer", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6262268", "text": "communication", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "10360025", "text": "musician", "specificity": "-2.0", "type": 
"WordnetLAT" }, - { "synset": "4988388", "text": "sound", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7385893", "text": "sound", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7154581", "text": "expression", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5207437", "text": "ability", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "voice", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Simba", - "cookedLabel": "Simba", - "pageID": "983014", - "editDist": 0.0, - "labelProbability": 0.617504, - "logPopularity": 4.31748811353631, - "score": 0.44354266871782344, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Simba Makoni", - "cookedLabel": "Simba Makoni", - "pageID": "15257568", - "editDist": 0.0, - "labelProbability": 0.123501, - "logPopularity": 4.6913478822291435, - "score": 0.037002065590378067, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Simba Technologies", - "cookedLabel": "Simba Technologies", - "pageID": "9097838", - "editDist": 0.0, - "labelProbability": 0.123501, - "logPopularity": 3.7612001156935624, - "score": 0.021517087069021317, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Simba Rebellion", - "cookedLabel": "Simba Rebellion", - "pageID": "22017800", - "editDist": 0.0, - "labelProbability": 0.123501, - "logPopularity": 4.276666119016055, - "score": 0.029088793397246016, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Simba (film)", - "cookedLabel": "Simba", - "pageID": "27280272", - "editDist": 0.0, - "labelProbability": 0.123501, - "logPopularity": 3.9889840465642745, - "score": 0.024590718006657068, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the voice of simba", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "voice of simba", "type": "CluePhrase", "weight": 0.99 }, - { "label": "voice", "type": "ClueSubjectToken", "weight": 2.5 }, - { "label": "simba", "type": "ClueNE", "weight": 1.1 } - ] - }, - { - "qId": "wqr003520", - "qText": "where was the first ford motor company located?", - "SV": ["located"], - "lemmaSV": ["locate"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "The First (album)", - "cookedLabel": "The First", - "pageID": "33680093", - "editDist": 0.0, - "labelProbability": 0.270512, - "logPopularity": 4.795790545596741, - "score": 0.17729699269180282, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The First (musical)", - "cookedLabel": "The First", - "pageID": "8438202", - "editDist": 0.0, - "labelProbability": 0.270512, - "logPopularity": 3.6375861597263857, - "score": 0.09711679219405046, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "The First 48", - "cookedLabel": "The First 48", - "pageID": "9686259", - "editDist": 0.0, - "labelProbability": 0.270512, - "logPopularity": 
3.8501476017100584, - "score": 0.1088885732049012, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Ford Motor Company", - "cookedLabel": "Ford Motor Company", - "pageID": "30433662", - "editDist": 0.0, - "labelProbability": 0.983606, - "logPopularity": 7.491645473605133, - "score": 0.990910423790686, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the first ford motor company", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "first ford motor company", "type": "CluePhrase", "weight": 0.99 }, - { "label": "the first", "type": "ClueNE", "weight": 1.11 } - ] - }, - { - "qId": "wqr003540", - "qText": "when did the mets win the pennant?", - "SV": ["win"], - "lemmaSV": ["win"], - "LAT": [ - { "synset": "15147173", "text": "time", "specificity": "0.0", "type": "QuestionWordLAT" }, - { "synset": "15184543", "text": "date", "specificity": "0.0", "type": "QuestionWordLAT" } - ], - "Concept": [ - { - "fullLabel": "New York Mets", - "cookedLabel": "New York Mets", - "pageID": "21728", - "editDist": 0.0, - "labelProbability": 0.529412, - "logPopularity": 7.575071699507561, - "score": 0.9104279922078052, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Jermaine Pennant", - "cookedLabel": "Jermaine Pennant", - "pageID": "1024810", - "editDist": 0.0, - "labelProbability": 0.199319, - "logPopularity": 5.056245805348308, - "score": 0.15371741856367757, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Pennant, Powys", - "cookedLabel": "Pennant, Powys", - "pageID": "30960035", - "editDist": 0.0, - "labelProbability": 0.203578, - "logPopularity": 3.5553480614894135, - "score": 0.07000041333163418, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Pennant Hills, New South Wales", - "cookedLabel": "Pennant Hills, New South Wales", - "pageID": "1110694", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 4.912654885736052, - "score": 0.061044378088798555, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Pennant, Saskatchewan", - "cookedLabel": "Pennant, Saskatchewan", - "pageID": "15774015", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 5.19295685089021, - "score": 0.07142594132589199, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Thomas Pennant", - "cookedLabel": "Thomas Pennant", - "pageID": "361699", - "editDist": 0.0, - "labelProbability": 0.183986, - "logPopularity": 4.624972813284271, - "score": 0.11557944742461398, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "the mets", "type": "ClueNE", "weight": 2.8000000000000003 }] - }, - { - "qId": "wqr003560", - "qText": "who wrote 2 timothy 4?", - "SV": ["wrote"], - "lemmaSV": ["write"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Second Epistle to Timothy", - 
"cookedLabel": "Second Epistle to Timothy", - "pageID": "1751383", - "editDist": 0.0, - "labelProbability": 0.984968, - "logPopularity": 3.58351893845611, - "score": 0.7353383189937804, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Second Epistle to Timothy", - "cookedLabel": "Second Epistle to Timothy", - "pageID": "1751383", - "editDist": 1.0, - "labelProbability": 0.0, - "logPopularity": 3.58351893845611, - "score": 0.03843269416191313, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - } - ], - "Clue": [ - { "label": "2 timothy", "type": "ClueNgram", "weight": 1.01 }, - { "label": "timothy 4", "type": "ClueNE", "weight": 1.1 } - ] - }, - { - "qId": "wqr003580", - "qText": "where did fred west work?", - "SV": ["work"], - "lemmaSV": ["work"], - "LAT": [{ "synset": "27365", "text": "location", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Fred West", - "cookedLabel": "Fred West", - "pageID": "200822", - "editDist": 0.0, - "labelProbability": 0.99942, - "logPopularity": 4.465908118654584, - "score": 0.9502055963037747, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr003600", - "qText": "what city was nelson mandela born in?", - "SV": ["born"], - "lemmaSV": ["bear"], - "LAT": [ - { "synset": "27365", "text": "location", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8693705", "text": "urban area", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8591861", "text": "geographical area", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8508836", "text": "administrative district", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7991473", "text": "gathering", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8569713", "text": "district", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8242502", "text": "municipality", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8643858", "text": "municipality", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8648560", "text": "region", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-4.0", "type": "WordnetLAT" }, - { "text": "city", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Nelson Mandela", - "cookedLabel": "Nelson Mandela", - "pageID": "21492751", - "editDist": 0.0, - "labelProbability": 0.996993, - "logPopularity": 5.749392985908253, - "score": 0.9760535385100926, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr003620", - "qText": "what is currency in panama?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "4923519", "text": "property", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "13394134", "text": "medium of exchange", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7275291", "text": "standard", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "33914", "text": "measure", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "4772610", "text": "prevalence", "specificity": 
"-1.0", "type": "WordnetLAT" }, - { "synset": "5051824", "text": "temporal arrangement", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5051679", "text": "temporal property", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "4771667", "text": "generality", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "13598374", "text": "system of measurement", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "4731092", "text": "quality", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5057266", "text": "presentness", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5053160", "text": "timing", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "currency", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Currency", - "cookedLabel": "Currency", - "pageID": "5665", - "editDist": 0.0, - "labelProbability": 0.800259, - "logPopularity": 3.2188758248682006, - "score": 0.38218402154685344, - "getByLAT": 1, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Panama", - "cookedLabel": "Panama", - "pageID": "22997", - "editDist": 0.0, - "labelProbability": 0.697723, - "logPopularity": 7.716906135298388, - "score": 0.8985798300836622, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "currency in panama", "type": "ClueSubjectPhrase", "weight": 2.7 }] - }, - { - "qId": "wqr003640", - "qText": "what type of voting system does australia have?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "33319", "text": "communication", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "7957410", "text": "biological group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "2855782", "text": "block", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5847533", "text": "kind", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6830481", "text": "written symbol", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "6819327", "text": "symbol", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5847274", "text": "category", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "5844071", "text": "concept", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5817200", "text": "content", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "6831828", "text": "character", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "6804229", "text": "signal", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "9628463", "text": "adult", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8008892", "text": "taxonomic group", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "type", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Voting system", - "cookedLabel": "Voting 
system", - "pageID": "29066482", - "editDist": 0.0, - "labelProbability": 0.955556, - "logPopularity": 5.288267030694535, - "score": 0.9041743837111924, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Australia", - "cookedLabel": "Australia", - "pageID": "4689264", - "editDist": 0.0, - "labelProbability": 0.747368, - "logPopularity": 10.900768235614668, - "score": 0.9951120776945014, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr003660", - "qText": "who started southwest airlines?", - "SV": ["started"], - "lemmaSV": ["start"], - "LAT": [{ "synset": "7846", "text": "person", "specificity": "0.0", "type": "QuestionWordLAT" }], - "Concept": [ - { - "fullLabel": "Southwest Airlines", - "cookedLabel": "Southwest Airlines", - "pageID": "63032", - "editDist": 0.0, - "labelProbability": 0.998978, - "logPopularity": 5.0369526024136295, - "score": 0.9083202536141494, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "southwest airlines?", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr003680", - "qText": "who does new zealand import from?", - "SV": ["import"], - "lemmaSV": ["import"], - "LAT": [ - { "synset": "2684", "text": "object", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "9339360", "text": "island", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "9357302", "text": "land", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "zealand", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "New Zealand", - "cookedLabel": "New Zealand", - "pageID": "4913064", - "editDist": 0.0, - "labelProbability": 0.681166, - "logPopularity": 9.861310473636943, - "score": 0.9641762949090023, - "getByLAT": 1, - "getByNE": 1, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [] - }, - { - "qId": "wqr003700", - "qText": "what kind of political system is canada?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "5817200", "text": "content", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "5847274", "text": "category", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "5844071", "text": "concept", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "kind", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Coal Run Village, Kentucky", - "cookedLabel": "Coal Run Village, Kentucky", - "pageID": "115405", - "editDist": 0.0, - "labelProbability": 0.142857, - "logPopularity": 4.718498871295094, - "score": 0.05631447464088684, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Political system", - "cookedLabel": "Political system", - "pageID": "258724", - "editDist": 0.0, - "labelProbability": 0.238095, - "logPopularity": 2.1972245773362196, - "score": 0.05176398693827355, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Politics of Germany", - "cookedLabel": 
"Politics of Germany", - "pageID": "11935", - "editDist": 0.0, - "labelProbability": 0.190476, - "logPopularity": 4.1588830833596715, - "score": 0.050415590449092426, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - }, - { - "fullLabel": "Canada", - "cookedLabel": "Canada", - "pageID": "5042916", - "editDist": 0.0, - "labelProbability": 0.755844, - "logPopularity": 11.329182899020827, - "score": 0.9963598807069324, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 1, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "political system", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr003720", - "qText": "what countries are included in the netherlands?", - "SV": ["included"], - "lemmaSV": ["include"], - "LAT": [ - { "synset": "27365", "text": "location", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "8648560", "text": "region", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8024893", "text": "organization", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "8508836", "text": "administrative district", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7958392", "text": "people", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8206589", "text": "unit", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8569713", "text": "district", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "8376876", "text": "political unit", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "8591861", "text": "geographical area", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7967506", "text": "social group", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "31563", "text": "group", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "countries", "specificity": "0.0", "type": "LAT" }, - { "text": "country", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "Netherlands", - "cookedLabel": "Netherlands", - "pageID": "21148", - "editDist": 0.0, - "labelProbability": 0.625727, - "logPopularity": 10.223031598136654, - "score": 0.9756182783081621, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - }, - { - "fullLabel": "Netherlands national football team", - "cookedLabel": "Netherlands national football team", - "pageID": "9647657", - "editDist": 0.0, - "labelProbability": 0.0556568, - "logPopularity": 7.611842399580417, - "score": 0.18487179702788376, - "getByLAT": 0, - "getByNE": 1, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 0, - "getByCWLookup": 1 - } - ], - "Clue": [{ "label": "netherlands", "type": "ClueNE", "weight": 1.11 }] - }, - { - "qId": "wqr003740", - "qText": "what are the major imports and exports of canada?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "9652940", "text": "traveler", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "10123254", "text": "foreigner", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "4731092", "text": "quality", "specificity": "-5.0", "type": "WordnetLAT" }, - { "synset": "5145753", "text": "value", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "24444", "text": "attribute", "specificity": "-6.0", "type": "WordnetLAT" }, - { "synset": "23451", "text": "cognition", 
"specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "5842164", "text": "idea", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "3080712", "text": "commodity", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5145473", "text": "worth", "specificity": "-4.0", "type": "WordnetLAT" }, - { "synset": "5177340", "text": "significance", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "7846", "text": "person", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "6611268", "text": "message", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5928460", "text": "meaning", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "5817200", "text": "content", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "5175788", "text": "importance", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "33319", "text": "communication", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "imports", "specificity": "0.0", "type": "LAT" }, - { "text": "import", "specificity": "0.0", "type": "ImplicitQLAT" } - ], - "Concept": [ - { - "fullLabel": "Major", - "cookedLabel": "Major", - "pageID": "201920", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 6.82001636467413, - "score": 0.16955685053201597, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "The Major", - "cookedLabel": "The Major", - "pageID": "9600545", - "editDist": 0.0, - "labelProbability": 0.0, - "logPopularity": 2.995732273553991, - "score": 0.020167690730618193, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 0 - }, - { - "fullLabel": "Canada", - "cookedLabel": "Canada", - "pageID": "5042916", - "editDist": 0.0, - "labelProbability": 0.755844, - "logPopularity": 11.329182899020827, - "score": 0.9902048363115007, - "getByLAT": 0, - "getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "the major imports and exports of canada", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "major imports and exports of canada", "type": "CluePhrase", "weight": 0.99 }, - { "label": "major imports and exports", "type": "CluePhrase", "weight": 0.99 }, - { "label": "the major", "type": "ClueNE", "weight": 1.11 }, - { "label": "imports", "type": "ClueSubjectToken", "weight": 2.5 }, - { "label": "exports", "type": "ClueToken", "weight": 1.0 } - ] - }, - { - "qId": "wqr003760", - "qText": "what movie is angelina jolie directing?", - "SV": [], - "lemmaSV": [], - "LAT": [ - { "synset": "4014270", "text": "product", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "22119", "text": "artifact", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "29677", "text": "event", "specificity": "-3.0", "type": "WordnetLAT" }, - { "synset": "6631572", "text": "show", "specificity": "-1.0", "type": "WordnetLAT" }, - { "synset": "3133774", "text": "creation", "specificity": "-2.0", "type": "WordnetLAT" }, - { "synset": "7303344", "text": "social event", "specificity": "-2.0", "type": "WordnetLAT" }, - { "text": "movie", "specificity": "0.0", "type": "LAT" } - ], - "Concept": [ - { - "fullLabel": "Angelina Jolie", - "cookedLabel": "Angelina Jolie", - "pageID": "5792809", - "editDist": 0.0, - "labelProbability": 0.995458, - "logPopularity": 5.575949103146316, - "score": 0.9059818840251335, - "getByLAT": 0, - 
"getByNE": 0, - "getBySubject": 0, - "getByNgram": 0, - "getByFuzzyLookup": 1, - "getByCWLookup": 1 - } - ], - "Clue": [ - { "label": "angelina jolie directing", "type": "ClueSubjectPhrase", "weight": 2.7 }, - { "label": "directing", "type": "ClueSubjectToken", "weight": 2.5 } - ] - } -] diff --git a/ChatQnA/deprecated/deployment/nginx/.env b/ChatQnA/deprecated/deployment/nginx/.env deleted file mode 100644 index bc3da51d3..000000000 --- a/ChatQnA/deprecated/deployment/nginx/.env +++ /dev/null @@ -1,7 +0,0 @@ -HUGGING_FACE_HUB_TOKEN= -volume=./data -model=meta-llama/Llama-2-13b-chat-hf -MAX_TOTAL_TOKENS=2000 -ENABLE_HPU_GRAPH=True -PT_HPU_ENABLE_LAZY_COLLECTIVES=true -OMPI_MCA_btl_vader_single_copy_mechanism=none \ No newline at end of file diff --git a/ChatQnA/deprecated/deployment/nginx/README.md b/ChatQnA/deprecated/deployment/nginx/README.md deleted file mode 100644 index 3359c3248..000000000 --- a/ChatQnA/deprecated/deployment/nginx/README.md +++ /dev/null @@ -1,9 +0,0 @@ -## Launch 8 models on 8 separate Gaudi2 cards: - -Add HuggingFace access token in .env
-Optionally change the model name and the linked volume directory used to store the downloaded model
-Run the following command in your terminal to launch the nginx load balancer and 8 tgi_gaudi containers (one for each Gaudi card): - -``` -docker compose -f docker-compose.yml up -d -``` diff --git a/ChatQnA/deprecated/deployment/nginx/docker-compose.yml b/ChatQnA/deprecated/deployment/nginx/docker-compose.yml deleted file mode 100644 index 85d8861b8..000000000 --- a/ChatQnA/deprecated/deployment/nginx/docker-compose.yml +++ /dev/null @@ -1,139 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -version: "3" -services: - gaudi0: - image: tgi_gaudi - runtime: habana - ports: - - "8081:80" - env_file: - - .env - environment: - - HABANA_VISIBLE_DEVICES=0 - volumes: - - $volume:/data - cap_add: - - sys_nice - ipc: "host" - command: ["--model-id", "$model"] - gaudi1: - image: tgi_gaudi - runtime: habana - ports: - - "8082:80" - env_file: - - .env - environment: - - HABANA_VISIBLE_DEVICES=1 - volumes: - - $volume:/data - cap_add: - - sys_nice - ipc: "host" - command: ["--model-id", "$model"] - gaudi2: - image: tgi_gaudi - runtime: habana - ports: - - "8083:80" - env_file: - - .env - environment: - - HABANA_VISIBLE_DEVICES=2 - volumes: - - $volume:/data - cap_add: - - sys_nice - ipc: "host" - command: ["--model-id", "$model"] - gaudi3: - image: tgi_gaudi - runtime: habana - ports: - - "8084:80" - env_file: - - .env - environment: - - HABANA_VISIBLE_DEVICES=3 - volumes: - - $volume:/data - cap_add: - - sys_nice - ipc: "host" - command: ["--model-id", "$model"] - gaudi4: - image: tgi_gaudi - runtime: habana - ports: - - "8085:80" - env_file: - - .env - environment: - - HABANA_VISIBLE_DEVICES=4 - volumes: - - $volume:/data - cap_add: - - sys_nice - ipc: "host" - command: ["--model-id", "$model"] - gaudi5: - image: tgi_gaudi - runtime: habana - ports: - - "8086:80" - env_file: - - .env - environment: - - HABANA_VISIBLE_DEVICES=5 - volumes: - - $volume:/data - cap_add: - - sys_nice - ipc: "host" - command: ["--model-id", "$model"] - gaudi6: - image: tgi_gaudi - runtime: habana - ports: - - "8087:80" - env_file: - - .env - environment: - - HABANA_VISIBLE_DEVICES=6 - volumes: - - $volume:/data - cap_add: - - sys_nice - ipc: "host" - command: ["--model-id", "$model"] - gaudi7: - image: tgi_gaudi - runtime: habana - ports: - - "8088:80" - env_file: - - .env - environment: - - HABANA_VISIBLE_DEVICES=7 - volumes: - - $volume:/data - cap_add: - - sys_nice - ipc: "host" - command: ["--model-id", "$model"] - nginx: - build: ./nginx - ports: - - "80:80" - depends_on: - - gaudi0 - - gaudi1 - - gaudi2 - - gaudi3 - - gaudi4 - - gaudi5 - - gaudi6 - - gaudi7 diff --git a/ChatQnA/deprecated/deployment/nginx/nginx/Dockerfile b/ChatQnA/deprecated/deployment/nginx/nginx/Dockerfile deleted file mode 100644 index b4369d8de..000000000 --- a/ChatQnA/deprecated/deployment/nginx/nginx/Dockerfile +++ /dev/null @@ -1,15 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# FROM nginx - -# RUN rm /etc/nginx/conf.d/default.conf -# COPY nginx.conf /etc/nginx/conf.d/default.conf - - -FROM nginx:latest -RUN rm /etc/nginx/conf.d/default.conf -COPY nginx.conf /etc/nginx/conf.d/default.conf -EXPOSE 80 -CMD ["nginx", "-g", "daemon off;"] \ No newline at end of file diff --git a/ChatQnA/deprecated/deployment/nginx/nginx/nginx.conf b/ChatQnA/deprecated/deployment/nginx/nginx/nginx.conf deleted file mode 100644 index 3700ec4f9..000000000 --- a/ChatQnA/deprecated/deployment/nginx/nginx/nginx.conf +++ /dev/null @@ -1,23 +0,0 @@ -upstream backend { - 
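  # least_conn sends each request to the backend with the fewest active
  # connections, a sensible policy when generation latency varies per request;
  # max_fails=3 fail_timeout=30s (below) takes an unhealthy backend out of
  # rotation for 30 seconds after three failed attempts within that window.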
least_conn; - server gaudi0:80 max_fails=3 fail_timeout=30s; - server gaudi1:80 max_fails=3 fail_timeout=30s; - server gaudi2:80 max_fails=3 fail_timeout=30s; - server gaudi3:80 max_fails=3 fail_timeout=30s; - server gaudi4:80 max_fails=3 fail_timeout=30s; - server gaudi5:80 max_fails=3 fail_timeout=30s; - server gaudi6:80 max_fails=3 fail_timeout=30s; - server gaudi7:80 max_fails=3 fail_timeout=30s; -} - -server { - listen 80; - - location / { - proxy_pass http://backend; - proxy_set_header Host $host; - proxy_set_header X-Real-IP $remote_addr; - proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; - proxy_set_header X-Forwarded-Proto $scheme; - } -} diff --git a/ChatQnA/deprecated/langchain/chroma/README.md b/ChatQnA/deprecated/langchain/chroma/README.md deleted file mode 100644 index e69de29bb..000000000 diff --git a/ChatQnA/deprecated/langchain/docker/Dockerfile b/ChatQnA/deprecated/langchain/docker/Dockerfile deleted file mode 100644 index ace02d45f..000000000 --- a/ChatQnA/deprecated/langchain/docker/Dockerfile +++ /dev/null @@ -1,38 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# SCRIPT USAGE NOTICE: By downloading and using any script file included -# with the associated software package (such as files with .bat, .cmd, or -# .JS extensions, Docker files, or any other type of file that, when executed, -# automatically downloads and/or installs files onto your system) (the “Script File”), -# it is your obligation to review the Script File to understand what files (e.g., -# other software, AI models, AI Datasets) the Script File will download to your system -# (“Downloaded Files”). Furthermore, by downloading and using the Downloaded Files, -# even if they are installed through a silent install, you agree to any and all -# terms and conditions associated with such files, including but not limited to, -# license terms, notices, or disclaimers. - -FROM langchain/langchain:latest - -RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \ - libgl1-mesa-glx \ - libjemalloc-dev - -RUN useradd -m -s /bin/bash user && \ - mkdir -p /home/user && \ - chown -R user /home/user/ - -USER user - -COPY requirements.txt /tmp/requirements.txt - -RUN pip install --no-cache-dir --upgrade pip && \ - pip install --no-cache-dir -r /tmp/requirements.txt - -ENV PYTHONPATH=$PYTHONPATH:/ws:/home/user:/home/user/qna-app/app - -WORKDIR /home/user/qna-app -COPY qna-app /home/user/qna-app - -ENTRYPOINT ["/usr/bin/sleep", "infinity"] diff --git a/ChatQnA/deprecated/langchain/docker/docker-compose-qdrant.yml b/ChatQnA/deprecated/langchain/docker/docker-compose-qdrant.yml deleted file mode 100644 index cc508cce9..000000000 --- a/ChatQnA/deprecated/langchain/docker/docker-compose-qdrant.yml +++ /dev/null @@ -1,35 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -services: - qdrant-vector-db: - image: qdrant/qdrant:v1.9.0 - container_name: qdrant-vector-db - ports: - - "6333:6333" - - "6334:6334" - qna-rag-qdrant-server: - build: - args: - https_proxy: ${https_proxy} - http_proxy: ${http_proxy} - dockerfile: Dockerfile - context: . 
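      # Image is built locally from the Dockerfile in this directory; host
      # networking (network_mode below) lets the server reach Qdrant on
      # localhost:6333 without extra port mappings.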
- image: intel/gen-ai-examples:qna-rag-qdrant-server - container_name: qna-rag-qdrant-server - environment: - - https_proxy=${https_proxy} - - HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} - - "EMBED_MODEL=BAAI/bge-base-en-v1.5" - - "VECTOR_DATABASE=QDRANT" - - "TGI_LLM_ENDPOINT=http://localhost:8080" - # "TEI_ENDPOINT="http://xxx.xxx.xxx.xxx:9090" - To use a custom TEI endpoint - ulimits: - memlock: - soft: -1 # Set memlock to unlimited (no soft or hard limit) - hard: -1 - volumes: - - ../qdrant:/ws - - ../test:/test - network_mode: "host" diff --git a/ChatQnA/deprecated/langchain/docker/docker-compose.yml b/ChatQnA/deprecated/langchain/docker/docker-compose.yml deleted file mode 100644 index 56a4c687a..000000000 --- a/ChatQnA/deprecated/langchain/docker/docker-compose.yml +++ /dev/null @@ -1,44 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -services: - redis-vector-db: - image: redis/redis-stack:7.2.0-v9 - container_name: redis-vector-db - ports: - - "6379:6379" - - "8001:8001" - qna-rag-redis-server: - build: - args: - https_proxy: ${https_proxy} - http_proxy: ${http_proxy} - dockerfile: Dockerfile - context: . - image: intel/gen-ai-examples:qna-rag-redis-server - container_name: qna-rag-redis-server - environment: - - https_proxy=${https_proxy} - - http_proxy=${http_proxy} - - HTTP_PROXY=${HTTP_PROXY} - - HTTPS_PROXY=${HTTPS_PROXY} - - no_proxy=${no_proxy} - - SOCKS_PROXY=${SOCKS_PROXY} - - socks_proxy=${socks_proxy} - - FTP_PROXY=${FTP_PROXY} - - ftp_proxy=${ftp_proxy} - - HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} - - CONFLUENCE_ACCESS_TOKEN=${CONFLUENCE_ACCESS_TOKEN} - - "REDIS_PORT=6379" - - "EMBED_MODEL=BAAI/bge-base-en-v1.5" - - "REDIS_SCHEMA=schema_dim_768.yml" - - "VECTOR_DATABASE=REDIS" - ulimits: - memlock: - soft: -1 # Set memlock to unlimited (no soft or hard limit) - hard: -1 - volumes: - - ../redis:/ws - - ../test:/test - network_mode: "host" diff --git a/ChatQnA/deprecated/langchain/docker/qna-app/Dockerfile b/ChatQnA/deprecated/langchain/docker/qna-app/Dockerfile deleted file mode 100644 index caac655e2..000000000 --- a/ChatQnA/deprecated/langchain/docker/qna-app/Dockerfile +++ /dev/null @@ -1,25 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -FROM python:3.11-slim - -RUN pip install --no-cache-dir poetry==1.6.1 - -RUN poetry config virtualenvs.create false - -WORKDIR /code - -COPY ./pyproject.toml ./README.md ./poetry.lock* ./ - -COPY ./package[s] ./packages - -RUN poetry install --no-interaction --no-ansi --no-root - -COPY ./app ./app - -RUN poetry install --no-interaction --no-ansi - -EXPOSE 8080 - -CMD ["uvicorn", "app.server:app", "--host", "0.0.0.0", "--port", "8080"] diff --git a/ChatQnA/deprecated/langchain/docker/qna-app/README.md b/ChatQnA/deprecated/langchain/docker/qna-app/README.md deleted file mode 100644 index c76e0d1af..000000000 --- a/ChatQnA/deprecated/langchain/docker/qna-app/README.md +++ /dev/null @@ -1,79 +0,0 @@ -# my-app - -## Installation - -Install the LangChain CLI if you haven't yet - -```bash -pip install -U langchain-cli -``` - -## Adding packages - -```bash -# adding packages from -# https://github.com/langchain-ai/langchain/tree/master/templates -langchain app add $PROJECT_NAME - -# adding custom GitHub repo packages -langchain app add --repo $OWNER/$REPO -# or with whole git string (supports other git providers): -# langchain app add git+https://github.com/hwchase17/chain-of-verification - -# with a custom api mount point (defaults 
to `/{package_name}`) -langchain app add $PROJECT_NAME --api_path=/my/custom/path/rag -``` - -Note: you remove packages by their api path - -```bash -langchain app remove my/custom/path/rag -``` - -## Setup LangSmith (Optional) - -LangSmith will help us trace, monitor and debug LangChain applications. -LangSmith is currently in private beta, you can sign up [here](https://smith.langchain.com/). -If you don't have access, you can skip this section - -```shell -export LANGCHAIN_TRACING_V2=true -export LANGCHAIN_API_KEY= -export LANGCHAIN_PROJECT= # if not specified, defaults to "default" -``` - -## Launch LangServe - -```bash -langchain serve -``` - -## Running in Docker - -This project folder includes a Dockerfile that allows you to easily build and host your LangServe app. - -### Building the Image - -To build the image, you simply: - -```shell -docker build . -t my-langserve-app -``` - -If you tag your image with something other than `my-langserve-app`, -note it for use in the next step. - -### Running the Image Locally - -To run the image, you'll need to include any environment variables -necessary for your application. - -In the below example, we inject the `OPENAI_API_KEY` environment -variable with the value set in my local environment -(`$OPENAI_API_KEY`) - -We also expose port 8080 with the `-p 8080:8080` option. - -```shell -docker run -e OPENAI_API_KEY=$OPENAI_API_KEY -p 8080:8080 my-langserve-app -``` diff --git a/ChatQnA/deprecated/langchain/docker/qna-app/app/__init__.py b/ChatQnA/deprecated/langchain/docker/qna-app/app/__init__.py deleted file mode 100644 index c495d1896..000000000 --- a/ChatQnA/deprecated/langchain/docker/qna-app/app/__init__.py +++ /dev/null @@ -1,6 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# diff --git a/ChatQnA/deprecated/langchain/docker/qna-app/app/guardrails.py b/ChatQnA/deprecated/langchain/docker/qna-app/app/guardrails.py deleted file mode 100644 index f3a0b614a..000000000 --- a/ChatQnA/deprecated/langchain/docker/qna-app/app/guardrails.py +++ /dev/null @@ -1,90 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -unsafe_categories = """O1: Violence and Hate. -Should not -- Help people plan or engage in violence. -- Encourage people to engage in violence. -- Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics like their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability. -- Encourage discrimination against people based on those characteristics and should not use slurs or create insults based on those characteristics. -Can -- Provide information on violence and discrimination. -- Discuss the topics of hate and violence, and can discuss historical events involving violence. -O2: Sexual Content. -Should not -- Engage in sexually explicit (i.e., erotic) conversations and should not encourage people to engage in sexual activities. -Can -- Discuss topics related to sexuality and sexual education. -- Provide general information regarding sexual activities. -- Provide guidance on topics related to sexual health. -O3: Criminal Planning. -Should not -- Help people plan or engage in criminal activities like theft, kidnapping, or financial crimes, especially when the person indicates that they may act on the information provided to them. 
-- Encourage people to engage in criminal activities or condone criminal behavior. -Can -- Help people understand how crimes are committed, provided that the response is not so detailed that it could be used to provide a roadmap for future crimes. -- Help people understand how systems work that are used to prevent crimes, provided those descriptions do not provide detailed information on how to defeat those systems. -O4: Guns and Illegal Weapons. -Should not -- Assist in the planning or execution of firearm or illegal weapons crimes, especially when the person indicates that they may act on the information provided to them. -- Encourage people to engage in firearm or illegal weapons crimes or condone such behavior. -Can -- Discuss firearms and the arguments for and against firearm ownership. -O5: Regulated or Controlled Substances. -Should not -- Encourage or assist people in the trafficking, consumption, or use of regulated or controlled substances, including but not limited to illegal drugs, tobacco, alcohol, and cannabis. -- Assist or encourage people to create such substances when it is illegal to do so. -Can -- Discuss the use of regulated or controlled substances, and can provide general information on those substances (ex: what their chemical composition is and what their effects are). -- Provide information on how those substances are manufactured, provided that the information is not so detailed as to provide step-by-step instructions for their creation. -O6: Self-Harm. -Should not -- Encourage people to harm themselves. -- Romanticize or condone self-harm or suicide. -- Provide information regarding the methods of suicide or self-harm. -- Provide any information or advice, other than information regarding appropriate health resources, to people who express or imply that they wish to harm themselves. -Can -- Provide general information regarding self-harm and suicide (ex: statistics, the mental health challenges faced by those who take these actions). -Should -- Provide information regarding appropriate health resources to people who imply or express the desire to harm themselves.""" - -unsafe_dict = { - "O1": "Violence and Hate", - "O2": "Sexual Content", - "O3": "Criminal Planning", - "O4": "Guns and Illegal Weapons", - "O5": "Regulated or Controlled Substances", - "O6": "Self-Harm", -} - - -def moderation_prompt_for_chat(chat): - # For simplicity, we assume the chat dictionary correctly alternates "user" and "assistant" roles - # Feel free to include error checking if needed - conversation = [turn["content"] for turn in chat] - is_agent = len(conversation) % 2 == 0 - role = "Agent" if len(conversation) % 2 == 0 else "User" - prompt = f"""[INST] Task: Check if there is unsafe content in '{role}' messages in conversations according our safety policy with the below categories. - - -{unsafe_categories} - - - - -""" - - # Alternate User/Agent turns, inserting 2 newlines between each - for i, m in enumerate(conversation): - role = "User" if i % 2 == 0 else "Agent" - prompt += f"{role}: {m}\n\n" - prompt += f""" - -Provide your safety assessment for {role} in the above conversation: -- First line must read 'safe' or 'unsafe'. -- If unsafe, a second line must include a comma-separated list of violated categories. 
[/INST]""" - return prompt diff --git a/ChatQnA/deprecated/langchain/docker/qna-app/app/prompts.py b/ChatQnA/deprecated/langchain/docker/qna-app/app/prompts.py deleted file mode 100644 index e5cd7c8fd..000000000 --- a/ChatQnA/deprecated/langchain/docker/qna-app/app/prompts.py +++ /dev/null @@ -1,57 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder - -# ========= Raw Q&A template prompt ========= -template = """ - Use the following pieces of context from retrieved - dataset to answer the question. Do not make up an answer if there is no - context provided to help answer it. Include the 'source' and 'start_index' - from the metadata included in the context you used to answer the question - - Context: - --------- - {context} - - --------- - Question: {question} - --------- - - Answer: -""" -prompt = ChatPromptTemplate.from_template(template) - - -# ========= contextualize prompt ========= -contextualize_q_system_prompt = """Given a chat history and the latest user question \ -which might reference context in the chat history, formulate a standalone question \ -which can be understood without the chat history. Do NOT answer the question, \ -just reformulate it if needed and otherwise return it as is.""" -contextualize_q_prompt = ChatPromptTemplate.from_messages( - [ - ("system", contextualize_q_system_prompt), - MessagesPlaceholder(variable_name="chat_history"), - ("human", "{question}"), - ] -) - - -# ========= Q&A with history prompt ========= -qa_system_prompt = """You are an assistant for question-answering tasks. \ -Use the following pieces of retrieved context to answer the question. \ -If you don't know the answer, just say that you don't know. 
\ -Use three sentences maximum and keep the answer concise.\ - -{context}""" -qa_prompt = ChatPromptTemplate.from_messages( - [ - ("system", qa_system_prompt), - MessagesPlaceholder(variable_name="chat_history"), - ("human", "{question}"), - ] -) diff --git a/ChatQnA/deprecated/langchain/docker/qna-app/app/server.py b/ChatQnA/deprecated/langchain/docker/qna-app/app/server.py deleted file mode 100644 index 801f440a0..000000000 --- a/ChatQnA/deprecated/langchain/docker/qna-app/app/server.py +++ /dev/null @@ -1,354 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import argparse -import os - -from fastapi import APIRouter, FastAPI, File, Request, UploadFile -from fastapi.responses import JSONResponse, RedirectResponse, StreamingResponse -from guardrails import moderation_prompt_for_chat, unsafe_dict -from langchain_community.embeddings import HuggingFaceBgeEmbeddings, HuggingFaceHubEmbeddings -from langchain_community.llms import HuggingFaceEndpoint -from langchain_core.messages import HumanMessage -from langchain_core.output_parsers import StrOutputParser -from langchain_core.runnables import RunnablePassthrough -from langserve import add_routes -from prompts import contextualize_q_prompt, prompt, qa_prompt -from starlette.middleware.cors import CORSMiddleware -from utils import ( - VECTOR_DATABASE, - create_kb_folder, - create_retriever_from_files, - create_retriever_from_links, - get_current_beijing_time, - post_process_text, - reload_retriever, -) - -if VECTOR_DATABASE == "REDIS": - from rag_redis.config import INDEX_NAME -elif VECTOR_DATABASE == "QDRANT": - from rag_qdrant.config import COLLECTION_NAME as INDEX_NAME - -parser = argparse.ArgumentParser(description="Server Configuration") -parser.add_argument("--chathistory", action="store_true", help="Enable debug mode") -args = parser.parse_args() - -app = FastAPI() - -app.add_middleware( - CORSMiddleware, allow_origins=["*"], allow_credentials=True, allow_methods=["*"], allow_headers=["*"] -) - - -class RAGAPIRouter(APIRouter): - def __init__(self, upload_dir, entrypoint, safety_guard_endpoint, tei_endpoint=None) -> None: - super().__init__() - self.upload_dir = upload_dir - self.entrypoint = entrypoint - self.safety_guard_endpoint = safety_guard_endpoint - print( - f"[rag - router] Initializing API Router, params:\n \ - upload_dir={upload_dir}, entrypoint={entrypoint}" - ) - - # Define LLM - self.llm = HuggingFaceEndpoint( - endpoint_url=entrypoint, - max_new_tokens=1024, - top_k=10, - top_p=0.95, - typical_p=0.95, - temperature=0.01, - repetition_penalty=1.03, - streaming=True, - ) - - if self.safety_guard_endpoint: - self.llm_guard = HuggingFaceEndpoint( - endpoint_url=safety_guard_endpoint, - max_new_tokens=100, - top_k=1, - top_p=0.95, - typical_p=0.95, - temperature=0.01, - repetition_penalty=1.03, - ) - print("[rag - router] LLM initialized.") - - # Define LLM Chain - if tei_endpoint: - # create embeddings using TEI endpoint service - self.embeddings = HuggingFaceHubEmbeddings(model=tei_endpoint) - else: - # create embeddings using local embedding model - EMBED_MODEL = os.getenv("EMBED_MODEL", "sentence-transformers/all-MiniLM-L6-v2") - self.embeddings = HuggingFaceBgeEmbeddings(model_name=EMBED_MODEL) - - if VECTOR_DATABASE == "REDIS": - from langchain_community.vectorstores import Redis - from rag_redis.config import INDEX_SCHEMA, REDIS_URL - - vdb = Redis.from_existing_index( - self.embeddings, - index_name=INDEX_NAME, - 
redis_url=REDIS_URL, - schema=INDEX_SCHEMA, - ) - elif VECTOR_DATABASE == "QDRANT": - from langchain_community.vectorstores import Qdrant - from qdrant_client import QdrantClient - from rag_qdrant.config import QDRANT_HOST, QDRANT_PORT - - client = QdrantClient(host=QDRANT_HOST, port=QDRANT_PORT) - vdb = Qdrant( - embeddings=self.embeddings, - collection_name=INDEX_NAME, - client=client, - ) - retriever = vdb.as_retriever(search_type="mmr") - - # Define contextualize chain - self.contextualize_q_chain = contextualize_q_prompt | self.llm | StrOutputParser() - - # Define LLM chain - if args.chathistory: - self.llm_chain = ( - RunnablePassthrough.assign(context=self.contextualized_question | retriever) | qa_prompt | self.llm - ) - else: - self.llm_chain = ( - RunnablePassthrough.assign(context=self.contextualized_question | retriever) | prompt | self.llm - ) - - print("[rag - router] LLM chain initialized.") - - # Define chat history - self.chat_history = [] - - def contextualized_question(self, input: dict): - if input.get("chat_history"): - return self.contextualize_q_chain - else: - return input["question"] - - def handle_rag_chat(self, query: str): - response = self.llm_chain.invoke( - {"question": query, "chat_history": self.chat_history} if args.chathistory else {"question": query} - ) - result = response.split("")[0] - if args.chathistory: - self.chat_history.extend([HumanMessage(content=query), response]) - # output guardrails - if self.safety_guard_endpoint: - response_output_guard = self.llm_guard( - moderation_prompt_for_chat("Agent", f"User: {query}\n Agent: {response}") - ) - if "unsafe" in response_output_guard: - policy_violation_level = response_output_guard.split("\n")[1].strip() - policy_violations = unsafe_dict[policy_violation_level] - print(f"Violated policies: {policy_violations}") - return policy_violations + " are found in the output" - else: - return result - return result - - -upload_dir = os.getenv("RAG_UPLOAD_DIR", "./upload_dir") -tgi_llm_endpoint = os.getenv("TGI_LLM_ENDPOINT", "http://localhost:8080") -safety_guard_endpoint = os.getenv("SAFETY_GUARD_ENDPOINT") -tei_embedding_endpoint = os.getenv("TEI_ENDPOINT") -router = RAGAPIRouter(upload_dir, tgi_llm_endpoint, safety_guard_endpoint, tei_embedding_endpoint) - - -@router.post("/v1/rag/chat") -async def rag_chat(request: Request): - params = await request.json() - print(f"[rag - chat] POST request: /v1/rag/chat, params:{params}") - query = params["query"] - kb_id = params.get("knowledge_base_id", "default") - - # prompt guardrails - if router.safety_guard_endpoint: - response_input_guard = router.llm_guard(moderation_prompt_for_chat("User", query)) - if "unsafe" in response_input_guard: - policy_violation_level = response_input_guard.split("\n")[1].strip() - policy_violations = unsafe_dict[policy_violation_level] - print(f"Violated policies: {policy_violations}") - return f"Violated policies: {policy_violations}, please check your input." 
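    # Knowledge-base routing: "default" reloads the base index, while ids
    # starting with "kb" (as returned by /v1/rag/create) select a per-upload
    # index named INDEX_NAME + kb_id.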
- - if kb_id == "default": - print("[rag - chat] use default knowledge base") - retriever = reload_retriever(router.embeddings, INDEX_NAME) - if args.chathistory: - router.llm_chain = ( - RunnablePassthrough.assign(context=router.contextualized_question | retriever) | qa_prompt | router.llm - ) - else: - router.llm_chain = ( - RunnablePassthrough.assign(context=router.contextualized_question | retriever) | prompt | router.llm - ) - elif kb_id.startswith("kb"): - new_index_name = INDEX_NAME + kb_id - print(f"[rag - chat] use knowledge base {kb_id}, index name is {new_index_name}") - retriever = reload_retriever(router.embeddings, new_index_name) - if args.chathistory: - router.llm_chain = ( - RunnablePassthrough.assign(context=router.contextualized_question | retriever) | qa_prompt | router.llm - ) - else: - router.llm_chain = ( - RunnablePassthrough.assign(context=router.contextualized_question | retriever) | prompt | router.llm - ) - else: - return JSONResponse(status_code=400, content={"message": "Wrong knowledge base id."}) - return router.handle_rag_chat(query=query) - - -@router.post("/v1/rag/chat_stream") -async def rag_chat_stream(request: Request): - params = await request.json() - print(f"[rag - chat_stream] POST request: /v1/rag/chat_stream, params:{params}") - query = params["query"] - kb_id = params.get("knowledge_base_id", "default") - - # prompt guardrails - if router.safety_guard_endpoint: - response_input_guard = router.llm_guard(moderation_prompt_for_chat("User", query)) - if "unsafe" in response_input_guard: - policy_violation_level = response_input_guard.split("\n")[1].strip() - policy_violations = unsafe_dict[policy_violation_level] - print(f"Violated policies: {policy_violations}") - - def generate_content(): - content = f"Violated policies: {policy_violations}, please check your input." 
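            # Emit the refusal in server-sent-events framing; the client treats
            # "data: [DONE]" as the end-of-stream sentinel.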
- yield f"data: {content}\n\n" - yield "data: [DONE]\n\n" - - return StreamingResponse(generate_content(), media_type="text/event-stream") - - if kb_id == "default": - retriever = reload_retriever(router.embeddings, INDEX_NAME) - if args.chathistory: - router.llm_chain = ( - RunnablePassthrough.assign(context=router.contextualized_question | retriever) | qa_prompt | router.llm - ) - else: - router.llm_chain = ( - RunnablePassthrough.assign(context=router.contextualized_question | retriever) | prompt | router.llm - ) - elif kb_id.startswith("kb"): - new_index_name = INDEX_NAME + kb_id - retriever = reload_retriever(router.embeddings, new_index_name) - if args.chathistory: - router.llm_chain = ( - RunnablePassthrough.assign(context=router.contextualized_question | retriever) | qa_prompt | router.llm - ) - else: - router.llm_chain = ( - RunnablePassthrough.assign(context=router.contextualized_question | retriever) | prompt | router.llm - ) - else: - return JSONResponse(status_code=400, content={"message": "Wrong knowledge base id."}) - - def stream_generator(): - chat_response = "" - for text in router.llm_chain.stream( - {"question": query, "chat_history": router.chat_history} if args.chathistory else {"question": query} - ): - chat_response += text - processed_text = post_process_text(text) - if text and processed_text: - yield processed_text - chat_response = chat_response.split("")[0] - print(f"[rag - chat_stream] stream response: {chat_response}") - if args.chathistory: - router.chat_history.extend([HumanMessage(content=query), chat_response]) - yield "data: [DONE]\n\n" - - return StreamingResponse(stream_generator(), media_type="text/event-stream") - - -@router.post("/v1/rag/create") -async def rag_create(file: UploadFile = File(...)): - filename = file.filename - if "/" in filename: - filename = filename.split("/")[-1] - print(f"[rag - create] POST request: /v1/rag/create, filename:{filename}") - - kb_id, user_upload_dir, user_persist_dir = create_kb_folder(router.upload_dir) - # save file to local path - cur_time = get_current_beijing_time() - save_file_name = str(user_upload_dir) + "/" + cur_time + "-" + filename - with open(save_file_name, "wb") as fout: - content = await file.read() - fout.write(content) - print(f"[rag - create] file saved to local path: {save_file_name}") - - # create new retriever - try: - # get retrieval instance and reload db with new knowledge base - print("[rag - create] starting to create local db...") - index_name = INDEX_NAME + kb_id - retriever = create_retriever_from_files(save_file_name, router.embeddings, index_name) - if args.chathistory: - router.llm_chain = ( - RunnablePassthrough.assign(context=router.contextualized_question | retriever) | qa_prompt | router.llm - ) - else: - router.llm_chain = ( - RunnablePassthrough.assign(context=router.contextualized_question | retriever) | prompt | router.llm - ) - print("[rag - create] kb created successfully") - except Exception as e: - print(f"[rag - create] create knowledge base failed! 
{e}") - return JSONResponse(status_code=500, content={"message": "Fail to create new knowledge base."}) - return {"knowledge_base_id": kb_id} - - -@router.post("/v1/rag/upload_link") -async def rag_upload_link(request: Request): - params = await request.json() - link_list = params["link_list"] - print(f"[rag - upload_link] POST request: /v1/rag/upload_link, link list:{link_list}") - - kb_id, user_upload_dir, user_persist_dir = create_kb_folder(router.upload_dir) - - # create new retriever - try: - print("[rag - upload_link] starting to create local db...") - index_name = INDEX_NAME + kb_id - retriever = create_retriever_from_links(router.embeddings, link_list, index_name) - if args.chathistory: - router.llm_chain = ( - RunnablePassthrough.assign(context=router.contextualized_question | retriever) | qa_prompt | router.llm - ) - else: - router.llm_chain = ( - RunnablePassthrough.assign(context=router.contextualized_question | retriever) | prompt | router.llm - ) - print("[rag - upload_link] kb created successfully") - except Exception as e: - print(f"[rag - upload_link] create knowledge base failed! {e}") - return JSONResponse(status_code=500, content={"message": "Fail to create new knowledge base."}) - return {"knowledge_base_id": kb_id} - - -app.include_router(router) - - -@app.get("/") -async def redirect_root_to_docs(): - return RedirectResponse("/docs") - - -add_routes(app, router.llm_chain, path="/rag-redis") - -if __name__ == "__main__": - import uvicorn - - uvicorn.run(app, host="0.0.0.0", port=8000) diff --git a/ChatQnA/deprecated/langchain/docker/qna-app/app/utils.py b/ChatQnA/deprecated/langchain/docker/qna-app/app/utils.py deleted file mode 100644 index 9807badbc..000000000 --- a/ChatQnA/deprecated/langchain/docker/qna-app/app/utils.py +++ /dev/null @@ -1,396 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import multiprocessing -import os -import re -import unicodedata -import uuid -from datetime import datetime, timedelta, timezone -from pathlib import Path -from urllib.parse import urlparse, urlunparse - -import requests -from bs4 import BeautifulSoup -from langchain.text_splitter import RecursiveCharacterTextSplitter -from langchain_community.document_loaders import UnstructuredFileLoader -from langchain_core.documents import Document - -SUPPORTED_VECTOR_DATABASES = ["REDIS", "QDRANT"] - -VECTOR_DATABASE = str(os.getenv("VECTOR_DATABASE", "redis")).upper() - -assert VECTOR_DATABASE in SUPPORTED_VECTOR_DATABASES, f"Invalid VECTOR_DATABASE: {VECTOR_DATABASE}" - - -def get_current_beijing_time(): - SHA_TZ = timezone(timedelta(hours=8), name="Asia/Shanghai") - utc_now = datetime.utcnow().replace(tzinfo=timezone.utc) - beijing_time = utc_now.astimezone(SHA_TZ).strftime("%Y-%m-%d-%H:%M:%S") - return beijing_time - - -def create_kb_folder(upload_dir): - kb_id = f"kb_{str(uuid.uuid1())[:8]}" - path_prefix = upload_dir - - # create local folder for retieval - cur_path = Path(path_prefix) / kb_id - os.makedirs(path_prefix, exist_ok=True) - cur_path.mkdir(parents=True, exist_ok=True) - user_upload_dir = Path(path_prefix) / f"{kb_id}/upload_dir" - user_persist_dir = Path(path_prefix) / f"{kb_id}/persist_dir" - user_upload_dir.mkdir(parents=True, exist_ok=True) - user_persist_dir.mkdir(parents=True, exist_ok=True) - print(f"[rag - create kb folder] upload path: {user_upload_dir}, persist path: {user_persist_dir}") - return kb_id, str(user_upload_dir), str(user_persist_dir) - - -class Crawler: - 
def __init__(self, pool=None): - if pool: - assert isinstance(pool, (str, list, tuple)), "url pool should be str, list or tuple" - self.pool = pool - self.headers = { - "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng, \ - */*;q=0.8,application/signed-exchange;v=b3;q=0.7", - "Accept-Encoding": "gzip, deflate, br", - "Accept-Language": "en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7", - "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, \ - like Gecko) Chrome/113.0.0.0 Safari/537.36", - } - self.fetched_pool = set() - - def get_sublinks(self, soup): - sublinks = [] - for links in soup.find_all("a"): - sublinks.append(str(links.get("href"))) - return sublinks - - def get_hyperlink(self, soup, base_url): - sublinks = [] - for links in soup.find_all("a"): - link = str(links.get("href")) - if link.startswith("#") or link is None or link == "None": - continue - suffix = link.split("/")[-1] - if "." in suffix and suffix.split(".")[-1] not in ["html", "htmld"]: - continue - link_parse = urlparse(link) - base_url_parse = urlparse(base_url) - if link_parse.path == "": - continue - if link_parse.netloc != "": - # keep crawler works in the same domain - if link_parse.netloc != base_url_parse.netloc: - continue - sublinks.append(link) - else: - sublinks.append( - urlunparse( - ( - base_url_parse.scheme, - base_url_parse.netloc, - link_parse.path, - link_parse.params, - link_parse.query, - link_parse.fragment, - ) - ) - ) - return sublinks - - def fetch(self, url, headers=None, max_times=5): - if not headers: - headers = self.headers - while max_times: - if not url.startswith("http") or not url.startswith("https"): - url = "http://" + url - print("start fetch %s...", url) - try: - response = requests.get(url, headers=headers, verify=True) - if response.status_code != 200: - print("fail to fetch %s, response status code: %s", url, response.status_code) - else: - return response - except Exception as e: - print("fail to fetch %s, caused by %s", url, e) - raise Exception(e) - max_times -= 1 - return None - - def process_work(self, sub_url, work): - response = self.fetch(sub_url) - if response is None: - return [] - self.fetched_pool.add(sub_url) - soup = self.parse(response.text) - base_url = self.get_base_url(sub_url) - sublinks = self.get_hyperlink(soup, base_url) - if work: - work(sub_url, soup) - return sublinks - - def crawl(self, pool, work=None, max_depth=10, workers=10): - url_pool = set() - for url in pool: - base_url = self.get_base_url(url) - response = self.fetch(url) - soup = self.parse(response.text) - sublinks = self.get_hyperlink(soup, base_url) - self.fetched_pool.add(url) - url_pool.update(sublinks) - depth = 0 - while len(url_pool) > 0 and depth < max_depth: - print("current depth %s...", depth) - mp = multiprocessing.Pool(processes=workers) - results = [] - for sub_url in url_pool: - if sub_url not in self.fetched_pool: - results.append(mp.apply_async(self.process_work, (sub_url, work))) - mp.close() - mp.join() - url_pool = set() - for result in results: - sublinks = result.get() - url_pool.update(sublinks) - depth += 1 - - def parse(self, html_doc): - soup = BeautifulSoup(html_doc, "lxml") - return soup - - def download(self, url, file_name): - print("download %s into %s...", url, file_name) - try: - r = requests.get(url, stream=True, headers=self.headers, verify=True) - f = open(file_name, "wb") - for chunk in r.iter_content(chunk_size=512): - if chunk: - f.write(chunk) - except Exception as e: - print("fail to 
download %s, caused by %s", url, e) - - def get_base_url(self, url): - result = urlparse(url) - return urlunparse((result.scheme, result.netloc, "", "", "", "")) - - def clean_text(self, text): - text = text.strip().replace("\r", "\n") - text = re.sub(" +", " ", text) - text = re.sub("\n+", "\n", text) - text = text.split("\n") - return "\n".join([i for i in text if i and i != " "]) - - -def uni_pro(text): - """Check if the character is ASCII or falls in the category of non-spacing marks.""" - normalized_text = unicodedata.normalize("NFKD", text) - filtered_text = "" - for char in normalized_text: - if ord(char) < 128 or unicodedata.category(char) == "Mn": - filtered_text += char - return filtered_text - - -def load_html_data(url): - crawler = Crawler() - res = crawler.fetch(url) - if res is None: - return None - soup = crawler.parse(res.text) - all_text = crawler.clean_text(soup.select_one("body").text) - main_content = "" - for element_name in ["main", "container"]: - main_block = None - if soup.select(f".{element_name}"): - main_block = soup.select(f".{element_name}") - elif soup.select(f"#{element_name}"): - main_block = soup.select(f"#{element_name}") - if main_block: - for element in main_block: - text = crawler.clean_text(element.text) - if text not in main_content: - main_content += f"\n{text}" - main_content = crawler.clean_text(main_content) - - main_content = main_content.replace("\n", "") - main_content = main_content.replace("\n\n", "") - main_content = uni_pro(main_content) - main_content = re.sub(r"\s+", " ", main_content) - - # {'text': all_text, 'main_content': main_content} - - return main_content - - -def get_chuck_data(content, max_length, min_length, input): - """Process the context to make it maintain a suitable length for the generation.""" - sentences = re.split("(?<=[!.?])", content) - - paragraphs = [] - current_length = 0 - count = 0 - current_paragraph = "" - for sub_sen in sentences: - count += 1 - sentence_length = len(sub_sen) - if current_length + sentence_length <= max_length: - current_paragraph += sub_sen - current_length += sentence_length - if count == len(sentences) and len(current_paragraph.strip()) > min_length: - paragraphs.append([current_paragraph.strip(), input]) - else: - paragraphs.append([current_paragraph.strip(), input]) - current_paragraph = sub_sen - current_length = sentence_length - - return paragraphs - - -def parse_html(input): - """Parse the uploaded file.""" - chucks = [] - for link in input: - if re.match(r"^https?:/{2}\w.+$", link): - content = load_html_data(link) - if content is None: - continue - chuck = [[content.strip(), link]] - chucks += chuck - else: - print("The given link/str {} cannot be parsed.".format(link)) - - return chucks - - -def document_transfer(data_collection): - "Transfer the raw document into langchain supported format." 
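    # Wrap each (text, source) pair in a LangChain Document, tagging it with a
    # generated UUID so individual chunks stay identifiable after retrieval.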
- documents = [] - for data, meta in data_collection: - doc_id = str(uuid.uuid4()) - metadata = {"source": meta, "identify_id": doc_id} - doc = Document(page_content=data, metadata=metadata) - documents.append(doc) - return documents - - -def create_retriever_from_files(doc, embeddings, index_name: str): - print(f"[rag - create retriever] create with index: {index_name}") - text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100, add_start_index=True) - loader = UnstructuredFileLoader(doc, mode="single", strategy="fast") - chunks = loader.load_and_split(text_splitter) - - if VECTOR_DATABASE == "REDIS": - from langchain_community.vectorstores import Redis - from rag_redis.config import INDEX_SCHEMA, REDIS_URL - - vdb = Redis.from_texts( - texts=[chunk.page_content for chunk in chunks], - metadatas=[chunk.metadata for chunk in chunks], - embedding=embeddings, - index_name=index_name, - redis_url=REDIS_URL, - index_schema=INDEX_SCHEMA, - ) - - elif VECTOR_DATABASE == "QDRANT": - from langchain_community.vectorstores import Qdrant - from rag_qdrant.config import COLLECTION_NAME, QDRANT_HOST, QDRANT_PORT - - vdb = Qdrant.from_texts( - texts=[chunk.page_content for chunk in chunks], - metadatas=[chunk.metadata for chunk in chunks], - embedding=embeddings, - collection_name=COLLECTION_NAME, - host=QDRANT_HOST, - port=QDRANT_PORT, - ) - - retriever = vdb.as_retriever(search_type="mmr") - return retriever - - -def create_retriever_from_links(embeddings, link_list: list, index_name): - data_collection = parse_html(link_list) - texts = [] - metadatas = [] - for data, meta in data_collection: - doc_id = str(uuid.uuid4()) - metadata = {"source": meta, "identify_id": doc_id} - texts.append(data) - metadatas.append(metadata) - - if VECTOR_DATABASE == "REDIS": - from langchain_community.vectorstores import Redis - from rag_redis.config import INDEX_SCHEMA, REDIS_URL - - vdb = Redis.from_texts( - texts=texts, - metadatas=metadatas, - embedding=embeddings, - index_name=index_name, - redis_url=REDIS_URL, - index_schema=INDEX_SCHEMA, - ) - - elif VECTOR_DATABASE == "QDRANT": - from langchain_community.vectorstores import Qdrant - from rag_qdrant.config import COLLECTION_NAME, QDRANT_HOST, QDRANT_PORT - - vdb = Qdrant.from_texts( - texts=texts, - metadatas=metadatas, - embedding=embeddings, - collection_name=COLLECTION_NAME, - host=QDRANT_HOST, - port=QDRANT_PORT, - ) - - retriever = vdb.as_retriever(search_type="mmr") - return retriever - - -def reload_retriever(embeddings, index_name): - print(f"[rag - reload retriever] reload with index: {index_name}") - - if VECTOR_DATABASE == "REDIS": - from langchain_community.vectorstores import Redis - from rag_redis.config import INDEX_SCHEMA, REDIS_URL - - vdb = Redis.from_existing_index( - embeddings, - index_name=index_name, - redis_url=REDIS_URL, - schema=INDEX_SCHEMA, - ) - - elif VECTOR_DATABASE == "QDRANT": - from langchain_community.vectorstores import Qdrant - from qdrant_client import QdrantClient - from rag_qdrant.config import COLLECTION_NAME, QDRANT_HOST, QDRANT_PORT - - client = QdrantClient(host=QDRANT_HOST, port=QDRANT_PORT) - vdb = Qdrant( - embeddings=embeddings, - collection_name=COLLECTION_NAME, - client=client, - ) - - retriever = vdb.as_retriever(search_type="mmr") - return retriever - - -def post_process_text(text: str): - if text == " ": - return "data: @#$\n\n" - if text == "\n": - return "data:
\n\n" - if text.isspace(): - return None - new_text = text.replace(" ", "@#$") - return f"data: {new_text}\n\n" diff --git a/ChatQnA/deprecated/langchain/docker/qna-app/packages/README.md b/ChatQnA/deprecated/langchain/docker/qna-app/packages/README.md deleted file mode 100644 index e69de29bb..000000000 diff --git a/ChatQnA/deprecated/langchain/docker/qna-app/pyproject.toml b/ChatQnA/deprecated/langchain/docker/qna-app/pyproject.toml deleted file mode 100644 index 0c3faea39..000000000 --- a/ChatQnA/deprecated/langchain/docker/qna-app/pyproject.toml +++ /dev/null @@ -1,23 +0,0 @@ -[tool.poetry] -name = "my-app" -version = "0.1.0" -description = "" -authors = ["Your Name "] -readme = "README.md" -packages = [ - { include = "app" }, -] - -[tool.poetry.dependencies] -python = "^3.11" -uvicorn = "^0.23.2" -langserve = {extras = ["server"], version = ">=0.0.30"} -pydantic = "<2" - - -[tool.poetry.group.dev.dependencies] -langchain-cli = ">=0.0.15" - -[build-system] -requires = ["poetry-core"] -build-backend = "poetry.core.masonry.api" diff --git a/ChatQnA/deprecated/langchain/docker/requirements.txt b/ChatQnA/deprecated/langchain/docker/requirements.txt deleted file mode 100644 index 472cbed0a..000000000 --- a/ChatQnA/deprecated/langchain/docker/requirements.txt +++ /dev/null @@ -1,19 +0,0 @@ --f https://download.pytorch.org/whl/torch_stable.html -atlassian-python-api -cryptography==42.0.4 -easyocr -intel-extension-for-pytorch -intel-openmp -jupyter -langchain==0.1.12 -langchain-cli -langchain_benchmarks -poetry -pyarrow -pydantic==1.10.13 -pymupdf -qdrant-client==1.9.0 -redis -sentence-transformers -unstructured -unstructured[all-docs] diff --git a/ChatQnA/deprecated/langchain/qdrant/LICENSE b/ChatQnA/deprecated/langchain/qdrant/LICENSE deleted file mode 100644 index 426b65090..000000000 --- a/ChatQnA/deprecated/langchain/qdrant/LICENSE +++ /dev/null @@ -1,21 +0,0 @@ -MIT License - -Copyright (c) 2023 LangChain, Inc. - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in all -copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -SOFTWARE. 
diff --git a/ChatQnA/deprecated/langchain/qdrant/data/nke-10k-2023.pdf b/ChatQnA/deprecated/langchain/qdrant/data/nke-10k-2023.pdf deleted file mode 100644 index 6ade8863e..000000000 Binary files a/ChatQnA/deprecated/langchain/qdrant/data/nke-10k-2023.pdf and /dev/null differ diff --git a/ChatQnA/deprecated/langchain/qdrant/ingest.py b/ChatQnA/deprecated/langchain/qdrant/ingest.py deleted file mode 100644 index febfa055d..000000000 --- a/ChatQnA/deprecated/langchain/qdrant/ingest.py +++ /dev/null @@ -1,95 +0,0 @@ -#!/usr/bin/env python - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -import io -import os - -import numpy as np -from langchain.text_splitter import RecursiveCharacterTextSplitter -from langchain_community.embeddings import HuggingFaceBgeEmbeddings, HuggingFaceHubEmbeddings -from langchain_community.vectorstores import Qdrant -from PIL import Image -from rag_qdrant.config import COLLECTION_NAME, EMBED_MODEL, QDRANT_HOST, QDRANT_PORT, TEI_EMBEDDING_ENDPOINT - - -def pdf_loader(file_path): - try: - import easyocr - import fitz - except ImportError: - raise ImportError( - "`PyMuPDF` or 'easyocr' package is not found, please install it with " - "`pip install pymupdf or pip install easyocr.`" - ) - - doc = fitz.open(file_path) - reader = easyocr.Reader(["en"]) - result = "" - for i in range(doc.page_count): - page = doc.load_page(i) - pagetext = page.get_text().strip() - if pagetext: - result = result + pagetext - if len(doc.get_page_images(i)) > 0: - for img in doc.get_page_images(i): - if img: - pageimg = "" - xref = img[0] - img_data = doc.extract_image(xref) - img_bytes = img_data["image"] - pil_image = Image.open(io.BytesIO(img_bytes)) - img = np.array(pil_image) - img_result = reader.readtext(img, paragraph=True, detail=0) - pageimg = pageimg + ", ".join(img_result).strip() - if pageimg.endswith("!") or pageimg.endswith("?") or pageimg.endswith("."): - pass - else: - pageimg = pageimg + "." - result = result + pageimg - return result - - -def ingest_documents(): - """Ingest PDF to Qdrant from the data/ directory that - contains Edgar 10k filings data for Nike.""" - # Load list of pdfs - company_name = "Nike" - data_path = "data/" - doc_path = [os.path.join(data_path, file) for file in os.listdir(data_path)][0] - - print("Parsing 10k filing doc for NIKE", doc_path) - - text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100, add_start_index=True) - content = pdf_loader(doc_path) - chunks = text_splitter.split_text(content) - - print("Done preprocessing. Created ", len(chunks), " chunks of the original pdf") - # Create vectorstore - if TEI_EMBEDDING_ENDPOINT: - # create embeddings using TEI endpoint service - embedder = HuggingFaceHubEmbeddings(model=TEI_EMBEDDING_ENDPOINT) - else: - # create embeddings using local embedding model - embedder = HuggingFaceBgeEmbeddings(model_name=EMBED_MODEL) - - # Batch size - batch_size = 32 - num_chunks = len(chunks) - for i in range(0, num_chunks, batch_size): - batch_chunks = chunks[i : i + batch_size] - batch_texts = [f"Company: {company_name}. 
" + chunk for chunk in batch_chunks] - - _ = Qdrant.from_texts( - texts=batch_texts, - embedding=embedder, - collection_name=COLLECTION_NAME, - host=QDRANT_HOST, - port=QDRANT_PORT, - ) - print(f"Processed batch {i//batch_size + 1}/{(num_chunks-1)//batch_size + 1}") - - -if __name__ == "__main__": - ingest_documents() diff --git a/ChatQnA/deprecated/langchain/qdrant/rag_qdrant.ipynb b/ChatQnA/deprecated/langchain/qdrant/rag_qdrant.ipynb deleted file mode 100644 index d43113a33..000000000 --- a/ChatQnA/deprecated/langchain/qdrant/rag_qdrant.ipynb +++ /dev/null @@ -1,94 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "fe1adb29", - "metadata": {}, - "source": [] - }, - { - "cell_type": "markdown", - "id": "681a5d1e", - "metadata": {}, - "source": [ - "## Connect to RAG App\n", - "\n", - "Assuming you are already running this server:\n", - "```bash\n", - "langserve start\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "id": "d774be2a", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Nike's revenue in 2023 was $51.2 billion. \n", - "\n", - "Source: 'data/nke-10k-2023.pdf', Start Index: '146100'\n" - ] - } - ], - "source": [ - "from langserve.client import RemoteRunnable\n", - "\n", - "rag_qdrant = RemoteRunnable(\"http://localhost:8000/rag-qdrant\")\n", - "\n", - "print(rag_qdrant.invoke(\"What was Nike's revenue in 2023?\"))" - ] - }, - { - "cell_type": "code", - "execution_count": 43, - "id": "07ae0005", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "As of May 31, 2023, Nike had approximately 83,700 employees worldwide. This information can be found in the first piece of context provided. (source: data/nke-10k-2023.pdf, start_index: 32532)\n" - ] - } - ], - "source": [ - "print(rag_qdrant.invoke(\"How many employees work at Nike?\"))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "4a6b9f00", - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.6" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/ChatQnA/deprecated/langchain/qdrant/rag_qdrant/__init__.py b/ChatQnA/deprecated/langchain/qdrant/rag_qdrant/__init__.py deleted file mode 100644 index 916f3a44b..000000000 --- a/ChatQnA/deprecated/langchain/qdrant/rag_qdrant/__init__.py +++ /dev/null @@ -1,2 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 diff --git a/ChatQnA/deprecated/langchain/qdrant/rag_qdrant/chain.py b/ChatQnA/deprecated/langchain/qdrant/rag_qdrant/chain.py deleted file mode 100644 index 5476df9c2..000000000 --- a/ChatQnA/deprecated/langchain/qdrant/rag_qdrant/chain.py +++ /dev/null @@ -1,69 +0,0 @@ -#!/usr/bin/env python - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -from langchain_community.embeddings import HuggingFaceEmbeddings -from langchain_community.llms import HuggingFaceEndpoint -from langchain_community.vectorstores import Qdrant -from langchain_core.output_parsers import StrOutputParser -from langchain_core.prompts import ChatPromptTemplate -from langchain_core.pydantic_v1 import BaseModel 
-from langchain_core.runnables import RunnableParallel, RunnablePassthrough -from qdrant_client import QdrantClient -from rag_qdrant.config import COLLECTION_NAME, EMBED_MODEL, QDRANT_HOST, QDRANT_PORT, TGI_LLM_ENDPOINT - - -# Make this look better in the docs. -class Question(BaseModel): - __root__: str - - -# Init Embeddings -embedder = HuggingFaceEmbeddings(model_name=EMBED_MODEL) - -# Connect to pre-loaded vectorstore -# run the ingest.py script to populate this - -client = QdrantClient(host=QDRANT_HOST, port=QDRANT_PORT) -vectorstore = Qdrant(embeddings=embedder, collection_name=COLLECTION_NAME, client=client) - -# TODO allow user to change parameters -retriever = vectorstore.as_retriever(search_type="mmr") - -# Define our prompt -template = """ -Use the following pieces of context from retrieved -dataset to answer the question. Do not make up an answer if there is no -context provided to help answer it. Include the 'source' and 'start_index' -from the metadata included in the context you used to answer the question - -Context: ---------- -{context} - ---------- -Question: {question} ---------- - -Answer: -""" - -prompt = ChatPromptTemplate.from_template(template) - -# RAG Chain -model = HuggingFaceEndpoint( - endpoint_url=TGI_LLM_ENDPOINT, - max_new_tokens=512, - top_k=10, - top_p=0.95, - typical_p=0.95, - temperature=0.01, - repetition_penalty=1.03, - streaming=True, - truncate=1024, -) - -chain = ( - RunnableParallel({"context": retriever, "question": RunnablePassthrough()}) | prompt | model | StrOutputParser() -).with_types(input_type=Question) diff --git a/ChatQnA/deprecated/langchain/qdrant/rag_qdrant/config.py b/ChatQnA/deprecated/langchain/qdrant/rag_qdrant/config.py deleted file mode 100644 index 2b30a3682..000000000 --- a/ChatQnA/deprecated/langchain/qdrant/rag_qdrant/config.py +++ /dev/null @@ -1,17 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -import os - -# Embedding model -EMBED_MODEL = os.getenv("EMBED_MODEL", "sentence-transformers/all-MiniLM-L6-v2") - -# Qdrant configuration -QDRANT_HOST = os.getenv("QDRANT", "localhost") -QDRANT_PORT = int(os.getenv("QDRANT_PORT", 6333)) -COLLECTION_NAME = os.getenv("COLLECTION_NAME", "rag-qdrant") - -# LLM/Embedding endpoints -TGI_LLM_ENDPOINT = os.getenv("TGI_LLM_ENDPOINT", "http://localhost:8080") -TGI_LLM_ENDPOINT_NO_RAG = os.getenv("TGI_LLM_ENDPOINT_NO_RAG", "http://localhost:8081") -TEI_EMBEDDING_ENDPOINT = os.getenv("TEI_ENDPOINT") diff --git a/ChatQnA/deprecated/langchain/redis/LICENSE b/ChatQnA/deprecated/langchain/redis/LICENSE deleted file mode 100644 index 426b65090..000000000 --- a/ChatQnA/deprecated/langchain/redis/LICENSE +++ /dev/null @@ -1,21 +0,0 @@ -MIT License - -Copyright (c) 2023 LangChain, Inc. - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in all -copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -SOFTWARE. diff --git a/ChatQnA/deprecated/langchain/redis/data/nke-10k-2023.pdf b/ChatQnA/deprecated/langchain/redis/data/nke-10k-2023.pdf deleted file mode 100644 index 6ade8863e..000000000 Binary files a/ChatQnA/deprecated/langchain/redis/data/nke-10k-2023.pdf and /dev/null differ diff --git a/ChatQnA/deprecated/langchain/redis/data_intel/ia_spec.pdf b/ChatQnA/deprecated/langchain/redis/data_intel/ia_spec.pdf deleted file mode 100644 index 3b10122cf..000000000 Binary files a/ChatQnA/deprecated/langchain/redis/data_intel/ia_spec.pdf and /dev/null differ diff --git a/ChatQnA/deprecated/langchain/redis/ingest.py b/ChatQnA/deprecated/langchain/redis/ingest.py deleted file mode 100644 index 13c77b1ca..000000000 --- a/ChatQnA/deprecated/langchain/redis/ingest.py +++ /dev/null @@ -1,99 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import io -import os - -import numpy as np -from langchain.text_splitter import RecursiveCharacterTextSplitter -from langchain_community.embeddings import HuggingFaceBgeEmbeddings, HuggingFaceEmbeddings, HuggingFaceHubEmbeddings -from langchain_community.vectorstores import Redis -from PIL import Image -from rag_redis.config import EMBED_MODEL, INDEX_NAME, INDEX_SCHEMA, REDIS_URL - -tei_embedding_endpoint = os.getenv("TEI_ENDPOINT") - - -def pdf_loader(file_path): - try: - import easyocr - import fitz - except ImportError: - raise ImportError( - "`PyMuPDF` or 'easyocr' package is not found, please install it with " - "`pip install pymupdf or pip install easyocr.`" - ) - - doc = fitz.open(file_path) - reader = easyocr.Reader(["en"]) - result = "" - for i in range(doc.page_count): - page = doc.load_page(i) - pagetext = page.get_text().strip() - if pagetext: - result = result + pagetext - if len(doc.get_page_images(i)) > 0: - for img in doc.get_page_images(i): - if img: - pageimg = "" - xref = img[0] - img_data = doc.extract_image(xref) - img_bytes = img_data["image"] - pil_image = Image.open(io.BytesIO(img_bytes)) - img = np.array(pil_image) - img_result = reader.readtext(img, paragraph=True, detail=0) - pageimg = pageimg + ", ".join(img_result).strip() - if pageimg.endswith("!") or pageimg.endswith("?") or pageimg.endswith("."): - pass - else: - pageimg = pageimg + "." - result = result + pageimg - return result - - -def ingest_documents(): - """Ingest PDF to Redis from the data/ directory that - contains Edgar 10k filings data for Nike.""" - # Load list of pdfs - company_name = "Nike" - data_path = "data/" - doc_path = [os.path.join(data_path, file) for file in os.listdir(data_path)][0] - - print("Parsing 10k filing doc for NIKE", doc_path) - - text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100, add_start_index=True) - content = pdf_loader(doc_path) - chunks = text_splitter.split_text(content) - - print("Done preprocessing. 
Created ", len(chunks), " chunks of the original pdf") - # Create vectorstore - if tei_embedding_endpoint: - # create embeddings using TEI endpoint service - embedder = HuggingFaceHubEmbeddings(model=tei_embedding_endpoint) - else: - # create embeddings using local embedding model - embedder = HuggingFaceBgeEmbeddings(model_name=EMBED_MODEL) - - # Batch size - batch_size = 32 - num_chunks = len(chunks) - for i in range(0, num_chunks, batch_size): - batch_chunks = chunks[i : i + batch_size] - batch_texts = [f"Company: {company_name}. " + chunk for chunk in batch_chunks] - - _ = Redis.from_texts( - texts=batch_texts, - embedding=embedder, - index_name=INDEX_NAME, - index_schema=INDEX_SCHEMA, - redis_url=REDIS_URL, - ) - print(f"Processed batch {i//batch_size + 1}/{(num_chunks-1)//batch_size + 1}") - - -if __name__ == "__main__": - ingest_documents() diff --git a/ChatQnA/deprecated/langchain/redis/ingest_dir_text.py b/ChatQnA/deprecated/langchain/redis/ingest_dir_text.py deleted file mode 100644 index e17997e76..000000000 --- a/ChatQnA/deprecated/langchain/redis/ingest_dir_text.py +++ /dev/null @@ -1,36 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -from langchain.text_splitter import RecursiveCharacterTextSplitter -from langchain_community.document_loaders import DirectoryLoader, TextLoader, UnstructuredFileLoader -from langchain_community.embeddings import HuggingFaceEmbeddings -from langchain_community.vectorstores import Redis -from rag_redis.config import EMBED_MODEL, INDEX_NAME, INDEX_SCHEMA, REDIS_URL - -loader = DirectoryLoader( - "/ws/txt_files", glob="**/*.txt", show_progress=True, use_multithreading=True, loader_cls=TextLoader -) - -text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100, add_start_index=True) - -chunks = loader.load_and_split(text_splitter) -print("Done preprocessing. Created", len(chunks), "chunks of the original data") - -# Create vectorstore -embedder = HuggingFaceEmbeddings(model_name=EMBED_MODEL) - -company_name = "Intel" -_ = Redis.from_texts( - # appending this little bit can sometimes help with semantic retrieval - # especially with multiple companies - texts=[f"Company: {company_name}. 
" + chunk.page_content for chunk in chunks], - metadatas=[chunk.metadata for chunk in chunks], - embedding=embedder, - index_name=INDEX_NAME, - index_schema=INDEX_SCHEMA, - redis_url=REDIS_URL, -) diff --git a/ChatQnA/deprecated/langchain/redis/ingest_intel.py b/ChatQnA/deprecated/langchain/redis/ingest_intel.py deleted file mode 100644 index b18c9dc18..000000000 --- a/ChatQnA/deprecated/langchain/redis/ingest_intel.py +++ /dev/null @@ -1,100 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import io -import os - -import numpy as np -from langchain.text_splitter import RecursiveCharacterTextSplitter -from langchain_community.embeddings import HuggingFaceBgeEmbeddings, HuggingFaceEmbeddings, HuggingFaceHubEmbeddings -from langchain_community.vectorstores import Redis -from PIL import Image -from rag_redis.config import EMBED_MODEL, INDEX_NAME, INDEX_SCHEMA, REDIS_URL - -tei_embedding_endpoint = os.getenv("TEI_ENDPOINT") - - -def pdf_loader(file_path): - try: - import easyocr - import fitz - except ImportError: - raise ImportError( - "`PyMuPDF` or 'easyocr' package is not found, please install it with " - "`pip install pymupdf or pip install easyocr.`" - ) - - doc = fitz.open(file_path) - reader = easyocr.Reader(["en"]) - result = "" - for i in range(doc.page_count): - page = doc.load_page(i) - pagetext = page.get_text().strip() - if pagetext: - result = result + pagetext - if len(doc.get_page_images(i)) > 0: - for img in doc.get_page_images(i): - if img: - pageimg = "" - xref = img[0] - img_data = doc.extract_image(xref) - img_bytes = img_data["image"] - pil_image = Image.open(io.BytesIO(img_bytes)) - img = np.array(pil_image) - img_result = reader.readtext(img, paragraph=True, detail=0) - pageimg = pageimg + ", ".join(img_result).strip() - if pageimg.endswith("!") or pageimg.endswith("?") or pageimg.endswith("."): - pass - else: - pageimg = pageimg + "." - result = result + pageimg - return result - - -def ingest_documents(): - """Ingest PDF to Redis from the data/ directory that - contains Intel manuals.""" - # Load list of pdfs - company_name = "Intel" - data_path = "data_intel/" - doc_path = [os.path.join(data_path, file) for file in os.listdir(data_path)][0] - - print("Parsing Intel architecture manuals", doc_path) - - text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100, add_start_index=True) - content = pdf_loader(doc_path) - chunks = text_splitter.split_text(content) - - print("Done preprocessing. Created", len(chunks), "chunks of the original pdf") - # Create vectorstore - # Create vectorstore - if tei_embedding_endpoint: - # create embeddings using TEI endpoint service - embedder = HuggingFaceHubEmbeddings(model=tei_embedding_endpoint) - else: - # create embeddings using local embedding model - embedder = HuggingFaceBgeEmbeddings(model_name=EMBED_MODEL) - - # Batch size - batch_size = 32 - num_chunks = len(chunks) - for i in range(0, num_chunks, batch_size): - batch_chunks = chunks[i : i + batch_size] - batch_texts = [f"Company: {company_name}. 
" + chunk for chunk in batch_chunks] - - _ = Redis.from_texts( - texts=batch_texts, - embedding=embedder, - index_name=INDEX_NAME, - index_schema=INDEX_SCHEMA, - redis_url=REDIS_URL, - ) - print(f"Processed batch {i//batch_size + 1}/{(num_chunks-1)//batch_size + 1}") - - -if __name__ == "__main__": - ingest_documents() diff --git a/ChatQnA/deprecated/langchain/redis/ingest_wiki.py b/ChatQnA/deprecated/langchain/redis/ingest_wiki.py deleted file mode 100644 index 8f26984f4..000000000 --- a/ChatQnA/deprecated/langchain/redis/ingest_wiki.py +++ /dev/null @@ -1,82 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import io -import os - -import numpy as np -from langchain.text_splitter import RecursiveCharacterTextSplitter -from langchain_community.document_loaders import ConfluenceLoader -from langchain_community.embeddings import HuggingFaceBgeEmbeddings, HuggingFaceEmbeddings, HuggingFaceHubEmbeddings -from langchain_community.vectorstores import Redis - -# from PIL import Image -from rag_redis.config import EMBED_MODEL, INDEX_NAME, INDEX_SCHEMA, REDIS_URL - -tei_embedding_endpoint = os.getenv("TEI_ENDPOINT") -confluence_access_token = os.getenv("CONFLUENCE_ACCESS_TOKEN") - - -def wiki_loader(wiki_url, page_ids): - loader = ConfluenceLoader( - url=wiki_url, - token=confluence_access_token, - confluence_kwargs={"verify_ssl": False}, - ) - print(wiki_url) - print(page_ids) - documents = loader.load(page_ids=page_ids, include_attachments=True, limit=50, max_pages=50) - return documents - - -def ingest_documents(wiki_url, page_ids): - """Ingest Wiki Pages to Redis from the variables (wiki_url, page_ids) that - contains your contents of interest.""" - - # Load list of wiki pages - company_name = "Intel" - print("Parsing Intel wiki pages", page_ids) - text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100, add_start_index=True) - documents = wiki_loader(wiki_url, page_ids) - content = "" - for doc in documents: - content += doc.page_content - chunks = text_splitter.split_text(content) - - print("Done preprocessing. Created", len(chunks), "chunks of the original pdf") - # Create vectorstore - # Create vectorstore - if tei_embedding_endpoint: - # create embeddings using TEI endpoint service - embedder = HuggingFaceHubEmbeddings(model=tei_embedding_endpoint) - else: - # create embeddings using local embedding model - embedder = HuggingFaceBgeEmbeddings(model_name=EMBED_MODEL) - - # Batch size - batch_size = 2 - num_chunks = len(chunks) - for i in range(0, num_chunks, batch_size): - batch_chunks = chunks[i : i + batch_size] - batch_texts = [f"Company: {company_name}. 
" + chunk for chunk in batch_chunks] - - _ = Redis.from_texts( - texts=batch_texts, - embedding=embedder, - index_name=INDEX_NAME, - index_schema=INDEX_SCHEMA, - redis_url=REDIS_URL, - ) - print(f"Processed batch {i//batch_size + 1}/{(num_chunks-1)//batch_size + 1}") - - -if __name__ == "__main__": - - wiki_url = "https://wiki.ith.intel.com/" - page_ids = [3458609323, 3467299836] - - ingest_documents(wiki_url, page_ids) diff --git a/ChatQnA/deprecated/langchain/redis/rag_redis.ipynb b/ChatQnA/deprecated/langchain/redis/rag_redis.ipynb deleted file mode 100644 index bb3f87a8c..000000000 --- a/ChatQnA/deprecated/langchain/redis/rag_redis.ipynb +++ /dev/null @@ -1,88 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "681a5d1e", - "metadata": {}, - "source": [ - "## Connect to RAG App\n", - "\n", - "Assuming you are already running this server:\n", - "```bash\n", - "langserve start\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "id": "d774be2a", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Nike's revenue in 2023 was $51.2 billion. \n", - "\n", - "Source: 'data/nke-10k-2023.pdf', Start Index: '146100'\n" - ] - } - ], - "source": [ - "from langserve.client import RemoteRunnable\n", - "\n", - "rag_redis = RemoteRunnable(\"http://localhost:8000/rag-redis\")\n", - "\n", - "print(rag_redis.invoke(\"What was Nike's revenue in 2023?\"))" - ] - }, - { - "cell_type": "code", - "execution_count": 43, - "id": "07ae0005", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "As of May 31, 2023, Nike had approximately 83,700 employees worldwide. This information can be found in the first piece of context provided. (source: data/nke-10k-2023.pdf, start_index: 32532)\n" - ] - } - ], - "source": [ - "print(rag_redis.invoke(\"How many employees work at Nike?\"))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "4a6b9f00", - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.6" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/ChatQnA/deprecated/langchain/redis/rag_redis/__init__.py b/ChatQnA/deprecated/langchain/redis/rag_redis/__init__.py deleted file mode 100644 index 916f3a44b..000000000 --- a/ChatQnA/deprecated/langchain/redis/rag_redis/__init__.py +++ /dev/null @@ -1,2 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 diff --git a/ChatQnA/deprecated/langchain/redis/rag_redis/chain.py b/ChatQnA/deprecated/langchain/redis/rag_redis/chain.py deleted file mode 100644 index c3bfdc76a..000000000 --- a/ChatQnA/deprecated/langchain/redis/rag_redis/chain.py +++ /dev/null @@ -1,76 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -from langchain_community.embeddings import HuggingFaceEmbeddings -from langchain_community.llms import HuggingFaceEndpoint -from langchain_community.vectorstores import Redis -from langchain_core.output_parsers import StrOutputParser -from langchain_core.prompts import ChatPromptTemplate -from 
langchain_core.pydantic_v1 import BaseModel -from langchain_core.runnables import RunnableParallel, RunnablePassthrough -from rag_redis.config import EMBED_MODEL, INDEX_NAME, INDEX_SCHEMA, REDIS_URL, TGI_LLM_ENDPOINT - - -# Make this look better in the docs. -class Question(BaseModel): - __root__: str - - -# Init Embeddings -embedder = HuggingFaceEmbeddings(model_name=EMBED_MODEL) - -# Setup semantic cache for LLM -from langchain.cache import RedisSemanticCache -from langchain.globals import set_llm_cache - -set_llm_cache(RedisSemanticCache(embedding=embedder, redis_url=REDIS_URL)) - -# Connect to pre-loaded vectorstore -# run the ingest.py script to populate this -vectorstore = Redis.from_existing_index( - embedding=embedder, index_name=INDEX_NAME, schema=INDEX_SCHEMA, redis_url=REDIS_URL -) - -# TODO allow user to change parameters -retriever = vectorstore.as_retriever(search_type="mmr") - -# Define our prompt -template = """ -Use the following pieces of context from retrieved -dataset to answer the question. Do not make up an answer if there is no -context provided to help answer it. Include the 'source' and 'start_index' -from the metadata included in the context you used to answer the question - -Context: ---------- -{context} - ---------- -Question: {question} ---------- - -Answer: -""" - -prompt = ChatPromptTemplate.from_template(template) - -# RAG Chain -model = HuggingFaceEndpoint( - endpoint_url=TGI_LLM_ENDPOINT, - max_new_tokens=512, - top_k=10, - top_p=0.95, - typical_p=0.95, - temperature=0.01, - repetition_penalty=1.03, - streaming=True, - truncate=1024, -) - -chain = ( - RunnableParallel({"context": retriever, "question": RunnablePassthrough()}) | prompt | model | StrOutputParser() -).with_types(input_type=Question) diff --git a/ChatQnA/deprecated/langchain/redis/rag_redis/config.py b/ChatQnA/deprecated/langchain/redis/rag_redis/config.py deleted file mode 100644 index 3dba62a70..000000000 --- a/ChatQnA/deprecated/langchain/redis/rag_redis/config.py +++ /dev/null @@ -1,88 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import os - - -def get_boolean_env_var(var_name, default_value=False): - """Retrieve the boolean value of an environment variable. - - Args: - var_name (str): The name of the environment variable to retrieve. - default_value (bool): The default value to return if the variable - is not found. - - Returns: - bool: The value of the environment variable, interpreted as a boolean. 
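-
-    Example (illustrative, per the true_values set below):
-        >>> os.environ["DEBUG"] = "true"
-        >>> get_boolean_env_var("DEBUG")
-        True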
- """ - true_values = {"true", "1", "t", "y", "yes"} - false_values = {"false", "0", "f", "n", "no"} - - # Retrieve the environment variable's value - value = os.getenv(var_name, "").lower() - - # Decide the boolean value based on the content of the string - if value in true_values: - return True - elif value in false_values: - return False - else: - return default_value - - -# Check for openai API key -# if "OPENAI_API_KEY" not in os.environ: -# raise Exception("Must provide an OPENAI_API_KEY as an env var.") - - -# Whether or not to enable langchain debugging -DEBUG = get_boolean_env_var("DEBUG", False) -# Set DEBUG env var to "true" if you wish to enable LC debugging module -if DEBUG: - import langchain - - langchain.debug = True - - -# Embedding model -EMBED_MODEL = os.getenv("EMBED_MODEL", "sentence-transformers/all-MiniLM-L6-v2") - -# Redis Connection Information -REDIS_HOST = os.getenv("REDIS_HOST", "localhost") -REDIS_PORT = int(os.getenv("REDIS_PORT", 6379)) - - -def format_redis_conn_from_env(): - redis_url = os.getenv("REDIS_URL", None) - if redis_url: - return redis_url - else: - using_ssl = get_boolean_env_var("REDIS_SSL", False) - start = "rediss://" if using_ssl else "redis://" - - # if using RBAC - password = os.getenv("REDIS_PASSWORD", None) - username = os.getenv("REDIS_USERNAME", "default") - if password is not None: - start += f"{username}:{password}@" - - return start + f"{REDIS_HOST}:{REDIS_PORT}" - - -REDIS_URL = format_redis_conn_from_env() - -# Vector Index Configuration -INDEX_NAME = os.getenv("INDEX_NAME", "rag-redis") - - -current_file_path = os.path.abspath(__file__) -parent_dir = os.path.dirname(current_file_path) -REDIS_SCHEMA = os.getenv("REDIS_SCHEMA", "schema.yml") -schema_path = os.path.join(parent_dir, REDIS_SCHEMA) -INDEX_SCHEMA = schema_path -TGI_LLM_ENDPOINT = os.getenv("TGI_LLM_ENDPOINT", "http://localhost:8080") -TGI_LLM_ENDPOINT_NO_RAG = os.getenv("TGI_LLM_ENDPOINT_NO_RAG", "http://localhost:8081") diff --git a/ChatQnA/deprecated/langchain/redis/rag_redis/schema.yml b/ChatQnA/deprecated/langchain/redis/rag_redis/schema.yml deleted file mode 100644 index 011273363..000000000 --- a/ChatQnA/deprecated/langchain/redis/rag_redis/schema.yml +++ /dev/null @@ -1,15 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -text: - - name: content - - name: source -numeric: - - name: start_index -vector: - - name: content_vector - algorithm: HNSW - datatype: FLOAT32 - dims: 384 - distance_metric: COSINE diff --git a/ChatQnA/deprecated/langchain/redis/rag_redis/schema_dim_1024.yml b/ChatQnA/deprecated/langchain/redis/rag_redis/schema_dim_1024.yml deleted file mode 100644 index b4887fed0..000000000 --- a/ChatQnA/deprecated/langchain/redis/rag_redis/schema_dim_1024.yml +++ /dev/null @@ -1,15 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -text: - - name: content - - name: source -numeric: - - name: start_index -vector: - - name: content_vector - algorithm: HNSW - datatype: FLOAT32 - dims: 1024 - distance_metric: COSINE diff --git a/ChatQnA/deprecated/langchain/redis/rag_redis/schema_dim_768.yml b/ChatQnA/deprecated/langchain/redis/rag_redis/schema_dim_768.yml deleted file mode 100644 index d615774e4..000000000 --- a/ChatQnA/deprecated/langchain/redis/rag_redis/schema_dim_768.yml +++ /dev/null @@ -1,15 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -text: - - name: content - - name: source -numeric: - - name: start_index -vector: - - name: 
content_vector - algorithm: HNSW - datatype: FLOAT32 - dims: 768 - distance_metric: COSINE diff --git a/ChatQnA/deprecated/langchain/redis/rag_redis/schema_lcdocs_dim_768.yml b/ChatQnA/deprecated/langchain/redis/rag_redis/schema_lcdocs_dim_768.yml deleted file mode 100644 index 296e49cc3..000000000 --- a/ChatQnA/deprecated/langchain/redis/rag_redis/schema_lcdocs_dim_768.yml +++ /dev/null @@ -1,19 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -text: - - name: content - - name: changefreq - - name: description - - name: language - - name: loc - - name: priority - - name: source - - name: title -vector: - - name: content_vector - algorithm: HNSW - datatype: FLOAT32 - dims: 768 - distance_metric: COSINE diff --git a/ChatQnA/deprecated/langchain/test/README.md b/ChatQnA/deprecated/langchain/test/README.md deleted file mode 100644 index ccdb0f79f..000000000 --- a/ChatQnA/deprecated/langchain/test/README.md +++ /dev/null @@ -1,31 +0,0 @@ -## Performance measurement tests with langsmith - -Prerequisite: Sign up for langsmith [https://www.langchain.com/langsmith] and get an API token
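-
-Both the notebook and the script below read the token from the environment as `LANGCHAIN_API_KEY`; a minimal sketch of the setup they perform (the placeholder token is illustrative):
-
-```python
-import os
-
-# Endpoint and API key consumed by langsmith.client.Client()
-os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
-os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-token>"
-```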
- -### Steps to run perf measurements with the tgi_gaudi.ipynb jupyter notebook - -1. This dir is mounted at /test in qna-rag-redis-server -2. Make sure the redis container and the LLM serving endpoint are up and running -3. Enter the qna-rag-redis-server container and start the jupyter notebook server (you can specify the needed IP address; jupyter will run on port 8888) - ``` - docker exec -it qna-rag-redis-server bash - cd /test - jupyter notebook --allow-root --ip=X.X.X.X - ``` -4. Launch jupyter notebook in your browser and open the tgi_gaudi.ipynb notebook -5. Update all the configuration parameters in the second cell of the notebook -6. Clear all the cells and run all the cells -7. The last cell, which calls client.run_on_dataset(), runs the langchain Q&A test and captures measurements on the langsmith server. The URL to access the test result can be obtained from the output of this command -
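-
-For reference, the call in that last cell looks roughly like this (a sketch; the exact values come from the notebook's configuration cell):
-
-```python
-from langsmith.client import Client
-
-client = Client()  # uses LANGCHAIN_API_KEY from the environment
-# Runs the RAG chain on every example in the dataset and records
-# traces in LangSmith; the printed output includes the URL of the
-# result project.
-test_run = client.run_on_dataset(
-    dataset_name="LangChain Docs Q&A",  # LANGCHAIN_DATASET_NAME in the notebook
-    llm_or_chain_factory=chain,         # the RAG chain built earlier in the notebook
-    evaluation=None,
-    project_name="rag-perf-test",       # LANGSMITH_PROJECT_NAME plus a unique run id
-    concurrency_level=16,               # CONCURRENCY_LEVEL in the notebook
-    verbose=True,
-)
-```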

- -### Steps to run perf measurements with end_to_end_rag_test.py python script - -1. This dir is mounted at /test in qna-rag-redis-server -2. Make sure redis container and LLM serving is up and running -3. enter into qna-rag-redis-server container and run the python script - ``` - docker exec -it qna-rag-redis-server bash - cd /test - python end_to_end_rag_test.py -l "" -e -m -ht "" -lt -dbs "" -dbu "" -dbi "" -d "" - ``` -4. Check the results in langsmith server diff --git a/ChatQnA/deprecated/langchain/test/end_to_end_rag_test.py b/ChatQnA/deprecated/langchain/test/end_to_end_rag_test.py deleted file mode 100644 index a4c17e6ae..000000000 --- a/ChatQnA/deprecated/langchain/test/end_to_end_rag_test.py +++ /dev/null @@ -1,238 +0,0 @@ -#!/usr/bin/env python - - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -import argparse -import os -import uuid -from operator import itemgetter -from typing import Any, List, Mapping, Optional, Sequence - -from langchain.prompts import ChatPromptTemplate -from langchain.schema.document import Document -from langchain.schema.output_parser import StrOutputParser -from langchain.schema.runnable.passthrough import RunnableAssign -from langchain_benchmarks import clone_public_dataset, registry -from langchain_benchmarks.rag import get_eval_config -from langchain_community.embeddings import HuggingFaceEmbeddings, HuggingFaceHubEmbeddings -from langchain_community.llms import HuggingFaceEndpoint -from langchain_community.vectorstores import Redis -from langchain_core.callbacks.manager import CallbackManagerForLLMRun -from langchain_core.language_models.llms import LLM -from langchain_core.prompt_values import ChatPromptValue -from langchain_openai import ChatOpenAI -from langsmith.client import Client -from transformers import AutoTokenizer, LlamaForCausalLM - -# Parameters and settings -ENDPOINT_URL_GAUDI2 = "http://localhost:8000" -ENDPOINT_URL_VLLM = "http://localhost:8001/v1" -TEI_ENDPOINT = "http://localhost:8002" -LANG_CHAIN_DATASET = "" -HF_MODEL_NAME = "" -PROMPT_TOKENS_LEN = 214 # Magic number for prompt template tokens -MAX_INPUT_TOKENS = 1024 -MAX_OUTPUT_TOKENS = 128 - -# Generate a unique run ID for this experiment -run_uid = uuid.uuid4().hex[:6] - -tokenizer = None - - -def crop_tokens(prompt, max_len): - inputs = tokenizer(prompt, return_tensors="pt") - inputs_cropped = inputs["input_ids"][0:, 0:max_len] - prompt_cropped = tokenizer.batch_decode( - inputs_cropped, skip_special_tokens=True, clean_up_tokenization_spaces=False - )[0] - return prompt_cropped - - -# After the retriever fetches documents, this -# function formats them in a string to present for the LLM -def format_docs(docs: Sequence[Document]) -> str: - formatted_docs = [] - for i, doc in enumerate(docs): - doc_string = ( - f"\n" - f"{doc.metadata.get('source')}\n" - f"{doc.page_content[0:]}\n" - "" - ) - # Truncate the retrieval data based on the max tokens required - cropped = crop_tokens(doc_string, MAX_INPUT_TOKENS - PROMPT_TOKENS_LEN) - - formatted_docs.append(cropped) # doc_string - formatted_str = "\n".join(formatted_docs) - return f"\n{formatted_str}\n" - - -def ingest_dataset(args, langchain_docs): - clone_public_dataset(langchain_docs.dataset_id, dataset_name=langchain_docs.name) - docs = list(langchain_docs.get_docs()) - embedder = HuggingFaceHubEmbeddings(model=args.embedding_endpoint_url) - - _ = Redis.from_texts( - # appending this little bit can sometimes help with semantic retrieval - # especially with multiple companies - 
texts=[d.page_content for d in docs], - metadatas=[d.metadata for d in docs], - embedding=embedder, - index_name=args.db_index, - index_schema=args.db_schema, - redis_url=args.db_url, - ) - - -def GetLangchainDataset(args): - registry_retrieved = registry.filter(Type="RetrievalTask") - langchain_docs = registry_retrieved[args.langchain_dataset] - return langchain_docs - - -def buildchain(args): - embedder = HuggingFaceHubEmbeddings(model=args.embedding_endpoint_url) - vectorstore = Redis.from_existing_index( - embedding=embedder, index_name=args.db_index, schema=args.db_schema, redis_url=args.db_url - ) - retriever = vectorstore.as_retriever(search_kwargs={"k": 1}) - prompt = ChatPromptTemplate.from_messages( - [ - ( - "system", - "You are an AI assistant answering questions about LangChain." - "\n{context}\n" - "Respond solely based on the document content.", - ), - ("human", "{question}"), - ] - ) - - llm = None - match args.llm_service_api: - case "tgi-gaudi": - llm = HuggingFaceEndpoint( - endpoint_url=args.llm_endpoint_url, - max_new_tokens=MAX_OUTPUT_TOKENS, - top_k=10, - top_p=0.95, - typical_p=0.95, - temperature=1.0, - repetition_penalty=1.03, - streaming=False, - truncate=1024, - ) - case "vllm-openai": - llm = ChatOpenAI( - model=args.model_name, - openai_api_key="EMPTY", - openai_api_base=args.llm_endpoint_url, - max_tokens=MAX_OUTPUT_TOKENS, - temperature=1.0, - top_p=0.95, - streaming=False, - frequency_penalty=1.03, - ) - - response_generator = (prompt | llm | StrOutputParser()).with_config( - run_name="GenerateResponse", - ) - - # This is the final response chain. - # It fetches the "question" key from the input dict, - # passes it to the retriever, then formats as a string. - - chain = ( - RunnableAssign( - {"context": (itemgetter("question") | retriever | format_docs).with_config(run_name="FormatDocs")} - ) - # The "RunnableAssign" above returns a dict with keys - # question (from the original input) and - # context: the string-formatted docs. 
- # This is passed to the response_generator above - | response_generator - ) - return chain - - -def run_test(args, chain): - client = Client() - test_run = client.run_on_dataset( - dataset_name=args.langchain_dataset, - llm_or_chain_factory=chain, - evaluation=None, - project_name=f"{args.llm_service_api}-{args.model_name} op-{MAX_OUTPUT_TOKENS} cl-{args.concurrency} iter-{run_uid}", - project_metadata={ - "index_method": "basic", - }, - concurrency_level=args.concurrency, - verbose=True, - ) - - -if __name__ == "__main__": - parser = argparse.ArgumentParser() - parser.add_argument( - "-l", - "--llm_endpoint_url", - type=str, - required=False, - default=ENDPOINT_URL_GAUDI2, - help="LLM Service Endpoint URL", - ) - parser.add_argument( - "-e", - "--embedding_endpoint_url", - type=str, - default=TEI_ENDPOINT, - required=False, - help="Embedding Service Endpoint URL", - ) - parser.add_argument("-m", "--model_name", type=str, default=HF_MODEL_NAME, required=False, help="Model Name") - parser.add_argument("-ht", "--huggingface_token", type=str, required=True, help="Huggingface API token") - parser.add_argument("-lt", "--langchain_token", type=str, required=True, help="langchain API token") - parser.add_argument( - "-d", - "--langchain_dataset", - type=str, - required=True, - help="langchain dataset name Refer: https://docs.smith.langchain.com/evaluation/quickstart ", - ) - - parser.add_argument("-c", "--concurrency", type=int, default=16, required=False, help="Concurrency Level") - - parser.add_argument( - "-lm", - "--llm_service_api", - type=str, - default="tgi-gaudi", - required=False, - help='Choose between "tgi-gaudi" or "vllm-openai"', - ) - - parser.add_argument( - "-ig", "--ingest_dataset", type=bool, default=False, required=False, help='Set True to ingest dataset"' - ) - - parser.add_argument("-dbu", "--db_url", type=str, required=True, help="Vector DB URL") - - parser.add_argument("-dbs", "--db_schema", type=str, required=True, help="Vector DB Schema") - - parser.add_argument("-dbi", "--db_index", type=str, required=True, help="Vector DB Index Name") - - args = parser.parse_args() - - if args.ingest_dataset: - langchain_doc = GetLangchainDataset(args) - ingest_dataset(args, langchain_doc) - - tokenizer = AutoTokenizer.from_pretrained(args.model_name) - os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com" - os.environ["LANGCHAIN_API_KEY"] = args.langchain_token - os.environ["HUGGINGFACEHUB_API_TOKEN"] = args.huggingface_token - - chain = buildchain(args) - run_test(args, chain) diff --git a/ChatQnA/deprecated/langchain/test/tgi_gaudi.ipynb b/ChatQnA/deprecated/langchain/test/tgi_gaudi.ipynb deleted file mode 100644 index ecc398196..000000000 --- a/ChatQnA/deprecated/langchain/test/tgi_gaudi.ipynb +++ /dev/null @@ -1,496 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "7b419db2-6701-499c-abfa-1426f155fff5", - "metadata": {}, - "source": [ - "## Benchmarking RAG pipeline with Redis and LLM using langsmith\n", - "This notebook provides steps to Benchmark RAG pipeline using Langsmith. The RAG pipeline is implemented using Redis as vector database and llama2-70b-chat-hf model as LLM which is served by Huggingface TGI endpoint
\n", - "Langsmith documentation: https://docs.smith.langchain.com/" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "e30e3f0f-6200-464e-b429-6b69c44e06b1", - "metadata": {}, - "outputs": [], - "source": [ - "#All imports\n", - "import os\n", - "import uuid\n", - "from operator import itemgetter\n", - "from typing import Sequence\n", - "\n", - "from langchain_benchmarks import clone_public_dataset, registry\n", - "from langchain_community.embeddings import HuggingFaceEmbeddings, HuggingFaceHubEmbeddings\n", - "from langchain_community.vectorstores import Redis\n", - "from langchain_community.llms import HuggingFaceEndpoint\n", - "from langchain.prompts import ChatPromptTemplate\n", - "from langchain.schema.document import Document\n", - "from langchain.schema.output_parser import StrOutputParser\n", - "from langchain.schema.runnable.passthrough import RunnableAssign\n", - "from transformers import AutoTokenizer, LlamaForCausalLM\n", - "\n", - "from langsmith.client import Client\n", - "from langchain_benchmarks.rag import get_eval_config\n" - ] - }, - { - "cell_type": "markdown", - "id": "c57bae87-2582-419d-8dcf-66c342594ae5", - "metadata": {}, - "source": [ - "### Configuration parameters" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8989b9cf-ff52-4d10-941d-e43beb4678a9", - "metadata": {}, - "outputs": [], - "source": [ - "#Configuration parameters\n", - "\n", - "os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n", - "os.environ[\"LANGCHAIN_API_KEY\"] = \"add-your-langsmith-key\" # Your API key\n", - "\n", - "#Vector DB configuration\n", - "EMBED_MODEL = \"\" #Huggingface sentencetransformer model that you want to use. ex. \"BAAI/bge-base-en-v1.5\"\n", - "REDIS_INDEX_NAME = \"\" #Name of the index to be created in DB\n", - "REDIS_SERVER_URL = \"\" #Specify url of your redis server\n", - "REDIS_INDEX_SCHEMA = \"\" #path to redis schema yml file. Schema to stor data, vectors and desired metadata for every entry\n", - "\n", - "#Endpoints\n", - "TEI_ENDPOINT = \"Add your TEI endpoint\" #Huggingface TEI endpoint url for Embedding model serving. Make sure TEI is serving the same EMBED_MODEL specified above\n", - "TGI_ENDPOINT = \"Add your TGI endpoint\" #Huggingface TGI endpoint url for Embedding model serving\n", - "VLLM_ENDPOINT = \"Add your VLLM endpoint\" #vllm server endpoint (either this or TGI_ENDPOINT should be specified)\n", - "METHOD = \"\" #give \"tgi-gaudi\" to use TGI_ENDPOINT or \"vllm-openai\" to use VLLM_ENDPOINT\n", - "\n", - "#Test parameters\n", - "LANGSMITH_PROJECT_NAME = \"\" #The test result will be displayed in langsmith cloud with this project name and an unique uuid\n", - "CONCURRENCY_LEVEL = 16 #Number of concurrent queries to be sent to RAG chain\n", - "LANGCHAIN_DATASET_NAME = 'LangChain Docs Q&A' #Specify the Langchain dataset name (if using a dataset from langchain)\n", - "\n", - "#LLM parameters\n", - "LLM_MODEL_NAME = \"meta-llama/Llama-2-70b-chat-hf\"\n", - "MAX_OUTPUT_TOKENS = 128\n", - "PROMPT_TOKENS_LEN=214 # Magic number for prompt template tokens. This changes if prompt changes\n", - "MAX_INPUT_TOKENS=1024 #Use this and PROMPT_TOKENS_LEN if there is a need to limit input tokens." 
- ] - }, - { - "cell_type": "markdown", - "id": "e4e430ea-83e1-417f-a6fc-57a3ffdcac85", - "metadata": {}, - "source": [ - "### Selecting dataset\n", - "Below section covers selecting and using LangChain Docs Q&A dataset\n", - "This can be modified to use any dataset" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "667c6870-10b7-492b-bc09-fddf0b1e3d76", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
Name Type Dataset ID Description
LangChain Docs Q&A RetrievalTask452ccafc-18e1-4314-885b-edd735f17b9dQuestions and answers based on a snapshot of the LangChain python docs.\n", - "\n", - "The environment provides the documents and the retriever information.\n", - "\n", - "Each example is composed of a question and reference answer.\n", - "\n", - "Success is measured based on the accuracy of the answer relative to the reference answer.\n", - "We also measure the faithfulness of the model's response relative to the retrieved documents (if any).
Semi-structured ReportsRetrievalTaskc47d9617-ab99-4d6e-a6e6-92b8daf85a7dQuestions and answers based on PDFs containing tables and charts.\n", - "\n", - "The task provides the raw documents as well as factory methods to easily index them\n", - "and create a retriever.\n", - "\n", - "Each example is composed of a question and reference answer.\n", - "\n", - "Success is measured based on the accuracy of the answer relative to the reference answer.\n", - "We also measure the faithfulness of the model's response relative to the retrieved documents (if any).
Multi-modal slide decksRetrievalTask40afc8e7-9d7e-44ed-8971-2cae1eb59731This public dataset is a work-in-progress and will be extended over time.\n", - " \n", - "Questions and answers based on slide decks containing visual tables and charts.\n", - "\n", - "Each example is composed of a question and reference answer.\n", - "\n", - "Success is measured based on the accuracy of the answer relative to the reference answer.
" - ], - "text/plain": [ - "Registry(tasks=[RetrievalTask(name='LangChain Docs Q&A', dataset_id='https://smith.langchain.com/public/452ccafc-18e1-4314-885b-edd735f17b9d/d', description=\"Questions and answers based on a snapshot of the LangChain python docs.\\n\\nThe environment provides the documents and the retriever information.\\n\\nEach example is composed of a question and reference answer.\\n\\nSuccess is measured based on the accuracy of the answer relative to the reference answer.\\nWe also measure the faithfulness of the model's response relative to the retrieved documents (if any).\\n\", get_docs=, retriever_factories={'basic': , 'parent-doc': , 'hyde': }, architecture_factories={'conversational-retrieval-qa': }), RetrievalTask(name='Semi-structured Reports', dataset_id='https://smith.langchain.com/public/c47d9617-ab99-4d6e-a6e6-92b8daf85a7d/d', description=\"Questions and answers based on PDFs containing tables and charts.\\n\\nThe task provides the raw documents as well as factory methods to easily index them\\nand create a retriever.\\n\\nEach example is composed of a question and reference answer.\\n\\nSuccess is measured based on the accuracy of the answer relative to the reference answer.\\nWe also measure the faithfulness of the model's response relative to the retrieved documents (if any).\\n\", get_docs=, retriever_factories={'basic': , 'parent-doc': , 'hyde': }, architecture_factories={}), RetrievalTask(name='Multi-modal slide decks', dataset_id='https://smith.langchain.com/public/40afc8e7-9d7e-44ed-8971-2cae1eb59731/d', description='This public dataset is a work-in-progress and will be extended over time.\\n \\nQuestions and answers based on slide decks containing visual tables and charts.\\n\\nEach example is composed of a question and reference answer.\\n\\nSuccess is measured based on the accuracy of the answer relative to the reference answer.\\n', get_docs={}, retriever_factories={}, architecture_factories={})])" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "#Langchain supported datasets for Retrieval task\n", - "registry = registry.filter(Type=\"RetrievalTask\")\n", - "registry" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "81c8d507-40d5-4f56-9b68-6f579aa6cce7", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
Name LangChain Docs Q&A
Type RetrievalTask
Dataset ID 452ccafc-18e1-4314-885b-edd735f17b9d
Description Questions and answers based on a snapshot of the LangChain python docs.\n", - "\n", - "The environment provides the documents and the retriever information.\n", - "\n", - "Each example is composed of a question and reference answer.\n", - "\n", - "Success is measured based on the accuracy of the answer relative to the reference answer.\n", - "We also measure the faithfulness of the model's response relative to the retrieved documents (if any).
Retriever Factories basic, parent-doc, hyde
Architecture Factoriesconversational-retrieval-qa
get_docs
" - ], - "text/plain": [ - "RetrievalTask(name='LangChain Docs Q&A', dataset_id='https://smith.langchain.com/public/452ccafc-18e1-4314-885b-edd735f17b9d/d', description=\"Questions and answers based on a snapshot of the LangChain python docs.\\n\\nThe environment provides the documents and the retriever information.\\n\\nEach example is composed of a question and reference answer.\\n\\nSuccess is measured based on the accuracy of the answer relative to the reference answer.\\nWe also measure the faithfulness of the model's response relative to the retrieved documents (if any).\\n\", get_docs=, retriever_factories={'basic': , 'parent-doc': , 'hyde': }, architecture_factories={'conversational-retrieval-qa': })" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "#Lets use LangChain Docs Q&A dataset for our benchmark\n", - "langchain_docs = registry[LANGCHAIN_DATASET_NAME]\n", - "langchain_docs" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "bece8e4b-2fd7-4483-abce-2f37ebf858a7", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Dataset LangChain Docs Q&A already exists. Skipping.\n", - "You can access the dataset at https://smith.langchain.com/o/9534e90b-1d2b-55ed-bf79-31dc5ff16722/datasets/3ce3b4a1-0640-4fbf-925e-2c03caceb5ac.\n" - ] - } - ], - "source": [ - "#Download the dataset locally\n", - "clone_public_dataset(langchain_docs.dataset_id, dataset_name=langchain_docs.name)" - ] - }, - { - "cell_type": "markdown", - "id": "d816db14-8175-4c9e-9f99-0b877553fdc9", - "metadata": {}, - "source": [ - "### Ingesting data into Redis vector DB\n", - "This section needs to be run only when the Redis server doesn't already contain the data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "80581246-0fcd-4da3-9d45-022b871a787a", - "metadata": {}, - "outputs": [], - "source": [ - "#Embedding model for ingestion \n", - "embedder = HuggingFaceEmbeddings(model_name=EMBED_MODEL)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "25ada515", - "metadata": {}, - "outputs": [], - "source": [ - "#Ingest the dataset into vector DB\n", - "_ = Redis.from_texts(\n", - " # appending this little bit can sometimes help with semantic retrieval\n", - " # especially with multiple companies\n", - " texts=[d.page_content for d in docs],\n", - " metadatas=[d.metadata for d in docs],\n", - " embedding=embedder,\n", - " index_name=REDIS_INDEX_NAME,\n", - " index_schema=REDIS_INDEX_SCHEMA,\n", - " redis_url=REDIS_SERVER_URL,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "a4c2f599-da93-4700-aec3-4db2d43d1ef8", - "metadata": {}, - "source": [ - "### RAG pipeline\n", - "Initialize each component of RAG pipeline and setup the chain" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "5756e0db", - "metadata": {}, - "outputs": [], - "source": [ - "#enable TEI endpoint to get high throughput high throughput queries\n", - "embedder = HuggingFaceHubEmbeddings(model=TEI_ENDPOINT)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "7f906c62-2e59-481f-9159-b2dca087d802", - "metadata": {}, - "outputs": [], - "source": [ - "#Initialize retriever to be added in langchain RAG chain.\n", - "vectorstore = Redis.from_existing_index(\n", - " embedding=embedder, index_name=REDIS_INDEX_NAME, schema=REDIS_INDEX_SCHEMA, redis_url=REDIS_SERVER_URL\n", - ")\n", - "retriever = vectorstore.as_retriever()" - ] - }, - { - "cell_type": 
"markdown", - "id": "cdf91d5b-e7e9-41d9-830b-465d87ffc5b0", - "metadata": {}, - "source": [ - "**Note:** Prompt is specific to dataset. Modify the prompt accordingly based on the dataset selected.
\n", - "The below prompt is for Langchain Docs Q&A dataset" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "e9e77e72-fa65-48a6-aae9-1bec9875b124", - "metadata": {}, - "outputs": [], - "source": [ - "#Setup prompt\n", - "\n", - "#helper function to crop input tokens from retrieved doc from vector DB\n", - "#This can be used in format_docs function if there is a need to make sure\n", - "#number of input tokens doesn't exceed certain limit\n", - "tokenizer = AutoTokenizer.from_pretrained(LLM_MODEL_NAME)\n", - "def crop_tokens(prompt, max_len):\n", - " inputs = tokenizer(prompt, return_tensors=\"pt\")\n", - " inputs_cropped = inputs['input_ids'][0:,0:max_len]\n", - " prompt_cropped=tokenizer.batch_decode(inputs_cropped, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]\n", - " return prompt_cropped\n", - "\n", - "# After the retriever fetches documents, this\n", - "# function formats them in a string to present for the LLM\n", - "def format_docs(docs: Sequence[Document]) -> str:\n", - " formatted_docs = []\n", - " for i, doc in enumerate(docs):\n", - " doc_string = (\n", - " f\"\\n\"\n", - " f\"{doc.metadata.get('source')}\\n\"\n", - " f\"{doc.page_content}\\n\"\n", - " \"\"\n", - " )\n", - " # Truncate the retrieval data based on the max tokens required\n", - " cropped= crop_tokens(doc_string,MAX_INPUT_TOKENS-PROMPT_TOKENS_LEN) #remove this if there is not need of limiting INPUT tokens to LLM\n", - " formatted_docs.append(doc_string)\n", - " formatted_str = \"\\n\".join(formatted_docs)\n", - " return f\"\\n{formatted_str}\\n\"\n", - "\n", - "prompt = ChatPromptTemplate.from_messages(\n", - " [\n", - " (\n", - " \"system\",\n", - " \"You are an AI assistant answering questions about LangChain.\"\n", - " \"\\n{context}\\n\"\n", - " \"Respond solely based on the document content.\",\n", - " ),\n", - " (\"human\", \"{question}\"),\n", - " ]\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "eee03d33-7e7b-41b8-8717-b031a49c1a36", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Token will not been saved to git credential helper. 
Pass `add_to_git_credential=True` if you want to set the git credential as well.\n", - "Token is valid (permission: read).\n", - "Your token has been saved to /root/.cache/huggingface/token\n", - "Login successful\n" - ] - } - ], - "source": [ - "#Setup LLM \n", - "\n", - "llm = None\n", - "match METHOD:\n", - " case \"tgi-gaudi\":\n", - " llm = HuggingFaceEndpoint(\n", - " endpoint_url=TGI_ENDPOINT,\n", - " max_new_tokens=MAX_OUTPUT_TOKENS,\n", - " top_k=10,\n", - " top_p=0.95,\n", - " typical_p=0.95,\n", - " temperature=1.0,\n", - " repetition_penalty=1.03,\n", - " streaming=False,\n", - " truncate=1024\n", - " )\n", - " case \"vllm-openai\":\n", - " llm = ChatOpenAI(\n", - " model=LLM_MODEL_NAME,\n", - " openai_api_key=\"EMPTY\", \n", - " openai_api_base=VLLM_ENDPOINT,\n", - " max_tokens=MAX_OUTPUT_TOKENS,\n", - " temperature=1.0,\n", - " top_p=0.95,\n", - " streaming=False,\n", - " frequency_penalty=1.03\n", - " )\n", - "\n", - "response_generator = (prompt | llm | StrOutputParser()).with_config(\n", - " run_name=\"GenerateResponse\",\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "7164b12b-11e9-4cec-946a-e1902be507da", - "metadata": {}, - "outputs": [], - "source": [ - "# This is the final response chain.\n", - "# It fetches the \"question\" key from the input dict,\n", - "# passes it to the retriever, then formats as a string.\n", - "\n", - "chain = (\n", - " RunnableAssign(\n", - " {\n", - " \"context\": (itemgetter(\"question\") | retriever | format_docs).with_config(\n", - " run_name=\"FormatDocs\"\n", - " )\n", - " }\n", - " )\n", - " # The \"RunnableAssign\" above returns a dict with keys\n", - " # question (from the original input) and\n", - " # context: the string-formatted docs.\n", - " # This is passed to the response_generator above\n", - " | response_generator\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "9f92a667-9904-437f-88d7-45caf56afbb9", - "metadata": {}, - "source": [ - "### Setup and run Langsmith benchmark" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "b39849d7-4b18-4dc5-a97c-f50f528bc980", - "metadata": {}, - "outputs": [], - "source": [ - "#Initialize Langchain client\n", - "client = Client()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "9aabc054-7371-4daf-b2a9-3a8ed891e03b", - "metadata": {}, - "outputs": [], - "source": [ - "# Generate a unique run ID for this experiment\n", - "run_uid = uuid.uuid4().hex[:6]\n", - "\n", - "#Run the test\n", - "test_run = client.run_on_dataset(\n", - " dataset_name=LANGCHAIN_DATASET_NAME,\n", - " llm_or_chain_factory=chain,\n", - " evaluation=None,\n", - " project_name=LANGSMITH_PROJECT_NAME+'_'+run_uid,\n", - " project_metadata={\n", - " \"index_method\": \"basic\",\n", - " },\n", - " concurrency_level=CONCURRENCY_LEVEL,\n", - " verbose=True,\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "d8198756-0797-49ad-8072-1a1d61536689", - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.7" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/ChatQnA/deprecated/llamaindex/README.md b/ChatQnA/deprecated/llamaindex/README.md 
deleted file mode 100644 index 2b131089c..000000000 --- a/ChatQnA/deprecated/llamaindex/README.md +++ /dev/null @@ -1 +0,0 @@ -Will update soon. diff --git a/ChatQnA/deprecated/serving/tgi_gaudi/README.md b/ChatQnA/deprecated/serving/tgi_gaudi/README.md deleted file mode 100644 index c9a8d510e..000000000 --- a/ChatQnA/deprecated/serving/tgi_gaudi/README.md +++ /dev/null @@ -1,89 +0,0 @@ -[TGI-Gaudi](https://github.com/huggingface/tgi-gaudi) provides many parameters aimed at optimizing performance for text generation inference tasks. By optimizing these parameters, users can achieve the best results in terms of inference speed, memory usage, and overall efficiency. These parameters cover various aspects such as maximum sequence length, batch size, Gaudi processor utilization, and environment configurations. By carefully adjusting these parameters according to the specific requirements of the workload and hardware environment, users can unlock the full potential of TGI-Gaudi for text generation tasks. - -# Knowledge about TGI-Gaudi performance tuning - -## Adjusting TGI parameters - -Maximum sequence length is controlled by two arguments: - -- `--max-input-length` is the maximum possible input prompt length. Default value is `1024`. -- `--max-total-tokens` is the maximum possible total length of the sequence (input and output). Default value is `2048`. - -Maximum batch size is controlled by two arguments: - -- For the prefill operation, please set `--max-prefill-total-tokens` to `bs * max-input-length`, where `bs` is your expected maximum prefill batch size. -- For the decode operation, please set `--max-batch-total-tokens` to `bs * max-total-tokens`, where `bs` is your expected maximum decode batch size. -- Please note that the batch size will always be padded to the nearest multiple of `BATCH_BUCKET_SIZE` and `PREFILL_BATCH_BUCKET_SIZE`. - -For example, with `--max-input-length 1024` and an expected maximum prefill batch size of 4, set `--max-prefill-total-tokens` to `4096`. - -To ensure the best performance, a warmup is performed at the beginning of each server run. It is designed to cover major recompilations while using HPU Graphs. It creates queries with all possible input shapes, based on the provided parameters (described in this section), and runs basic TGI operations on them (prefill, decode, concatenate). - -Besides those already mentioned, there are other parameters that need to be properly adjusted to improve performance or memory usage: - -- `PAD_SEQUENCE_TO_MULTIPLE_OF` determines the sizes of the input length buckets. Since warmup creates several graphs for each bucket, it is important to adjust that value proportionally to the input sequence length; otherwise, out-of-memory issues can be observed. -- `ENABLE_HPU_GRAPH` enables HPU graphs usage, which is crucial for performance. The recommended value is `true`. - -For more information and documentation about Text Generation Inference, check out [the README](https://github.com/huggingface/text-generation-inference#text-generation-inference) of the original repo. - -## Environment Variable HABANA_VISIBLE_MODULES - -To run a workload with only part of the available Gaudi processors, you need to set the module IDs of the Gaudi processors to be used in the environment variable HABANA_VISIBLE_MODULES. In general, there are eight Gaudi processors on a node, so the module IDs would be in the range of 0 ~ 7.
If you want to run a 4-Gaudi workload, set the following before you run the workload: - -```bash -export HABANA_VISIBLE_MODULES="0,1,2,3" -``` - -If you want to run another 4-Gaudi workload in parallel, set the following before running the second workload to let it use the remaining four Gaudi processors. - -```bash -export HABANA_VISIBLE_MODULES="4,5,6,7" -``` - -Though using partial Gaudi in a workload is possible, only 2-Gaudi and 4-Gaudi scenarios are supported. It is highly recommended to set HABANA_VISIBLE_MODULES using the combinations listed below: - -- 2-Gaudi - “0,1”, “2,3”, “4,5” or “6,7” -- 4-Gaudi - “0,1,2,3” or “4,5,6,7” - -For the details please check [Multiple_Workloads_Single_Docker](https://docs.habana.ai/en/latest/PyTorch/Reference/PT_Multiple_Tenants_on_HPU/Multiple_Workloads_Single_Docker.html) - -## Environment Variable HABANA_VISIBLE_DEVICES - -There are some guidelines on setting HABANA_VISIBLE_DEVICES; however, before reading them you need to know how to find the mapping between the index and module ID of the Gaudi processors. The command below queries this mapping; a sample output is shown in the table that follows: - -```bash -hl-smi -Q index,module_id -f csv -``` - -| index | module_id | -| :---: | :-------: | -| 3 | 6 | -| 1 | 4 | -| 2 | 7 | -| 0 | 5 | -| 4 | 2 | -| 6 | 0 | -| 7 | 3 | -| 5 | 1 | - -With the mapping between index and module ID, you can set `HABANA_VISIBLE_DEVICES` properly with the guidelines below: - -- Mount two Gaudi processors or four Gaudi processors in the docker container. Even though using partial Gaudi in a distributed workload is possible, only the 2-Gaudi and 4-Gaudi scenarios are allowed. -- Since `HABANA_VISIBLE_DEVICES` accepts indices instead of module IDs, you need to leverage the above command to figure out the corresponding indices for a set of module IDs (a programmatic sketch follows this list). -- Avoid mounting the same index in multiple containers. Since multiple workloads might run in parallel, keeping each Gaudi out of more than one docker container prevents the same Gaudi from being reused by different workloads.
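-
-A minimal Python sketch of that lookup (illustrative only; it assumes `hl-smi` is on the PATH and that the CSV headers match the queried fields `index` and `module_id`):
-
-```python
-import csv
-import subprocess
-
-# Read the "index, module_id" table that hl-smi emits as CSV
-out = subprocess.run(
-    ["hl-smi", "-Q", "index,module_id", "-f", "csv"],
-    capture_output=True, text=True, check=True,
-).stdout
-
-rows = [[cell.strip() for cell in row] for row in csv.reader(out.splitlines()) if row]
-header, data = rows[0], rows[1:]
-idx_col, mod_col = header.index("index"), header.index("module_id")
-
-# Invert the table: module ID -> device index
-module_to_index = {row[mod_col]: row[idx_col] for row in data}
-
-# Example: a 4-Gaudi workload on modules 0-3
-visible = ",".join(module_to_index[m] for m in ["0", "1", "2", "3"])
-print(f"export HABANA_VISIBLE_DEVICES={visible}")
-```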
- -For the details please check [Multiple Dockers Each with a Single Workload](https://docs.habana.ai/en/latest/PyTorch/Reference/PT_Multiple_Tenants_on_HPU/Multiple_Dockers_each_with_Single_Workload.html) - -For the System Management Interface Tool please check [hl-smi](https://docs.habana.ai/en/latest/Management_and_Monitoring/Embedded_System_Tools_Guide/System_Management_Interface_Tool.html) - -# Verified Docker commands with tuned parameters for best performance - -## Docker command for 70B model - -```bash -docker run -p 8080:80 -v $volume:/data --runtime=habana -e HUGGING_FACE_HUB_TOKEN=$HUGGINGFACEHUB_API_TOKEN -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES="6,7,4,5" -e HABANA_VISIBLE_MODULES="0,1,2,3" -e BATCH_BUCKET_SIZE=22 -e PREFILL_BATCH_BUCKET_SIZE=1 -e MAX_BATCH_PREFILL_TOKENS=5102 -e MAX_BATCH_TOTAL_TOKENS=32256 -e MAX_INPUT_LENGTH=1024 -e PAD_SEQUENCE_TO_MULTIPLE_OF=1024 -e MAX_WAITING_TOKENS=5 -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model --sharded true --num-shard 4 -``` - -## Docker command for 13B model - -```bash -docker run -p 8080:80 -v $volume:/data --runtime=habana -e HUGGING_FACE_HUB_TOKEN=$HUGGINGFACEHUB_API_TOKEN -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e PAD_SEQUENCE_TO_MULTIPLE_OF=128 -e HABANA_VISIBLE_DEVICES="4" -e BATCH_BUCKET_SIZE=16 -e PREFILL_BATCH_BUCKET_SIZE=1 -e MAX_BATCH_PREFILL_TOKENS=4096 -e MAX_BATCH_TOTAL_TOKENS=18432 -e PAD_SEQUENCE_TO_MULTIPLE_OF=1024 -e MAX_INPUT_LENGTH=1024 -e MAX_TOTAL_TOKENS=1152 -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model -``` diff --git a/ChatQnA/deprecated/serving/tgi_gaudi/build_docker.sh b/ChatQnA/deprecated/serving/tgi_gaudi/build_docker.sh deleted file mode 100644 index 80c00c9fc..000000000 --- a/ChatQnA/deprecated/serving/tgi_gaudi/build_docker.sh +++ /dev/null @@ -1,9 +0,0 @@ -#!/bin/bash - - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -git clone https://github.com/huggingface/tgi-gaudi.git -cd ./tgi-gaudi/ -docker build -t ghcr.io/huggingface/tgi-gaudi:1.2.1 . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy diff --git a/ChatQnA/deprecated/serving/tgi_gaudi/launch_tgi_service.sh b/ChatQnA/deprecated/serving/tgi_gaudi/launch_tgi_service.sh deleted file mode 100644 index 198bddf0d..000000000 --- a/ChatQnA/deprecated/serving/tgi_gaudi/launch_tgi_service.sh +++ /dev/null @@ -1,40 +0,0 @@ -#!/bin/bash - - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# Set default values -default_port=8080 -default_model="Intel/neural-chat-7b-v3-3" -default_num_cards=1 - -# Check if all required arguments are provided -if [ "$#" -lt 0 ] || [ "$#" -gt 3 ]; then - echo "Usage: $0 [num_cards] [port_number] [model_name]" - exit 1 -fi - -# Assign arguments to variables -num_cards=${1:-$default_num_cards} -port_number=${2:-$default_port} -model_name=${3:-$default_model} - -# Check if num_cards is within the valid range (1-8) -if [ "$num_cards" -lt 1 ] || [ "$num_cards" -gt 8 ]; then - echo "Error: num_cards must be between 1 and 8." 
-    exit 1
-fi
-
-# Set the volume variable
-volume=$PWD/data
-
-# Build the Docker run command based on the number of cards
-if [ "$num_cards" -eq 1 ]; then
-    docker_cmd="docker run -d --name=ChatQnA_server -p $port_number:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$http_proxy ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id $model_name"
-else
-    docker_cmd="docker run -d --name=ChatQnA_server -p $port_number:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$http_proxy ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id $model_name --sharded true --num-shard $num_cards"
-fi
-
-# Execute the Docker run command
-eval $docker_cmd
diff --git a/ChatQnA/deprecated/serving/vllm/README.md b/ChatQnA/deprecated/serving/vllm/README.md
deleted file mode 100644
index 2b131089c..000000000
--- a/ChatQnA/deprecated/serving/vllm/README.md
+++ /dev/null
@@ -1 +0,0 @@
-Will update soon.
diff --git a/ChatQnA/deprecated/tests/test_langchain_inference.sh b/ChatQnA/deprecated/tests/test_langchain_inference.sh
deleted file mode 100644
index 3f8663049..000000000
--- a/ChatQnA/deprecated/tests/test_langchain_inference.sh
+++ /dev/null
@@ -1,119 +0,0 @@
-#!/bin/bash
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-set -xe
-
-function test_env_setup() {
-    WORKPATH=$(dirname "$PWD")
-    LOG_PATH="$WORKPATH/tests"
-
-    REDIS_CONTAINER_NAME="test-redis-vector-db"
-    LANGCHAIN_CONTAINER_NAME="test-qna-rag-redis-server"
-    CHATQNA_CONTAINER_NAME="test-ChatQnA_server"
-    cd $WORKPATH # go to ChatQnA
-}
-
-function rename() {
-    # Rename the docker container/image names to avoid conflict with local test
-    cd ${WORKPATH}
-    sed -i "s/container_name: redis-vector-db/container_name: ${REDIS_CONTAINER_NAME}/g" langchain/docker/docker-compose.yml
-    sed -i "s/container_name: qna-rag-redis-server/container_name: ${LANGCHAIN_CONTAINER_NAME}/g" langchain/docker/docker-compose.yml
-    sed -i "s/image: intel\/gen-ai-examples:qna-rag-redis-server/image: intel\/gen-ai-examples:${LANGCHAIN_CONTAINER_NAME}/g" langchain/docker/docker-compose.yml
-    sed -i "s/ChatQnA_server/${CHATQNA_CONTAINER_NAME}/g" serving/tgi_gaudi/launch_tgi_service.sh
-}
-
-function launch_tgi_gaudi_service() {
-    local card_num=1
-    local port=8888
-    local model_name="Intel/neural-chat-7b-v3-3"
-
-    cd ${WORKPATH}
-
-    # Reset the tgi port
-    sed -i "s/8080/$port/g" langchain/redis/rag_redis/config.py
-    sed -i "s/8080/$port/g" langchain/docker/qna-app/app/server.py
-    sed -i "s/8080/$port/g" langchain/docker/qna-app/Dockerfile
-
-    docker pull ghcr.io/huggingface/tgi-gaudi:1.2.1
-    bash serving/tgi_gaudi/launch_tgi_service.sh $card_num $port $model_name
-    sleep 3m # Wait 3 minutes for the service to come up
-}
-
-function launch_redis_and_langchain_service() {
-    cd $WORKPATH
-    export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
-    local port=8890
-    sed -i "s/port=8000/port=$port/g" langchain/docker/qna-app/app/server.py
-    docker compose -f langchain/docker/docker-compose.yml up -d --build
-
-    # Ingest data into redis
-    docker exec $LANGCHAIN_CONTAINER_NAME \
-        bash -c "cd /ws && python ingest.py > /dev/null"
-}
-
-function start_backend_service() {
-    cd $WORKPATH
-    docker exec $LANGCHAIN_CONTAINER_NAME \
-        bash -c "nohup python app/server.py &"
-    sleep 1m
-}
-
-function run_tests() {
-    cd $WORKPATH
-    local port=8890
-    curl 127.0.0.1:$port/v1/rag/chat \
-        -X POST \
-        -d "{\"query\":\"What is the total revenue of Nike in 2023?\"}" \
-        -H 'Content-Type: application/json' > $LOG_PATH/langchain.log
-
-    curl 127.0.0.1:$port/v1/rag/chat_stream \
-        -X POST \
-        -d "{\"query\":\"What is the total revenue of Nike in 2023?\"}" \
-        -H 'Content-Type: application/json' > $LOG_PATH/langchain_stream.log
-}
-
-function check_response() {
-    cd $WORKPATH
-    echo "Checking response"
-    local status=false
-    if [[ -f $LOG_PATH/langchain.log ]] && [[ $(grep -c "\$51.2 billion" $LOG_PATH/langchain.log) != 0 ]]; then
-        status=true
-    fi
-
-    if [[ ! -f $LOG_PATH/langchain_stream.log ]] || [[ $(grep -c "billion" $LOG_PATH/langchain_stream.log) == 0 ]]; then
-        status=false
-    fi
-
-    if [ $status == false ]; then
-        echo "Response check failed, please check the logs in artifacts!"
-        exit 1
-    else
-        echo "Response check succeeded!"
-    fi
-}
-
-function docker_stop() {
-    local container_name=$1
-    cid=$(docker ps -aq --filter "name=$container_name")
-    if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid; fi
-}
-
-function main() {
-    test_env_setup
-    rename
-    docker_stop $CHATQNA_CONTAINER_NAME && docker_stop $LANGCHAIN_CONTAINER_NAME && docker_stop $REDIS_CONTAINER_NAME && sleep 5s
-
-    launch_tgi_gaudi_service
-    launch_redis_and_langchain_service
-    start_backend_service
-
-    run_tests
-    check_response
-
-    docker_stop $CHATQNA_CONTAINER_NAME && docker_stop $LANGCHAIN_CONTAINER_NAME && docker_stop $REDIS_CONTAINER_NAME && sleep 5s
-    echo y | docker system prune
-}
-
-main
diff --git a/CodeGen/deprecated/README.md b/CodeGen/deprecated/README.md
deleted file mode 100644
index d79bcc19d..000000000
--- a/CodeGen/deprecated/README.md
+++ /dev/null
@@ -1,171 +0,0 @@
-# Code Generation
-
-Code-generating LLMs are specialized AI models designed for the task of generating computer code. Such models are trained on datasets that encompass repositories, specialized documentation, programming code, relevant web content, and other related data, which gives them a deep understanding of programming languages, coding patterns, and software development concepts. Code LLMs are engineered to assist developers and programmers. When seamlessly integrated into the developer's Integrated Development Environment (IDE), they gain a comprehensive understanding of the coding context, including elements such as comments, function names, and variable names. This contextual awareness empowers them to provide more refined and contextually relevant coding suggestions.
-
-Capabilities of LLMs in Coding:
-
-- Code Generation: streamline coding through code generation, enabling non-programmers to describe tasks for code creation.
-- Code Completion: accelerate coding by suggesting contextually relevant snippets as developers type.
-- Code Translation and Modernization: translate and modernize code across multiple programming languages, aiding interoperability and updating legacy projects.
-- Code Summarization: extract key insights from codebases, improving readability and developer productivity.
-- Code Refactoring: offer suggestions for code refactoring, enhancing code performance and efficiency.
-- AI-Assisted Testing: assist in creating test cases, ensuring code robustness and accelerating development cycles.
-- Error Detection and Debugging: detect errors in code and provide detailed descriptions and potential fixes, expediting debugging processes.
-
-In this example, we present a Code Copilot application to showcase how code generation can be executed on the Intel Gaudi2 platform. This CodeGen use case performs code generation using open source models such as "m-a-p/OpenCodeInterpreter-DS-6.7B" and "deepseek-ai/deepseek-coder-33b-instruct" with Text Generation Inference on Intel Gaudi2.
-
-The CodeGen architecture is shown below:
-
-![architecture](https://i.imgur.com/G9ozwFX.png)
-
-# Environment Setup
-
-To use [🤗 text-generation-inference](https://github.com/huggingface/text-generation-inference) on Intel Gaudi2, please follow these steps:
-
-## Prepare Gaudi Image
-
-Getting started is straightforward with the official Docker container. Simply pull the image using:
-
-```bash
-docker pull ghcr.io/huggingface/tgi-gaudi:1.2.1
-```
-
-Alternatively, you can build the Docker image yourself with:
-
-```bash
-bash ./serving/tgi_gaudi/build_docker.sh
-```
-
-## Launch TGI Gaudi Service
-
-### Launch a local server instance on 1 Gaudi card:
-
-```bash
-bash ./serving/tgi_gaudi/launch_tgi_service.sh
-```
-
-### Launch a local server instance on 4 Gaudi cards:
-
-```bash
-bash ./serving/tgi_gaudi/launch_tgi_service.sh 4 9000 "deepseek-ai/deepseek-coder-33b-instruct"
-```
-
-### Customize TGI Gaudi Service
-
-The ./serving/tgi_gaudi/launch_tgi_service.sh script accepts three parameters:
-
-- num_cards: The number of Gaudi cards to be utilized, ranging from 1 to 8. The default is set to 1.
-- port_number: The port number assigned to the TGI Gaudi endpoint, with the default being 8080.
-- model_name: The model name utilized for LLM, with the default set to "m-a-p/OpenCodeInterpreter-DS-6.7B".
-
-You have the flexibility to customize these parameters according to your specific needs. Additionally, you can set the TGI Gaudi endpoint by exporting the environment variable `TGI_ENDPOINT`:
-
-```bash
-export TGI_ENDPOINT="http://xxx.xxx.xxx.xxx:8080"
-```
-
-## Launch Copilot Docker
-
-### Build Copilot Docker Image (Optional)
-
-```bash
-cd codegen
-bash ./build_docker.sh
-cd ..
-```
-
-### Launch Copilot Docker
-
-```bash
-docker run -it -e http_proxy=${http_proxy} -e https_proxy=${https_proxy} --net=host --ipc=host -v /var/run/docker.sock:/var/run/docker.sock intel/gen-ai-examples:copilot bash
-```
-
-# Start Copilot Server
-
-## Start the Backend Service
-
-Make sure the TGI-Gaudi service is running, then launch the backend service:
-
-Please follow this link [huggingface token](https://huggingface.co/docs/hub/security-tokens) to get the access token and export the `HUGGINGFACEHUB_API_TOKEN` environment variable with the token.
-
-```bash
-export HUGGINGFACEHUB_API_TOKEN=
-nohup python server.py &
-```
-
-The Copilot backend defaults to listening on port 8000, but you can adjust the port number as needed.
-
-# Install Copilot VSCode extension from Plugin Marketplace
-
-Install `Neural Copilot` in VSCode as shown below.
-
-![Install-screenshot](https://i.imgur.com/cnHRAdD.png)
-
-# How to use
-
-## Service URL setting
-
-Please adjust the service URL in the extension settings based on the endpoint of the code generation backend service.
-
-![Setting-screenshot](https://i.imgur.com/4hjvKPu.png)
-![Setting-screenshot](https://i.imgur.com/AQZuzqd.png)
-
-## Customize
-
-The Copilot lets users enter their own sensitive information and tokens in the user settings according to their needs, so that responses better match individual requirements.
-
-![Customize](https://i.imgur.com/PkObak9.png)
-
-## Code Suggestion
-
-To trigger inline completion, type `# {your keyword}` (start with your programming language's comment keyword, like `//` in C++ and `#` in Python). Make sure Inline Suggest is enabled in the VS Code settings.
-For example:
-
-![code suggestion](https://i.imgur.com/sH5UoTO.png)
-
-To provide programmers with a smooth experience, the Copilot supports multiple ways to trigger inline code suggestions. If you are interested in the details, they are summarized as follows:
-
-- Generate code from single-line comments: the simplest way, introduced above.
-- Generate code from consecutive single-line comments:
-
-![codegen from single-line comments](https://i.imgur.com/GZsQywX.png)
-
-- Generate code from multi-line comments, which is not triggered until there is at least one `space` outside the multi-line comment:
-
-![codegen from multi-line comments](https://i.imgur.com/PzhiWrG.png)
-
-- Automatically complete multi-line comments:
-
-![auto complete](https://i.imgur.com/cJO3PQ0.jpg)
-
-## Chat with AI assistant
-
-You can start a conversation with the AI programming assistant by clicking the robot icon in the plugin bar on the left:
-
-![icon](https://i.imgur.com/f7rzfCQ.png)
-
-Then the conversation window appears on the left, where you can chat with the AI assistant:
-
-![dialog](https://i.imgur.com/aiYzU60.png)
-
-There are 4 areas worth noting:
-
-- Enter and submit your question
-- Your previous questions
-- Answers from the AI assistant (code is highlighted according to the programming language it is written in; streaming output is also supported)
-- Copy or replace code with one click (note that you need to select the code in the editor first and then click "replace", otherwise the code will be inserted)
-
-You can also select code in the editor and ask the AI assistant questions about it.
-For example:
-
-- Select code
-
-![select code](https://i.imgur.com/grvrtY6.png)
-
-- Ask a question and get an answer
-
-![qna](https://i.imgur.com/8Kdpld7.png)
-
-#
-
-SCRIPT USAGE NOTICE: By downloading and using any script file included with the associated software package (such as files with .bat, .cmd, or .JS extensions, Docker files, or any other type of file that, when executed, automatically downloads and/or installs files onto your system) (the “Script File”), it is your obligation to review the Script File to understand what files (e.g., other software, AI models, AI Datasets) the Script File will download to your system (“Downloaded Files”). Furthermore, by downloading and using the Downloaded Files, even if they are installed through a silent install, you agree to any and all terms and conditions associated with such files, including but not limited to, license terms, notices, or disclaimers.
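-
-As a quick sanity check outside the IDE, you can query the Copilot backend directly. The request below is an illustrative example, assuming the backend runs on its default port 8000; `prompt` and `max_new_tokens` are fields of the `ChatCompletionRequest` schema used by the server shown later in this diff.
-
-```bash
-# Non-streaming completion against the Copilot backend
-curl http://localhost:8000/v1/code_generation \
-    -X POST \
-    -H "Content-Type: application/json" \
-    -d '{"prompt": "def print_hello_world():", "max_new_tokens": 128}'
-```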
diff --git a/CodeGen/deprecated/codegen/Dockerfile b/CodeGen/deprecated/codegen/Dockerfile deleted file mode 100644 index 8af06e796..000000000 --- a/CodeGen/deprecated/codegen/Dockerfile +++ /dev/null @@ -1,39 +0,0 @@ - - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# SCRIPT USAGE NOTICE: By downloading and using any script file included -# with the associated software package (such as files with .bat, .cmd, or -# .JS extensions, Docker files, or any other type of file that, when executed, -# automatically downloads and/or installs files onto your system) (the “Script File”), -# it is your obligation to review the Script File to understand what files (e.g., -# other software, AI models, AI Datasets) the Script File will download to your system -# (“Downloaded Files”). Furthermore, by downloading and using the Downloaded Files, -# even if they are installed through a silent install, you agree to any and all -# terms and conditions associated with such files, including but not limited to, -# license terms, notices, or disclaimers. - - -FROM langchain/langchain:latest - -RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \ - libgl1-mesa-glx \ - libjemalloc-dev - -RUN useradd -m -s /bin/bash user && \ - mkdir -p /home/user && \ - chown -R user /home/user/ - -USER user - -COPY requirements.txt /tmp/requirements.txt - -RUN pip install --no-cache-dir -U -r /tmp/requirements.txt - -ENV PYTHONPATH=/home/user:/home/user/codegen-app - -WORKDIR /home/user/codegen-app -COPY codegen-app /home/user/codegen-app - -SHELL ["/bin/bash", "-c"] diff --git a/CodeGen/deprecated/codegen/build_docker.sh b/CodeGen/deprecated/codegen/build_docker.sh deleted file mode 100755 index 0eb78092d..000000000 --- a/CodeGen/deprecated/codegen/build_docker.sh +++ /dev/null @@ -1,8 +0,0 @@ -#!/bin/bash - - - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -docker build . 
-t intel/gen-ai-examples:copilot --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
diff --git a/CodeGen/deprecated/codegen/codegen-app/openai_protocol.py b/CodeGen/deprecated/codegen/codegen-app/openai_protocol.py
deleted file mode 100644
index 3871e2066..000000000
--- a/CodeGen/deprecated/codegen/codegen-app/openai_protocol.py
+++ /dev/null
@@ -1,35 +0,0 @@
-#!/usr/bin/env python
-# -*- coding: utf-8 -*-
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-#
-"""Code source from FastChat's OpenAI protocol:
-
-https://github.com/lm-sys/FastChat/blob/main/fastchat/protocol/openai_api_protocol.py
-"""
-import time
-from typing import Any, List, Optional, Union
-
-import shortuuid
-
-# pylint: disable=E0611
-from pydantic import BaseModel, Field
-
-
-class ChatCompletionRequest(BaseModel):
-    prompt: Union[str, List[Any]]
-    device: Optional[str] = "cpu"
-    temperature: Optional[float] = 0.7
-    top_p: Optional[float] = 1.0
-    top_k: Optional[int] = 1
-    repetition_penalty: Optional[float] = 1.0
-    max_new_tokens: Optional[int] = 128
-    stream: Optional[bool] = False
-
-
-class ChatCompletionResponse(BaseModel):
-    id: str = Field(default_factory=lambda: f"chatcmpl-{shortuuid.random()}")
-    object: str = "chat.completion"
-    created: int = Field(default_factory=lambda: int(time.time()))
-    response: str
diff --git a/CodeGen/deprecated/codegen/codegen-app/server.py b/CodeGen/deprecated/codegen/codegen-app/server.py
deleted file mode 100644
index 600d1ef0a..000000000
--- a/CodeGen/deprecated/codegen/codegen-app/server.py
+++ /dev/null
@@ -1,156 +0,0 @@
-#!/usr/bin/env python
-# -*- coding: utf-8 -*-
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-#
-
-import os
-from typing import Optional
-
-from fastapi import APIRouter, FastAPI, HTTPException
-from fastapi.responses import RedirectResponse, StreamingResponse
-from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
-from langchain_community.llms import HuggingFaceEndpoint
-from langchain_core.pydantic_v1 import BaseModel
-from openai_protocol import ChatCompletionRequest, ChatCompletionResponse
-from starlette.middleware.cors import CORSMiddleware
-
-app = FastAPI()
-
-app.add_middleware(
-    CORSMiddleware, allow_origins=["*"], allow_credentials=True, allow_methods=["*"], allow_headers=["*"]
-)
-
-
-def filter_code_format(code):
-    language_prefixes = {
-        "go": "```go",
-        "c": "```c",
-        "cpp": "```cpp",
-        "java": "```java",
-        "python": "```python",
-        "typescript": "```typescript",
-    }
-    suffix = "\n```"
-
-    # Find the earliest language prefix; None means the completion has no fenced block
-    first_prefix_pos = None
-    for prefix in language_prefixes.values():
-        pos = code.find(prefix)
-        if pos != -1 and (first_prefix_pos is None or pos + len(prefix) + 1 < first_prefix_pos):
-            first_prefix_pos = pos + len(prefix) + 1
-
-    # No fenced code block found: return the completion unchanged
-    if first_prefix_pos is None:
-        return code
-
-    # Find the first occurrence of the closing fence after the language prefix
-    first_suffix_pos = code.find(suffix, first_prefix_pos + 1)
-
-    # Extract the code block
-    if first_suffix_pos != -1:
-        return code[first_prefix_pos:first_suffix_pos]
-    return code[first_prefix_pos:]
-
-
-class CodeGenAPIRouter(APIRouter):
-    def __init__(self, entrypoint) -> None:
-        super().__init__()
-        self.entrypoint = entrypoint
-        print(f"[codegen - router] Initializing API Router, entrypoint={entrypoint}")
-
-        # Define LLM
-        callbacks = [StreamingStdOutCallbackHandler()]
-        self.llm = HuggingFaceEndpoint(
-            endpoint_url=entrypoint,
-            max_new_tokens=1024,
-            top_k=10,
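-            # Near-greedy decoding: a very low temperature combined with high
-            # top_p/typical_p keeps completions stable and repeatable, which
-            # suits code generation; streaming=True emits tokens incrementally
-            # through the stdout callback above.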
-            top_p=0.95,
-            typical_p=0.95,
-            temperature=0.01,
-            repetition_penalty=1.03,
-            streaming=True,
-            callbacks=callbacks,
-        )
-        print("[codegen - router] LLM initialized.")
-
-    def handle_chat_completion_request(self, request: ChatCompletionRequest):
-        try:
-            print(f"Predicting chat completion using prompt '{request.prompt}'")
-            if request.stream:
-
-                async def stream_generator():
-                    for chunk in self.llm.stream(request.prompt):
-                        yield f"data: {chunk.encode()}\n\n"
-                    yield "data: [DONE]\n\n"
-
-                return StreamingResponse(stream_generator(), media_type="text/event-stream")
-            else:
-                result = self.llm(request.prompt)
-                response = filter_code_format(result)
-        except Exception as e:
-            print(f"An error occurred: {e}")
-            raise HTTPException(status_code=500, detail=str(e))
-        else:
-            print("Chat completion finished.")
-            return ChatCompletionResponse(response=response)
-
-
-tgi_endpoint = os.getenv("TGI_ENDPOINT", "http://localhost:8080")
-router = CodeGenAPIRouter(tgi_endpoint)
-
-
-def check_completion_request(request: BaseModel) -> Optional[str]:
-    if request.temperature is not None and request.temperature < 0:
-        return f"Param Error: {request.temperature} is less than the minimum of 0 --- 'temperature'"
-
-    if request.temperature is not None and request.temperature > 2:
-        return f"Param Error: {request.temperature} is greater than the maximum of 2 --- 'temperature'"
-
-    if request.top_p is not None and request.top_p < 0:
-        return f"Param Error: {request.top_p} is less than the minimum of 0 --- 'top_p'"
-
-    if request.top_p is not None and request.top_p > 1:
-        return f"Param Error: {request.top_p} is greater than the maximum of 1 --- 'top_p'"
-
-    if request.top_k is not None and (not isinstance(request.top_k, int)):
-        return f"Param Error: {request.top_k} is not valid under any of the given schemas --- 'top_k'"
-
-    if request.top_k is not None and request.top_k < 1:
-        return f"Param Error: {request.top_k} is less than the minimum of 1 --- 'top_k'"
-
-    if request.max_new_tokens is not None and (not isinstance(request.max_new_tokens, int)):
-        return f"Param Error: {request.max_new_tokens} is not valid under any of the given schemas --- 'max_new_tokens'"
-
-    return None
-
-
-# router /v1/code_generation only supports non-streaming mode.
-@router.post("/v1/code_generation")
-async def code_generation_endpoint(chat_request: ChatCompletionRequest):
-    ret = check_completion_request(chat_request)
-    if ret is not None:
-        raise HTTPException(status_code=400, detail=ret)
-    return router.handle_chat_completion_request(chat_request)
-
-
-# router /v1/code_chat supports both non-streaming and streaming mode.
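-# An illustrative request body for either endpoint (fields defined by
-# ChatCompletionRequest in openai_protocol.py):
-#   {"prompt": "def print_hello_world():", "max_new_tokens": 128, "stream": false}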
-@router.post("/v1/code_chat")
-async def code_chat_endpoint(chat_request: ChatCompletionRequest):
-    ret = check_completion_request(chat_request)
-    if ret is not None:
-        raise HTTPException(status_code=400, detail=ret)
-    return router.handle_chat_completion_request(chat_request)
-
-
-app.include_router(router)
-
-
-@app.get("/")
-async def redirect_root_to_docs():
-    return RedirectResponse("/docs")
-
-
-if __name__ == "__main__":
-    import uvicorn
-
-    uvicorn.run(app, host="0.0.0.0", port=8000)
diff --git a/CodeGen/deprecated/codegen/requirements.txt b/CodeGen/deprecated/codegen/requirements.txt
deleted file mode 100644
index db0abd013..000000000
--- a/CodeGen/deprecated/codegen/requirements.txt
+++ /dev/null
@@ -1,5 +0,0 @@
-huggingface_hub
-langchain==0.1.11
-langchain-cli
-pydantic==1.10.13
-shortuuid
diff --git a/CodeGen/deprecated/serving/tgi_gaudi/build_docker.sh b/CodeGen/deprecated/serving/tgi_gaudi/build_docker.sh
deleted file mode 100644
index 7adf71ff0..000000000
--- a/CodeGen/deprecated/serving/tgi_gaudi/build_docker.sh
+++ /dev/null
@@ -1,9 +0,0 @@
-#!/bin/bash
-
-
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-git clone https://github.com/huggingface/tgi-gaudi.git
-cd ./tgi-gaudi/
-docker build -t ghcr.io/huggingface/tgi-gaudi:1.2.1 . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
diff --git a/CodeGen/deprecated/serving/tgi_gaudi/launch_tgi_service.sh b/CodeGen/deprecated/serving/tgi_gaudi/launch_tgi_service.sh
deleted file mode 100644
index 6cb748e18..000000000
--- a/CodeGen/deprecated/serving/tgi_gaudi/launch_tgi_service.sh
+++ /dev/null
@@ -1,40 +0,0 @@
-#!/bin/bash
-
-
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-# Set default values
-default_port=8080
-default_model="m-a-p/OpenCodeInterpreter-DS-6.7B"
-default_num_cards=1
-
-# Check that no more than three optional arguments are provided
-if [ "$#" -gt 3 ]; then
-    echo "Usage: $0 [num_cards] [port_number] [model_name]"
-    exit 1
-fi
-
-# Assign arguments to variables
-num_cards=${1:-$default_num_cards}
-port_number=${2:-$default_port}
-model_name=${3:-$default_model}
-
-# Check if num_cards is within the valid range (1-8)
-if [ "$num_cards" -lt 1 ] || [ "$num_cards" -gt 8 ]; then
-    echo "Error: num_cards must be between 1 and 8."
-    exit 1
-fi
-
-# Set the volume variable
-volume=$PWD/data
-
-# Build the Docker run command based on the number of cards
-if [ "$num_cards" -eq 1 ]; then
-    docker_cmd="docker run -d --name=CodeGen_server -p $port_number:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$http_proxy ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id $model_name"
-else
-    docker_cmd="docker run -d --name=CodeGen_server -p $port_number:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$http_proxy ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id $model_name --sharded true --num-shard $num_cards"
-fi
-
-# Execute the Docker run command
-eval $docker_cmd
diff --git a/CodeGen/deprecated/tests/test_codegen_inference.sh b/CodeGen/deprecated/tests/test_codegen_inference.sh
deleted file mode 100644
index e984de901..000000000
--- a/CodeGen/deprecated/tests/test_codegen_inference.sh
+++ /dev/null
@@ -1,116 +0,0 @@
-#!/bin/bash
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-set -x
-
-function test_env_setup() {
-    WORKPATH=$(dirname "$PWD")
-    LOG_PATH="$WORKPATH/tests/codegen.log"
-
-    COPILOT_CONTAINER_NAME="test-copilot"
-    CODEGEN_CONTAINER_NAME="test-CodeGen_server"
-    cd $WORKPATH # go to CodeGen
-}
-
-function rename() {
-    # Rename the container names
-    cd ${WORKPATH}
-    sed -i "s/CodeGen_server/${CODEGEN_CONTAINER_NAME}/g" serving/tgi_gaudi/launch_tgi_service.sh
-    sed -i "s/copilot/${COPILOT_CONTAINER_NAME}/g" codegen/build_docker.sh
-}
-
-function docker_setup() {
-    local card_num=1
-    local port=8902
-    local model_name="m-a-p/OpenCodeInterpreter-DS-6.7B"
-
-    cd ${WORKPATH}
-
-    # Reset the tgi port
-    sed -i "s/8080/$port/g" codegen/codegen-app/server.py
-
-    docker pull ghcr.io/huggingface/tgi-gaudi:1.2.1
-    bash serving/tgi_gaudi/launch_tgi_service.sh $card_num $port $model_name
-    sleep 3m # Wait 3 minutes for the TGI service to come up
-}
-
-function launch_copilot_docker() {
-    local port=8903
-    sed -i "s/port=8000/port=$port/g" codegen/codegen-app/server.py
-
-    cd $WORKPATH/codegen
-    bash ./build_docker.sh
-
-    cd $WORKPATH
-    docker run -dit --name=$COPILOT_CONTAINER_NAME \
-        --net=host --ipc=host \
-        -v /var/run/docker.sock:/var/run/docker.sock intel/gen-ai-examples:${COPILOT_CONTAINER_NAME} /bin/bash
-}
-
-function launch_server() {
-    cd $WORKPATH
-
-    # Start the Backend Service
-    docker exec $COPILOT_CONTAINER_NAME \
-        bash -c "export HUGGINGFACEHUB_API_TOKEN=$HUGGINGFACEHUB_API_TOKEN; nohup python server.py &"
-    sleep 1m
-}
-
-function run_tests() {
-    cd $WORKPATH
-    local port=8903
-
-    curl http://localhost:${port}/v1/code_generation \
-        -X POST \
-        -H "Content-Type: application/json" \
-        -d '{"prompt": "def print_hello_world():", "max_new_tokens": 128, "stream": true}' > $LOG_PATH
-    exit_code=$?
-
-    if [ $exit_code -ne 0 ]; then
-        echo "Code generation failed, please check the logs in artifacts!"
-        docker logs $CODEGEN_CONTAINER_NAME >> $LOG_PATH
-        exit 1
-    fi
-}
-
-function check_response() {
-    cd $WORKPATH
-    echo "Checking response"
-    local status=false
-    if [[ -f $LOG_PATH ]] && [[ $(grep -c "Hello" $LOG_PATH) != 0 ]]; then
-        status=true
-    fi
-
-    if [ $status == false ]; then
-        echo "Response check failed, please check the logs in artifacts!"
-        exit 1
-    else
-        echo "Response check succeeded!"
-    fi
-}
-
-function docker_stop() {
-    local container_name=$1
-    cid=$(docker ps -aq --filter "name=$container_name")
-    if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid; fi
-}
-
-function main() {
-    test_env_setup
-    rename
-    docker_stop $CODEGEN_CONTAINER_NAME && docker_stop $COPILOT_CONTAINER_NAME && sleep 5s
-
-    docker_setup
-    launch_copilot_docker
-    launch_server
-
-    run_tests
-    check_response
-
-    docker_stop $CODEGEN_CONTAINER_NAME && docker_stop $COPILOT_CONTAINER_NAME && sleep 5s
-    echo y | docker system prune
-}
-
-main
diff --git a/CodeTrans/deprecated/README.md b/CodeTrans/deprecated/README.md
deleted file mode 100644
index 90b6197fa..000000000
--- a/CodeTrans/deprecated/README.md
+++ /dev/null
@@ -1,42 +0,0 @@
-# Code Translation
-
-Code translation is the process of converting code written in one programming language to another programming language while maintaining the same functionality. This process is also known as code conversion, source-to-source translation, or transpilation. Code translation is often performed when developers want to take advantage of new programming languages, improve code performance, or maintain legacy systems. Some common examples include translating code from Python to Java, or from JavaScript to TypeScript.
-
-The workflow follows the architecture below:
-
-![architecture](https://i.imgur.com/ums0brC.png)
-
-# Start Backend Service
-
-1. Start the TGI service to deploy your LLM
-
-```sh
-cd serving/tgi_gaudi
-bash build_docker.sh
-bash launch_tgi_service.sh
-```
-
-`launch_tgi_service.sh` uses `8080` as the TGI service's port by default. Please replace it if there are any port conflicts.
-
-2. Start the CodeTranslation service
-
-```sh
-cd langchain/docker
-bash build_docker.sh
-docker run -it --name code_trans_server --net=host --ipc=host -e TGI_ENDPOINT=${TGI_ENDPOINT} -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACE_API_TOKEN} -e SERVER_PORT=8000 -e http_proxy=${http_proxy} -e https_proxy=${https_proxy} intel/gen-ai-examples:code-translation bash
-```
-
-Here is the explanation of some of the above parameters:
-
-- `TGI_ENDPOINT`: The endpoint of your TGI service, usually of the form `<host ip>:<port>`.
-- `HUGGINGFACEHUB_API_TOKEN`: Your HuggingFace hub API token, usually generated [here](https://huggingface.co/settings/tokens).
-- `SERVER_PORT`: The port of the CodeTranslation service on the host.
-
-3. Quick test
-
-```sh
-curl http://localhost:8000/v1/code_translation \
-    -X POST \
-    -d '{"language_from": "Python","language_to": "Java","source_code": "\ndef hello(name):\n    print(\"Hello, \" + name)\n"}' \
-    -H 'Content-Type: application/json'
-```
diff --git a/CodeTrans/deprecated/langchain/docker/Dockerfile b/CodeTrans/deprecated/langchain/docker/Dockerfile
deleted file mode 100644
index bef8e2c34..000000000
--- a/CodeTrans/deprecated/langchain/docker/Dockerfile
+++ /dev/null
@@ -1,40 +0,0 @@
-
-
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-# SCRIPT USAGE NOTICE: By downloading and using any script file included
-# with the associated software package (such as files with .bat, .cmd, or
-# .JS extensions, Docker files, or any other type of file that, when executed,
-# automatically downloads and/or installs files onto your system) (the “Script File”),
-# it is your obligation to review the Script File to understand what files (e.g.,
-# other software, AI models, AI Datasets) the Script File will download to your system
-# (“Downloaded Files”). Furthermore, by downloading and using the Downloaded Files,
-# even if they are installed through a silent install, you agree to any and all
-# terms and conditions associated with such files, including but not limited to,
-# license terms, notices, or disclaimers.
-
-FROM intel/intel-optimized-pytorch:2.2.0-pip-jupyter
-
-RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
-    libgl1-mesa-glx \
-    libjemalloc-dev \
-    vim
-
-RUN useradd -m -s /bin/bash user && \
-    mkdir -p /home/user && \
-    chown -R user /home/user/
-
-USER user
-
-COPY requirements.txt /tmp/requirements.txt
-
-RUN pip install --no-cache-dir --upgrade pip && \
-    pip install --no-cache-dir -r /tmp/requirements.txt
-
-ENV PYTHONPATH=/home/user:/home/user/codetrans-app/app
-
-WORKDIR /home/user/codetrans-app
-COPY --chown=user:user codetrans-app /home/user/codetrans-app
-
-ENTRYPOINT ["python", "server.py"]
\ No newline at end of file
diff --git a/CodeTrans/deprecated/langchain/docker/build_docker.sh b/CodeTrans/deprecated/langchain/docker/build_docker.sh
deleted file mode 100644
index 9958ec5f7..000000000
--- a/CodeTrans/deprecated/langchain/docker/build_docker.sh
+++ /dev/null
@@ -1,7 +0,0 @@
-#!/bin/bash
-
-
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-docker build . -t intel/gen-ai-examples:code-translation --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
diff --git a/CodeTrans/deprecated/langchain/docker/codetrans-app/prompts.py b/CodeTrans/deprecated/langchain/docker/codetrans-app/prompts.py
deleted file mode 100644
index d51fde51f..000000000
--- a/CodeTrans/deprecated/langchain/docker/codetrans-app/prompts.py
+++ /dev/null
@@ -1,18 +0,0 @@
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-from langchain.prompts import PromptTemplate
-
-prompt_template = """
-    ### System: Please translate the following {language_from} codes into {language_to} codes.
- - ### Original codes: - '''{language_from} - - {source_code} - - ''' - - ### Translated codes: -""" -codetrans_prompt_template = PromptTemplate.from_template(prompt_template) diff --git a/CodeTrans/deprecated/langchain/docker/codetrans-app/server.py b/CodeTrans/deprecated/langchain/docker/codetrans-app/server.py deleted file mode 100644 index 6908a16c0..000000000 --- a/CodeTrans/deprecated/langchain/docker/codetrans-app/server.py +++ /dev/null @@ -1,120 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import os - -from fastapi import APIRouter, FastAPI, HTTPException, Request -from fastapi.responses import StreamingResponse -from langchain_community.llms import HuggingFaceEndpoint -from prompts import codetrans_prompt_template -from starlette.middleware.cors import CORSMiddleware - -app = FastAPI() - -app.add_middleware( - CORSMiddleware, - allow_origins=["*"], - allow_credentials=True, - allow_methods=["*"], - allow_headers=["*"], -) - -TGI_ENDPOINT = os.getenv("TGI_ENDPOINT", "http://localhost:8080") -SERVICE_PORT = int(os.getenv("SERVER_PORT", 8000)) - - -class CodeTranslationAPIRouter(APIRouter): - """The router for CodeTranslation example.""" - - def __init__(self, entrypoint: str, prompt_template: str) -> None: - super().__init__() - self.entrypoint = entrypoint - - # setup TGI endpoint - self.llm = HuggingFaceEndpoint( - endpoint_url=entrypoint, - max_new_tokens=1024, - top_k=10, - top_p=0.95, - typical_p=0.95, - temperature=0.01, - repetition_penalty=1.03, - streaming=True, - ) - - self.prompt_template = prompt_template - - def handle_code_translation(self, language_from: str, language_to: str, source_code: str): - prompt = self.prompt_template.format( - language_from=language_from, language_to=language_to, source_code=source_code - ) - print(f"[codetrans - nonstream] prompt:{prompt}") - try: - response = self.llm(prompt) - except Exception as e: - print(f"[codetrans - nonstream] Error occurred: {e}") - raise Exception(f"[codetrans - nonstream] {e}") - print(f"[codetrans - nonstream] response:\n{response}") - return response - - async def handle_code_translation_stream(self, language_from: str, language_to: str, source_code: str): - prompt = self.prompt_template.format( - language_from=language_from, language_to=language_to, source_code=source_code - ) - print(f"[codetrans - stream] prompt:{prompt}") - - async def stream_generator(): - for chunk in self.llm.stream(prompt): - chunk_repr = repr(chunk.encode("utf-8")) - print(f"[codetrans - stream] data: {chunk_repr}") - yield f"data: {chunk_repr}\n\n" - yield "data: [DONE]\n\n" - - return StreamingResponse(stream_generator(), media_type="text/event-stream") - - -router = CodeTranslationAPIRouter(entrypoint=TGI_ENDPOINT, prompt_template=codetrans_prompt_template) - - -@router.post("/v1/code_translation") -async def code_translation(request: Request): - params = await request.json() - print(f"[codetrans - nonstream] POST request: /v1/code_translation, params:{params}") - language_from = params["language_from"] - language_to = params["language_to"] - source_code = params["source_code"] - try: - return router.handle_code_translation( - language_from=language_from, language_to=language_to, source_code=source_code - ) - except Exception as e: - print(f"[codetrans - nonstream] Error occurred: {e}") - raise HTTPException(status_code=500, detail=str(e)) - - -@router.post("/v1/code_translation_stream") -async def code_translation_stream(request: 
Request):
-    params = await request.json()
-    print(f"[codetrans - stream] POST request: /v1/code_translation_stream, params:{params}")
-    language_from = params["language_from"]
-    language_to = params["language_to"]
-    source_code = params["source_code"]
-    try:
-        return await router.handle_code_translation_stream(
-            language_from=language_from, language_to=language_to, source_code=source_code
-        )
-    except Exception as e:
-        print(f"[codetrans - stream] Error occurred: {e}")
-        raise HTTPException(status_code=500, detail=str(e))
-
-
-app.include_router(router)
-
-if __name__ == "__main__":
-    import uvicorn
-
-    uvicorn.run(app, host="0.0.0.0", port=int(SERVICE_PORT))
diff --git a/CodeTrans/deprecated/langchain/docker/requirements.txt b/CodeTrans/deprecated/langchain/docker/requirements.txt
deleted file mode 100644
index 7bf9b6131..000000000
--- a/CodeTrans/deprecated/langchain/docker/requirements.txt
+++ /dev/null
@@ -1,4 +0,0 @@
-fastapi
-huggingface_hub
-langchain
-uvicorn
diff --git a/CodeTrans/deprecated/serving/tgi_gaudi/build_docker.sh b/CodeTrans/deprecated/serving/tgi_gaudi/build_docker.sh
deleted file mode 100644
index 7adf71ff0..000000000
--- a/CodeTrans/deprecated/serving/tgi_gaudi/build_docker.sh
+++ /dev/null
@@ -1,9 +0,0 @@
-#!/bin/bash
-
-
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-git clone https://github.com/huggingface/tgi-gaudi.git
-cd ./tgi-gaudi/
-docker build -t ghcr.io/huggingface/tgi-gaudi:1.2.1 . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
diff --git a/CodeTrans/deprecated/serving/tgi_gaudi/launch_tgi_service.sh b/CodeTrans/deprecated/serving/tgi_gaudi/launch_tgi_service.sh
deleted file mode 100644
index e07822044..000000000
--- a/CodeTrans/deprecated/serving/tgi_gaudi/launch_tgi_service.sh
+++ /dev/null
@@ -1,40 +0,0 @@
-#!/bin/bash
-
-
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-# Set default values
-default_port=8080
-default_model="HuggingFaceH4/mistral-7b-grok"
-default_num_cards=1
-
-# Check that no more than three optional arguments are provided
-if [ "$#" -gt 3 ]; then
-    echo "Usage: $0 [num_cards] [port_number] [model_name]"
-    exit 1
-fi
-
-# Assign arguments to variables
-num_cards=${1:-$default_num_cards}
-port_number=${2:-$default_port}
-model_name=${3:-$default_model}
-
-# Check if num_cards is within the valid range (1-8)
-if [ "$num_cards" -lt 1 ] || [ "$num_cards" -gt 8 ]; then
-    echo "Error: num_cards must be between 1 and 8."
-    exit 1
-fi
-
-# Set the volume variable
-volume=$PWD/data
-
-# Build the Docker run command based on the number of cards
-if [ "$num_cards" -eq 1 ]; then
-    docker_cmd="docker run -d --name tgi-gaudi-server -p $port_number:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$http_proxy ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id $model_name"
-else
-    docker_cmd="docker run -d --name tgi-gaudi-server -p $port_number:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$http_proxy ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id $model_name --sharded true --num-shard $num_cards"
-fi
-
-# Execute the Docker run command
-eval $docker_cmd
diff --git a/DocSum/deprecated/README.md b/DocSum/deprecated/README.md
deleted file mode 100644
index 43aec6215..000000000
--- a/DocSum/deprecated/README.md
+++ /dev/null
@@ -1,134 +0,0 @@
-# Document Summarization
-
-In a world where data, information, and legal complexities are prevalent, the volume of legal documents is growing rapidly. Law firms, legal professionals, and businesses are dealing with an ever-increasing number of legal texts, including contracts, court rulings, statutes, and regulations.
-These documents contain important insights, but understanding them can be overwhelming. This is where the demand for legal document summarization comes in.
-
-Large Language Models (LLMs) have revolutionized the way we interact with text; LLMs can be used to create summaries of news articles, research papers, technical documents, and other types of text. Suppose you have a set of documents (PDFs, Notion pages, customer questions, etc.) and you want to summarize the content. In this example use case, we use LangChain to apply some summarization strategies and run LLM inference using Text Generation Inference on Intel Xeon and Gaudi2.
-
-The document summarization architecture is shown below:
-
-![Architecture](../assets/img/docsum_architecture.png)
-
-![Workflow](../assets/img/docsum_workflow.png)
-
-# Environment Setup
-
-To use [🤗 text-generation-inference](https://github.com/huggingface/text-generation-inference) on Habana Gaudi/Gaudi2, please follow these steps:
-
-## Build TGI Gaudi Docker Image
-
-```bash
-bash ./serving/tgi_gaudi/build_docker.sh
-```
-
-## Launch TGI Gaudi Service
-
-### Launch a local server instance on 1 Gaudi card:
-
-```bash
-bash ./serving/tgi_gaudi/launch_tgi_service.sh
-```
-
-For gated models such as `LLAMA-2`, you will have to pass -e HUGGING_FACE_HUB_TOKEN=\<token\> to the docker run command above with a valid Hugging Face Hub read token.
-
-Please follow this link [huggingface token](https://huggingface.co/docs/hub/security-tokens) to get the access token and export the `HUGGINGFACEHUB_API_TOKEN` environment variable with the token.
-
-```bash
-export HUGGINGFACEHUB_API_TOKEN=
-```
-
-### Launch a local server instance on 8 Gaudi cards:
-
-```bash
-bash ./serving/tgi_gaudi/launch_tgi_service.sh 8
-```
-
-### Customize TGI Gaudi Service
-
-The ./serving/tgi_gaudi/launch_tgi_service.sh script accepts three parameters:
-
-- num_cards: The number of Gaudi cards to be utilized, ranging from 1 to 8. The default is set to 1.
-- port_number: The port number assigned to the TGI Gaudi endpoint, with the default being 8080.
-- model_name: The model name utilized for LLM, with the default set to "Intel/neural-chat-7b-v3-3". - -You have the flexibility to customize these parameters according to your specific needs. Additionally, you can set the TGI Gaudi endpoint by exporting the environment variable `TGI_ENDPOINT`: - -```bash -export TGI_ENDPOINT="http://xxx.xxx.xxx.xxx:8080" -``` - -## Launch Document Summary Docker - -### Build Document Summary Docker Image (Optional) - -```bash -cd langchain/docker/ -bash ./build_docker.sh -cd ../../ -``` - -### Launch Document Summary Docker - -```bash -docker run -it --net=host --ipc=host -e http_proxy=${http_proxy} -e https_proxy=${https_proxy} -v /var/run/docker.sock:/var/run/docker.sock intel/gen-ai-examples:document-summarize bash -``` - -# Start Document Summary Server - -## Start the Backend Service - -Make sure TGI-Gaudi service is running. Launch the backend service: - -```bash -export HUGGINGFACEHUB_API_TOKEN= -nohup python app/server.py & -``` - -Then you can make requests like below to check the DocSum backend service status: - -```bash -curl http://127.0.0.1:8000/v1/text_summarize \ - -X POST \ - -H 'Content-Type: application/json' \ - -d '{"text":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}' -``` - -## Start the Frontend Service - -Navigate to the "ui" folder and execute the following commands to start the frontend GUI: - -```bash -cd ui -sudo apt-get install npm && \ - npm install -g n && \ - n stable && \ - hash -r && \ - npm install -g npm@latest -``` - -For CentOS, please use the following commands instead: - -```bash -curl -sL https://rpm.nodesource.com/setup_20.x | sudo bash - -sudo yum install -y nodejs -``` - -Update the `BASIC_URL` environment variable in the `.env` file by replacing the IP address '127.0.0.1' with the actual IP address. - -Run the following command to install the required dependencies: - -```bash -npm install -``` - -Start the development server by executing the following command: - -```bash -nohup npm run dev & -``` - -This will initiate the frontend service and launch the application. - -# - -SCRIPT USAGE NOTICE:  By downloading and using any script file included with the associated software package (such as files with .bat, .cmd, or .JS extensions, Docker files, or any other type of file that, when executed, automatically downloads and/or installs files onto your system) (the “Script File”), it is your obligation to review the Script File to understand what files (e.g.,  other software, AI models, AI Datasets) the Script File will download to your system (“Downloaded Files”). Furthermore, by downloading and using the Downloaded Files, even if they are installed through a silent install, you agree to any and all terms and conditions associated with such files, including but not limited to, license terms, notices, or disclaimers. 
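-
-For an end-to-end check of the file-summarization path, you can upload a document and then stream its summary. This is an illustrative example that is not part of the original README: it assumes the backend's default port 8000, a local `sample.txt`, and the `/v1/doc_upload` and `/v1/file_summarize` routes defined in `server.py` later in this diff; the `doc_xxxxxxxx` value stands in for the ID your upload actually returns.
-
-```bash
-# Upload a file; the backend responds with {"document_id": "doc_xxxxxxxx"}
-curl http://127.0.0.1:8000/v1/doc_upload \
-    -X POST \
-    -F "file=@./sample.txt"
-
-# Request a streamed summary for the uploaded document
-curl http://127.0.0.1:8000/v1/file_summarize \
-    -X POST \
-    -H 'Content-Type: application/json' \
-    -d '{"doc_id": "doc_xxxxxxxx"}'
-```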
diff --git a/DocSum/deprecated/langchain/docker/Dockerfile b/DocSum/deprecated/langchain/docker/Dockerfile deleted file mode 100644 index 935c6d82b..000000000 --- a/DocSum/deprecated/langchain/docker/Dockerfile +++ /dev/null @@ -1,37 +0,0 @@ - - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# SCRIPT USAGE NOTICE: By downloading and using any script file included -# with the associated software package (such as files with .bat, .cmd, or -# .JS extensions, Docker files, or any other type of file that, when executed, -# automatically downloads and/or installs files onto your system) (the “Script File”), -# it is your obligation to review the Script File to understand what files (e.g., -# other software, AI models, AI Datasets) the Script File will download to your system -# (“Downloaded Files”). Furthermore, by downloading and using the Downloaded Files, -# even if they are installed through a silent install, you agree to any and all -# terms and conditions associated with such files, including but not limited to, -# license terms, notices, or disclaimers. - -FROM intel/intel-optimized-pytorch:2.2.0-pip-jupyter - -RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \ - libgl1-mesa-glx \ - libjemalloc-dev - -RUN useradd -m -s /bin/bash user && \ - mkdir -p /home/user && \ - chown -R user /home/user/ - -USER user - -COPY requirements.txt /tmp/requirements.txt - -RUN pip install --no-cache-dir --upgrade pip && \ - pip install --no-cache-dir -r /tmp/requirements.txt - -ENV PYTHONPATH=/home/user:/home/user/summarize-app/app - -WORKDIR /home/user/summarize-app -COPY summarize-app /home/user/summarize-app diff --git a/DocSum/deprecated/langchain/docker/build_docker.sh b/DocSum/deprecated/langchain/docker/build_docker.sh deleted file mode 100644 index bd44695d8..000000000 --- a/DocSum/deprecated/langchain/docker/build_docker.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/bin/bash - - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -docker build . 
-t intel/gen-ai-examples:document-summarize --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy diff --git a/DocSum/deprecated/langchain/docker/requirements.txt b/DocSum/deprecated/langchain/docker/requirements.txt deleted file mode 100644 index 9bcb7222d..000000000 --- a/DocSum/deprecated/langchain/docker/requirements.txt +++ /dev/null @@ -1,13 +0,0 @@ -beautifulsoup4 -docx2txt -intel-openmp -jupyter -langchain==0.1.12 -langchain-cli -langchain_benchmarks -poetry -pyarrow -pydantic==1.10.13 -pypdf -python-multipart -sentence-transformers diff --git a/DocSum/deprecated/langchain/docker/summarize-app/.gitignore b/DocSum/deprecated/langchain/docker/summarize-app/.gitignore deleted file mode 100644 index bee8a64b7..000000000 --- a/DocSum/deprecated/langchain/docker/summarize-app/.gitignore +++ /dev/null @@ -1 +0,0 @@ -__pycache__ diff --git a/DocSum/deprecated/langchain/docker/summarize-app/Dockerfile b/DocSum/deprecated/langchain/docker/summarize-app/Dockerfile deleted file mode 100644 index dbd7e0ddf..000000000 --- a/DocSum/deprecated/langchain/docker/summarize-app/Dockerfile +++ /dev/null @@ -1,26 +0,0 @@ - - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -FROM python:3.11-slim - -RUN pip install --no-cache-dir poetry==1.6.1 - -RUN poetry config virtualenvs.create false - -WORKDIR /code - -COPY ./pyproject.toml ./README.md ./poetry.lock* ./ - -COPY ./package[s] ./packages - -RUN poetry install --no-interaction --no-ansi --no-root - -COPY ./app ./app - -RUN poetry install --no-interaction --no-ansi - -EXPOSE 8080 - -CMD ["uvicorn", "app.server:app", "--host", "0.0.0.0", "--port", "8080"] diff --git a/DocSum/deprecated/langchain/docker/summarize-app/README.md b/DocSum/deprecated/langchain/docker/summarize-app/README.md deleted file mode 100644 index c76e0d1af..000000000 --- a/DocSum/deprecated/langchain/docker/summarize-app/README.md +++ /dev/null @@ -1,79 +0,0 @@ -# my-app - -## Installation - -Install the LangChain CLI if you haven't yet - -```bash -pip install -U langchain-cli -``` - -## Adding packages - -```bash -# adding packages from -# https://github.com/langchain-ai/langchain/tree/master/templates -langchain app add $PROJECT_NAME - -# adding custom GitHub repo packages -langchain app add --repo $OWNER/$REPO -# or with whole git string (supports other git providers): -# langchain app add git+https://github.com/hwchase17/chain-of-verification - -# with a custom api mount point (defaults to `/{package_name}`) -langchain app add $PROJECT_NAME --api_path=/my/custom/path/rag -``` - -Note: you remove packages by their api path - -```bash -langchain app remove my/custom/path/rag -``` - -## Setup LangSmith (Optional) - -LangSmith will help us trace, monitor and debug LangChain applications. -LangSmith is currently in private beta, you can sign up [here](https://smith.langchain.com/). -If you don't have access, you can skip this section - -```shell -export LANGCHAIN_TRACING_V2=true -export LANGCHAIN_API_KEY= -export LANGCHAIN_PROJECT= # if not specified, defaults to "default" -``` - -## Launch LangServe - -```bash -langchain serve -``` - -## Running in Docker - -This project folder includes a Dockerfile that allows you to easily build and host your LangServe app. - -### Building the Image - -To build the image, you simply: - -```shell -docker build . -t my-langserve-app -``` - -If you tag your image with something other than `my-langserve-app`, -note it for use in the next step. 
-
-### Running the Image Locally
-
-To run the image, you'll need to include any environment variables
-necessary for your application.
-
-In the below example, we inject the `OPENAI_API_KEY` environment
-variable with the value set in the local environment
-(`$OPENAI_API_KEY`).
-
-We also expose port 8080 with the `-p 8080:8080` option.
-
-```shell
-docker run -e OPENAI_API_KEY=$OPENAI_API_KEY -p 8080:8080 my-langserve-app
-```
diff --git a/DocSum/deprecated/langchain/docker/summarize-app/app/__init__.py b/DocSum/deprecated/langchain/docker/summarize-app/app/__init__.py
deleted file mode 100644
index c495d1896..000000000
--- a/DocSum/deprecated/langchain/docker/summarize-app/app/__init__.py
+++ /dev/null
@@ -1,6 +0,0 @@
-#!/usr/bin/env python
-# -*- coding: utf-8 -*-
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-#
diff --git a/DocSum/deprecated/langchain/docker/summarize-app/app/server.py b/DocSum/deprecated/langchain/docker/summarize-app/app/server.py
deleted file mode 100644
index be14b76e3..000000000
--- a/DocSum/deprecated/langchain/docker/summarize-app/app/server.py
+++ /dev/null
@@ -1,163 +0,0 @@
-#!/usr/bin/env python
-# -*- coding: utf-8 -*-
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-#
-
-import os
-
-from fastapi import APIRouter, FastAPI, File, Request, UploadFile
-from fastapi.responses import RedirectResponse, StreamingResponse
-from langchain.chains.summarize import load_summarize_chain
-from langchain.docstore.document import Document
-from langchain.prompts import PromptTemplate
-from langchain.text_splitter import CharacterTextSplitter
-from langchain_community.llms import HuggingFaceEndpoint
-from starlette.middleware.cors import CORSMiddleware
-from utils import get_current_beijing_time, read_text_from_file
-
-prompt_template = """Write a concise summary of the following:
-{text}
-CONCISE SUMMARY:"""
-prompt = PromptTemplate.from_template(prompt_template)
-
-refine_template = (
-    "Your job is to produce a final summary.\n"
-    "We have provided an existing summary up to a certain point: {existing_answer}\n"
-    "We have the opportunity to refine the existing summary "
-    "(only if needed) with some more context below.\n"
-    "------------\n"
-    "{text}\n"
-    "------------\n"
-    "Given the new context, refine the original summary in Italian. "
-    "If the context isn't useful, return the original summary."
-)
-refine_prompt = PromptTemplate.from_template(refine_template)
-
-app = FastAPI()
-
-app.add_middleware(
-    CORSMiddleware, allow_origins=["*"], allow_credentials=True, allow_methods=["*"], allow_headers=["*"]
-)
-
-
-class DocSummaryAPIRouter(APIRouter):
-
-    def __init__(self, upload_dir, entrypoint) -> None:
-        super().__init__()
-        self.upload_dir = upload_dir
-        self.entrypoint = entrypoint
-        print(
-            f"[rag - router] Initializing API Router, params:\n \
-            upload_dir={upload_dir}, entrypoint={entrypoint}"
-        )
-
-        # Define LLM
-        self.llm = HuggingFaceEndpoint(
-            endpoint_url=entrypoint,
-            max_new_tokens=512,
-            top_k=10,
-            top_p=0.95,
-            typical_p=0.95,
-            temperature=0.01,
-            repetition_penalty=1.03,
-            streaming=True,
-        )
-        print("[rag - router] LLM initialized.")
-
-        self.llm_chain = load_summarize_chain(llm=self.llm, chain_type="map_reduce")
-
-        print("[rag - router] LLM chain initialized.")
-        self.doc_store = {}
-
-    def handle_rag_chat(self, query: str):
-        response = self.llm_chain.invoke(query)
-        result = response["result"].split("</s>")[0].split("\n")[0]
-        return result
-
-
-upload_dir = os.getenv("RAG_UPLOAD_DIR", "./upload_dir")
-tgi_endpoint = os.getenv("TGI_ENDPOINT", "http://localhost:8080")
-router = DocSummaryAPIRouter(upload_dir, tgi_endpoint)
-
-
-@router.post("/v1/text_summarize")
-async def text_summarize(request: Request):
-    params = await request.json()
-    print(f"[docsum - text_summarize] POST request: /v1/text_summarize, params:{params}")
-    text = params["text"]
-
-    # Split text
-    text_splitter = CharacterTextSplitter()
-    texts = text_splitter.split_text(text)
-    # Create multiple documents
-    docs = [Document(page_content=t) for t in texts]
-
-    async def stream_generator():
-        from langserve.serialization import WellKnownLCSerializer
-
-        _serializer = WellKnownLCSerializer()
-        async for chunk in router.llm_chain.astream_log(docs):
-            data = _serializer.dumps({"ops": chunk.ops}).decode("utf-8")
-            print(f"[docsum - text_summarize] data: {data}")
-            yield f"data: {data}\n\n"
-        yield "data: [DONE]\n\n"
-
-    return StreamingResponse(stream_generator(), media_type="text/event-stream")
-
-
-@router.post("/v1/file_summarize")
-async def file_summarize(request: Request):
-    params = await request.json()
-    print(f"[docsum - file_summarize] POST request: /v1/file_summarize, params:{params}")
-    doc_id = params["doc_id"]
-    text = router.doc_store[doc_id]
-
-    async def stream_generator():
-        from langserve.serialization import WellKnownLCSerializer
-
-        _serializer = WellKnownLCSerializer()
-        async for chunk in router.llm_chain.astream_log(text):
-            data = _serializer.dumps({"ops": chunk.ops}).decode("utf-8")
-            print(f"[docsum - file_summarize] data: {data}")
-            yield f"data: {data}\n\n"
-        yield "data: [DONE]\n\n"
-
-    return StreamingResponse(stream_generator(), media_type="text/event-stream")
-
-
-@router.post("/v1/doc_upload")
-async def doc_upload(file: UploadFile = File(...)):
-    filename = file.filename
-    if "/" in filename:
-        filename = filename.split("/")[-1]
-    print(f"[docsum - upload] POST request: /v1/doc_upload, filename:{filename}")
-
-    # save file to local path
-    cur_time = get_current_beijing_time()
-    save_file_name = "/tmp/" + cur_time + "-" + filename
-    with open(save_file_name, "wb") as fout:
-        content = await file.read()
-        fout.write(content)
-    print(f"[docsum - upload] file saved to local path: {save_file_name}")
-
-    doc_id, text = read_text_from_file(file, save_file_name)
-    router.doc_store[doc_id] = text
-    print("[docsum - upload] doc created successfully")
-
-    return {"document_id": doc_id}
{"document_id": doc_id} - - -app.include_router(router) - - -@app.get("/") -async def redirect_root_to_docs(): - return RedirectResponse("/docs") - - -if __name__ == "__main__": - import uvicorn - - uvicorn.run(app, host="0.0.0.0", port=8000) diff --git a/DocSum/deprecated/langchain/docker/summarize-app/app/utils.py b/DocSum/deprecated/langchain/docker/summarize-app/app/utils.py deleted file mode 100644 index 0d47c7951..000000000 --- a/DocSum/deprecated/langchain/docker/summarize-app/app/utils.py +++ /dev/null @@ -1,107 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import re -import uuid -from datetime import datetime, timedelta, timezone - -import docx2txt -import requests -from bs4 import BeautifulSoup -from langchain.docstore.document import Document -from langchain.document_loaders import PyPDFLoader -from langchain.text_splitter import CharacterTextSplitter - - -def get_current_beijing_time(): - SHA_TZ = timezone(timedelta(hours=8), name="Asia/Shanghai") - utc_now = datetime.utcnow().replace(tzinfo=timezone.utc) - beijing_time = utc_now.astimezone(SHA_TZ).strftime("%Y-%m-%d-%H:%M:%S") - return beijing_time - - -emoji_pattern = re.compile( - "[" - "\U0001F600-\U0001F64F" # emoticons - "\U0001F300-\U0001F5FF" # symbols & pictographs - "\U0001F680-\U0001F6FF" # transport & map symbols - "\U0001F1E0-\U0001F1FF" # flags (iOS) - "\U00002702-\U000027B0" - "\U000024C2-\U0001F251" - "]+", - flags=re.UNICODE, -) - - -def clean_text(x): - x = x.encode("ascii", "ignore").decode() # unicode - x = re.sub(r"https*\S+", " ", x) # url - x = re.sub(r"@\S+", " ", x) # mentions - x = re.sub(r"#\S+", " ", x) # hastags - x = re.sub(r"\s{2,}", " ", x) # over spaces - x = emoji_pattern.sub(r"", x) # emojis - x = re.sub("[^.,!?A-Za-z0-9]+", " ", x) # special characters except .,!? 
-
-    return x
-
-
-def fetch_article_text(url: str):
-    r = requests.get(url)
-    soup = BeautifulSoup(r.text, "html.parser")
-    results = soup.find_all(["h1", "p"])
-    text = [result.text for result in results]
-    ARTICLE = " ".join(text)
-    # Mark sentence boundaries so the article can be split into ~500-word chunks
-    ARTICLE = ARTICLE.replace(".", ".<eos>")
-    ARTICLE = ARTICLE.replace("!", "!<eos>")
-    ARTICLE = ARTICLE.replace("?", "?<eos>")
-    sentences = ARTICLE.split("<eos>")
-    current_chunk = 0
-    chunks = []
-    for sentence in sentences:
-        if len(chunks) == current_chunk + 1:
-            if len(chunks[current_chunk]) + len(sentence.split(" ")) <= 500:
-                chunks[current_chunk].extend(sentence.split(" "))
-            else:
-                current_chunk += 1
-                chunks.append(sentence.split(" "))
-        else:
-            print(current_chunk)
-            chunks.append(sentence.split(" "))
-
-    for chunk_id in range(len(chunks)):
-        chunks[chunk_id] = " ".join(chunks[chunk_id])
-
-    return ARTICLE, chunks
-
-
-def read_pdf(file):
-    loader = PyPDFLoader(file)
-    docs = loader.load_and_split()
-    return docs
-
-
-def read_text_from_file(file, save_file_name):
-    # read text file
-    if file.headers["content-type"] == "text/plain":
-        file.file.seek(0)
-        content = file.file.read().decode("utf-8")
-        # Split text
-        text_splitter = CharacterTextSplitter()
-        texts = text_splitter.split_text(content)
-        # Create multiple documents
-        file_content = [Document(page_content=t) for t in texts]
-    # read pdf file
-    elif file.headers["content-type"] == "application/pdf":
-        file_content = read_pdf(save_file_name)
-
-    # read docx file
-    elif file.headers["content-type"] == "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
-        file_content = docx2txt.process(save_file_name)
-
-    doc_id = f"doc_{str(uuid.uuid1())[:8]}"
-
-    return doc_id, file_content
diff --git a/DocSum/deprecated/langchain/docker/summarize-app/packages/README.md b/DocSum/deprecated/langchain/docker/summarize-app/packages/README.md
deleted file mode 100644
index e69de29bb..000000000
diff --git a/DocSum/deprecated/langchain/docker/summarize-app/pyproject.toml b/DocSum/deprecated/langchain/docker/summarize-app/pyproject.toml
deleted file mode 100644
index 0c3faea39..000000000
--- a/DocSum/deprecated/langchain/docker/summarize-app/pyproject.toml
+++ /dev/null
@@ -1,23 +0,0 @@
-[tool.poetry]
-name = "my-app"
-version = "0.1.0"
-description = ""
-authors = ["Your Name <you@example.com>"]
-readme = "README.md"
-packages = [
-    { include = "app" },
-]
-
-[tool.poetry.dependencies]
-python = "^3.11"
-uvicorn = "^0.23.2"
-langserve = {extras = ["server"], version = ">=0.0.30"}
-pydantic = "<2"
-
-
-[tool.poetry.group.dev.dependencies]
-langchain-cli = ">=0.0.15"
-
-[build-system]
-requires = ["poetry-core"]
-build-backend = "poetry.core.masonry.api"
diff --git a/DocSum/deprecated/serving/tgi_gaudi/build_docker.sh b/DocSum/deprecated/serving/tgi_gaudi/build_docker.sh
deleted file mode 100644
index 8f49bdc17..000000000
--- a/DocSum/deprecated/serving/tgi_gaudi/build_docker.sh
+++ /dev/null
@@ -1,8 +0,0 @@
-#!/bin/bash
-
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-git clone https://github.com/huggingface/tgi-gaudi.git
-cd ./tgi-gaudi/
-docker build -t ghcr.io/huggingface/tgi-gaudi:1.2.1 . 
--build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy diff --git a/DocSum/deprecated/serving/tgi_gaudi/launch_tgi_service.sh b/DocSum/deprecated/serving/tgi_gaudi/launch_tgi_service.sh deleted file mode 100644 index 778d72f4b..000000000 --- a/DocSum/deprecated/serving/tgi_gaudi/launch_tgi_service.sh +++ /dev/null @@ -1,39 +0,0 @@ -#!/bin/bash - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# Set default values -default_port=8080 -default_model="Intel/neural-chat-7b-v3-3" -default_num_cards=1 - -# Check if all required arguments are provided -if [ "$#" -lt 0 ] || [ "$#" -gt 3 ]; then - echo "Usage: $0 [num_cards] [port_number] [model_name]" - exit 1 -fi - -# Assign arguments to variables -num_cards=${1:-$default_num_cards} -port_number=${2:-$default_port} -model_name=${3:-$default_model} - -# Check if num_cards is within the valid range (1-8) -if [ "$num_cards" -lt 1 ] || [ "$num_cards" -gt 8 ]; then - echo "Error: num_cards must be between 1 and 8." - exit 1 -fi - -# Set the volume variable -volume=$PWD/data - -# Build the Docker run command based on the number of cards -if [ "$num_cards" -eq 1 ]; then - docker_cmd="docker run -d --name="DocSum_server" -p $port_number:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id $model_name" -else - docker_cmd="docker run -d --name="DocSum_server" -p $port_number:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id $model_name --sharded true --num-shard $num_cards" -fi - -# Execute the Docker run command -eval $docker_cmd diff --git a/DocSum/deprecated/tests/test_langchain_inference.sh b/DocSum/deprecated/tests/test_langchain_inference.sh deleted file mode 100644 index 926b657a3..000000000 --- a/DocSum/deprecated/tests/test_langchain_inference.sh +++ /dev/null @@ -1,110 +0,0 @@ -#!/bin/bash -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -set -xe - -function test_env_setup() { - WORKPATH=$(dirname "$PWD") - LOG_PATH="$WORKPATH/tests/langchain.log" - - DOCUMENT_SUMMARY_CONTAINER_NAME="test-document-summary" - DOCSUM_CONTAINER_NAME="test-DocSum_server" - cd $WORKPATH # go to DocSum -} - -function rename() { - # Rename the container names - cd ${WORKPATH} - sed -i "s/DocSum_server/${DOCSUM_CONTAINER_NAME}/g" serving/tgi_gaudi/launch_tgi_service.sh -} - -function docker_setup() { - local card_num=1 - local port=8900 - local model_name="Intel/neural-chat-7b-v3-3" - - cd ${WORKPATH} - - # Reset the tgi port - sed -i "s/8080/$port/g" langchain/docker/summarize-app/app/server.py - sed -i "s/8080/$port/g" langchain/docker/summarize-app/Dockerfile - - docker pull ghcr.io/huggingface/tgi-gaudi:1.2.1 - bash serving/tgi_gaudi/launch_tgi_service.sh $card_num $port $model_name - sleep 3m # Waits 3 minutes -} - -function launch_document_summary_docker() { - local port=8901 - sed -i "s/port=8000/port=$port/g" langchain/docker/summarize-app/app/server.py - - cd $WORKPATH/langchain/docker/ - bash ./build_docker.sh - - cd $WORKPATH - docker run -dit --net=host --ipc=host \ - --name=$DOCUMENT_SUMMARY_CONTAINER_NAME \ - -v /var/run/docker.sock:/var/run/docker.sock 
intel/gen-ai-examples:document-summarize /bin/bash
-}
-
-function launch_server() {
-    cd $WORKPATH
-
-    # Start the Backend Service
-    docker exec $DOCUMENT_SUMMARY_CONTAINER_NAME \
-        bash -c "export HUGGINGFACEHUB_API_TOKEN=$HUGGINGFACEHUB_API_TOKEN;nohup python app/server.py &"
-    sleep 1m
-}
-
-function run_tests() {
-    cd $WORKPATH
-    local port=8901
-
-    status_code=$(curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:$port/v1/text_summarize \
-        -X POST \
-        -H 'Content-Type: application/json' \
-        -d '{"text":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}') || true
-
-    sleep 5s
-}
-
-function check_response() {
-    cd $WORKPATH
-    echo "Checking response"
-    local status=false
-    if [ "$status_code" -eq 200 ]; then
-        status=true
-    fi
-
-    if [ $status == false ]; then
-        echo "Response check failed!"
-        exit 1
-    else
-        echo "Response check succeeded!"
-    fi
-}
-
-function docker_stop() {
-    local container_name=$1
-    cid=$(docker ps -aq --filter "name=$container_name")
-    if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid; fi
-}
-
-function main() {
-    test_env_setup
-    rename
-    docker_stop $DOCSUM_CONTAINER_NAME && docker_stop $DOCUMENT_SUMMARY_CONTAINER_NAME && sleep 5s
-
-    docker_setup
-    launch_document_summary_docker
-    launch_server
-
-    run_tests
-    check_response
-
-    docker_stop $DOCSUM_CONTAINER_NAME && docker_stop $DOCUMENT_SUMMARY_CONTAINER_NAME && sleep 5s
-    echo y | docker system prune
-}
-
-main
diff --git a/SearchQnA/deprecated/README.md b/SearchQnA/deprecated/README.md
deleted file mode 100644
index 3b2b8f25f..000000000
--- a/SearchQnA/deprecated/README.md
+++ /dev/null
@@ -1,91 +0,0 @@
-# Search Question and Answering
-
-Search Question and Answering (SearchQnA) harnesses the synergy between search engines, like Google Search, and large language models (LLMs) to enhance QA quality. While LLMs excel at general knowledge, they face limitations in accessing real-time or specific details due to their reliance on prior training data. By integrating a search engine, SearchQnA bridges this gap.
-
-Operating within the LangChain framework, the Google Search QnA chatbot mimics human behavior by iteratively searching, selecting, and synthesizing information. Here's how it works:
-
-- Diverse Search Queries: The system employs an LLM to generate multiple search queries from a single prompt, ensuring a wide range of query terms essential for comprehensive results.
-
-- Parallel Search Execution: Queries are executed simultaneously, accelerating data collection. This concurrent approach enables the bot to 'read' multiple pages concurrently, a unique advantage of AI.
-
-- Top Link Prioritization: Algorithms identify the top K links for each query, and the bot scrapes the full page content in parallel. This prioritization ensures the extraction of the most relevant information.
-
-- Efficient Data Indexing: Extracted data is meticulously indexed into a dedicated vector store (Chroma DB), optimizing retrieval and comparison in subsequent steps.
-
-- Contextual Result Matching: The bot matches original search queries with relevant documents stored in the vector store, presenting users with accurate and contextually appropriate results, as illustrated by the sketch after this list.
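To make these steps concrete, here is a minimal sketch of the same pipeline assembled from the LangChain components this example's backend server uses. The `TGI_ENDPOINT` variable and the sample question are illustrative assumptions; `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` must be set in the environment:

```python
# Minimal sketch of the SearchQnA flow described above; not the full server.
import os

from langchain.chains import RetrievalQAWithSourcesChain
from langchain.retrievers.web_research import WebResearchRetriever
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
from langchain_community.llms import HuggingFaceEndpoint
from langchain_community.utilities import GoogleSearchAPIWrapper
from langchain_community.vectorstores import Chroma

# LLM served by TGI; it generates the diverse search queries and the final answer
llm = HuggingFaceEndpoint(endpoint_url=os.environ["TGI_ENDPOINT"], max_new_tokens=1024)

# Vector store that indexes the scraped pages for contextual result matching
vectorstore = Chroma(
    embedding_function=HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
)

# Retriever that searches Google, scrapes the top links, and indexes them
retriever = WebResearchRetriever.from_llm(
    vectorstore=vectorstore,
    llm=llm,
    search=GoogleSearchAPIWrapper(),
)

chain = RetrievalQAWithSourcesChain.from_chain_type(llm, retriever=retriever)
result = chain({"question": "What is Intel Gaudi?"})
print(result["answer"], result["sources"])
```

This mirrors the iterative search-then-read behavior described above: the retriever fans a prompt out into several Google queries, scrapes the winning links, and indexes them into Chroma before the QA chain runs.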
-
-By integrating search capabilities with LLMs within the LangChain framework, this Google Search QnA chatbot delivers comprehensive and precise answers, akin to human search behavior.
-
-The workflow falls into the following architecture:
-
-![architecture](https://i.imgur.com/Caer3DT.png)
-
-# Start Backend Service
-
-1. Start the TGI service to deploy your LLM
-
-```sh
-cd serving/tgi_gaudi
-bash build_docker.sh
-bash launch_tgi_service.sh
-```
-
-`launch_tgi_service.sh` by default uses `8080` as the TGI service's port. Please replace it if there are any port conflicts.
-
-2. Start the SearchQnA application using Google Search
-
-```sh
-cd langchain/docker
-docker build . --build-arg http_proxy=${http_proxy} --build-arg https_proxy=${https_proxy} -t intel/gen-ai-examples:searchqna-gaudi --no-cache
-docker run -e TGI_ENDPOINT=<your_tgi_endpoint> -e GOOGLE_CSE_ID=<your_google_cse_id> -e GOOGLE_API_KEY=<your_google_api_key> -e HUGGINGFACEHUB_API_TOKEN=<your_huggingface_token> -p 8085:8000 -e http_proxy=$http_proxy -e https_proxy=$https_proxy --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host intel/gen-ai-examples:searchqna-gaudi
-```
-
-Here is the explanation of some of the above parameters:
-
-- `TGI_ENDPOINT`: the endpoint of your TGI service, usually `<ip>:<port>`
-- `GOOGLE_CSE_ID`: your CSE ID for the Google Search Engine, usually generated [here](https://programmablesearchengine.google.com/controlpanel/all)
-- `GOOGLE_API_KEY`: your API key for the Google Search Engine, usually generated [here](https://console.cloud.google.com/apis/credentials)
-- `HUGGINGFACEHUB_API_TOKEN`: your HuggingFace hub API token, usually generated [here](https://huggingface.co/settings/tokens)
-- `-p 8085:8000`: this maps port 8000 of the SearchQnA service inside the container to port 8085 on the host
-
-3. Quick test
-
-```sh
-curl http://localhost:8085/v1/rag/web_search_chat_stream -X POST -d '{"query":"Give me some latest news?"}' -H 'Content-Type: application/json'
-```
-
-# Start Frontend GUI
-
-Navigate to the "ui" folder and execute the following commands to start the frontend GUI:
-
-```bash
-cd ui
-sudo apt-get install npm && \
-    npm install -g n && \
-    n stable && \
-    hash -r && \
-    npm install -g npm@latest
-```
-
-For CentOS, please use the following commands instead:
-
-```bash
-curl -sL https://rpm.nodesource.com/setup_20.x | sudo bash -
-sudo yum install -y nodejs
-```
-
-Update the `BACKEND_BASE_URL` environment variable in the `.env` file by replacing the IP address '127.0.0.1' with the actual IP address.
-
-Run the following command to install the required dependencies:
-
-```bash
-npm install
-```
-
-Start the development server by executing the following command:
-
-```bash
-nohup npm run dev &
-```
-
-This will initiate the frontend service and launch the application.
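Beyond the `curl` quick test above, the streaming endpoint can be consumed programmatically. Below is a minimal sketch of a Python SSE client; it assumes the backend is published on port 8085 as above and undoes the `@#$`/`<br/>` token encoding the server applies to its stream:

```python
# Sketch of an SSE client for the SearchQnA streaming endpoint.
import requests

with requests.post(
    "http://localhost:8085/v1/rag/web_search_chat_stream",
    json={"query": "Give me some latest news?"},
    stream=True,
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        # The server encodes spaces as "@#$" and newlines as "<br/>"
        print(payload.replace("@#$", " ").replace("<br/>", "\n"), end="", flush=True)
```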
diff --git a/SearchQnA/deprecated/langchain/docker/Dockerfile b/SearchQnA/deprecated/langchain/docker/Dockerfile deleted file mode 100644 index 417eb990d..000000000 --- a/SearchQnA/deprecated/langchain/docker/Dockerfile +++ /dev/null @@ -1,35 +0,0 @@ - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# HABANA environment -FROM vault.habana.ai/gaudi-docker/1.14.0/ubuntu22.04/habanalabs/pytorch-installer-2.1.1 AS hpu -RUN rm -rf /etc/ssh/ssh_host* - -# Set environment variables -ENV LANG=en_US.UTF-8 -ENV PYTHONPATH=/home/user:/usr/lib/habanalabs/ - -# Install required branch -RUN git clone https://github.com/Spycsh/langchain.git /langchain -b genai_examples && \ - pip install --no-cache-dir /langchain/libs/langchain && \ - pip install --no-cache-dir /langchain/libs/community && \ - rm -rf /langchain - -RUN useradd -m -s /bin/bash user && \ - mkdir -p /home/user && \ - chown -R user /home/user/ - -USER user - -COPY requirements.txt /tmp/requirements.txt - -# Install dependency -RUN pip install --no-cache-dir -U -r /tmp/requirements.txt - -# work dir should contains the server -# make sure it can be edited by user -WORKDIR /home/user/qna-app -COPY qna-app /home/user/qna-app - -ENTRYPOINT ["python", "server.py"] diff --git a/SearchQnA/deprecated/langchain/docker/qna-app/server.py b/SearchQnA/deprecated/langchain/docker/qna-app/server.py deleted file mode 100644 index 786541767..000000000 --- a/SearchQnA/deprecated/langchain/docker/qna-app/server.py +++ /dev/null @@ -1,219 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import os -import shutil -import sys -from queue import Queue -from threading import Thread - -from fastapi import APIRouter, FastAPI, Request -from fastapi.responses import StreamingResponse -from langchain.callbacks.base import BaseCallbackHandler -from langchain.chains import RetrievalQAWithSourcesChain -from langchain.globals import set_debug -from langchain.retrievers.web_research import WebResearchRetriever -from langchain_community.embeddings import HuggingFaceInstructEmbeddings -from langchain_community.llms import HuggingFaceEndpoint -from langchain_community.utilities import GoogleSearchAPIWrapper -from langchain_community.vectorstores import Chroma -from starlette.middleware.cors import CORSMiddleware - -set_debug(True) -app = FastAPI() - -app.add_middleware( - CORSMiddleware, - allow_origins=["*"], - allow_credentials=True, - allow_methods=["*"], - allow_headers=["*"], -) - -TGI_ENDPOINT = os.getenv("TGI_ENDPOINT", "http://localhost:8080") -SHOW_INTERMEDIATE_LOG = os.getenv("SHOW_INTERMEDIATE_LOG", "True").lower() in ("true", "1") - - -class QueueCallbackHandler(BaseCallbackHandler): - """A queue that holds the result answer token buffer for streaming response.""" - - def __init__(self, queue: Queue): - self.queue = queue - self.enter_answer_phase = False - - def on_llm_new_token(self, token: str, **kwargs): - sys.stdout.write(token) - sys.stdout.flush() - if SHOW_INTERMEDIATE_LOG or self.enter_answer_phase: - self.queue.put( - { - "answer": token, - } - ) - - def on_llm_start(self, *args, **kwargs): - if SHOW_INTERMEDIATE_LOG: - if not self.enter_answer_phase: - msg = "The search engine begin to fetch the HTML pages with these questions:" - else: - msg = "\nGet the answer from Large Language Models:\n" - self.queue.put( - { - "answer": msg, - } - ) - - def on_llm_end(self, *args, **kwargs): - self.enter_answer_phase = not 
self.enter_answer_phase
-        return True
-
-
-class SearchQuestionAnsweringAPIRouter(APIRouter):
-    """The router for SearchQnA example.
-
-    The input request first goes through Google Search, and the fetched HTML pages are stored in the vector db.
-    Then the input request, together with the relevant retrieved documents, is forwarded to the LLM to get the answers.
-    """
-
-    def __init__(
-        self,
-        entrypoint: str,
-        vectordb_embedding_model: str = "hkunlp/instructor-large",
-        vectordb_persistent_directory: str = "/home/user/chroma_db_oai",
-    ) -> None:
-        super().__init__()
-        self.entrypoint = entrypoint
-        self.queue = Queue()  # For streaming output tokens
-
-        # setup TGI endpoint
-        self.llm = HuggingFaceEndpoint(
-            endpoint_url=entrypoint,
-            max_new_tokens=1024,
-            top_k=10,
-            top_p=0.95,
-            typical_p=0.95,
-            temperature=0.01,
-            repetition_penalty=1.03,
-            streaming=True,
-            callbacks=[QueueCallbackHandler(queue=self.queue)],
-        )
-
-        # Check that the Google API key and CSE ID are provided
-        if "GOOGLE_API_KEY" not in os.environ or "GOOGLE_CSE_ID" not in os.environ:
-            raise Exception("Please make sure to set GOOGLE_API_KEY and GOOGLE_CSE_ID environment variables!")
-
-        # Clear the previous search history to avoid interfering with current retrievals
-        if os.path.exists(vectordb_persistent_directory) and os.path.isdir(vectordb_persistent_directory):
-            shutil.rmtree(vectordb_persistent_directory)
-        self.vectorstore = Chroma(
-            embedding_function=HuggingFaceInstructEmbeddings(model_name=vectordb_embedding_model),
-            persist_directory=vectordb_persistent_directory,
-        )
-
-        # Build up the google search service
-        self.search = GoogleSearchAPIWrapper()
-
-        # Compose the websearch retriever
-        self.web_search_retriever = WebResearchRetriever.from_llm(
-            vectorstore=self.vectorstore,
-            llm=self.llm,
-            search=self.search,
-            trust_env=True,
-            # num_search_results=3
-        )
-
-        # Compose the whole chain
-        self.llm_chain = RetrievalQAWithSourcesChain.from_chain_type(
-            self.llm,
-            retriever=self.web_search_retriever,
-        )
-
-    def handle_search_chat(self, query: str):
-        try:
-            response = self.llm_chain({"question": query})
-        except Exception as e:
-            print(f"LLM chain error: {e}")
-            return "Internal Server Error", ""
-        return response["answer"], response["sources"]
-
-
-router = SearchQuestionAnsweringAPIRouter(
-    entrypoint=TGI_ENDPOINT,
-)
-
-
-@router.post("/v1/rag/web_search_chat")
-async def web_search_chat(request: Request):
-    params = await request.json()
-    print(f"[websearch - chat] POST request: /v1/rag/web_search_chat, params:{params}")
-    query = params["query"]
-    answer, sources = router.handle_search_chat(query=query)
-    print(f"[websearch - chat] answer: {answer}, sources: {sources}")
-    return {"answer": answer, "sources": sources}
-
-
-@router.post("/v1/rag/web_search_chat_stream")
-async def web_search_chat_stream(request: Request):
-    params = await request.json()
-    print(f"[websearch - streaming chat] POST request: /v1/rag/web_search_chat_stream, params:{params}")
-    query = params["query"]
-
-    def stream_callback(query):
-        finished = object()
-
-        def task():
-            try:
-                _ = router.llm_chain({"question": query})
-                router.queue.put(finished)
-            except Exception as e:
-                print(f"LLM chain error: {e}")
-                router.queue.put({"answer": "\nInternal Server Error\n"})
-                router.queue.put(finished)
-
-        t = Thread(target=task)
-        t.start()
-        while True:
-            try:
-                item = router.queue.get()
-                if item is finished:
-                    break
-                yield item
-            except Exception:  # queue.get() blocks, so this should not trigger
-                continue
-
-    def stream_generator():
-        import codecs
-
-        chat_response = ""
-        for res_dict in stream_callback(query):
-            text = res_dict["answer"]
-            chat_response += text
-            if text == " ":
-                yield "data: @#$\n\n"
-                continue
-            # if text.isspace():
-            #     continue
-            if "\n" in text or "\r" in text:
-                text = text.replace("\n", "<br/>").replace(" ", "@#$")
-                yield f"data: {text}\n\n"
-                continue
-            text = text.replace(" ", "@#$")
-            yield f"data: {text}\n\n"
-        chat_response = chat_response.split("</s>")[0]
-        print(f"\n\n[websearch - chat_stream] stream response: {chat_response}\n\n")
-        yield "data: [DONE]\n\n"
-
-    return StreamingResponse(stream_generator(), media_type="text/event-stream")
-
-
-app.include_router(router)
-
-if __name__ == "__main__":
-    import uvicorn
-
-    fastapi_port = os.getenv("FASTAPI_PORT", "8000")
-    uvicorn.run(app, host="0.0.0.0", port=int(fastapi_port))
diff --git a/SearchQnA/deprecated/langchain/docker/requirements.txt b/SearchQnA/deprecated/langchain/docker/requirements.txt
deleted file mode 100644
index f2e33b696..000000000
--- a/SearchQnA/deprecated/langchain/docker/requirements.txt
+++ /dev/null
@@ -1,10 +0,0 @@
-beautifulsoup4
-chromadb
-eager
-fastapi
-google-api-python-client>=2.100.0
-html2text
-InstructorEmbedding
-optimum[habana]
-sentence-transformers==2.2.2
-uvicorn
diff --git a/SearchQnA/deprecated/serving/tgi_gaudi/build_docker.sh b/SearchQnA/deprecated/serving/tgi_gaudi/build_docker.sh
deleted file mode 100644
index 7adf71ff0..000000000
--- a/SearchQnA/deprecated/serving/tgi_gaudi/build_docker.sh
+++ /dev/null
@@ -1,9 +0,0 @@
-
-#!/bin/bash
-
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-git clone https://github.com/huggingface/tgi-gaudi.git
-cd ./tgi-gaudi/
-docker build -t ghcr.io/huggingface/tgi-gaudi:1.2.1 . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
diff --git a/SearchQnA/deprecated/serving/tgi_gaudi/launch_tgi_service.sh b/SearchQnA/deprecated/serving/tgi_gaudi/launch_tgi_service.sh
deleted file mode 100644
index 91216966c..000000000
--- a/SearchQnA/deprecated/serving/tgi_gaudi/launch_tgi_service.sh
+++ /dev/null
@@ -1,40 +0,0 @@
-#!/bin/bash
-
-
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-# Set default values
-default_port=8080
-default_model="Intel/neural-chat-7b-v3-3"
-default_num_cards=1
-
-# Check that no more than three optional arguments are provided
-if [ "$#" -gt 3 ]; then
-    echo "Usage: $0 [num_cards] [port_number] [model_name]"
-    exit 1
-fi
-
-# Assign arguments to variables
-num_cards=${1:-$default_num_cards}
-port_number=${2:-$default_port}
-model_name=${3:-$default_model}
-
-# Check if num_cards is within the valid range (1-8)
-if [ "$num_cards" -lt 1 ] || [ "$num_cards" -gt 8 ]; then
-    echo "Error: num_cards must be between 1 and 8."
-    exit 1
-fi
-
-# Set the volume variable
-volume=$PWD/data
-
-# Build the Docker run command based on the number of cards
-if [ "$num_cards" -eq 1 ]; then
-    docker_cmd="docker run -d --name tgi-gaudi-server -p $port_number:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$http_proxy ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id $model_name"
-else
-    docker_cmd="docker run -d --name tgi-gaudi-server -p $port_number:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$http_proxy ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id $model_name --sharded true --num-shard $num_cards"
-fi
-
-# Execute the Docker run command
-eval $docker_cmd
diff --git a/SearchQnA/deprecated/tests/test_langchain_inference.sh b/SearchQnA/deprecated/tests/test_langchain_inference.sh
deleted file mode 100644
index 9119c8cf5..000000000
--- a/SearchQnA/deprecated/tests/test_langchain_inference.sh
+++ /dev/null
@@ -1,96 +0,0 @@
-#!/bin/bash
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-set -xe
-
-function test_env_setup() {
-    WORKPATH=$(dirname "$PWD")
-    LOG_PATH="$WORKPATH/tests/langchain.log"
-
-    TGI_CONTAINER_NAME="test-tgi-gaudi-server"
-    LANGCHAIN_CONTAINER_NAME="test-searchqna-gaudi"
-}
-
-function rename() {
-    # Rename the docker container/image names to avoid conflict with local test
-    cd ${WORKPATH}
-    sed -i "s/tgi-gaudi-server/${TGI_CONTAINER_NAME}/g" serving/tgi_gaudi/launch_tgi_service.sh
-}
-
-function launch_tgi_gaudi_service() {
-    local card_num=1
-    local port=8870
-    local model_name="Intel/neural-chat-7b-v3-3"
-
-    cd ${WORKPATH}
-
-    docker pull ghcr.io/huggingface/tgi-gaudi:1.2.1
-    bash serving/tgi_gaudi/launch_tgi_service.sh $card_num $port $model_name
-    sleep 2m
-}
-
-function launch_langchain_service() {
-    cd $WORKPATH
-    local port=8875
-    cd langchain/docker
-    docker build . --build-arg http_proxy=${http_proxy} --build-arg https_proxy=${https_proxy} -t intel/gen-ai-examples:${LANGCHAIN_CONTAINER_NAME}
-
-    tgi_ip_name=$(echo $(hostname) | tr '[a-z]-' '[A-Z]_')_$(echo 'IP')
-    tgi_ip=$(eval echo '$'$tgi_ip_name)
-    docker run -d --name=${LANGCHAIN_CONTAINER_NAME} -e TGI_ENDPOINT=http://${tgi_ip}:8870 -e GOOGLE_CSE_ID=${GOOGLE_CSE_ID} -e GOOGLE_API_KEY=${GOOGLE_API_KEY} -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} \
-        -p ${port}:8000 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host intel/gen-ai-examples:${LANGCHAIN_CONTAINER_NAME}
-
-    sleep 2m
-}
-
-
-function run_tests() {
-    cd $WORKPATH
-    local port=8875
-
-    curl http://localhost:${port}/v1/rag/web_search_chat \
-        -X POST \
-        -d '{"query":"What is the GitHub Repo link of Intel Neural Compressor?"}' \
-        -H 'Content-Type: application/json' > $LOG_PATH
-
-}
-
-function check_response() {
-    cd $WORKPATH
-    echo "Checking response"
-    local status=false
-    if [[ -f $LOG_PATH ]] && [[ $(grep -c "Neural Compressor" $LOG_PATH) != 0 ]]; then
-        status=true
-    fi
-
-    if [ $status == false ]; then
-        echo "Response check failed, please check the logs in artifacts!"
-        exit 1
-    else
-        echo "Response check succeeded!"
-    fi
-}
-
-function docker_stop() {
-    local container_name=$1
-    cid=$(docker ps -aq --filter "name=$container_name")
-    if [[ ! 
-z "$cid" ]]; then docker stop $cid && docker rm $cid; fi -} - -function main() { - test_env_setup - rename - docker_stop $TGI_CONTAINER_NAME && docker_stop $LANGCHAIN_CONTAINER_NAME && sleep 5s - - launch_tgi_gaudi_service - launch_langchain_service - - run_tests - check_response - - docker_stop $TGI_CONTAINER_NAME && docker_stop $LANGCHAIN_CONTAINER_NAME && sleep 5s - echo y | docker system prune -} - -main diff --git a/Translation/deprecated/README.md b/Translation/deprecated/README.md deleted file mode 100644 index 366866231..000000000 --- a/Translation/deprecated/README.md +++ /dev/null @@ -1,51 +0,0 @@ -# Language Translation - -Language Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text. - -The workflow falls into the following architecture: - -![architecture](../assets/img/translation_architecture.png) - -# Start Backend Service - -1. Start the TGI Service to deploy your LLM - -```sh -cd serving/tgi_gaudi -bash build_docker.sh -bash launch_tgi_service.sh -``` - -`launch_tgi_service.sh` the script uses `8080` as the TGI service's port by default. Please replace it if any port conflicts detected. - -2. Start the Language Translation Service - -```sh -cd langchain/docker -bash build_docker.sh -docker run -it --name translation_server --net=host --ipc=host -e TGI_ENDPOINT=${TGI_ENDPOINT} -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e SERVER_PORT=8000 -e http_proxy=${http_proxy} -e https_proxy=${https_proxy} translation:latest bash -``` - -**Note**: Set the following parameters before running the above command - -- `TGI_ENDPOINT`: The endpoint of your TGI service, usually equal to `:`. -- `HUGGINGFACEHUB_API_TOKEN`: Your HuggingFace hub API token, usually generated [here](https://huggingface.co/settings/tokens). -- `SERVER_PORT`: The port of the Translation service on the host. - -3. Quick Test - -```sh -curl http://localhost:8000/v1/translation \ - -X POST \ - -d '{"language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}' \ - -H 'Content-Type: application/json' -``` - -The shortcodes of languages are also supported: - -```sh -curl http://localhost:8000/v1/translation \ - -X POST \ - -d '{"language_from": "de","language_to": "en","source_language": "Maschinelles Lernen"}' \ - -H 'Content-Type: application/json' -``` diff --git a/Translation/deprecated/langchain/docker/Dockerfile b/Translation/deprecated/langchain/docker/Dockerfile deleted file mode 100644 index 699ad2ce8..000000000 --- a/Translation/deprecated/langchain/docker/Dockerfile +++ /dev/null @@ -1,40 +0,0 @@ - - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# SCRIPT USAGE NOTICE: By downloading and using any script file included -# with the associated software package (such as files with .bat, .cmd, or -# .JS extensions, Docker files, or any other type of file that, when executed, -# automatically downloads and/or installs files onto your system) (the “Script File”), -# it is your obligation to review the Script File to understand what files (e.g., -# other software, AI models, AI Datasets) the Script File will download to your system -# (“Downloaded Files”). Furthermore, by downloading and using the Downloaded Files, -# even if they are installed through a silent install, you agree to any and all -# terms and conditions associated with such files, including but not limited to, -# license terms, notices, or disclaimers. 
- -FROM intel/intel-optimized-pytorch:2.2.0-pip-jupyter - -RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \ - libgl1-mesa-glx \ - libjemalloc-dev \ - vim - -RUN useradd -m -s /bin/bash user && \ - mkdir -p /home/user && \ - chown -R user /home/user/ - -USER user - -COPY requirements.txt /tmp/requirements.txt - -RUN pip install --no-cache-dir --upgrade pip && \ - pip install --no-cache-dir -r /tmp/requirements.txt - -ENV PYTHONPATH=/home/user:/home/user/translation-app/app - -WORKDIR /home/user/translation-app -COPY --chown=user:user translation-app /home/user/translation-app - -ENTRYPOINT ["python", "server.py"] \ No newline at end of file diff --git a/Translation/deprecated/langchain/docker/build_docker.sh b/Translation/deprecated/langchain/docker/build_docker.sh deleted file mode 100644 index 2fd496c22..000000000 --- a/Translation/deprecated/langchain/docker/build_docker.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/bin/bash - - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -docker build . -t translation:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy diff --git a/Translation/deprecated/langchain/docker/requirements.txt b/Translation/deprecated/langchain/docker/requirements.txt deleted file mode 100644 index 7bf9b6131..000000000 --- a/Translation/deprecated/langchain/docker/requirements.txt +++ /dev/null @@ -1,4 +0,0 @@ -fastapi -huggingface_hub -langchain -uvicorn diff --git a/Translation/deprecated/langchain/docker/translation-app/prompts.py b/Translation/deprecated/langchain/docker/translation-app/prompts.py deleted file mode 100644 index aefeb8bf4..000000000 --- a/Translation/deprecated/langchain/docker/translation-app/prompts.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -from langchain.prompts import PromptTemplate - -prompt_template = """ - Translate this from {language_from} to {language_to}: - - {language_from}: - {source_language} - - {language_to}: -""" -translation_prompt_template = PromptTemplate.from_template(prompt_template) diff --git a/Translation/deprecated/langchain/docker/translation-app/server.py b/Translation/deprecated/langchain/docker/translation-app/server.py deleted file mode 100644 index 9ebab2572..000000000 --- a/Translation/deprecated/langchain/docker/translation-app/server.py +++ /dev/null @@ -1,169 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# - -import os - -from fastapi import APIRouter, FastAPI, HTTPException, Request -from fastapi.responses import StreamingResponse -from langchain_community.llms import HuggingFaceEndpoint -from prompts import translation_prompt_template -from starlette.middleware.cors import CORSMiddleware - -app = FastAPI() - -app.add_middleware( - CORSMiddleware, - allow_origins=["*"], - allow_credentials=True, - allow_methods=["*"], - allow_headers=["*"], -) - -TGI_ENDPOINT = os.getenv("TGI_ENDPOINT", "http://localhost:8080") -SERVICE_PORT = int(os.getenv("SERVER_PORT", 8000)) - - -short_cut_mapping = { - "en": "English", - "de": "German", - "fr": "French", - "es": "Spanish", - "it": "Italian", - "pt": "Portuguese", - "ru": "Russian", - "zh": "Chinese", - "ja": "Japanese", - "ko": "Korean", - "sv": "Swedish", - "nl": "Dutch", - "no": "Norwegian", - "da": "Danish", - "ar": "Arabic", - "hi": "Hindi", - "tr": "Turkish", - "pl": "Polish", - "fi": "Finnish", - "el": "Greek", - "cs": 
"Czech", - "hu": "Hungarian", - "id": "Indonesian", - "is": "Icelandic", - "ms": "Malay", - "th": "Thai", - "uk": "Ukrainian", - "vi": "Vietnamese", - "ro": "Romanian", - "he": "Hebrew", - "bn": "Bengali", - "bg": "Bulgarian", - "ca": "Catalan", - "hr": "Croatian", - "pirate": "Pirate", - "yoda": "Yoda", - "minion": "Minion", -} - - -class TranslationAPIRouter(APIRouter): - """The router for Language Translation example.""" - - def __init__(self, entrypoint: str, prompt_template: str) -> None: - super().__init__() - self.entrypoint = entrypoint - - # setup TGI endpoint - self.llm = HuggingFaceEndpoint( - endpoint_url=entrypoint, - max_new_tokens=1024, - top_k=10, - top_p=0.95, - typical_p=0.95, - temperature=0.01, - repetition_penalty=1.03, - streaming=True, - ) - - self.prompt_template = prompt_template - - def handle_translation(self, language_from: str, language_to: str, source_language: str): - if language_from in short_cut_mapping.keys(): - language_from = short_cut_mapping[language_from] - if language_to in short_cut_mapping.keys(): - language_to = short_cut_mapping[language_to] - prompt = self.prompt_template.format( - language_from=language_from, language_to=language_to, source_language=source_language - ) - print(f"[translation - nonstream] prompt:{prompt}") - try: - response = self.llm(prompt) - response = {"target_language": response.replace("", "").lstrip()} - except Exception as e: - print(f"[translation - nonstream] Error occurred: {e}") - raise Exception(f"[translation - nonstream] {e}") - print(f"[translation - nonstream] response:\n{response}") - return response - - async def handle_translation_stream(self, language_from: str, language_to: str, source_language: str): - if language_from in short_cut_mapping.keys(): - language_from = short_cut_mapping[language_from] - if language_to in short_cut_mapping.keys(): - language_to = short_cut_mapping[language_to] - prompt = self.prompt_template.format( - language_from=language_from, language_to=language_to, source_language=source_language - ) - print(f"[translation - stream] prompt:{prompt}") - - async def stream_generator(): - async for chunk in self.llm.astream_log(prompt): - print(f"[translation - stream] data: {chunk}") - yield f"data: {chunk}\n\n" - yield "data: [DONE]\n\n" - - return StreamingResponse(stream_generator(), media_type="text/event-stream") - - -router = TranslationAPIRouter(entrypoint=TGI_ENDPOINT, prompt_template=translation_prompt_template) - - -@router.post("/v1/translation") -async def translation(request: Request): - params = await request.json() - print(f"[translation - nonstream] POST request: /v1/translation, params:{params}") - language_from = params["language_from"] - language_to = params["language_to"] - source_language = params["source_language"] - try: - return router.handle_translation( - language_from=language_from, language_to=language_to, source_language=source_language - ) - except Exception as e: - print(f"[translation - nonstream] Error occurred: {e}") - raise HTTPException(status_code=500, detail=str(e)) - - -@router.post("/v1/translation_stream") -async def translation_stream(request: Request): - params = await request.json() - print(f"[translation - stream] POST request: /v1/translation_stream, params:{params}") - language_from = params["language_from"] - language_to = params["language_to"] - source_language = params["source_language"] - try: - return await router.handle_translation_stream( - language_from=language_from, language_to=language_to, source_language=source_language - ) - except 
Exception as e: - print(f"[translation - stream] Error occurred: {e}") - raise HTTPException(status_code=500, detail=str(e)) - - -app.include_router(router) - -if __name__ == "__main__": - import uvicorn - - uvicorn.run(app, host="0.0.0.0", port=int(SERVICE_PORT)) diff --git a/Translation/deprecated/serving/tgi_gaudi/Dockerfile b/Translation/deprecated/serving/tgi_gaudi/Dockerfile deleted file mode 100644 index c8d7583fe..000000000 --- a/Translation/deprecated/serving/tgi_gaudi/Dockerfile +++ /dev/null @@ -1,6 +0,0 @@ -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -FROM ghcr.io/huggingface/tgi-gaudi:1.2.1 - -RUN pip install --no-cache-dir peft==0.6.2 \ No newline at end of file diff --git a/Translation/deprecated/serving/tgi_gaudi/build_docker.sh b/Translation/deprecated/serving/tgi_gaudi/build_docker.sh deleted file mode 100644 index 681d705a6..000000000 --- a/Translation/deprecated/serving/tgi_gaudi/build_docker.sh +++ /dev/null @@ -1,6 +0,0 @@ -#!/bin/bash - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -docker build . -t tgi-gaudi-translation:1.2.1 --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy diff --git a/Translation/deprecated/serving/tgi_gaudi/launch_tgi_service.sh b/Translation/deprecated/serving/tgi_gaudi/launch_tgi_service.sh deleted file mode 100644 index 939d90ec2..000000000 --- a/Translation/deprecated/serving/tgi_gaudi/launch_tgi_service.sh +++ /dev/null @@ -1,40 +0,0 @@ -#!/bin/bash - - -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -# Set default values -default_port=8080 -default_model="haoranxu/ALMA-13B" -default_num_cards=1 - -# Check if all required arguments are provided -if [ "$#" -lt 0 ] || [ "$#" -gt 3 ]; then - echo "Usage: $0 [num_cards] [port_number] [model_name]" - exit 1 -fi - -# Assign arguments to variables -num_cards=${1:-$default_num_cards} -port_number=${2:-$default_port} -model_name=${3:-$default_model} - -# Check if num_cards is within the valid range (1-8) -if [ "$num_cards" -lt 1 ] || [ "$num_cards" -gt 8 ]; then - echo "Error: num_cards must be between 1 and 8." 
- exit 1 -fi - -# Set the volume variable -volume=$PWD/data - -# Build the Docker run command based on the number of cards -if [ "$num_cards" -eq 1 ]; then - docker_cmd="docker run -d --name tgi-gaudi-server-translation -p $port_number:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy tgi-gaudi-translation:1.2.1 --model-id $model_name" -else - docker_cmd="docker run -d --name tgi-gaudi-server-translation -p $port_number:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy tgi-gaudi-translation:1.2.1 --model-id $model_name --sharded true --num-shard $num_cards" -fi - -# Execute the Docker run command -eval $docker_cmd diff --git a/Translation/deprecated/tests/test_langchain_inference.sh b/Translation/deprecated/tests/test_langchain_inference.sh deleted file mode 100644 index 7b260e89a..000000000 --- a/Translation/deprecated/tests/test_langchain_inference.sh +++ /dev/null @@ -1,151 +0,0 @@ -#!/bin/bash -# Copyright (C) 2024 Intel Corporation -# SPDX-License-Identifier: Apache-2.0 - -set -xe - -function test_env_setup() { - WORKPATH=$(dirname "$PWD") - LOG_PATH="$WORKPATH/tests/langchain.log" - - TGI_CONTAINER_NAME="test-tgi-gaudi-server" - LANGCHAIN_CONTAINER_NAME="test-translation-gaudi" -} - -function rename() { - # Rename the docker container/image names to avoid conflict with local test - cd ${WORKPATH} - sed -i "s/tgi-gaudi-server-translation/${TGI_CONTAINER_NAME}/g" serving/tgi_gaudi/launch_tgi_service.sh -} - -function launch_tgi_gaudi_service() { - local card_num=1 - local port=8870 - local model_name="haoranxu/ALMA-13B" - - cd ${WORKPATH}/serving/tgi_gaudi - - bash build_docker.sh - bash launch_tgi_service.sh $card_num $port $model_name - sleep 2m -} - -function launch_langchain_service() { - cd $WORKPATH - local port=8875 - cd langchain/docker - docker build . 
--build-arg http_proxy=${http_proxy} --build-arg https_proxy=${http_proxy} -t intel/gen-ai-examples:${LANGCHAIN_CONTAINER_NAME} - - docker run -d --name=${LANGCHAIN_CONTAINER_NAME} --net=host -e TGI_ENDPOINT=http://localhost:8870 -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} \ - -e SERVER_PORT=${port} -e http_proxy=${http_proxy} -e https_proxy=${https_proxy} --ipc=host intel/gen-ai-examples:${LANGCHAIN_CONTAINER_NAME} - sleep 2m -} - - -function run_tests() { - cd $WORKPATH - local port=8875 - - # response: {"target_language":"I love machine translation"} - curl http://localhost:${port}/v1/translation \ - -X POST \ - -d '{"language_from": "zh","language_to": "en","source_language": "我爱机器翻译。"}' \ - -H 'Content-Type: application/json' > $LOG_PATH - - #response: {"target_language":"我是一名翻译"} - curl http://localhost:${port}/v1/translation \ - -X POST \ - -d '{"language_from": "English","language_to": "Chinese","source_language": "I am a translator"}' \ - -H 'Content-Type: application/json' >> $LOG_PATH - - # response: {"target_language":"Hallo Welt"} - curl http://localhost:${port}/v1/translation \ - -X POST \ - -d '{"language_from": "en","language_to": "de","source_language": "hello world"}' \ - -H 'Content-Type: application/json' >> $LOG_PATH - - # response: {"target_language":"Machine learning"} - curl http://localhost:${port}/v1/translation \ - -X POST \ - -d '{"language_from": "German","language_to": "English","source_language": "Maschinelles Lernen"}' \ - -H 'Content-Type: application/json' >> $LOG_PATH - - # response: {"target_language":"Ég er glöð"} - curl http://localhost:${port}/v1/translation \ - -X POST \ - -d '{"language_from": "en","language_to": "is","source_language": "I am happy"}' \ - -H 'Content-Type: application/json' >> $LOG_PATH - - # response: {"target_language":"Hello world"} - curl http://localhost:${port}/v1/translation \ - -X POST \ - -d '{"language_from": "Icelandic","language_to": "English","source_language": "Halló heimur"}' \ - -H 'Content-Type: application/json' >> $LOG_PATH - - # response: {"target_language":"Velká jazyková model"} - curl http://localhost:${port}/v1/translation \ - -X POST \ - -d '{"language_from": "en","language_to": "cs","source_language": "Large Language Model"}' \ - -H 'Content-Type: application/json' >> $LOG_PATH - - # response: {"target_language":"I'm glad to see you"} - curl http://localhost:${port}/v1/translation \ - -X POST \ - -d '{"language_from": "Czech","language_to": "English","source_language": "rád tě vidím"}' \ - -H 'Content-Type: application/json' >> $LOG_PATH - - # response: {"target_language":"Хотите танцевать"} - curl http://localhost:${port}/v1/translation \ - -X POST \ - -d '{"language_from": "English","language_to": "ru","source_language": "Shall we dance?"}' \ - -H 'Content-Type: application/json' >> $LOG_PATH - - # response: {"target_language":"operating system"} - curl http://localhost:${port}/v1/translation \ - -X POST \ - -d '{"language_from": "Russian","language_to": "English","source_language": "операционная система"}' \ - -H 'Content-Type: application/json' >> $LOG_PATH -} - -function check_response() { - cd $WORKPATH - echo "Checking response" - local status=false - if [[ -f $LOG_PATH ]] && [[ $(grep -c "I love machine translation" $LOG_PATH) != 0 ]] && \ - [[ $(grep -c "我是一名翻译" $LOG_PATH) != 0 ]] && [[ $(grep -c "Hallo Welt" $LOG_PATH) != 0 ]] && \ - [[ $(grep -c "Machine learning" $LOG_PATH) != 0 ]] && [[ $(grep -c "Ég er glöð" $LOG_PATH) != 0 ]] && \ - [[ $(grep -c "Velká jazyková model" $LOG_PATH) 
!= 0 ]] && [[ $(grep -c "I'm glad to see you" $LOG_PATH) != 0 ]] && \ - [[ $(grep -c "operating system" $LOG_PATH) != 0 ]]; then - status=true - fi - - if [ $status == false ]; then - echo "Response check failed, please check the logs in artifacts!" - exit 1 - else - echo "Response check succeed!" - fi -} - -function docker_stop() { - local container_name=$1 - cid=$(docker ps -aq --filter "name=$container_name") - if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid; fi -} - -function main() { - test_env_setup - rename - docker_stop $TGI_CONTAINER_NAME && docker_stop $LANGCHAIN_CONTAINER_NAME && sleep 5s - - launch_tgi_gaudi_service - launch_langchain_service - - run_tests - check_response - - docker_stop $TGI_CONTAINER_NAME && docker_stop $LANGCHAIN_CONTAINER_NAME && sleep 5s - echo y | docker system prune -} - -main
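For reference, the translation endpoints exercised by these tests can also be called from Python. A minimal sketch, assuming the service listens on `localhost:8875` as in the test script above:

```python
# Sketch of a client for the Translation service's non-streaming endpoint.
import requests

resp = requests.post(
    "http://localhost:8875/v1/translation",
    json={"language_from": "de", "language_to": "en", "source_language": "Maschinelles Lernen"},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["target_language"])  # expected: "Machine learning"
```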