This project was built as part of the Data-Driven VC Hackathon organized by Red River West & Bivwak! by BNP Paribas
python -m venv .
source ./bin/activate
pip install -r requirements.txt
- Add your Harmonic and OpenAI keys to the .env file
- Launch the project with: streamlit run interface.py
- UI entry: interface.py
- Core logic (the innovative part of our solution): backend.py (entry point) + BossExecutor.py (main logic)
The innovative part of the solution is using an LLM to generate steps and queries from the question, in a chain-of-thought fashion. This is not integrated with the UI yet, but you can run it and watch the process with: python backend.py
In today's VC world, extracting insights from data takes time. Analysts either pull data manually from different vendor platforms and work in Excel, or, at the more tech-driven VCs, build their own database by integrating vendor data and then translate business needs into big data queries. Both approaches cost time and money, and usually require technical knowledge too.
TenX enables any non-technical VC analyst to extract insights from data using natural language.
TenX interprets your question, breaks the problem down into API calls (in the future potentially queries too, although that would be more complicated), and runs validations to ensure the result answers what the user asked.
The important requirement for TenX to work is that you need well-documented API contracts, or a Swagger spec, like harmonic_api_doc.txt in our example.
We only support the Harmonic and OpenAI APIs for now.
To integrate your own data provider APIs, you must:
- Add the secrets to the .env file
- Add the name of the endpoint to api_config.py
- Provide a list of callable endpoints, or a Swagger spec
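As a rough illustration of the three steps above, the sketch below registers a provider and checks that its secret is present. The `ProviderConfig` shape, the `PROVIDERS` registry, and the `HARMONIC_API_KEY` variable name are all hypothetical and do not reflect the actual contents of api_config.py. It also assumes the .env values have already been loaded into the process environment (for example with python-dotenv).

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class ProviderConfig:
    name: str         # provider name as registered in the config
    key_env_var: str  # environment variable that holds the secret (hypothetical name)
    doc_path: str     # API documentation or Swagger spec that is fed to the LLM


# Hypothetical registry; only the Harmonic doc file name comes from this README.
PROVIDERS: Dict[str, ProviderConfig] = {
    "harmonic": ProviderConfig("harmonic", "HARMONIC_API_KEY", "harmonic_api_doc.txt"),
}


def load_api_keys(
    providers: Mapping[str, ProviderConfig],
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Return {provider name: key}, failing early if any secret is missing."""
    missing = [p.key_env_var for p in providers.values() if not env.get(p.key_env_var)]
    if missing:
        raise RuntimeError("missing secrets: " + ", ".join(sorted(missing)))
    return {name: env[p.key_env_var] for name, p in providers.items()}
```

Passing the environment as a parameter keeps the check easy to exercise without touching real secrets.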
For SQL queries
SQL queries are not supported yet.
We created a mock of how the backend will work in main.py.
We perform a query search for competitors of a specified domain using various client APIs.
The main function processes user input to find similar companies for a given company. It retrieves company information, finds similar sites, and evaluates them using a vector search engine.
- Domain Extraction: Uses a regular expression to extract the domain from the user input.
- Client Initialization: Initializes the HarmonicClient to fetch company information based on the domain.
- API Calls:
  - Retrieves company info using the Harmonic API
  - Fetches similar sites using the Harmonic API
  - Gathers detailed information for similar companies using the Harmonic API
- Validation: Initializes OpenAIClient and validates the results from the API calls with an LLM.
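The flow above could look roughly like the following sketch. The regular expression, the `StubHarmonicClient` class, and its method names are assumptions made for illustration; they are not the actual code in main.py or the real HarmonicClient interface. The LLM validation step is left out.

```python
import re
from typing import Any, Dict, List, Optional

# Assumed pattern: optional scheme and "www." prefix, then a dotted hostname.
DOMAIN_RE = re.compile(
    r"(?:https?://)?(?:www\.)?((?:[a-z0-9-]+\.)+[a-z]{2,})",
    re.IGNORECASE,
)


def extract_domain(user_input: str) -> Optional[str]:
    """Return the first domain-like token in the input, lowercased, or None."""
    match = DOMAIN_RE.search(user_input)
    return match.group(1).lower() if match else None


class StubHarmonicClient:
    """Stand-in for the real client; both method names are hypothetical."""

    def get_company(self, domain: str) -> Dict[str, Any]:
        return {"domain": domain}

    def get_similar_domains(self, domain: str) -> List[str]:
        return ["similar-a.example", "similar-b.example"]


def find_similar_companies(user_input: str, client: StubHarmonicClient) -> Dict[str, Any]:
    """Extract the domain, fetch the company, then fetch details for each similar site."""
    domain = extract_domain(user_input)
    if domain is None:
        raise ValueError("no domain found in input")
    company = client.get_company(domain)
    similar = [client.get_company(d) for d in client.get_similar_domains(domain)]
    return {"company": company, "similar": similar}
```

For example, `extract_domain("Find competitors of https://www.stripe.com/pricing")` yields `"stripe.com"`, and input without a domain yields `None`.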
The BossExecutor class orchestrates the execution of tasks using various executors, such as API calls, generic tasks, and theoretically SQL queries too (although SQL would be difficult).
For now we only implement HarmonicClient and OpenAIClient to generate and execute steps based on user queries.
This class is particularly useful for handling workflows that involve stepwise execution.
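To make the stepwise dispatch idea concrete, here is a minimal sketch of an orchestrator that routes each step to a registered executor. The `Step` and `Orchestrator` names and the executor signature are hypothetical; this is not the actual BossExecutor implementation. In the real project the steps would come from the LLM, whereas here they are passed in directly.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    executor: str            # which executor should run this step, e.g. "api"
    payload: Dict[str, Any]  # executor-specific arguments


# An executor receives the step payload and the results of all earlier steps.
Executor = Callable[[Dict[str, Any], List[Any]], Any]


class Orchestrator:
    def __init__(self) -> None:
        self._executors: Dict[str, Executor] = {}

    def register(self, name: str, executor: Executor) -> None:
        self._executors[name] = executor

    def run(self, steps: List[Step]) -> List[Any]:
        """Run the steps in order and collect one result per step."""
        results: List[Any] = []
        for step in steps:
            if step.executor not in self._executors:
                raise KeyError(f"no executor registered for {step.executor!r}")
            results.append(self._executors[step.executor](step.payload, results))
        return results
```

Giving each executor the earlier results is one way to let a later step (say, a generic summarizing task) build on an earlier API call.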
1. Accuracy of the steps and queries generated by the LLM: The current model is not perfect, and it can sometimes generate API calls that contain unwanted tokens (e.g., placeholders rather than real values, or malformed JSON strings). Our design calls for an LLM validator (which we have not implemented yet) to validate the generated API calls and queries before they are executed. In retrospect, we could also make the orchestrator smarter: after each step is executed, feed its results back to the orchestrator and ask it to update the subsequent steps accordingly. This would make later steps more accurate, since each one would be planned with the actual results of the steps before it.
2. Scalability: As TenX supports more and more data providers, more API documentation will need to be fed into the OpenAI (or any other LLM) context to generate steps and queries. This can become costly over time. However, we have an alternative: retrieving only the APIs relevant to each question, using a combination of an LLM and a vector database. A vector database would let us handle a large number of documents without being limited by the context window.
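The failure modes named in the first point (placeholders and bad JSON) can be partly caught before execution with a cheap rule-based pre-check. The sketch below is that simpler substitute, not the LLM validator described above. The call shape (a JSON object with `method` and `path` fields) and the placeholder patterns are assumptions made for illustration.

```python
import json
import re
from typing import Any, Iterator, List

# Assumed placeholder shapes: <company_id>, {company_id}, {{company_id}}, YOUR_API_KEY
PLACEHOLDER_RE = re.compile(r"<[^<>\s]+>|\{\{?\s*\w+\s*\}?\}|\bYOUR_[A-Z0-9_]+\b")


def _strings(value: Any) -> Iterator[str]:
    """Yield every string nested inside a decoded JSON value, including dict keys."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield key
            yield from _strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from _strings(item)


def validate_generated_call(raw: str) -> List[str]:
    """Return the problems found; an empty list means the call passed the pre-check."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return ["top-level value must be a JSON object"]
    problems = [f"missing field: {key}" for key in ("method", "path") if key not in call]
    if any(PLACEHOLDER_RE.search(text) for text in _strings(call)):
        problems.append("contains an unfilled placeholder")
    return problems
```

A call that fails the pre-check could be sent back to the LLM together with the list of problems, so only calls that pass are executed.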