From 1f0f9978541b00ef493401cd4d77c4d57e848979 Mon Sep 17 00:00:00 2001 From: Raymond Cheng Date: Tue, 3 Sep 2024 13:13:37 -0700 Subject: [PATCH] docs: update technical architecture (#2047) * We've added a lot since this was last touched, including Clickhouse, sqlmesh, Apollo, Dagster. --- apps/docs/docs/how-oso-works/architecture.md | 83 +++++++++++--------- apps/docs/package.json | 2 +- apps/docs/plasmic.json | 8 +- apps/docs/plasmic.lock | 14 ++-- pnpm-lock.yaml | 50 ++++++++---- 5 files changed, 94 insertions(+), 63 deletions(-) diff --git a/apps/docs/docs/how-oso-works/architecture.md b/apps/docs/docs/how-oso-works/architecture.md index d066e0d80..da1ce0034 100644 --- a/apps/docs/docs/how-oso-works/architecture.md +++ b/apps/docs/docs/how-oso-works/architecture.md @@ -9,68 +9,79 @@ deployed data pipeline so that the community can build this open data warehouse together. All of the code for this architecture is available to view/copy/redeploy from the [OSO Monorepo](https://github.com/opensource-observer/oso). ::: -## Diagram +## Pipeline Overview -The following diagram illustrates Open Source Observer's technical architecture. +OSO maintains an [ETL](https://en.wikipedia.org/wiki/Extract%2C_load%2C_transform) data pipeline that is continuously deployed from our [monorepo](https://github.com/opensource-observer/oso/) and regularly indexes all available event data about projects in the [oss-directory](https://github.com/opensource-observer/oss-directory). 
+ +- **Extract**: raw event data from a variety of public data sources (e.g., GitHub, blockchains, npm, Open Collective) +- **Transform**: the raw data into impact metrics and impact vectors per project (e.g., # of active developers) +- **Load**: the results into various OSO data products (e.g., our API, website, widgets) -[![OSO Architecture Diagram](https://mermaid.ink/img/pako:eNqNVMtu2zAQ_BWCJxuI0rsPAfLorU4cuO3F7GFFbS0iEinwYcUN8u9dSpQly00RHiRqNcNZzi72jUtTIF_xLMuE9spXuGKCPzWo2dYEK5E95Q7tAS27tbJUHqUPFgUXuqMI_bsyrSzBevb9TmhGy4V8b6EpWRPySsmd4LfSG-vY4tHoDF9LCM6rAy4F_9Uz4gok4wj7I77P_hR4iD8ewAP7qvdK4xwBGqqj8yfUViqk21DkBENd9JtOh2UZXfOnwpZ9Yc8B7VFwit0w40yLeY-Muj1whoBGzRH3Rnur8uCRRUcTVukCX9H24CHJC8L2-VvCS1M3FMnBy3Lm5QsQGnb_rk13y9GOsQCqwYr8ItoalGadOZsUPHMwrpTtYKI0VYWpcPeVCUXnwvKC1oLF0pCtu0ViniJscaf2A205400vu0ses8Yaic51okXuz9VONYxr3KW8L3wtKJdk7CmjkdVXmoTbtr02ZKrrPL02U0_Pwf9vhoSjbzqUnp85lGCXh8a8c5jmOkQGh-OeLTbG-b1Fap-Zu-Nu6nEvtAaPVkGl_uCHcp9hTRwdiWPhO9amL-aHzU315Fe8RluDKmgIvcWw4L7EmmTiICrAvsRZ8044CN5sj1rylbcBr3hoKHN8UEC9Xg9BLBR17Lofat1se_8LDu-M4g?type=png)](https://mermaid.live/edit#pako:eNqNVMtu2zAQ_BWCJxuI0rsPAfLorU4cuO3F7GFFbS0iEinwYcUN8u9dSpQly00RHiRqNcNZzi72jUtTIF_xLMuE9spXuGKCPzWo2dYEK5E95Q7tAS27tbJUHqUPFgUXuqMI_bsyrSzBevb9TmhGy4V8b6EpWRPySsmd4LfSG-vY4tHoDF9LCM6rAy4F_9Uz4gok4wj7I77P_hR4iD8ewAP7qvdK4xwBGqqj8yfUViqk21DkBENd9JtOh2UZXfOnwpZ9Yc8B7VFwit0w40yLeY-Muj1whoBGzRH3Rnur8uCRRUcTVukCX9H24CHJC8L2-VvCS1M3FMnBy3Lm5QsQGnb_rk13y9GOsQCqwYr8ItoalGadOZsUPHMwrpTtYKI0VYWpcPeVCUXnwvKC1oLF0pCtu0ViniJscaf2A205400vu0ses8Yaic51okXuz9VONYxr3KW8L3wtKJdk7CmjkdVXmoTbtr02ZKrrPL02U0_Pwf9vhoSjbzqUnp85lGCXh8a8c5jmOkQGh-OeLTbG-b1Fap-Zu-Nu6nEvtAaPVkGl_uCHcp9hTRwdiWPhO9amL-aHzU315Fe8RluDKmgIvcWw4L7EmmTiICrAvsRZ8044CN5sj1rylbcBr3hoKHN8UEC9Xg9BLBR17Lofat1se_8LDu-M4g) +The following diagram illustrates Open Source Observer's technical architecture. 
+[![OSO Architecture](https://mermaid.ink/img/pako:eNqVVU1z2jAQ_SsancwktNMcOXSGhHbaTgikNOkB9yDLC1ZjS44-gDST_96VLXDAhja-gFfvzdt9u149U65SoAPa7_djaYXNYUBiOilBkplymgOZJAb0CjQZap4JC9w6DTGNZUWJ5SJXa54xbcmPy1gSfIxLlpqVGSldkgs-j-mQW6UNiW6U7MMmY85YsYJeTH_VDP84lDGIvfO_eycprPzBiFlGPsmlkHCIYJLlT8buUDMuAKvByA4GMq3_VDqk38cy7wWsyXty60A_xRRjH4kyag1JjfS6NfAAwUpxiLhS0mqROAvEOxqwQqawAV2Dt0m2CLPb64DnqigxkjDLsy7SdyfJI-YiwATGmmnIFNZUww8agLnOu7tZ-dIY2LRMlJCjwx-QN2ZCksrPaYjume4fhTMBxmpmhZKe800l-0ESjdjSWNC9Fjm4s20aV3kOYVDS3Lbxu1rnUaDsIiS6FMuqS0jrHfBeuzoPzSSlVhyMqdUSezS7VrdSFO42v0uvpo8ZGiBYLv7AaWpTUMWb1kkenY7dYDev_yPuS0iY1z4xARdo1WeNdaPAW6bg4o1TsE3GN3VyPZxWWj5AoivcHg-VHR1dNY95Aaa7oeHsuFiXvYHUqfJvJ4-0o94W7S1ysgWLYPvrDvjPVnAwbfOrjYXQn5AQVpboWbD8HvCLz89uYGPf_Ta9Libm5tfz9CsJSz76wozT7GxY4teoekfUTm_FvXXbbKXmHx7Sc1qALphI8fJ59uGY2gwKNNZfQCnTD_6OeUEcc1bNniSnA6sdnFOt3DKjgwXLDb65Eq2EkWBoXbGFQCpwj4zrq6264V7-AjpFJyI?type=png)](https://mermaid.live/edit#pako:eNqVVU1z2jAQ_SsancwktNMcOXSGhHbaTgikNOkB9yDLC1ZjS44-gDST_96VLXDAhja-gFfvzdt9u149U65SoAPa7_djaYXNYUBiOilBkplymgOZJAb0CjQZap4JC9w6DTGNZUWJ5SJXa54xbcmPy1gSfIxLlpqVGSldkgs-j-mQW6UNiW6U7MMmY85YsYJeTH_VDP84lDGIvfO_eycprPzBiFlGPsmlkHCIYJLlT8buUDMuAKvByA4GMq3_VDqk38cy7wWsyXty60A_xRRjH4kyag1JjfS6NfAAwUpxiLhS0mqROAvEOxqwQqawAV2Dt0m2CLPb64DnqigxkjDLsy7SdyfJI-YiwATGmmnIFNZUww8agLnOu7tZ-dIY2LRMlJCjwx-QN2ZCksrPaYjume4fhTMBxmpmhZKe800l-0ESjdjSWNC9Fjm4s20aV3kOYVDS3Lbxu1rnUaDsIiS6FMuqS0jrHfBeuzoPzSSlVhyMqdUSezS7VrdSFO42v0uvpo8ZGiBYLv7AaWpTUMWb1kkenY7dYDev_yPuS0iY1z4xARdo1WeNdaPAW6bg4o1TsE3GN3VyPZxWWj5AoivcHg-VHR1dNY95Aaa7oeHsuFiXvYHUqfJvJ4-0o94W7S1ysgWLYPvrDvjPVnAwbfOrjYXQn5AQVpboWbD8HvCLz89uYGPf_Ta9Libm5tfz9CsJSz76wozT7GxY4teoekfUTm_FvXXbbKXmHx7Sc1qALphI8fJ59uGY2gwKNNZfQCnTD_6OeUEcc1bNniSnA6sdnFOt3DKjgwXLDb65Eq2EkWBoXbGFQCpwj4zrq6264V7-AjpFJyI) ## Major Components The architecture has the following major components. +### Data Orchestration + +Dagster is the central data orchestration system, which manages the entire pipeline, +from the data ingestion (e.g. 
via [dlt](https://docs.dagster.io/integrations/embedded-elt/dlt) connectors) through the [dbt](https://docs.dagster.io/integrations/dbt) and [sqlmesh](https://github.com/opensource-observer/dagster-sqlmesh) pipelines to copying mart models to data-serving infrastructure. + +You can see our public Dagster dashboard at +[https://dagster.opensource.observer/](https://dagster.opensource.observer/). + ### Data Warehouse -Currently all data is stored and processed in Google BigQuery. +Currently all data is stored and processed in +[Google BigQuery](https://cloud.google.com/bigquery/?hl=en). All of the collected data or aggregated views used by OSO is also made publicly available here (if it is not already a public dataset on BigQuery). Anyone with can view, query, or build off of any stage in the pipeline. In the future we plan to explore a decentralized lakehouse. -### Data Orchestration +To see all datasets that you can subscribe to, check out our +[Data Overview](../integrate/overview/index.mdx). -Dagster is the central orchestration system, which manages the entire pipeline, -from the data ingestion, the dbt pipeline, to copying marts to data serving infrastructure. +### dbt pipeline -### API +We use a [dbt](https://www.getdbt.com/) pipeline to clean and normalize the data +into a universal event table. You can read more about our event model +[here](./event.md). -The API can be used by external developers to integrate insights from OSO. -Rate limits or cost sharing subscriptions may apply to it's usage depending -on the systems used. This also powers the OSO website. +### OLAP database -### Website +We use [Clickhouse](https://clickhouse.com/) +as a frontend database for serving live queries to the API server +and frontend website, as well as running a sqlmesh data pipeline. -This is the OSO website at [https://www.opensource.observer](https://www.opensource.observer). This website provides an easy to use public view into the data.
+### sqlmesh pipeline -## Dependent Technologies +A [sqlmesh](https://sqlmesh.com/) pipeline +is used for computing time series metrics from +the universal event table, which is copied from the BigQuery dbt pipeline. -Our infrastructure is based on many wonderful existing tools. Our major -dependencies are: +### API service -- Google BigQuery - - As explained above, all of the data that OSO collects and materializes lives - in public datasets in BigQuery. -- Dagster - - Dagster orchestrates all data jobs, including the collection of data - from external sources as well as handling the flow of data through the - main data pipeline. -- dbt - - This is used for data transformations to turn collected data into useful - materializations for the OSO API and website. -- OLAP database - - All dbt mart models are copied to an OLAP database for real-time queries. - This database powers the OSO API, which in turn powers the OSO website. - -## Indexing Pipeline +We use [Hasura](https://hasura.io/) to automatically generate +a GraphQL API from our Clickhouse database. +We then use an [Apollo Router](https://www.apollographql.com/docs/router/) +to serve queries from the public. +The API can be used by external developers to integrate insights from OSO. +Rate limits or cost-sharing subscriptions may apply to its usage depending +on the systems used. This also powers the OSO website. -OSO maintains an [ETL](https://en.wikipedia.org/wiki/Extract%2C_load%2C_transform) data pipeline that is continuously deployed from our [monorepo](https://github.com/opensource-observer/oso/) and regularly indexes all available event data about projects in the [oss-directory](https://github.com/opensource-observer/oss-directory).
+### OSO Website -- **Extract**: raw event data from a variety of public data sources (e.g., GitHub, blockchains, npm, Open Collective) -- **Transform**: the raw data into impact metrics and impact vectors per project (e.g., # of active developers) -- **Load**: the results into various OSO data products (e.g., our API, website, widgets) +The OSO website is served at +[https://www.opensource.observer](https://www.opensource.observer). +This website provides an easy-to-use public view into the data. +We currently use [Next.js](https://nextjs.org/) +hosted by [Vercel](https://vercel.com/). ## Open Architecture for Open Source Data -The architecture is designed to fully open to open source collaboration. +The architecture is designed to be fully open, to maximize open source collaboration. With contributions and guidance from the community, we want Open Source Observer to evolve as we better understand what impact looks like in different domains. diff --git a/apps/docs/package.json b/apps/docs/package.json index 1e4846f4b..e2031c81c 100644 --- a/apps/docs/package.json +++ b/apps/docs/package.json @@ -27,7 +27,7 @@ "@docusaurus/theme-common": "3.4.0", "@laxels/docusaurus-plugin-segment": "^1.0.6", "@mdx-js/react": "^3.0.1", - "@plasmicapp/react-web": "^0.2.346", + "@plasmicapp/react-web": "^0.2.350", "clsx": "^2.1.1", "prism-react-renderer": "^2.3.1", "react": "^18.3.1", diff --git a/apps/docs/plasmic.json b/apps/docs/plasmic.json index 111304648..c20e9ffee 100644 --- a/apps/docs/plasmic.json +++ b/apps/docs/plasmic.json @@ -72,8 +72,8 @@ "icons": [ { "id": "bBICBhwqdJEn", - "name": "ChecksvgIcon", - "moduleFilePath": "generated/docs_opensource_observer/icons/PlasmicIcon__Checksvg.tsx" + "name": "CheckSvgIcon", + "moduleFilePath": "generated/docs_opensource_observer/icons/PlasmicIcon__CheckSvg.tsx" }, { "id": "4hUpITJttWoK", @@ -82,8 +82,8 @@ }, { "id": "IUL_Z3b2VK27", - "name": "IdeaSvgrepoComsvgIcon", - "moduleFilePath":
"generated/docs_opensource_observer/icons/PlasmicIcon__IdeaSvgrepoComsvg.tsx" + "name": "IdeaSvgrepoComSvgIcon", + "moduleFilePath": "generated/docs_opensource_observer/icons/PlasmicIcon__IdeaSvgrepoComSvg.tsx" } ], "images": [ diff --git a/apps/docs/plasmic.lock b/apps/docs/plasmic.lock index 1e77f5c7a..d131ca25b 100644 --- a/apps/docs/plasmic.lock +++ b/apps/docs/plasmic.lock @@ -14,22 +14,22 @@ { "type": "renderModule", "assetId": "z50hW5Ihi9k5", - "checksum": "8041afb9d5261cb9dcfdde02df90afa5" + "checksum": "25dc6f80fe7c254e792c51fe2d43b4a9" }, { "type": "cssRules", "assetId": "z50hW5Ihi9k5", - "checksum": "8041afb9d5261cb9dcfdde02df90afa5" + "checksum": "25dc6f80fe7c254e792c51fe2d43b4a9" }, { "type": "renderModule", "assetId": "8u0yNVg3vXsq", - "checksum": "2d14ebf9cc6b71d89f1731028117ffc5" + "checksum": "4deb8270912232ef29700376c2ee2340" }, { "type": "cssRules", "assetId": "8u0yNVg3vXsq", - "checksum": "2d14ebf9cc6b71d89f1731028117ffc5" + "checksum": "4deb8270912232ef29700376c2ee2340" }, { "type": "renderModule", @@ -44,7 +44,7 @@ { "type": "icon", "assetId": "bBICBhwqdJEn", - "checksum": "6c340bbb97a866e45667be367e634e39" + "checksum": "f69ac7871123b31bcfd09c527de66a36" }, { "type": "icon", @@ -54,7 +54,7 @@ { "type": "icon", "assetId": "IUL_Z3b2VK27", - "checksum": "f747464e248d0fd06360dae746c9d7f8" + "checksum": "f8ee83a372a68c5c28e98fcdd36a7be6" }, { "type": "image", @@ -64,7 +64,7 @@ { "assetId": "2CtczDeUz9jL9qnFi6NWuQ", "type": "projectCss", - "checksum": "fb9ec7d982cabcf310fdbed935cf6a98" + "checksum": "ca22df23535ee01348fd2e1587fefcbb" } ], "codegenVersion": "0.0.1" diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml index de1cffe76..3a268066e 100644 --- a/pnpm-lock.yaml +++ b/pnpm-lock.yaml @@ -46,8 +46,8 @@ importers: specifier: ^3.0.1 version: 3.0.1(@types/react@18.3.3)(react@18.3.1) '@plasmicapp/react-web': - specifier: ^0.2.346 - version: 0.2.346(@types/react@18.3.3)(react-dom@18.3.1(react@18.3.1))(react@18.3.1) + specifier: ^0.2.350 + version: 
0.2.350(@types/react@18.3.3)(react-dom@18.3.1(react@18.3.1))(react@18.3.1) clsx: specifier: ^2.1.1 version: 2.1.1 @@ -3255,8 +3255,8 @@ packages: peerDependencies: react: '>=16.8.0' - '@plasmicapp/data-sources@0.1.162': - resolution: {integrity: sha512-w52eO82H1qmsNNoEwOovh/VIfKip/0chB978fStplzBwlQzHxMZGwLjB1Q2fBcoPR25V+ToI04ABy7zJIMC4Pw==} + '@plasmicapp/data-sources@0.1.165': + resolution: {integrity: sha512-q4vlJyAGVDmrp3AhAh0odAtVgEZxV8KyEFseZteR8GEADlGBJdUNDrDuK1RHIBe+TD5ecvG9Kcj56/qpv6MKGQ==} engines: {node: '>=10'} peerDependencies: react: '>=16.8.0' @@ -3267,8 +3267,8 @@ packages: react: '>=16.8.0' react-dom: '>=16.8.0' - '@plasmicapp/host@1.0.203': - resolution: {integrity: sha512-MPRypNz8xihrtVmbCkBj5qePdhSNBNERwJcX29v5aA4pcmC1jV+tNAQPmlwJv5OprHvh5nhVRoy12LVqIE5isg==} + '@plasmicapp/host@1.0.206': + resolution: {integrity: sha512-XPsbk+MPW+QIUZPZLz9L/1D3MlyggG7CNSFP6lXXMWgJDbEiklqNlIujVPp4JxnqdfOm1Sia7miOl8CwfVq6Qg==} peerDependencies: react: '>=16.8.0' react-dom: '>=16.8.0' @@ -3315,6 +3315,14 @@ packages: react: '>=16.8.0' react-dom: '>=16.8.0' + '@plasmicapp/nextjs-app-router@1.0.12': + resolution: {integrity: sha512-D3h90ie5eTCiaSEHoUumu/dJEgw+O7E+P1mZBqwj2QIETIWFhqa7lQAh4c4IlJEyqpLJd3Yvzsdsp2IamW/lmQ==} + engines: {node: '>=16'} + hasBin: true + peerDependencies: + react: '>=16.8.0' + react-dom: '>=16.8.0' + '@plasmicapp/prepass@1.0.17': resolution: {integrity: sha512-xmQdVSa28EHfdVkKfnStbZjn/g8ZWzXBQWqGte6psA2lGOlSCY0kK4UjM2/9xFnhhnIMxCk9am4KtSdruxc0zg==} engines: {node: '>=12'} @@ -3333,8 +3341,8 @@ packages: peerDependencies: react: ^16.8.0 || ^17.0.0 || ^18.0.0 - '@plasmicapp/react-web@0.2.346': - resolution: {integrity: sha512-LbIUg6HFUlvH67O+y1/WUysHLDhPNWae8PtvCcGZapYe1P7ubeOZVQ5Q1ruJD9D/zLBBfX8u48CUABuU1Xufow==} + '@plasmicapp/react-web@0.2.350': + resolution: {integrity: sha512-5Du484Cy+sv8RwYE7neBlKioFPJL9FVi9qvdx8li9ZLNUPEtdCeyDyT6ClMBaXdjJKMuR1bVNBpafKsEAU4YUw==} peerDependencies: react: '>=16.8.0' react-dom: '>=16.8.0' @@ -17393,10 
+17401,10 @@ snapshots: dependencies: react: 18.3.1 - '@plasmicapp/data-sources@0.1.162(react-dom@18.3.1(react@18.3.1))(react@18.3.1)': + '@plasmicapp/data-sources@0.1.165(react-dom@18.3.1(react@18.3.1))(react@18.3.1)': dependencies: '@plasmicapp/data-sources-context': 0.1.21(react@18.3.1) - '@plasmicapp/host': 1.0.203(react-dom@18.3.1(react@18.3.1))(react@18.3.1) + '@plasmicapp/host': 1.0.206(react-dom@18.3.1(react@18.3.1))(react@18.3.1) '@plasmicapp/isomorphic-unfetch': 1.0.3 '@plasmicapp/query': 0.1.79(react@18.3.1) fast-stringify: 2.0.0 @@ -17412,7 +17420,7 @@ snapshots: react-dom: 18.3.1(react@18.3.1) window-or-global: 1.0.1 - '@plasmicapp/host@1.0.203(react-dom@18.3.1(react@18.3.1))(react@18.3.1)': + '@plasmicapp/host@1.0.206(react-dom@18.3.1(react@18.3.1))(react@18.3.1)': dependencies: '@plasmicapp/query': 0.1.79(react@18.3.1) csstype: 3.1.3 @@ -17482,6 +17490,18 @@ snapshots: react-dom: 18.3.1(react@18.3.1) yargs: 17.7.2 + '@plasmicapp/nextjs-app-router@1.0.12(react-dom@18.3.1(react@18.3.1))(react@18.3.1)': + dependencies: + '@plasmicapp/prepass': 1.0.17(react-dom@18.3.1(react@18.3.1))(react@18.3.1) + '@plasmicapp/query': 0.1.79(react@18.3.1) + cross-spawn: 7.0.3 + fkill: 8.1.1 + get-port: 7.1.0 + node-html-parser: 6.1.13 + react: 18.3.1 + react-dom: 18.3.1(react@18.3.1) + yargs: 17.7.2 + '@plasmicapp/prepass@1.0.17(react-dom@18.3.1(react@18.3.1))(react@18.3.1)': dependencies: '@plasmicapp/query': 0.1.79(react@18.3.1) @@ -17498,14 +17518,14 @@ snapshots: dependencies: react: 18.3.1 - '@plasmicapp/react-web@0.2.346(@types/react@18.3.3)(react-dom@18.3.1(react@18.3.1))(react@18.3.1)': + '@plasmicapp/react-web@0.2.350(@types/react@18.3.3)(react-dom@18.3.1(react@18.3.1))(react@18.3.1)': dependencies: '@plasmicapp/auth-react': 0.0.21(react@18.3.1) - '@plasmicapp/data-sources': 0.1.162(react-dom@18.3.1(react@18.3.1))(react@18.3.1) + '@plasmicapp/data-sources': 0.1.165(react-dom@18.3.1(react@18.3.1))(react@18.3.1) '@plasmicapp/data-sources-context': 
0.1.21(react@18.3.1) - '@plasmicapp/host': 1.0.203(react-dom@18.3.1(react@18.3.1))(react@18.3.1) + '@plasmicapp/host': 1.0.206(react-dom@18.3.1(react@18.3.1))(react@18.3.1) '@plasmicapp/loader-splits': 1.0.62 - '@plasmicapp/nextjs-app-router': 1.0.11(react-dom@18.3.1(react@18.3.1))(react@18.3.1) + '@plasmicapp/nextjs-app-router': 1.0.12(react-dom@18.3.1(react@18.3.1))(react@18.3.1) '@plasmicapp/prepass': 1.0.17(react-dom@18.3.1(react@18.3.1))(react@18.3.1) '@plasmicapp/query': 0.1.79(react@18.3.1) '@react-aria/checkbox': 3.14.3(react@18.3.1)