This repository has been archived by the owner on Feb 7, 2024. It is now read-only.

Commit

updated README
RowanTrickett committed Aug 14, 2023
1 parent b9b5fa0 commit 023b315
Showing 1 changed file with 147 additions and 14 deletions.
161 changes: 147 additions & 14 deletions nbs/index.ipynb
@@ -11,19 +11,32 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# database_compendium\n",
"#| hide\n",
"from database_compendium.utils import (\n",
" ONS_scraper_functions as osf,\n",
" Nomis_scraper_functions as nsf,\n",
" insolvency_stats_scrapers as iss,\n",
" NHS_QualityOutcomes_scrapers as qos,\n",
" police_data_scrapers as pds,\n",
" column_matching as cm\n",
")\n",
"\n",
"> Collecting, storing, and exploring dataset metadata."
"import requests\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This file will become your README and also the index of your documentation."
"# Database Compendium\n",
"\n",
"> Collecting, storing, and exploring metadata for datasets from the ONS API, the Nomis API, gov.uk, the police data API, and nhs.uk."
]
},
{
@@ -42,6 +55,25 @@
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Requirements"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Python 3.x\n",
"- requests library\n",
"- pandas library\n",
"- BeautifulSoup library\n",
"- re (regular expression) module, part of the standard library\n",
"- math module, part of the standard library"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -53,7 +85,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Fill me in please! Don't forget code examples:"
"These scripts retrieve, for each dataset, its title, a short and a long description, its column titles, the unique non-numeric values in each column, and its release date or date of last update."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ONS Functions\n",
"> The ONS functions are `get_ONS_datasets_titles_descriptions()`, `get_ONS_long_description()`, `get_ONS_datasets_urls()`, `find_ONS_cols()`, and `find_ONS_cols_and_unique_vals()`. For more information, see the documentation for the ONS functions."
]
},
{
@@ -62,18 +102,74 @@
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"Title: Quarterly personal well-being estimates\n",
"\n",
"Description: Seasonally and non seasonally-adjusted quarterly estimates of life satisfaction, feeling that the things done in life are worthwhile, happiness and anxiety in the UK.\n",
"\n",
"Long_description: We are currently reviewing the measures of national well-being and will update this page in summer 2\n",
"\n",
"Columns: ['v4_2', 'LCL', 'UCL', 'yyyy-qq', 'Time', 'uk-only', 'Geography', 'measure-of-wellbeing', 'MeasureOfWellbeing', 'wellbeing-estimate', 'Estimate', 'seasonal-adjustment', 'SeasonalAdjustment']\n",
"\n",
"Unique_parameters: {'v4_2': None, 'LCL': None, 'UCL': None, 'yyyy-qq': ['2017-q2', '2011-q2', '2020-q2', '2016-q4', '2021-q3', '2020-q3', '2014-q3', '2019-q4', '2022-q2', '2021-q1', '2015-q1', '2013-q4', '2022-q4', '2018-q2', '2017-q3', '2020-q1', '2017-q1', '2012-q4', '2021-q4', '2015-q4', '2019-q2', '2022-q1', '2018-q4', '2015-q3', '2014-q2', '2013-q1', '2016-q1', '2022-q3', '2018-q3', '2015-q2', '2019-q1', '2020-q4', '2014-q1', '2014-q4', '2021-q2', '2018-q1', '2013-q3', '2019-q3', '2012-q1', '2011-q3', '2016-q3', '2016-q2', '2012-q3', '2012-q2', '2017-q4', '2013-q2', '2011-q4'], 'Time': ['2017 Q2', '2011 Q2', '2020 Q2', '2016 Q4', '2021 Q3', '2020 Q3', '2014 Q3', '2019 Q4', '2022 Q2', '2021 Q1', '2015 Q1', '2013 Q4', '2022 Q4', '2018 Q2', '2017 Q3', '2020 Q1', '2017 Q1', '2012 Q4', '2021 Q4', '2015 Q4', '2019 Q2', '2022 Q1', '2018 Q4', '2015 Q3', '2014 Q2', '2013 Q1', '2016 Q1', '2022 Q3', '2018 Q3', '2015 Q2', '2019 Q1', '2020 Q4', '2014 Q1', '2014 Q4', '2021 Q2', '2018 Q1', '2013 Q3', '2019 Q3', '2012 Q1', '2011 Q3', '2016 Q3', '2016 Q2', '2012 Q3', '2012 Q2', '2017 Q4', '2013 Q2', '2011 Q4'], 'uk-only': ['K02000001'], 'Geography': ['United Kingdom'], 'measure-of-wellbeing': ['anxiety', 'worthwhile', 'happiness', 'life-satisfaction'], 'MeasureOfWellbeing': ['Anxiety', 'Worthwhile', 'Happiness', 'Life satisfaction'], 'wellbeing-estimate': ['poor', 'very-good', 'good', 'average-mean', 'fair'], 'Estimate': ['Poor', 'Very good', 'Good', 'Average (mean)', 'Fair'], 'seasonal-adjustment': ['non-seasonal-adjustment', 'seasonal-adjustment'], 'SeasonalAdjustment': ['Non-seasonally adjusted', 'Seasonally adjusted']}\n",
"\n",
"Latest_release: 2023-05-12T00:00:00.000Z\n"
]
}
],
"source": [
"1+1"
"import requests\n",
"\n",
"from database_compendium.utils.ONS_scraper_functions import *\n",
"\n",
"# Titles and Descriptions\n",
"titles, descriptions = get_ONS_datasets_titles_descriptions()\n",
"\n",
"# Long Descriptions\n",
"long_description = get_ONS_long_description()\n",
"\n",
"# Dataset URLs which are used to open the dataset and read the column data\n",
"urls = get_ONS_datasets_urls()\n",
"\n",
"# Dataset Columns\n",
"columns = find_ONS_cols(urls[0])\n",
"\n",
"# Dataset Unique Column Values\n",
"unique_column_vals = find_ONS_cols_and_unique_vals(urls[0])\n",
"\n",
"# No helper function returns the release date, but it can be read from the\n",
"# JSON returned by the dataset URL\n",
"response = requests.get(urls[0], timeout=1)\n",
"response.raise_for_status()  # fail early on an HTTP error status\n",
"latest_release = str(response.json()['release_date'])\n",
"\n",
"# Display the output of each function\n",
"print('Title: ', titles[0])\n",
"print('\\nDescription: ', descriptions[0]) \n",
"print('\\nLong_description: ', long_description[0][:100])\n",
"print('\\nColumns: ', columns)\n",
"print('\\nUnique_parameters: ', unique_column_vals)\n",
"print('\\nLatest_release: ', latest_release)"
]
},
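The `release_date` value printed in the sample output above is an ISO 8601 timestamp string. As a minimal sketch, assuming other datasets return the same format as that one sample (the source does not confirm this), it could be converted to a timezone-aware `datetime` for sorting or comparison. The helper name `parse_release_date` is hypothetical and not part of `database_compendium`.

```python
from datetime import datetime, timezone


def parse_release_date(raw: str) -> datetime:
    """Parse a timestamp such as '2023-05-12T00:00:00.000Z' into a UTC datetime.

    Hypothetical helper: the format string is inferred from the single
    sample value shown in the notebook output, not from any API specification.
    """
    parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ")
    # The trailing 'Z' denotes UTC, so attach that timezone explicitly.
    return parsed.replace(tzinfo=timezone.utc)


latest = parse_release_date("2023-05-12T00:00:00.000Z")
print(latest.date().isoformat())  # prints 2023-05-12
```

A value in any other format raises `ValueError`, which makes unexpected API changes visible rather than silently stored.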
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Nomis Functions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Insolvency Functions"
]
},
{
@@ -82,6 +178,43 @@
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Police Data Functions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Notes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These scripts interact with the ONS, Nomis, and Police APIs and with various web pages. Keep request volumes modest to avoid overloading the servers or violating the APIs' terms of use.\n",
"\n",
"Web scraping can break when a website's structure or terms of use change. Keep the scripts up to date and adapt them as needed."
]
},
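One way to follow the advice in the Notes is to pause between requests and tolerate individual failures. The sketch below is an illustration under stated assumptions, not part of `database_compendium`: the names `fetch_json` and `fetch_all_politely`, the one-second default delay, and the ten-second timeout are all hypothetical, and it uses only the standard library (`urllib`) rather than `requests` so that it runs without extra dependencies.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Iterable, List, Optional


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """Fetch a URL and decode its body as JSON (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def fetch_all_politely(
    urls: Iterable[str],
    fetch: Callable[[str], Any] = fetch_json,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Optional[Any]]:
    """Fetch each URL in turn, pausing between consecutive requests.

    A failed request yields None instead of aborting the whole run.
    """
    results: List[Optional[Any]] = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        try:
            results.append(fetch(url))
        except (OSError, ValueError):
            # OSError covers urllib's URLError/HTTPError and timeouts;
            # ValueError covers malformed JSON.
            results.append(None)
    return results
```

Any callable that takes a URL can be passed as `fetch`, and `delay_seconds` should be tuned to whatever limits each API publishes.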
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This documentation is a general guide based on the current code. Adjust it to the specific needs of your project and to any further changes made to the code."
]
}
],
"metadata": {
