Healthcare Analytics

This synthetic healthcare dataset has been created to serve as a valuable resource for data science, machine learning, and data analysis enthusiasts. It is designed to mimic real-world healthcare data, enabling users to practice, develop, and showcase their data manipulation and analysis skills in the context of the healthcare industry.

Load Dataset

# Import library
import pyspark
import pandas as pd
import matplotlib.pyplot as plt
from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder.appName("Healthcare").getOrCreate()

# Load dataset
df = spark.read.load("/content/healthcare_dataset.csv",format="csv", sep=",", inferSchema="true", header="true")
df.show(10)

Name	Age	Gender	Blood Type	Medical Condition	Date of Admission	Doctor	Hospital	Insurance Provider	Billing Amount	Room Number	Admission Type	Discharge Date	Medication	Test Results
Bobby Jackson	30	Male	B-	Cancer	2024-01-31	Matthew Smith	Sons and Miller	Blue Cross	18856.28	328	Urgent	2024-02-02	Paracetamol	Normal
Leslie Terry	62	Male	A+	Obesity	2019-08-20	Samantha Davies	Kim Inc	Medicare	33643.33	265	Emergency	2019-08-26	Ibuprofen	Inconclusive
Danny Smith	76	Female	A-	Obesity	2022-09-22	Tiffany Mitchell	Cook PLC	Aetna	27955.10	205	Emergency	2022-10-07	Aspirin	Normal
Andrew Watts	28	Female	O+	Diabetes	2020-11-18	Kevin Wells	Hernandez Rogers	Medicare	37909.78	450	Elective	2020-12-18	Ibuprofen	Abnormal
Adrienne Bell	43	Female	AB+	Cancer	2022-09-19	Kathleen Hanna	White-White	Aetna	14238.32	458	Urgent	2022-10-09	Penicillin	Abnormal
Emily Johnson	36	Male	A+	Asthma	2023-12-20	Taylor Newton	Nunez-Humphrey	UnitedHealthcare	48145.11	389	Urgent	2023-12-24	Ibuprofen	Normal
Edward Edwards	21	Female	AB-	Diabetes	2020-11-03	Kelly Olson	Group Middleton	Medicare	19580.87	389	Emergency	2020-11-15	Paracetamol	Inconclusive
Christina Martinez	20	Female	A+	Cancer	2021-12-28	Suzanne Thomas	Powell Robinson	Cigna	45820.46	277	Emergency	2022-01-07	Paracetamol	Inconclusive
Jasmine Aguilar	82	Male	AB+	Asthma	2020-07-01	Daniel Ferguson	Sons Rich and	Cigna	50119.22	316	Elective	2020-07-14	Aspirin	Abnormal
Christopher Berg	58	Female	AB-	Cancer	2021-05-23	Heather Day	Padilla-Walker	UnitedHealthcare	19784.63	249	Elective	2021-06-22	Paracetamol	Inconclusive

How many avg. billing by discharge year & test results

# How many avg. billing by discharge year & test results
billing_discharge_year_test = spark.sql("""
  SELECT
    YEAR(`Discharge Date`) AS `Discharge Year`,
    `Test Results`,
    ROUND(AVG(`Billing Amount`),2) AS `Avg. Billing`
  FROM Healthcare
  GROUP BY YEAR(`Discharge Date`), `Test Results`
  ORDER BY YEAR(`Discharge Date`);
""")
# Display the result
billing_discharge_year_test.show()

# Convert Spark DataFrame to Pandas DataFrame
billing_discharge_year_test_pd = billing_discharge_year_test.toPandas()
# Visualize Total Sales by Order Year & Product Category
billing_discharge_year_test_pd.groupby(['Discharge Year', 'Test Results'])['Avg. Billing'].mean().unstack().plot(marker = '.', cmap = 'viridis',
                                                                                                        figsize = (11,4))
plt.title('Avg. Billing by Discharge Year & Test Results', loc = 'center')
plt.xlabel('Discharge Year')
plt.ylabel('Avg. Billing')
plt.grid(color = 'darkgray', linestyle = ':', linewidth = 0.5)
labels, locations = plt.yticks()
plt.yticks(labels, (labels/10).astype(int))
plt.show()

Discharge Year	Test Results	Avg. Billing
2019	Abnormal	25982.14
2019	Normal	25900.65
2019	Inconclusive	25389.02
2020	Normal	25527.27
2020	Abnormal	24999.46
2020	Inconclusive	25472.13
2021	Inconclusive	25833.06
2021	Abnormal	25789.48
2021	Normal	25449.27
2022	Inconclusive	25713.45
2022	Abnormal	25667.41
2022	Normal	25202.89
2023	Inconclusive	25736.98
2023	Abnormal	25580.31
2023	Normal	25312.44
2024	Abnormal	25149.97
2024	Normal	25599.08
2024	Inconclusive	25350.01

How many avg. billing by test result

# How many avg. billing by test result
billing_test = spark.sql("""
  SELECT
    `Test Results`,
    ROUND(AVG(`Billing Amount`),2) AS `Avg. Billing`
  FROM Healthcare
  GROUP BY `Test Results`
  ORDER BY `Avg. Billing` DESC;
""")
# Display the result
billing_test.show()

# Convert Spark DataFrame to Pandas DataFrame
billing_test_pd = billing_test.toPandas()

# Plotting the pie chart
plt.figure(figsize=(5, 4))
plt.pie(
    billing_test_pd['Avg. Billing'],
    labels=billing_test_pd['Test Results'],
    autopct='%1.1f%%',
    colors=['lightgreen', 'lightblue', 'coral'],  # Add more colors if necessary
    startangle=140
)
plt.title('Avg. Billing by Test Results')
plt.show()

Test Results	Avg. Billing
Inconclusive	25623.69
Abnormal	25538.35
Normal	25456.65

Inspiration

The inspiration behind this dataset is rooted in the need for practical and diverse healthcare data for educational and research purposes. Healthcare data is often sensitive and subject to privacy regulations, making it challenging to access for learning and experimentation. To address this gap, I have leveraged Python's Faker library to generate a dataset that mirrors the structure and attributes commonly found in healthcare records. By providing this synthetic data, I hope to foster innovation, learning, and knowledge sharing in the healthcare analytics domain.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
Healthcare_Analytics.ipynb		Healthcare_Analytics.ipynb
README.md		README.md
discharge year & test results.png		discharge year & test results.png
healthcare_cover.jpg		healthcare_cover.jpg
healthcare_dataset.csv		healthcare_dataset.csv
pie chart.png		pie chart.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Healthcare Analytics

Load Dataset

How many avg. billing by discharge year & test results

How many avg. billing by test result

Inspiration

About

Releases

Packages

Languages

Bayunova28/Healthcare_Analytics

Folders and files

Latest commit

History

Repository files navigation

Healthcare Analytics

Load Dataset

How many avg. billing by discharge year & test results

How many avg. billing by test result

Inspiration

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages