ASSOCIATION MINING

Description :-

Reference: https://en.wikipedia.org/wiki/Association_rule_learning

Association mining is a form of UNSUPERVISED LEARNING. It is used to find patterns or rules relating two or more items in a dataset. In this sample, the following associations are calculated:-

Association from Diagnosis to Service,
Association from Diagnosis and Provider to Service.

The screenshots towards the bottom of this page show that, even without a medical background, one can get a good understanding of which diagnoses and services are associated.

An association may be POSITIVE, i.e. the presence of one item makes the presence of another item more likely, or NEGATIVE, i.e. the presence of one item makes the presence of another item less likely. Whether an association is positive or negative is determined by a ratio of two components :-

LIFT = ACTUAL / EXPECTED, which tells us "How much more than expected is our association?"
When LIFT = 1, there is neither positive nor negative association, i.e. the items compared are independent.
When LIFT > 1, there is a positive association, i.e. the items occur together more often than expected if they were independent.
When LIFT < 1, there is a negative association, i.e. the items occur together less often than expected if they were independent.

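The ACTUAL and EXPECTED metrics are calculated using concepts called SUPPORT and CONFIDENCE: ACTUAL is the confidence of the rule, and EXPECTED is the expected confidence, which equals the support of the RHS item.

SUPPORT represents the relative frequency of an item in the dataset, i.e. the fraction of transactions that contain it.
CONFIDENCE represents a conditional probability, i.e. the probability of finding the RHS item given that the LHS item is present.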
Support A = (No. of transactions containing A) / (Total No. of transactions)
Support A to B = (No. of transactions containing A and B) / (Total No. of transactions)
Confidence A to B = (No. of transactions containing A and B) / (No. of transactions containing A)
Expected Confidence A to B = (No. of transactions containing B) / (Total No. of transactions)
Lift A to B = (Confidence A to B) / (Expected Confidence A to B)
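The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the code in AssociationMining.py: the function name association_metrics and the toy transactions are assumptions made for the example.

```python
def association_metrics(transactions, lhs, rhs):
    """Return support, confidence, expected confidence and lift for lhs -> rhs.

    `transactions` is a list of sets; each set holds the items on one transaction.
    """
    total = len(transactions)
    n_lhs = sum(1 for items in transactions if lhs in items)
    n_rhs = sum(1 for items in transactions if rhs in items)
    n_both = sum(1 for items in transactions if lhs in items and rhs in items)
    if total == 0 or n_lhs == 0 or n_rhs == 0:
        raise ValueError("lhs and rhs must each occur in at least one transaction")

    support = n_both / total              # Support A to B
    confidence = n_both / n_lhs           # Confidence A to B
    expected_confidence = n_rhs / total   # Expected Confidence A to B
    lift = confidence / expected_confidence
    return {
        "support": support,
        "confidence": confidence,
        "expected_confidence": expected_confidence,
        "lift": lift,
    }


# Hypothetical toy data: five transactions, A and B appear together on two of them.
transactions = [{"A", "B"}, {"A", "B"}, {"A"}, {"B"}, {"C"}]
metrics = association_metrics(transactions, "A", "B")
print(metrics)
```

In this toy data A appears on 3 of 5 transactions, B on 3 of 5, and both on 2, so the confidence is 2/3 against an expected confidence of 3/5. The lift is therefore above 1, i.e. a positive association.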

Python programs :-

(1) Clean raw data:-

Python program to clean raw csv files: ~/association_mining/step01_clean_raw_data/CleanRawData.py
INPUT: Raw input csv files at ~/association_mining/step01_clean_raw_data/raw_csv_files/*.csv
OUTPUT: Clean csv files at ~/association_mining/step02_association_mining/clean_csv_files/

(2) Association Mining:-

Python program to find associations: ~/association_mining/step02_association_mining/AssociationMining.py
INPUT: Clean csv files at ~/association_mining/step02_association_mining/clean_csv_files/*.csv
OUTPUT: ~/association_mining/step02_association_mining/clean_csv_files/tran_df.csv

Data explained :- (Input data intentionally not provided)

The data is a sample of claims data. Its columns are explained below:-

RAW CSV FILES

(1) raw_csv_files/tran.csv

tid: Transaction ID. This is equivalent to a claim id. A claim is submitted by a provider for receiving payment. This tid is the metric counted for finding associations.
servprov: Servicing Provider ID. This is just an ID column.
diagcode: Diagnosis ID. This is just an ID column, not the actual diagnosis code.
servcode: Service Code. This is just an ID column, not the actual service code. The claim tells us which provider rendered what service against which diagnoses.
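As a rough sketch of how these columns could feed the two associations described earlier (Diagnosis to Service, and Diagnosis and Provider to Service), the example below counts tids per key and derives lift. The sample rows and the helper names tids_by_key and lift are hypothetical; the actual program may be organised differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the tran.csv layout (the real input data is not provided).
SAMPLE = """tid,servprov,diagcode,servcode
1,10,100,500
2,10,100,500
3,11,100,501
4,11,101,501
"""


def tids_by_key(rows, key_columns):
    """Map each combination of key_columns to the set of tids it appears on."""
    index = defaultdict(set)
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        index[key].add(row["tid"])
    return index


def lift(lhs_tids, rhs_tids, total):
    """Lift of LHS -> RHS, where each side is given as a set of tids."""
    confidence = len(lhs_tids & rhs_tids) / len(lhs_tids)
    expected_confidence = len(rhs_tids) / total
    return confidence / expected_confidence


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
total = len({row["tid"] for row in rows})

by_diag = tids_by_key(rows, ["diagcode"])
by_serv = tids_by_key(rows, ["servcode"])
by_diag_prov = tids_by_key(rows, ["diagcode", "servprov"])

# Diagnosis 100 -> Service 500
diag_to_serv = lift(by_diag[("100",)], by_serv[("500",)], total)
# Diagnosis 100 and Provider 10 -> Service 500
diag_prov_to_serv = lift(by_diag_prov[("100", "10")], by_serv[("500",)], total)
print(diag_to_serv, diag_prov_to_serv)
```

Keeping sets of tids per key means a claim is counted once per item even if it has several rows, which matches the idea of tid being the unit that is counted.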

(2) raw_csv_files/diag.csv

dimDiagnosisID: same as diagcode in the transactions file. This is just an ID column.
DiagnosisCode: Diagnosis code present on the claim.
DiagnosisShortDesc: Short description of the diagnosis.
DiagnosisLongDesc: Long description of the diagnosis.

(3) raw_csv_files/prov.csv

dimProviderID: same as servprov in the transactions file. This is just an ID column.
ProviderName: Provider's name

(4) raw_csv_files/serv.csv

dimServiceCodeID: same as servcode in the transactions file. This is just an ID column.
ServiceCode: Service code present on the claim.
ServiceCodeShortDesc: Short description of the service rendered.
ServiceCodeLongDesc: Long description of the service rendered.

CLEAN CSV FILES

(1) clean_csv_files/clean_tran.csv. This is generated by cleaning the raw csv file.

tid: Transaction ID. This is equivalent to a claim id. A claim is submitted by a provider for receiving payment. This tid is the metric counted for finding associations.
servprov: Servicing Provider ID. This is just an ID column.
diagcode: Diagnosis ID. This is just an ID column, not the actual diagnosis code.
servcode: Service Code. This is just an ID column, not the actual service code. The claim tells us which provider rendered what service against which diagnoses.

(2) clean_csv_files/clean_diag.csv. This is generated by cleaning the raw csv file.

dimDiagnosisID: same as diagcode in the transactions file. This is just an ID column.
DiagnosisCode: Diagnosis code present on the claim.
DiagnosisShortDesc: Short description of the diagnosis.
DiagnosisLongDesc: Long description of the diagnosis.

(3) clean_csv_files/clean_prov.csv. This is generated by cleaning the raw csv file.

dimProviderID: same as servprov in the transactions file. This is just an ID column.
ProvName: Provider's name randomly scrambled.
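The source does not say how provider names are scrambled. One possible approach, shown purely as an assumption, is to shuffle the letters within each word of the name; scramble_name and the example name below are hypothetical.

```python
import random


def scramble_name(name, rng):
    """Shuffle the letters of each word in `name`, keeping word count and lengths."""
    scrambled_words = []
    for word in name.split():
        letters = list(word)
        rng.shuffle(letters)  # in-place shuffle using the supplied generator
        scrambled_words.append("".join(letters))
    return " ".join(scrambled_words)


# A seeded generator makes the scrambling repeatable between runs (an assumption,
# not something the source specifies).
rng = random.Random(0)
original = "Example Clinic"
scrambled = scramble_name(original, rng)
print(scrambled)
```

Letter shuffling only obscures names and is not a strong anonymisation technique; real claims data would usually need a stricter de-identification step.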

(4) clean_csv_files/clean_serv.csv. This is generated by cleaning the raw csv file.

dimServiceCodeID: same as servcode in the transactions file. This is just an ID column.
ServiceCode: Service code present on the claim.
ServiceCodeShortDesc: Short description of the service rendered.
ServiceCodeLongDesc: Long description of the service rendered.

OUTPUT CSV FILE: clean_csv_files/tran_df.csv. This file is the final output with all association mining metrics calculated.

A few Power BI report screenshots of the final output :-

(1) Services associated with diagnosis TYPE 2 DIABETES MELLITUS PDR MACULAR EDEMA BILATERAL

(2) Diagnoses associated with service Treatment of extensive or progressive retinopathy (eg, diabetic retinopathy), photocoagulation

(3) Services associated with diagnosis Primary osteoarthritis, right hand

(4) Diagnoses associated with service APPLICATION CAST ELBOW FINGER SHORT ARM

(5) Services associated with diagnosis Osteonecrosis in diseases classified elsewhere, left thigh

(6) Services associated with diagnosis Eyelid retraction left upper eyelid

These associations have also been deployed as an API Web Application. See https://github.com/nsb700/association-mining-webapp

nsb700/association_mining
