upgma

Demonstration of the UPGMA hierarchical clustering algorithm using Pandas, Seaborn, and SciPy.

Introduction

The Unweighted Pair Group Method with Arithmetic Mean (UPGMA) is a bottom-up (agglomerative) hierarchical clustering algorithm commonly applied to genetic distance matrices. At each step it merges the two closest clusters, and the resulting sequence of merges is typically drawn as a dendrogram. The code in this repository uses Pandas and Seaborn for vectorized operations and data visualization.
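
The merging procedure can be sketched in plain Python. This is a minimal illustration, not the code in upgma.py: the function name `upgma`, the frozenset-keyed distance dictionary, and the four-taxon distances are all assumptions made for the example. It reports half the merge distance as the node height, which is a common convention for UPGMA dendrograms.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster `labels` by UPGMA (hypothetical helper, for illustration).

    `dist` maps frozenset({a, b}) to the distance between labels a and b.
    Returns {nested_tuple_cluster: node_height} in merge order.
    """
    sizes = {label: 1 for label in labels}  # active cluster -> leaf count
    d = dict(dist)                          # working copy of the distances
    heights = {}
    while len(sizes) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        heights[merged] = d[frozenset((a, b))] / 2
        # Distance from the new cluster to each remaining cluster is the
        # mean of the old distances, weighted by cluster size.
        for c in sizes:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return heights


# Made-up distances for four taxa (illustrative values only).
labels = ["A", "B", "C", "D"]
pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
dist = {frozenset(k): v for k, v in pairs.items()}
heights = upgma(labels, dist)
```

The size weighting in the update step is what makes every original leaf count equally in the average, which is the "unweighted" part of the name.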

In this repository, UPGMA is deterministic: every run produces the same results. In addition, as long as the distances themselves are unchanged, the data may be arranged in any order and the results remain the same.
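
One way to check the order-independence claim is to shuffle the rows and columns of a distance matrix together and compare the merge distances. The sketch below uses SciPy's `linkage` with `method="average"` (SciPy's name for UPGMA) rather than the code in upgma.py, and the matrix values and the `merge_heights` helper are made up for illustration. It assumes the minimum distance is never tied; with ties, the merge order can depend on the input order.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical symmetric distance matrix for four taxa (illustrative values).
D = np.array([
    [0.0, 2.0, 7.0, 9.0],
    [2.0, 0.0, 5.0, 11.0],
    [7.0, 5.0, 0.0, 8.0],
    [9.0, 11.0, 8.0, 0.0],
])


def merge_heights(matrix):
    # squareform converts the square matrix to the condensed form that
    # linkage expects; column 2 of the result holds the merge distances.
    return linkage(squareform(matrix), method="average")[:, 2]


rng = np.random.default_rng(0)
perm = rng.permutation(len(D))
shuffled = D[np.ix_(perm, perm)]  # reorder rows and columns together

same = np.allclose(merge_heights(D), merge_heights(shuffled))
print(same)
```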

Start

Finish

Results

{('Man', 'Monkey'): 0.5,
 ('Turtle', 'Chicken'): 4.0,
 (('Man', 'Monkey'), 'Dog'): 6.25,
 (('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')): 7.875,
 ((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna'): 14.1875,
 (((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna'), 'Moth'): 18.21875}
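
Each key in the results is a nested tuple that records the order of the merges. As a reading aid, the sketch below flattens each key into its leaf names. The dictionary is copied from the results above; the `leaves` helper is hypothetical and is not part of upgma.py.

```python
results = {
    ('Man', 'Monkey'): 0.5,
    ('Turtle', 'Chicken'): 4.0,
    (('Man', 'Monkey'), 'Dog'): 6.25,
    (('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')): 7.875,
    ((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna'): 14.1875,
    (((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna'), 'Moth'): 18.21875,
}


def leaves(cluster):
    """Flatten a nested-tuple cluster key into leaf names, left to right."""
    if isinstance(cluster, str):
        return [cluster]
    return [leaf for child in cluster for leaf in leaves(child)]


# Print each recorded value next to the taxa contained in that cluster.
for cluster, value in results.items():
    print(f"{value:<9} {', '.join(leaves(cluster))}")
```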

Dendrogram

Dependencies

python3-numpy
python3-pandas
python3-scipy
python3-seaborn

Running the Code

Execute the upgma.py file in an IPython environment.

Tables may be viewed by running commands such as:

upgma.upgma_records[('Man', 'Monkey')]
upgma.upgma_records[(((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna'), 'Moth')]

The phylogenetic distances may be viewed by running:

upgma.phylogeny

Notes

The Pandas styler contains a bug that affects one of the intermediate steps of this program. When the index is [((('Turtle', 'Chicken'), (('Man', 'Monkey'), 'Dog')), 'Tuna')], the original dataframe cannot be styled properly.

The styler raises the following error:

ValueError: Buffer has wrong number of dimensions (expected 1, got 3)

See the filed issue: pandas-dev/pandas#24687

FIX: The tuples are converted to strings to avoid this unpredictable behavior. The error may nonetheless point to a deeper problem in the pandas Cython code base.
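
A minimal sketch of this kind of workaround is shown below. It is not taken from upgma.py: the cluster list, the placeholder values, and the `restore` helper are assumptions for illustration. The labels are converted with `str` before the dataframe is built, so pandas only ever sees flat string labels.

```python
import ast

import pandas as pd

# Example cluster labels: nested tuples mixed with plain strings.
clusters = [(('Man', 'Monkey'), 'Dog'), ('Turtle', 'Chicken'), 'Tuna', 'Moth']

# Stringify every label up front; plain strings pass through unchanged.
names = [str(c) for c in clusters]

# Placeholder values only; a real table would hold the distances.
df = pd.DataFrame(0.0, index=names, columns=names)


def restore(name):
    """Turn a stringified tuple back into a tuple; leave leaf names alone."""
    return ast.literal_eval(name) if name.startswith("(") else name


print(df.index.tolist())
print(restore(names[0]))
```

Keeping a way back to the original tuple, as `restore` does here, means the nesting information is not lost when the labels are flattened.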

