Assignments from the CSE545 course.
Dataset: http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm
In this assignment, we parsed the blog corpus and ran a Spark job to:
- Find all the industries.
- Get the number of times an industry was mentioned across all the blogs.
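
The two steps above can be sketched in plain Python. This is only an illustration, not the submitted Spark job: it assumes the corpus filenames follow the layout `<id>.<gender>.<age>.<industry>.<sign>.xml`, and the helper names and sample data are hypothetical. In Spark, the same two functions could be applied with `map`/`flatMap` and the counts merged with `reduceByKey`.

```python
import re
from collections import Counter


def industry_from_filename(name):
    """Return the industry field, assuming <id>.<gender>.<age>.<industry>.<sign>.xml."""
    parts = name.split(".")
    return parts[3].lower() if len(parts) >= 6 else None


def count_mentions(text, industries):
    """Count case-insensitive, whole-word mentions of each known industry."""
    words = re.findall(r"[a-z-]+", text.lower())
    return Counter(w for w in words if w in industries)


# Hypothetical sample standing in for (filename, blog text) pairs.
files = [
    ("1.male.25.Technology.Aries.xml", "I work in technology. Banking is dull."),
    ("2.female.33.Banking.Leo.xml", "Technology and banking, banking all day."),
]

# Step 1: collect every industry that appears in a filename.
industries = {industry_from_filename(name) for name, _ in files} - {None}

# Step 2: sum the per-blog mention counts.
totals = Counter()
for _, text in files:
    totals.update(count_mentions(text, industries))
```

Matching whole words keeps the sketch simple; multi-word industry names would need a phrase search instead.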
Dataset: https://lta.cr.usgs.gov/high_res_ortho
The goal of the assignment was to find similar regions in Long Island using satellite images, running the code on AWS.
Objectives:
- Implement Locality Sensitive Hashing to find similar regions.
- Implement dimensionality reduction to reduce the size of the images.
- Preprocess the data: split the images, reduce their resolution, flatten them, calculate the intensity, and clip the image values.
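
A rough NumPy sketch of the preprocessing and hashing objectives follows. It is not the submitted code: the tile size, downsampling factor, clip range, the choice of intensity as the mean over channels, and the random-hyperplane flavour of LSH are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np


def split_tiles(img, tile):
    """Split an (H, W, C) image into non-overlapping (tile, tile, C) blocks."""
    rows, cols = img.shape[0] // tile, img.shape[1] // tile
    return [img[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            for r in range(rows) for c in range(cols)]


def preprocess(tile_img, factor, lo=0.0, hi=255.0):
    """Intensity, block-mean downsampling, clipping, then flattening."""
    intensity = tile_img.astype(float).mean(axis=2)  # assumed: mean over channels
    n, m = intensity.shape[0] // factor, intensity.shape[1] // factor
    blocks = intensity[:n * factor, :m * factor].reshape(n, factor, m, factor)
    small = blocks.mean(axis=(1, 3))                 # lower resolution by `factor`
    return np.clip(small, lo, hi).flatten()          # assumed clip range [lo, hi]


def lsh_buckets(vectors, n_bits=16, seed=0):
    """Random-hyperplane LSH: rows with the same sign pattern share a bucket."""
    rng = np.random.default_rng(seed)
    centered = vectors - vectors.mean(axis=0)
    planes = rng.standard_normal((vectors.shape[1], n_bits))
    bits = (centered @ planes) > 0
    buckets = {}
    for idx, row in enumerate(bits):
        buckets.setdefault(row.tobytes(), []).append(idx)
    return buckets


# Hypothetical 8x8 RGB image whose last tile is a copy of the first.
rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(8, 8, 3))
image[4:8, 4:8] = image[0:4, 0:4]

tiles = split_tiles(image, tile=4)
features = np.array([preprocess(t, factor=2) for t in tiles])
buckets = lsh_buckets(features)
```

Tiles that fall into the same bucket are candidate matches; a real pipeline would typically compare those candidates with an exact distance afterwards.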
For the dimensionality reduction step, I used incremental SVD as proposed in:
Sarwar, Badrul, et al. "Incremental singular value decomposition algorithms for highly scalable recommender systems." Fifth International Conference on Computer and Information Science. Citeseer, 2002.
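
As a hedged illustration of the idea, the sketch below fits an SVD basis on a base batch of rows and then projects ("folds in") later rows onto that fixed basis without recomputing the decomposition. This is my simplified reading of the approach, not the code from the assignment or the paper; the function names, the mean-centering step, and the synthetic data are assumptions.

```python
import numpy as np


def fit_basis(base, k):
    """Fit on the base batch: return its mean and the top-k right singular vectors."""
    mean = base.mean(axis=0)
    _, _, vt = np.linalg.svd(base - mean, full_matrices=False)
    return mean, vt[:k].T  # basis has shape (n_features, k)


def fold_in(rows, mean, basis):
    """Project new rows onto the existing basis; no new SVD is computed."""
    return (rows - mean) @ basis


# Synthetic rank-2 data in 6 dimensions (hypothetical stand-in for image features).
rng = np.random.default_rng(0)
mix = rng.standard_normal((2, 6))
base_rows = rng.standard_normal((20, 2)) @ mix
new_rows = rng.standard_normal((5, 2)) @ mix

mean, basis = fit_basis(base_rows, k=2)
reduced_new = fold_in(new_rows, mean, basis)   # shape (5, 2)
restored = reduced_new @ basis.T + mean        # back to 6 dimensions
```

Because the synthetic rows lie in a two-dimensional subspace, two components are enough to restore the new rows here; real image features would only be approximated.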