
Finding similar documents using LSH with MapReduce on multi-node Spark Cluster


Yasar2019/BigData-HW03


BigData-HW03

Class: Big Data Mining and Applications (321524) at NTUT

By: Yasar Nazzarian (1120120349, IEECS) and 郭柏辰 PoChenKuo (112599003, first-year PhD student, Computer Science)


Cluster Info

[Image: cluster information]

1. Project Setup and Information

1.1. Environment

  • Python 3.11.4
  • Java: OpenJDK 17.0.1 (2021-10-19), Temurin-17.0.1+12, 64-Bit Server VM (mixed mode, sharing)
  • Spark 3.5.0
  • Scala 3.3.1

1.2. Input

  • Input file: Data
  • Input file size: 27.0 MB
  • Input file format: sgm

1.3. Output

You can find all the output files in the folder named "output"

  • Output files: output
  • Output files format: csv

1.4. Code

2. How to run the code

spark-submit --master spark://96.9.210.170:7077 HW03.py k
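
The following is a minimal, single-machine sketch of the general technique named in the project description (shingling, MinHash signatures, and LSH banding); it is not the contents of HW03.py. It assumes that the command-line argument k is the shingle length in words, which the README does not state. All function names, the number of hash functions, and the band/row split are hypothetical choices made for illustration. On a Spark cluster, the bucketing loop would roughly correspond to a map step that emits ((band index, band values), document id) pairs and a reduce step that groups documents sharing a key.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k):
    """Return the set of k-word shingles of a document (assumes k = shingle length)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def stable_hash(value, seed):
    """Deterministic 64-bit hash of a string, salted by an integer seed."""
    digest = hashlib.blake2b(
        value.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(shingle_set, num_hashes):
    """One minimum per hash function; similar sets tend to share minima."""
    return tuple(
        min(stable_hash(s, seed) for s in shingle_set) for seed in range(num_hashes)
    )


def candidate_pairs(signatures, bands, rows):
    """Bucket documents by each band of their signature (signature length = bands * rows)."""
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():        # "map": emit one key per band
        for b in range(bands):
            band = sig[b * rows:(b + 1) * rows]
            buckets[(b, band)].add(doc_id)
    pairs = set()
    for ids in buckets.values():                  # "reduce": pair up bucket members
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity, used to verify candidate pairs."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    k = 2  # hypothetical value; the real script reads k from the command line
    docs = {
        "d1": "the quick brown fox jumps over the lazy dog",
        "d2": "the quick brown fox jumps over the lazy dog",
        "d3": "spark clusters distribute work across many worker nodes",
    }
    sets = {d: shingles(t, k) for d, t in docs.items()}
    sigs = {d: minhash_signature(s, 8) for d, s in sets.items()}
    for a, b in sorted(candidate_pairs(sigs, bands=2, rows=4)):
        print(a, b, jaccard(sets[a], sets[b]))
```

Candidate pairs are only a shortlist: two documents land in the same bucket when at least one band of their signatures matches exactly, so the exact Jaccard similarity is computed afterwards to filter out false positives. More bands with fewer rows each make the shortlist more permissive; fewer bands with more rows make it stricter.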
