WebClub-NITK/100M-Rows-Challenge

🚀 100M-Rows-Challenge

🎯 The Challenge

Process 100 million rows of real-world measurement data as fast as possible on a single machine. Push the boundaries of data processing optimization while maintaining accuracy and showcasing your engineering prowess.

📊 The Dataset

  • The dataset consists of 100 million rows, each pairing a place name with a temperature measurement.

📝 Input format

  • Generate the dataset by running the CreateMeasurements.java file in the repository (make sure you have JDK 17+).
  • To generate 100 million rows, compile the Java file (for example with javac CreateMeasurements.java) and run java CreateMeasurements 100000000.
  • Each row follows the format: Place;Measurement, with the measurement value having exactly one fractional digit.
  • Example rows:
    • New York;23.5
    • Los Angeles;45.1
    • New York;19.7
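
As a rough illustration of the row format above, the sketch below splits one row into a place and a measurement. The helper name `parse_row` is hypothetical, and support for a leading minus sign is an assumption, since the README does not say whether negative temperatures occur. Because each measurement has exactly one fractional digit, the value can be held as an integer number of tenths.

```python
from typing import Tuple


def parse_row(line: str) -> Tuple[str, int]:
    """Split a 'Place;Measurement' row into (place, measurement in tenths)."""
    # rpartition splits on the last ';', so everything before it is the place.
    place, _, value = line.rstrip("\n").rpartition(";")
    # Assumption: a measurement may carry a leading '-' (not stated in the README).
    negative = value.startswith("-")
    whole, _, frac = value.lstrip("-").partition(".")
    # Exactly one fractional digit, so whole * 10 + frac is the value in tenths.
    tenths = int(whole) * 10 + int(frac)
    return place, -tenths if negative else tenths
```

Storing tenths as integers is one possible design choice: sums over many rows stay exact, and the division by 10 happens only once, when the results are printed.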

🎮 The Task

Calculate the following for each place:

  • ❄️ Minimum temperature
  • 🔥 Maximum temperature
  • ⚖️ Mean temperature
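
A minimal, unoptimized sketch of this task keeps a running minimum, maximum, sum, and count per place in a single pass. The function name `aggregate` and the in-memory list of rows are illustrative assumptions, not part of the repository. A competitive solution would stream the file and likely avoid per-row float parsing.

```python
from typing import Dict, Iterable, List


def aggregate(rows: Iterable[str]) -> Dict[str, List[float]]:
    """Return {place: [min, max, mean]} for rows shaped like 'Place;Measurement'."""
    stats: Dict[str, List[float]] = {}  # place -> [min, max, sum, count]
    for row in rows:
        place, _, value = row.rstrip("\n").rpartition(";")
        temp = float(value)
        entry = stats.get(place)
        if entry is None:
            # First sighting of this place: it is both the min and the max.
            stats[place] = [temp, temp, temp, 1]
        else:
            if temp < entry[0]:
                entry[0] = temp
            if temp > entry[1]:
                entry[1] = temp
            entry[2] += temp
            entry[3] += 1
    # The mean is computed once per place, after all rows are consumed.
    return {
        place: [lo, hi, total / count]
        for place, (lo, hi, total, count) in stats.items()
    }
```

Calling `aggregate` on the three example rows from the input format yields one entry for New York and one for Los Angeles.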

⚔️ Rules

  • Single machine only - no distributed computing
  • Any programming language allowed

There are sample solutions in Java, C++ and Go in the submissions folder. You can use them as a reference.

📈 Output format

The output should list each unique place with its corresponding minimum, maximum, and average measurements. Print the values to stdout in the following format:

Place: <place_name>
Min: <min_value>
Max: <max_value>
Average: <average_value>

🎯 Accuracy: The results must be accurate to 6 decimal places
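
One way to render a single place in the layout above is sketched below. Printing six fractional digits is an assumption drawn from the accuracy requirement; the README does not state the exact number formatting, so check the sample solutions in the submissions folder before relying on it. The helper name `format_entry` is hypothetical.

```python
def format_entry(place: str, lo: float, hi: float, mean: float) -> str:
    """Render one place in the four-line Place/Min/Max/Average layout."""
    # Assumption: six fractional digits, matching the stated accuracy target.
    return (
        f"Place: {place}\n"
        f"Min: {lo:.6f}\n"
        f"Max: {hi:.6f}\n"
        f"Average: {mean:.6f}"
    )
```

Passing the returned string to `print` writes the block to stdout, one call per place.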

🏆 Ready to Rock?

  • 🍴 Fork this repository
  • 💻 Code your heart out
  • 🚀 Upload your solution in the submissions folder
