Process 100 million rows of real-world measurement data as fast as possible on a single machine. Push the boundaries of data processing optimization while maintaining accuracy and showcasing your engineering prowess.
- The dataset will consist of 100 million rows with varied place names and temperatures.
- The dataset should be generated by running the `CreateMeasurements.java` file in the repository (make sure you have JDK 17+).
- To generate 100 million rows, compile the Java file and run `java CreateMeasurements 100000000`.
- Each row follows the format `Place;Measurement`, with the measurement value having exactly one fractional digit.
- Example rows:
- New York;23.5
- Los Angeles;45.1
- New York;19.7
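
Because every measurement has exactly one fractional digit, a row can be parsed into an integer number of tenths, which avoids floating-point rounding during aggregation. The following is a minimal sketch, not a reference implementation: the function name `parse_row` is hypothetical, and it assumes that values may carry a leading minus sign and that the last `;` on the line separates the place from the measurement.

```python
def parse_row(line: str) -> tuple[str, int]:
    """Split a 'Place;Measurement' row into (place, temperature in tenths).

    Assumptions (not stated in the README): values may be negative, and
    the last ';' on the line is the field separator.
    """
    place, _, value = line.rstrip("\n").rpartition(";")
    negative = value.startswith("-")
    whole, _, frac = value.lstrip("-").partition(".")
    tenths = int(whole) * 10 + int(frac)  # exactly one fractional digit
    return place, -tenths if negative else tenths


if __name__ == "__main__":
    print(parse_row("New York;23.5"))
```

Keeping the value as an integer means the sum for each place stays exact, and a single division at the end produces the mean.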
Calculate the following for each place:
- ❄️ Minimum temperature
- 🔥 Maximum temperature
- ⚖️ Mean temperature
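
One straightforward way to compute these three statistics in a single pass is to keep a running minimum, maximum, sum, and count per place. The sketch below assumes rows have already been parsed into `(place, tenths)` pairs, where `tenths` is the temperature multiplied by 10 as an integer; the function name `aggregate` and that representation are illustrative choices, not part of the challenge.

```python
from typing import Iterable


def aggregate(rows: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Return {place: [min, max, sum, count]} with temperatures in tenths."""
    stats: dict[str, list[int]] = {}
    for place, tenths in rows:
        entry = stats.get(place)
        if entry is None:
            # First measurement seen for this place.
            stats[place] = [tenths, tenths, tenths, 1]
        else:
            if tenths < entry[0]:
                entry[0] = tenths
            if tenths > entry[1]:
                entry[1] = tenths
            entry[2] += tenths
            entry[3] += 1
    return stats


if __name__ == "__main__":
    sample = [("New York", 235), ("Los Angeles", 451), ("New York", 197)]
    for place, (lo, hi, total, count) in aggregate(sample).items():
        print(place, lo / 10, hi / 10, total / (10 * count))
```

The mean is derived only at the end as `sum / (10 * count)`, so no precision is lost while accumulating.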
- Single machine only - no distributed computing
- Any programming language allowed
There are sample solutions in Java, C++ and Go in the submissions folder. You can use them as a reference.
The output should list each unique place with its corresponding minimum, maximum, and average measurements. Write the values to stdout in the following format:
Place: <place_name>
Min: <min_value>
Max: <max_value>
Average: <average_value>
🎯 Accuracy: The results must be accurate to 6 decimal places
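
As an illustration of the output format and the accuracy requirement, here is a hedged sketch of a formatter for one place. The README does not say how many digits to print, so this example assumes one fractional digit for the minimum and maximum and six for the average; check the sample solutions in the submissions folder for the exact expectation. The function name `format_place` and its integer-tenths inputs are hypothetical.

```python
def format_place(place: str, min_tenths: int, max_tenths: int,
                 sum_tenths: int, count: int) -> str:
    """Render one place in the four-line output format.

    Inputs are temperatures multiplied by 10 (integers). The digit
    counts used below are assumptions, not stated in the README.
    """
    mean = sum_tenths / (10 * count)
    return (
        f"Place: {place}\n"
        f"Min: {min_tenths / 10:.1f}\n"
        f"Max: {max_tenths / 10:.1f}\n"
        f"Average: {mean:.6f}"
    )


if __name__ == "__main__":
    # New York rows from the example: 23.5 and 19.7
    print(format_place("New York", 197, 235, 432, 2))
```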
🏆 Ready to Rock?
- 🍴 Fork this repository
- 💻 Code your heart out
- 🚀 Upload your solution in the submissions folder