DenisOgr/sort-very-large-files

Project focus

This project solves a task that often comes up in data engineering interviews.

Task: We have a very large file of numbers. The numbers must be sorted and written to a separate single output file. The main restriction is that the available RAM is very small compared with the file: for example, the input file is 3 GB, while only 20 MB of RAM is available.

The task is solved with an external sorting algorithm.
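The README does not spell out the algorithm, so here is a minimal sketch of a classic two-phase external merge sort, assuming the input holds one integer per line. Phase 1 reads the input in chunks that fit in memory, sorts each chunk, and writes it to a temporary "run" file. Phase 2 merges all runs into the output with `heapq.merge`, which keeps only one pending value per run in memory. This is an illustration, not the project's code: `external_sort`, its helpers, and the `chunk_size` parameter (a count of numbers per chunk, standing in for the byte-based `--mem_limit`) are hypothetical.

```python
import heapq
import os
import tempfile
from typing import Iterator, List


def _write_run(numbers: List[int], tmp_dir: str) -> str:
    """Sort one in-memory chunk and write it to its own temporary run file."""
    numbers.sort()
    fd, path = tempfile.mkstemp(suffix=".run", dir=tmp_dir)
    with os.fdopen(fd, "w") as run:
        run.writelines(f"{n}\n" for n in numbers)
    return path


def _read_run(path: str) -> Iterator[int]:
    """Stream a sorted run back lazily, one integer per line."""
    with open(path) as run:
        for line in run:
            yield int(line)


def external_sort(input_path: str, output_path: str, chunk_size: int = 100_000) -> None:
    """Sort a file of newline-separated integers while holding at most
    `chunk_size` numbers in memory during the split phase.

    `chunk_size` is a hypothetical stand-in for a byte-based memory limit.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        run_paths = []
        chunk: List[int] = []

        # Phase 1: split the input into sorted runs that each fit in memory.
        with open(input_path) as src:
            for line in src:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                chunk.append(int(line))
                if len(chunk) >= chunk_size:
                    run_paths.append(_write_run(chunk, tmp_dir))
                    chunk = []
        if chunk:
            run_paths.append(_write_run(chunk, tmp_dir))

        # Phase 2: k-way merge of all runs into a single sorted output file.
        runs = [_read_run(p) for p in run_paths]
        with open(output_path, "w") as dst:
            dst.writelines(f"{n}\n" for n in heapq.merge(*runs))
```

One design caveat: `heapq.merge` keeps every run file open at once, so a tiny memory budget on a huge file can approach the operating system's open-file limit. Implementations that hit it typically merge in several passes instead of one.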

Results

The application was tested on a 3 GB input file with a 20 MB memory limit: sorting the file took 10 minutes. With a 100 MB memory limit, it took about 3 minutes.
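As a rough back-of-the-envelope check, assume each sorted run fills the whole memory budget M and 1 GB = 1024 MB. Then the number of runs r for an input of size S is at least the following:

```latex
r \ge \left\lceil \frac{S}{M} \right\rceil, \qquad
M = 20\ \text{MB}: \left\lceil \frac{3 \cdot 1024\ \text{MB}}{20\ \text{MB}} \right\rceil = \lceil 153.6 \rceil = 154, \qquad
M = 100\ \text{MB}: \left\lceil \frac{3 \cdot 1024\ \text{MB}}{100\ \text{MB}} \right\rceil = \lceil 30.72 \rceil = 31
```

In Python, an integer held in a list takes more memory than its text line, so real runs are smaller and more numerous than this bound. The point is only that a larger limit means fewer, longer runs to merge.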

Requirements

  • Python >= 3.6.x

Installation

git clone git@github.com:DenisOgr/sort-very-large-files.git

Run

  1. Put the large file in the data/input directory. The file should contain numbers separated by newlines (\n), one number per line, for example:
123
321
321
12
3
4
  2. Run the command:
python main filename --mem_limit 20
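To try these steps without a real 3 GB file, a small helper can generate input in the expected one-number-per-line format and then confirm that a result is sorted. This is a hypothetical sketch, not part of the repository: `generate_input`, `is_sorted_file`, and any file name you pass them (for example `data/input/numbers.txt`) are assumptions.

```python
import os
import random


def generate_input(path: str, count: int, low: int = 0, high: int = 10**9, seed: int = 0) -> None:
    """Write `count` random integers to `path`, one per line."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    batch_size = 100_000  # write in batches so this script stays memory-light too
    with open(path, "w") as out:
        for start in range(0, count, batch_size):
            n = min(batch_size, count - start)
            out.write("".join(f"{rng.randint(low, high)}\n" for _ in range(n)))


def is_sorted_file(path: str) -> bool:
    """Stream `path` line by line and check each number is >= the previous one."""
    previous = None
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            value = int(line)
            if previous is not None and value < previous:
                return False
            previous = value
    return True
```

For example, `generate_input("data/input/numbers.txt", 1_000_000)` writes roughly 10 MB; about 300 million lines are needed to approach the 3 GB file from the results. Because `is_sorted_file` holds only one previous value in memory, it can check a sorted output under the same small memory limit.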

About

This project implements external sorting algorithms.
