Using Linux Lab
For this project you will create a shell pipeline that truncates a file via random shuffling, then verifies the correct number of lines. Large files are often too big for traditional data science tools like pandas or Jupyter to load into memory. One approach to this problem is to sample the file: shuffle its lines and keep only a random subset.
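As a preview of the pipeline you will build, here is a minimal sketch; the file names `big_data.csv` and `sample.csv` are hypothetical, and `shuf` is the GNU coreutils implementation:

```bash
# Sample a large CSV by shuffling and truncating in one step:
# shuf reads the file and -n 1000 emits 1,000 randomly chosen lines,
# which is equivalent to shuffling the whole file and keeping the top.
shuf -n 1000 big_data.csv > sample.csv

# Verify that the sample has the expected number of lines.
wc -l sample.csv
```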
- Run `wc -l nba_2017.csv`. How many lines are in the file?
- Run `head nba_2017.csv` and inspect the first few rows of the file.
- Shuffle and truncate the file: `shuf -n 100 nba_2017.csv > small_nba_2017.csv`
- Count the number of lines in `small_nba_2017.csv`. How many are there?
- Inspect the first few lines with `head small_nba_2017.csv`. What do you see?
- What happens when you run `tail -n +2 nba_2017.csv | head`?
- How could you use this approach to remove the column headers before shuffling?
- Why would you want to do this, and how could you append the headers back on after you shuffle? See the sketch below for one possibility.
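For the last two questions, here is one possible pipeline, a sketch rather than the definitive answer; the intermediate files `header.csv` and `rows.csv` are hypothetical names chosen for illustration, and GNU coreutils is assumed:

```bash
# Save the header row so it is not shuffled into the sample.
head -n 1 nba_2017.csv > header.csv

# tail -n +2 prints everything from line 2 onward, dropping the header;
# shuf -n 100 then keeps 100 randomly chosen data rows.
tail -n +2 nba_2017.csv | shuf -n 100 > rows.csv

# Reattach the header so the sample is still a valid CSV.
cat header.csv rows.csv > small_nba_2017.csv

# Expect 101 lines: 1 header row plus 100 sampled rows.
wc -l small_nba_2017.csv
```

Shuffling with the header left in would either drop the column names from the sample or bury them at a random position, which breaks tools like pandas that expect them on the first line; stripping the header first and concatenating it back afterward keeps the sample a well-formed CSV.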