
Tools for doing performance tuning of Horovod training


armandmcqueen/horovod-utils


horovod-utils

Tools for working with Horovod

htimeline

Command-line tool for working with very large (100+ GB) Horovod timeline files. It can print a summary of a timeline (file size, duration) and extract a slice of the timeline small enough to load into chrome://tracing.
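To make the summary idea concrete, here is a minimal Python sketch that streams a timeline once and reports its size and duration in constant memory. It is not htimeline's code. It assumes, hypothetically, that the file holds one Chrome-trace-format JSON event per line (with an optional trailing comma) and that `ts` is in microseconds. The function names are illustrative.

```python
import json
import os


def parse_event_line(raw):
    """Parse one timeline line into a dict, or return None for non-event
    lines such as "[", "]", blank lines, or a truncated final line."""
    line = raw.strip().rstrip(b",")
    if not line.startswith(b"{"):
        return None
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None


def summarize_lines(lines):
    """Return (event_count, duration_us) from an iterable of byte lines.

    Uses min/max rather than first/last because events written by
    different threads are not guaranteed to be perfectly ordered."""
    count = 0
    min_ts = max_ts = None
    for raw in lines:
        event = parse_event_line(raw)
        if event is None or "ts" not in event:
            continue  # skip metadata events and non-event lines
        count += 1
        ts = event["ts"]
        min_ts = ts if min_ts is None else min(min_ts, ts)
        max_ts = ts if max_ts is None else max(max_ts, ts)
    duration = 0 if min_ts is None else max_ts - min_ts
    return count, duration


def summarize_timeline(path):
    """Stream the file once; memory use does not grow with file size."""
    with open(path, "rb") as f:
        count, duration_us = summarize_lines(f)
    return {
        "size_bytes": os.path.getsize(path),
        "events": count,
        "duration_s": duration_us / 1e6,
    }
```

For example, `summarize_timeline("timeline.json")` (file name assumed) returns the byte size, event count, and wall-clock span in seconds. A single sequential pass like this is still bounded by disk read speed on a 100+ GB file, which is why repeated slicing benefits from an index.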

Highly optimized. Naive approaches to extracting a slice can take 15+ minutes, whereas htimeline extracts the first slice in under a minute and caches its indexes, so extracting additional slices takes only seconds.
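The cached-index idea can be sketched as follows: one sequential pass records the byte offset of every Nth event, the index is saved next to the timeline, and each later slice seeks straight to the nearest checkpoint instead of rescanning the file. This is an assumed design for illustration, not necessarily how htimeline works. It further assumes one event per line and roughly time-ordered `ts` values, and all names (`build_index`, `load_or_build_index`, `extract_slice`, `EVERY_N`, the `.idx.json` cache file) are hypothetical.

```python
import bisect
import json
import os

EVERY_N = 10_000  # hypothetical checkpoint spacing, in events


def _parse(line):
    """Parse one event line; return None for "[", "]", blanks, or bad JSON."""
    line = line.strip().rstrip(b",")
    if not line.startswith(b"{"):
        return None
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None


def build_index(path, every_n=EVERY_N):
    """One sequential pass: record (ts, byte offset) for every Nth event."""
    index = []
    seen = 0
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            event = _parse(line)
            if event is not None and "ts" in event:
                if seen % every_n == 0:
                    index.append((event["ts"], offset))
                seen += 1
            offset += len(line)  # binary mode, so len() is the byte count
    return index


def load_or_build_index(path, every_n=EVERY_N):
    """Reuse a cached index if it is newer than the timeline file.
    (The cache key ignores every_n; a real tool might store it too.)"""
    cache = path + ".idx.json"  # hypothetical cache location
    if os.path.exists(cache) and os.path.getmtime(cache) >= os.path.getmtime(path):
        with open(cache) as f:
            return [tuple(pair) for pair in json.load(f)]
    index = build_index(path, every_n)
    with open(cache, "w") as f:
        json.dump(index, f)
    return index


def extract_slice(path, index, start_us, end_us, out_path):
    """Seek to the last checkpoint before start_us, then copy events whose
    ts lies in [start_us, end_us] into a JSON array. Returns the count."""
    timestamps = [ts for ts, _ in index]
    i = bisect.bisect_left(timestamps, start_us) - 1
    offset = index[i][1] if i >= 0 else 0
    written = 0
    with open(path, "rb") as src, open(out_path, "wb") as dst:
        src.seek(offset)
        dst.write(b"[\n")
        for line in src:
            event = _parse(line)
            if event is None or "ts" not in event:
                continue  # note: this also drops metadata events
            if event["ts"] > end_us:
                break  # relies on events being (roughly) time-ordered
            if event["ts"] >= start_us:
                sep = b",\n" if written else b""
                dst.write(sep + json.dumps(event).encode())
                written += 1
        dst.write(b"\n]\n")
    return written
```

For example, `extract_slice(path, load_or_build_index(path), start_us, end_us, "slice.json")` writes a JSON array that chrome://tracing can open; only the first call pays for the full scan. A production tool would also copy metadata events (which carry process and thread names but no `ts`) and allow some slack for out-of-order events.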

TIG

Utilities for installing the Telegraf-InfluxDB-Grafana (TIG) stack to monitor machine performance during training.

Network Utilization

Utility for recording and graphing fine-grained network usage to determine whether training is network-bottlenecked. It still works, but TIG is now the recommended approach for examining network utilization.
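For readers who want to see the underlying measurement, the following is a minimal Linux-only sketch of fine-grained network sampling. It polls the cumulative byte counters in `/proc/net/dev` and converts successive differences into Gbit/s per interface. It is a stand-in written for illustration, not the repository's utility; the sampling interval, duration, and function names are assumptions.

```python
import time


def parse_proc_net_dev(text):
    """Return {iface: (rx_bytes, tx_bytes)} from /proc/net/dev contents.

    After "iface:", the first 8 fields are receive counters and the next 8
    are transmit counters, so rx bytes is field 0 and tx bytes is field 8."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates_gbps(prev, curr, interval_s):
    """Convert two counter snapshots into (rx, tx) Gbit/s per interface."""
    out = {}
    for iface, (rx1, tx1) in curr.items():
        if iface in prev:
            rx0, tx0 = prev[iface]
            out[iface] = (
                (rx1 - rx0) * 8 / interval_s / 1e9,
                (tx1 - tx0) * 8 / interval_s / 1e9,
            )
    return out


def sample(interval_s=0.1, duration_s=10.0, path="/proc/net/dev"):
    """Poll the counters every interval_s seconds and yield (time, rates).

    Rates use the measured elapsed time, so sleep jitter does not skew them."""
    with open(path) as f:
        prev = parse_proc_net_dev(f.read())
    t_prev = time.monotonic()
    end = t_prev + duration_s
    while time.monotonic() < end:
        time.sleep(interval_s)
        with open(path) as f:
            curr = parse_proc_net_dev(f.read())
        t = time.monotonic()
        yield t, rates_gbps(prev, curr, t - t_prev)
        prev, t_prev = curr, t
```

Iterating over `sample(0.1, 60)` while training runs and plotting one interface's series (for instance `eth0`, an assumed name) shows whether throughput sits near the link's capacity. Sub-second intervals matter here because gradient all-reduce traffic arrives in bursts that per-minute averages smooth away.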
