@PerfLab-EXaCT

Performance Lab for EXtreme Computing and daTa

BigFlow Suite: Performance analysis and scheduling for distributed scientific workflows

  • DataLife: The combination of ever-growing scientific datasets and distributed workflow complexity creates I/O performance bottlenecks due to data volume, velocity, and variety. DataLife is a measurement and analysis toolset for distributed scientific workflows composed of tasks that interact using files and storage. DataLife performs data flow lifecycle (DFL) analysis to guide decisions about coordinating task and data flows on distributed resources. DataLife provides tools for measuring, analyzing, visualizing, and estimating the severity of flow bottlenecks based on I/O and storage.
  • DaYu: The increasing use of descriptive data formats (e.g., HDF5, netCDF) helps organize scientific datasets, but it also creates obscure bottlenecks because high-level operations must be translated into file addresses and then into low-level I/O operations. DaYu is a method and toolset for analyzing (a) semantic relationships between logical datasets and file addresses, (b) how dataset operations translate into I/O, and (c) the combination of the two across entire workflows. DaYu's analysis and visualization enable identification of critical bottlenecks and reasoning about remediation. With DaYu, one can extract workflow data patterns, develop insights into the behavior of data flows, and identify opportunities for both users and I/O libraries to optimize applications.
  • TAZeR: TAZeR (Transparent Asynchronous Zero-copy Remote I/O) is a remote I/O framework for transparently minimizing the access latencies of remote I/O in workflows. TAZeR captures dynamic and irregular inter-task locality, both temporal and spatial, via adaptive hierarchical staging that keeps the most frequently accessed data "close".

  • BigFlowSim: BigFlowSim is a workflow I/O simulator-emulator and trace generator that captures several parameters that affect local and remote I/O performance. BigFlowSim generates a large variety of flows within and between tasks of distributed workflows. The BigFlowSim Driver is helpful for conducting experiments.
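The staging idea mentioned for TAZeR above (keeping the most frequently accessed data close) can be illustrated with a toy two-level cache. This is only a minimal sketch under stated assumptions, not TAZeR's implementation or API: the class `TwoLevelStage`, the `fetch_remote` callable, and the frequency-based promotion rule are all hypothetical names and choices made for illustration.

```python
from collections import Counter


class TwoLevelStage:
    """Toy staging cache: a small 'near' tier in front of a remote source.

    Hypothetical illustration only; not TAZeR's design or API.
    """

    def __init__(self, near_capacity, fetch_remote):
        self.near_capacity = near_capacity
        self.fetch_remote = fetch_remote  # assumed callable: block_id -> data
        self.near = {}                    # block_id -> data held close
        self.hits = Counter()             # access count per block
        self.remote_fetches = 0

    def read(self, block_id):
        """Return the block, serving it from the near tier when possible."""
        self.hits[block_id] += 1
        if block_id in self.near:
            return self.near[block_id]
        data = self.fetch_remote(block_id)
        self.remote_fetches += 1
        self._maybe_stage(block_id, data)
        return data

    def _maybe_stage(self, block_id, data):
        """Stage the block if there is room or it is hotter than the coldest one."""
        if len(self.near) < self.near_capacity:
            self.near[block_id] = data
            return
        coldest = min(self.near, key=lambda b: self.hits[b])
        if self.hits[block_id] > self.hits[coldest]:
            del self.near[coldest]
            self.near[block_id] = data


if __name__ == "__main__":
    stage = TwoLevelStage(2, lambda b: f"data-{b}".encode())
    for block in ["a", "a", "a", "b", "c", "c", "c"]:
        stage.read(block)
    print(sorted(stage.near), stage.remote_fetches)
```

A real hierarchical scheme would have more than two levels and would adapt to spatial as well as temporal locality; the sketch only shows how access frequency can decide what stays in the nearest tier.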

Distributed AI Services

  • MassiveGNN: Training Graph Neural Networks (GNNs) on massively connected (distributed) graphs poses significant challenges: even with the best methods, GNN training usually suffers from communication bottlenecks and load imbalance. MassiveGNN introduces performant and productive training for massively connected (distributed) GNNs within the state-of-the-art Amazon DistDGL (distributed Deep Graph Library). It offers practical trade-offs for reducing the sampling and communication overheads of representation learning on distributed graphs by developing a parameterized continuous prefetch and eviction scheme.

  • SamIAm: (Demo) Image segmentation is a critical enabler for tasks ranging from medical diagnostics to autonomous driving. However, the correct segmentation semantics -- where are boundaries located? what segments are logically similar? -- change depending on the domain, such that state-of-the-art foundation models can generate meaningless and incorrect results. Moreover, in certain domains, fine-tuning and retraining techniques are infeasible: obtaining labels is costly and time-consuming; domain images (micrographs) can be exponentially diverse; and data sharing (for third-party retraining) is restricted. To enable rapid adaptation of the best segmentation technology, we define semantic boosting: given a zero-shot foundation model, guide its segmentation and adjust results to match domain expectations. We apply semantic boosting to the Segment Anything Model (SAM) to obtain microstructure segmentation for transmission electron microscopy. Our booster, SAM-I-Am, extracts geometric and textural features of various intermediate masks to perform mask removal and mask merging operations.
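The "parameterized continuous prefetch and eviction" idea described for MassiveGNN above can be sketched as a scored buffer of remote node features. This is a minimal sketch under stated assumptions, not the MassiveGNN or DistDGL code: the class `PrefetchBuffer` and its parameters `capacity`, `decay`, and `evict_threshold` are hypothetical, and the scoring rule is one simple choice among many.

```python
class PrefetchBuffer:
    """Toy buffer of remote node IDs with score-based eviction.

    Hypothetical illustration only; not the MassiveGNN or DistDGL API.
    """

    def __init__(self, capacity, decay=0.5, evict_threshold=0.6):
        self.capacity = capacity
        self.decay = decay                      # score multiplier per refresh
        self.evict_threshold = evict_threshold  # evict when score drops below
        self.scores = {}                        # buffered node_id -> score
        self.miss_counts = {}                   # node_id -> misses since refresh

    def lookup(self, sampled_nodes):
        """Record one minibatch of sampled remote nodes; return (hits, misses)."""
        hits, misses = [], []
        for node in sampled_nodes:
            if node in self.scores:
                self.scores[node] += 1.0
                hits.append(node)
            else:
                self.miss_counts[node] = self.miss_counts.get(node, 0) + 1
                misses.append(node)
        return hits, misses

    def refresh(self):
        """Decay scores, evict cold nodes, then prefetch the most-missed nodes."""
        for node in list(self.scores):
            self.scores[node] *= self.decay
            if self.scores[node] < self.evict_threshold:
                del self.scores[node]
        free = self.capacity - len(self.scores)
        ranked = sorted(self.miss_counts, key=lambda n: (-self.miss_counts[n], n))
        for node in ranked[:free]:
            self.scores[node] = 1.0
        self.miss_counts.clear()


if __name__ == "__main__":
    buf = PrefetchBuffer(capacity=2)
    buf.lookup([1, 2, 3, 1])
    buf.refresh()
    print(sorted(buf.scores))
```

Calling `refresh` between minibatches is what makes the scheme continuous in this sketch; `decay` and `evict_threshold` control how quickly nodes that are no longer sampled give up their slots.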

Application Performance Analysis and Prediction

  • MemGaze: MemGaze is a memory analysis toolset that combines high-resolution trace analysis and low overhead measurement, both with respect to time and space.

  • Palm: Palm is a suite of performance modeling tools (Palm, Palm-Task, Representative-Paths, Palm/FastFootprints, MIAMI-NW) to assist performance analysis and predictive model generation. Palm generates models by combining top-down (human-provided) semantic insight with bottom-up static and dynamic analysis. Palm has been used to model irregular applications with sparse data structures and unpredictable access patterns. Recent additions focus on rapid characterization of memory behavior.

  • QuaL²M (QuaLM): Quantitative Learned Latency Model [Extra datasets]

Workload Benchmarking and Characterization

  • Scientific workflows: A suite of distributed scientific workflows with a range of workload characteristics

  • SEAK Suite: The SEAK Suite is a collection of constraining problems for common embedded computing challenges. A constraining problem is a mission-centric and goal-oriented problem specification that separates problem-domain constraints from solution implementations so as to encourage creative solutions that meet goals but may deviate from standard implementations.

  • PERFECT Suite: The PERFECT Suite consists of kernels and applications for evaluating tradeoffs between performance, power, and architecture within the domains of radar and image processing.

  • miniVite-x: Mini-application to demonstrate different memory patterns and test memory analysis tools.

Miscellaneous tools for performance analysis and modeling

Misc work in progress

Repositories

Showing 10 of 13 repositories
  • llm-perf (Public): updated Jan 11, 2025
  • .github (Public): updated Dec 26, 2024
  • SamIAm (Public): Python, updated Aug 21, 2024
  • ubench (Public): C, updated Aug 14, 2024
  • llm-agent (Public): updated Jul 25, 2024
  • fl-carbon-efficient: Python, updated Jul 23, 2024
  • DaYu (Public): Jupyter Notebook, updated Jul 17, 2024
  • fl-variable-power: updated Jul 10, 2024
  • SamIAm-LabelStudio: Jupyter Notebook, updated Mar 5, 2024
  • utools (Public): Python, updated Feb 10, 2024
