Hands-on labs and code to help you learn, measure, and build using architectural best practices.
Updated Sep 12, 2024 - Python
Chaos Engineering Toolkit & Orchestration for Developers
Chaos and resiliency testing tool for Kubernetes with a focus on improving performance under failure conditions. A CNCF sandbox project.
xFinder: Robust and Pinpoint Answer Extraction for Large Language Models
A general library for fatigue and reliability.
Fast computation of Krippendorff's alpha agreement measure in Python.
Guardian of Kubernetes clusters. A tool to monitor cluster health and signal/alert on failures.
[AAAI22 Oral] Reliable Propagation-Correction Modulation for Video Object Segmentation
A Python package for survival analysis. The most flexible survival analysis package available. SurPyval can work with arbitrary combinations of observed, censored, and truncated data. SurPyval can also fit distributions with 'offsets' with ease, for example the three-parameter Weibull distribution.
Weibull Analysis Tools
GoldenEye is a functional simulator with fault injection capabilities for common and emerging numerical formats, implemented for the PyTorch deep learning framework.
Structural Fire Engineering - Probabilistic Reliability Assessment
Safe Init is a Python library that enhances AWS Lambda functions with advanced error handling, logging, monitoring, and resilience features, providing comprehensive observability and reliability for serverless applications.
PROTON - A Python Framework for Physics-Based Electromigration Assessment on Contemporary VLSI Power Grids
CausIL is an approach to estimate the causal graph for a cloud microservice system, where the nodes are the service-specific metrics while edges indicate causal dependency among the metrics. The approach considers metric variations for all the instances deployed in the system to build the causal graph and can account for auto-scaling decisions.
A Python package for reliability assessment of modern distribution systems.
High-reliability asynchronous task queue using MySQL
Computation of minimal cut sets using the MOCUS algorithm
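For readers unfamiliar with MOCUS, the following is a minimal sketch of the general top-down idea, not the code of the repository listed above. It assumes an acyclic fault tree with only AND and OR gates. The dictionary layout, the function name `mocus`, and the example tree are all hypothetical and chosen for illustration. An AND gate is replaced by all of its inputs within the same cut set, an OR gate splits the cut set into one copy per input, and supersets are discarded at the end so that only minimal cut sets remain.

```python
def mocus(tree, top):
    """Return the minimal cut sets of an AND/OR fault tree.

    tree maps a gate name to (gate_type, [inputs]); any name that is
    not a key of tree is treated as a basic event. (Hypothetical layout.)
    """
    pending = [frozenset([top])]   # cut sets that may still contain gates
    resolved = []                  # cut sets made only of basic events

    while pending:
        cut_set = pending.pop()
        gate = next((e for e in cut_set if e in tree), None)
        if gate is None:
            resolved.append(cut_set)
            continue
        kind, inputs = tree[gate]
        rest = cut_set - {gate}
        if kind == "AND":
            # AND: all inputs must occur together, so they join the same set.
            pending.append(rest | frozenset(inputs))
        elif kind == "OR":
            # OR: any single input suffices, so create one set per input.
            for item in inputs:
                pending.append(rest | {item})
        else:
            raise ValueError(f"unsupported gate type: {kind}")

    # Minimization: drop every cut set that contains a smaller cut set.
    minimal = []
    for cut_set in sorted(set(resolved), key=len):
        if not any(m <= cut_set for m in minimal):
            minimal.append(cut_set)
    return minimal


# Hypothetical example: TOP = G1 OR E3, G1 = E1 AND G2, G2 = E2 OR E3
example_tree = {
    "TOP": ("OR", ["G1", "E3"]),
    "G1": ("AND", ["E1", "G2"]),
    "G2": ("OR", ["E2", "E3"]),
}
result = mocus(example_tree, "TOP")
print(sorted(sorted(cs) for cs in result))  # [['E1', 'E2'], ['E3']]
```

In this example the intermediate cut set {E1, E3} is discarded because it contains the smaller cut set {E3}.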