Survival Analysis in Python: A Quick Guide to The Weibull Analysis (2024)

The Weibull Analysis is popular among reliability engineers because it is flexible and straightforward. This guide demonstrates the basic concepts of the Weibull Analysis with sample code, using the open-source Python package predictr.

  1. Installing and Using predictr
  2. The Weibull Plot
  3. Parameter Estimation: MRR vs. MLE
  4. Confidence Intervals
  5. Bias-Corrections
  6. Comprehensive Plots
  7. Conclusion

You need to have Python 3 installed (version >3.5). If you’re new to Python, just download Anaconda and set up a virtual environment according to the Anaconda documentation, e.g. by pasting this command into the terminal (macOS, Linux) or the command prompt (Windows):

conda create -n my_env python=3.10

This code creates a new virtual environment called my_env with Python 3.10. Feel free to change the name and Python version.

The next step is to install predictr using pip in the terminal (or command prompt):

conda activate my_env
pip install predictr

To use predictr in your IDE or text editor of choice, just import the predictr module in your Python file:

import predictr

predictr has two classes: Analysis for Weibull analyses, and PlotAll for detailed plots. For comprehensive documentation with many examples, check out the official predictr documentation.

Probability plots give you a feel for the data at hand and let you compare regression lines, i.e. failure modes and failure data, with each other. In Weibull Analysis, this plot is called the Weibull probability plot, and it is essential to understand it. Usually, the plot consists of…

  • a double-logarithmic y-axis (unreliability),
  • a logarithmic x-axis (time to failure, e.g. number of cycles),
  • a Weibull line (parameter estimation of the Weibull shape and scale parameter) and median ranks of the given data,
  • and confidence bounds (one-sided or two-sided).

The legend is optional. However, it is recommended to show information such as the sample size n (= number of failures f + number of suspensions s), the parameter estimation method (Maximum Likelihood Estimation (MLE), Median Rank Regression (MRR), or another), the estimated Weibull parameters (β and η), the confidence bounds method (Fisher Bounds, Likelihood Ratio Bounds, Bootstrap Bounds, Beta-Binomial Bounds, Monte-Carlo Pivotal Bounds, …), and the confidence level.

Both MLE and MRR can be used to estimate the Weibull shape and scale parameter. In this tutorial, we consider the Weibull location parameter to be zero, i.e. a two-parameter Weibull distribution:

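  • The shape parameter β represents the slope of the Weibull line and describes the failure mode (see the famous bathtub curve): β < 1 indicates early failures with a decreasing failure rate, β = 1 random failures with a constant failure rate, and β > 1 wear-out failures with an increasing failure rate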
  • The scale parameter η is defined as the x-axis value for an unreliability of 63.2 %
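
To make the two parameters concrete, here is a minimal sketch of the two-parameter Weibull unreliability F(t) = 1 − exp(−(t/η)^β) and the corresponding failure rate. It uses only the Python standard library; the function names are chosen for illustration and are not part of predictr.

```python
import math

def weibull_unreliability(t, beta, eta):
    """Two-parameter Weibull CDF: F(t) = 1 - exp(-(t / eta) ** beta)."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def weibull_hazard(t, beta, eta):
    """Failure (hazard) rate: h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)

# At t = eta the unreliability equals 1 - exp(-1), about 63.2 %, for any beta
for beta in (0.5, 1.0, 2.0):
    print(beta, round(weibull_unreliability(1.5, beta, eta=1.5), 3))

# The sign of (beta - 1) decides whether the failure rate falls, stays flat or rises
for beta in (0.5, 1.0, 2.0):
    print(beta, [round(weibull_hazard(t, beta, eta=1.5), 3) for t in (0.5, 1.0, 2.0)])
```

Whatever value β takes, the first loop prints the same unreliability at t = η, which is why η can be read off the plot at the 63.2 % line.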

Let’s assume we gathered the following type-II right-censored data from testing:

  1. Failures: 0.4508831, 0.68564703, 0.76826143, 0.88231395, 1.48287253, 1.62876357 (6 failures in total)
  2. Suspensions: 1.62876357, 1.62876357, 1.62876357, 1.62876357 (4 suspensions in total)

Our data is censored, and therefore we have to deal with suspensions. Suspensions are units that did not fail during testing. MRR and MLE handle this information differently.

MRR

The Median Rank Regression uses so-called median ranks and the method of least squares to determine the Weibull parameters. Median ranks are estimated unreliability (or reliability) values for each failure (MRR cannot take the censoring times into account, only the total number of suspensions). More precisely, the unreliability value assigned to each failure depends only on its rank within the sample, not on the actual failure time.
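
The idea can be sketched by hand in a few lines. This is a minimal illustration, not predictr's implementation: it assumes Benard's approximation (i − 0.3)/(n + 0.4) for the median ranks and an ordinary least-squares fit of y on x, so the result may differ slightly from what the library reports. All variable names are illustrative.

```python
import math

failures = [0.4508831, 0.68564703, 0.76826143, 0.88231395, 1.48287253, 1.62876357]
n_suspensions = 4
n = len(failures) + n_suspensions  # total sample size

# Benard's approximation of the median rank of the i-th failure (assumption).
# All suspensions sit at the last failure time, so the rank of each failure
# is simply its position in the sorted list.
median_ranks = [(i - 0.3) / (n + 0.4) for i in range(1, len(failures) + 1)]

# Linearised Weibull CDF: ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta)
xs = [math.log(t) for t in sorted(failures)]
ys = [math.log(-math.log(1.0 - f)) for f in median_ranks]

# Ordinary least squares of y on x
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
sxx = sum((x - x_mean) ** 2 for x in xs)
beta_mrr = sxy / sxx                       # slope of the Weibull line
intercept = y_mean - beta_mrr * x_mean
eta_mrr = math.exp(-intercept / beta_mrr)  # time at which the line crosses y = 0

print(f"beta={beta_mrr:.3f}, eta={eta_mrr:.3f}")
```

Note that the suspension times never enter the calculation; only their count does, through n.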

We will use the Analysis class in predictr in order to conduct the Weibull Analysis.

from predictr import Analysis

# Data from testing
# Failures and suspensions are lists containing the values
failures = [0.4508831, 0.68564703, 0.76826143, 0.88231395, 1.48287253, 1.62876357]
suspensions = [1.62876357, 1.62876357, 1.62876357, 1.62876357]

# Weibull Analysis
x = Analysis(df=failures, ds=suspensions, show=True)
x.mrr()

MLE

In contrast to MRR, MLE considers the actual failure and suspension times. Increasing the number of suspensions mainly increases the Weibull scale parameter, while the shape parameter estimate does not change significantly. We perform the MLE for the same data (note: Analysis is already imported):

# Data from testing
# Failures and suspensions are lists containing the values
failures = [0.4508831, 0.68564703, 0.76826143, 0.88231395, 1.48287253, 1.62876357]
suspensions = [1.62876357, 1.62876357, 1.62876357, 1.62876357]

# Weibull Analysis
# The object is named y so that the MRR object x stays available for comparison
y = Analysis(df=failures, ds=suspensions, show=True)
y.mle()
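
To see how the suspension times enter the estimate, here is a minimal hand-rolled sketch of the censored Weibull MLE. It is not predictr's implementation: it assumes the standard profile-likelihood equation for β, solved by simple bisection, followed by the closed-form expression for η. The function names are hypothetical.

```python
import math

failures = [0.4508831, 0.68564703, 0.76826143, 0.88231395, 1.48287253, 1.62876357]
suspensions = [1.62876357, 1.62876357, 1.62876357, 1.62876357]

def profile_score(beta, failures, suspensions):
    """Profile-likelihood equation for beta; its root is the MLE of the shape."""
    all_times = list(failures) + list(suspensions)
    t_max = max(all_times)
    # Rescaling by t_max leaves the root unchanged and avoids overflow
    scaled = [t / t_max for t in all_times]
    weights = [u ** beta for u in scaled]
    weighted_log = sum(w * math.log(u) for w, u in zip(weights, scaled)) / sum(weights)
    mean_log_failures = sum(math.log(t / t_max) for t in failures) / len(failures)
    return weighted_log - 1.0 / beta - mean_log_failures

def weibull_mle(failures, suspensions, lo=0.01, hi=100.0, iterations=100):
    """Bisection on the (increasing) profile score, then closed-form eta."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if profile_score(mid, failures, suspensions) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    all_times = list(failures) + list(suspensions)
    t_max = max(all_times)
    # eta ** beta = sum over ALL units of t ** beta, divided by the number of failures
    eta = t_max * (sum((t / t_max) ** beta for t in all_times) / len(failures)) ** (1.0 / beta)
    return beta, eta

beta_mle, eta_mle = weibull_mle(failures, suspensions)
print(f"beta={beta_mle:.4f}, eta={eta_mle:.4f}")
```

The suspensions contribute to the sum of t^β but not to the divisor, which is why adding suspensions pushes η upwards.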

Statistical values are stored as attributes of the Analysis objects. Hence, we can print and/or save them. Check out the official documentation of predictr for an overview of all object attributes.

# We are using f-strings
# x and y are the names of the Analysis objects we created (see code above)
# beta and eta are the attributes we want to access and print
# Just type object.attribute to access them
# e.g. for the object x type the following: x.beta or x.eta
print(f'MRR: beta={x.beta:.6f}, eta={x.eta:.6f}\nMLE: beta={y.beta:.6f}, eta={y.eta:.6f}\n')

# Output
# MRR: beta=1.760834, eta=1.759760
# MLE: beta=2.003876, eta=1.707592

Relying solely on point estimates is risky, especially if only a small number of units failed during testing. Assuming that estimated sample statistics, e.g. the Weibull parameters, are close or even equal to the population statistics will likely result in a false sense of security. It is generally recommended to use confidence bounds in your Weibull Analysis. With confidence intervals, we can assume with a certain confidence that the true population (or ground truth) Weibull line lies within the interval. Hence, we are less likely to overestimate our system’s reliability. A typical confidence level is 90%, meaning the lower confidence bound could be set to 5% and the upper to 95%. The bounds could also be set to 1% and 91%. As you can see, the confidence level is defined only by the difference between the lower and upper bounds, and the bounds don’t have to be symmetrical.

It is important to note that there are two kinds of bounds:

  1. Bounds for a fixed unreliability/reliability value (e.g. R(t)=80%) on the time-to-failure axis, e.g. lower bound: 5000 hours and upper bound: 7000 hours.
  2. Bounds for a fixed time-to-failure value (e.g. t=6000 hours) on the unreliability/reliability axis, e.g. lower bound: R(t)=20% and upper bound: R(t)=38%.
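
The two kinds of bounds correspond to reading the Weibull line in two directions. The sketch below does this for the point estimate only, using the MLE values printed earlier in this guide; confidence bounds of the first kind then bracket the resulting time, and bounds of the second kind bracket the resulting reliability. The function names are illustrative and not part of predictr.

```python
import math

def reliability(t, beta, eta):
    """R(t) = exp(-(t / eta) ** beta): read the reliability at a fixed time."""
    return math.exp(-((t / eta) ** beta))

def time_at_reliability(r, beta, eta):
    """Inverse of R(t): read the time at a fixed reliability."""
    return eta * (-math.log(r)) ** (1.0 / beta)

# Point estimates taken from the MLE output shown earlier in this guide
beta_hat, eta_hat = 2.003876, 1.707592

t_80 = time_at_reliability(0.80, beta_hat, eta_hat)  # kind 1: fixed R(t) = 80 %
r_at_1 = reliability(1.0, beta_hat, eta_hat)         # kind 2: fixed t = 1.0

print(f"time at R=80%: {t_80:.4f}")
print(f"R(t=1.0): {r_at_1:.4f}")
```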

There are plenty of confidence bounds methods to choose from. I will soon publish a follow-up Medium article about which method to choose in which situation. The following table lists the confidence bounds methods supported by predictr.

[Table: confidence bounds methods supported by predictr]

You can choose between two-sided (2s) and one-sided (1sl: one-sided lower; 1su: one-sided upper) confidence bounds. All methods except Beta-Binomial bounds use bounds for a fixed unreliability/reliability value. By changing the argument values, you can customize your Weibull analysis.
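
Beta-Binomial bounds are usually built from ranks: where the median rank uses the 50 % point of the rank distribution, two-sided 90 % bounds use its 5 % and 95 % points. The sketch below assumes integer ranks (true for the data in this guide, where all suspensions come last) and uses the identity between the Beta(i, n − i + 1) distribution and a binomial tail sum, so it needs only the standard library. It is an illustration of the idea; predictr's 'bbb' option may be implemented differently.

```python
import math

def binomial_tail(x, i, n):
    """P(at least i of n units have failed) if each fails with probability x.
    For integer i this equals the Beta(i, n - i + 1) CDF evaluated at x."""
    return sum(math.comb(n, k) * x ** k * (1.0 - x) ** (n - k) for k in range(i, n + 1))

def rank_at(p, i, n, iterations=100):
    """Solve binomial_tail(x, i, n) = p for x by bisection (tail is increasing in x)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if binomial_tail(mid, i, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, n_failures = 10, 6  # 6 failures and 4 suspensions, as in the data above
for i in range(1, n_failures + 1):
    lower, median, upper = (rank_at(p, i, n) for p in (0.05, 0.50, 0.95))
    print(f"failure {i}: 5% rank={lower:.4f}, median rank={median:.4f}, 95% rank={upper:.4f}")
```

Each failure gets a lower and an upper unreliability value at its own failure time, which is why these bounds live on the unreliability axis.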

Let’s use Beta-Binomial bounds for the same data which we already used:

from predictr import Analysis

# Data from testing
failures = [0.4508831, 0.68564703, 0.76826143, 0.88231395, 1.48287253, 1.62876357]
suspensions = [1.62876357, 1.62876357, 1.62876357, 1.62876357]

# Weibull Analysis with two-sided bounds and a plot
x = Analysis(df=failures, ds=suspensions, show=True, bounds='bbb', bounds_type='2s')
x.mrr()

Now, let’s conduct an MLE and use two-sided Likelihood Ratio bounds for the same data:

from predictr import Analysis

# Data from testing
failures = [0.4508831, 0.68564703, 0.76826143, 0.88231395, 1.48287253, 1.62876357]
suspensions = [1.62876357, 1.62876357, 1.62876357, 1.62876357]

# Weibull Analysis
x = Analysis(df=failures, ds=suspensions, show=True, bounds='lrb', bounds_type='2s')
x.mle()

Small sample sizes or small numbers of failures result in biased Weibull parameter estimates. There are no clearly defined hard limits for what counts as a small or large enough sample. Simulation data shows that sample sizes equal to or greater than 20 tend to result in significantly more accurate estimates, no matter which parameter estimation and confidence bounds method one uses. In practice, however, reliability engineers often have to deal with much smaller sample sizes, so bias-correction methods are quite common. predictr supports the following bias-correction methods:

[Table: bias-correction methods supported by predictr]

Bias-corrections influence the estimation of Weibull parameters as well as the confidence bounds. Accurate estimates of Weibull parameters using bias-correction methods do not automatically result in more accurate confidence bounds! Not all confidence bounds are equally sensitive to bias-corrections. For more information on this topic, you can check out my publications on bias-corrections¹².

To better understand the effects of biased estimates and how bias-corrections work, let’s conduct a Monte-Carlo (MC) study. We will repeatedly draw random samples (sample size n=6, uncensored) from a predetermined Weibull distribution (β=2 and η=1, our ground truth) and conduct a Weibull Analysis for each of them. For each sample, the resulting Weibull line is drawn in the Weibull probability plot. The number of MC trials is set to 10,000. The Weibull lines of all randomly drawn samples are shown in blue, whereas the ground truth is shown in red.

[Figure: Weibull probability plot of the MC trials for n=6 (blue: sample Weibull lines, red: ground truth)]

As can be seen from the plot, the estimated Weibull parameters vary quite a lot for n=6 although the samples were drawn from the same Weibull distribution. This shows how small sample sizes could yield biased estimates. Increasing the sample size to 40 decreases the bias of the estimates (drawn Weibull lines are generally closer to the ground truth). This is expected, since the MLE is asymptotically unbiased.

[Figure: Weibull probability plot of the MC trials for n=40 (blue: sample Weibull lines, red: ground truth)]

The plots below show histograms for all 10,000 estimated Weibull parameters. For small sample sizes, the shape parameter tends to be overestimated and is not symmetrically distributed (in contrast to the scale parameter). That is why nearly all bias-correction methods focus solely on the shape parameter and try to decrease it. Most bias-corrections derive a correction factor from the difference between the ground truth and the sample mean (or sample median) from MC studies. Keep in mind that these bias-corrections could wrongly adjust an estimate downwards when the shape parameter is already underestimated. But in general, bias-correction methods should work the way they are intended to.
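
A scaled-down version of such a study can be sketched with the standard library alone. The assumptions here: uncensored samples, 2,000 trials instead of 10,000 to keep the runtime short, and a hand-rolled bisection solver for the MLE shape parameter rather than predictr. Note that random.weibullvariate takes the scale first and the shape second.

```python
import math
import random
import statistics

def mle_shape(sample, lo=0.01, hi=100.0, iterations=60):
    """MLE of the Weibull shape for an uncensored sample (bisection on the score)."""
    t_max = max(sample)
    logs = [math.log(t / t_max) for t in sample]  # rescaled, so every log is <= 0
    mean_log = sum(logs) / len(logs)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        weights = [math.exp(mid * l) for l in logs]  # equals (t / t_max) ** mid
        score = sum(w * l for w, l in zip(weights, logs)) / sum(weights) - 1.0 / mid - mean_log
        if score < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

random.seed(42)  # fixed seed for reproducibility
true_beta, true_eta, n, trials = 2.0, 1.0, 6, 2000

estimates = [
    mle_shape([random.weibullvariate(true_eta, true_beta) for _ in range(n)])
    for _ in range(trials)
]

print(f"mean of beta estimates:   {statistics.mean(estimates):.3f}")
print(f"median of beta estimates: {statistics.median(estimates):.3f}")
```

Both the mean and the median of the estimates should come out above the ground truth of 2.0, which is the overestimation the histograms illustrate.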

[Figure: histograms of the 10,000 estimated shape and scale parameters]

In predictr, the bcm argument sets the bias-correction. We will draw one random uncensored sample from a two-parameter Weibull distribution and will apply a bias-correction to the estimates.

# Needed imports
from scipy.stats import weibull_min
from predictr import Analysis
import numpy as np

# Draw one random sample with a set seed for reproducibility
np.random.seed(seed=42)
sample = np.sort(weibull_min.rvs(2, loc = 0, scale = 1, size = 4))
x = Analysis(df=sample, bcm='c4', bounds='fb', show=True)
x.mle()

The legend shows the estimates for the uncorrected MLE (dashed line). Using the C4 correction, the corrected shape parameter estimate is 2.4, which is closer to the ground truth value of 2.0. Try out other bias-correction methods in predictr and compare the results!
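
For intuition about the size of such a correction, here is a minimal sketch of the C4 factor as it is commonly described in the reliability literature: the MLE shape estimate is multiplied by C4 raised to the power of 3.5, where C4 depends only on the number of failures. This is an assumption about the general method; predictr's 'c4' option may differ in detail, and the function names are hypothetical.

```python
import math

def c4(n):
    """C4 factor: sqrt(2 / (n - 1)) * Gamma(n / 2) / Gamma((n - 1) / 2)."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)

def corrected_shape(beta_mle, n_failures):
    """Reduce the MLE shape estimate by the factor C4 ** 3.5 (assumed form)."""
    return beta_mle * c4(n_failures) ** 3.5

# The correction is strong for very small samples and fades as n grows
for n in (4, 6, 10, 20, 40):
    print(n, round(c4(n) ** 3.5, 3))
```

The factor is always below 1, so the correction can only lower the shape estimate, in line with the caveat above about estimates that are already too low.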

PlotAll is another class in predictr that lets you create and save insightful plots. It uses the object created in Analysis and its attributes. The following methods are currently integrated in PlotAll:

[Table: methods available in PlotAll]

In order to compare two or more designs (prototypes in this example), you can use the mult_weibull method in PlotAll:

from predictr import Analysis, PlotAll

# Create new objects, e.g. name them prototype_a and prototype_b
failures_a = [0.30481336314657737, 0.5793918872111126, 0.633217732127894, 0.7576700925659532, 0.8394342818048925, 0.9118100898948334, 1.0110147142055477, 1.0180126386295232, 1.3201853093496474, 1.492172669340363]
prototype_a = Analysis(df=failures_a, bounds='lrb', bounds_type='2s')
prototype_a.mle()

failures_b = [1.8506941739639076, 2.2685555679846954, 2.380993183650987, 2.642404955035375, 2.777082863078587, 2.89527127055147, 2.9099992138728927, 3.1425481097241, 3.3758727398694406, 3.8274990886889997]
prototype_b = Analysis(df=failures_b, bounds='pbb', bounds_type='2s')
prototype_b.mle()

# Create dictionary with Analysis objects
# Keys will be used in figure legend (estimates shown with four decimals). Name them as you please.
objects = {
    fr'proto_a: $\widehat\beta$={prototype_a.beta:.4f} | $\widehat\eta$={prototype_a.eta:.4f}': prototype_a,
    fr'proto_b: $\widehat\beta$={prototype_b.beta:.4f} | $\widehat\eta$={prototype_b.eta:.4f}': prototype_b,
}

# Use mult_weibull() method
PlotAll(objects).mult_weibull()

In order to draw density functions, use the weibull_pdf method:

from predictr import Analysis, PlotAll

# Use Analysis for the parameter estimation
failures1 = [3, 3, 3, 3, 3, 3, 4, 4, 9]
failures2 = [3, 3, 5, 6, 6, 4, 9]
failures3 = [5, 6, 6, 6, 7, 9]
a = Analysis(df=failures1, bounds='lrb', bounds_type='2s', show = False, unit= 'min')
a.mle()
b = Analysis(df=failures1, ds = failures2, bounds='fb', bounds_type='2s', show = False, unit= 'min')
b.mle()
c = Analysis(df=failures3, bounds='lrb', bcm='hrbu', bounds_type='2s', show = False, unit= 'min')
c.mle()

# Use weibull_pdf method in PlotAll to plot the Weibull pdfs
# beta contains the Weibull shape parameters, which were estimated using Analysis class. Do the same for the Weibull scale parameter eta.
# Customize the path directory in order to use this code
PlotAll().weibull_pdf(
    beta = [a.beta, b.beta, c.beta],
    eta = [a.eta, b.eta, c.eta],
    linestyle=['-', '--', ':'],
    labels = ['A', 'B', 'C'],
    x_bounds=[0, 20, 100],
    plot_title = 'Comparison of three Prototypes',
    x_label='Time to Failure',
    y_label='Density Function',
    save=False,
    color=['black', 'black', 'black'],
)

Please check the official documentation for more examples and a detailed description of the code.

You’re now able to conduct your own Weibull Analyses with knowledge of fundamental statistical concepts using predictr. Try out different combinations of testing data, parameter estimations, confidence bounds, and bias-corrections in order to get a feel for mutual interdependencies.

References

  1. T. Tevetoglu and B. Bertsche, “On the Coverage Probability of Bias-Corrected Confidence Bounds,” 2020 Asia-Pacific International Symposium on Advanced Reliability and Maintenance Modeling (APARM), 2020, pp. 1–6, doi: 10.1109/APARM49247.2020.9209464.
  2. T. Tevetoglu and B. Bertsche, “Bias Corrected Weibull Parameter Estimation and Impact on Confidence Bounds,” Esrel2020-PSAM15, 2020, doi: 10.3850/978-981-14-8593-0_3925-cd.

I hope this quick guide has been helpful to you. Follow me for more! You can contact me for feedback or feature requests regarding predictr on github or in the comments.
