Survival Analysis in Python: A Quick Guide to The Weibull Analysis (2024)

A Quick Guide to The Weibull Analysis

Published in

Towards Data Science

MRR

Median Rank Regression (MRR) uses so-called median ranks and the method of least squares to determine the Weibull parameters. Median ranks are estimated unreliability (or reliability) values for each failure. MRR cannot take the censoring times themselves into account, only the total number of suspensions. More precisely, MRR estimates are based on the median ranks of the individual failure times and not on the actual failure time values.
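To make the idea concrete, here is a minimal sketch of MRR for a complete sample (no suspensions). It assumes Benard's approximation for the median ranks and regresses the linearized unreliability on the log failure times; both choices are assumptions for illustration, and the function name mrr_complete is hypothetical. It is not taken from predictr, and because it ignores the suspensions, its output will differ from the predictr results shown later.

```python
import numpy as np

def mrr_complete(failures):
    """Median rank regression for a complete (uncensored) sample."""
    t = np.sort(np.asarray(failures, dtype=float))
    n = t.size
    i = np.arange(1, n + 1)
    # Benard's approximation of the median rank of the i-th failure (assumption)
    median_ranks = (i - 0.3) / (n + 0.4)
    # Linearized Weibull CDF: ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta)
    x = np.log(t)
    y = np.log(-np.log(1.0 - median_ranks))
    slope, intercept = np.polyfit(x, y, 1)  # least squares, y regressed on x
    beta = slope
    eta = np.exp(-intercept / slope)
    return beta, eta, median_ranks

failures = [0.4508831, 0.68564703, 0.76826143, 0.88231395, 1.48287253, 1.62876357]
beta_hat, eta_hat, ranks = mrr_complete(failures)
print(f"beta={beta_hat:.3f}, eta={eta_hat:.3f}")
```

Handling suspensions would require adjusted ranks, which this sketch deliberately leaves out.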

We will use the Analysis class in predictr in order to conduct the Weibull Analysis.

from predictr import Analysis

# Data from testing
# Failures and suspensions are lists containing the values
failures = [0.4508831, 0.68564703, 0.76826143, 0.88231395, 1.48287253, 1.62876357]
suspensions = [1.62876357, 1.62876357, 1.62876357, 1.62876357]

# Weibull Analysis
x = Analysis(df=failures, ds=suspensions, show=True)
x.mrr()

[Figure: Weibull probability plot of the MRR fit]

MLE

In contrast to MRR, the MLE considers the actual failure and suspension times. Increasing the number of suspensions mainly increases the Weibull scale parameter, while the shape parameter estimate does not change significantly. We perform the MLE for the same data (Note: Analysis is already imported):

# Data from testing
# Failures and suspensions are lists containing the values
failures = [0.4508831, 0.68564703, 0.76826143, 0.88231395, 1.48287253, 1.62876357]
suspensions = [1.62876357, 1.62876357, 1.62876357, 1.62876357]

# Weibull Analysis
# The object is named y so that the MRR object x stays available for comparison
y = Analysis(df=failures, ds=suspensions, show=True)
y.mle()

[Figure: Weibull probability plot of the MLE fit]
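For readers who want to see how suspension times enter the MLE, the following is a minimal sketch of the censored two-parameter Weibull likelihood, solved through the profile equation for the shape parameter. The function names weibull_loglik and weibull_mle are hypothetical, and the approach (root finding with scipy.optimize.brentq) is an assumption for illustration; predictr may solve the problem differently.

```python
import numpy as np
from scipy.optimize import brentq

def weibull_loglik(beta, eta, failures, suspensions=()):
    """Log-likelihood: failures contribute the pdf, suspensions the reliability."""
    f = np.asarray(failures, dtype=float)
    s = np.asarray(suspensions, dtype=float)
    ll_fail = np.sum(np.log(beta / eta) + (beta - 1) * np.log(f / eta) - (f / eta) ** beta)
    ll_susp = -np.sum((s / eta) ** beta)
    return ll_fail + ll_susp

def weibull_mle(failures, suspensions=()):
    f = np.asarray(failures, dtype=float)
    s = np.asarray(suspensions, dtype=float)
    t = np.concatenate([f, s])   # every unit contributes to the sums
    r = f.size                   # only failures are counted
    log_t = np.log(t)
    scaled = t / t.max()         # scaling avoids overflow for large beta

    def score(beta):
        w = scaled ** beta
        return np.sum(w * log_t) / np.sum(w) - 1.0 / beta - np.mean(np.log(f))

    beta = brentq(score, 0.01, 200.0)  # bracket chosen for illustration
    eta = t.max() * (np.sum(scaled ** beta) / r) ** (1.0 / beta)
    return beta, eta

failures = [0.4508831, 0.68564703, 0.76826143, 0.88231395, 1.48287253, 1.62876357]
suspensions = [1.62876357, 1.62876357, 1.62876357, 1.62876357]
beta_hat, eta_hat = weibull_mle(failures, suspensions)
print(f"beta={beta_hat:.6f}, eta={eta_hat:.6f}")
```

Note that each suspension adds a term to the scale estimate without adding to the failure count, which is why more suspensions push the scale parameter upwards.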

Statistical values are stored as attributes of the objects created with Analysis, so we can print and/or save them. Check out the official documentation of predictr for an overview of all object attributes.

# We are using f-strings
# x and y are the names of the class objects we created (see code above)
# beta and eta are the attributes we want to access and print
# Just type object.attribute to access them
# e.g. for the object x type the following: x.beta or x.eta
print(f'MRR: beta={x.beta:2f}, eta={x.eta:2f}\nMLE: beta={y.beta:2f}, eta={y.eta:2f}\n')

# Output
>>> MRR: beta=1.760834, eta=1.759760
    MLE: beta=2.003876, eta=1.707592

Relying solely on point estimates is risky, especially if you only have a small number of failed units from testing. Assuming that estimated sample statistics, e.g. the Weibull parameters, are close or even equal to the population statistics will likely result in a false sense of security. It is generally recommended to use confidence bounds methods in your Weibull Analysis. By using confidence intervals, we can state with a certain confidence that the true population (or ground truth) Weibull line lies within the interval. Hence, we are less likely to overestimate our system’s reliability. A typical confidence level is 90%, meaning the lower confidence bound could be set to 5% and the upper to 95%. The bounds could also be set to 1% and 91%. As you can see, the interval is defined only by the difference between the lower and upper bounds, so it doesn’t have to be symmetrical.

It is important to note that there are two kinds of bounds:

Bounds for a fixed unreliability/reliability value (e.g. R(t)=80%) on the time-to-failure axis, e.g. lower bound: 5000 hours and upper bound: 7000 hours.
Bounds for a fixed time-to-failure value (e.g. t=6000 hours) on the unreliability/reliability axis, e.g. lower bound: R(t)=20% and upper bound: R(t)=38%.
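The two kinds of bounds correspond to the two directions in which a Weibull line can be read. As a minimal sketch, the snippet below reads the point estimate in both directions using the two-parameter Weibull reliability function and its inverse; a confidence bounds method would then put an interval around each of these values. The function names are hypothetical, and the parameters are the MLE point estimates printed above.

```python
import math

def reliability(t, beta, eta):
    """R(t) = exp(-(t / eta) ** beta): read the reliability axis at a fixed time."""
    return math.exp(-((t / eta) ** beta))

def time_at_reliability(r, beta, eta):
    """Inverse of R(t): read the time-to-failure axis at a fixed reliability."""
    return eta * (-math.log(r)) ** (1.0 / beta)

beta, eta = 2.003876, 1.707592  # MLE point estimates from the output above

t80 = time_at_reliability(0.80, beta, eta)
print(f"time at R=80%: {t80:.4f}")
print(f"reliability at t=1.0: {reliability(1.0, beta, eta):.4f}")
```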

There are plenty of confidence bounds methods to choose from. I will soon publish a follow-up Medium article about which method to choose in which situation. The following table lists the confidence bounds methods supported by predictr.

[Table: confidence bounds methods supported by predictr]

You can choose between two-sided (2s) and one-sided (1sl: one-sided lower; 1su: one-sided upper) confidence bounds. All methods except Beta-Binomial bounds use bounds for a fixed unreliability/reliability value. By changing the argument values, you can customize your Weibull analysis.

Let’s use Beta-Binomial bounds for the same data which we already used:

from predictr import Analysis

# Data from testing
failures = [0.4508831, 0.68564703, 0.76826143, 0.88231395, 1.48287253, 1.62876357]
suspensions = [1.62876357, 1.62876357, 1.62876357, 1.62876357]

# Weibull Analysis with two-sided bounds and a plot
x = Analysis(df=failures, ds=suspensions, show=True, bounds='bbb', bounds_type='2s')
x.mrr()

[Figure: Weibull probability plot of the MRR fit with two-sided Beta-Binomial bounds]
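As a rough illustration of where Beta-Binomial bounds come from, the sketch below uses the fact that the rank of the i-th failure out of n follows a Beta(i, n-i+1) distribution: the 50% quantile is the median rank, and the 5% and 95% quantiles give two-sided 90% bounds on the unreliability at each failure. This is a textbook construction for a complete sample; the function name rank_bounds is hypothetical, and how predictr treats suspensions here is not covered.

```python
import numpy as np
from scipy.stats import beta as beta_dist

def rank_bounds(n, lower=0.05, upper=0.95):
    """Lower bound, median rank and upper bound for each of n ordered failures."""
    i = np.arange(1, n + 1)
    a, b = i, n - i + 1          # Beta parameters of the i-th order statistic
    return (beta_dist.ppf(lower, a, b),
            beta_dist.ppf(0.5, a, b),
            beta_dist.ppf(upper, a, b))

low, median, high = rank_bounds(6)
for i, (lo_i, med_i, hi_i) in enumerate(zip(low, median, high), start=1):
    print(f"failure {i}: {lo_i:.3f} <= {med_i:.3f} <= {hi_i:.3f}")
```

Because these bounds are computed at each failure, they sit on the unreliability axis, which matches the remark above that Beta-Binomial bounds are the exception among the supported methods.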

Now, let’s conduct an MLE and use two-sided Likelihood Ratio bounds for the same data:

from predictr import Analysis

# Data from testing
failures = [0.4508831, 0.68564703, 0.76826143, 0.88231395, 1.48287253, 1.62876357]
suspensions = [1.62876357, 1.62876357, 1.62876357, 1.62876357]

# Weibull Analysis
x = Analysis(df=failures, ds=suspensions, show=True, bounds='lrb', bounds_type='2s')
x.mle()

[Figure: Weibull probability plot of the MLE fit with two-sided Likelihood Ratio bounds]

Small sample sizes or small numbers of failures result in biased Weibull parameter estimates. There are no clearly defined hard limits for small or big enough sample sizes. Simulation data shows that sample sizes equal to or greater than 20 tend to result in significantly more accurate estimates, no matter which parameter estimation and confidence bounds method one uses. But in practice, reliability engineers often have to deal with much smaller sample sizes. Therefore, the use of bias-correction methods is quite common. predictr supports the following bias-correction methods:

[Table: bias-correction methods supported by predictr]

Bias-corrections influence the estimation of Weibull parameters as well as the confidence bounds. Accurate estimates of Weibull parameters using bias-correction methods do not automatically result in more accurate confidence bounds! Not all confidence bounds are equally sensitive to bias-corrections. For more information on this topic, you can check out my publications on bias-corrections¹².

To better understand the effects of biased estimates and how bias-corrections work, let’s conduct a Monte-Carlo (MC) study. We will repeatedly draw random samples (sample size n=6, uncensored) from a predetermined Weibull distribution (β=2 and η=1, our ground truth) and conduct a Weibull Analysis for each of them. For each sample, the resulting Weibull line will be drawn in the Weibull probability plot. The number of MC trials is set to 10,000. All randomly drawn samples are represented by blue lines, whereas the ground truth is drawn in red.

[Figure: Weibull probability plot of the Monte-Carlo study for n=6]

As can be seen from the plot, the estimated Weibull parameters vary quite a lot for n=6 although the samples were drawn from the same Weibull distribution. This shows how small sample sizes could yield biased estimates. Increasing the sample size to 40 decreases the bias of the estimates (drawn Weibull lines are generally closer to the ground truth). This is expected, since the MLE is asymptotically unbiased.

[Figure: Weibull probability plot of the Monte-Carlo study for n=40]
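The Monte-Carlo study can be reproduced in outline with a few lines of NumPy and SciPy. The sketch below is an assumption-laden stand-in rather than the code behind the plots: it uses its own small MLE solver (hypothetical name mle_beta_eta) instead of predictr, only 2,000 trials instead of 10,000 to keep the runtime short, and it summarizes the estimates numerically instead of drawing the Weibull lines.

```python
import numpy as np
from scipy.optimize import brentq

def mle_beta_eta(sample):
    """MLE of the two-parameter Weibull for a complete (uncensored) sample."""
    t = np.asarray(sample, dtype=float)
    log_t = np.log(t)
    scaled = t / t.max()  # scaling avoids overflow for large beta

    def score(beta):
        w = scaled ** beta
        return np.sum(w * log_t) / np.sum(w) - 1.0 / beta - log_t.mean()

    beta = brentq(score, 0.01, 200.0)
    eta = t.max() * np.mean(scaled ** beta) ** (1.0 / beta)
    return beta, eta

def monte_carlo(n, trials, beta_true=2.0, eta_true=1.0, seed=42):
    """Draw `trials` samples of size n and return the (beta, eta) estimates."""
    rng = np.random.default_rng(seed)
    estimates = np.empty((trials, 2))
    for k in range(trials):
        sample = eta_true * rng.weibull(beta_true, size=n)
        estimates[k] = mle_beta_eta(sample)
    return estimates

for n in (6, 40):
    est = monte_carlo(n, trials=2000)
    print(f"n={n}: mean beta={est[:, 0].mean():.3f}, "
          f"median beta={np.median(est[:, 0]):.3f}, mean eta={est[:, 1].mean():.3f}")
```

Comparing the mean and median of the shape estimates for both sample sizes gives a numerical view of the spread visible in the plots.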

The plots below show histograms for all 10,000 estimated Weibull parameters. For small sample sizes, the shape parameter tends to be overestimated and is not symmetrically distributed (in contrast to the scale parameter). That is the reason why nearly all bias-correction methods solely focus on the shape parameter and try to decrease it. Most bias-corrections derive a correction factor from the difference between the ground truth and the sample mean (or sample median) from MC studies. Keep in mind that these bias-corrections could wrongly adjust an estimate downwards when it is already an underestimate. But in general, bias-correction methods should work the way they are intended to.

[Figure: histograms of the estimated Weibull parameters]
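To give a feel for what such a correction factor looks like, here is a minimal sketch of the C4 factor known from quality control, combined with the reduced bias adjustment for the MLE shape parameter as it is commonly described in the reliability literature (shape estimate multiplied by C4 raised to the power 3.5). Whether predictr's 'c4' option is implemented exactly this way is an assumption; the function names c4 and rba_shape are hypothetical.

```python
import math

def c4(n):
    """C4 factor for n failures: sqrt(2 / (n - 1)) * Gamma(n / 2) / Gamma((n - 1) / 2)."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)

def rba_shape(beta_hat, n_failures, exponent=3.5):
    """Reduced bias adjustment of the MLE shape estimate (exponent is an assumption)."""
    return beta_hat * c4(n_failures) ** exponent

for n in (4, 6, 10, 20, 40):
    print(f"n={n:2d}: C4={c4(n):.4f}, correction factor={c4(n) ** 3.5:.4f}")
```

The factor is always below one and approaches one as the number of failures grows, so the correction shrinks the shape estimate most for the smallest samples.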

In predictr, the bcm argument sets the bias-correction. We will draw one random uncensored sample from a two-parameter Weibull distribution and will apply a bias-correction to the estimates.

# Needed imports
from scipy.stats import weibull_min
from predictr import Analysis
import numpy as np

# Draw one random sample with a set seed for reproducibility
np.random.seed(seed=42)
sample = np.sort(weibull_min.rvs(2, loc=0, scale=1, size=4))

x = Analysis(df=sample, bcm='c4', bounds='fb', show=True)
x.mle()

[Figure: Weibull probability plot with the uncorrected and the C4-corrected MLE]

The legend shows the estimates for the uncorrected MLE (dashed line). Using the C4 correction, the corrected shape parameter estimate is 2.4, which is closer to the ground truth value of 2.0. Try out other bias-correction methods in predictr and compare the results!

PlotAll is another class in predictr that lets you create and save insightful plots. It uses the objects created with Analysis and their attributes. The following methods are currently integrated in PlotAll:

[Table: methods currently integrated in PlotAll]

In order to compare two or more designs (prototypes in this example), you can use the mult_weibull method in PlotAll:

from predictr import Analysis, PlotAll

# Create new objects, e.g. name them prototype_a and prototype_b
failures_a = [0.30481336314657737, 0.5793918872111126, 0.633217732127894, 0.7576700925659532, 0.8394342818048925, 0.9118100898948334, 1.0110147142055477, 1.0180126386295232, 1.3201853093496474, 1.492172669340363]
prototype_a = Analysis(df=failures_a, bounds='lrb', bounds_type='2s')
prototype_a.mle()

failures_b = [1.8506941739639076, 2.2685555679846954, 2.380993183650987, 2.642404955035375, 2.777082863078587, 2.89527127055147, 2.9099992138728927, 3.1425481097241, 3.3758727398694406, 3.8274990886889997]
prototype_b = Analysis(df=failures_b, bounds='pbb', bounds_type='2s')
prototype_b.mle()

# Create dictionary with Analysis objects
# Keys will be used in figure legend. Name them as you please.
objects = {fr'proto_a: $\widehat\beta$={prototype_a.beta:4f} | $\widehat\eta$={prototype_a.eta:4f}': prototype_a,
           fr'proto_b: $\widehat\beta$={prototype_b.beta:4f} | $\widehat\eta$={prototype_b.eta:4f}': prototype_b}

# Use mult_weibull() method
PlotAll(objects).mult_weibull()

[Figure: Weibull probability plot comparing prototype_a and prototype_b]

In order to draw density functions, use the weibull_pdf method:

from predictr import Analysis, PlotAll

# Use Analysis for the parameter estimation
failures1 = [3, 3, 3, 3, 3, 3, 4, 4, 9]
failures2 = [3, 3, 5, 6, 6, 4, 9]
failures3 = [5, 6, 6, 6, 7, 9]

a = Analysis(df=failures1, bounds='lrb', bounds_type='2s', show=False, unit='min')
a.mle()

b = Analysis(df=failures1, ds=failures2, bounds='fb', bounds_type='2s', show=False, unit='min')
b.mle()

c = Analysis(df=failures3, bounds='lrb', bcm='hrbu', bounds_type='2s', show=False, unit='min')
c.mle()

# Use weibull_pdf method in PlotAll to plot the Weibull pdfs
# beta contains the Weibull shape parameters, which were estimated using the Analysis class. Do the same for the Weibull scale parameter eta.
# Customize the path directory in order to use this code
PlotAll().weibull_pdf(beta=[a.beta, b.beta, c.beta], eta=[a.eta, b.eta, c.eta], linestyle=['-', '--', ':'], labels=['A', 'B', 'C'], x_bounds=[0, 20, 100], plot_title='Comparison of three Prototypes', x_label='Time to Failure', y_label='Density Function', save=False, color=['black', 'black', 'black'])

[Figure: Weibull density functions of the three prototypes]
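For reference, the density curves in such a plot follow the two-parameter Weibull pdf. The snippet below is a minimal sketch that evaluates the formula directly and cross-checks it against scipy.stats.weibull_min; the parameter values are hypothetical placeholders, not the estimates from the three prototypes, and the function name weibull_pdf here is a standalone helper unrelated to the PlotAll method.

```python
import numpy as np
from scipy.stats import weibull_min

def weibull_pdf(t, beta, eta):
    """f(t) = (beta / eta) * (t / eta) ** (beta - 1) * exp(-(t / eta) ** beta)."""
    t = np.asarray(t, dtype=float)
    return (beta / eta) * (t / eta) ** (beta - 1) * np.exp(-((t / eta) ** beta))

t = np.linspace(0.1, 20, 100)
# Hypothetical (beta, eta) pairs for illustration only
for label, beta, eta in [("A", 1.5, 4.0), ("B", 2.0, 6.0), ("C", 4.0, 7.0)]:
    density = weibull_pdf(t, beta, eta)
    same = np.allclose(density, weibull_min.pdf(t, beta, loc=0, scale=eta))
    print(f"{label}: peak density at t={t[np.argmax(density)]:.2f}, matches scipy: {same}")
```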

Please check the official documentation for more examples and a detailed description of the code.

You’re now able to conduct your own Weibull Analyses with knowledge of fundamental statistical concepts using predictr. Try out different combinations of testing data, parameter estimations, confidence bounds, and bias-corrections in order to get a feel for mutual interdependencies.

References

1. T. Tevetoglu and B. Bertsche, “On the Coverage Probability of Bias-Corrected Confidence Bounds,” 2020 Asia-Pacific International Symposium on Advanced Reliability and Maintenance Modeling (APARM), 2020, pp. 1-6, doi: 10.1109/APARM49247.2020.9209464.
2. T. Tevetoglu and B. Bertsche, “Bias Corrected Weibull Parameter Estimation and Impact on Confidence Bounds,” ESREL2020-PSAM15, 2020, doi: 10.3850/978-981-14-8593-0_3925-cd.

I hope this quick guide has been helpful to you. Follow me for more! You can contact me for feedback or feature requests regarding predictr on GitHub or in the comments.


FAQs

What is the Weibull model for survival analysis?

The Weibull parametric model for EC survival analysis allows simultaneous characterization of the treatment effect in terms of the hazard ratio and the event time ratio (ETR), which is likely to be better understood. This method can be extended to study progression free survival and disease specific survival.

What is survival analysis in Python?

The objective in survival analysis (also referred to as reliability analysis in engineering) is to establish a connection between covariates and the time of an event. The name survival analysis originates from clinical research, where predicting the time to death, i.e., survival, is often the main objective.

How to perform Weibull Analysis?

Step 1: Determine the asset(s) to be analysed.
Step 2: Determine the component failure mode for that asset(s).
Step 3: Obtain as much relevant life data as practical.
Step 4: Classify life data.

When to use Weibull?

The Weibull distribution is widely used in modeling failure times, because a great variety of shapes of probability curves can be generated by different choices of the two parameters, β and α.

How many data points do you need for a Weibull Analysis?

For all models except Gompertz, there must be at least 2 data points for each unique unit ID. For the Gompertz model, there must be at least 3 data points for each unique unit ID. The data must produce enough extrapolated failure/suspension times to perform life data analysis with the selected life distribution.

Why is Weibull used for survival analysis?

The Weibull, being both accelerated and proportional, therefore allows the simultaneous description of treatment effects both in terms of hazard ratios and also in terms of the relative increase or decrease in survival time; we might conveniently refer to this latter quantification of treatment effect as an “event time ...

What are the benefits of Weibull analysis?

It supports a comprehensive analysis of failure data, providing a clear understanding of how and why products fail. Employing Weibull analysis can result in better product design, improved quality control, and overall enhanced system reliability.

What is the hazard of Weibull?

In all three parameterizations, the hazard is decreasing for k < 1, increasing for k > 1 and constant for k = 1, in which case the Weibull distribution reduces to an exponential distribution.

What is the Weibull model?

In probability theory and statistics, the Weibull distribution /ˈwaɪbʊl/ is a continuous probability distribution. It models a broad range of random variables, largely in the nature of a time to failure or time between events. Examples are maximum one-day rainfalls and the time a user spends on a web page.

What is the Weibull Analysis primarily used for?

Weibull Analysis is a statistical analysis that is used to determine reliability characteristics and trends from field and/or test failure data. It allows decisions to be made based on a limited amount of data.

What is the algorithm used for survival analysis?

The RSF model is an extension of the random forest algorithm that is specifically designed for survival analysis. The RSF model uses decision trees to split the data into subgroups based on the predictor variables and estimates the survival probability for each subgroup.

What are the assumptions of the Weibull model?

The usual assumption in Weibull regression is that the scale parameter is a function of the predictor variables, and the shape parameter is constant.