Application of python libraries for variance, normal distribution and Weibull distribution analysis in diagnosing and operating production systems
Andrzej Chmielowiec, Leszek Klich
Rzeszow University of Technology
Submission date: 2021-10-02
Final revision date: 2021-12-02
Acceptance date: 2021-12-03
Online publication date: 2021-12-06
Publication date: 2021-12-06
Diagnostyka 2021;22(4):89–105
The use of statistical methods in the diagnosis of production processes dates back to the beginning of the 20th century. The widespread computerization of processes has confronted enterprises with the challenge of processing large sets of measurement data. The growing number of sensors on production lines calls for faster and more effective methods, both for diagnosing processes and for finding relationships between individual systems. This article shows how Python libraries can be used to solve selected problems in the analysis of large data sets efficiently. It draws on experience with data analysis at a large automotive-industry company whose annual production reaches 10 million units. The methods described here formed the basis of the initial analysis of production data at the plant, and the results were fed into the production database and into an automatic anomaly detection system built on artificial intelligence algorithms.
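The abstract does not spell out the computations. What follows is therefore only a minimal sketch, built on our own assumptions, of two of the analyses named in the title. The first is a one-pass (Welford) variance, which suits streaming sensor data. The second is a Weibull fit by median-rank regression using Bernard's approximation. The sketch uses only the Python standard library. The names `RunningVariance` and `weibull_rank_regression` are hypothetical, and the code is not presented as the authors' implementation. In practice, NumPy and SciPy offer ready-made counterparts such as `numpy.var` and `scipy.stats.weibull_min.fit`.

```python
from __future__ import annotations

import math
import random
import statistics

# Hypothetical helpers: an illustrative sketch, not the paper's code.


class RunningVariance:
    """Welford's one-pass update of the mean and sample variance of a stream."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def push(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Multiplies the deviation from the old mean by the deviation from the new one.
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance with the (n - 1) denominator."""
        if self.n < 2:
            raise ValueError("need at least two observations")
        return self.m2 / (self.n - 1)


def weibull_rank_regression(times: list[float]) -> tuple[float, float]:
    """Estimate Weibull shape (beta) and scale (eta) by median-rank regression.

    Median ranks use Bernard's approximation F_i = (i - 0.3) / (n + 0.4);
    the linearised CDF ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta)
    is fitted by ordinary least squares of y on x.
    """
    t = sorted(times)
    n = len(t)
    if n < 2 or t[0] <= 0:
        raise ValueError("need at least two positive times")
    xs = [math.log(ti) for ti in t]
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    eta = math.exp(x_bar - y_bar / beta)  # from intercept = -beta * ln(eta)
    return beta, eta


# Demonstration on synthetic data (illustrative only, not plant measurements).
rng = random.Random(42)
data = [rng.gauss(1000.0, 0.5) for _ in range(20_000)]  # large mean, small spread
rv = RunningVariance()
for value in data:
    rv.push(value)
print(rv.variance, statistics.variance(data))  # the two values should agree closely

lifetimes = [rng.weibullvariate(100.0, 2.0) for _ in range(5_000)]  # scale 100, shape 2
beta_hat, eta_hat = weibull_rank_regression(lifetimes)
print(beta_hat, eta_hat)  # should be near 2.0 and 100.0
```

Welford's update avoids the cancellation that affects the naive sum-of-squares formula when measurements have a large mean relative to their spread. The regression step follows the usual Weibull-plot linearisation, so the fitted slope is the shape parameter.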