Genetic diversity analysis of peppers: a comparison of discarding variable methods
Elizanilda R. do Rego; Mailson M. do Rêgo; Cosme D. Cruz; Paulo R. Cecon; Dany S.S.L. Amaral and Fernando L. Finger
Genetic diversity studies often involve many variables, so it is necessary to determine whether all of them are important and which ones can be discarded. Using only a subset of the variables often changes the clustering pattern very little, because the discarded variables are redundant or contribute little to the variability. This study aimed to compare two methods for discarding variables, the Singh method and the principal components method, and to evaluate the effect of the discards on cluster analysis. Data on six ripe-fruit traits were used, and other characters with previously known variability or collinearity were added to the analysis. The method considered most efficient was the one that indicated variables whose discard did not alter the initial clustering pattern. The Singh method did not detect differences in variation among characters when standardized data were used. When the distance was obtained from non-standardized data, pericarp thickness (0.018%), total soluble solids (0.1668%) and minimum width (2.99%) had the lowest contributions to the divergence. The principal components method indicated fruit length, total soluble solids and seed yield per fruit as dispensable variables. Discarding pericarp thickness changed the initial clustering pattern, so the Singh method was not efficient in detecting the importance of this variable. Discarding fruit length did not change the initial clustering pattern. The two methods differed, since they indicated different variables to be discarded. The Singh method was not efficient in detecting multicollinearity among variables, and the principal components method was more efficient in pointing out the variables that can be discarded.
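The failure of the Singh method on standardized data follows from how the criterion works with Euclidean distance: the contribution of character j is its share of the sum of squared differences over all pairs of accessions, S_j = sum over i < i' of (x_ij - x_i'j)^2, and this sum equals n times the sum of squared deviations of character j. The shares are therefore proportional to the character variances, and after standardization every character receives 100/p percent. The sketch below illustrates this; it assumes the Euclidean case only (not a Mahalanobis formulation), and the function names and toy data are hypothetical.

```python
import numpy as np


def singh_contributions(X):
    """Percentage contribution of each character (column) to the total
    squared Euclidean distance summed over all pairs of accessions (rows)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    s = np.zeros(p)
    for i in range(n - 1):
        diffs = X[i + 1:] - X[i]          # differences between row i and all later rows
        s += (diffs ** 2).sum(axis=0)     # accumulate squared differences per character
    return 100.0 * s / s.sum()


def standardize(X):
    """Center each character and scale it to unit sample variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


# Hypothetical data: 4 accessions x 3 characters measured on different scales.
X = np.array([[1.0, 10.0, 0.5],
              [2.0, 14.0, 0.6],
              [4.0, 11.0, 0.4],
              [3.0, 20.0, 0.5]])
print(singh_contributions(X).round(3))               # ranks characters by variance
print(singh_contributions(standardize(X)).round(3))  # all equal to 100 / 3
```

On raw data the ranking is driven entirely by each character's variance (and hence its measurement scale), which is consistent with the abstract's point that the criterion does not account for collinearity among characters.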
It is advisable to calculate genetic divergence from the principal component scores: in future studies without replicated data, both the divergence and the identification of characters to discard should be based on these scores, to avoid discarding important variables when determining divergence. However, if the variables vary independently of one another, the Singh method based on Euclidean distance is appropriate.
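The principal components route and the recommendation to work from component scores can be sketched as follows. Several details are assumptions not stated in the abstract: the PCA is run on the correlation matrix (standardized characters); the discard rule is the one commonly attributed to Jolliffe (for each component with eigenvalue below 0.7, starting from the last, flag the character with the largest absolute coefficient); and the 0.7 threshold, the truncation to two components, all function names and the simulated data are hypothetical. Note that the Euclidean distance over all component scores equals the distance on the standardized data, because the rotation is orthogonal; the scores change the distance only when low-variance components are dropped.

```python
import numpy as np


def standardize(X):
    """Center each character and scale it to unit sample variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def principal_components(X):
    """PCA on the correlation matrix, components sorted by decreasing eigenvalue."""
    Z = standardize(X)
    R = np.corrcoef(Z, rowvar=False)            # correlation matrix of the characters
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                        # principal component scores
    return Z, eigvals, eigvecs, scores


def discard_candidates(eigvals, eigvecs, threshold=0.7):
    """Assumed Jolliffe-style rule: for each component with eigenvalue below
    `threshold`, starting from the last, flag the not-yet-flagged character
    with the largest absolute coefficient."""
    flagged = []
    for k in range(len(eigvals) - 1, -1, -1):
        if eigvals[k] >= threshold:
            break                               # all remaining components are larger
        weights = np.abs(eigvecs[:, k])         # a copy, so eigvecs is untouched
        for j in flagged:
            weights[j] = -1.0                   # never flag the same character twice
        flagged.append(int(np.argmax(weights)))
    return flagged


def euclidean_distances(S):
    """Matrix of pairwise Euclidean distances between the rows of S."""
    diff = S[:, None, :] - S[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical data: 30 accessions; character 1 is nearly collinear with character 0.
rng = np.random.default_rng(0)
base = rng.normal(size=(30, 3))
X = np.column_stack([base[:, 0],
                     2.0 * base[:, 0] + 0.05 * rng.normal(size=30),
                     base[:, 1],
                     base[:, 2]])
Z, eigvals, eigvecs, scores = principal_components(X)
print(discard_candidates(eigvals, eigvecs))  # first flag is one of the collinear pair
D_all = euclidean_distances(scores)          # identical to distances on Z
D_two = euclidean_distances(scores[:, :2])   # hypothetical truncation to 2 components
```

Because the two collinear characters load almost equally on the last component, either one may be flagged first; what matters is that the redundancy is detected, which the variance-based Singh shares cannot do.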