مرکز اطلاعات علمی Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Journal Paper Information

Title

Comparison of Clustering High Dimensional Data by Random Projections Method and Some Common Methods of Dimensional Reduction

Pages

  239-252

Abstract

 Introduction: Clustering high-dimensional data usually runs into problems such as the curse of dimensionality. To overcome such obstacles, dimensionality reduction methods are often used. These methods fall into two approaches: variable selection and variable extraction. Recently, researchers have proposed methods that are claimed to lose less information than other techniques when clustering high-dimensional data. Among them, the method presented by Anderlucci et al. (2021) under the name Random Projections (RP) is very popular. The RP method generates random projections, selects a small subset of them, and then performs the clustering. In this article, the method is compared with conventional dimensionality reduction approaches on three gene expression datasets, using four clustering criteria: the adjusted Rand index, the Jaccard index, the Fowlkes-Mallows index, and the accuracy index.

 Material and Methods: One variable selection method is the variable selection approach for clustering based on the Gaussian model. Principal component analysis, on the other hand, is one of the most popular methods for variable extraction. Another practical and recent approach to dimensionality reduction is the Random Projections method. Using an ensemble of random projections, Anderlucci et al. (2021) proposed an algorithm for clustering high-dimensional data. The algorithm obtains its final output by applying Gaussian mixture model clustering to an optimal subset of the random projections: the original high-dimensional data are mapped onto the reduced spaces, model selection criteria are calculated for each of them, and the observations are clustered using the optimal projections.

 Results and Discussion: In this paper, the methods proposed by Anderlucci et al. (2021) are described and compared on three gene expression datasets: leukaemia, lymphoma, and prostate cancer. On the criteria introduced above, both competing methods score lower than the Random Projections method and therefore perform worse. The Random Projections method thus performs better on all three datasets. The purpose of the current study was only to compare clustering performance across the three approaches under several clustering criteria, so other analytical aspects of random projections were not considered. These will be explored in future research.

 Conclusion: Clustering high-dimensional data poses several statistical challenges, and various methods exist to address them. One practical tool is reducing the dimension of the data. This article examined random projections from both theoretical and practical aspects. Its performance was evaluated on three real datasets and compared with other standard methods, and its superiority was shown on several conventional clustering measures. Future research could address the probabilistic aspects of the Random Projections approach using proper statistical inference methods.
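
 The four criteria named in the abstract compare a clustering with known class labels. The abstract does not define them, so the following is only a minimal Python sketch. It assumes the standard pair-counting definitions of the adjusted Rand, Jaccard, and Fowlkes-Mallows indices, and it assumes that "accuracy" means the best agreement over one-to-one relabelings of the clusters. The function names are illustrative and do not come from the paper.

```python
from collections import Counter
from itertools import permutations
from math import comb, sqrt


def pair_counts(truth, pred):
    """Count pairs grouped together in both labelings (a),
    only in the true labeling (b), and only in the predicted one (c)."""
    same_both = sum(comb(v, 2) for v in Counter(zip(truth, pred)).values())
    same_truth = sum(comb(v, 2) for v in Counter(truth).values())
    same_pred = sum(comb(v, 2) for v in Counter(pred).values())
    return same_both, same_truth - same_both, same_pred - same_both


def jaccard_index(truth, pred):
    a, b, c = pair_counts(truth, pred)
    return a / (a + b + c)


def fowlkes_mallows_index(truth, pred):
    a, b, c = pair_counts(truth, pred)
    return a / sqrt((a + b) * (a + c))


def adjusted_rand_index(truth, pred):
    a, b, c = pair_counts(truth, pred)
    total_pairs = comb(len(truth), 2)
    expected = (a + b) * (a + c) / total_pairs  # chance-level agreement
    maximum = ((a + b) + (a + c)) / 2
    return (a - expected) / (maximum - expected)


def clustering_accuracy(truth, pred):
    """Best fraction of matches over one-to-one cluster-to-class mappings.
    Brute force over permutations, so only suitable for a few clusters."""
    classes = sorted(set(truth))
    clusters = sorted(set(pred))
    # Pad with None so that surplus clusters can stay unmatched.
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for assignment in permutations(padded, len(clusters)):
        mapping = dict(zip(clusters, assignment))
        hits = sum(mapping[p] == t for t, p in zip(truth, pred))
        best = max(best, hits)
    return best / len(truth)


# Toy labels (made up for illustration, not data from the paper).
truth = [0, 0, 0, 1, 1, 1]
pred = [1, 1, 0, 0, 0, 0]
for name, index in [("ARI", adjusted_rand_index),
                    ("Jaccard", jaccard_index),
                    ("Fowlkes-Mallows", fowlkes_mallows_index),
                    ("Accuracy", clustering_accuracy)]:
    print(f"{name}: {index(truth, pred):.4f}")
```

 Degenerate inputs, such as a single cluster in both labelings, lead to a division by zero and are not handled in this sketch. All four indices equal 1 when the predicted clusters match the true classes up to relabeling, which is why higher values indicate better clustering performance in the comparison described above.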

  • Cite

    APA:

    Nourani Pileh Roud, S., & Golalizadeh, M. (2022). Comparison of Clustering High Dimensional Data by Random Projections Method and Some Common Methods of Dimensional Reduction. Journal of Statistical Sciences, 16(1), 239-252. SID. https://sid.ir/paper/1021469/en

    Vancouver:

    Nourani Pileh Roud S, Golalizadeh M. Comparison of Clustering High Dimensional Data by Random Projections Method and Some Common Methods of Dimensional Reduction. Journal of Statistical Sciences [Internet]. 2022;16(1):239-252. Available from: https://sid.ir/paper/1021469/en

    IEEE:

    S. Nourani Pileh Roud and M. Golalizadeh, “Comparison of Clustering High Dimensional Data by Random Projections Method and Some Common Methods of Dimensional Reduction,” Journal of Statistical Sciences, vol. 16, no. 1, pp. 239–252, 2022. [Online]. Available: https://sid.ir/paper/1021469/en
