Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

View: 6
Download:
Cites:

Journal Paper Information

Title

Leveraging Swin Transformer for Local-to-Global Weakly Supervised Semantic Segmentation

Pages

  333-342

Abstract

Recent advancements in Weakly Supervised Semantic Segmentation (WSSS) have highlighted the use of image-level class labels as a form of supervision. Because such labels provide limited spatial information, many methods derive pseudo-labels from Class Activation Maps (CAMs). However, CAMs generated by Convolutional Neural Networks (CNNs) tend to focus on the most prominent features, making it difficult to distinguish foreground objects from their backgrounds. While recent studies show that features from Vision Transformers (ViTs) are more effective than CNNs at capturing the scene layout, the use of hierarchical ViTs has not been widely studied in WSSS. This work introduces "SWTformer" and explores the effect of the Swin Transformer's local-to-global view on improving the accuracy of initial seed CAMs. SWTformer-V1 produces CAMs using only patch tokens as input features. SWTformer-V2 extends this design with a multi-scale feature fusion mechanism and a background-aware mechanism that refines the localization maps, yielding better differentiation between objects. Experiments on the Pascal VOC 2012 dataset demonstrate that, compared to state-of-the-art models, SWTformer-V1 achieves 0.98% higher mAP in localization accuracy and generates initial localization maps that are 0.82% higher in mIoU, while relying solely on the classification network. SWTformer-V2 improves the accuracy of the generated seed CAMs by 5.32% mIoU. Code is available at: https://github.com/RozhanAhmadi/SWTformer
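
Since the abstract hinges on generating CAMs directly from patch tokens, a minimal sketch of that general technique may help. This is not the authors' code (the linked repository has the real SWTformer implementation); the timm backbone name, the 1x1 convolutional scoring head, and all shapes here are assumptions chosen for illustration.

```python
# Minimal sketch: Class Activation Maps (CAMs) from the patch tokens of a
# hierarchical ViT. Backbone choice, 1x1-conv head, and shape handling are
# assumptions for illustration; SWTformer itself lives in the authors' repo.
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm


class PatchTokenCAM(nn.Module):
    def __init__(self, num_classes: int = 20):
        super().__init__()
        # Swin backbone with classifier head and pooling removed, so that
        # forward_features() yields the final-stage patch tokens.
        self.backbone = timm.create_model(
            "swin_tiny_patch4_window7_224",
            pretrained=False,  # set True to load ImageNet weights
            num_classes=0,
            global_pool="",
        )
        self.embed_dim = self.backbone.num_features  # 768 for Swin-Tiny
        # A 1x1 convolution scores every patch token per class; its output
        # is simultaneously the CAM and (after pooling) the class logits.
        self.classifier = nn.Conv2d(self.embed_dim, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor):
        feats = self.backbone.forward_features(x)
        if feats.dim() == 3:  # (B, L, C) token sequence in older timm
            b, l, c = feats.shape
            s = int(l ** 0.5)
            feats = feats.transpose(1, 2).reshape(b, c, s, s)
        else:                 # (B, H, W, C) channels-last in recent timm
            feats = feats.permute(0, 3, 1, 2)
        cams = self.classifier(feats)        # (B, num_classes, H, W)
        # Global average pooling turns the CAMs into image-level logits, so
        # training needs only image-level class labels (the WSSS setting).
        logits = cams.mean(dim=(2, 3))
        return logits, F.relu(cams)


model = PatchTokenCAM(num_classes=20)  # Pascal VOC has 20 foreground classes
logits, cams = model(torch.randn(1, 3, 224, 224))
print(logits.shape, cams.shape)  # -> (1, 20) and (1, 20, 7, 7)
```

In a full WSSS pipeline of this kind, the logits would be trained with a multi-label classification loss, and the ReLU'd, normalized CAMs would then be thresholded into seed localization maps for downstream refinement.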

Multimedia: No record.
Cites: No record.
References: No record.
Related Journal Papers: No record.
Related Seminar Papers: No record.
Related Plans: No record.