Semantic segmentation of urban environments: Leveraging U-Net deep learning model for cityscape image analysis

Arulananth TS; Kuppusamy PG; Ayyasamy RK; Alhashmi SM; Mahalakshmi M; Vasanth K; Chinnasamy P

doi:10.1371/journal.pone.0300767

Arulananth TS ¹, Kuppusamy PG ², Ayyasamy RK ³, Alhashmi SM ⁴, Mahalakshmi M ⁵, Vasanth K ⁶, Chinnasamy P ⁷

Affiliations

¹ Department of Electronics and Communication Engineering, MLR Institute of Technology, Hyderabad, India
² Department of Electronics and Communication Engineering, Siddharth Institute of Engineering & Technology, Puttur, Andhra Pradesh, India
³ Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman, Kampar, Perak, Malaysia
⁴ College of Computing and Informatics, University of Sharjah, Sharjah, UAE
⁵ Department of Networking and Communications, SRM Institute of Science & Technology, College of Engineering and Technology, Kattankulathur, Tamil Nadu, India
⁶ Department of Electronics and Communication Engineering, Chaitanya Bharathi Institute of Technology, Hyderabad, Telangana, India
⁷ Department of Computer Science and Engineering, MLR Institute of Technology, Hyderabad, Telangana, India

PLoS One, 2024;19(4):e0300767.

PMID: 38578733 DOI: 10.1371/journal.pone.0300767

Abstract

Semantic segmentation of cityscapes via deep learning is an important research topic that offers a more nuanced understanding of urban landscapes. Deep learning techniques handle the complexity and diversity of urban scenes, which opens up a broad range of applications, including urban planning, transportation management, autonomous driving, and smart city initiatives. By providing rich context and insights, semantic segmentation helps decision-makers and stakeholders make informed decisions for sustainable and effective urban development. This study presents an in-depth exploration of cityscape image segmentation using the U-Net deep learning model. The proposed U-Net architecture comprises an encoder and a decoder. The encoder uses convolutional layers and downsampling to extract hierarchical information from input images. Each downsampling step reduces the spatial dimensions and increases the feature depth, which aids context acquisition. Batch normalization and dropout layers stabilize training and reduce overfitting during encoding. The decoder reconstructs higher-resolution feature maps using "UpSampling2D" layers. Through extensive experimentation and evaluation on the Cityscapes dataset, this study demonstrates the effectiveness of the U-Net model in achieving state-of-the-art results in image segmentation. The results clearly show that the proposed model achieves higher accuracy, mean IoU, and mean Dice than existing models.
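
The abstract reports mean IoU and mean Dice but does not define them. The following is a minimal sketch of the standard per-class definitions (IoU = |A ∩ B| / |A ∪ B|, Dice = 2|A ∩ B| / (|A| + |B|)), averaged over classes. It is not the authors' evaluation code: the function names, the flattened label-sequence input, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def per_class_counts(
    pred: Sequence[int], target: Sequence[int], num_classes: int
) -> Tuple[List[int], List[int], List[int]]:
    """Count, for each class, the matching pixels and the pixels in pred / target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1
    return intersection, pred_count, target_count


def mean_iou_and_dice(
    pred: Sequence[int], target: Sequence[int], num_classes: int
) -> Tuple[float, float]:
    """Return (mean IoU, mean Dice) over classes present in pred or target.

    Assumption: classes that occur in neither sequence are skipped rather
    than counted as 0 or 1; other conventions exist.
    """
    intersection, pred_count, target_count = per_class_counts(
        pred, target, num_classes
    )
    ious: List[float] = []
    dices: List[float] = []
    for c in range(num_classes):
        total = pred_count[c] + target_count[c]
        if total == 0:
            continue  # class absent from both prediction and ground truth
        union = total - intersection[c]
        ious.append(intersection[c] / union)
        dices.append(2 * intersection[c] / total)
    if not ious:
        raise ValueError("no classes present in pred or target")
    return sum(ious) / len(ious), sum(dices) / len(dices)


# Hypothetical toy example: six pixels, three classes, labels already flattened.
example_pred = [0, 0, 1, 1, 2, 2]
example_target = [0, 1, 1, 1, 2, 0]
example_miou, example_mdice = mean_iou_and_dice(example_pred, example_target, 3)
```

For a 2-D segmentation map, the predicted and ground-truth label maps would be flattened to one label per pixel before calling `mean_iou_and_dice`. Whether the paper averages per image or over the whole dataset is not stated in the abstract, so this sketch leaves that choice to the caller.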

