Modern neural network methods for building vectorization from high-resolution satellite imagery

Authors

  • I. A. Radion*, PhD student, Lesya Ukrainka Volyn National University
  • O. V. Melnyk, Associate Professor, Ph.D. in Engineering, Lesya Ukrainka Volyn National University

DOI:

https://doi.org/10.36910/6775-2410-6208-2025-14(24)-28

Keywords:

building vectorization, deep learning, neural networks, satellite imagery, semantic segmentation, transformers, polygonization, topology

Abstract

Background. Automatic vectorization of buildings from satellite imagery is a key task for mapping and cadastral purposes. Modern deep learning methods have achieved high raster accuracy (IoU 85-92%), yet a fundamental problem remains: segmentation optimization does not guarantee the generation of geometrically and topologically correct vector polygons. Studies report significant angular deviations (up to 8.3°), non-parallel walls, and a high rate of topological errors (12-18%). Poor generalization to new regions and the omission of small objects also remain challenges.
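The raster accuracy figures above are Intersection over Union (IoU) scores, the standard overlap measure between a predicted building mask and the ground truth. A minimal NumPy sketch (the array values are illustrative, not from the cited studies):

```python
import numpy as np

def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection over Union for binary building masks."""
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return float(inter / union) if union else 1.0

# toy 2x3 masks: 2 pixels overlap, 4 pixels in the union
pred = np.array([[1, 1, 0], [1, 0, 0]], dtype=bool)
truth = np.array([[1, 0, 0], [1, 1, 0]], dtype=bool)
print(iou(pred, truth))  # 0.5
```

An IoU of 0.85–0.92, as reported for modern models, means the predicted and reference footprints overlap almost everywhere at the pixel level; the abstract's point is that this alone says nothing about corner angles or wall parallelism.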

Objective. To systematize and analyze modern deep learning methods for building vectorization, focusing on the problems of geometric regularity, topological correctness, and generalization.

Methods. A review of publications from 2015-2024 (CVPR, ISPRS, etc.) using benchmark datasets (SpaceNet, WHU, INRIA) was conducted. Evaluation metrics included IoU and F1-score for raster accuracy, as well as PoLiS and Chamfer Distance for vector quality. Methods were classified into three groups: CNN-based (U-Net, DeepLab), transformer-based (Swin, SegFormer), and end-to-end methods (Frame Field Learning, GNN-based).
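Unlike IoU, the vector-quality metrics operate on polygon geometry rather than pixels. As an illustration, here is a minimal sketch of a symmetric Chamfer distance between two polygon vertex sets (PoLiS is related but also measures vertex-to-boundary distances; the polygons below are invented for the example):

```python
import numpy as np

def chamfer(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between vertex sets a (N,2) and b (M,2):
    mean nearest-neighbour distance in each direction, summed."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

# a 10 m square footprint vs. the same footprint shifted 1 px to the right
square = np.array([[0, 0], [0, 10], [10, 10], [10, 0]], dtype=float)
shifted = square + [1.0, 0.0]
print(chamfer(square, shifted))  # 2.0: each vertex is 1 px from its match, both ways
```

Small Chamfer/PoLiS values reward predictions whose corners land close to the reference corners, which is exactly the property pixel-wise IoU cannot capture.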

Results. CNN architectures remain an effective baseline. Transformers demonstrate the highest raster accuracy (IoU >90%) but are computationally expensive. End-to-end methods, such as Frame Field Learning and PolyWorld, which generate vectors directly and bypass a separate polygonization step, show slightly lower raster accuracy but significantly better vector quality (PoLiS ~73%), which is critical for cadastral applications.
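The "polygonization step" that end-to-end methods bypass is typically a contour-simplification pass such as Douglas-Peucker applied to the raster mask boundary; the jagged pixel staircase it smooths away is one source of the angular deviations noted above. A minimal sketch of the classic algorithm (the contour coordinates are illustrative):

```python
import numpy as np

def douglas_peucker(pts: np.ndarray, eps: float) -> np.ndarray:
    """Simplify a polyline (N,2): drop points closer than eps to the chord."""
    if len(pts) < 3:
        return pts
    start, end = pts[0], pts[-1]
    seg = end - start
    norm = np.linalg.norm(seg)
    if norm == 0:
        dists = np.linalg.norm(pts - start, axis=1)
    else:
        # perpendicular distance of every point to the start-end chord
        dists = np.abs(seg[0] * (pts[:, 1] - start[1])
                       - seg[1] * (pts[:, 0] - start[0])) / norm
    i = int(np.argmax(dists))
    if dists[i] <= eps:
        return np.array([start, end])          # whole run is "straight enough"
    left = douglas_peucker(pts[: i + 1], eps)   # recurse on both halves
    right = douglas_peucker(pts[i:], eps)
    return np.concatenate([left[:-1], right])

# jagged raster contour of a nominally straight wall
contour = np.array([[0, 0], [1, 0.3], [2, -0.2], [3, 0.1], [4, 0]], dtype=float)
print(douglas_peucker(contour, eps=0.5))  # collapses to [[0. 0.] [4. 0.]]
```

Because the tolerance `eps` is tuned per scene rather than learned, this post-processing can both round true corners and retain staircase noise, which is the motivation for learning the polygon vertices directly.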

Conclusions. A trade-off exists: transformers lead in raster accuracy (IoU 85-92%), while end-to-end methods (IoU 82-88%) provide significantly higher vector quality (PoLiS 70-73%). Promising research directions include integrating geometric constraints into network architectures, developing topology-aware loss functions, improving generalization, and multimodal approaches combining optical imagery with LiDAR data.

References

1. Li W., He C., Fang J., Zheng J., Fu H., Yu L. Semantic segmentation-based building footprint extraction using very high-resolution satellite images and multi-source GIS data. Remote Sensing. 2019. Vol. 11, № 4. P. 403.

2. Ronneberger O., Fischer P., Brox T. U-Net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, 2015. P. 234–241.

3. Oktay O., Schlemper J., Folgoc L.L., Lee M., Heinrich M., Misawa K., Rueckert D. Attention U-Net: Learning where to look for the pancreas. Medical Imaging with Deep Learning. 2018.

4. Chen L.-C., Zhu Y., Papandreou G., Schroff F., Adam H. Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV). 2018. P. 801–818.

5. Wang L., Li R., Zhang C., Fang S., Duan C., Meng X., Atkinson P.M. UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS Journal of Photogrammetry and Remote Sensing. 2022. Vol. 190. P. 196–214.

6. Wang Y., Wang M., Hao Z., Wang Q., Wang Q., Ye Y. MDFA-Net: Multi-scale differential feature self-attention network for building change detection in remote sensing images. Remote Sensing. 2024. Vol. 16, № 18. P. 3466.

7. Zhao W., Persello C., Stein A. Building outline delineation: From very high resolution remote sensing images to polygons with an improved end-to-end learning framework. ISPRS Journal of Photogrammetry and Remote Sensing. 2021. Vol. 179. P. 364–378.

8. Yang G., Zhang Q., Zhang G. EANet: Edge-aware network for the extraction of buildings from aerial images. Remote Sensing. 2020. Vol. 12, № 13. P. 2161.

9. Wang Y., Chen C., Ding M., Li J. Real-time dense semantic labeling with dual-path framework for high-resolution remote sensing image. Remote Sensing. 2019. Vol. 11, № 24. P. 3020.

10. Chen L., Papandreou G., Schroff F., Adam H. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587. 2017.

11. Zhang L., Wu J., Fan Y., Gao H., Shao Y. An efficient building extraction method from high spatial resolution remote sensing images based on improved mask R-CNN. Sensors. 2020. Vol. 20, № 5. P. 1465.

12. Xie E., Wang W., Yu Z., Anandkumar A., Alvarez J.M., Luo P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems (NeurIPS). 2021. Vol. 34. P. 12077–12090.

13. Girard N., Smirnov D., Solomon J., Tarabalka Y. Polygonal building extraction by frame field learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2021. P. 5891–5900.

14. Zorzi S., Bazrafkan S., Habenschuss S., Fraundorfer F. PolyWorld: Polygonal building extraction with graph neural networks in satellite images. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2022. P. 1848–1857.

15. Li Z., Wegner J.D., Lucchi A. Topological map extraction from overhead images. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2019. P. 1715–1724.

16. Zhang Y., Gong W., Sun J., Li W. Web-Net: A novel nest networks with ultra-hierarchical sampling for building extraction from aerial imageries. Remote Sensing. 2019. Vol. 11, № 16. P. 1897.

17. Cheng G., Wang Y., Xu S., Wang H., Xiang S., Pan C. Automatic road detection and centerline extraction via cascaded end-to-end convolutional neural network. IEEE Transactions on Geoscience and Remote Sensing. 2017. Vol. 55, № 6. P. 3322–3337.

18. He K., Gkioxari G., Dollár P., Girshick R. Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2017. P. 2961–2969.

19. Kirillov A., Mintun E., Ravi N., Mao H., Rolland C., Gustafson L., Girshick R. Segment anything. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023. P. 4015–4026.

20. Ji S., Wei S., Lu M. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Transactions on Geoscience and Remote Sensing. 2019. Vol. 57, № 1. P. 574–586.

Published

2025-12-24

How to Cite

Radion, I. A., & Melnyk, O. V. (2025). Modern neural network methods for building vectorization from high-resolution satellite imagery. Modern Technologies and Methods of Calculations in Construction, 24, 340-347. https://doi.org/10.36910/6775-2410-6208-2025-14(24)-28