Wildfires have significant environmental, social, and economic impacts, so understanding their dynamics is crucial to evaluating such effects. Nevertheless, monitoring and measuring burned areas with traditional, non-automatic methods remains time-consuming and challenging.
Automatic semantic segmentation models have been used to describe natural phenomena for several years, and deep learning models have recently achieved very competitive results. However, these models typically require large annotated datasets, and datasets for real-time burned area segmentation are scarce.
We propose a new manually annotated dataset for the segmentation of forest fire burned areas, based on a video captured by a UAV, to train and evaluate semantic segmentation models. We explore deep learning-based techniques and establish baselines. We also propose temporal consistency metrics to validate the burned area polygons generated by the models. Using U-Net models, we obtain IoU values above 95% on the test set, and temporal consistency on the non-annotated successive frames that is competitive with classical machine learning approaches.
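For reference, the reported IoU compares predicted and ground-truth burned-area masks pixel-wise. A minimal NumPy sketch of this standard metric (not the authors' exact evaluation code):

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over Union between two binary masks (1 = burned)."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:  # both masks empty: define as perfect agreement
        return 1.0
    return np.logical_and(pred, target).sum() / union
```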
The dataset consists of a total of 249 frames and corresponding segmentation masks. The subset considered for training and validation contains 226 frame-mask pairs (approximately 90%), while the test subset contains 23 pairs (approximately 10%).
Each of the training and validation frames was generated by taking a sample every 4 seconds (corresponding to 100 frames), starting at the initial frame and ending at frame 22500. The test frames and annotations have the same sampling rate but are offset by 50 frames (2 seconds) from the training and validation frames to avoid overlap. The test frames start at frame 20250 and end at frame 22450, which corresponds to the final portion of the video.
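Assuming the first frame is numbered 0 and the video runs at 25 fps (so 4 seconds = 100 frames, as stated above), the sampling scheme can be reproduced with a short Python sketch; the variable names are ours:

```python
# Training/validation: one frame every 100 frames, from frame 0 to frame 22500.
train_val_frames = list(range(0, 22500 + 1, 100))  # 226 frame indices

# Test: same rate, offset by 50 frames (2 s), covering the end of the video.
test_frames = list(range(20250, 22450 + 1, 100))   # 23 frame indices

assert len(train_val_frames) == 226 and len(test_frames) == 23
# The 50-frame offset guarantees the two subsets never share a frame.
assert not set(train_val_frames) & set(test_frames)
```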
The dataset is available for download on Zenodo (DOI: 10.5281/zenodo.7944963).
These models are based on the architecture proposed by Ronneberger et al. [U-Net].
This model is based on the architecture proposed by Çiçek et al. [3D U-Net].
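For orientation, the sketch below shows the encoder-decoder-with-skip-connections pattern that both architectures share, in PyTorch. It is a minimal two-level illustration; the channel widths, depth, and framework are our assumptions, not the configuration used in the paper:

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3 convolutions with ReLU, the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    """Minimal U-Net: one encoder level, a bottleneck, one decoder level."""
    def __init__(self, in_ch: int = 3, n_classes: int = 1):
        super().__init__()
        self.enc1 = conv_block(in_ch, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(64, 128)
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = conv_block(128, 64)  # 64 skip channels + 64 upsampled
        self.head = nn.Conv2d(64, n_classes, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = self.enc1(x)                   # full-resolution features
        b = self.bottleneck(self.pool(e1))  # downsampled by 2
        d1 = self.dec1(torch.cat([self.up(b), e1], dim=1))  # skip connection
        return self.head(d1)  # logits; apply sigmoid for a binary mask
```

The 3D variant of Çiçek et al. replaces the 2D operations with their 3D counterparts (Conv3d, MaxPool3d, ConvTranspose3d), operating on a stack of frames rather than a single image.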
Average Temporal Consistency. The U-Net Base model has the smallest average TC value, which suggests that it is the most temporally consistent model throughout the video.
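The exact TC formulation is defined in the paper; as an illustrative proxy only, temporal consistency can be probed by measuring how much consecutive predicted masks disagree, where a smaller average value means more stable predictions across frames:

```python
import numpy as np

def temporal_consistency(masks: list) -> float:
    """Mean dissimilarity (1 - IoU) between consecutive binary masks.

    Lower values = more temporally consistent predictions.
    Illustrative proxy only; see the paper for the exact TC metrics.
    """
    diffs = []
    for prev, curr in zip(masks, masks[1:]):
        prev, curr = prev.astype(bool), curr.astype(bool)
        union = np.logical_or(prev, curr).sum()
        inter = np.logical_and(prev, curr).sum()
        diffs.append(1.0 - (inter / union if union > 0 else 1.0))
    return float(np.mean(diffs))
```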
To cite this work, please use:
@article{ribeiro_ba_uav_article,
  title     = {Burned area semantic segmentation: A novel dataset and evaluation using convolutional networks},
  journal   = {ISPRS Journal of Photogrammetry and Remote Sensing},
  volume    = {202},
  pages     = {565-580},
  year      = {2023},
  issn      = {0924-2716},
  doi       = {10.1016/j.isprsjprs.2023.07.002},
  author    = {Tiago F.R. Ribeiro and Fernando Silva and Jos\'e Moreira and Rog\'erio Lu\'is de C. Costa},
}
@misc{ba_uav_ribeiro_dataset,
  author    = {Ribeiro, Tiago F. R. and Silva, Fernando and Moreira, Jos\'e and Costa, Rog\'erio Lu\'is de C.},
  title     = {BurnedAreaUAV Dataset (v1.1)},
  month     = may,
  year      = 2023,
  publisher = {Zenodo},
  version   = {1.1},
  doi       = {10.5281/zenodo.7944963},
}
This work is partially funded by FCT - Fundação para a Ciência e a Tecnologia, I.P., through projects MIT-EXPL/ACC/0057/2021 and UIDB/04524/2020, and under the Stimulus for Scientific Employment - Institutional Support - CEECINST/00051/2018.
If you have any questions or suggestions, or want to contribute, feel free to contact me at [email protected].