The amount of water that is stored in snow is an important resource to monitor especially with anthropogenic climate change. In the prairie, snow is highly heterogeneous but also extremely important. Snowmelt accounts for 80 to 85% of the streamflow in prairie environments [5]. The heterogeneity of snow accumulation is a direct result of strong winds and local variations in topography and vegetation that can trap snow [5]. We hope that cosmic ray neutron sensing, which has approximate maximum footprints ranges between 200 m to 300 m be able to get important measurements of snow water equivalence, the amount of water stored in snowpack, in prairie environments. However, recent examinations have discussed whether the current corrections of raw CRNS data as described by Desilets (2017) [3] are extremely location derived [1, 10, 14]. From the correction factors, we know that neutron counts are related to moisture in its surrounding. Can we use a regression model to obtain a modelled neutron count given our given the atmospheric conditions (temperature, humidity, and pressure) assuming no changes in soil moisture? How different are the correction equations relationships in the regression model from Desilet's (2017) paper? How accurate are SWE estimates from these modelled counting rates? Can we predict neutron counts and SWE in another Montana site from our regression?
- Anaconda
- You need to have a NASA Earthdata Account
Before running the code, please have your username and password ready for a NASA Earthdata Account. One of the datasets required for this analysis downloads data from the NSIDC. Then, you may clone this github repo in a terminal or Anaconda Power Shell.
git clone https://github.com/heyjoekim/SnowProj22
Alternatively, you can download the zipped file. After that, change into the repo saved onto your device. We need to make the script that will run all of the code executable first
cd SnowProj22
chmod +x run_snowproj.sh
These instructions are based off of my testing as of 12/14/2022. Some additional dependecies may be needed. Opening Anaconda Powershell Prompt, we may need to download:
conda install git
conda install wget
Then we would need to create our python environment using the lines of code below:
conda env create --name snow --file myenv.yml
NOTE: Do not activate the environment at this time. Simply run the shell script
sh .\run_snowproj_win.sh <uid> <password>
The <uid>
and <password>
are the username and password for NASA Earthdata Login. Delete these and add your username and password to be able to download the files.
The following shell script should be used to run the analysis for this repository. The following script will create a new conda environment called snow
which will use all the versions of the python packages that this analysis used. It will also complete all of the analysis.
source run_snowproj.sh <uid> <password>
Again, like above <uid>
and <password>
are the username and password for NASA Earthdata Login. Delete these and add your username and password to be able to download the files.
This analysis will use measurements from the Cosmic Ray Neutron Sensor (CRNS) manufactured by Hydroinova LLC at a field site in Moccasin, Montana. The field site is a 1 km by 1 km research site containing multiple agricultural fields managed by Central Agriculture Research Center (CARC) through Montana State University. At the CARC, winter crops may persists which helps to trap snow and change snow redistribution patterns from winds. The CRNS sensor detects low-energy or “fast” neutrons. These neutrons are produced from cosmic rays interacting in matter and are typically attenuated from water molecules [3,15]. CRNS sensors were originally developed for soil moisture detection but a method was established to determine the snow water equivalent (SWE). Our CRNS sensor collected data from November 2020 to May 2021. The CRNS also collects air temperatures, humidity, and atmospheric pressure in order to correct the effects of the interactions of cosmic rays with the atmosphere.
Along with the atmospheric conditions that are collected by the CRNS sensor, I also used humidity, atmospheric pressure, and temperature from the Montana Mesonet network located near our study site in Moccasin. The Montana Mesonet stations are operated by the Montana Climate Office (MCO) as a part of the Montana Forest & Conservation Experiment Station at the University of Montana. In order to support research and agriculture, the MT Mesonet stations gather data on soil moisture and meteorological conditions. Data was downloaded from https://shiny.cfc.umt.edu/mesonet-download/. Daily Air Temperatures, Atmospheric Pressure, and Relative Humidity was downloaded from the Mocassin ARC Mesonet station from November 22, 2020 to June 1, 2021 using the url provided.
To compare and test the location effect of the correction formulas for CRNS data, another site in located west of Missoula, MT will be added. This site was chosen due to the rough colocation of a MT Mesonet station and a United States Department of Agriculture Natural Resource Conservation Service Snow Telemetry (SNOTEL) station. The SNOTEL network is a group of over 600 stations across the Western United States that was placed in high snow accumulation area, mostly colocated with a snow course site [12]. SNOTEL stations are designed to run automatically, especially in regions where snow course data is dangerous to collect. SWE is collected through a snow pillow, a pressure transducer that when snow accumulates induced a change in a monometer [12]. Occasionally, foreign debris may land on top of the snow pillow, so data needs to be checked for erroneous changes. Data will be taken for the approximate winter season for water year 2021 (October 1, 2020 to June 1, 2021) for the Moiese N MDA Mesonet station and the Sleeping Woman SNOTEL station.
To double check the magnitude of SWE, I will also use the Western United States UCLA Daily Snow Reanalyis, Version 1 dataset [4]. This data is freely available from the National Snow and Ice Data Center (NSIDC). A bash script is provided to download the currently available data. This dataset is created from a series of Landsat images that are uniquely blended in order to obtain posterior estimates of snow depth, SWE, and fractional snow cover. This analysis will only look at the SWE potion of the data. A time series of SWE at the approximate pixel location of both sites will be taken for water year 2021.
This section will outline the major data processing that must be done on the CRNS measurements.
CRNS measurements of neutron counts below 2000 counts per hour was removed from the dataset. CRNS measured temperature and humidity must also be quality controlled for erroneous measurements and missing data. For CRNS measured temperatures, any values below -60 was considered to be missing data and was removed Additionally, any relative humidity values that were above 100 was also removed. Similar quality controls will also be done on Montana Mesonet data if needed.
SNOTEL daily measurements will also be quality controlled for missing and erroneous data using the following procedures [12]. Erroneous data is designated as any change in measurements that is greater than at least 5 standard deviations greater than the monthly mean. Missing data will also be flagged. Common quality control for SNOTEL stations typically involve negative measurements for SWE and snow depth. Unless precipitation is involved, most data will be filled with a value of 0 for these negative measurements.
The main calculations for CRNS processing and SWE calculations are done based on Darin Desilet's 2017 [3] analysis which is summarized in this section. The raw neutron counting rate from the sensor first needs to be corrected from other moisture sources such as the atmosphere:
for temperature T in
where
where
In order to check if the equations from [3] will work for our site in Montana, I will first use a random forest regression to my CRNS data. Random forests are an ensemble model by creating multiple decision trees with more randomness [2].The random forest regression will be running using the version implemented in Python with the scikit-learn package [8]. The random forest regression will be used running 1000 decision trees. In order to estimate the coefficients, I will use partial dependence plots. Introduced by [7], partial dependence plots allows us to interpret model relationships within a model, especially with machine learning models that do no take explicit assumptions. The partial dependence plots used are also from scikit-learn [8]. Additionally, another issue that needs to be brought up in multicolinearity. I calculated the VIF for my features for my random forest regression. While slightly colinear with our variables (VIF<10), pressure is highly colinear with our corrected neutron counts with the VIF being at least an order of magnitude greater than 10. While feature selection is not a major part of this analysis, using highly colinear variables may influence the results of our regression.
Table 1: Variable Inflation Factors for each feature in the regression DataFrame. VIFs greater than 10 is set to have too much multicolinearity.
Feature | VIF |
---|---|
N_cor [cph] | 723.05 |
P [mb] | 676.81 |
H |
9.71 |
I will specifically run random forest models using both pressure in mb/hPa and absolute humidity in
From the approximations of the partial dependence plots, I will approximate the
The regression model would be trained on the processed CRNS data. The corrected neutron counting rate N will be predicted using pressure (mb) and absolute humidity
The results from the random forest regression is shown on Figure 1 using our test dataset. It is evident from the residuals (Figure 1B) that there is significant heteroscedasticity in our random forest model. We also observe that our out-of-bag (OOB) score is only 0.245 and the
Figure 1. (a) Random Forest results shown between "observed" corrected Neutron counts against predicted corrected Neutron Counts for our CRNS sensor at the CARC in counts per hour. The dashed line shows a one to one relationship between the two entities. (b) A similar plot against our "observed" neutron counts with the residuals of the regression results. A dashed line at 0 represents where both predicted and observed values are equal. The residuals clearly have some heteroscedasticity.
Analyzing the partial dependence plots for pressure, we see that from modeled relationships, an increase in pressure leads to an increase in neutron counts. In order to estimate the pressure correction equations, we took linear regressions using the logarithm of the corrected neutron counts. We can estimate the linear equation to obtain a general equation form of
We can then divide the theoretical mean of the raw neutron count to esimate the
Figure 2. A partial dependence plot (PDP) between pressure [mb] against Corrected Neutron counts (blue dots). The black dashed line is a linear fit for pressure as the independent variable and the log Neutron Count as the depdendent variable. Modeled results from URANOS are plotted along side our regression PDP with changing pressures at a constant humidity, soil moisture, and soil porosity. The red line is a linear fit for modeled results and shows a slight difference between the RF regression and simple modeled results.
In analyzing the PDP results for absolute humidity (Figure 3), we notice one big discrepancy. From URANOS results, we see the predicted result that neutron counts should decrease with increasing humidity as is noted in the literature [3,9]. However, in our data, we notice the opposite trend. When we take a linear regression of the PDP plot, we obtain the following linear equation:
These results are slightly different from the values from Desilets (2017), whose slope was equal to 0.0054.
Figure 3. A partial dependence plot (PDP) between absolute humidity [g/m^3] against Corrected Neutron counts (blue dots). The black line is a linear fit for absolutre pressure as the independent variable and the Neutron Count as the depdendent variable. Modeled results from URANOS are plotted along side our regression PDP with changing humidity at a constant pressure, soil moisture, and soil porosity. The red dashed line is the linear equation for modeled humidity and neutron counts.
We can then use our equations to try and back out a new predicted neutron counts and eventually SWE. Figure 4 shows the results of comparing using the random forest regression to predict neturon counts and SWE against using Desilet's (2017) equations. We notice a big error in neutron counts towards the middle/end of December 2020. We are able to recreate the large spike of snowfall in middle February 2021, but not able to maintain the correct shape and trend after this time. Comparing our SWE calculations to the UCLA Snow Reanalysis, we see similar trends across all three calculations. The UCLA Reanalysis SWE tended to overrepresent the snow spike in mid-February. However, across all other time periods, we see less snow in this dataset, most likely from the large areal averages.
Figure 4: a) Corrected neutron counts for our CARC site using the Desilets (2017) equation (blue) and the RF regression (orange) for the Moccasin Mesonet station data. b) SWE calculations using the Desilets (2017) equations for neutron counts based on Desilet (2017) corrections (blue), Random Forest correction (orange). The Western United States UCLA Snow Reanalysis Data is also plotted as an additional data source (green).
When we try to apply our regression to another site in Montana (Figure 5), our random forest model does not accurately predict SWE. Because there isn't a CRNS measurement at this site, we cannot predict neutron counts or SWE. There is considerably more snow at this location compared to our CARC site. Even the UCLA Snow Reanalysis data under predicts snow at this location. Again, since the SNOTEL site is only a single measurements, this is likely due to an areal averaging.
Figure 5: a) Corrected neutron counts for the Sleeping Woman SNOTEL site using RF regression (blue) using the Moise MD Mesonet station data. No CRNS data is available for this site b) SWE calculations using the Desilets (2017) equations for neutron counts based on Desilet (2017) for Random Forest correction (orange). SNOTEL measurements (blue) and the Western United States UCLA Snow Reanalysis Data (green) are also shown. At this site, the RF corrections does not work well.
Figures 4 and 5 clearly show that we need more CRNS measurements to train a general model like a random forest regression to model neutron counts at various sites. We should expect neutron interactions to be heavily moderated with increased snow amounts as per the literature [6,15]. At the Sleeping Woman site, SWE exceeded the maximums of 1 to 1.5 cm of SWE that we observed at the CARC. With more measurements at different sites, we may potentially be able to predict neutron counts and potentially SWE. The heteroscedasticity in the model residuals also suggests our model is incomplete. We may want to potentially fit other features such as soil moisture to help explain more of the variability in neutron counting rates.
The PDP results here also show that there is a factor to location when we are using the correction equations for CRNS. Our equations from the PDP differed greatly from the equations provided by Desilets (2017) for correcting the effects of pressure and humidity. More research needs to be investigated with modeled results for humidity. While literature shows that we should expect neutron counts to drop as humidity increases, our raw neutron counts increased with increasing humidity (not shown). Again, the limitation of this analysis with one winter measurement at one site severly limited our analysis.
As mentioned on Table 1, multicolinearity is clearly an issue within this analysis. Pressure and neutron counts were highly colinear in this anaylsis, possibly creating errors when we used the linear regressions in our PDP plots. Doing the analysis with absolute humidity helped to address the multicolinearity between neutron counts, relative humidity, and temperature, but made the colinear relationship between pressure and neutron counts. We hope that more data collection from our CRNS this winter may help. Additionally, I would like to use CRNS measurements for other locations i.e. the COSMOS-UK, COSMOS-EU, and other CRNS data across the United States. I would also like to heavily feature the Monte Carlo simulations like URANOS in my analysis to serve as a theoretical control to not only experiment with correction equations, but also the relationship between neutron counts and SWE (Equation 10), but also the theoretical counting rates over soil (Equation 9).
- Andreasen, M., Jensen, K. H., Zreda, M., Desilets, D., Bogena, H., & Looms, M. C. (2016). Modeling cosmic ray neutron field measurements: MODELING COSMIC RAY NEUTRON FIELD MEASUREMENTS. Water Resources Research, 52(8), 6451–6471. https://doi.org/10.1002/2015WR018236
- Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
- Desilets, D. (2017). Calibrating A Non-Invasive Cosmic Ray Soil Moisture For Snow Water Equivalent. https://doi.org/10.5281/ZENODO.439105
- Fang, Y., Y. L. and Margulis., S. A. (2022). Western united states ucla daily snow reanalysis, version 1
- Gray, D. M. (1970). Snow hydrology of the prairie environment. Snow hydrology, 21-34.
- Köhli, M., Schrön, M., Zacharias, S., & Schmidt, U. (2022). URANOS v1.0 – the Ultra Rapid Adaptable Neutron-Only Simulation for Environmental Research [Preprint]. Climate and Earth system modeling. https://doi.org/10.5194/gmd-2022-93
- Molnar, C., Freiesleben, T., König, G., Casalicchio, G., Wright, M. N., & Bischl, B. (2021). Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process (arXiv:2109.01433). arXiv. http://arxiv.org/abs/2109.01433
- Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
- Rosolem, R., Shuttleworth, W. J., Zreda, M., Franz, T. E., Zeng, X., & Kurc, S. A. (2013). The Effect of Atmospheric Water Vapor on Neutron Count in the Cosmic-Ray Soil Moisture Observing System. Journal of Hydrometeorology, 14(5), 1659–1671. https://doi.org/10.1175/JHM-D-12-0120.1
- Schattan, P., Köhli, M., Schrön, M., Baroni, G., & Oswald, S. E. (2019). Sensing Area‐Average Snow Water Equivalent with Cosmic‐Ray Neutrons: The Influence of Fractional Snow Cover. Water Resources Research, 55(12), 10796–10812. https://doi.org/10.1029/2019WR025647
- Seabold, S. and Perktold, J. (2010). statsmodels: Econometric and statistical modeling with python. In 9th Python in Science Conference.
- Serreze, M. C., Clark, M. P., Armstrong, R. L., McGinnis, D. A., & Pulwarty, R. S. (1999). Characteristics of the western United States snowpack from snowpack telemetry (SNOTEL) data. Water Resources Research, 35(7), 2145–2160. https://doi.org/10.1029/1999WR900090
- Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., Carey, C. J., Polat, ̇I., Feng, Y., Moore, E. W., VanderPlas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E. A., Harris, C. R., Archibald, A. M., Ribeiro, A. H., Pedregosa, F., van Mulbregt, P., and SciPy 1.0 Contributors (2020). SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17:261–272.
- Wallbank, J. R., Cole, S. J., Moore, R. J., Anderson, S. R., & Mellor, E. J. (2021). Estimating snow water equivalent using cosmic‐ray neutron sensors from the COSMOS‐UK network. Hydrological Processes, 35(5). https://doi.org/10/gq5kxc
- Zreda, M., Shuttleworth, W. J., Zeng, X., Zweck, C., Desilets, D., Franz, T., & Rosolem, R. (2012). COSMOS: The COsmic-ray Soil Moisture Observing System. Hydrology and Earth System Sciences, 16(11), 4079–4099. https://doi.org/10.5194/hess-16-4079-2012