We recently published a new paper on using information theory to quantify process representations in hydrologic models in Water Resources Research. We calculated a measure from information theory called transfer entropy to better understand how physical processes are represented in different hydrologic models and across various hydroclimatic regimes. Co-authors of the paper included current and former members of the computational hydrology group as well as Martyn P. Clark, University of Saskatchewan, and Grey Nearing, University of Alabama. Andrew Bennett, a Ph.D. student in the UW Hydro | Computational Hydrology group, was the lead author.
We applied the concept of transfer entropy to quantify the active (or time-varying) transfer of information between hydrologic processes at various timescales for three hydrologic models in the Columbia River Basin: the Variable Infiltration Capacity (VIC) model, the Precipitation Runoff Modeling System (PRMS), and the Structure for Unifying Multiple Modeling Alternatives (SUMMA). All models were run over a historical period of 61 years. Model intercomparison experiments in the hydrologic community have often taken the form of calculating the Nash-Sutcliffe efficiency, Pearson correlation coefficient, the root mean square error (RMSE), or the runoff ratio and aridity index. However, none of these metrics accounts for the non-linearities in process representations that are characteristic of hydrologic models, and they do not always provide much insight into the internal workings of these models. Applying more advanced techniques such as information theory makes it possible to quantify the non-linearities inherent to these process representations. We used transfer entropy to calculate how much information is transferred between different terms of the water balance: precipitation, runoff, evapotranspiration, soil moisture and snow water equivalent (SWE). By computing the strength of the pairwise process connections between variables in the model output (for example, the information transfer between evapotranspiration and soil moisture), we were able to highlight similarities and differences in model behavior. We then used chord diagrams to visualize these process connections.
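To make the idea of pairwise information transfer more concrete, here is a minimal sketch of a transfer entropy estimate in plain Python. It is not the estimator used in the paper or in HYEENNA. It assumes a simple equal-width histogram (plug-in) estimate with a one-step history, and the function names (`discretize`, `transfer_entropy`, `toy_series`) and the synthetic driver/response series are made up for illustration.

```python
import math
import random
from collections import Counter


def discretize(series, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant series
    return [min(int((v - lo) / width), n_bins - 1) for v in series]


def transfer_entropy(source, target, n_bins=4, lag=1):
    """Plug-in (histogram) estimate of transfer entropy source -> target, in bits."""
    x = discretize(source, n_bins)
    y = discretize(target, n_bins)
    # One sample per time step: (y_next, y_past, x_past).
    samples = list(zip(y[lag:], y[:-lag], x[:-lag]))
    n = len(samples)
    joint = Counter(samples)
    next_and_past = Counter((yn, yp) for yn, yp, _ in samples)
    past_and_source = Counter((yp, xp) for _, yp, xp in samples)
    past = Counter(yp for _, yp, _ in samples)

    te = 0.0
    for (yn, yp, xp), count in joint.items():
        with_source = count / past_and_source[(yp, xp)]      # p(y_next | y_past, x_past)
        without_source = next_and_past[(yn, yp)] / past[yp]  # p(y_next | y_past)
        te += (count / n) * math.log2(with_source / without_source)
    return te


def toy_series(n=5000, seed=0):
    """Synthetic driver x and a response y that depends on the previous x."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    y = [0.0]
    for t in range(1, n):
        y.append(0.5 * y[-1] + x[t - 1] + 0.01 * rng.random())
    return x, y


x, y = toy_series()
print(f"TE x -> y: {transfer_entropy(x, y):.3f} bits")
print(f"TE y -> x: {transfer_entropy(y, x):.3f} bits")
```

Because the synthetic response depends on the previous value of the driver but not the other way around, the estimate from x to y should come out clearly larger than the estimate from y to x. That asymmetry is what distinguishes transfer entropy from a symmetric measure such as correlation.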
This analysis enabled us to demonstrate the relationship between model structure and model output. For example, in the Snake River region, runoff in SUMMA is driven by snowmelt, and rainfall contributes to ET rather than runoff because of the aridity of the region. Although the timeseries of soil moisture for the Snake in VIC and this specific SUMMA configuration are more similar than that of PRMS, process connectivity between SUMMA and PRMS is actually more similar than between SUMMA and VIC or PRMS and VIC. The model output timeseries appear to be similar because of the interaction between changes in SWE and ET. This demonstrates how transfer entropy can at times reveal model structural similarities and differences better than conventional error metrics. As Andrew Bennett, lead author of the study, stated, this approach “…opens [up] an entirely new area of model evaluation and intercomparison which can help to describe the complexity of modern hydrologic models. We found that even the simplest configuration of computing transfer entropy provided new insights.” The authors have made analysis code for the study, including plotting capabilities, transfer entropy estimators and network analysis routines, publicly available on GitHub as a fully-contained package called HYEENNA. Scripts for post-processing model outputs and reproducing the entire analysis are publicly available as well in a separate repository. This study represents an exciting step forward in understanding the concept of equifinality in hydrology, a topic that has been studied extensively for decades in the field and has often been tackled with little more than a purely heuristic approach. The application of information theory provides new tools that augment and at times replace the existing model intercomparison approaches based on summary statistics to reveal the effects of structural model differences.
Citation: Bennett, A., B. Nijssen, G. Ou, M. Clark, and G. Nearing, 2019: Quantifying process connectivity with transfer entropy in hydrologic models. Water Resources Research, doi:10.1029/2018WR024555.
Source code repository: https://github.com/UW-Hydro/HYEENNA
Funding and acknowledgements: This work was supported in part by grants NNX15AI67G and 80NSSC17K0541 from the NASA AIST program. The estimators, network analysis routines, statistical tests, and plotting capabilities were implemented in a publicly available software library implemented in the Python programming language available under the GNU GPL‐3.0 license (https://www.github.com/UW‐Hydro/HYEENNA). The analysis code for the complete processing of the model outputs is publicly available at https://github.com/arbennett/2018WR024555. The SUMMA model configuration as well as the aggregated model output time series for each of the regions and each of the models is available at https://doi.org/10.5281/zenodo.2637752.
Computer models are the main tools to investigate and predict what will happen to the planet as greenhouse gas concentrations continue to increase in the atmosphere. But models are just tools and we have known for a long time that models can differ quite a lot in their predictions of what will happen in the future. One of the ways that we deal with that is by using multiple models to get an idea of the range of conditions that models project. For example, all models predict continued warming as a result of increased greenhouse gas concentrations in the atmosphere, but they differ in where and how much warming will occur. To make use of model output from climate change model studies, it would help if we could better determine what model components are most responsible for the spread in the projections.
We recently published a new paper on the ways in which different modeling decisions affect hydrologic projections in the Pacific Northwest. The paper was a culmination of a three-year study supported by the Bonneville Power Administration, U.S. Army Corps of Engineers, and the Bureau of Reclamation in which we explored the range of hydrologic futures in the Pacific Northwest. Collaborators on the project included scientists from the Oregon Climate Change Research Institute, Oak Ridge National Lab, Bonneville Power Administration, U.S. Army Corps of Engineers, Princeton University, National Center for Atmospheric Research, University of California in Los Angeles and the University of Saskatchewan. One of the key outcomes of the study was a publicly available dataset of hydrologic projections for the Columbia River Basin and coastal drainages in the Pacific Northwest representing a large number of different hydrologic futures in the region.
The paper was based on an ensemble of hydrologic simulations created by varying different steps along the modeling chain: 1) representative concentration pathways (RCPs), 2) downscaling methods (DSMs), 3) global climate models (GCMs), and 4) hydrologic model implementations. A total of 2 RCPs, 2 DSMs, 10 GCMs and 4 hydrologic model implementations were used, so the total number of permutations, which can be thought of as a range of possible hydrologic futures, was 160. Although ensemble-based analysis is not new to the field of hydrology, this is a much larger number of hydrologic simulations than is typically used. This ensemble enabled us to evaluate how hydrologic model spread depends on the question you ask of the ensemble (e.g. evaluation metric and location) and how modeling choices affect projections of hydrologic change. For the latter question, due to the nature of the ensemble, we were able to tease out the relative importance of GCM, RCP, downscaling method, and hydrologic model implementation in hydrologic projections of future snow and streamflow changes. We also investigated the role of internal variability and found that the model variance exceeded that of internal variability. In other words, modeling choices had a larger impact on the spread of the hydrologic projections than the internal variability in the system. Overall, we found that the role and contribution of each modeling choice depended on the location in the Pacific Northwest as well as the hydrologic question. The choice of RCP was most important when the hydrologic question was driven by snowmelt, whereas the GCM was the most important factor in energy-limited environments. By contrast, choice of hydrologic model implementation mattered most when soil processes were dominant, such as for questions of low flows and in drier parts of the region.
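The size of the ensemble follows directly from crossing every choice at each step of the modeling chain. The sketch below enumerates those combinations in Python. Only the counts (2, 2, 10 and 4) come from the text; the labels such as `rcp_1` or `gcm_3` and the helper `members_sharing` are placeholders invented for illustration.

```python
from itertools import product

# Counts per step are from the text; the labels are hypothetical placeholders.
modeling_chain = {
    "rcp": [f"rcp_{i}" for i in range(1, 3)],
    "downscaling": [f"dsm_{i}" for i in range(1, 3)],
    "gcm": [f"gcm_{i}" for i in range(1, 11)],
    "hydrologic_model": [f"hydro_{i}" for i in range(1, 5)],
}

# Every ensemble member is one combination of choices along the chain.
ensemble = [
    dict(zip(modeling_chain, combo))
    for combo in product(*modeling_chain.values())
]
print(len(ensemble))  # 2 * 2 * 10 * 4 = 160


def members_sharing(step, choice):
    """Return all ensemble members that use a given choice at one step."""
    return [member for member in ensemble if member[step] == choice]


# Each GCM is paired with every RCP, downscaling method and hydrologic model.
print(len(members_sharing("gcm", "gcm_1")))  # 160 / 10 = 16
```

Grouping members this way is one simple starting point for comparing the spread associated with a single step while the other steps vary, although the paper's actual variance analysis is not reproduced here.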
We also grouped the results by Köppen-Geiger class, a widely-used climate classification based on temperature and precipitation, and in doing so were able to generalize our results to hydroclimate studies in other regions.
Analyzing such a large ensemble of hydrologic simulations enabled us to draw more robust conclusions about how to guide future hydrologic modeling efforts. As Oriana S. Chegwidden, Ph.D. student and Research Scientist in the Computational Hydrology group and the lead author on the study, stated, “From a scientific perspective, it [the study] helps modelers understand how our modeling decisions can impact our results. Through the paper we can see under what circumstances different steps in the impact modeling chain matter.” Consequently, future studies may decide to construct their modeling chain around diversification of a particular modeling step in accordance with their particular question. For example, if a study wants to investigate future low flows, it would be instructive to use a larger number of hydrologic model implementations versus downscaling method or GCM.
Citation: Chegwidden, O., B. Nijssen, D. Rupp, J. Arnold, M. Clark, J. Hamman, S.-C. Kao, Y. Mao, P. Mote, M. Pan, E. Pytlak, and M. Xiao, 2019: How do modeling decisions affect the spread among hydrologic climate change projections? Earth’s Future, doi:10.1029/2018EF001047.
Source code repository: https://github.com/UW-Hydro/VIC
Funding and acknowledgements: This work was funded in part by the Bonneville Power Administration’s Technology and Innovation Program under grant TIP 304 to the University of Washington and Oregon State University. Additional support was provided by the United States Army Corps of Engineers Climate Preparedness and Resilience Programs and the Bureau of Reclamation under Cooperative Agreement R17AC00024 to the University of Washington. The authors thank Eric Salathé at the University of Washington who assisted in implementing the BCSD system. The MACA‐downscaled data were provided by John Abatzoglou and Katherine Hegewisch at the University of Idaho. We also thank Lieke Melsen and an anonymous reviewer for their insightful comments on this manuscript.
We presented the VIC-5 release and discussed its scaling performance in blog posts in 2016, and we have now published the manuscript that describes VIC-5 in detail. The paper was the culmination of nearly five years of model development work by the group to upgrade the model and was published in August 2018 in Geoscientific Model Development. Joe Hamman, a former member of the Computational Hydrology group and now a Project Scientist at the National Center for Atmospheric Research (NCAR) in the Climate and Global Dynamics (CGD) group, was the lead author. The new version of the model, VIC-5, was first released via GitHub in 2016, and subsequent releases followed in 2018.
The VIC model was originally developed at the University of Washington in the early 1990s by then-Ph.D. student Xu Liang for her dissertation work and quickly became one of the most commonly used hydrologic models for large domain hydrologic simulations. The first VIC paper was published in 1994 with Liang as the first author, and by 2017 the original Liang et al 1994 paper had reached over 2000 citations. Development of the Regional Arctic System Model (RASM), a fully-coupled regional earth system model that uses the VIC model as its land surface model, spurred a significant overhaul of the VIC modeling infrastructure to facilitate easier and more streamlined coupling with a fully-coupled climate model. Furthermore, members of the group wanted to facilitate easier collaboration between model users, since the user group had grown to include members across Asia, Africa, Europe as well as North and South America, thus development of the model migrated to GitHub.
From 2013 on, members of the group conducted a significant overhaul of the VIC model infrastructure. This overhaul consisted of five key components: 1) separation of model physics from input and output, 2) creating multiple model types (called “drivers”) to support different modeling applications, 3) parallel processing to enable fast large-domain simulations, 4) improvement of model documentation, and 5) a comprehensive test suite as a framework for collaborative model development. The most noticeable part of the model reconfiguration is the model “drivers”, which include the classic, image, python and CESM drivers. The classic driver is the legacy VIC configuration supporting the traditional time-before-space mode, whereas the image driver is the space-before-time configuration, which supports Network Common Data Form (NetCDF) inputs and outputs. The CESM driver facilitates the coupling of VIC to the Community Earth System Model (CESM) infrastructure, which is currently implemented within the Regional Arctic System Model (RASM). The python driver includes Python bindings to all functions and data structures contained in the core of the VIC model.
Together, these improvements represent a major step forward for both the VIC model and the broader hydrologic modeling community. The model has now been “brought up to technological parity with other widely used hydrologic models”, said Ted Bohn, a Research Scientist at the University of Arizona who has been actively involved with VIC development and was also an author on the paper. Moreover, the entire VIC user community can now take advantage of thoroughly tested code, parallel computing, NetCDF input/output (IO), easier coupling to atmospheric models, and model development and collaboration hosted via GitHub. This is a seminal paper for hydrologic modeling and we expect that this model development framework will be used as a template for other hydrologic models going forward.
Citation: Hamman, J. J., B. Nijssen, T. J. Bohn, D. R. Gergel, and Y. Mao, 2018: The Variable Infiltration Capacity Model, Version 5 (VIC-5): Infrastructure improvements for new applications and reproducibility. Geoscientific Model Development, doi:10.5194/gmd-11-3481-2018.
Source code repository: https://github.com/UW-Hydro/VIC
Funding and acknowledgements: This research was supported in part by United States Department of Energy (DOE) grants DE-FG02-07ER64460 and DE-SC0006856 to the University of Washington and grant 1216037 from the National Science Foundation’s Science, Engineering, and Education for Sustainability (SEES) program. Supercomputing resources were provided through the United States Department of Defense (DoD) High Performance Computing Modernization Program (HPCMP) at the Army Engineer Research and Development Center (ERDC) and the Air Force Research Laboratory (AFRL). We thank Anthony Craig for his feedback during the design and implementation of the image and CESM drivers. We also thank Wietse Franssen and Iwan Supit for their contributions to the streamflow routing extension.
A new paper on the use of data assimilation to improve streamflow estimates was published earlier this year in the Journal of Hydrometeorology. The paper described the development of a framework to diagnose bottlenecks in data assimilation of satellite soil moisture in hydrological models for the purpose of improving simulated streamflow. Yixin Mao, who recently completed her Ph.D. in the Computational Hydrology group and has started a new position as Data Scientist at Salesforce, Inc. in San Francisco, was the lead author.
The goal of the paper was to explore how (and if) data assimilation of soil moisture states might improve simulated streamflow. It was part of a broader, multiyear collaboration between the Computational Hydrology group and Wade Crow, a Research Scientist at the United States Department of Agriculture (USDA) Agricultural Research Service (ARS) Hydrology and Remote Sensing Laboratory, located in Beltsville, MD. The collaboration, and study that resulted, comprised the bulk of Yixin’s doctoral research. Although data assimilation of satellite soil moisture has been used before to improve streamflow estimates, these past studies have shown mixed results. Some studies pointed to improvements in streamflow estimates, while others pointed to little improvement or even degraded streamflow estimates. The point of the broader study was to design a model framework that would quantitatively diagnose what factors contributed both to improvements and degradation in streamflow simulations.
To accomplish this, the authors set up a diagnostic model framework so that they could determine which factors were contributing to errors in hydrologic simulations, quantify those errors, and discern the extent to which data assimilation of soil moisture might correct the errors. The crux of the experiments tested the benefit of assimilating soil moisture measurements from the Soil Moisture Active Passive (SMAP) satellite into the Variable Infiltration Capacity (VIC) model in the Arkansas-Red river basin, and VIC-modeled runoff was then routed using the RVIC streamflow routing model and compared to USGS streamflow observations. They conducted a series of synthetic experiments to complement the experiment with SMAP, termed the “real data” experiment.
The authors found that approximately 60% of errors in runoff in the basin came from precipitation forcings rather than soil moisture states. In addition, systematic model errors (due to model structure and model parameters) dominated much of the remaining error and these errors cannot be “fixed” through data assimilation alone. They found that runoff with a slower response time was highly dependent on moisture in the bottom soil layer, but that the assimilated satellite surface soil moisture did not contain sufficient information about the deeper layer to significantly improve overall streamflow performance. These results highlight that correcting soil moisture states based on surface soil moisture alone is insufficient to significantly improve streamflow simulations. To achieve significant improvements in simulated streamflow, according to Yixin, future research efforts should focus more on precipitation forcing errors and model representations of runoff rather than the development of increasingly sophisticated data assimilation techniques for soil moisture states.
Citation: Mao, Y., W. T. Crow, and B. Nijssen, 2018: A framework for diagnosing factors degrading the streamflow performance of a soil moisture data assimilation system. Journal of Hydrometeorology, doi:10.1175/JHM-D-18-0115.1.
Funding and acknowledgements: This work was supported in part by NASA Terrestrial Hydrology Program Award NNX16AC50G to the University of Washington and NASA Terrestrial Hydrology Program Award 13-THP13-0022 to the United States Department of Agriculture, Agricultural Research Service. Yixin Mao also received a Pathfinder Fellowship by CUAHSI with support from the National Science Foundation (NSF) Cooperative Agreement EAR-1338606. The VIC model used in the study is available at https://github.com/UW-Hydro/VIC. Specifically, we used VIC version 5.0.1 with a modification to the calculation of drainage between soil layers (https://github.com/UW-Hydro/VIC/releases/tag/Mao_etal_stateDA_May2018). The DA code used in this study is available at https://github.com/UW-Hydro/dual_DA_SMAP.
A new paper by current and former members of the Computational Hydrology group was published in October 2018 in Water Resources Research. The paper described coupling a reservoir model and a stream temperature model in order to represent river temperatures in the Tennessee River Basin. Ryan Niemeyer, a former postdoctoral researcher in the Computational Hydrology group and now a postdoctoral researcher for the United States Department of Agriculture (USDA) at the University of California Santa Barbara, was the lead author.
The goal of the paper was to capture the impact of seasonal thermal stratification of a reservoir on stream temperatures located downstream of reservoirs. To do this, the authors first developed a two-layer model of a reservoir, including a layer for the hypolimnion (the lower layer of water in a lake) and the epilimnion (the upper layer), and treated both layers as well-mixed. They then coupled the two-layer reservoir model to the River Basin Model (RBM), a stream temperature model developed by John Yearsley (also an author on the study and an affiliate professor in the Computational Hydrology group). RBM represents river temperatures by taking in grid-based meteorological data and output from hydrologic models and then calculates temperatures by drawing on surface energy fluxes, mixing of associated tributaries as well as source and sink terms such as energy inputs from power plants that exist nearby. The transfer of energy is modeled using a semi-Lagrangian numerical method.
Although previous studies had used RBM to model stream temperatures in different parts of the globe, and other studies had used a two-layer reservoir model for understanding thermal stratification, coupling these two models together was a novel idea. The outcome of this new model framework was that the authors were able to capture the impacts of reservoirs on downstream water temperatures, which had not been done before. Past studies had treated reservoirs as well-mixed, rather than thermally stratified, which precluded the ability to capture these downstream impacts, as Yifan Cheng, a Ph.D. student in the Computational Hydrology group and second author on this study, discussed. By including these effects, the authors reduced the model bias in stream temperatures downstream of reservoirs. The improvement in model performance depended on the residence time of reservoirs: for a reservoir with a residence time of 92 days, the bias decreased from 6.7 to -1.2°C, whereas for a reservoir with a residence time of only 8 days, the bias decreased from 3.0 to -0.7°C. The scope of this study was limited to the Tennessee River Basin; however, it is applicable to other basins globally in which regulation impacts stream temperatures downstream of reservoirs. The model code for coupling the two-layer reservoir model and RBM is publicly available through GitHub.
Citation: Niemeyer, R., Y. Cheng, Y. Mao, J. Yearsley, and B. Nijssen, 2018: A thermally-stratified reservoir module for large-scale distributed stream temperature models with application in the Tennessee River Basin. Water Resources Research, doi:10.1029/2018WR022615.
Funding and acknowledgements: This project was funded in part by NOAA grant NA14OAR4310250 and NSF grant EFRI‐1440852 to the University of Washington. We also wish to thank the Tennessee Valley Authority for providing data and Tian Zhou (now at Pacific Northwest National Lab) for help with the hydrologic parameter calibration. The code for RBM and the two‐layer reservoir module is available on GitHub.
This week we released a new web site with streamflow projections under climate change for a large number of locations in the Columbia River and coastal drainages in Washington and Oregon State. These projections are the result of a four-year study, which is both an update and enhancement to a previous study conducted by the Climate Impacts Group (CIG) in 2010. The results of our study include projections of streamflow for about 400 locations on rivers throughout the Pacific Northwest through the end of the 21st century, reflecting impacts of projected changes in temperature and precipitation for the region.
The scientific goals of the study were two-fold. First, we wished to use the latest climate projections from global climate models to provide an updated suite of projections of climate change impacts on Pacific Northwestern hydrology. Second, we wanted to build upon the CIG study by investigating the effect of methodological choices on the uncertainty in projections. In other words, we wanted to explore how our modeling choices impact our results. To that end, we used multiple modeling decisions at four different steps in the hydrologic modeling process:
By taking the permutations of the above modeling choices, we developed a dataset including 172 different projections of hydrologic states (e.g. snow water equivalent, soil moisture) and fluxes (e.g. evapotranspiration, streamflow). The streamflow was then routed to develop time series at 400 sites of interest throughout the Pacific Northwest. The ensemble of different possible futures allows users to better understand the spread in streamflow changes.
The results are publicly available to users in academic, public, and private sectors. Given the diversity of stakeholders in the Columbia River Basin, we expect interest from users in communities of fisheries, hydroelectric power generation, water availability planning, flood risk management, among others. We look forward to seeing the variety of ways the dataset will be used!
Funding: This study was partly funded by the Bonneville Power Administration as part of its Technology and Innovation Program (project BPA TIP304 to the University of Washington and Oregon State University), with additional funding to the University of Washington from the United States Bureau of Reclamation and the United States Army Corps of Engineers.
Last month, we released VIC 5.0.0, a major rewrite of the user interface to and the infrastructure of the VIC model. We introduced this release in a recent blog post. This rewrite included a number of upgrades that will allow VIC to perform much better in high-performance computing (HPC) environments. In this blog post, I will discuss some of these improvements and illustrate how they may facilitate VIC applications at scales that were previously computationally infeasible.
VIC was originally intended to be used as a land surface scheme (LSS) in general circulation models (GCMs) (Liang et al., 1994). However, the original source code was written as a stand-alone column model. In this configuration, distributed simulations were run one grid cell at a time, where the model would complete a simulation for a single grid cell for all timesteps prior to moving on to the next grid cell. We call this configuration “time-before-space”. From an infrastructure perspective, this meant VIC did not need built-in tools for parallelization or memory management for distributed simulations. Large scale (e.g. regional or global), distributed simulations were made possible through what we refer to as “poor man’s parallelization” where each grid cell was simulated as a separate task (even on a separate computer). For the past 15 years, this parallelization strategy has been sufficient for many VIC applications.
The development of VIC 5 was largely motivated by a number of limitations that were related to the legacy configuration of the VIC source code. First, despite this being one of VIC’s original goals, the “time-before-space” loop order precluded VIC’s direct coupling within GCMs, which typically require a “space-before-time” evaluation order. Second, VIC’s input/output (I/O) was designed to work with its “time-before-space” configuration. This meant there were individual forcing and output files for each grid cells. For large model domains, such as the Livneh et al. (2015) domain which included more than 330,000 grid cells, the sheer number of files that VIC users were having to deal with began to be the most challenging part of working with VIC. It also precluded the use of standard tools (such as ncview, cdo, xarray and others) that are designed to visualize and analyze large data sets. Not being able to visualize model output easily hampers model application and development. Third, a number of important hydrologic problems require a “space-before-time” configuration. For example, if upstream flow needs to be taken into account for local water and energy balance calculations, then it is necessary to perform routing after each model timestep. Routing requires information about the entire domain. While there are ways to accommodate this with prior VIC versions, these implementations tend to be awkward and aim to circumvent the “time-before-space” configuration. Implementation of a true “space-before-time” mode simplifies the integration of routing with the rest of the VIC model.
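The difference between the two loop orders can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `step` is a made-up single-cell bucket update, not VIC physics, and the function names are hypothetical. What it shows is that both orders give the same per-cell results when cells are independent, but only the “space-before-time” order has the whole domain available after each timestep, which is what a routing step would need.

```python
def step(state, forcing):
    """Toy single-cell update (hypothetical bucket, not VIC physics)."""
    storage = state + forcing
    runoff = 0.1 * storage
    return storage - runoff, runoff


def run_time_before_space(forcings, n_cells, n_steps):
    """Finish all timesteps for one grid cell before moving to the next cell."""
    runoff = [[0.0] * n_cells for _ in range(n_steps)]
    for cell in range(n_cells):
        state = 0.0
        for t in range(n_steps):
            state, runoff[t][cell] = step(state, forcings[t][cell])
    return runoff


def run_space_before_time(forcings, n_cells, n_steps):
    """Advance every grid cell by one timestep before moving to the next timestep."""
    runoff = [[0.0] * n_cells for _ in range(n_steps)]
    states = [0.0] * n_cells
    domain_totals = []
    for t in range(n_steps):
        for cell in range(n_cells):
            states[cell], runoff[t][cell] = step(states[cell], forcings[t][cell])
        # The full domain is known here, so domain-wide work (such as
        # routing) could be done after each timestep.
        domain_totals.append(sum(runoff[t]))
    return runoff, domain_totals


# forcings[t][cell]: a small synthetic forcing field, 4 timesteps by 3 cells.
forcings = [[float(t + cell) for cell in range(3)] for t in range(4)]
by_cell = run_time_before_space(forcings, n_cells=3, n_steps=4)
by_time, totals = run_space_before_time(forcings, n_cells=3, n_steps=4)
print(by_cell == by_time)  # the per-cell results do not depend on loop order
```

In the first function, a domain-wide quantity for a given timestep only becomes available after the last cell has finished its entire simulation, which is why coupling and routing are awkward in that configuration.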
The largest change introduced in VIC 5 is the notion of individual drivers that all call the same physics routines. Because VIC has such a large user community and there are many ongoing projects that will use the legacy “time-before-space” configuration, we provided a classic driver, which essentially functions the same way as previous VIC implementations. For the reasons described above, most of our work on VIC 5 was focused on the development of a “space-before-time” configuration. We’re calling this the image driver.
The image driver has two main infrastructure improvements relative to the classic driver. First, we’ve incorporated a formal HPC parallelization strategy based on the Message Passing Interface (MPI). MPI is a standardized communication protocol for large parallel applications (out-of-core and off-node). In the context of VIC, it allows for the simultaneous simulation of thousands of grid cells, distributed across a cluster or super-computer. Second, we’ve overhauled and standardized VIC’s I/O. Whereas previous versions of VIC used custom ASCII or binary file types, the image driver uses netCDF for both input and output files. NetCDF has been widely adopted throughout the geoscience community and offers three main advantages: first, it stores N-dimensional binary datasets; second, it provides a standardized method for storing metadata along with the data; and third, there are many tools for visualizing and analyzing netCDF files.
In this section, we will show an example of how VIC can be applied in parallel on a large super-computer. We’re showing results from two 1-year test simulations run using the image driver on the RASM model domain. The RASM model domain has about 26,000 land grid cells and includes the entire pan-Arctic drainage. Both of these simulations were run using 3-hourly forcing inputs, a 3-hour model timestep, frozen soils, and minimal output (only 8 variables written once monthly). We ran these tests on the Topaz super-computer at the U.S. Department of Defense Engineer Research and Development Center Supercomputing Resource Center (ERDC DSRC). Topaz is a 124,416-core SGI ICE X machine, made up of 3,456 nodes (36 cores/node) and capable of 4.62 PFLOPS.
The first simulation we’re going to look at was run in “Water Balance” mode (FULL_ENERGY = FALSE).
When VIC is run in this mode it doesn’t iterate to find the surface temperature, resulting in much faster run times at the expense of model complexity.
The figure below shows the model throughput (left axis) for 15 identical VIC simulations run using between 1 and 432 MPI processes.
Using just 1 processor (no MPI), this VIC configuration has a model throughput of 27 model-years/wall-day (where one wall-day is short for “wall-clock day”, which is simply a calendar day in real time).
Using 72 processors, we see the model throughput increase to 334 model-years/wall-day.
Beyond 36 processors, the throughput begins to plateau and the scaling efficiency (right axis) is too low to push the scaling any further.
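For readers who want to turn throughput numbers into a scaling efficiency, here is a small Python sketch. The post does not say how the efficiency on the right axis was computed, so this assumes the conventional strong-scaling definition: speedup is throughput relative to the single-processor run, and efficiency is speedup divided by the number of processors. The function names are made up for illustration; the throughput values are the ones quoted above.

```python
def speedup(throughput_n, throughput_1):
    """Speedup relative to the single-processor run."""
    return throughput_n / throughput_1


def scaling_efficiency(throughput_n, throughput_1, n_procs):
    """Fraction of ideal (linear) speedup achieved on n_procs processors."""
    return speedup(throughput_n, throughput_1) / n_procs


# Throughputs quoted in the text, in model-years/wall-day.
single_proc = 27.0
many_procs = 334.0

print(f"speedup: {speedup(many_procs, single_proc):.1f}x")
print(f"efficiency on 72 processors: "
      f"{scaling_efficiency(many_procs, single_proc, 72):.0%}")
```

Under this definition, an efficiency of 100% would mean that doubling the processor count doubles the throughput, and values well below that indicate that most of the added processors are waiting rather than computing.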
Next we’ll look at the scaling performance of VIC run in “Energy Balance” mode (FULL_ENERGY = TRUE).
When run in this configuration, we expect the model to be much more expensive.
In fact, we find that at 36 cores (one node), the throughput in the “Energy Balance” simulation is about 36 times lower than in the “Water Balance” simulation.
Because this configuration is so much more expensive, the scaling efficiency is substantially better.
In this case, the model throughput doesn’t really plateau after 864 processors, where we’re able to get a throughput of 97 model-years/wall-day.
Our initial analysis of the VIC scaling shows us that for complex model configurations, we get reasonable scaling. With the scaling we’re seeing here, we can conceivably run 1000-year spinup simulations of VIC in the Arctic in 2-3 weeks. Another way to think about the potential here is in terms of applying VIC in large ensembles where 100s of ensemble members are run and readily analyzed using tools optimized for use with netCDF output. For simple VIC configurations (e.g., “Water Balance” only), it seems that VIC is not going to scale much beyond a single node, but it still benefits from multiple cores on the same node. The reason for this difference in scaling behavior lies in the interplay between the data volume of the input forcings (read on the master MPI processor and scattered to the slave processors) and the computation performed on the slave processors. As the input volume goes up or the computation time spent by individual slave processors goes down, the scaling performance decreases.
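The interplay described above can be illustrated with a toy cost model. This is a sketch under strong assumptions, not a description of VIC’s internals: it treats reading and scattering the forcings as a fixed serial cost, assumes the compute cost divides evenly across processes, and uses made-up cost values chosen only so that the expensive configuration costs 36 times more compute than the cheap one. All names and numbers are hypothetical.

```python
def wall_time(n_procs: int, io_time: float, compute_time: float) -> float:
    """Wall time per model year: serial I/O plus evenly divided compute."""
    return io_time + compute_time / n_procs


def efficiency_with_serial_io(n_procs: int, io_time: float, compute_time: float) -> float:
    """Scaling efficiency relative to one process under the toy cost model."""
    t1 = wall_time(1, io_time, compute_time)
    tn = wall_time(n_procs, io_time, compute_time)
    return t1 / (n_procs * tn)


# Hypothetical costs in arbitrary time units per model year.
IO_TIME = 0.05            # forcings read on one process and scattered
CHEAP_COMPUTE = 1.0       # stands in for a "Water Balance"-like run
EXPENSIVE_COMPUTE = 36.0  # stands in for an "Energy Balance"-like run

for n in (1, 36, 144, 864):
    cheap = efficiency_with_serial_io(n, IO_TIME, CHEAP_COMPUTE)
    costly = efficiency_with_serial_io(n, IO_TIME, EXPENSIVE_COMPUTE)
    print(f"{n:4d} processes: cheap {cheap:.2f}, expensive {costly:.2f}")
```

In this model the serial part sets a floor on the wall time, so the configuration with less computation per process hits that floor at a much smaller process count. Raising `IO_TIME` has the same effect as lowering the compute cost.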
We are developing tools that automate these scaling tests. We also have a number of issues slated for future development that we believe will improve the scaling of VIC. The issues we think hold the most potential would add parallel netCDF I/O and on-node shared memory threading. Both of these features would reduce the overhead introduced by MPI.
Supercomputing resources were provided through the Department of Defense (DOD) High Performance Computing Modernization Program at the Army Engineer Research and Development Center. Tony Craig, a member of the RASM team, also contributed to development of this blog.
Today we released VIC 5.0.0, a major upgrade to the VIC model infrastructure and the basis for all future releases of the VIC model. The new release is available from the VIC GitHub repository. Documentation is provided on the VIC documentation web site.
Following this release, no further releases will be made as part of the VIC 4 development track except for occasional bug fixes to the support/VIC.4.2.d branch.
The VIC 5.0.0 release is the result of three years of concerted effort in overhauling the VIC source code to allow for future expansion and for better integration with other models. More details on individual model features will be provided on this web site in a series of posts over the next few weeks. Full details can be found on the VIC documentation web site and in the various issues that have been tracked on the VIC GitHub repository.
Note that the VIC 5.0.0 release includes many infrastructure upgrades, but that the model calculations provide the same results as can be obtained with VIC 4.2.d. This was a conscious decision to limit the number of simultaneous changes that were implemented in the source code.
Major changes in VIC 5.0.0 include:
a clean separation of model physics from the model driver: All model physics are now contained in a vic_run module, and the different model drivers, which manage input/output, initialization, and memory allocation, all interact with the same vic_run module.
multiple model drivers: For historical reasons, VIC always ran in a time before space mode in which every model element (or grid cell) was run to completion before advancing to the next model element. While this had advantages, this mode made it much more difficult to interact with other models that use a space before time mode, in which the entire model domain is completed for one time step before advancing to the next. We have retained the original behavior as part of a classic driver and have implemented the space before time behavior as part of an image driver. We are also creating a CESM driver (which is currently still being tested) and have a Python driver that can be used to test individual model functions.
netCDF file format: While the classic driver uses (nearly) the same ASCII format for input and output as earlier versions of VIC, all input and output in the image driver uses netCDF. This includes spatial model parameters, meteorological forcing data, model output and model state files.
exact restarts: When operated in image mode, the VIC model now has byte-exact restarts. That means that if you run the model in a single simulation, the results are exactly the same as when you run the model in shorter increments and restart each time from a model state file that was generated at the end of the previous simulation. Because the classic mode uses ASCII state files, restarts are not byte-exact, but the behavior has been much improved over previous versions of VIC.
parallelization: In classic mode (time before space), VIC could easily be parallelized by breaking the domain into separate pieces and running each piece on a separate processor and/or node. In image mode this is a less desirable solution. Although parallelization could be implemented that way, it reduces the advantages of having a space before time mode in which you may want to treat the entire domain as a single entity. The image mode therefore uses MPI to allow parallelization of the code over a large number of nodes or processors. We have run tests with VIC 5.0.0 in which we used more than three thousand processors.
separation of the generation of the meteorological forcings: We have removed from the VIC source code the MTCLIM code that was used to estimate meteorological forcings at sub-daily time steps given limited inputs. This means that the user will need to generate all the meteorological forcing data outside of VIC. This will likely be the single largest impact on users who are only interested in the classic mode and who would like to upgrade to VIC 5.0.0.
continuous integration and model testing: As part of the VIC GitHub repository, all code changes are automatically subjected to a large number of tests. We currently use Travis CI to automatically build and run the model on a number of different architectures and with a number of different compilers. As part of this, we also test model functionality in an automated manner. In addition, we have designed a science test suite, which is run before model releases and in which model output is compared with observations and previous model simulations to assess the effect of model changes in a large range of environments.
improved documentation: All documentation has been updated and is available on the VIC documentation page. Note that documentation for previous versions of VIC is available in the same location.
extensive code cleanup and reformatting.
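Two items in the list above, the time before space versus space before time loop orders and the exact restarts, can be illustrated with a toy model. The sketch below is not VIC code: VIC is written in C and the image driver stores state in netCDF, whereas this uses a hypothetical one-bucket `step` function in place of vic_run and Python’s `struct` module as a stand-in for a binary state file. It assumes grid cells do not interact within a time step, so both loop orders perform the same operations per cell.

```python
import struct


def step(storage, precip, capacity=100.0, k=0.1):
    """Advance one toy bucket by one time step; return (storage, runoff)."""
    storage += precip
    spill = max(0.0, storage - capacity)
    storage -= spill
    baseflow = k * storage
    return storage - baseflow, spill + baseflow


def run_time_before_space(state, forcing):
    """Classic-style order: run each cell through all time steps in turn."""
    final = list(state)
    runoff = [[0.0] * len(state) for _ in forcing]
    for c in range(len(state)):
        for t, row in enumerate(forcing):
            final[c], runoff[t][c] = step(final[c], row[c])
    return final, runoff


def run_space_before_time(state, forcing):
    """Image-style order: finish the whole domain before advancing in time."""
    final = list(state)
    runoff = []
    for row in forcing:
        out = []
        for c, precip in enumerate(row):
            final[c], q = step(final[c], precip)
            out.append(q)
        runoff.append(out)
    return final, runoff


def save_binary(state):
    """Store every double bit for bit (stand-in for a binary state file)."""
    return struct.pack(f"{len(state)}d", *state)


def load_binary(blob):
    return list(struct.unpack(f"{len(blob) // 8}d", blob))


def save_text(state):
    """Store state as text rounded to 4 decimals (stand-in for an ASCII file)."""
    return " ".join(f"{s:.4f}" for s in state)


def load_text(text):
    return [float(x) for x in text.split()]


# Made-up forcing: 20 time steps by 3 cells.
N_CELLS, N_STEPS = 3, 20
forcing = [[((7 * t + 3 * c) % 11) * 1.3 for c in range(N_CELLS)]
           for t in range(N_STEPS)]
initial = [10.0, 20.0, 30.0]

classic_final, classic_runoff = run_time_before_space(initial, forcing)
image_final, image_runoff = run_space_before_time(initial, forcing)

# Run the first half, save state, then restart for the second half.
half, _ = run_space_before_time(initial, forcing[:10])
binary_final, _ = run_space_before_time(load_binary(save_binary(half)), forcing[10:])
text_final, _ = run_space_before_time(load_text(save_text(half)), forcing[10:])

print("loop orders agree:", classic_final == image_final)
print("binary restart exact:", binary_final == image_final)
print("text restart exact:", text_final == image_final)
```

The two loop orders give identical results here because the cells are independent; the difference only matters once another model needs the state of the whole domain at every time step. The rounded text state file reproduces the run only approximately, which is the kind of small discrepancy the exact restarts item refers to for ASCII state files.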
We encourage everyone to upgrade to VIC 5.0.0 and to contribute to further model development.
You can report bugs and issues on the VIC GitHub repository.
If you have questions, please refer them to the VIC user mailing list.
I am looking for two motivated and qualified postdoctoral research associates (postdocs) to join the UW Hydro | Computational Hydrology group. One is on an NSF-funded project to study hydrological and stream temperature changes under climate change, while the other is on a NASA-funded project to develop infrastructure that supports hyper-resolution, large ensemble, continental-scale hydrologic modeling. You can find details about the positions on our join page.
Update (January 15, 2016): Both positions have been filled. Future openings will be posted on our join page.
In collaboration with Martyn Clark’s group at NCAR/RAL, we released the source code for SUMMA, a new hydrologic modeling code that advances a unified approach to process-based hydrologic modeling and enables controlled and systematic evaluation of multiple model representations. This has been some time in the making and will potentially change a lot of the model development, application and evaluation activities in the group.
SUMMA stands for the Structure for Unifying Multiple Modeling Alternatives and is described in detail in two papers that recently appeared in Water Resources Research and in an NCAR Technical Note. The source code (and details about the papers) can be found on the SUMMA GitHub page and sample data sets can be obtained from the SUMMA page at NCAR.
In future posts, I will provide some samples of SUMMA’s capabilities.
One of the projects that the group has been involved with over the past few years is the Integrated Scenarios of the Future Northwest Environment project, together with groups at Oregon State University and the University of Idaho. The goal of the project was to evaluate changes in climate, hydrology and vegetation in the Pacific Northwest during the rest of this century. In our group, Matt Stumbaugh, Diana Gergel, Dennis Lettenmaier, and I have been involved with model simulations and analysis. While the analysis is ongoing, the model runs to support this project have been completed and the model output is available for other groups to use in their research. There is an interesting article about the project in a new annual magazine from the Northwest Climate Science Center: Northwest Climate Magazine. The Integrated Scenarios article can be found here.
There has been a spate of new publications (co-)authored by the UW Hydro | Computational Hydrology group. You can always get the latest on our publications page, but here is a brief overview of the papers published so far this year:
Mishra et al., Changes in observed climate extremes in global urban areas, Environmental Research Letters, doi:10.1088/1748-9326/10/2/024005:
A look at changes in daily, historic extremes in temperature, precipitation and wind in urban areas around the globe over the period 1973-2012. Main takeaways: “[…] urban areas have experienced significant increases […] in the number of heat waves during the period 1973–2012, while the frequency of cold waves has declined”. At the same time “[e]xtreme windy days declined”.
Roberts et al., Simulating transient ice–ocean Ekman transport in the Regional Arctic System Model and Community Earth System Model, Annals of Glaciology, doi:10.3189/2015AoG69A760:
This is one of the first publications that stems from the Regional Arctic System Model (RASM). The study demonstrates that high-frequency coupling is necessary to represent certain transport processes in the Arctic Ocean. In particular, “[t]he result suggests that processes associated with the passage of storms over sea ice (e.g. oceanic mixing, sea-ice deformation and surface energy exchange) are underestimated in Earth System Models that do not resolve inertial frequencies in their marine coupling cycle.”
Vano et al., Seasonal hydrologic responses to climate change in the Pacific Northwest, Water Resources Research, doi:10.1002/2014WR015909:
The development of a methodology that allows rapid evaluation of long-term changes in seasonal hydrographs based on global climate model output, and an application in the Pacific Northwest (PNW). Within the PNW, “[…] transitional (intermediate elevation) watersheds experience the greatest seasonal shifts in runoff in response to cool season warming.”
Clark et al., A unified approach for process-based hydrologic modeling: 1. Modeling concept, Water Resources Research, doi:10.1002/2015WR017198:
The first paper describing the concepts behind SUMMA (Structure for Unifying Multiple Modeling Alternatives), which potentially has a big impact on the work done in our group. While we will continue to support VIC and DHSVM for the foreseeable future, much of our model development effort over the next few years will shift towards SUMMA. SUMMA “[…] formulates a general set of conservation equations, providing the flexibility to experiment with different spatial representations, different flux parameterizations, different model parameter values, and different time stepping schemes.” The goal of this modeling approach is for SUMMA to “help tackle major hydrologic modeling challenges, including defining the appropriate complexity of a model, selecting among competing flux parameterizations, representing spatial variability across a hierarchy of scales, identifying potential improvements in computational efficiency and numerical accuracy as part of the numerical solver, and improving understanding of the various sources of model uncertainty.”
Clark et al., A unified approach for process-based hydrologic modeling: 2. Model implementation and case studies, Water Resources Research, doi:10.1002/2015WR017200:
The second paper in this two-part series discusses the SUMMA implementation and provides some initial applications to demonstrate SUMMA’s potential.
Mao et al., Is climate change implicated in the 2013–2014 California drought? A hydrologic perspective, Geophysical Research Letters, doi:10.1002/2015GL063456:
California has been subject to a major drought over the last few years (with no relief in sight as of this writing). In this paper, we examined whether the 2012-2014 drought was within the range of historic droughts and to what extent warming during the past century has contributed to the below-average mountain snow pack in 2013-2014. “We find that the warming may have slightly exacerbated some extreme events (including the 2013–2014 drought and the 1976–1977 drought of record), but the effect is modest; instead, these drought events are mainly the result of variability in precipitation.”
Clark et al., Continental Runoff into the Oceans (1950-2008), Journal of Hydrometeorology, doi:10.1175/JHM-D-14-0183.1:
This paper (by a different Clark than the above two papers – Liz rather than Martyn) provides a new set of estimates of freshwater discharge to the oceans, based on a merger of observations and model simulations. “We estimate that flows to the world’s oceans globally are 44,200 (± 2660) km³ yr⁻¹ (9% from Africa, 37% from Eurasia, 30% from South America, 16% from North America, and 8% from Australia-Oceania). These estimates are generally higher than previous estimates, with the largest differences in South America, and Australia-Oceania.”
With the move of Dennis Lettenmaier and part of his group to UCLA last fall, it is time for a new name and web site for what used to be the Land Surface Hydrology Group at the University of Washington. We’ll maintain the old link for a little while, but any updates will happen on this new web site. To more accurately reflect what the group does and where we are headed, we are now the UW Hydro | Computational Hydrology group. Existing content such as information about our models, data sets, and projects has been maintained, but you may need to look around to find it. Other parts of the web site have seen more drastic change to reflect the current make-up and activities within the group. We’ll use the web site to provide periodic updates on projects, publications and new activities.