We have developed a software tool, the SOM-Assisted Hazard Area Risk Analysis (SAHARA), that reduces large climate datasets to a more manageable size while keeping them statistically similar to the full record; the reduced datasets are then used to produce ensembles of potential hazard outcomes. The Self-Organizing Map (SOM) is a machine learning algorithm for data clustering that is well suited to data with strong topological properties. By using the SOM algorithm to analyze the topological patterns of climatological fields over a regional domain across a 30-year span, we can find a close statistical equivalent built from fewer, non-contiguous input days. When SOMs are used to cluster monthly climate data in this way, we find that sampling only 150 days reduces computational time by more than a factor of 6 compared with using the entire climate dataset. (See Figure 1)
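The text does not describe SAHARA's SOM implementation or exactly how days are drawn from the trained map. As a minimal sketch under stated assumptions, the code below trains a textbook online SOM in pure Python on per-day feature vectors (for example, flattened climatological fields) and then keeps, for each map node, the real day closest to that node, weighted by the fraction of days the node captured. The function names `train_som` and `representative_days`, the nearest-day selection rule, and the synthetic data are all hypothetical.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest to x."""
    return min(weights, key=lambda node: sq_dist(weights[node], x))

def train_som(data, rows, cols, iterations=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM (online updates) on equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise each node's weights from a randomly chosen input day.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    for t in range(iterations):
        frac = t / iterations
        lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5   # shrinking neighbourhood radius
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2  # distance on the map grid
            h = math.exp(-d2 / (2.0 * sigma * sigma))                # Gaussian neighbourhood
            for k in range(dim):
                w[k] += lr * h * (x[k] - w[k])  # pull node toward the sample
    return weights

def representative_days(data, weights):
    """For each non-empty node, return (index of nearest real day, cluster weight)."""
    members = {node: [] for node in weights}
    for i, x in enumerate(data):
        members[best_matching_unit(weights, x)].append(i)
    reps = []
    for node, idx in members.items():
        if not idx:
            continue  # this node captured no days
        best = min(idx, key=lambda i: sq_dist(data[i], weights[node]))
        reps.append((best, len(idx) / len(data)))
    return reps

# Synthetic stand-in for 900 days of 4-variable fields drawn from three weather regimes.
rng = random.Random(42)
centres = [(0, 0, 0, 0), (5, 5, 0, 0), (0, 5, 5, 5)]
days = [[c + rng.gauss(0, 0.5) for c in rng.choice(centres)] for _ in range(900)]

som = train_som(days, rows=3, cols=5)
reps = representative_days(days, som)
print(len(reps), "representative days")
```

A 10×15 map would yield at most 150 representative days. If runtime scales roughly linearly with the number of simulated days, then sampling 150 days out of about 30 × 31 = 930 days for one calendar month gives 930 / 150 = 6.2, consistent with the reported speedup of more than a factor of 6; this reading of the figure is an assumption, not something stated above.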

Figure 1
Figure 2

The SAHARA software can scale from a single laptop to workstations and many-core, many-node clusters. It uses a modern microservice architecture to distribute the Climate Database (currently CFSR), the SOM Engine, atmospheric model ensembles (such as the SCIPUFF Transport and Dispersion model), and pre- and post-processing across the available computing resources, either locally or remotely. (See Figure 2)
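The interfaces between SAHARA's microservices are not described here. As a single-machine stand-in, the sketch below uses Python's `concurrent.futures` to fan ensemble members out to a worker pool and then combine their results, weighting each member by its representative day's cluster weight. `run_dispersion_member` and its fake hazard-area value are hypothetical placeholders, not SCIPUFF's actual interface; in a distributed deployment each call would presumably become a request to a separate service or cluster job.

```python
from concurrent.futures import ThreadPoolExecutor

def run_dispersion_member(day_index, weight):
    """Hypothetical stand-in for one transport-and-dispersion ensemble member
    driven by one representative day. Returns a fake hazard area so the
    sketch runs; it is not a real model result."""
    hazard_area_km2 = 10.0 + (day_index % 7)  # placeholder value
    return day_index, weight, hazard_area_km2

def run_ensemble(representatives, max_workers=4):
    """Submit one member per (day_index, weight) pair and combine the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_dispersion_member, d, w) for d, w in representatives]
        results = [f.result() for f in futures]  # keeps submission order
    # Weighted mean hazard area, using each day's cluster weight.
    total_w = sum(w for _, w, _ in results)
    mean_area = sum(w * a for _, w, a in results) / total_w
    return results, mean_area

reps = [(3, 0.5), (10, 0.3), (21, 0.2)]  # hypothetical (day index, weight) pairs
results, mean_area = run_ensemble(reps)
print(mean_area)
```

Because `run_ensemble` only depends on the submit/result pattern, the thread pool could be replaced by `ProcessPoolExecutor` or a remote job queue without changing how results are combined.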

We are currently adding the Weather Research and Forecasting (WRF) model as an additional workflow component. It will provide on-demand dynamical downscaling to increase the fidelity of simulations with high spatial and temporal resolution requirements, such as urban-scale events.
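One way to picture "on-demand" downscaling is a workflow builder that inserts a WRF stage only when the requested grid spacing is finer than that of the driving climate data. The sketch below assumes a hypothetical 50 km source grid and a fixed 3:1 parent-to-child grid ratio (a common WRF nesting choice) to estimate how many nest levels are needed below an outer domain at the source spacing. All stage names, `SOURCE_GRID_KM`, and `HazardRequest` are illustrative assumptions, not SAHARA's actual workflow definition.

```python
from dataclasses import dataclass

# Hypothetical grid spacing (km) of the climate fields that drive the workflow;
# the real value depends on the dataset in use.
SOURCE_GRID_KM = 50.0

@dataclass
class HazardRequest:
    target_grid_km: float  # grid spacing the hazard simulation needs

def nests_needed(source_km, target_km, ratio=3):
    """Number of nest levels, each refined by `ratio`, needed to go from
    source_km down to target_km or finer."""
    n, spacing = 0, source_km
    while spacing > target_km:
        spacing /= ratio
        n += 1
    return n

def build_workflow(req, source_km=SOURCE_GRID_KM):
    """Return ordered stage names; the downscaling stage is added only when needed."""
    stages = ["extract_climate_days", "som_select_days"]
    n = nests_needed(source_km, req.target_grid_km)
    if n > 0:
        stages.append(f"wrf_downscale({n} nests)")
    stages += ["dispersion_ensemble", "post_process"]
    return stages

print(build_workflow(HazardRequest(target_grid_km=1.0)))    # urban-scale request
print(build_workflow(HazardRequest(target_grid_km=100.0)))  # coarse request, no WRF
```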

Additional planned features and capabilities include:

  • Output compatibility with GIS (Geographical Information System) tools for interactive analysis and post-processing
  • Integration with large HPC systems, such as those at the DoD HPCMP centers
  • Seasonal forecasts of hazard areas, with user-configurable time periods
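The planned GIS output format is not specified. GeoJSON (RFC 7946) is one widely supported option that most GIS tools can open directly. The minimal sketch below assumes a hazard area is available as a polygon outline of (longitude, latitude) pairs in degrees; the outline coordinates and the `probability` and `ensemble_members` property names are made up for illustration.

```python
import json

def hazard_area_to_geojson(ring_lonlat, properties):
    """Wrap one hazard-area outline as a GeoJSON FeatureCollection.
    ring_lonlat: list of (longitude, latitude) pairs, exterior ring
    ordered counterclockwise as RFC 7946 recommends."""
    ring = [list(p) for p in ring_lonlat]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON linear rings must be closed
    feature = {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": properties,
    }
    return {"type": "FeatureCollection", "features": [feature]}

# Hypothetical rectangular hazard outline and attributes.
outline = [(-77.10, 38.80), (-77.00, 38.80), (-77.00, 38.90), (-77.10, 38.90)]
doc = hazard_area_to_geojson(outline, {"probability": 0.1, "ensemble_members": 150})
text = json.dumps(doc)
print(text)
```

Writing `text` to a file with a `.geojson` extension would let it be loaded as a vector layer for interactive analysis.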