Data Mining
Data mining spans a wide variety of application domains, including astronomy, atmospheric science, bioinformatics, business, computer vision, economics, high energy physics, medical imaging, molecular chemistry, robotics, security, and surveillance. Algorithms that have proven capable in one domain are often applicable in many of the others. RAL has applied recent data mining techniques to nowcasting and forecasting severe weather for the aviation industry.
The data mining procedure typically involves the following stages:
- Preparation and selection of suitable input data
- Organization of input data into training and test data sets
- Running the data mining software on both the training and test data sets
- Assessing the results of the various data mining techniques
For example, in the National Ceiling and Visibility project we have collected approximately 20 years of site-based meteorological reports, including ceiling, visibility, temperature, dew point, wind speed, and wind direction. These data are organized into training and test data files. Choosing what to include in the training and test data sets is often difficult. For instance, should a single training case use observations from only one weather station, or should it also use observations from several nearby stations?
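The single-site versus multi-site question can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the record layout, the site names SITE_A and SITE_B, the NEIGHBORS table, the helper names, and the observation values are all made up, and splitting by date is only one plausible way to separate training data from test data.

```python
from datetime import datetime

# Hypothetical report records: (site, time, ceiling in feet, visibility in miles).
# The values are made up for illustration; they are not real observations.
reports = [
    ("SITE_A", datetime(2001, 1, 1, 12), 800, 2.0),
    ("SITE_B", datetime(2001, 1, 1, 12), 1200, 3.0),
    ("SITE_A", datetime(2015, 6, 1, 12), 5000, 10.0),
    ("SITE_B", datetime(2015, 6, 1, 12), 4500, 10.0),
]

# Hypothetical table of nearby sites for each site.
NEIGHBORS = {"SITE_A": ["SITE_B"], "SITE_B": ["SITE_A"]}


def build_case(site, time, by_key, use_neighbors):
    """Return the feature list for one site and time, or None if data is missing."""
    sites = [site] + (NEIGHBORS.get(site, []) if use_neighbors else [])
    features = []
    for s in sites:
        obs = by_key.get((s, time))
        if obs is None:
            return None  # skip cases where a required report is absent
        features.extend(obs)
    return features


def split_by_date(cases, cutoff):
    """Put cases before the cutoff in training and the rest in test."""
    train = [c for c in cases if c[0] < cutoff]
    test = [c for c in cases if c[0] >= cutoff]
    return train, test


by_key = {(site, t): (ceiling, vis) for site, t, ceiling, vis in reports}
when = datetime(2001, 1, 1, 12)

single = build_case("SITE_A", when, by_key, use_neighbors=False)
multi = build_case("SITE_A", when, by_key, use_neighbors=True)
print(single)  # [800, 2.0]
print(multi)   # [800, 2.0, 1200, 3.0]

cases = []
for site, t, _, _ in reports:
    features = build_case(site, t, by_key, use_neighbors=True)
    if features is not None:
        cases.append((t, features))

train, test = split_by_date(cases, datetime(2010, 1, 1))
```

The multi-site case carries more information per example but is discarded whenever any neighbor's report is missing, which is one reason the choice is not obvious.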
Once we have settled on an organizational scheme for the input data, we produce the corresponding training and test set files and run the data mining software. We typically apply several different data mining techniques, such as support vector machines, random forests, and decision trees.
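Running several techniques over the same training and test files might look like the following sketch. It is not the software used in the project. To stay free of external dependencies it substitutes two deliberately simple stand-ins, a majority-class baseline and a one-level decision tree (a "stump"), for the support vector machines, random forests, and full decision trees named above. The function names, the feature choice, and the data values are all hypothetical.

```python
from collections import Counter


def fit_majority(X, y):
    """Baseline: always predict the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda row: label


def fit_stump(X, y):
    """One-level decision tree: choose the single feature/threshold split
    that makes the fewest errors on the training set."""
    best = None  # (errors, feature index, threshold, left label, right label)
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, j, threshold, left_label, right_label)
    if best is None:  # no usable split, so fall back to the baseline
        return fit_majority(X, y)
    _, j, threshold, left_label, right_label = best
    return lambda row: left_label if row[j] <= threshold else right_label


# Each technique is a function that fits on training data and returns a predictor.
TECHNIQUES = {"majority": fit_majority, "stump": fit_stump}


def run_all(X_train, y_train, X_test):
    """Fit every technique on the training set and predict on the test set."""
    predictions = {}
    for name, fit in TECHNIQUES.items():
        model = fit(X_train, y_train)
        predictions[name] = [model(row) for row in X_test]
    return predictions


# Made-up cases: [temperature minus dew point (deg C), wind speed (kt)].
X_train = [[0.5, 3], [1.0, 5], [6.0, 10], [8.0, 12], [7.0, 4]]
y_train = [1, 1, 0, 0, 0]  # 1 = low ceiling, 0 = not (hypothetical labels)
X_test = [[0.8, 4], [9.0, 8]]

predictions = run_all(X_train, y_train, X_test)
print(predictions)  # {'majority': [0, 0], 'stump': [1, 0]}
```

Keeping every technique behind the same fit-then-predict interface means a library implementation could be added to TECHNIQUES without changing the rest of the harness.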
After running the data mining software packages, we collect the output and assess performance. General-purpose data mining algorithms often compare favorably with, or outperform, custom-designed techniques.
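A minimal sketch of such an assessment for yes/no forecasts follows. The text does not say which scores are used. Probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI) are common in forecast verification and are an assumption here, as are the function names and the made-up forecast and observation lists.

```python
def contingency(predicted, observed):
    """Count hits, misses, false alarms, and correct negatives for 0/1 events."""
    pairs = list(zip(predicted, observed))
    hits = sum(p == 1 and o == 1 for p, o in pairs)
    misses = sum(p == 0 and o == 1 for p, o in pairs)
    false_alarms = sum(p == 1 and o == 0 for p, o in pairs)
    correct_negatives = sum(p == 0 and o == 0 for p, o in pairs)
    return hits, misses, false_alarms, correct_negatives


def scores(predicted, observed):
    """Return POD, FAR, and CSI; a score is None when its denominator is zero."""
    h, m, f, _ = contingency(predicted, observed)
    return {
        "POD": h / (h + m) if h + m else None,          # fraction of events caught
        "FAR": f / (h + f) if h + f else None,          # fraction of alarms that were false
        "CSI": h / (h + m + f) if h + m + f else None,  # hits over all event-related cases
    }


# Made-up test-set outcomes: 1 = event occurred or was forecast, 0 = not.
observed = [1, 1, 0, 0, 1, 0]
general = [1, 1, 0, 1, 0, 0]  # hypothetical general-purpose algorithm output
custom = [1, 0, 0, 0, 0, 0]   # hypothetical custom-designed technique output

for name, predicted in [("general", general), ("custom", custom)]:
    print(name, scores(predicted, observed))
```

Reporting several scores side by side matters because a technique can look good on one and poor on another: in this made-up data the custom output has no false alarms but detects only one of the three events.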
