somspace is an R package for spatial classification of time series. Using Self-Organizing Map (SOM), a popular unsupervised learning algorithm [Kohonen 1984], somspace provides the tools to generate meaningful representations that capture the main spatial patterns of the analyzed time series. A quick introduction to SOM can be found in Wikipedia, while more demanding readers can get more insight by reading some of the numerous papers on the subject.
In this first presentation of the somspace package, we will follow the steps of our study on the spatial patterns of European hydroclimate. The aim of the study was to determine homogeneous regions of hydroclimatic variability, derived by the fluctuations of a single variable, namely the self-calibrated Palmer Drought Severity Index (scPDSI).
The dataset used is the Old World Drought Atlas (OWDA), a tree-ring
reconstruction of the scPDSI over Europe and parts of Northern Africa
and Middle East for the last 2,000 summers Cook et al., 2016.
The OWDA has been compiled by the spatial regression of 106 tree-ring
records to a map with 5414 half-degree grid cells at a 0.5 x 0.5
resolution. Here, we use a subset [Markonis et al. 2018],
which optimizes the temporal and spatial distribution of the dataset.
The resulting data grid covers 35.25ºN – 62.75ºN and 4.25ºW – 36.25ºE
(2647 grid cells) for the period 992–2012 and is freely available. Alternatively, the owda data object can also be used without any downloadingas. However, it covers an even smaller period of OWDA (500 years).
library(somspace)
str(owda)## Classes 'data.table' and 'data.frame': 1355264 obs. of 4 variables:
## $ Time : int 1501 1501 1501 1501 1501 1501 1501 1501 1501 1501 ...
## $ Lat : num 37.2 37.8 38.2 38.8 39.2 ...
## $ Lon : num -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 ...
## $ scPDSI: num -0.623 0.168 0.312 -0.722 -0.392 ...
## - attr(*, "sorted")= chr "Time" "Lon" "Lat"
## - attr(*, ".internal.selfref")=<externalptr>
The package is built around the som() function of kohonen package. However to apply it in spatial data, we first have to create a somin object. The transformation of raw data to somin class is performed through the sominp() function. Its sole argument is the original dataset as a 4-column data.table object, containing: time, latitude, longitude and the values
of the variable to be classified. Even though the input dataset can be
gridded or not, with regular or irregular time step, the time series
must have equal length and no missing values.
inp <- sominp(owda) #may take some time depending on dataset size
str(inp)The somin class is a list with three elements. The first one, input_for_som, is a matrix that will be used by kohonen::som() function to generate the SOM. The second one, coords, is a data.table with the point ids and their coordinates. The last one, input_dt is also a data.table with the raw data that were used to generate input_for_som.
The generation of SOM is quite straightforward:
owda_som <- somspa(inp)This creates a somsp object, which contains the som, the nodes properties and the original data.
Since somspa() uses the function som(), it can also take som() arguments. For instance, to create a 6 x 6 map after 1000 iterations:
owda_som <- somspa(inp, grid = somgrid(6, 6), rlen = 1000)The somsp objects can be easily plotted by plot() or summarized by summary() functions.
plot(owda_som)summary(owda_som)## node node_counts node_lat node_lon node_sd_lat node_sd_lon
## 1: 1 106 51.25 9.50 1.8785269 2.925263
## 2: 2 111 53.25 10.00 1.8710913 3.693948
## 3: 3 117 59.25 11.50 1.7241309 3.704834
## 4: 4 81 60.25 16.50 3.2110919 3.919325
## 5: 5 113 57.25 24.50 2.5492928 3.843344
## 6: 6 114 60.75 31.25 1.2543048 3.606703
## 7: 7 174 49.75 1.00 2.1143243 4.050415
## 8: 8 29 55.75 -3.00 3.3083526 6.507858
## 9: 9 83 53.25 22.00 1.4618018 2.787746
## 10: 10 110 56.25 29.00 2.1315621 3.052242
## 11: 11 32 61.25 7.25 2.4560656 11.346450
## 12: 12 86 53.50 33.25 1.2164091 2.334663
## 13: 13 104 46.25 2.00 1.8238319 2.478808
## 14: 14 67 50.25 15.50 1.0899711 2.045751
## 15: 15 29 51.25 20.00 0.6952513 1.479956
## 16: 16 34 52.25 28.00 0.8550208 1.085166
## 17: 17 33 57.25 33.50 3.5664020 3.505475
## 18: 18 30 50.75 32.75 0.5632418 1.679919
## 19: 19 87 43.25 1.50 1.0614947 3.586019
## 20: 20 46 48.25 16.50 0.9834991 1.863196
## 21: 21 32 51.25 24.50 0.8131976 2.101267
## 22: 22 44 49.75 25.00 1.8381959 2.604553
## 23: 23 59 48.25 31.00 0.9530167 1.827528
## 24: 24 63 47.75 29.50 2.1835277 4.240604
## 25: 25 51 37.75 -0.50 1.5264338 3.651376
## 26: 26 47 47.75 12.50 0.8872621 2.353231
## 27: 27 55 48.25 22.00 0.9447712 1.712747
## 28: 28 65 46.25 26.00 1.3095434 1.452989
## 29: 29 73 44.75 11.50 1.6308363 2.924191
## 30: 30 64 41.25 31.75 3.6701063 3.323647
## 31: 31 44 36.25 7.00 0.4726953 2.438352
## 32: 32 58 40.75 -2.00 1.5635606 3.125041
## 33: 33 106 45.25 19.00 1.1285023 2.116550
## 34: 34 73 40.75 20.00 2.2865499 5.240233
## 35: 35 104 42.75 24.00 1.4239414 3.245922
## 36: 36 123 38.75 32.00 1.1954198 4.008255
## node node_counts node_lat node_lon node_sd_lat node_sd_lon
summary offers an overview of the classification
results. In our example, we have generated a 6 x 6 classification
resulting to 36 nodes. Each of them contains a varying number of grid
cells ranging from 29 (node 8) to 174 (node 7). Their latitude and
longitude are also available as the mean of their corresponding grid
cells, as well as their standard deviations. The latter can be used as a
preliminary measure of node homogeneity.
Additionaly, the average time series of specific SOMs can be plotted by plot_ts():
plot_ts(owda_som, n = 12)The time series can also be manually extracted and further analysed or plotted:
node_12 <- owda_som$input_dt[node == 12, .(node, time, variable)] #in version 1.1.0 `variable` is changed to `values`Multiple time series can also be plotted, by adding a vector argument in n.
plot_ts(owda_som, n = c(1, 7, 21, 33)) plot_ts(owda_som, 1:max(owda_som$summary$node)) #plots all somsThe number of regions with similar characteristics can be further reduced by hierarchical cluster analysis implemented in somregs() function. Instead of returning a pre-defined number of regions, somregs() provides the series of homogeneous regions starting from 2 up to nregions and returns them as a regs
object. In each new classification one of the prior regions is divided
and the sub-region with the less nodes gets a new id. This can be useful
for keeping track of the classification process and observing which
regions remain undivided as the number of splits increases.
Except for the somsp and nregions, other arguments used in the hclust() function can be added, e.g., method for choosing the hierarchical cluster algorithm.
owda_regions <- somregs(owda_som, nregions = 13, method = "ward.D2") The output regs object are a two-element list. The regions element contains the corresponding information about the regions and the input_dt the time series of the original dataset. Similarly to somsp objects, regs can be plotted as maps or time series.
plot(owda_regions, regions = 2:13, nrow = 3, ncol = 4)plot_ts(owda_regions, n = 9)In this example, we can see that hydroclimatic variability over France and Germany is quite homogeneous, as this region remains undivided for a large number of classifications. On the contrary, Mediterranean appears to be divided into smaller regions in almost every step of the classification process.
Except for a qualitative comparison of the classification, we can
apply various statistical criteria to see how the heterogeneity between
regions varies as the number of regions increases (here using the clusterCrit package).
library(clusterCrit)
x <- owda_som$som$codes[[1]]
setorder(owda_regions$regions, node)
classifications <- unique(data.frame(node = owda_regions$regions$node, owda_regions$regions[, 11:22]))
cluster_comparison <- as.matrix(intCriteria(x, as.integer(classifications[, 2]), "all"))
for(i in 3:13){
cluster_comparison <- cbind(cluster_comparison,
intCriteria(x, as.integer(classifications[, i]), "all"))
}
colnames(cluster_comparison) <- 2:13
cluster_comparison <- apply(cluster_comparison, 2, unlist)
colnames(cluster_comparison) <- 2:13
cluster_comparison <- data.table(melt(cluster_comparison, variable.factor = T))
colnames(cluster_comparison) <- c("Method", "nRegions", "Value")
homo_methods <- c("davies_bouldin", "c_index",
"calinski_harabasz", "sd_scat",
"sd_dis", "wemmert_gancarski")
ggplot(cluster_comparison[Method %in% homo_methods], aes(x = nRegions, y = Value)) +
geom_point() +
geom_line() +
facet_wrap(~Method, scales = "free") +
theme_bw()For OWDA, most of the selected measures present milder decline in heterogeneity after the classification in 9 regions. This implies that this number might sufficiently represent the spatial heterogeneity of European hydroclimate. Of course, the suggested division should be carefully studied to see whether it agrees with similar findings in other studies and other measures as well.
Another useful feature of somspace is the analysis of spatial dependencies between regions with the complex networks methodology [Tsonis et al. 2006]. Any of the classificationa stored in the regs object can be analysed through complex networks and plotted as map with cnet(). The threshold, thres, refers to the cross-correlation coefficient between the aggregated time series of each region; only values above thres are considered to be network connections.
cnet(owda_regions, n = 9, thres = 0.3)We can see that for a classification of n = 9 regions
there is a clear zonal (west-to east) orientation of the spatial
dependencies in European hydroclimate. The Iberian Peninsula, appears
not to be possitively correlated with any of the other regions. If the
cross-correlation matrix is studied, it can be seen that it is actually
negatively correlated. This is one of the limitations of the complex
networks approach and this is why the cross-correlation matrix should always be checked.
Users of the somspace package or readers interested in European hydroclimatic variability might also find the following references useful.
Cook, E. R., Seager, R., Kushnir, Y., Briffa, K. R., Büntgen, U., Frank, D., … & Baillie, M. (2015). Old World megadroughts and pluvials during the Common Era. Science Advances, 1(10), e1500561. Link
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological cybernetics, 43(1), 59-69. Link
Markonis, Y., Hanel, M., Máca, P., Kyselý, J., & Cook, E. R. (2018). Persistent multi-scale fluctuations shift European hydroclimate to its millennial boundaries. Nature communications, 9(1), 1767. Link
Markonis, Y., & Strnad, F. (2019). Representation of European Hydroclimatic Patterns with Self-organizing Maps. EarthArXiv. May, 14. Link
Tsonis, A. A., Swanson, K. L., & Roebber, P. J. (2006). What do networks have to do with climate?. Bulletin of the American Meteorological Society, 87(5), 585-596. Link
Wehrens & J., & Kruisselbrink, J. (2018). Flexible Self-Organising Maps in kohonen 3.0. Journal of Statistical Software, 87, 7. Link