# Variogram modelling for kriging in Surfer - a tutorial

This tutorial is modified from the original 'Variogram Tutorial' written by Randal Barnes for Golden Software. This updated version refers to the graphics and terminology of the Variogram modeling page in the Grid Data dialog in Surfer 17 and newer.

### 1. Introduction

The variogram characterizes the spatial continuity or roughness of a data set. Ordinary one-dimensional statistics for two data sets may be nearly identical, but the spatial continuity may be quite different. Refer to Section 2 for a partial justification of the variogram.

Variogram analysis consists of the experimental variogram calculated from the data and the variogram model fitted to the data. The experimental variogram is calculated by averaging one-half the difference squared of the z-values over all pairs of observations with the specified separation distance and direction. It is plotted as a two-dimensional graph. Refer to Section 3 for details about the mathematical formulas used to calculate the experimental variogram.

The variogram model is chosen from a set of mathematical functions that describe spatial relationships. The appropriate model is chosen by matching the shape of the curve of the experimental variogram to the shape of the curve of the mathematical function.

Refer to the Surfer User's Guide topic Variogram Model Graphics in the Surfer Help for graphs illustrating the curve shapes for each function. To account for geometric anisotropy (variable spatial continuity in different directions), separate experimental and model variograms can be calculated for different directions in the data set.

### 2 What does a variogram represent?

Consider two synthetic data sets; we will call them A and B. Some common descriptive statistics for these two data sets are given in Table 1.1.

 A B Count 15251 15251 Average 100.00 100.00 Standard Deviation 20.00 20.00 Mediation 100.35 100.92 10 Percentile 73.89 73.95 90 Percentile 125.61 124.72

Table 1.1 Some common descriptive statistics for the two example data sets.

The histograms for these two data sets are given in Figures 1.1 and 1.2. According to this evidence the two data sets are almost identical.

However, these two data sets are significantly different in ways that are not captured by the common descriptive statistics and histograms. As can be seen by comparing the associated contour plots (see Figures 1.3 and 1.4), data set A is rougher than data set B. Note that we can not say that data set A is "more variable" than data set B, since the standard deviations for the two data sets are the same, as are the magnitudes of highs and lows. The visually apparent difference between these two data sets is one of texture and not variability.

In particular, data set A changes more rapidly in space than does data set B. The continuous high zones (red patches) and continuous low zones (blue patches) are, on the average, smaller for data set A than for data set B. Such differences can have a significant impact on sample design, site characterization, and spatial prediction in general.

It is not surprising that the common descriptive statistics and the histograms fail to identify, let alone quantify, the textural difference between these two example data sets. Common descriptive statistics and histograms do not incorporate the spatial locations of data into their defining computations.

The variogram is a quantitative descriptive statistic that can be graphically represented in a manner which characterizes the spatial continuity (i.e. roughness) of a data set. The variograms for these two data sets are shown in Figures 1.5 and 1.6. The difference in the initial slope of the curves is apparent.

### 3 What is a variogram?

The mathematical definition of the variogram is:

where  is the value of the variable of interest at location location , and  is the statistical expectation operator. Note that the variogram, , is a function of the separation between points , and not a function of the specific location . This mathematical definition is a useful abstraction, but not easy to apply to observed values.

Consider a set of n observed data: , where  is the location of observation , and  is the associated observed value. There are  unique pairs of observations. For each of these pairs we can calculate the associated separation vector:

When we want to infer the variogram for a particular separation vector, , we will use all of the data pairs whose separation vector is approximately equal to this separation of interest:

Let  be the set of all such pairs:

Furthermore, let  equal the number of pairs in . To infer the variogram from observed data we will then use the formula for the experimental variogram:

That is, the experimental variogram for a particular separation vector of interest is calculated by averaging one-half the difference squared of the z-values over all pairs of observations separated by approximately that vector.

### 4 The variogram grid

If there are n observed data, there are n(n - 1)/2 unique pairs of observations. Thus, even a data set of moderate size generates a large number of pairs. For example, if n = 500, n(n - 1)/2 = 124,745 pairs. The manipulation of such a large number of pairs can be time consuming, even for a fast computer. Surfer pre-computes all of the pairs and stores the necessary sums and differences in the variogram grid. (Note: a variogram grid is not the same format as a grid used in creating a map). To view the variogram and variogram grid used during kriging:

Figure 4.2 Select the data file and gridding method in the Grid Data dialog.

1. Click Home | Grid Data | Grid Data
2. Choose Kriging as the Gridding Method
3. Choose the ExampleDataSetC.xls data file in the Surfer Samples directory by clicking the Browse button next to the Dataset 1 field
4. Specify the X, Y, and Z columns as columns A, B and C respectively
5. Click the Filter Data button to specify Duplicates data and Data Exclusion filtering, if desired
6. You can also the data using the View Data and Statistics buttons.
7. Click Next to view the variogram

Figure 4.3 Resulting variogram with default variogram settings using ExampleDataSetC.xls.

The variogram plot is displayed at the top of the dialog. The red line with the dots is the omni-directional experimental variogram, while the blue line is a first pass (albeit a poor one) at a fitted variogram model. The numbers on the experimental variogram line corresponds with the number of data pairs within each lag.

The variogram surface (the circular graphic at the bottom of the dialog) shows the values of the variogram function for all directions and distances. The center of the surface is at pair distance 0, and the radius is the maximum lag distance (lag size x number of lags). Each cell of the surface graphic is a square equal to the lag size. It is color coded to display the values of the estimate. If a cell has no pairs in it, then it is left blank. Hovering the mouse over the variogram surface will display a tooltip showing the value of the variogram function for that the cell and the number of pairs that were used to compute that value. The variogram plot shows data from a 'pie wedge' extracted and averaged from this surface. The pie wedge direction and angular arc are controlled by the Direction and Tolerance properties in the property list on the right. These settings can also be manipulated interactively by sliding the controls on the graphic itself.

### 5 Modeling the omni-directional variogram

By default, this first plot above is the omni-directional variogram (the Tolerance is 90 degrees). We choose the appropriate model (component) type, the partial sill, and the nugget effect based upon the omni-directional variogram.

#### 5.1 Selecting the variogram model type

There are infinitely many possible variogram models. Surfer allows for the construction of thousands of different variogram models by selecting combinations of the eleven available component types. When combined with a nugget effect, one of three models is adequate for most data sets: the linear, the exponential, and the spherical models. Examples of these three models are shown in Figure 5.1.

Figure 5.1 Variogram Models

If the experimental variogram never levels out, then the linear model is usually appropriate. If the experimental variogram levels out, but is "curvy" all the way up, then the exponential model should be considered. If the experimental variogram starts out straight, then bends over sharply and levels out, the spherical model is a good first choice.

For the data in ExampleDataSetC.xls, a spherical model appears appropriate as the curve increases before appearing to level off (one could also try an exponential model). In the Variogram Component #1 section, change the Component type from Linear to Spherical.

5.2 Selecting the variogram model scale and length parameters

We must now set the Partial sill and the Range parameters for the spherical model using an iterative approach (i.e. guess and check). The Partial sill is the height on the y-axis at which the variogram levels off. By simply looking at the plot, a value between 400 and 450 seems reasonable: enter 425. The Range for a spherical model is the lag distance at which the variogram levels off. Again, from the plot a value between 30 and 40 seems reasonable: enter 35. The plot updates immediately.

This is not a bad first guess, but upon examination of the redrawn curve, it appears that the Range is a little bit too long since the model (blue line) lies to the right of the experimental variogram plot (red line and dots). Reset the Range to 30. This is still a little bit too long. Try 29 for the Range. This is a good fit for a variogram.

Figure 5.3 Variogram model with initial assumptions

Left: Partial sill = 425, Range = 35. Right: Partial sill =425, Range = 29.

#### 5.3 Selecting the variogram nugget effect

If the experimental variogram appears to have a non-zero intercept on the vertical axis, then the model may need a nugget effect component. The value of

NN Gamma Z contained in the Display Statistics report in the Variogram modelling page in Grid Data offers a quantitative upperbound for the nugget effect in most circumstances.

In Surfer the nugget effect is partitioned into two sub-components: the Error variance and the Micro variance. Both of these sub-components are nonnegative, and the sum of these two sub-components should equal the apparent non-zero intercept. The Error variance measures the reproducibility of observations. This includes both sampling and assaying (analytical) errors. The Error variance is best selected by computing the variance of differences between duplicate samples. The Micro variance is a substitute for the unknown variogram at separation distances of less than the typical sample spacing. This is best selected by taking the difference between the apparent non-zero intercept of the experimental variogram and the error variance.

The model for our example appears to intersect the vertical axis at 0, so we will not apply a nugget effect.

### 6. Modeling the variogram anisotropy

Often, the experimental variogram shows different length ranges in different directions. This is called geometric anisotropy. For a linear variogram model this would appear as a different slope in different directions, while a spherical model manifests geometric anisotropy as different apparent Range parameters in different directions. To investigate the variogram anisotropy, we can edit the properties in the Experimental section. The omni-directional experimental variogram averages the behavior over all directions. To look in particular directions change the Direction and the Tolerance parameters. Start by setting the Tolerance to 30 degrees. Notice how the variogram surface diagram now shows 'pie wedge' centered at 0 degrees (east in this model), with an angular tolerance (plus or minus) of 30 degrees.

Fig 6.1 The variogram surface diagram showing Direction of 0°, and Tolerance of 30°

Drag the dialog box to the side to make the variogram plot visible. To quickly sweep through all directions, click on the central bar in the pie wedge (the one at 0°) on the variogram surface diagram and drag it around through a full 360°. Sweep through all directions a few times. Notice how the shape and pair counts for the experimental variogram update on the variogram chart as you move through the various directions.

Figure 6.2 Directional variogram plots with Tolerance set to 30°. From left to right: Direction = (Top row) 0°, 30°, 60° (Bottom row) 90°, 120°, 150°.

It appears that for directions between 90° and 180° degrees the omni-directional model is consistently to the right of the ascending portion of the experimental variogram. Conversely, the omni-directional model is consistently to the left of the ascending portion of the experimental variogram for directions between 0° and 90° degrees. This is a clear indication of geometric anisotropy.

By dragging the Direction control on the surface diagram, go to the direction at which the apparent Range and Partial sill of the experimental variogram is maximized: say, 30°. In the Variogram component #1 section, change the Range parameter to better fit this directional variogram. 40 is too long, 30 is too

short, and 37 appears to fit quite well. Change the Anisotropy Angle to 30° (the plot will not change at this point).

In the Experimental section, change the Direction to 120° (this is orthogonal to the direction of maximum length which we determined was 30° (120° = 30° + 90°). Back in the Variogram component #1 section, change the Anisotropy Ratio to fit the variogram model to the current directional experimental variogram. A Ratio value of 1.6 appears to be a good fit.

Figure 6.4 Directional variograms with anisotropy settings. Left: Direction = 30°, Right: Direction = 120° Anisotropy Ratio = 1.6, Angle = 30°

To get a final visual verification of your model, take it for a "spin". Drag the Direction control on the variogram surface diagram around the full 360°. Notice how the variogram model mimics the changing shape of the experimental variogram. You can fine tune the model parameters to better match the model to the experimental variogram, but do not over-model: one or two significant digits is all that you can hope for.

Click on the AutoFit button to fine-tune the model after choosing appropriate initial parameters. Click OK to accept the defaults, and AutoFit returns an Anisotropy Ratio of 1.742 and Angle of 38.56°.

Figure 6.2 Directional variogram plots with Anisotropy Ratio =1.6 and Angle=30. From left to right: Direction = (Top row) 0°, 30°, 60° (Bottom row) 90°, 120°, 150°.

### 7 Rules-of-thumb

Know your data! Before calculating the experimental variogram, calculate regular non-spatial statistics. Use the Display Statistics button in the Variogram modelling page to display the data minimum, maximum, median, mean, standard deviation, variance, and skewness. Create a post map or classed post map in Surfer to display scatter plots. Use Grapher to create histograms and cumulative frequency plots.

• Do not over-model. The simplest model that reproduces the important features of the experimental variogram is the best model.
• When in doubt, use the default variogram model for gridding. A simple linear variogram model usually generates an acceptable grid; this is especially true for initial data analysis. Remember, however, that the kriging standard deviation grid generated using the default ariogram is meaningless.
• Unless there is a clear, unambiguous, physical justification, do not use an anisotropy ratio of greater than 3 to 1. If the experimental variogram appears to support an anisotropy of greater than 3 to 1, and there is no unambiguous, physical justification for such a severe anisotropy, there may be a trend in the data. Consider detrending the data before carrying out your variogram analysis.
• Try the model and see how the resulting grid looks in a contour map. If you have competing candidate variogram models, generate a grid and contour map from each. If there are no significant differences, choose the simplest variogram model.
• The range of the variogram is often close to the average size of physical anomalies in the spatial fluctuation of the Z values. In the absence of a reliable experiment variogram, this rule-of-thumb may be used to postulate a variogram range.
• An experimental variogram that fluctuates around a constant value is not an ill-behaved variogram. It is an indication that the Z values are uncorrelated at the scale of the typical sample spacing. In such a situation a contour map, regardless of the gridding method used, is an unreliable representation of the data, and more data at closer sample spacing are needed for detailed local characterization.
• If the following three conditions are met, then the sample variance is a reasonable approximation for the variogram sill:
1. The data are evenly distributed across the area of interest, as displayed in a post map.
2. There is no significant trend in the data across the area of interest.
3. The dimension of the area of interest is more than three times the effective variogram range.