Oft-crossvalidate

From Open Foris Wiki

oft-crossvalidate is a program for carrying out a leave-one-out cross-validation using nearest neighbour estimation (see oft-nn for nearest neighbour estimation).

Usage: oft-crossvalidate [-h] <-i datafile> <-k val> <-v col> <-bands val> [-dw {1/2/3}] [-x col] [-y col] [-id col] [-norm] [-mindist val] [-maxdist val] [-dem col thres] [-lu col]

Specifications:

  • You must give at least the data file, the number of neighbours (k), the column of your variable and the number of bands
  • Bands must be located after all other variables
  • Other parameters, optional:
    • -dw = weight the nearest neighbour data with 1=equal (default), 2=inverse distance, 3=squared inv. distance weights
    • -x = column for x-coordinate
    • -y = column for y-coordinate
    • -id = column for id
    • -norm = normalize the image features (default is no normalization)
    • -mindist = use a minimum spatial distance (e.g. 1000). Observations closer than that, based on the x- and y-coordinates, are not allowed as neighbours (default is no restriction)
    • -maxdist = use a maximum spatial distance (e.g. 50000). Observations outside that radius are not allowed as neighbours (default is no restriction)
    • -dem = column and threshold value (e.g. 1000) for restriction of neighbours in vertical direction (default is no restriction)
    • -lu = column used for stratification of the data. If given, separate RMSEs are computed for each class indicated in the column (default is no stratification)
  • The program terminates if the spatial neighbourhood restriction leaves too few (fewer than k) potential neighbours
  • A possible order of data is: id, variable, x-coordinate, y-coordinate, feature1...featureN
  • Values must be separated with a space or tab
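The leave-one-out procedure described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the source code of oft-crossvalidate: the function name loo_knn, the tuple layout and the small eps constant are assumptions made for this example, Euclidean distance in feature space is assumed, and the -norm, -dem and -lu options are left out.

```python
import math

def loo_knn(rows, k, dw=1, mindist=None, maxdist=None):
    """Leave-one-out k-NN estimates.

    rows: list of (x, y, value, features) tuples (hypothetical layout).
    dw:   1 = equal, 2 = inverse distance, 3 = squared inverse distance weights.
    """
    estimates = []
    for i, (xi, yi, _, fi) in enumerate(rows):
        candidates = []
        for j, (xj, yj, vj, fj) in enumerate(rows):
            if j == i:
                continue  # leave the target observation out
            spatial = math.hypot(xi - xj, yi - yj)
            if mindist is not None and spatial < mindist:
                continue  # too close in space
            if maxdist is not None and spatial > maxdist:
                continue  # too far away in space
            # Euclidean distance in feature (band) space: an assumption
            feat = math.sqrt(sum((a - b) ** 2 for a, b in zip(fi, fj)))
            candidates.append((feat, vj))
        if len(candidates) < k:
            raise ValueError("fewer than k potential neighbours")
        nearest = sorted(candidates, key=lambda c: c[0])[:k]
        if dw == 1:
            weights = [1.0] * k
        else:
            power = 1 if dw == 2 else 2
            eps = 1e-12  # avoids division by zero for identical features
            weights = [1.0 / (d ** power + eps) for d, _ in nearest]
        total = sum(weights)
        estimates.append(sum(w * v for w, (_, v) in zip(weights, nearest)) / total)
    return estimates

# Four toy observations on a line, one feature each
toy = [(0, 0, 1.0, [0.0]), (1, 0, 2.0, [1.0]), (2, 0, 3.0, [2.0]), (3, 0, 4.0, [3.0])]
print(loo_knn(toy, k=2))
```

Raising an error when fewer than k candidates remain mirrors the termination rule stated in the specifications.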


  • Prints the average, RMSE and bias on screen
  • Saves original value, estimate and difference in an output file. If id or x and y are given, they are printed out as well.
  • If the id is given on the command line, the ids of the 10 nearest neighbours are printed into the output file.
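As a rough illustration of the three summary statistics, the sketch below computes them from lists of observed and estimated values. The exact definitions used by oft-crossvalidate are not stated on this page, so the following are assumptions: the difference is observed minus estimate, bias is the mean difference, RMSE divides by n, and the average is taken over the observed values. The function name summarize is hypothetical.

```python
import math

def summarize(observed, estimated):
    """Return (average, RMSE, bias) under the assumptions stated above."""
    diffs = [o - e for o, e in zip(observed, estimated)]
    n = len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / n)  # root mean squared difference
    bias = sum(diffs) / n                            # mean difference
    avg = sum(observed) / n                          # mean of the observed values
    return avg, rmse, bias

avg, rmse, bias = summarize([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
print(f"Avg={avg:.3f} RMSE={rmse:.3f} Bias={bias:.3f}")
```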


Exercise:

  • Get Example data set
  • The following tool is used in this exercise: oft-crossvalidate

1. For this exercise we use sample_polyN20.txt. You might have created it already in exercise oft-sample-within-polys.bash.

2. Open your working directory using

cd /home/...

3. The program oft-crossvalidate prints the average, RMSE and bias on screen, using the input data file sample_landuse.txt. Let's take a closer look at the input file (space- or tab-separated):

head sample_landuse.txt
10557.00 772650.00 -2404770.00 5.00 53.00 26.00 28.00 54.00 81.00 131.00 39.00
94788.00 773490.00 -2431680.00 1.00 51.00 24.00 25.00 45.00 65.00 127.00 33.00
201536.00 774750.00 -2439390.00 1.00 54.00 25.00 27.00 50.00 71.00 130.00 35.00
88531.00 771450.00 -2431110.00 1.00 47.00 21.00 18.00 37.00 48.00 126.00 21.00
123374.00 774150.00 -2433990.00 1.00 54.00 24.00 30.00 35.00 75.00 132.00 42.00
97345.00 776220.00 -2431950.00 1.00 52.00 23.00 24.00 42.00 60.00 131.00 30.00
199041.00 773190.00 -2439120.00 1.00 51.00 23.00 23.00 52.00 58.00 130.00 28.00
144276.00 775860.00 -2435400.00 1.00 49.00 22.00 21.00 45.00 59.00 125.00 30.00
180961.00 772680.00 -2437890.00 1.00 49.00 21.00 21.00 36.00 61.00 126.00 28.00
185386.00 772410.00 -2438190.00 1.00 49.00 21.00 18.00 43.00 51.00 126.00 22.00

Explanation of the columns: pixel_id x y class band1 band2 band3 band4 band5 band6 band7
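To make the column layout concrete, here is a minimal Python sketch that splits such rows into named fields. The function name parse_row and the dictionary keys are chosen for this example only; the two sample rows are copied from the listing above.

```python
SAMPLE = """\
10557.00 772650.00 -2404770.00 5.00 53.00 26.00 28.00 54.00 81.00 131.00 39.00
94788.00 773490.00 -2431680.00 1.00 51.00 24.00 25.00 45.00 65.00 127.00 33.00
"""

def parse_row(line, n_bands=7):
    """Split one space- or tab-separated row into id, coordinates, class and bands."""
    fields = [float(tok) for tok in line.split()]
    if len(fields) != 4 + n_bands:
        raise ValueError("unexpected number of columns")
    pixel_id, x, y, land_class = fields[:4]
    # The bands come after all other variables, as the tool requires
    return {"pixel_id": pixel_id, "x": x, "y": y,
            "class": land_class, "bands": fields[4:]}

records = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
print(records[0]["pixel_id"], records[0]["bands"])
```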


4. Let's run oft-crossvalidate with the following options: -i gives the input file; -k 10 sets the number of neighbours to 10; -v gives the column of the variable to be estimated; -bands gives the number of bands; -x 2 and -y 3 tell the program to read the x- and y-coordinates from columns 2 and 3. Because our input data has no additional column of values, we use column 1 (the pixel IDs) as the variable, purely to demonstrate the tool:

oft-crossvalidate -i sample_landuse.txt -k 10 -v 1 -bands 7 -x 2 -y 3

The result is printed on screen:

k=10
normalize=0
RMSE=   62255.181
Bias=    1367.027
Avg =  116318.433

In addition, an output file sample_landuse.txt_out is created:

head sample_landuse.txt_out
772650.000 -2404770.000   10557.00  103566.30  -93009.30
773490.000 -2431680.000   94788.00  128938.00  -34150.00
774750.000 -2439390.000  201536.00  110055.80   91480.20
771450.000 -2431110.000   88531.00  127395.30  -38864.30
774150.000 -2433990.000  123374.00  102471.90   20902.10
776220.000 -2431950.000   97345.00  123907.80  -26562.80
773190.000 -2439120.000  199041.00  105271.30   93769.70
775860.000 -2435400.000  144276.00  130783.50   13492.50
772680.000 -2437890.000  180961.00  127426.40   53534.60
772410.000 -2438190.000  185386.00  126411.20   58974.80

Explanation of the columns: x, y, pixel_id, estimate, difference (col3 - col4).
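The relation between the last three columns can be checked with a short Python sketch. The first three rows of the listing above are embedded as a string; the function name check_rows and the tolerance of 0.01 (matching the two printed decimals) are assumptions for this example.

```python
OUTPUT_HEAD = """\
772650.000 -2404770.000   10557.00  103566.30  -93009.30
773490.000 -2431680.000   94788.00  128938.00  -34150.00
774750.000 -2439390.000  201536.00  110055.80   91480.20
"""

def check_rows(text, tol=0.01):
    """Return True if column 3 minus column 4 matches column 5 on every row."""
    for line in text.splitlines():
        if not line.strip():
            continue
        x, y, observed, estimate, difference = (float(t) for t in line.split())
        if abs((observed - estimate) - difference) > tol:
            return False
    return True

print(check_rows(OUTPUT_HEAD))
```

To check a complete run, the same function could be applied to the contents of sample_landuse.txt_out read from disk.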


