Hi, I would like to use Collect Earth to collect training samples for an unsupervised classification in Google Earth Engine. My area is 190,000 ha and I would have 13 classes. I am not sure how many sample plots I should collect data for. Thanks, Florence
asked
flandsberg |

It sounds like a proportional, stratified sampling approach, where your samples are grouped into each of your 13 classes (or strata), is suitable for you. There are several general strategies for allocating samples among the strata: equal, proportional, or power allocation.
- Equal allocation means you have an equal number of samples for each class.
- Proportional allocation means that the number of samples per class reflects the size (area) of that class in your spatial extent, i.e. larger classes receive more samples while small, rarer classes receive fewer samples.
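As a rough illustration of the two allocation schemes, here is a minimal Python sketch. The class names, areas, and total sample count are made up for the example (they are not from your study area), and the helper functions are hypothetical, not part of Collect Earth or GEE.

```python
def equal_allocation(total_samples, class_areas):
    """Give every class the same number of samples."""
    per_class = total_samples // len(class_areas)
    return {name: per_class for name in class_areas}


def proportional_allocation(total_samples, class_areas):
    """Give each class a share of samples matching its share of the area.

    Rounding means the per-class counts may not sum exactly to total_samples.
    """
    total_area = sum(class_areas.values())
    return {
        name: round(total_samples * area / total_area)
        for name, area in class_areas.items()
    }


# Hypothetical example: three classes covering 190,000 ha in total.
example_areas = {"forest": 100_000, "cropland": 60_000, "water": 30_000}

equal = equal_allocation(570, example_areas)
proportional = proportional_allocation(570, example_areas)
print(equal)
print(proportional)
```

With proportional allocation a very small class can end up with only a handful of samples, which is one reason a compromise between the two schemes is usually needed.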
You will need to find a compromise between equal and proportional allocation to arrive at the number of samples per class that fits your objectives. To calculate the minimum sample count for stratified sampling, please refer to Olofsson et al. (2014); you will find the formulas you need there. Since you are already using GEE, I recommend the AREA2 GEE application, which calculates minimum sample counts using the formulas outlined in Olofsson et al. (2014).
You are really asking two questions: how many samples should you assess in total, and how many of those need to be training samples? Unless I am mistaken, the minimum sample count is calculated for the testing samples (not the training samples), to ensure that you can perform a robust accuracy assessment. Some studies suggest a 70-30 ratio, where 70% of all your assessed samples are used to train your classifier and the remaining 30% are used to test (or validate) it. Depending on the classification type and mapping goals, this ratio can be varied.
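To make the minimum sample count more concrete, here is a small Python sketch of the simplified sample-size formula as I understand it from Olofsson et al. (2014): n = (sum of W_i * S_i / S(O))^2, where W_i is the area proportion of class i, S_i = sqrt(U_i * (1 - U_i)) with U_i the expected user's accuracy of class i, and S(O) the target standard error of overall accuracy. All input values below are placeholders, and the step from testing samples to a total under a 70-30 split is my own assumption about how to combine the two ideas, so please check it against the paper and AREA2.

```python
import math


def min_test_samples(area_weights, expected_user_acc, target_se):
    """Simplified stratified sample size: (sum(W_i * S_i) / S(O))^2."""
    weighted_sd = sum(
        w * math.sqrt(u * (1.0 - u))
        for w, u in zip(area_weights, expected_user_acc)
    )
    return math.ceil((weighted_sd / target_se) ** 2)


def total_assessed_samples(n_test, test_fraction=0.3):
    """Assumption: the test set is test_fraction of all assessed samples."""
    return math.ceil(n_test / test_fraction)


# Placeholder inputs: two equally sized classes, worst-case accuracy of 0.5,
# and a target standard error of 0.01 for overall accuracy.
n_test = min_test_samples([0.5, 0.5], [0.5, 0.5], target_se=0.01)
n_total = total_assessed_samples(n_test, test_fraction=0.3)
print(n_test, n_total)
```

The resulting n_test would then be distributed among your 13 classes using whichever allocation compromise you settle on.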
answered
EarthOrbGIS |