Revision history - Open Foris Support


Sample size calculation is a bit trickier. There are two things you need to consider:

1. **Technical**: What level of accuracy do you want to achieve, and for which variable? Different variables will have different uncertainties, because the uncertainty for a value depends on how many times that value appears in the sample. Normally you will achieve higher accuracy in the "main" classes, such as the IPCC Land Use Categories (Forest, Settlement, Cropland...), than in their subdivisions (type of forest, type of cropland, type of grassland...). So if you need high accuracy on Land Use Change data, you will need to collect many more plots than if you only need high accuracy on the IPCC classes for current Land Use.
2. **Practical**: How much time do you have for the data collection activity? Depending on the complexity of the survey, the internet speed and your knowledge of the area, you should be able to collect between 100 (slow internet, difficult survey) and 300 (fast internet, simplified survey) plots per day. How many days and how many people will be collecting data? Also, is there plenty of Google Earth very high resolution imagery in the AOI, or will you need to use Landsat or Sentinel to assess the plots (which takes longer)? Answering these questions will make the decision easier.

In the end you will have a certain number of plots, and from those you will be able to calculate the uncertainties. There are two approaches to accommodate these aspects:

1. Build a multiple (nested) grid, so that data collection can start from plots at, say, 10x10 km and then, if the uncertainties are too large, refine it with a grid one level down, in this case 5x5 km. This grid has 4 times as many plots as the initial one. Since you have already collected the plots at 10x10 km (say 100 plots), you only have to collect the remaining ones (in this example 300). If the data is still not accurate enough, you move to a 2.5x2.5 km grid, and so on...
2. The previous approach can be a bit complex to set up. A very similar approach, which is much easier to design and gives comparable results, is to just use a random sampling design. In this case you collect data on completely random plots. Once you have collected enough plots, you evaluate the uncertainties. If you are happy with them, you are done! If not, you need to keep collecting data.

To generate a random sampling design you can use this Google Earth Engine script: [IMPROVED GRID GENERATOR][1]

[1]: https://code.earthengine.google.com/c4a9f26e3b2242fca5571750531da1cc
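To put rough numbers on the technical and practical considerations, here is a minimal Python sketch. It assumes the plots behave like a simple random sample, uses the normal approximation for a proportion at 95% confidence without a finite-population correction (this gets unreliable for classes with only a handful of plots), and reads the 100 to 300 plots per day as per person. The function names and all figures are hypothetical.

```python
import math

# Two-sided z-value for 95% confidence (normal approximation).
Z_95 = 1.959964


def margin_of_error(count: int, n: int, z: float = Z_95) -> float:
    """Half-width of the confidence interval for the share of plots in one class.

    `count` is how many of the `n` plots were assigned to the class, so the
    uncertainty of a class depends directly on how often it appears.
    """
    p = count / n
    return z * math.sqrt(p * (1 - p) / n)


def required_plots(expected_share: float, target_margin: float, z: float = Z_95) -> int:
    """Plots needed to estimate a class covering `expected_share` of the AOI
    within +/- `target_margin` (both as proportions, e.g. 0.40 and 0.03)."""
    return math.ceil(z**2 * expected_share * (1 - expected_share) / target_margin**2)


def collection_capacity(plots_per_person_day: int, days: int, people: int) -> int:
    """Total number of plots the team can assess in the available time."""
    return plots_per_person_day * days * people


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    n = 1000
    for name, count in [("Forest", 400), ("Rare forest type", 50)]:
        moe = margin_of_error(count, n)
        print(f"{name}: share {count / n:.2f} +/- {moe:.3f}")
    needed = required_plots(expected_share=0.05, target_margin=0.01)
    capacity = collection_capacity(plots_per_person_day=100, days=10, people=3)
    print(f"needed {needed}, capacity {capacity}")
```

With 1,000 plots, the 400 Forest plots give a margin under a tenth of the Forest share, while the 50 plots of a rare subclass give a margin of roughly a quarter of its share, which is why subdivisions and Land Use Change need many more plots. Comparing `required_plots` with `collection_capacity` shows quickly whether a target is realistic for your team.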

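The arithmetic behind the nested grid in approach 1 (halving the spacing quadruples the number of plots, so you only collect the new ones) can be checked with a short Python sketch. It assumes a hypothetical rectangular 100 x 100 km AOI in a projected coordinate system, with every grid level anchored at the same corner so the grids nest; real AOIs are irregular, so actual counts will only be roughly four times larger at each level. The helper names are illustrative and this is not the Open Foris grid generator.

```python
def grid_points(width_m: int, height_m: int, spacing_m: int) -> set[tuple[int, int]]:
    """Plot locations of a regular grid anchored at the AOI's lower-left corner.

    Using the same origin at every level keeps the grids nested: each point of
    a coarse grid is also a point of the grid with half the spacing.
    """
    return {
        (x, y)
        for x in range(0, width_m, spacing_m)
        for y in range(0, height_m, spacing_m)
    }


def refinement_plan(width_m: int, height_m: int, spacings_m: list[int]) -> list[tuple[int, int, int]]:
    """For each grid level, return (spacing, total plots, plots still to collect)."""
    collected: set[tuple[int, int]] = set()
    plan = []
    for spacing in spacings_m:
        grid = grid_points(width_m, height_m, spacing)
        new_plots = grid - collected  # plots not already assessed at a coarser level
        plan.append((spacing, len(grid), len(new_plots)))
        collected |= grid
    return plan


if __name__ == "__main__":
    # Hypothetical 100 x 100 km AOI, refined from 10 km to 5 km to 2.5 km.
    for spacing, total, new in refinement_plan(100_000, 100_000, [10_000, 5_000, 2_500]):
        print(f"{spacing / 1000:g} km grid: {total} plots, {new} new")
```

Under these assumptions the output matches the example in the answer: 100 plots at 10 km, then 400 at 5 km of which 300 are new, then 1,600 at 2.5 km of which 1,200 are new.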
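Approach 2 boils down to a loop: add random plots in batches, recompute the uncertainty, and stop once it is small enough. The Python sketch below shows only that stopping logic and does not reproduce the linked Earth Engine script. It assumes a rectangular AOI in projected (equal-area) coordinates, since drawing uniformly in latitude/longitude would not be equal-area, uses a 95% normal-approximation margin of error as the precision criterion, and replaces the interpreter with a hypothetical `make_simulated_interpreter` helper that labels plots at random. Batch size, target class and margin are illustrative.

```python
import math
import random
from typing import Callable

Z_95 = 1.959964  # two-sided z-value for 95% confidence


def random_plots(rng: random.Random, n: int, width_m: float, height_m: float) -> list[tuple[float, float]]:
    """Draw n plot locations uniformly over a rectangular AOI in projected metres."""
    return [(rng.uniform(0, width_m), rng.uniform(0, height_m)) for _ in range(n)]


def collect_until_precise(
    assess_plot: Callable[[float, float], str],
    target_class: str,
    target_margin: float,
    width_m: float,
    height_m: float,
    batch_size: int = 100,
    max_plots: int = 20_000,
    seed: int = 42,
) -> tuple[int, float, float]:
    """Add random plots batch by batch until the target class share is known
    within +/- target_margin, or until max_plots is reached."""
    rng = random.Random(seed)
    labels: list[str] = []
    n, share, margin = 0, 0.0, math.inf
    while len(labels) < max_plots:
        for x, y in random_plots(rng, batch_size, width_m, height_m):
            labels.append(assess_plot(x, y))
        n = len(labels)
        share = labels.count(target_class) / n
        margin = Z_95 * math.sqrt(share * (1 - share) / n)
        # A share of exactly 0 or 1 gives a zero margin, so keep collecting then.
        if 0 < share < 1 and margin <= target_margin:
            break
    return n, share, margin


def make_simulated_interpreter(forest_share: float, seed: int = 7) -> Callable[[float, float], str]:
    """Hypothetical stand-in for a human interpreter: labels plots at random."""
    rng = random.Random(seed)

    def assess_plot(x: float, y: float) -> str:
        # Location is ignored; a real assessment would look at the imagery.
        return "Forest" if rng.random() < forest_share else "Other"

    return assess_plot


if __name__ == "__main__":
    n, share, margin = collect_until_precise(
        make_simulated_interpreter(0.4), "Forest", 0.03, 100_000, 100_000
    )
    print(f"stopped after {n} plots: Forest {share:.3f} +/- {margin:.3f}")
```

In practice `assess_plot` stands for the label your team records for each plot. Checking after every batch and stopping as soon as the margin dips below the target can make the final interval slightly optimistic, so treat the target as approximate.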