--- title: "spbal - Spatially Balanced Sampling" author: "spbal 1.0.0" date: "April 2024" always_allow_html: true #output: # pdf_document # bookdown::html_document2: output: rmarkdown::html_vignette #toc: true #toc_float: # toc_collapsed: true #toc_depth: 3 #number_sections: true #fig_caption: yes vignette: > %\VignetteIndexEntry{spbal - Spatially Balanced Sampling} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} library(bookdown) knitr::opts_chunk$set( fig.dim = c(8, 6), collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(spbal) library(ggplot2) library(gridExtra) ``` # spbal This vignette is intended to provide details of the functions and methods provided in the __spbal__ package. The **spbal** package has three spatially balanced sample designs: BAS, Halton Frames and HIP. This document illustrates how samples are drawn using **spbal**, including master and stratified sampling applications. The package is freely available from CRAN. This document was created with **spbal** version $1.0.0$. This vignette is divided into the following sections: - Simple Random Sampling (SRS) - Balanced acceptance sampling (BAS) - Halton Frames (HF) - Halton Iterative Partitioning (HIP) Lets start by looking at Simple Random Sampling (SRS). ## Simple Random Sampling (SRS) Draw a random sample without replacement from a population. ### spbal::SRS() This function invokes __base::sample()__ to draw a random sample using a user specified random seed. Returned is a list of random positive integers of length __sample_size__, in the range $1$ to __total_rows__, these can then be used to index the original population data. The following parameters are supported: + __seed__ - The random seed to be used to draw the current sample. The default is $42$. + __total_rows__ - The total number of rows in your population. + __sample_size__ - The number of rows wanted in the random sample. All parameter values must be numeric and have values greater than zero, __sample_size__ must be less than __total_rows__. #### spbal::SRS() code example Function __spbal::SRS()__ returns a list of size __sample_size__. ```{r srs1} # Create a random sample of 20 with a seed of 99 from a population of 100. rand_samp <- spbal::SRS(seed = 99, total_rows = 100, sample_size = 20) rand_samp ``` ## Balanced Acceptance Samples (BAS) BAS draws spatially balanced samples from areal resources. To draw BAS samples, **spbal** requires a study region shapefile and the region's bounding box. An initial sample size is also needed, which can be easily increased or decreased within **spbal** for master sampling applications. ### spbal::BAS() The __spbal::BAS()__ function supports the following input parameters: + __shapefile__ The shape file as a polygon (sp or sf) to select sites for. + __n__ The number of sites to select. If using stratification it is a named vector containing sample sizes of each group. + __boundingbox__ The Bounding box around the study area. If a bounding box is not supplied then **spbal** will generate a bounding box for the shapefile. + __minRadius__ If specified, the minimum distance, in meters, allowed between sample points. This is applied to the *$sample* points. Points that meet the __minRadius__ criteria are returned in the __minRadius__ output variable. The default value for __minRadius__. + __panels__ A list of integers that define the size of each panel in a non-overlapping panels design. The length of the list determines the number of panels required. The sum of the integers in the panels parameter will determine the total number of samples selected, __n__. The default value for __panels__ is NULL, this indicates that a non-overlapping panel design is not wanted. + __panel_overlap__ A list of integers that define the overlap into the previous panel. Is only used when the __panels__ parameter is not NULL. The default value for __panel_overlap__ is NULL. The length of __panel_overlap__ must be equal to the length of panels. The first value is always forced to zero as the first panel never overlaps any region. + __stratum__ The name of a column in the data.frame attached to __shapefile__ that defines the strata of interest. + __seeds__ A vector of two seeds, __u1__ and __u2__. If not specified, the default is NULL and will be defined randomly. + __verbose__ A boolean, if you want to see any output printed to screen set to TRUE. Helpful if taking a long time. The default is FALSE i.e. no informational messages will be displayed. The __spbal::BAS()__ function returns a list containing three variables: + __sample__ containing locations in the BAS sample, in BAS order. + __seeds__ the __u1__ and __u2__ seeds used to generate the sample. + __minRadius__ containing points from __sample__ that meet the __minRadius__ criteria. If the __minRadius__ parameter is NULL then __minRadius__ will return NULL. The *sample* points are returned in the form of a simple feature collection of POINT objects. They have the following attributes: + __SiteID__ A unique identifier for every sample point. This encodes the BAS order. + __spbalSeqID__ A unique identifier for every sample point. This encodes the BAS sample order. + __geometry__ The XY co-ordinates of the sample point in the CRS of the original shapefile. #### spbal::BAS() code examples First, lets define our shapefile, select a region from within it and define a bounding box around it: ```{R BASex1a} # Use the North Carolina shapefile supplied in the sf R package. shp_file <- sf::st_read(system.file("shape/nc.shp", package="sf")) shp_gates <- shp_file[shp_file$NAME == "Gates",] shp_gates # Vertically aligned master sample bounding box. bb <- spbal::BoundingBox(shapefile = shp_gates) ``` ### Equal probability BAS sample. This section uses the Gates county loaded above. The first example design is an equal probability BAS sample of $n = 20$ survey sites from Gates, North Carolina, U.S.A. The call is below: ```{R BASex2a,warning=FALSE, message=FALSE} set.seed(511) n_samples <- 20 result <- spbal::BAS(shapefile = shp_gates, n = n_samples, boundingbox = bb) BAS20 <- result$sample ``` ```{r BASex3a} BAS20 ``` The site locations are illustrated below: ```{R BASex1c} # 1. Plot using ggplot gates <- sf::st_as_sf(shp_gates, coords = c("longitude", "latitude")) ggplot() + geom_sf() + geom_sf(data = gates, size = 4, shape = 23) + geom_text(data = BAS20, size = 3, aes(label = spbalSeqID, vjust = -0.5, geometry = geometry, x = after_stat(x), y = after_stat(y), color = "ID"), stat = "sf_coordinates") + geom_point(data = BAS20, aes(geometry = geometry, x = after_stat(x), y = after_stat(y), colour= "Samples"), stat = "sf_coordinates" ) + scale_color_manual(values = c("red", "black"), name = "Legend") + # Define color scale and legend title theme_bw() + labs(x = "Latitude", y = "Longitude", title = "Spatially balanced sample using a equal probability BAS design.", subtitle = "Total of 20 survey sites from Gates, North Carolina", caption = "spbal Equal probability BAS sample.") ``` ### Increase the BAS sample size. To increase this BAS sample of $20$ sites to $50$ sites, we take the first $50$ points from the random-start Halton sequence that BAS used to draw its sample. **spbal** achieves this by specifying the random seed integers from the first sample using the following call. ```{R BASex4a,message=FALSE, warning=FALSE} n_samples <- 50 result2 <- spbal::BAS(shapefile = shp_gates, n = n_samples, boundingbox = bb, seeds = result$seed) BAS50 <- result2$sample BAS50 # Check, first n_samples points in both samples must be the same. all.equal(BAS20$geometry, BAS50$geometry[1:20]) ``` The first 20 sites in **BAS50** are identical to **BAS20**. The sample size can be increased arbitrarily from a specified seed, making BAS well-suited for master sampling applications. The critical point is that any sub-sample of consecutive points from a BAS master sample is a bona fide BAS sample. Users can also specify their seed point using **seeds = c(u1, u2)** to generate a specific random-start Halton sequence, such as resurrecting a previously used BAS sample. Plot the BAS point ordering, **BAS20** and **BAS50** using ggplot. ```{r BASex5a, warning=FALSE, message=FALSE} ## Convert foreign object to an sf object for ggplot. gates <- sf::st_as_sf(shp_gates, coords = c("longitude", "latitude")) ggplot() + geom_sf() + geom_sf(data = gates, size = 4, shape = 23) + geom_text(data = BAS50, size = 2, aes(label = BAS50$spbalSeqID, vjust = -0.7, geometry = geometry, x = after_stat(x), y = after_stat(y), color = "BAS Order"), stat = "sf_coordinates") + geom_point(data = BAS50, aes(geometry = geometry, x = after_stat(x), y = after_stat(y), colour = "BAS n = 20 + 30"), stat = "sf_coordinates")+ geom_point(data = BAS20, aes(geometry = geometry, x = after_stat(x), y = after_stat(y), colour = "BAS n = 20"), stat = "sf_coordinates" ) + scale_color_manual(values = c("red","blue","black"), name = "Legend") + theme_bw() + labs(x = "Latitude", y = "Longitude", title = "BAS samples from the Gates county.") ``` ### Stratified BAS Sample. It is possible to create stratified samples from BAS master samples using **spbal**. To implement this design, the input shapefile must have labelled sub-regions or strata, all of which should be enclosed within a single bounding box. An example of a stratified design from a BAS master sample is shown below. In this design, $50$ sites are selected from three counties in the state of North Carolina, U.S.A, with $20$ sites from Gates, $20$ from Northampton, and $10$ from Hertford. The code for this stratified design is provided below, and the survey sites within each stratum can be seen in the figure below. ```{R BASex6a, results='hide', message=FALSE} strata <- c("Gates", "Northampton", "Hertford") n_strata <- c("Gates" = 20, "Northampton" = 20, "Hertford" = 10) shp_subset <- shp_file[shp_file[["NAME"]] %in% strata,] ``` Define the bounding box for Gates, Northampton and Hertford using **spbal**. ```{r BASex7a, results='hide', message=FALSE} bb_strata <- spbal::BoundingBox(shapefile = shp_subset) ``` Draw the stratified sample from the BAS master sample and display the first three sites in the Gates region. ```{r BASex8a, results='hide', message=FALSE} set.seed(511) result3 <- spbal::BAS(shapefile = shp_subset, n = n_strata, boundingbox = bb_strata, stratum = "NAME") BASMaster <- result3$sample gates_samp <- BASMaster[BASMaster[["NAME"]] %in% "Gates",] ``` ```{r BASex9a} gates_samp ``` Plot the stratified sample using ggplot. ```{r BASex10a} strat <- sf::st_as_sf(shp_subset, coords = c("longitude", "latitude")) gates_samp <- BASMaster[BASMaster[["NAME"]] %in% "Gates",] northampton_samp <- BASMaster[BASMaster[["NAME"]] %in% "Northampton",] hertford_samp <- BASMaster[BASMaster[["NAME"]] %in% "Hertford",] ggplot() + geom_sf() + geom_sf(data = strat, size = 4, shape = 23) + geom_point(data = gates_samp, aes(geometry = geometry, x = after_stat(x), y = after_stat(y), colour= "Gates"), stat = "sf_coordinates" ) + geom_point(data = northampton_samp, aes(geometry = geometry, x = after_stat(x), y = after_stat(y), colour= "Northampton"), stat = "sf_coordinates" ) + geom_point(data = hertford_samp, aes(geometry = geometry, x = after_stat(x), y = after_stat(y), colour= "Hertford"), stat = "sf_coordinates" ) + scale_color_manual(values = c("red", "blue", "green"), name = "Legend") + # Define color scale and legend title theme_bw() + labs(x = "Latitude", y = "Longitude", title = "Stratified samples from BAS master samples.", subtitle = "50 sites from North Carolina, U.S.A, 20 from Gates, \n20 from Northampton and 10 from Hertford.", caption = "spbal Stratified BAS sample.") ``` ### Increasing or decreasing strata sample sizes. The stratum sample sizes can be easily increased or decreased using points from the master sample. For instance, one may double the number of survey sites in the Hertford stratum (from $10$ to $20$) for the stratified design above using the **seeds** input. The call is below. ```{R BASex11f} n_strata <- c("Gates" = 20, "Northampton" = 20, "Hertford" = 20) result4 <- spbal::BAS(shapefile = shp_subset, n = n_strata, boundingbox = bb_strata, seeds = result3$seed, stratum = "NAME") BASMaster <- result4$sample gates_samp2 <- BASMaster[BASMaster[["NAME"]] %in% "Gates",] gates_samp2 ``` The $20$ sites in **gates_samp2** are identical to **gates_samp** because the same seed point was used. ```{r BASex12a} # Ensure gates_samp is equal to the first 10 sites in gates_samp2. Must return TRUE. all.equal(gates_samp$geometry[1:20], gates_samp2$geometry[1:20]) ``` Plot the increased stratified sample using ggplot. ```{r BASex13a} gates_samp4 <- BASMaster[BASMaster[["NAME"]] %in% "Gates",] northampton_samp4 <- BASMaster[BASMaster[["NAME"]] %in% "Northampton",] hertford_samp4 <- BASMaster[BASMaster[["NAME"]] %in% "Hertford",] ggplot() + geom_sf() + geom_sf(data = strat, size = 4, shape = 23) + geom_point(data = gates_samp4, aes(geometry = geometry, x = after_stat(x), y = after_stat(y), colour= "Gates"), stat = "sf_coordinates" ) + geom_point(data = northampton_samp4, aes(geometry = geometry, x = after_stat(x), y = after_stat(y), colour= "Northampton"), stat = "sf_coordinates" ) + geom_point(data = hertford_samp4, aes(geometry = geometry, x = after_stat(x), y = after_stat(y), colour= "Hertford"), stat = "sf_coordinates" ) + scale_color_manual(values = c("red", "blue", "green"), name = "Legend") + # Define color scale and legend title theme_bw() + labs(x = "Latitude", y = "Longitude", title = "Stratified samples from BAS master samples.", subtitle = "60 sites from North Carolina, U.S.A, 20 from Gates, \n20 from Northampton and 20 from Hertford.", caption = "spbal Stratified BAS samples.") ``` ### Non-Overlapping panel design. The final BAS example design with **spbal** is a non-overlapping panel design for surveying over time. This design has three panels containing $20$ survey sites from the county of Gates, North Carolina, U.S.A. Each panel is a spatially balanced sample of size $n = 20$, and the collection of sites from all panels is a spatially balanced sample of size $n = 60$. ```{R BASex14g} set.seed(511) n_panels <- c(20, 20, 20) result5 <- spbal::BAS(shapefile = shp_gates, panels = n_panels, boundingbox = bb) BASpanel <- result5$sample BASpanel ``` The sites in panel $2$ are obtained using the **getPanel()** function as follows. ```{r BASex15a} panel_2 <- spbal::getPanel(BASpanel, 2) panel_2 <- panel_2$sample panel_2 ``` Plot the sample sites in each panel using ggplot. ```{r BASex16a, warning=FALSE} # to extract the sample points associated with a specific panelid, we can use the following code: panel_1 <- BASpanel[BASpanel$panel_id == 1,] panel_2 <- BASpanel[BASpanel$panel_id == 2,] panel_3 <- BASpanel[BASpanel$panel_id == 3,] # or use the spbal::getPanel() function. #panel_1 <- spbal::getPanel(BASpanel, 1) #panel_2 <- spbal::getPanel(BASpanel, 2) #panel_3 <- spbal::getPanel(BASpanel, 3) ggplot() + geom_sf() + geom_sf(data = gates, size = 4, shape = 23) + geom_point(data = panel_1, aes(geometry = geometry, x = after_stat(x), y = after_stat(y), colour= "Panel-1 (n = 20)"), stat = "sf_coordinates" ) + geom_point(data = panel_2, aes(geometry = geometry, x = after_stat(x), y = after_stat(y), colour= "Panel-2 (n = 20)"), stat = "sf_coordinates" ) + geom_point(data = panel_3, aes(geometry = geometry, x = after_stat(x), y = after_stat(y), colour= "Panel-3 (n = 20)"), stat = "sf_coordinates" ) + scale_color_manual(values = c("red", "blue", "black"), name = "Panels") + theme_bw() + labs(x = "Latitude", y = "Longitude", title = "Panel Design from BAS Master Sample", subtitle = "Total of 60 survey sites from Gates, North Carolina, U.S.A", caption = "spbal Non-overlapping Panel Design.") ``` ### Overlapping panel design. Panel overlaps are also possible in **spbal**. The following call sets the last five elements from panel $1$ as the first five elements in panel $2$ and the last five elements from panel $2$ as the first five elements in panel $3$ ($50$ unique sites). Because the overlaps are sub-samples of consecutive BAS points, they are well-spread over the study region. Furthermore, each panel is a spatially balanced sample of size $n = 20$, and the collection of sites from all panels is a spatially balanced sample of size $n = 50$. ```{r BASex17a, results='hide', message=FALSE} set.seed(511) n_panels <- c(20, 20, 20) n_panel_overlap <- c(0, 5, 5) result6 <- spbal::BAS(shapefile = shp_gates, panels = n_panels, panel_overlap = n_panel_overlap, boundingbox = bb) BASpanel <- result6$sample ``` The sites in panel $2$ are obtained using the **getPanel()** function as follows. The first five sites in panel $2$ are also in panel $1$ so **panel_id** = $1, 2$ for these sites. ```{r BASex18a} panel_2 <- spbal::getPanel(BASpanel, 2) panel_2 <- panel_2$sample panel_2[1:5,] ``` Plot our overlapped samples. ```{r BASex19a} panel_1 <- spbal::getPanel(BASpanel, 1) panel_2 <- spbal::getPanel(BASpanel, 2) panel_3 <- spbal::getPanel(BASpanel, 3) ggplot() + geom_sf() + geom_sf(data = gates, size = 4, shape = 23) + geom_jitter(width = 10, height = 10) + geom_point(data = sf::st_jitter(panel_1$sample, 0.02), aes(geometry = geometry, x = after_stat(x), y = after_stat(y), colour= "Panel-1"), stat = "sf_coordinates" ) + geom_point(data = sf::st_jitter(panel_2$sample, 0.02), aes(geometry = geometry, x = after_stat(x), y = after_stat(y), colour= "Panel-2"), stat = "sf_coordinates" ) + geom_point(data = panel_3$sample, aes(geometry = geometry, x = after_stat(x), y = after_stat(y), colour= "Panel-3"), stat = "sf_coordinates" ) + scale_color_manual(values = c("red", "blue", "green"), name = "Overlapped Panels") + # Define color scale and legend title theme_bw() + labs(x = "Latitude", y = "Longitude", title = "Spatially balanced sample using a overlapping panel \ndesign, of three panels, each containing 20 survey sites.", subtitle = "Total of 50 survey sites from Gates, North Carolina. Panel Overlap = (0, 5, 5)", caption = "spbal Overlapping Panel Design.") ``` ## Halton Frame (HF) Halton Frames discretize an areal resource into a spatially ordered grid, where samples of consecutive frame points are spatially balanced. ### spbal::HaltonFrame() The __spbal::HaltonFrame()__ function supports the following input parameters: + __shapefile__ The shape file as a polygon (sp or sf) to select sites for. + __N__ The number of points in the frame to generate. + __J__ The number of grid cells. A list of $2$ values. The default value is $c(3, 2)$. + __bases__ Co-prime base for the Halton Sequence. The default value is $c(2, 3)$. + __boundingbox__ The bounding box around the study area. If a bounding box is not supplied then **spbal** will generate a bounding box for the shapefile. + __panels__ A list of integers that define the size of each panel in a non-overlapping panels design. The length of the list determines the number of panels required. The sum of the integers in the panels parameter will determine the total number of samples selected, __n__. The default value for __panels__ is NULL, this indicates that a non-overlapping panel design is not wanted. + __panel_overlap__ A list of integers that define the overlap into the previous panel. Is only used when the __panels__ parameter is not NULL. The default value for __panel_overlap__ is NULL. The length of __panel_overlap__ must be equal to the length of panels. The first value is always forced to zero as the first panel never overlaps any region. + __stratum__ The name of a feature column in the __shapefile__ that defines the strata of interest. The default value is NULL indicating that stratification will not be performed. + __seeds__ A vector of two seeds, __u1__ and __u2__. If not specified, the default is NULL and will be defined randomly. + __verbose__ A boolean, if you want to see any output printed to screen set to TRUE. Helpful if taking a long time. The default is FALSE i.e. no informational messages will be displayed. The __HaltonFrame()__ function returns a list containing five variables: + __J__ The number of grid cells. A list of $2$ values that were used to generate this Halton grid and frame. + __hg.pts.shp__ Halton grid over the bounding box and study area. + __hf.pts.shp__ Halton frame, the sample points within the study area. + __bb__ The bounding box surrounding the study area. + __seeds__ The $u1$ and $u2$ seeds used to generate the sample. The sample points in *hf.pts.shp* are returned in the form of a simple feature collection of POINT objects. It has the following features: + __ID__ A unique identifier, the Halton frame point order. + __spbalSeqID__ A unique identifier, the study area point order. + __x__ The point geometry. ### spbal::HaltonFrame() code example To generate Halton Frames, **spbal** requires a study region shapefile and the region's bounding box. To illustrate Halton Frames, we discretize the Gates study region into a coarse grid using $B = 2^{J_1} \times 3^{J_2} = 2^3 \times 3^2$ (a $9$ by $8$ grid). The call is below. ```{r HFex1a, message=FALSE, warning=FALSE, results='hide'} set.seed(511) result6 <- spbal::HaltonFrame(shapefile = shp_gates, J = c(3, 2), boundingbox = bb) Frame <- result6$hf.pts.shp Grid <- result6$hg.pts.shp ``` Spatially ordered $9$ by $8$ Halton grid $(B = 2^{3} * 3^{2} = 72)$ for the Gates county in North Carolina, U.S.A. ```{R HFex1b} # Grid - Halton grid over Gates county. ggplot() + geom_sf() + geom_sf(data = gates, size = 4, shape = 23) + geom_text(data = Grid, size = 3, aes(label = ID, vjust = -0.5, geometry = x, x = after_stat(x), y = after_stat(y), color = "ID"), stat = "sf_coordinates") + geom_point(data = Grid, aes(geometry = x, x = after_stat(x), y = after_stat(y), colour= "Samples"), stat = "sf_coordinates" ) + scale_color_manual(values = c("red", "black"), name = "Legend") + # Define color scale and legend title theme_bw() + labs(x = "Latitude", y = "Longitude", title = "A Halton grid, B = 2^3 * 3^2, over Gates, North Carolina, U.S.A.", subtitle = "A total of 432 points.", caption = "spbal Halton Grid example.") ``` and the corresponding Halton Frame for the Gates county in North Carolina in the U.S.A. ```{R HFex1c} # Frame - Halton frame over Gates county. ggplot() + geom_sf() + geom_sf(data = gates, size = 4, shape = 23) + geom_text(data = Frame, size = 3, aes(label = ID, vjust = -0.5, geometry = x, x = after_stat(x), y = after_stat(y), color = "ID"), stat = "sf_coordinates") + geom_point(data = Frame, aes(geometry = x, x = after_stat(x), y = after_stat(y), colour= "Samples"), stat = "sf_coordinates" ) + scale_color_manual(values = c("red", "black"), name = "Legend") + # Define color scale and legend title theme_bw() + labs(x = "Latitude", y = "Longitude", title = "A Halton frame, B = 2^3 * 3^2, over Gates, North Carolina, U.S.A.", subtitle = "Showing sample points within the Gates shapefile.", caption = "spbal Halton Frame example.") ``` ### Halton frame fine grid The second example frame generates a fine grid using an approximately regular $B = 2^8 \times 3^5 = 256 \times 243$ Halton Frame over Gates and draws a spatially balanced sample of $n = 25$. The call is below. ```{r HFex1d, results='hide', message=FALSE, warning=FALSE} set.seed(511) result7 <- spbal::HaltonFrame(shapefile = shp_gates, J = c(8, 5), boundingbox = bb) Frame <- result7$hf.pts.shp ``` Draw $n = 25$ sites from the Halton Frame using the **getSample()** function. The first 25 sites from a $B = 2^{8} * 3^{5}$ Halton Frame ($62,208$ grid points covering Gates), the numbering shows the frame's ordering, where sub-samples of consecutively numbered points are spatially balanced samples. ```{r HFex1e, message=FALSE, warning=FALSE} n_samples <- 25 FrameSample <-getSample(shapefile = Frame, n = n_samples) FrameSample <- FrameSample$sample FrameSample[1:10, c("x", "spbalSeqID")] ``` Plot the sample and number the points in the frame's order (given by **spbalSeqID**) using ggplot. ```{r HFex1f, warning=FALSE} ggplot() + geom_sf() + geom_sf(data = gates, size = 4, shape = 23) + geom_text(data = FrameSample, size = 3, aes(label = spbalSeqID, vjust = -0.5, geometry = x, x = after_stat(x), y = after_stat(y), color = "Frame Order"), stat = "sf_coordinates") + geom_point(data = FrameSample, aes(geometry = x, x = after_stat(x), y = after_stat(y), colour= "Samples"), stat = "sf_coordinates" ) + scale_color_manual(values = c("black", "red"), name = "Legend") + # Define color scale and legend title theme_bw() + labs(x = "Latitude", y = "Longitude", title = "First 25 sites from the Gates Halton Frame.") ``` ### Spatially balanced sample from a random position It is also possible to draw a spatially balanced sample from a random position in the frame using the **getSample()** function. The call for an $n = 20$ sample is given below. Note the use of the **randomStart = TRUE** parameter to generate the sample from a random position. ```{r HFex1g, warning=FALSE, message=FALSE} n_samples <- 20 FrameSample <-getSample(shapefile = Frame, n = n_samples, randomStart = TRUE) FrameSample <- FrameSample$sample FrameSample[1:10, c("x", "spbalSeqID")] ``` ### Halton frame overlapping panel design. The following example generates an overlapping panel design for surveying over time with three panels, each with $20$ sites with a panel overlap of five. ```{R HFex3a} set.seed(511) # Three panels, of 20 samples each. panels <- c(20, 20, 20) # second panel overlaps first panel by 5, and third panel # overlaps second panel by 5. panel_overlap <- c(0, 5, 5) # generate the sample. samp <- spbal::HaltonFrame(J = c(4, 3), boundingbox = bb, panels = panels, panel_overlap = panel_overlap, shapefile = shp_gates) # get halton frame data from our sample. samp3 <- samp$hf.pts.shp samp3 panelid <- 1 olPanel_1 <- spbal::getPanel(samp3, panelid) panelid <- 2 olPanel_2 <- spbal::getPanel(samp3, panelid) panelid <- 3 olPanel_3 <- spbal::getPanel(samp3, panelid) # Plot using ggplot2 ggplot() + geom_sf() + geom_sf(data = gates, size = 4, shape = 23) + geom_jitter(width = 10, height = 10) + geom_point(data = sf::st_jitter(olPanel_1$sample, 0.02), aes(geometry = x, x = after_stat(x), y = after_stat(y), colour= "Panel-1"), stat = "sf_coordinates" ) + geom_point(data = sf::st_jitter(olPanel_2$sample, 0.02), aes(geometry = x, x = after_stat(x), y = after_stat(y), colour= "Panel-2"), stat = "sf_coordinates" ) + geom_point(data = olPanel_3$sample, aes(geometry = x, x = after_stat(x), y = after_stat(y), colour= "Panel-3"), stat = "sf_coordinates" ) + scale_color_manual(values = c("red", "blue", "green"), name = "Overlapped Panels") + # Define color scale and legend title theme_bw() + labs(x = "Latitude", y = "Longitude", title = "Halton Frame sample using a overlapping panel \ndesign, of three panels, each containing 20 survey sites.", subtitle = "Total of 50 survey sites from Gates, North Carolina, U.S.A. Panel Overlap = (0, 5, 5)", caption = "spbal Halton Frame Overlapping Panel Design.") ``` ### Halton Frame Stratified sample. Stratified samples can be selected from a Halton Frame with **spbal**. To implement this design, the input shapefile must have labelled sub-regions or strata, which must be enclosed within a single bounding box. We provide a stratified example design for $50$ sites are selected from three counties in the state of North Carolina, U.S.A - Gates ($n = 20$), Northampton ($n = 30$), and Hertford ($n = 10$). Load the shape file for the Gates, Northampton and Hertford regions. ```{r HFex4a, results='hide', message=FALSE} strata <- c("Gates", "Northampton", "Hertford") n_strata <- c("Gates" = 20, "Northampton" = 30, "Hertford" = 10) shp_subset <- shp_file[shp_file[["NAME"]] %in% strata,] ``` Define the bounding box for Gates, Northampton and Hertford using **spbal**. ```{r HFex5a, results='hide', message=FALSE} bb_strata <- spbal::BoundingBox(shapefile = shp_subset) ``` Generate an approximately regular $B = 2^8 \times 3^5 = 256 \times 243$ Halton Frame for the Gates, Northampton and Hertford regions. ```{r HFex6a, results='hide', message=FALSE, warning=FALSE} set.seed(511) result9 <- spbal::HaltonFrame(shapefile = shp_subset, N = n_strata, J = c(8, 5), boundingbox = bb_strata, stratum = "NAME") Frame <- result9$hf.pts.shp ``` Use the **getSample()** function to get $10$ points from the Hertford region. ```{r HFex7a, warning=FALSE} n_samples <- 10 hertford_samp <- spbal::getSample(Frame, n = n_samples, strata = "Hertford", stratum = "NAME") hertford_samp <- hertford_samp$sample hertford_samp[1:10, c("NAME", "spbalSeqID", "x")] ``` ### Panel Design from Halton Frame Panel designs from Halton Frames are also possible in **spbal**. The following call sets the last five elements from panel $1$ as the first five elements in panel $2$ and the last five elements from panel $2$ as the first five elements in panel $3$ (an over-lapping design with $50$ unique sites). Each panel is a spatially balanced sample of size $n = 20$, and the collection of sites from all panels is a spatially balanced sample of size $n = 50$. Generate an approximately regular $B = 2^8 \times 3^5 = 256 \times 243$ Halton Frame for Gates. ```{r HFex8a, warning=FALSE, message=FALSE, results='hide'} set.seed(511) panels <- c(20, 20, 20) n_panel_overlap <- c(0, 5, 5) result10 <- spbal::HaltonFrame(shapefile = shp_gates, panels = panels, panel_overlap = n_panel_overlap, J = c(8, 5), boundingbox = bb) HaltonFramePanel <- result10$hf.pts.shp ``` Now we obtain the sites in panel $1$ using the **getPanel()** function. ```{r HFex9a, warning=FALSE, message=FALSE} panelid <- 1 SitesPanel_1 <- spbal::getPanel(HaltonFramePanel, panelid) SitesPanel_1 <- SitesPanel_1$sample SitesPanel_1[1:10, c("x", "spbalSeqID", "panel_id")] ``` ## Halton Iterative Partitioning (HIP) HIP draws spatially balanced samples and over-samples from point resources by partitioning the resource into boxes with the same nested structure as Halton boxes. The **spbal** parameter **iterations** defines the number of boxes used in the HIP partition and should be larger than the sample size but less than the population size. The **iterations parameter** also defines the number of units available in the HIP over-sample, where the over-sample contains one unit from each box in the HIP partition. Halton iterative partitioning (HIP) extends Basic acceptance sampling (BAS) to point resources. It partitions the resource into $B \geq n$ boxes that have the same nested structure as in BAS, but different sizes. These boxes are then uniquely numbered using a random-start Halton sequence of length $B$. The HIP sample is obtained by randomly drawing one point from each of the boxes numbered $1, 2, ..., n$. HIP draws spatially balanced samples from point resources by partitioning the resource into $B = 2^{J_1} * 3^{J_2}$ nested boxes. ### spbal::HIP() The __spbal::HIP()__ function supports the following input parameters: + __population__ A population of point pairs. + __n__ The number of points to draw from the population. Default $20$. + __iterations__ The levels of partitioning required. Default $7$. + __minRadius__ If specified, the minimum distance, in meters, allowed between sample points. This is applied to the *$overSample*. + __panels__ A list of integers that define the size of each panel in a non-overlapping panels design. The length of the list determines the number of panels required. The sum of the integers in the panels parameter will determine the total number of samples selected, __n__. The default value for __panels__ is NULL, this indicates that a non-overlapping panel design is not wanted. + __panel_overlap__ A list of integers that define the overlap into the previous panel. Is only used when the __panels__ parameter is not NULL. The default value for __panel_overlap__ is NULL. The length of __panel_overlap__ must be equal to the length of panels. The first value is always forced to zero as the first panel never overlaps any region. + __verbose__ A boolean, if you want to see any output printed to screen set to TRUE. Helpful if taking a long time. The default is FALSE i.e. no informational messages will be displayed. The __HIP()__ function returns a list containing five variables: + __Population__ The original population. + __HaltonIndex__ The Halton index for the point. Points will be spread equally across all Halton indices. + __sample__ The population sample. + __overSample__ The *overSample* contains one point from each Halton box. All contiguous sub-samples from *oversample* are spatially balanced, and the first $n$ points are identical to *sample*. + __minRadius__ This result variable contains the sample created using the *minRadius* parameter. If the *minRadius* parameter is not specified then the *minRadius* variable will contain NULL. ### spbal::HIP() code example The following sample code is used to demonstrate the use of the __spbal::HIP()__ function. Here we are generating $20$ points from a population of $5,000$ (random) points with $7$ levels of partitioning ($4$ in the first dimension and $3$ in the second) to give $2^4$ * $3^3$ = $32 * 27$, resulting in $864$ boxes. ```{r HIPex1a} # set random seed base::set.seed(511) # define HIP parameters. pop <- matrix(stats::runif(5000*2), nrow = 5000, ncol = 2) n <- 20 its <- 7 # Convert the population matrix to an sf point object. sf_points <- sf::st_as_sf(data.frame(pop), coords = c("X1", "X2")) dim(sf::st_coordinates(sf_points)) # generate HIP sample. result <- spbal::HIP(population = sf_points, n = n, iterations = its) # HaltonIndex HaltonIndex <- result$HaltonIndex # verify all spread equally, should be TRUE. (length(unique(table(HaltonIndex))) == 1) # Population Sample HIPsample <- result$sample HIPsample ``` The HIP over-sample has $432$ sites (one site from each of the $432$ boxes in the HIP partition). ```{r HIPex2a} HIPoverSample <- result$overSample HIPoverSample[1:10, c("geometry", "spbalSeqID")] OverSampleSize <- dim(HIPoverSample)[1] OverSampleSize ``` The first $n$ points in **HIPoverSample** are identical to **HIPsample**. ```{r HIPex3a} # compare the HIP sample and oversample, they will be the same. all.equal(HIPsample$geometry[1:n], HIPoverSample$geometry[1:n]) ``` ### Forcing a minimum distance between sites HIP selects spatially balanced samples, but the algorithm can choose sites closer together than desired, mainly if site density varies significantly over the study region. To enforce a minimum distance between sites, **spbal** has a minimum distance between sites parameter **minRadius**. Setting **minRadius** $= r > 0$ draws an ordered subset $S$ from the HIP over-sample, where the size of $S$ depends on the **iterations** and **minRadius** parameters. Taking the first $n \leq |S|$ sites from $S$ gives a well-spread sample of $n$ sites that satisfies the minimum distance between sites requirement. ### HIP Panel Design HIP Panel designs are also possible in **spbal**. Use the **panels** and **n_panel_overlap** parameters as demonstrated for the **BAS** and **HaltonFrame** functions. ## More Details The following section provides more details on the __spbal__ package functions *getSample* and *getPanel*. These functions only accept objects created by __spbal__ as input via the *shapefile* parameter. ### *getSample* The **spbal** *getSample* function can be applied to outputs produced by the __spbal__ *BAS* and *HaltonFrame* functions. #### Non-Stratified samples Let 'shp' be the output from *BAS* or *HaltonFrame*, then - __getSample(shapefile = shp, n = n)__ will output the first $n$ points from $shp$. - __getSample(shapefile = shp, n = n, randomStart = TRUE)__ will output $n$ points (in the order of the frame) from a random starting point in $shp$. #### Stratified samples In the case where the input is a stratified sample, both the $strata$ and $stratum$ parameters must be specified. If they are specified then the $n$ parameter need not be specified, if it is, it will be ignored. In this case *getSample* will return all points of a stratified sample. The parameter $randomStart$ is not supported when dealing with stratified samples. If specified it will be ignored. - __getSample(shapefile = shp, strata = strata, stratum = stratum)__ will output all points from $shp$ where the column name specified by the $stratum$ parameter contains the value specified by the $strata$ parameter. ### *getPanel* The **spbal** *getPanel* function can be applied to outputs produced by the __spbal__ *BAS* and *HaltonFrame* functions. Let 'shp' be the output from *BAS* or *HaltonFrame*, then - __getPanel(shapefile = shp, panelid = panelid)__ will output all the points belonging to panel $panelid$ from $shp$. Use the *R* help function for more information on these functions. ## The End