Run a subset of the baseline model a number of times while resampling data.
Bootstrap(
outputData,
outputMemoryFile,
projectPath,
BootstrapMethodTable = data.table::data.table(),
NumberOfBootstraps = 1L,
OutputProcesses = character(),
OutputVariables = character(),
UseOutputData = FALSE,
NumberOfCores = 1L,
BaselineSeedTable = data.table::data.table()
)
The output of the function from an earlier run.
The path to the output memory file to copy the outputData to in the case that UseOutputData is TRUE.
The path to the project containing the baseline to bootstrap.
A table of the columns ProcessName, ResampleFunction and Seed, where each row defines the resample function to apply to the output of the given process, and the seed to use in the resampling. The seed is used to draw one seed per bootstrap run using getSeedVector. See Details for a list of resampling functions and the limitations and benefits of each function.
Integer: The number of bootstrap replicates.
A vector of the processes to save from each bootstrap replicate.
An optional list of variables to keep in the output. A typical set of variables could be ["Survey", "Stratum", "SpeciesCategory", "IndividualTotalLength", "IndividualAge", "Abundance", "Biomass"], which should cover the most frequently used variables in reports. Any variable that is used in a report must be present in OutputVariables. An empty list (the default) keeps all variables. This parameter is included to reduce the disc space used by the bootstrap objects and to speed up writing and reading of that file. The OutputVariables are extracted for all processes listed in OutputProcesses. See getBootstrapOutputVariables for finding the variables used in ReportBootstrap processes of a project.
Logical: Bootstrapping can be time consuming, and by setting UseOutputData to TRUE the output file generated by a previous run of the process will be used instead of re-running the bootstrapping. Use this parameter with caution. Any changes made to the Baseline model or to the parameters of the Bootstrap itself will not be accounted for unless UseOutputData = FALSE. The option UseOutputData = TRUE is intended only for saving time when one needs to generate a report from an existing Bootstrap run.
The number of cores to use for parallel processing. A copy of the project is created in tempdir() for each core, also when using only one core. Note that this will require disc space equivalent to NumberOfCores times the size of the project folder (excluding the output/analysis/Bootstrap folder, which will be deleted before copies are made).
A table of ProcessName and Seed, giving the seed to use for the Baseline processes that require a Seed parameter. The seed is used to draw one seed per bootstrap run using getSeedVector.
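The table arguments described above could be built as in the following minimal sketch. The process names ("MeanNASC", "BioticAssignment", "ImputeSuperIndividuals") and the seed values are hypothetical and must be replaced by those of the actual project. The helper drawSeedsPerRun is not part of RstoxFramework and is not the getSeedVector implementation; it only illustrates the idea of deriving one seed per bootstrap run from a single seed.

```r
library(data.table)

# Hypothetical process names; use the names of the processes in your own project.
BootstrapMethodTable <- data.table(
  ProcessName = c("MeanNASC", "BioticAssignment"),
  ResampleFunction = c("ResampleMeanNASCData", "ResampleBioticAssignmentByStratum"),
  Seed = c(1L, 2L)
)

# Seed for a hypothetical Baseline process that has a Seed parameter.
BaselineSeedTable <- data.table(
  ProcessName = "ImputeSuperIndividuals",
  Seed = 3L
)

# The typical set of output variables from the argument description,
# written as an R character vector.
OutputVariables <- c(
  "Survey", "Stratum", "SpeciesCategory", "IndividualTotalLength",
  "IndividualAge", "Abundance", "Biomass"
)

# Illustration only (an assumption, not getSeedVector itself):
# derive one reproducible seed per bootstrap run from a single seed.
drawSeedsPerRun <- function(seed, numberOfBootstraps) {
  set.seed(seed)
  sample.int(.Machine$integer.max, numberOfBootstraps)
}

runSeeds <- drawSeedsPerRun(BootstrapMethodTable$Seed[1], numberOfBootstraps = 5L)
```

Because each run gets its own seed derived from the table seed, re-running with the same Seed values should reproduce the same set of replicates.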
A BootstrapData object, which is a list of the RstoxData DataTypes and RstoxBase DataTypes.
Resampling of BioticAssignment
For acoustic-trawl survey estimates there are two possible resampling functions for the biotic assignment: ResampleBioticAssignmentByStratum and ResampleBioticAssignmentByAcousticPSU. Both of these have their limitations and effect on the mean and variance, and the choice of resampling function should be based on the survey design and how the Hauls are assigned to the AcousticPSUs. The effect on the mean and variance will also depend on the characteristics of the biotic data:
When ResampleFunction is ResampleBioticAssignmentByStratum in the BootstrapMethodTable (the only option before StoX 4.0.0), if the AcousticPSUs of a Stratum have different assigned Hauls, there is a probability that none of the assigned Hauls of an AcousticPSU are resampled in a bootstrap replicate. Different assigned Hauls can be the result of using DefinitionMethod "Radius" or "EllipsoidalDistance", or of manually assigning different Hauls to each AcousticPSU in DefineBioticAssignment. This will lead to missing acoustic density for that PSU for the target species, which will propagate through to the reports. The only way to avoid missing values in the reports in this case is to use RemoveMissingValues = TRUE, which introduces under-estimation of the mean compared to what the estimate would be if none of the AcousticPSUs came out with missing acoustic density. The degree of under-estimation depends on how many of the bootstrap replicates have the problem of AcousticPSUs with missing assignment length distribution.
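To get a feeling for how often an AcousticPSU can end up without any resampled Hauls, the following sketch computes that probability under a simplifying assumption that is not taken from the StoX source code: the n Hauls of a Stratum are resampled n times with replacement and with equal probability, and the AcousticPSU has k of these Hauls assigned. The probability that none of the k Hauls are drawn is then (1 - k/n)^n. The function name and the numbers are made up for illustration.

```r
# Probability that none of the k Hauls assigned to an AcousticPSU are drawn
# when the n Hauls of the Stratum are resampled n times with replacement
# (assumed equal-probability bootstrap; a simplification).
probNoAssignedHaul <- function(k, n) {
  stopifnot(n > 0, k >= 0, k <= n)
  (1 - k / n)^n
}

n <- 20   # Hauls in the Stratum (made-up number)
k <- 1:5  # Hauls assigned to one AcousticPSU
p <- probNoAssignedHaul(k, n)
print(data.frame(AssignedHauls = k, ProbabilityMissing = round(p, 3)))

# Monte Carlo check for an AcousticPSU with Hauls 1 and 2 assigned (k = 2)
set.seed(1)
simulated <- mean(replicate(
  10000,
  !any(sample.int(n, n, replace = TRUE) %in% 1:2)
))
```

The probability drops quickly as more Hauls are assigned to each AcousticPSU, which is why the problem is most visible for assignments with only a few Hauls per PSU.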
If the problem of under-estimation due to different assigned Hauls per AcousticPSU is present in a StoX project, one alternative is to use the ResampleFunction ResampleBioticAssignmentByAcousticPSU instead. With this method, the assigned Hauls are resampled for each individual AcousticPSU. A potential consequence of this resampling function is, however, that the variance can be under-estimated compared to using the ResampleFunction ResampleBioticAssignmentByStratum. The reason is that with ResampleBioticAssignmentByStratum an extreme outcome of the resampling is shared by all AcousticPSUs of a Stratum, whereas with ResampleBioticAssignmentByAcousticPSU an extreme outcome for one AcousticPSU is likely to be counteracted by the outcomes for the other AcousticPSUs. The degree of under-estimation can depend on the number of AcousticPSUs in each Stratum.
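The variance effect can be illustrated with a small simulation. This is a toy model under stated assumptions, not the StoX implementation: every AcousticPSU has all Hauls of the Stratum assigned, each Haul contributes a single made-up value, and the Stratum estimate is the plain mean over AcousticPSUs. All names and numbers are hypothetical.

```r
set.seed(42)
nHauls <- 8     # Hauls in the Stratum (made-up)
nPSUs  <- 10    # AcousticPSUs in the Stratum (made-up)
nBoot  <- 2000  # bootstrap replicates
haulValue <- rlnorm(nHauls)  # made-up per-Haul quantity

# By Stratum: one resample of Hauls is shared by all AcousticPSUs
byStratum <- replicate(nBoot, {
  drawn <- sample.int(nHauls, nHauls, replace = TRUE)
  perPSU <- rep(mean(haulValue[drawn]), nPSUs)
  mean(perPSU)
})

# By AcousticPSU: each AcousticPSU gets its own independent resample
byPSU <- replicate(nBoot, {
  perPSU <- vapply(seq_len(nPSUs), function(i) {
    mean(haulValue[sample.int(nHauls, nHauls, replace = TRUE)])
  }, numeric(1))
  mean(perPSU)
})

print(c(varByStratum = var(byStratum), varByPSU = var(byPSU)))
```

In this toy setting the per-PSU resamples are independent, so their average varies less between replicates than the shared resample does, and the gap grows with the number of AcousticPSUs.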
In addition, using ResampleFunction ResampleBioticAssignmentByAcousticPSU requires a sufficient number of Hauls assigned to each AcousticPSU to achieve meaningful bootstrapping. A single assigned Haul cannot contribute to the variance, and will lead to a warning.
A different option when experiencing missing values in reports due to different assigned Hauls is to split a Stratum into smaller strata, for each of which the DefinitionMethod "Stratum" can be used in DefineBioticAssignment.
Resampling of MeanNASCData
The ResampleFunction ResampleMeanNASCData resamples, with replacement, the AcousticPSUs within each Stratum in the MeanNASCData, where Stratum is the stratum associated with the PSU, and not necessarily the actual stratum polygon. The column NASC is scaled by the number of occurrences of each AcousticPSU in the resampling.
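The resample-and-scale step can be sketched in base R as follows. The table is a made-up stand-in for MeanNASCData with only three columns, and the function resamplePSUsWithinStratum is hypothetical, not the RstoxFramework implementation. In this sketch a PSU that is not drawn gets zero occurrences and hence a NASC of zero; how StoX itself treats such rows is not stated here.

```r
# Toy table with made-up values, standing in for MeanNASCData
meanNASC <- data.frame(
  Stratum = c("S1", "S1", "S1", "S2", "S2"),
  PSU     = c("PSU1", "PSU2", "PSU3", "PSU4", "PSU5"),
  NASC    = c(10, 20, 30, 5, 15)
)

# Hypothetical helper: resample PSUs with replacement within each Stratum
# and scale NASC by the number of times each PSU was drawn.
resamplePSUsWithinStratum <- function(x, seed) {
  set.seed(seed)
  x$Occurrences <- 0L
  for (s in unique(x$Stratum)) {
    rows <- which(x$Stratum == s)
    # Draw positions within the Stratum with replacement
    drawn <- sample.int(length(rows), length(rows), replace = TRUE)
    # Count how many times each PSU of the Stratum was drawn
    x$Occurrences[rows] <- tabulate(drawn, nbins = length(rows))
  }
  x$NASC <- x$NASC * x$Occurrences
  x
}

resampled <- resamplePSUsWithinStratum(meanNASC, seed = 1)
print(resampled)
```

The number of occurrences sums to the number of PSUs within each Stratum, so the resampled table carries the same number of PSU draws as the original.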
Resampling of MeanLengthDistributionData
The ResampleFunction ResampleMeanLengthDistributionData resamples, with replacement, the BioticPSUs within each Stratum in the MeanLengthDistributionData, where Stratum is the stratum associated with the PSU, and not necessarily the actual stratum polygon. The column WeightedNumber is scaled by the number of occurrences of each BioticPSU in the resampling.
Resampling of MeanSpeciesCategoryCatchData
The ResampleFunction ResampleMeanSpeciesCategoryCatchData resamples, with replacement, the BioticPSUs within each Stratum in the MeanSpeciesCategoryCatchData, where Stratum is the stratum associated with the PSU, and not necessarily the actual stratum polygon. The columns TotalCatchWeight and TotalCatchNumber are scaled by the number of occurrences of each BioticPSU in the resampling.
General
Run RstoxFramework::getRstoxFrameworkDefinitions("resampleFunctions") in R to get a list of the implemented resampling functions. Note that if a process is selected in BootstrapMethodTable that is not used in the model up to the OutputProcesses, the bootstrapping of that process will have no effect on the end result (e.g., make sure to select the process returning the BioticAssignment data type that the model actually uses).
A copy of the project is made for each core given by NumberOfCores.