Reports the sum, mean or other statistics on a variable of the BootstrapData.
ReportBootstrap(
BootstrapData,
BaselineProcess = character(),
TargetVariable = character(),
TargetVariableUnit = character(),
ReportFunction = RstoxBase::getReportFunctions(use = "Baseline"),
GroupingVariables = character(),
InformationVariables = character(),
WeightingVariable = character(),
ConditionOperator = character(),
ConditionValue = character(),
FractionOverVariable = character(),
BootstrapReportFunction = RstoxBase::getReportFunctions(use = "Bootstrap"),
Percentages = double(),
Filter = character(),
RemoveMissingValues = FALSE,
AggregationFunction = character()
)
The BootstrapData data.
A string naming the baseline process to report from the BootstrapData. If a process with
The variable to report.
The unit to use for the TargetVariable. See RstoxData::StoxUnits for possible units (look for the appropriate quantity, e.g. "length" for IndividualTotalLength, and use the shortname in the TargetVariableUnit).
The name of a function to report the Baseline process output by. This must be a function returning a single value. See ReportFunctions for implemented functions.
The variables to report by. For most applications GroupingVariables should include "Survey" and "SpeciesCategory", unless the user needs to sum over all Survey or SpeciesCategory.
Variables to include as columns at the end of the report table. These cannot have more unique combinations than the GroupingVariables.
The variable to weight by. Only relevant for ReportFunction "weighted.mean". Note that missing values in the WeightingVariable result in a missing value from the weighted.mean function, as per the documentation of this function, regardless of the value of RemoveMissingValues.
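The behaviour of missing weights can be checked directly in base R. The following is a minimal sketch with made-up numbers (the vectors are hypothetical and not taken from any StoX project); it only illustrates that stats::weighted.mean returns NA when a weight is missing, even with na.rm = TRUE.

```r
# Hypothetical values and weights; the second weight is missing.
x <- c(10, 20, 30)
w <- c(1, NA, 1)

# na.rm = TRUE only removes missing values in x, not in w,
# so the result is still NA.
resultWithNaRm <- weighted.mean(x, w, na.rm = TRUE)

# With complete weights the weighted mean is computed as expected.
resultComplete <- weighted.mean(x, c(1, 2, 1))
```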
Expressions (strings) giving the condition for the ReportFunction values "number" and "fractionOfOccurrence". Supported values for ConditionOperator are "%in%", "%notin%", "==", "!=", "%notequal%", "<", "<=", ">=", ">". The ConditionOperator and ConditionValue are pasted together for use in data.table.
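As a rough illustration of what pasting an operator and a value into a condition can look like, here is a sketch in base R. It is an assumption about the general mechanism, not the actual implementation in the package: the data frame, the way the target variable name is prepended, and the use of eval(parse()) are all hypothetical.

```r
# Hypothetical data; IndividualAge stands in for the TargetVariable.
d <- data.frame(IndividualAge = c(2, 3, 4, NA, 5))

ConditionOperator <- ">="
ConditionValue <- "4"

# Paste variable, operator and value into one expression string.
condition <- paste("IndividualAge", ConditionOperator, ConditionValue)

# Evaluate the expression with the columns of d in scope.
matches <- eval(parse(text = condition), envir = d)

# Count the rows fulfilling the condition (the NA age is not counted).
numberMatching <- sum(matches, na.rm = TRUE)
```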
When ReportFunction is a fraction ("fractionOfOccurrence" or "fractionOfSum"), FractionOverVariable is a string naming the variable (one of the GroupingVariables) to sum over in the denominator of the fraction.
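The role of the denominator can be sketched in base R as follows. This is an assumed reading of "fractionOfSum" with IndividualAge as the FractionOverVariable, using a made-up table; it is not the package's own code.

```r
# Hypothetical table already summed per SpeciesCategory and IndividualAge.
d <- data.frame(
  SpeciesCategory = c("cod", "cod", "haddock", "haddock"),
  IndividualAge = c(1, 2, 1, 2),
  Abundance = c(30, 10, 5, 15)
)

# Denominator: sum over IndividualAge within each remaining group.
denominator <- ave(d$Abundance, d$SpeciesCategory, FUN = sum)

# Fraction of the group total contributed by each age.
d$fractionOfSum <- d$Abundance / denominator
```

Under this reading the fractions add up to 1 within each SpeciesCategory.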
The function to apply across bootstrap runs, such as "cv" or "c".
The percentages for which to report percentiles when BootstrapReportFunction = "summaryStox".
A string with an R expression to filter out unwanted rows of the report, e.g. "IndividualAge %notin% NA" or "Survey %notin% NA & SpeciesCategory %notin% NA".
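To show how such a filter string selects rows, here is a small base R sketch. The `%notin%` operator is defined locally as an assumption (the negation of %in%), and the report table is made up; how the package evaluates the Filter internally may differ.

```r
# Assumed definition of %notin%: the negation of %in%.
`%notin%` <- function(x, table) !(x %in% table)

# Hypothetical report table with some missing grouping values.
report <- data.frame(
  Survey = c("S1", NA, "S1"),
  IndividualAge = c(3, 4, NA),
  Biomass = c(10, 20, 30)
)

Filter <- "Survey %notin% NA & IndividualAge %notin% NA"

# Evaluate the filter string with the columns of the report in scope.
keep <- eval(parse(text = Filter), envir = report)
filtered <- report[keep, ]
```

Because NA %in% NA is TRUE, "%notin% NA" yields FALSE for missing values rather than NA, which is why it is convenient for dropping such rows.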
Logical: If TRUE, remove missing values (NAs) from the TargetVariable. The default (FALSE) means that NA is reported if at least one of the values used in the ReportFunction is NA. Use RemoveMissingValues = TRUE with extreme caution, as it may lead to under-estimation. E.g., if RemoveMissingValues = TRUE and a super-individual lacks IndividualRoundWeight, Biomass will be NA for that super-individual, and the portion of Abundance distributed to it will be excluded when summing Biomass (but included when summing Abundance). It is advised to always run with RemoveMissingValues = FALSE first, and make a thorough investigation to identify the source of any missing values. The function ImputeSuperIndividuals can be used to impute the missing information from other super-individuals.
Deprecated, use ReportFunction instead. An alias for ReportFunction, kept for backward compatibility.
A ReportBootstrapData object.
This function works in two steps. First, the ReportFunction is applied to the TargetVariable of the table given by BaselineProcess for each unique combination of the GroupingVariables and for each bootstrap run. Second, a grid of all possible combinations of the GroupingVariables is formed and the result from the first step is placed onto the grid. This creates 0 for each position in the grid where data from the first step are not present. E.g., if a particularly large fish is found in only one haul, and this haul by chance is not selected in a bootstrap run, the TargetVariable will be 0 for that run, reflecting the variability in the data. To complete the second step, the BootstrapReportFunction is applied over the bootstrap runs for each cell in the grid.
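The two steps can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the table, the column name BootstrapID, and the use of aggregate(), expand.grid() and merge() are all hypothetical stand-ins, with sum as the ReportFunction and mean as the function applied across runs.

```r
# Hypothetical bootstrap output: age 10 is absent from run 2.
boot <- data.frame(
  BootstrapID = c(1, 1, 1, 2, 2, 3, 3, 3),
  IndividualAge = c(3, 3, 10, 3, 3, 3, 10, 10),
  Abundance = c(5, 7, 1, 6, 4, 8, 2, 1)
)

# Step 1: apply the report function (sum) per group and bootstrap run.
step1 <- aggregate(Abundance ~ BootstrapID + IndividualAge, data = boot, FUN = sum)

# Step 2a: grid of all combinations, filled with 0 where step 1 has no data.
grid <- expand.grid(
  BootstrapID = unique(boot$BootstrapID),
  IndividualAge = unique(boot$IndividualAge)
)
onGrid <- merge(grid, step1, all.x = TRUE)
onGrid$Abundance[is.na(onGrid$Abundance)] <- 0

# Step 2b: apply a function (here mean) across runs for each grid cell.
meanByAge <- tapply(onGrid$Abundance, onGrid$IndividualAge, mean)

# For comparison: the mean over only the runs where the group occurred.
meanWithoutGrid <- tapply(step1$Abundance, step1$IndividualAge, mean)
```

In this toy data, age 10 has a mean of 4/3 across the three runs when the zero is included, but 2 if the run without age 10 is silently left out.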
The parameter RemoveMissingValues should be used with extreme caution. The effect of setting RemoveMissingValues to TRUE is that missing values (NAs) are removed in both the first and the second step, and this can be dangerous in both. E.g., if the Abundance of SuperIndividualsData is positive for super-individuals with missing IndividualWeight, then the Biomass of those super-individuals will be missing as well. If one then wants to sum the Biomass using ReportFunction = "sum", one will get NA if RemoveMissingValues = FALSE. If RemoveMissingValues = TRUE, the missing Biomass is ignored, and the summed Biomass will only include the super-individuals that have non-missing IndividualWeight, effectively discarding a portion of the observed abundance. The summed Biomass will in this case be underestimated!
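The underestimation can be made concrete with a few made-up numbers. The sketch below uses a hypothetical three-row table in base R and assumes Biomass is simply Abundance times IndividualWeight; it is meant only to show the effect of na.rm on the sum.

```r
# Hypothetical super-individuals; the second one lacks IndividualWeight.
superIndividuals <- data.frame(
  Abundance = c(100, 50, 25),
  IndividualWeight = c(2, NA, 4)
)
superIndividuals$Biomass <-
  superIndividuals$Abundance * superIndividuals$IndividualWeight

# Corresponds to RemoveMissingValues = FALSE: the sum is NA.
sumStrict <- sum(superIndividuals$Biomass)

# Corresponds to RemoveMissingValues = TRUE: the NA is silently dropped.
sumDropped <- sum(superIndividuals$Biomass, na.rm = TRUE)

# Abundance actually represented in the summed Biomass versus the total.
abundanceBehindBiomass <- sum(
  superIndividuals$Abundance[!is.na(superIndividuals$Biomass)]
)
totalAbundance <- sum(superIndividuals$Abundance)
```

Here the summed Biomass of 300 represents only 125 of the 175 individuals, so the biomass of the remaining 50 is missing from the total.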
In the second step, setting RemoveMissingValues to TRUE can be even more dangerous, as the only option currently available for the BootstrapReportFunction is the function RstoxBase::summaryStox(), which includes the average and standard deviation, both of which are highly influenced by removing missing data.
Instead of setting RemoveMissingValues to TRUE, it is advised to apply the function ImputeSuperIndividuals to fill in e.g. IndividualWeight where missing. Missing values in the output of this function can also be avoided by adding variables to GroupingVariables, e.g. adding "Stratum" if there are strata that are known from Baseline to contain no fish. These strata will then be present with missing values, but these missing values will not affect other strata if "Stratum" is included in GroupingVariables. It is also recommended to include "Survey" and "SpeciesCategory" in the GroupingVariables, as these are key variables across which summary statistics should rarely be computed.