Reports the sum, mean or other statistics on a variable of the BootstrapData.

ReportBootstrap(
  BootstrapData,
  BaselineProcess = character(),
  TargetVariable = character(),
  TargetVariableUnit = character(),
  ReportFunction = RstoxBase::getReportFunctions(use = "Baseline"),
  GroupingVariables = character(),
  InformationVariables = character(),
  WeightingVariable = character(),
  ConditionOperator = character(),
  ConditionValue = character(),
  FractionOverVariable = character(),
  BootstrapReportFunction = RstoxBase::getReportFunctions(use = "Bootstrap"),
  Percentages = double(),
  Filter = character(),
  RemoveMissingValues = FALSE,
  AggregationFunction = character()
)

Arguments

BootstrapData

The BootstrapData data.

BaselineProcess

A string naming the baseline process to report from the BootstrapData. If a process with

TargetVariable

The variable to report.

TargetVariableUnit

The unit to use for the TargetVariable. See RstoxData::StoxUnits for possible units (look for the appropriate quantity, e.g. "length" for IndividualTotalLength, and use the shortname in the TargetVariableUnit).

ReportFunction

The name of a function to report the Baseline process output by. This must be a function returning a single value. See ReportFunctions for implemented functions.

GroupingVariables

The variables to report by. For most applications GroupingVariables should include "Survey" and "SpeciesCategory", unless the user needs to sum over all Survey or SpeciesCategory.

InformationVariables

Variables to include as columns at the end of the report table. These cannot have more unique combinations than the GroupingVariables.

WeightingVariable

The variable to weight by. Only relevant for ReportFunction "weighted.mean". Note that missing values in the WeightingVariable result in a missing value from the weighted.mean function, as per the documentation of that function, regardless of RemoveMissingValues.
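The behaviour described above can be checked directly with base R's weighted.mean(). The following is a minimal sketch with made-up values, not output from ReportBootstrap:

```r
x <- c(1, 2, 3)   # values to average (hypothetical)
w <- c(1, NA, 1)  # weights, one of them missing

# A missing weight gives NA, with or without na.rm, because na.rm
# only removes missing values in x.
missingWeight        <- weighted.mean(x, w)
missingWeightRemoved <- weighted.mean(x, w, na.rm = TRUE)

# A missing value in x, by contrast, is dropped when na.rm = TRUE.
missingValueRemoved <- weighted.mean(c(1, NA, 3), c(1, 1, 1), na.rm = TRUE)
```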

ConditionOperator, ConditionValue

Expressions (strings) giving the condition for the ReportFunctions "number" and "fractionOfOccurrence". Supported values for ConditionOperator are "%in%", "%notin%", "==", "!=", "%notequal%", "<", "<=", ">=", ">". The ConditionOperator and ConditionValue are pasted together into an expression for use in data.table.
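As a rough illustration of what pasting an operator and a value into an expression means, the sketch below builds and evaluates such a string in base R. The variable values are made up. Using eval(parse()) and treating "number" as a count of values that satisfy the condition are assumptions made for illustration, not a description of the package internals:

```r
IndividualAge <- c(2, 3, 5, NA, 7)  # hypothetical target variable

ConditionOperator <- ">="
ConditionValue    <- "3"

# Paste the variable name, operator and value into one expression string.
condition <- paste("IndividualAge", ConditionOperator, ConditionValue)

# Evaluate the string as R code; missing ages give NA in the result.
hits <- eval(parse(text = condition))

# Count the values that satisfy the condition, ignoring missing ones.
number <- sum(hits, na.rm = TRUE)
```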

FractionOverVariable

When ReportFunction is a fraction ("fractionOfOccurrence" or "fractionOfSum") FractionOverVariable is a string naming the variable (one of the GroupingVariables) to sum over in the denominator of the fraction.

BootstrapReportFunction

The function to apply across bootstrap runs, such as "cv" or "c".

Percentages

The percentages to report percentiles for when BootstrapReportFunction = "summaryStox".

Filter

A string with an R expression to filter out unwanted rows of the report, e.g. "IndividualAge %notin% NA" or "Survey %notin% NA & SpeciesCategory %notin% NA".

RemoveMissingValues

Logical: If TRUE, remove missing values (NAs) from the TargetVariable. The default (FALSE) means that NA is reported if at least one of the values used in the ReportFunction is NA. Use RemoveMissingValues = TRUE with extreme caution, as it may lead to under-estimation. E.g., if RemoveMissingValues = TRUE and a super-individual lacks IndividualRoundWeight, Biomass will be NA, and the portion of Abundance distributed to that super-individual will be excluded when summing Biomass (but included when summing Abundance). It is advised to always run with RemoveMissingValues = FALSE first, and to investigate thoroughly to identify the source of any missing values. The function ImputeSuperIndividuals can be used to impute the missing information from other super-individuals.

AggregationFunction

Deprecated, use ReportFunction instead. An alias for ReportFunction, kept for backward compatibility.

Value

A ReportBootstrapData object.

Details

This function works in two steps. First, the ReportFunction is applied to the TargetVariable of the table given by BaselineProcess for each unique combination of the GroupingVariables and for each bootstrap run. Second, a grid of all possible combinations of the GroupingVariables is formed and the result from the first step is placed onto the grid. This creates 0 for each position in the grid where data from the first step are not present. E.g., if a particularly large fish is found in only one haul, and this haul happens not to be selected in a bootstrap run, the TargetVariable will be 0 for that run, reflecting the variability in the data. To complete the second step, the BootstrapReportFunction is applied over the bootstrap runs for each cell in the grid.
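The two steps can be sketched in base R on a toy table. The column names BootstrapID, Stratum and Biomass, the values, and the use of sum and mean are illustrative assumptions; this is not the code used by ReportBootstrap:

```r
# Toy stand-in for one baseline table stored per bootstrap run.
toy <- data.frame(
  BootstrapID = c(1, 1, 1, 2, 2),
  Stratum     = c("A", "A", "B", "A", "A"),
  Biomass     = c(10, 20, 5, 12, 18),
  stringsAsFactors = FALSE
)

# Step 1: apply the report function (here sum) per group and bootstrap run.
step1 <- aggregate(Biomass ~ BootstrapID + Stratum, data = toy, FUN = sum)

# Step 2a: form the grid of all combinations and place step 1 onto it.
grid <- expand.grid(
  BootstrapID = unique(toy$BootstrapID),
  Stratum     = unique(toy$Stratum),
  stringsAsFactors = FALSE
)
onGrid <- merge(grid, step1, all.x = TRUE)

# Combinations absent from a run (Stratum "B" in run 2) are set to 0.
onGrid$Biomass[is.na(onGrid$Biomass)] <- 0

# Step 2b: apply a function (here mean) across runs for each grid cell.
report <- aggregate(Biomass ~ Stratum, data = onGrid, FUN = mean)
```

In this toy case Stratum "B" contributes 5 in run 1 and 0 in run 2, so its mean across runs is 2.5 rather than 5.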

The parameter RemoveMissingValues should be used with extreme caution. The effect of setting RemoveMissingValues to TRUE is that missing values (NAs) are removed in both the first and the second step, and this can be dangerous in either step. E.g., if the Abundance of SuperIndividualsData is positive for super-individuals with missing IndividualWeight, then the Biomass of those super-individuals will be missing as well. If one then wants to sum the Biomass using ReportFunction = "sum", the result is NA if RemoveMissingValues = FALSE. If RemoveMissingValues = TRUE, the missing Biomass is ignored, and the summed Biomass only includes the super-individuals that have non-missing IndividualWeight, effectively discarding a portion of the observed abundance. The summed Biomass will in this case be underestimated!
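A small numeric sketch of this under-estimation, using made-up values for three super-individuals and base R's sum() in place of the report function:

```r
# Hypothetical super-individuals; the third lacks IndividualWeight.
Abundance        <- c(100, 200, 50)
IndividualWeight <- c(2, 3, NA)
Biomass          <- Abundance * IndividualWeight  # 200, 600, NA

# Corresponds to RemoveMissingValues = FALSE: the NA propagates.
biomassKeepNA <- sum(Biomass)

# Corresponds to RemoveMissingValues = TRUE: the NA is silently dropped.
biomassDropNA <- sum(Biomass, na.rm = TRUE)

# Abundance has no missing values, so all 350 individuals are counted.
abundanceTotal <- sum(Abundance)
```

Here biomassDropNA is 800, which covers only 300 of the 350 individuals counted in abundanceTotal.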

In the second step, setting RemoveMissingValues to TRUE can be even more dangerous, as the only option currently available for the BootstrapReportFunction is the function RstoxBase::summaryStox(), which includes the mean and standard deviation, both of which are highly influenced by removing missing data.

Instead of setting RemoveMissingValues to TRUE, it is advised to apply the function ImputeSuperIndividuals to fill in e.g. IndividualWeight where missing. The spread of missing values in the output of this function can also be limited by adding variables to GroupingVariables, e.g. by adding "Stratum" if there are strata that are known from Baseline to contain no fish. These strata will then be present with missing values, but these missing values will not affect other strata if "Stratum" is included in GroupingVariables. It is also recommended to include "Survey" and "SpeciesCategory" in the GroupingVariables, as these are key variables across which summary statistics should rarely be computed.
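The effect of grouping by "Stratum" can be illustrated with a toy table in base R. The values are made up and tapply() stands in for the grouped report:

```r
toy <- data.frame(
  Stratum = c("A", "A", "B"),
  Biomass = c(10, 20, NA),  # Stratum "B" has a missing value
  stringsAsFactors = FALSE
)

# Without Stratum as a grouping variable, one NA makes the total NA.
total <- sum(toy$Biomass)

# With Stratum as a grouping variable, the NA stays within Stratum "B".
byStratum <- tapply(toy$Biomass, toy$Stratum, sum)
```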