R/ImputeSuperIndividuals_StoX3.R
ImputeSuperIndividuals_StoX3.Rd
WARNING, DEPRECATED FUNCTION: This is the old imputation function used in StoX 3.0.0 through 3.6.2. The function has a weakness when hauls are assigned to AcousticPSUs in more than one stratum in BioticAssignment. The resulting SuperIndividuals will then contain duplicated individuals and consequently non-unique values in the Individual column, which this function uses to identify the rows to impute from. As a result, values are imputed only from the first of the rows with a duplicated Individual, so information in the other rows is not available, which may lead to incomplete imputation.
ImputeSuperIndividuals_StoX3(
  SuperIndividualsData,
  ImputationMethod = c("RandomSampling", "Regression"),
  ImputeAtMissing = character(),
  ImputeByEqual = character(),
  ToImpute = character(),
  ImputationLevels = c("Haul", "Stratum", "Survey"),
  Seed = 1,
  RegressionDefinition = c("FunctionParameter", "FunctionInput"),
  GroupingVariables = character(),
  RegressionModel = c("SimpleLinear", "Power"),
  RegressionTable = data.table::data.table(),
  Regression
)
The SuperIndividualsData
data.
The method to use for the imputation. Currently, only "RandomSampling" is implemented, but it may be accompanied by "Regression" in a coming release.
A single string naming the variable which, when missing, identifies the target individuals to impute data to. I.e., if ImputeAtMissing
is missing for an individual, perform the imputation. In StoX 3.0.0 and older, ImputeAtMissing
was hard coded to IndividualAge.
A vector of strings naming the variable(s) which, when identical to those of the target individual, identify the source individuals to impute data from. The source individuals must also have non-missing ImputeAtMissing
. In StoX 3.0.0 and older, ImputeByEqual
was hard coded to c("SpeciesCategory","IndividualTotalLength").
A vector of strings naming the variable(s) to impute (copy to the target individual). Values that are not missing are not imputed. Note that values are only imputed when ImputeAtMissing
is missing, so including many variables in ToImpute
is only recommended if all these are present for the individuals (see Details). In StoX 3.0.0 and older, ToImpute
was hard coded to all available variables of the BioticData contained in the SuperIndividualsData
.
A vector of strings naming the levels at which to impute, defaulting to c("Haul", "Stratum", "Survey"). To prevent imputation at the Survey level, use c("Haul", "Stratum").
An integer giving the seed to use for the random sampling used to obtain the imputed data.
Character: A string naming the method to use, one of FunctionParameter
to define the Regression on the fly in this function (using GroupingVariables
, RegressionModel
and RegressionTable
), or FunctionInput
to import Regression process data from a previously run process using the function
An optional vector of strings defining variables serving as grouping variables in the RegressionTable. Setting this adds its elements as columns in the RegressionTable in the GUI.
Character: A string naming the model to use for the regression. See Details for options.
A table with one row defining the name of the dependent variable (column name DependentVariable
), the name of the independent variable (column name IndependentVariable
), and the Intersect
and Slope
if RegressionModel
= "SimpleLinear" and Factor
and Exponent
if RegressionModel
= "Power".
The Regression
process data.
An object of StoX data type SuperIndividualsData
.
For this reason the function is deprecated and the function ImputeSuperIndividuals
, which considers the unique Individual column when imputing, should be used instead. However, due to the difference in the imputation method, the results will differ between the two functions even when all values of Individual are unique. Existing StoX projects saved with StoX <= 3.6.2 will be changed to use ImputeSuperIndividuals_StoX3 when opened in StoX >= 4.0.0, but the recommendation is to change these projects to use ImputeSuperIndividuals instead.
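The effect of non-unique Individual values can be illustrated with a small base R sketch. It is not the code used inside this function; the toy table, its values, and the use of match() are assumptions chosen only to show that a lookup by a duplicated identifier reaches the first matching row and nothing else.

```r
# Hypothetical toy table: individual "A" occurs twice because its haul is
# assumed to be assigned to AcousticPSUs in two strata.
superInd <- data.frame(
  Individual = c("A", "A", "B"),
  Stratum    = c("S1", "S2", "S1"),
  Abundance  = c(10, 25, 7),
  stringsAsFactors = FALSE
)

# match() returns the position of the first match only.
firstHit <- match("A", superInd$Individual)

# All rows that actually share the identifier.
allHits <- which(superInd$Individual == "A")

# A quick check for duplicated identifiers before imputing.
hasDuplicates <- anyDuplicated(superInd$Individual) > 0
```

Here firstHit points at the row in stratum S1 only, while allHits also contains the row in stratum S2, which a first-match lookup never sees.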
For each (target) individual with a missing value in ImputeAtMissing
, identify all (source) individuals in the haul for which ImputeAtMissing
is non-missing and for which the values in ImputeByEqual
are identical to the target individual. Then sample one of these source individuals, and copy values of ToImpute
to the target individual. Only values that are non-missing are copied from the sampled individual, and only missing values in the target individual are replaced. If no source individuals are found in the haul, expand the search to the stratum, and finally to the survey. If no source individuals are found in the survey, leave the target individual unchanged.
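The search described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the StoX implementation: imputeSketch() and the toy data are hypothetical, the table is assumed to hold a single survey with columns Haul and Stratum, and sources are assumed to be drawn from the original (pre-imputation) values.

```r
# Illustrative sketch of haul -> stratum -> survey random imputation.
imputeSketch <- function(data, imputeAtMissing, imputeByEqual, toImpute,
                         levels = c("Haul", "Stratum", "Survey"), seed = 1) {
  set.seed(seed)
  original <- data  # assumption: sources are read from the untouched input
  targets <- which(is.na(original[[imputeAtMissing]]))
  for (i in targets) {
    for (level in levels) {
      candidates <- !is.na(original[[imputeAtMissing]])
      # Restrict to the same haul or stratum; "Survey" uses all rows.
      if (level != "Survey") {
        candidates <- candidates & original[[level]] == original[[level]][i]
      }
      # Require identical values in every ImputeByEqual variable.
      for (v in imputeByEqual) {
        candidates <- candidates & !is.na(original[[v]]) &
          original[[v]] == original[[v]][i]
      }
      sources <- which(candidates)
      if (length(sources) > 0) {
        # Index into sources to avoid sample()'s single-number special case.
        s <- sources[sample.int(length(sources), 1)]
        for (v in toImpute) {
          # Copy only non-missing source values into missing target values.
          if (is.na(data[[v]][i]) && !is.na(original[[v]][s])) {
            data[[v]][i] <- original[[v]][s]
          }
        }
        break  # stop at the first level where a source was found
      }
    }
  }
  data
}

toy <- data.frame(
  Haul                  = c("H1", "H1", "H2", "H3", "H3"),
  Stratum               = c("S1", "S1", "S1", "S2", "S2"),
  SpeciesCategory       = "cod",
  IndividualTotalLength = c(30, 30, 40, 40, 50),
  IndividualAge         = c(3, NA, NA, 5, NA),
  stringsAsFactors = FALSE
)

imputed <- imputeSketch(
  toy,
  imputeAtMissing = "IndividualAge",
  imputeByEqual   = c("SpeciesCategory", "IndividualTotalLength"),
  toImpute        = "IndividualAge"
)
```

In this toy data, row 2 finds a source in its own haul, row 3 finds none in its haul or stratum and falls back to the survey, and row 5 has no source with the same length anywhere and is left unchanged.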
When ToImpute
contains variables other than the one given by ImputeAtMissing
, there is a risk that values remain missing even after successful imputation. E.g., if ImputeAtMissing
is IndividualAge, and ToImpute
includes IndividualRoundWeight, then the weight is only imputed when age is missing. Super-individuals with age but not weight will then still have missing weight. Variables that are naturally connected, such as IndividualRoundWeight and WeightMeasurement, or IndividualTotalLength and LengthResolution, should both be included in ToImpute
.
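The following base R sketch illustrates the risk described above. The toy values are hypothetical; the point is only that rows are selected as targets by missing ImputeAtMissing (here IndividualAge), so a row with known age but missing weight is never a target.

```r
# Hypothetical toy data: row 3 has age but no weight.
toy <- data.frame(
  IndividualAge         = c(4, NA, 4),
  IndividualRoundWeight = c(250, NA, NA)
)

# Only rows with missing ImputeAtMissing are targets for imputation.
isTarget <- is.na(toy$IndividualAge)

# Rows with missing weight that are not targets keep their missing weight.
staysMissing <- is.na(toy$IndividualRoundWeight) & !isTarget
which(staysMissing)
```

Row 2 is a target and may receive both age and weight, whereas row 3 is skipped and its weight remains missing.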
SuperIndividuals
for distributing Abundance to the Individuals.