Title: | Design Experiments for Batches |
---|---|
Description: | Distributes samples in batches while making batches homogeneous according to their description. Allows for an arbitrary number of variables, both numeric and categorical. For quality control it provides functions to subset a representative sample. |
Authors: | Lluís Revilla Sancho [aut, cre] , Juanjo Lozano [ths] , Azucena Salas Martinez [ths] |
Maintainer: | Lluís Revilla Sancho <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.4.0.9000 |
Built: | 2024-11-17 04:55:46 UTC |
Source: | https://github.com/llrs/experdesign |
Enables easy distribution of samples per batch avoiding batch and confounding effects by randomization of the variables in each batch.
The most important function is design(), which distributes samples in batches according to the information provided.
To help at the bench there is the inspect() function, which appends the group to the data provided.
If you have a grid or some spatial data, you might want to look at the spatial() function to distribute the samples while keeping the original design.
In case an experiment was half processed and you need to extend it, you can use follow_up() or follow_up2(). They help select which of the samples already used should be used in the follow-up.
Lluís Revilla
Useful links:
Report bugs at https://github.com/llrs/experDesign/issues
Given an index, return the names of the batches the samples are in.
batch_names(i)
i |
A list of numeric indices. |
A character vector with the name of the batch for each index.
create_subset(), for the inverse look at use_index().
index <- create_subset(100, 50, 2)
batch <- batch_names(index)
head(batch)
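To illustrate the relationship between an index and its batch names, the following base R sketch converts a named list of indices into a character vector with one batch name per sample. The helper `names_from_index()` is hypothetical and is not the package's implementation; it only assumes that an index is a named list of sample positions.

```r
# Hypothetical helper (not part of the package): one batch name per sample.
names_from_index <- function(index) {
  positions <- unlist(index, use.names = FALSE)
  # Repeat each batch name once per sample assigned to that batch
  labels <- rep(names(index), lengths(index))
  out <- character(length(positions))
  out[positions] <- labels
  out
}

index <- list(SubSet1 = c(1, 4, 5), SubSet2 = c(2, 3, 6))
batch <- names_from_index(index)
batch
```

The result is ordered by sample position, so it can be stored directly as a column next to the sample data.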
In order to run a successful experiment a good design is needed even before measuring the data. This function checks several heuristics for a good experiment and warns if they are not found.
check_data(pheno, omit = NULL, na.omit = FALSE)
pheno |
Data.frame with the variables of each sample, one row one sample. |
omit |
Character vector with the names of the columns to omit. |
na.omit |
Check the effects of missing values too. |
A logical value indicating if everything is alright (TRUE) or not (FALSE).
rdata <- expand.grid(sex = c("M", "F"), class = c("lower", "median", "high"))
rdata2 <- rbind(rdata, rdata)
check_data(rdata2)
# Different warnings
check_data(rdata)
check_data(rdata[-c(1, 3), ])
data(survey, package = "MASS")
check_data(survey)
Report the statistics for each subset and variable compared to the original.
check_index(pheno, index, omit = NULL)
pheno |
Data.frame with the sample information. |
index |
A list of indices indicating which samples go to which subset. |
omit |
Name of the columns of pheno that will be omitted. |
The closer the values are to 0, the smaller the difference from the original distribution, and so the better the randomization.
A matrix with the differences with the original data.
Functions that create an index: design(), replicates(), spatial(). See also create_subset() for a random index.
index <- create_subset(50, 24)
metadata <- expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
                        sex = c("Male","Female"))
check_index(metadata, index)
Compare the distribution of samples with two different batches.
compare_index(pheno, index1, index2)
pheno |
A data.frame of the samples with the characteristics to normalize. |
index1, index2 |
A list with the index for each sample, the name of the
column in |
A matrix with the variables and the columns of each batch. Negative values indicate index1 was better.
index1 <- create_subset(50, 24)
index2 <- batch_names(create_subset(50, 24))
metadata <- expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
                        sex = c("Male","Female"))
compare_index(metadata, index1, index2)
Index of the samples grouped by batches.
create_subset(size_data, size_subset = NULL, n = NULL, name = "SubSet")
size_data |
A numeric value of the amount of samples to distribute. |
size_subset |
A numeric value with the amount of samples per batch. |
n |
A numeric value with the number of batches. |
name |
A character used to name the subsets, either a single one or a
vector the same size as |
A random list of indices of the samples.
batch_names(), use_index() if you already have a factor to be used as index.
index <- create_subset(100, 50, 2)
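As a rough illustration of what a random index looks like, the base R sketch below shuffles the sample positions and splits them into batches. The helper `random_index()` and its argument names are hypothetical; the sketch assumes batches should differ in size by at most one sample and is not the package's own algorithm.

```r
# Hypothetical helper (not part of the package): random, near-equal batches.
random_index <- function(n, size, name = "SubSet") {
  # Fewest batches that can hold n samples with at most `size` samples each
  n_batches <- ceiling(n / size)
  # Recycle the batch labels over the samples so sizes differ by at most one
  batch_id <- rep(seq_len(n_batches), length.out = n)
  # Shuffle the sample positions before assigning them to batches
  index <- split(sample(seq_len(n)), batch_id)
  names(index) <- paste0(name, seq_len(n_batches))
  index
}

set.seed(1)
index <- random_index(100, 50)
lengths(index)
```

Every sample position appears in exactly one batch, which is the property the functions that consume an index rely on.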
Given some samples, it distributes them in several batches, trying to have an equal number of samples per batch. It can handle both numeric and categorical data.
design(pheno, size_subset, omit = NULL, iterations = 500, name = "SubSet")
pheno |
Data.frame with the sample information. |
size_subset |
Numeric value of the number of samples per batch. |
omit |
Name of the columns of pheno that will be omitted. |
iterations |
Numeric value of iterations that will be performed. |
name |
A character used to name the subsets, either a single one or a
vector the same size as |
The indices of which samples go with which batch.
The evaluate_* functions and create_subset().
data(survey, package = "MASS")
index <- design(survey[, c("Sex", "Smoke", "Age")], size_subset = 50,
                iterations = 10)
index
Checks if all the values are maximally distributed in the several batches. Aimed at categorical variables.
distribution(report, column)
report |
A data.frame which must contain a batch column, which can be obtained with inspect(). |
column |
The name of the column one wants to inspect. |
TRUE if the values are maximally distributed, otherwise FALSE.
data(survey, package = "MASS")
columns <- c("Sex", "Age", "Smoke")
nas <- c(137, 70)
# Omit rows with NA to avoid warnings in design
index <- design(pheno = survey[-nas, columns], size_subset = 70, iterations = 10)
batches <- inspect(index, survey[-nas, columns])
distribution(batches, "Sex")
distribution(batches, "Smoke")
Calculates the entropy of a category. It uses the number of categories to scale the value between 0 and 1.
entropy(x)
x |
A character or vector with two or more categories |
The numeric value of the Shannon entropy scaled between 0 and 1.
It omits NA values if present.
entropy(c("H", "T", "H", "T"))
entropy(c("H", "T", "H", "T", "H", "H", "H"))
entropy(c("H", "T", "H", "T", "H", "H", NA))
entropy(c("H", "T", "H", "T", "H", "H"))
entropy(c("H", "H", "H", "H", "H", "H", NA))
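To make the scaling concrete, here is a minimal base R sketch of a Shannon entropy scaled by the number of observed categories. The helper `scaled_entropy()` is hypothetical, and returning 0 when only one category is present is an assumption of this sketch; the package's entropy() may differ in details.

```r
# Hypothetical helper (not part of the package): entropy scaled to [0, 1].
scaled_entropy <- function(x) {
  x <- x[!is.na(x)]             # drop missing values
  p <- table(x) / length(x)     # relative frequency of each category
  k <- length(p)                # number of observed categories
  if (k < 2) {
    return(0)                   # assumption: one category means no uncertainty
  }
  # Taking the logarithm in base k bounds the result between 0 and 1
  -sum(p * log(p, base = k))
}

even <- scaled_entropy(c("H", "T", "H", "T"))
single <- scaled_entropy(c("H", "H", "H", "H", "H", "H", NA))
skewed <- scaled_entropy(c("H", "T", "H", "T", "H", "H"))
c(even = even, single = single, skewed = skewed)
```

An evenly split variable reaches the maximum, a constant variable the minimum, and unbalanced variables fall in between.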
Looks if the nominal or character columns are equally distributed according to the entropy and taking into account the independence between batches. If any column is different in each row it is assumed to be the sample names and thus omitted.
evaluate_entropy(i, pheno)
i |
list of numeric indices of the data.frame |
pheno |
Data.frame with information about the samples |
Value to minimize
Other functions to evaluate samples: evaluate_independence(), evaluate_index(), evaluate_mad(), evaluate_mean(), evaluate_na(), evaluate_orig(), evaluate_sd()
Other functions to evaluate categories: evaluate_independence(), evaluate_na()
data(survey, package = "MASS")
index <- design(survey[, c("Sex", "Smoke", "Age")], size_subset = 50,
                iterations = 10)
# Note that numeric columns will be omitted:
evaluate_entropy(index, survey[, c("Sex", "Smoke", "Age")])
Looks at the independence between the categories and the batches.
evaluate_independence(i, pheno)
i |
Index of subsets. |
pheno |
A data.frame with the information about the samples. |
Returns a vector with the p-values of the chisq.test between the category and the subset.
Other functions to evaluate samples: evaluate_entropy(), evaluate_index(), evaluate_mad(), evaluate_mean(), evaluate_na(), evaluate_orig(), evaluate_sd()
Other functions to evaluate categories: evaluate_entropy(), evaluate_na()
data(survey, package = "MASS")
index <- design(survey[, c("Sex", "Smoke", "Age")], size_subset = 50,
                iterations = 10)
# Note that numeric columns will be omitted:
evaluate_independence(index, survey[, c("Sex", "Smoke", "Age")])
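The p-values returned by evaluate_independence() come from a chi-squared test between a category and the batch. The base R sketch below shows that test on made-up toy data with two hand-written batch assignments, one balanced and one fully confounded with sex; the variable names are illustrative only.

```r
# Toy data: 20 samples with a sex label and two candidate batch assignments.
sex <- rep(c("M", "F"), times = 10)
balanced <- rep(c("A", "A", "B", "B"), times = 5)  # each batch gets 5 M and 5 F
confounded <- ifelse(sex == "M", "A", "B")         # batch fully determined by sex

# Chi-squared test of independence between the category and the batch
p_balanced <- chisq.test(table(sex, balanced))$p.value
p_confounded <- chisq.test(table(sex, confounded))$p.value
c(balanced = p_balanced, confounded = p_confounded)
```

A high p-value gives no evidence that the category depends on the batch, while a low p-value flags a category that is confounded with the batch.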
Measures several indicators per group
evaluate_index(i, pheno)
i |
Index |
pheno |
Data.frame with information about the samples |
An array of three dimensions with the mean, standard deviation (sd()), and median absolute deviation (mad()) of the numeric variables, the entropy of the categorical and the number of NA by each subgroup.
If you already have an index you can use use_index().
Other functions to evaluate samples: evaluate_entropy(), evaluate_independence(), evaluate_mad(), evaluate_mean(), evaluate_na(), evaluate_orig(), evaluate_sd()
data(survey, package = "MASS")
index <- create_subset(nrow(survey), 50, 5)
ev_index <- evaluate_index(index, survey[, c("Sex", "Smoke")])
ev_index["entropy", , ]
Looks for the median absolute deviation values in each subgroup.
evaluate_mad(i, pheno)
i |
List of indices |
pheno |
Data.frame with information about the samples |
A vector with the mean difference between the median absolute deviation of each group and the original mad.
Other functions to evaluate samples: evaluate_entropy(), evaluate_independence(), evaluate_index(), evaluate_mean(), evaluate_na(), evaluate_orig(), evaluate_sd()
Other functions to evaluate numbers: evaluate_mean(), evaluate_na(), evaluate_sd()
data(survey, package = "MASS")
index <- design(survey[, c("Sex", "Smoke", "Age")], size_subset = 50,
                iterations = 10)
# Note that categorical columns will be omitted:
evaluate_mad(index, survey[, c("Sex", "Smoke", "Age")])
Looks for the mean of the numeric values
evaluate_mean(i, pheno)
i |
List of indices |
pheno |
Data.frame with information about the samples |
A matrix with the mean value for each column for each subset
Other functions to evaluate samples: evaluate_entropy(), evaluate_independence(), evaluate_index(), evaluate_mad(), evaluate_na(), evaluate_orig(), evaluate_sd()
Other functions to evaluate numbers: evaluate_mad(), evaluate_na(), evaluate_sd()
data(survey, package = "MASS")
index <- design(survey[, c("Sex", "Smoke", "Age")], size_subset = 50,
                iterations = 10)
# Note that categorical columns will be omitted:
evaluate_mean(index, survey[, c("Sex", "Smoke", "Age")])
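As a rough illustration of the kind of matrix evaluate_mean() describes, the base R sketch below computes the mean of each numeric column within each subset. The helper `subset_means()` and the toy data are hypothetical; the layout (one row per variable, one column per subset) is an assumption of this sketch rather than a statement about the package's output.

```r
# Hypothetical helper (not part of the package): per-subset means.
subset_means <- function(index, pheno) {
  # Keep only the numeric columns, categorical ones are skipped
  numeric_cols <- vapply(pheno, is.numeric, logical(1))
  means <- lapply(index, function(i) {
    colMeans(pheno[i, numeric_cols, drop = FALSE], na.rm = TRUE)
  })
  # One column per subset, one row per numeric variable
  do.call(cbind, means)
}

pheno <- data.frame(
  age = c(20, 30, 40, 50),
  height = c(150, 160, 170, 180),
  sex = c("F", "M", "F", "M")
)
index <- list(SubSet1 = c(1, 4), SubSet2 = c(2, 3))
m <- subset_means(index, pheno)
m
```

In this toy index both subsets have identical means for both variables, which is the situation a good design tries to approach.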
Looks at how NA values are distributed in each subset.
evaluate_na(i, pheno)
i |
list of numeric indices of the data.frame |
pheno |
Data.frame |
The optimum value to reduce
Other functions to evaluate samples: evaluate_entropy(), evaluate_independence(), evaluate_index(), evaluate_mad(), evaluate_mean(), evaluate_orig(), evaluate_sd()
Other functions to evaluate categories: evaluate_entropy(), evaluate_independence()
Other functions to evaluate numbers: evaluate_mad(), evaluate_mean(), evaluate_sd()
samples <- 10
m <- matrix(rnorm(samples), nrow = samples)
m[sample(seq_len(samples), size = 5), ] <- NA # Some NA
i <- create_subset(samples, 3, 4) # random subsets
evaluate_na(i, m)
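To show what an uneven distribution of missing values looks like, the base R sketch below counts the NA values of each column within each subset for two hand-written indices. The helper `subset_na()` is hypothetical and only counts; how evaluate_na() condenses such counts into the single value to reduce is not covered here.

```r
# Hypothetical helper (not part of the package): NA counts per column and subset.
subset_na <- function(index, pheno) {
  counts <- lapply(index, function(i) colSums(is.na(pheno[i, , drop = FALSE])))
  # One column per subset, one row per variable
  do.call(cbind, counts)
}

pheno <- data.frame(age = c(20, NA, 40, NA), sex = c("F", "M", NA, "M"))
clustered <- list(SubSet1 = c(1, 3), SubSet2 = c(2, 4))  # both age NAs together
spread <- list(SubSet1 = c(1, 2), SubSet2 = c(3, 4))     # one age NA per subset

na_clustered <- subset_na(clustered, pheno)
na_spread <- subset_na(spread, pheno)
na_clustered
na_spread
```

The second index spreads the missing ages evenly, so no single batch loses all of its information on that variable.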
Measure some summary statistics of the whole cohort of samples
evaluate_orig(pheno)
pheno |
Data.frame with information about the samples |
A matrix with the mean, standard deviation, MAD values of the numeric variables, the entropy of the categorical, and the amount of NA per variable.
Other functions to evaluate samples: evaluate_entropy(), evaluate_independence(), evaluate_index(), evaluate_mad(), evaluate_mean(), evaluate_na(), evaluate_sd()
data(survey, package = "MASS")
evaluate_orig(survey[, c("Sex", "Age", "Smoke")])
Looks for the standard deviation of the numeric values
evaluate_sd(i, pheno)
i |
List of indices |
pheno |
Data.frame with the samples |
A matrix with the standard deviation value for each column for each subset
Other functions to evaluate samples: evaluate_entropy(), evaluate_independence(), evaluate_index(), evaluate_mad(), evaluate_mean(), evaluate_na(), evaluate_orig()
Other functions to evaluate numbers: evaluate_mad(), evaluate_mean(), evaluate_na()
data(survey, package = "MASS")
index <- design(survey[, c("Sex", "Smoke", "Age")], size_subset = 50,
                iterations = 10)
# Note that categorical columns will be omitted:
evaluate_sd(index, survey[, c("Sex", "Smoke", "Age")])
Subset some samples that are mostly different.
extreme_cases(pheno, size, omit = NULL, iterations = 500)
pheno |
Data.frame with the sample information. |
size |
The number of samples to subset. |
omit |
Name of the columns of pheno that will be omitted. |
iterations |
Numeric value of iterations that will be performed. |
A vector with the number of the rows that are selected.
metadata <- expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
                        sex = c("Male","Female"))
sel <- extreme_cases(metadata, 10)
# We can see that it selected both Female and Males and wide range of height
# and weight:
metadata[sel, ]
Use this when an experiment was carried out with some samples and you want to continue with some other samples later on.
follow_up(
  original,
  follow_up,
  size_subset,
  omit = NULL,
  old_new = "batch",
  iterations = 500
)
original |
A |
follow_up |
A |
size_subset |
Numeric value of the number of samples per batch. |
omit |
Name of the columns that will be omitted. |
old_new |
Name of the column where the batch status will be stored. If it matches the name of a column in original it will be used to find previous batches. |
iterations |
Numeric value of iterations that will be performed. |
A data.frame with the common columns of data, a new column old_new, and a batch column filled with the new batches needed.
data(survey, package = "MASS")
survey1 <- survey[1:118, ]
survey2 <- survey[119:nrow(survey), ]
folu <- follow_up(survey1, survey2, size_subset = 50, iterations = 10)
Design an experiment with all the data, new and old, together.
follow_up2(all_data, batch_column = "batch", ...)
all_data |
A |
batch_column |
The name of the column of |
... |
Arguments passed on to
|
If the batch_column is empty the samples are considered new. If the size_subset is missing, it will be estimated from the previous batch. Similarly, iterations and name will be guessed or inferred from the samples.
A data.frame with the batch_column filled with the new batches needed.
data(survey, package = "MASS")
# Create the first batch
first_batch_n <- 118
variables <- c("Sex", "Smoke", "Age")
survey1 <- survey[seq_len(first_batch_n), variables]
index1 <- design(survey1, size_subset = 50, iterations = 10)
r_survey <- inspect(index1, survey1)
# Create the second batch with "new" students
survey2 <- survey[seq(from = first_batch_n + 1, to = nrow(survey)), variables]
survey2$batch <- NA
# Prepare the follow up
all_classroom <- rbind(r_survey, survey2)
follow_up2(all_classroom, size_subset = 50, iterations = 10)
Given the index and the data of the samples, append the batch assignment.
inspect(i, pheno, omit = NULL, index_name = "batch")
i |
List of indices of samples per batch |
pheno |
Data.frame with the sample information. |
omit |
Name of the columns of pheno that will be omitted. |
index_name |
Column name of the index of the resulting data.frame. |
The data.frame with a new column batch with the name of the batch the sample goes to.
data(survey, package = "MASS")
columns <- c("Sex", "Age", "Smoke")
index <- design(pheno = survey[, columns], size_subset = 70, iterations = 10)
batches <- inspect(index, survey[, columns])
head(batches)
Calculates the optimum values for the number of batches or the size of the batches. If you need to do several batches it can be better to distribute the samples evenly and add replicates.
optimum_batches(size_data, size_subset)
optimum_subset(size_data, batches)
sizes_batches(size_data, size_subset, batches)
size_data |
A numeric value of the number of samples to use. |
size_subset |
Numeric value of the number of samples per batch. |
batches |
A numeric value of the number of batches. |
optimum_batches
A numeric value with the number of batches to use.
optimum_subset
A numeric value with the maximum number of samples per batch of the data.
sizes_batches
A numeric vector with the number of samples in each batch.
size_data <- 50
size_batch <- 24
(batches <- optimum_batches(size_data, size_batch))
# So now the best number of samples for each batch is less than the available
(size <- optimum_subset(size_data, batches))
# The distribution of samples per batch
sizes_batches(size_data, size, batches)
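The reasoning behind these three values can be sketched with plain arithmetic in base R. This is an illustration under the assumption that the optimum means the fewest batches that fit all the samples, filled as evenly as possible; the package functions may apply additional rules.

```r
size_data <- 50
size_batch <- 24

# Fewest batches that can hold every sample given the batch capacity
batches <- ceiling(size_data / size_batch)

# Smallest per-batch size that still fits all samples in that many batches
size <- ceiling(size_data / batches)

# Spread the samples so that batch sizes differ by at most one
sizes <- rep(size_data %/% batches, batches)
extra <- size_data %% batches
sizes[seq_len(extra)] <- sizes[seq_len(extra)] + 1
sizes
```

With 50 samples and a capacity of 24, two batches are not enough, so three batches are needed and none of them has to be filled to capacity.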
Create position names for a grid.
position_name(rows, columns)
rows |
Names of the rows. |
columns |
Names of the columns. |
A data.frame with the rows and columns and the resulting name row+column. The name column is a factor for easier sorting in row, column order.
position_name(c("A", "B"), 1:2)
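A grid of position names like the one described for position_name() can be sketched in base R with expand.grid(). The column names `row`, `column` and `name` used below are assumptions for illustration and may not match the package's output exactly.

```r
rows <- c("A", "B")
columns <- 1:2

# All row/column combinations; columns vary fastest, giving row-wise order
grid <- expand.grid(column = columns, row = rows, stringsAsFactors = FALSE)
label <- paste0(grid$row, grid$column)

# Store the labels as a factor whose levels follow row, column order
grid$name <- factor(label, levels = label)
grid <- grid[, c("row", "column", "name")]
grid
```

Keeping the name as a factor means that sorting by it preserves the plate layout instead of falling back to alphabetical order.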
Randomly select some samples from an index.
qcSubset(index, size, each = FALSE)
index |
A list of indices indicating which samples go to which subset. |
size |
The number of samples that should be taken. |
each |
A logical value indicating whether the subset should be taken from each batch (TRUE) or from all the samples (FALSE). |
set.seed(50)
index <- create_subset(100, 50, 2)
QC_samples <- qcSubset(index, 10)
QC_samplesBatch <- qcSubset(index, 10, TRUE)
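The difference between sampling from all the samples and sampling from each batch can be sketched in base R as follows. The helper `pick_qc()` is hypothetical and only mirrors the idea behind the `each` argument; it is not the package's implementation.

```r
# Hypothetical helper (not part of the package): random quality control samples.
pick_qc <- function(index, size, each = FALSE) {
  # Sample by position so a batch holding a single index is handled correctly
  pick <- function(x, n) x[sample.int(length(x), n)]
  if (each) {
    # `size` samples from every batch, returned as a list
    lapply(index, pick, n = size)
  } else {
    # `size` samples from the pooled index, returned as a vector
    pick(unlist(index, use.names = FALSE), size)
  }
}

set.seed(50)
index <- list(SubSet1 = 1:50, SubSet2 = 51:100)
qc_all <- pick_qc(index, 10)
qc_batch <- pick_qc(index, 10, each = TRUE)
lengths(qc_batch)
```

Sampling per batch guarantees that every batch contributes quality control samples, which pooled sampling does not.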
To ensure that the batches are comparable some samples are processed in each batch. This function allows taking that effect into account. It uses the most different samples as controls, as defined with extreme_cases().
replicates(pheno, size_subset, controls, omit = NULL, iterations = 500)
pheno |
Data.frame with the sample information. |
size_subset |
Numeric value of the number of samples per batch. |
controls |
The numeric value of the amount of technical controls per batch. |
omit |
Name of the columns of pheno that will be omitted. |
iterations |
Numeric value of iterations that will be performed. |
To control for variance, replicates are important; see for example https://www.nature.com/articles/nmeth.3091.
An index with some samples duplicated in the batches.
samples <- data.frame(L = letters[1:25], Age = rnorm(25),
                      type = sample(LETTERS[1:5], 25, TRUE))
index <- replicates(samples, 5, controls = 2, omit = "L", iterations = 10)
head(index)
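The idea of an index with duplicated samples can be sketched in base R by adding the same control samples to every batch. The helper `add_controls()` is hypothetical, and the sketch assumes the controls were already chosen and are shared by all batches; how replicates() selects and places them may differ.

```r
# Hypothetical helper (not part of the package): add shared controls to batches.
add_controls <- function(index, controls) {
  # union() avoids listing a control twice in the batch it already belongs to
  lapply(index, function(i) sort(union(i, controls)))
}

index <- list(SubSet1 = c(1, 2, 3), SubSet2 = c(4, 5, 6), SubSet3 = c(7, 8, 9))
controls <- c(2, 8)  # assumed to be chosen beforehand, e.g. as extreme cases
with_controls <- add_controls(index, controls)
with_controls
```

Because the same samples are measured in every batch, differences between their measurements can be attributed to the batch rather than to the samples.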
This function assumes that to process the batch the samples are distributed in a plate with a grid scheme.
spatial(
  index,
  pheno,
  omit = NULL,
  remove_positions = NULL,
  rows = LETTERS[1:5],
  columns = 1:10,
  iterations = 500
)
index |
A list with the samples on each subgroup, as provided from
|
pheno |
Data.frame with the sample information. |
omit |
Name of the columns of pheno that will be omitted. |
remove_positions |
Character, name of positions to be avoided in the grid. |
rows |
Character, name of the rows to be used. |
columns |
Character, name of the columns to be used. |
iterations |
Numeric value of iterations that will be performed. |
The indices of which samples go with which batch.
data(survey, package = "MASS")
index <- design(survey[, c("Sex", "Smoke", "Age")], size_subset = 50,
                iterations = 10)
index2 <- spatial(index, survey[, c("Sex", "Smoke", "Age")], iterations = 10)
head(index2)
Convert a given factor to an accepted index
use_index(x)
x |
A character or a factor to be used as index |
You can use evaluate_index() to evaluate how good an index is. For the inverse look at batch_names().
plates <- c("P1", "P2", "P1", "P2", "P2", "P3", "P1", "P3", "P1", "P1")
use_index(plates)
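Converting a vector of batch labels into a list of positions can be sketched in base R with split(). This shows the general idea only, assuming an index is a named list of sample positions; use_index() may name or order the result differently.

```r
plates <- c("P1", "P2", "P1", "P2", "P2", "P3", "P1", "P3", "P1", "P1")

# Group the sample positions by their batch label
index <- split(seq_along(plates), plates)
index
```

Each element holds the positions of the samples on one plate, so the list can be passed on wherever an index of samples per batch is expected.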
Sometimes some samples are collected and analyzed, and later another batch of samples is analyzed. This function tries to detect if there are problems with the data or when the data is combined in a single analysis. To know specific problems with the data you need to use check_data().
valid_followup(
  old_data = NULL,
  new_data = NULL,
  all_data = NULL,
  omit = NULL,
  column = "batch"
)
old_data, new_data |
A data.frame with the old and new data respectively. |
all_data |
A |
omit |
Name of the columns that will be omitted. |
column |
The name of the column where the old data has the batch
information, or whether the data is new or not ( |
Called for its side effect of warnings, but returns a logical value indicating whether there are some issues (FALSE) or not (TRUE).
data(survey, package = "MASS")
survey1 <- survey[1:118, ]
survey2 <- survey[119:nrow(survey), ]
valid_followup(survey1, survey2)
survey$batch <- NA
survey$batch[1:118] <- "old"
valid_followup(all_data = survey)