Deduplicates the references from up to three dataframes.
Usage
dedup_refs(
df1 = NULL,
df2 = NULL,
df3 = NULL,
directory,
open_file = FALSE,
dry_run = FALSE
)Value
NULL (invisibly). Called for its side effects:
writes a CSV of deduplicated citations and
an Excel workbook recording the deduplication history.
Examples
# This is a "dry run" example.
# No deduplication will happen.
# It only shows how the function should react.
dedup_refs(df1 = df_vibrio_wos,
df2 = df_vibrio_scp,
df3 = df_vibrio_pmd,
directory = tempdir(),
open_file = FALSE,
dry_run = TRUE
)
#> This is the message from the dry run showing what you should
#> be seeing when the function will be used:
#> Warning: The following columns are missing: pages, number, record_id, isbn
#> formatting data...
#> identifying potential duplicates...
#> identified duplicates!
#> flagging potential pairs for manual dedup...
#> Joining with `by = join_by(duplicate_id.x, duplicate_id.y)`
#> 254 citations loaded...
#> 14 duplicate citations removed...
#> 240 unique citations remaining!
#> Deduplication script has been executed,
#> concatenated deduplicated references had been exported.
#> Warning message:
#> In add_missing_cols(raw_citations) :
#> Search contains missing values for the record_id column.
#> A record_id will be created using row numbers
