LitFetchR 1.0.0
Major reworking of the retrieval pipeline, consolidating a series of performance, reliability, and data-quality improvements. The exported function interface is unchanged apart from two new optional arguments to save_api_keys() (scp_insttoken and ncbi_api_key), but the returned data frame has changed (see “Output columns” below).
Output columns
- The issue column has been renamed to number. Code that referred to the output column by the name issue must be updated.
- Added pages and isbn columns to the output of all three extractors.
- The returned data frame is now a consistent 12 columns across Web of Science, Scopus, and PubMed: author, year, title, journal, volume, number, abstract, doi, pages, isbn, source, platform_id.
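Results saved by an earlier version will still carry the old column name. A minimal sketch of bringing such a data frame in line with the new layout, assuming it is already loaded in memory; old_result and rename_issue_column() are hypothetical names used only for illustration and are not part of LitFetchR:

```r
# Hypothetical result in the pre-1.0.0 layout; the values are made up.
old_result <- data.frame(
  title = c("Paper A", "Paper B"),
  volume = c("12", "7"),
  issue = c("3", "1"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not a LitFetchR function): rename the legacy
# "issue" column to "number" and leave every other column untouched.
rename_issue_column <- function(df) {
  names(df)[names(df) == "issue"] <- "number"
  df
}

new_result <- rename_issue_column(old_result)
names(new_result)
```

The helper is a no-op on data frames that already use the new name, so it can be applied to old and new results alike.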
Performance
- Web of Science and Scopus extractors no longer make individual per-record API calls. All metadata is now extracted directly from the batch search responses, greatly reducing the number of API requests and the time per run.
- PubMed record details are now fetched in batches of 200 PMIDs per call instead of one call per record.
- Replaced row-by-row rbind accumulation with list collection and a single combine in all three extractors.
- Removed a redundant save-and-reread of history_id.xlsx during extraction.
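The batching and single-combine pattern described above can be sketched in base R as follows. This is an illustration of the general technique, not the package's internal code: fetch_batch() is a hypothetical stand-in for a real API call, and the IDs are made up.

```r
# Made-up PMIDs for illustration.
pmids <- as.character(1:450)

# Split the IDs into batches of at most 200, one batch per request.
batches <- split(pmids, ceiling(seq_along(pmids) / 200))

# Hypothetical stand-in for one batched request; a real call would
# return the parsed metadata for every ID in the batch.
fetch_batch <- function(ids) {
  data.frame(pmid = ids, stringsAsFactors = FALSE)
}

# Collect one data frame per batch in a list ...
parts <- lapply(batches, fetch_batch)

# ... and combine once at the end instead of calling rbind() in a loop.
combined <- do.call(rbind, parts)
nrow(combined)
```

Growing a data frame with rbind() inside a loop copies the accumulated rows on every iteration, which is why collecting the pieces first and combining once tends to be faster.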
API keys and authentication
- The Scopus API key is now sent as the X-ELS-APIKey request header instead of a URL query parameter, matching Elsevier’s current requirements.
- Added optional support for a Scopus institutional token. Set it with save_api_keys(scp_insttoken = "..."); when present it is sent as the X-ELS-Insttoken header on all Scopus requests. Some institutional subscriptions require it.
- save_api_keys() can now save an optional NCBI/PubMed API key with save_api_keys(ncbi_api_key = "..."), which enables a higher PubMed request rate. Previously this key had to be added to .Renviron by hand.
Reliability and error handling
- Rate-limit responses (HTTP 429) are now retried in a dedicated loop that respects the Retry-After header.
- Authentication failures (HTTP 401/403) now fail immediately with a clear diagnostic message instead of retrying.
- Every request now has a 60-second timeout.
- PubMed requests now respect NCBI rate limits, using a faster rate when the optional ncbi_api_key environment variable is set.
- create_save_search() now uses the same retrying HTTP helper as the extractors rather than bare requests.
- The list of already-seen record IDs is now saved only after extraction succeeds, so an interrupted run no longer marks records as fetched.
- Search strings containing = are now parsed correctly from search_list.txt.
- Fixed handling of searches that return zero results in all three extractors.
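The retry policy described above (retry on 429 while honouring Retry-After, fail fast on 401/403) can be sketched as below. This is an assumption-laden illustration rather than the package's helper: request_with_retry() and do_request are hypothetical, the response is modelled as a plain list with status and retry_after fields, and only a Retry-After value given in seconds is handled.

```r
# Hypothetical helper, not LitFetchR's internal function.
# `do_request` is any function returning list(status = , retry_after = ).
# `sleep` is injectable so the waiting can be observed or skipped.
request_with_retry <- function(do_request, max_tries = 5,
                               default_wait = 1, sleep = Sys.sleep) {
  for (attempt in seq_len(max_tries)) {
    resp <- do_request()

    # Authentication problems will not fix themselves: stop at once.
    if (resp$status %in% c(401, 403)) {
      stop("Authentication failed (HTTP ", resp$status, "); check the API key.")
    }

    # Anything other than a rate-limit response is returned to the caller.
    if (resp$status != 429) {
      return(resp)
    }

    # Rate limited: wait for the number of seconds the server asked for,
    # falling back to a default when the value is missing or not numeric.
    wait <- suppressWarnings(as.numeric(resp$retry_after))
    if (length(wait) != 1 || is.na(wait)) {
      wait <- default_wait
    }
    sleep(wait)
  }
  stop("Still rate-limited after ", max_tries, " attempts.")
}
```

Keeping the rate-limit loop separate from the authentication check means a bad key surfaces immediately instead of being hidden behind repeated attempts.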
Data quality
- Superscript/subscript markup (<sup>/<inf> tags) is now stripped from Scopus and Web of Science titles and abstracts while preserving their content (e.g. 10<sup>2</sup> becomes 102), matching Scopus’ own export format. Bare < and > used as mathematical operators (e.g. p < 0.05) are preserved.
- Web of Science extraction now handles more record types correctly: DIIDW patents (inventors, year, and Derwent abstract), multi-language abstracts from CSCD/KJD/SCIELO (preferring the English variant), and BCI/ZOOREC book-chapter journal titles. Italic terms in CABI titles and abstracts are reconstructed in place, and the article URL is used as a fallback when no DOI is available.
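The tag stripping described above can be approximated with a single regular expression. A minimal sketch, assuming only <sup> and <inf> tags need removing; strip_sup_inf() is a hypothetical name and the package's actual implementation may differ:

```r
# Hypothetical helper: drop opening and closing <sup>/<inf> tags but
# keep the text between them. Because the pattern only matches those
# complete tags, a bare "<" or ">" in the text is left alone.
strip_sup_inf <- function(x) {
  gsub("</?(sup|inf)>", "", x, ignore.case = TRUE)
}

strip_sup_inf("10<sup>2</sup>")
strip_sup_inf("H<inf>2</inf>O uptake, p < 0.05")
```

The first call reproduces the example from the list above (10<sup>2</sup> becomes 102); the second shows a comparison operator passing through unchanged.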
LitFetchR 0.2.2
CRAN release: 2026-04-14
- Fixed a bug in manual_fetch(): removed a chunk of code that was calling ‘scp_api_key’ before it was created in the internal function extract_scp_list(). See GitHub issue https://github.com/thomasdumond/LitFetchR/issues/2#issue-4259003613
LitFetchR 0.2.1
CRAN release: 2026-02-10
- Added ‘directory’ arguments to functions creating files so users can choose in which directory they are created.
- Added option to choose which literature platform to access when creating the search string using the create_save_search() function.
- Information messages can now be suppressed, if needed, using the function suppressMessages().

This was the first version.
