QueryBuilder

pygetpapers builds and runs all queries through a query_builder module (Pygetpapers.py) . There are several reasons:

  • each repository may use its own query language and syntax

  • there is frequent need to use punctuation (e.g. (..), “..”, ‘..’) and these may be nested. Punctuation can also interact with command-line syntax

  • complex queries (e.g. repeated OR, AND, NOT ) are tedious and error-prone

  • many values (especially dates) need converting or standardising

  • some options require or forbid other options (e.g. –xml requires an –output value)

  • successful queries can be saved , edited, and rerun

  • queries may be rerun at a later date, or request a larger number of downloads.

Users may wish to build queries:

  • completely from the commandline (argparse Namespace).

  • from a saved query (configparser configuration file)

  • programmatically through an instance of Pygetpapers

  • mixtures of the above

QueryBuilder contains or creates flags indicating which of the following is to be processed

  • query strings to be submitted to the particular repository

  • flags controlling the execution (download rate, limits, formats)

  • creation of the local repository (CProject)

  • creation of the per-article subdirectories (CTree)

  • postprocessing options (e.g. docanalysis and py4ami, and standard Unix/Python libraries)