arxiv module
- class pygetpapers.repository.arxiv.Arxiv
Bases:
RepositoryInterface
arxiv.org repository
This uses the PyPI package arxiv to download metadata. It is not clear whether this package is maintained by the arXiv project or is layered on top of the public API.
arXiv's current practice for bulk data download (e.g. PDFs) is described at
https://arxiv.org/help/bulk_data. Please be considerate and include a rate limit.
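As a rough illustration of the rate limit mentioned above, the sketch below enforces a minimum delay between successive requests. The names Throttle and fetch_all are hypothetical and are not part of pygetpapers; the interval is whatever the caller chooses, so check arXiv's current guidance before picking a value.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive calls to wait().

    Hypothetical helper for illustration; not part of pygetpapers.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon since the last request: pause for the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, fetch, throttle):
    """Call fetch(url) for each URL, waiting on the throttle before each call."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

The clock and sleep arguments are injectable only so the behaviour can be checked without real delays; in normal use the defaults are sufficient.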
- apipaperdownload(query_namespace)
Takes the query_namespace object as its parameter and runs the search for the given search parameters.
- Parameters
query_namespace (dict) – pygetpapers’ namespace object containing the queries from argparse
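The following is a minimal sketch of how an argparse result can be turned into a dict such as query_namespace. The flag names (--query, --limit, --pdf) and the function build_query_namespace are assumptions made for illustration; the keys pygetpapers actually expects are not listed in this section.

```python
import argparse


def build_query_namespace(argv):
    """Parse argv and return the result as a plain dict.

    The flags below are hypothetical examples, not pygetpapers' real options.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--query")
    parser.add_argument("--limit", type=int, default=100)
    parser.add_argument("--pdf", action="store_true")
    args = parser.parse_args(argv)
    # vars() exposes the Namespace attributes as a dictionary.
    return vars(args)


query_namespace = build_query_namespace(
    ["--query", "artificial intelligence", "--limit", "10"]
)
```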
- arxiv(query, cutoff_size, getpdf=False, makecsv=False, makexml=False, makehtml=False)
Builds the arxiv searcher and writes the XML, PDF, CSV and HTML output
- Parameters
query (string) – query given to arxiv
cutoff_size (int) – number of papers to retrieve
getpdf (bool, optional) – whether to download the PDFs
makecsv (bool, optional) – whether to write CSV output
makexml (bool, optional) – whether to write XML output
makehtml (bool, optional) – whether to write HTML output
- Returns
dictionary of results retrieved from arxiv
- Return type
dict
- download_pdf(metadata_dictionary)
Downloads PDFs for the papers in the metadata dictionary
- Parameters
metadata_dictionary (dict) – metadata dictionary for papers
- static noexecute(query_namespace)
Takes the query_namespace object as its parameter and runs the search for the given search parameters, but only prints the output and does not write to disk.
- Parameters
query_namespace (dict) – pygetpapers’ namespace object containing the queries from argparse
- static update(query_namespace)
If there is a previously existing corpus, this function reads in the ‘cursor mark’ from the previous run, increments it, and adds new papers for the given parameters to the existing corpus.
- Parameters
query_namespace (dict) – pygetpapers’ namespace object containing the queries from argparse
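The update step described above can be sketched roughly as follows. This assumes the cursor mark is an integer and that the corpus is a dict keyed by paper identifier; both are assumptions for illustration, and update_corpus is a hypothetical name, not the library's implementation.

```python
def update_corpus(existing, new_papers, cursor_mark):
    """Return (merged corpus, next cursor mark).

    Hypothetical sketch: papers already in the corpus are kept as they are,
    and only previously unseen identifiers are added.
    """
    merged = dict(existing)  # copy so the caller's corpus is not mutated
    for paper_id, metadata in new_papers.items():
        merged.setdefault(paper_id, metadata)
    return merged, cursor_mark + 1


corpus = {"2101.00001": {"title": "Old paper"}}
fetched = {
    "2101.00001": {"title": "Old paper (refetched)"},
    "2101.00002": {"title": "New paper"},
}
corpus_after, next_mark = update_corpus(corpus, fetched, cursor_mark=4)
```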
- write_metadata_json_from_arxiv_dict(metadata_dictionary)
Iterates through metadata_dictionary and writes JSON metadata for the papers
- Parameters
metadata_dictionary (dict) – metadata dictionary for papers
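One way to picture this step is the sketch below, which writes one JSON file per paper. The directory layout, the file name metadata.json, and the function write_metadata_json are assumptions for illustration; the layout pygetpapers actually uses is not stated in this section.

```python
import json
from pathlib import Path


def write_metadata_json(metadata_dictionary, output_dir):
    """Write one metadata.json per paper and return the paths written.

    Hypothetical layout: <output_dir>/<paper id>/metadata.json
    """
    output_dir = Path(output_dir)
    written = []
    for paper_id, metadata in metadata_dictionary.items():
        # Older arXiv identifiers contain "/", which is unsafe in a folder name.
        paper_dir = output_dir / paper_id.replace("/", "_")
        paper_dir.mkdir(parents=True, exist_ok=True)
        target = paper_dir / "metadata.json"
        target.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
        written.append(target)
    return written
```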