europe_pmc module

class pygetpapers.repository.europe_pmc.EuropePmc

Bases: RepositoryInterface

Downloads metadata and optionally fulltext from https://europepmc.org

apipaperdownload(query_namespace)

Takes the query_namespace object and runs the search using the parameters it contains.

Parameters

query_namespace (dict) – pygetpapers’ namespace object containing the queries from argparse

build_and_send_query(maximum_hits_per_page, cursor_mark, query, synonym)

Retrieves metadata from EPMC for given query

Parameters
  • maximum_hits_per_page (int) – maximum number of papers to get per page

  • cursor_mark (string) – cursor mark

  • query (string) – query

  • synonym (bool) – whether to get synonyms, defaults to True

Returns

metadata dictionary

Return type

dict

static buildquery(cursormark, page_size, query, synonym=True)

Builds query parameters
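To illustrate the kind of parameters such a builder might produce, here is a minimal sketch. It is not pygetpapers' actual implementation: the function name build_query_sketch and the layout of the returned dictionary are assumptions, and the parameter names are taken from the public Europe PMC REST search service as we understand it.

```python
def build_query_sketch(cursormark, page_size, query, synonym=True):
    """Return a dict of search parameters.

    Hypothetical helper for illustration only; the real buildquery
    may structure its output differently.
    """
    return {
        "query": query,
        "resultType": "core",          # assumed: request full metadata records
        "cursorMark": cursormark,      # "*" conventionally means "start from the beginning"
        "pageSize": page_size,
        "synonym": "TRUE" if synonym else "FALSE",
        "format": "json",              # assumed output format
    }


params = build_query_sketch("*", 25, "lantana")
```

Returning a plain dictionary keeps the builder free of side effects, which is one reason a method like this can reasonably be static.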

static create_parameters_for_paper_download()

Creates parameters for paper download

Returns

tuple of parameters for paper download

Return type

tuple

get_supplementary_metadata(metadata_dictionary_with_all_papers, getpdf=False, makecsv=False, makehtml=False, makexml=False, references=False, citations=False, supplementary_files=False, zip_files=False)

Gets supplementary metadata

Parameters
  • metadata_dictionary_with_all_papers (dict) – metadata dictionary

  • getpdf (bool, optional) – whether to get pdfs

  • makecsv (bool, optional) – whether to create csv output

  • makehtml (bool, optional) – whether to create html output

  • makexml (bool, optional) – whether to download xml fulltext

  • references (bool, optional) – whether to download references

  • citations (bool, optional) – whether to download citations

  • supplementary_files (bool, optional) – whether to download supplementary_files

  • zip_files (bool, optional) – whether to download zip_files from the ftp endpoint

get_urls_to_write_to(identifier_for_paper)

Gets urls to write the metadata to

Parameters

identifier_for_paper (str) – identifier for paper

Returns

urls to write the metadata to

Return type

tuple

make_html_from_dict(dict_to_write_html_from, url, identifier_for_paper)

Makes html from dict

Parameters
  • dict_to_write_html_from (dict) – dict to write html from

  • url (str) – url to write html to

  • identifier_for_paper (str) – identifier for paper

noexecute(query_namespace)

Takes the query_namespace object and runs the search using the parameters it contains, but only prints the output and does not write to disk.

Parameters

query_namespace (dict) – pygetpapers’ namespace object containing the queries from argparse

query(query, cutoff_size, synonym=True, cursor_mark='*')

Queries Europe PMC with the given query for the given number (cutoff_size) of papers

Parameters
  • query (string) – query

  • cutoff_size (int) – number of papers to get

  • synonym (bool, optional) – whether to get synonyms, defaults to True

  • cursor_mark (string, optional) – cursor mark, defaults to '*'

Returns

list containing the papers

Return type

list
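The interplay between cutoff_size and cursor_mark can be sketched as a cursor-based pagination loop. This is a hedged illustration, not the library's code: collect_papers, fetch_page, page_size and fake_fetch are hypothetical names, and the stand-in fetcher replaces the real network call so the example runs offline.

```python
def collect_papers(fetch_page, query, cutoff_size, cursor_mark="*", page_size=100):
    """Gather up to cutoff_size papers by following cursor marks.

    fetch_page is a hypothetical callable taking (query, cursor_mark, page_size)
    and returning (list_of_papers, next_cursor_mark_or_None).
    """
    papers = []
    while len(papers) < cutoff_size:
        remaining = cutoff_size - len(papers)
        page, next_mark = fetch_page(query, cursor_mark, min(page_size, remaining))
        papers.extend(page[:remaining])
        # Stop when nothing came back or the cursor did not advance.
        if not page or next_mark is None or next_mark == cursor_mark:
            break
        cursor_mark = next_mark
    return papers


def fake_fetch(query, cursor_mark, page_size):
    """Stand-in for a network call: serves 7 fake records, cursor is an offset."""
    data = [f"paper-{i}" for i in range(7)]
    start = 0 if cursor_mark == "*" else int(cursor_mark)
    page = data[start:start + page_size]
    next_mark = str(start + len(page)) if page else None
    return page, next_mark


first_five = collect_papers(fake_fetch, "lantana", cutoff_size=5, page_size=3)
everything = collect_papers(fake_fetch, "lantana", cutoff_size=50, page_size=3)
```

Passing the fetcher in as an argument keeps the loop testable without contacting the remote service.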

restart(query_namespace)

Restarts query to add new metadata for existing papers

Parameters

query_namespace (dict) – pygetpapers’ namespace object

run_eupmc_query_and_get_metadata(query, cutoff_size, update=None, onlymakejson=False, getpdf=False, makehtml=False, makecsv=False, makexml=False, references=False, citations=False, supplementary_files=False, synonym=True, zip_files=False)

update(query_namespace)

If a corpus already exists, this function reads the ‘cursor mark’ from the previous run, increments it, and adds new papers for the given parameters to the existing corpus.

Parameters

query_namespace (dict) – pygetpapers’ namespace object containing the queries from argparse
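The step of adding new papers to an existing corpus can be pictured with the following minimal sketch. It rests on assumptions that the source does not confirm: that the corpus is a dictionary keyed by paper identifier, that each paper carries an "id" field, and that papers already present are left untouched. The name merge_new_papers is hypothetical.

```python
def merge_new_papers(existing_corpus, new_papers, id_key="id"):
    """Return a new corpus dict with previously unseen papers added.

    Hypothetical helper: papers already in the corpus keep their stored record.
    """
    merged = dict(existing_corpus)  # shallow copy so the input is not mutated
    for paper in new_papers:
        merged.setdefault(paper[id_key], paper)
    return merged


previous = {"PMC1": {"id": "PMC1", "title": "old"}}
fetched = [{"id": "PMC1", "title": "dup"}, {"id": "PMC2", "title": "new"}]
corpus = merge_new_papers(previous, fetched)
```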