europe_pmc module

class pygetpapers.repository.europe_pmc.EuropePmc

Bases: RepositoryInterface

Downloads metadata and optionally fulltext from https://europepmc.org

apipaperdownload(query_namespace)

Takes the query_namespace object and runs the search for the given search parameters.

Parameters: query_namespace (dict) – pygetpapers’ namespace object containing the queries from argparse
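As a rough illustration of how a namespace object from argparse can be turned into the dict this method expects, here is a minimal sketch. The parser and its flag names (--query, --limit, --xml) are assumptions made for this example and have not been checked against the pygetpapers command line.

```python
import argparse

# Hypothetical parser: these flags are illustrative, not pygetpapers' own CLI.
parser = argparse.ArgumentParser(prog="example")
parser.add_argument("--query", default=None)
parser.add_argument("--limit", type=int, default=100)
parser.add_argument("--xml", action="store_true")

# Parse a fixed argument list instead of sys.argv so the example is repeatable.
namespace = parser.parse_args(["--query", "lantana", "--limit", "5"])

# vars() exposes the Namespace attributes as a plain dict.
query_namespace = vars(namespace)
```

A dict built this way carries one key per command-line option, which is the general shape the query_namespace parameter is documented to have.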

build_and_send_query(maximum_hits_per_page, cursor_mark, query, synonym)

Retrieves metadata from EPMC for given query

Parameters

maximum_hits_per_page (int) – maximum number of papers to get per page
cursor_mark (string) – cursor mark
query (string) – query
synonym (bool) – whether to get synonyms, defaults to True

Returns

metadata dictionary

Return type

dict

static buildquery(cursormark, page_size, query, synonym=True)

Builds query parameters
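To make "query parameters" concrete, the sketch below assembles a parameter dict of the kind a cursor-paged search request needs. It is not the body of buildquery: the helper name build_search_params is hypothetical, and the keys (query, resultType, cursorMark, pageSize, synonym, format) are modelled on the Europe PMC REST search service as an assumption.

```python
def build_search_params(cursor_mark, page_size, query, synonym=True):
    """Hypothetical helper: collect search parameters in a plain dict.

    The key names are an assumption modelled on the Europe PMC REST
    search service; they are not taken from the pygetpapers source.
    """
    return {
        "query": query,
        "resultType": "core",           # assumed: request full metadata records
        "cursorMark": cursor_mark,      # "*" conventionally marks the first page
        "pageSize": page_size,
        "synonym": "TRUE" if synonym else "FALSE",
        "format": "xml",                # assumed response format
    }


params = build_search_params("*", 25, "lantana")
```

Keeping parameter construction separate from the network call makes it possible to inspect or test the request without contacting the server.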

static create_parameters_for_paper_download()

Creates parameters for paper download

Returns: tuple of parameters for paper download
Return type: tuple

get_supplementary_metadata(metadata_dictionary_with_all_papers, getpdf=False, makecsv=False, makehtml=False, makexml=False, references=False, citations=False, supplementary_files=False, zip_files=False)

Gets supplementary metadata

Parameters

metadata_dictionary_with_all_papers (dict) – metadata dictionary
getpdf (bool, optional) – whether to get pdfs
makecsv (bool, optional) – whether to create csv output
makehtml (bool, optional) – whether to create html output
makexml (bool, optional) – whether to download xml fulltext
references (bool, optional) – whether to download references
citations (bool, optional) – whether to download citations
supplementary_files (bool, optional) – whether to download supplementary_files
zip_files (bool, optional) – whether to download zip_files from the ftp endpoint

get_urls_to_write_to(identifier_for_paper)

Gets urls to write the metadata to

Parameters: identifier_for_paper (str) – identifier for paper
Returns: urls to write the metadata to
Return type: tuple

make_html_from_dict(dict_to_write_html_from, url, identifier_for_paper)

Makes html from dict

Parameters

dict_to_write_html_from (dict) – dict to write html from
url (str) – url to write html to
identifier_for_paper (str) – identifier for paper

noexecute(query_namespace)

Takes the query_namespace object and runs the search for the given search parameters, but only prints the output and does not write to disk.

Parameters: query_namespace (dict) – pygetpapers’ namespace object containing the queries from argparse

query(query, cutoff_size, synonym=True, cursor_mark='*')

Queries EPMC for the given query and retrieves up to cutoff_size papers

Parameters

query (string) – query
cutoff_size (int) – number of papers to get
synonym (bool, optional) – whether to get synonyms, defaults to True
cursor_mark (string, optional) – cursor mark, defaults to '*'

Returns

list containing the papers

Return type

list
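The combination of cutoff_size and cursor_mark suggests cursor-based paging: request a page, collect its papers, then continue from the cursor mark returned with that page until enough papers are gathered. The sketch below shows that loop under stated assumptions. The names collect_papers and fake_fetch_page are hypothetical, the in-memory corpus stands in for the remote service, and its cursor marks are simple offsets rather than the opaque strings a real service returns.

```python
FAKE_CORPUS = [f"paper-{i}" for i in range(10)]


def fake_fetch_page(cursor_mark, page_size):
    """Stand-in for a network call: returns (papers, next_cursor_mark)."""
    start = 0 if cursor_mark == "*" else int(cursor_mark)
    page = FAKE_CORPUS[start:start + page_size]
    next_start = start + len(page)
    # None signals that there are no further pages.
    next_mark = str(next_start) if next_start < len(FAKE_CORPUS) else None
    return page, next_mark


def collect_papers(fetch_page, cutoff_size, page_size=3, cursor_mark="*"):
    """Page through results until cutoff_size papers are collected."""
    papers = []
    while len(papers) < cutoff_size:
        page, next_mark = fetch_page(cursor_mark, page_size)
        if not page:
            break
        papers.extend(page)
        # Stop when the source is exhausted or the cursor does not advance.
        if next_mark is None or next_mark == cursor_mark:
            break
        cursor_mark = next_mark
    # The last page may overshoot, so trim to the requested size.
    return papers[:cutoff_size]


seven = collect_papers(fake_fetch_page, cutoff_size=7)
everything = collect_papers(fake_fetch_page, cutoff_size=20)
```

Trimming at the end keeps the loop simple: pages are always requested at full size, and any surplus from the final page is discarded.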

restart(query_namespace)

Restarts query to add new metadata for existing papers

Parameters: query_namespace (dict) – pygetpapers’ namespace object

run_eupmc_query_and_get_metadata(query, cutoff_size, update=None, onlymakejson=False, getpdf=False, makehtml=False, makecsv=False, makexml=False, references=False, citations=False, supplementary_files=False, synonym=True, zip_files=False)

update(query_namespace)

If a corpus from a previous run exists, this function reads the ‘cursor mark’ from that run, increments it, and adds new papers for the given parameters to the existing corpus.

Parameters: query_namespace (dict) – pygetpapers’ namespace object containing the queries from argparse
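The resume behaviour described for update can be pictured as persisting the last cursor mark alongside the corpus and starting the next run from it. The following is a minimal sketch of that idea only. The state file name state.json, the helpers run_once and fake_fetch, and the offset-style cursor marks are all assumptions for illustration; they do not reflect how pygetpapers stores its results.

```python
import json
import tempfile
from pathlib import Path


def fake_fetch(cursor_mark, count):
    """Stand-in for the remote search; cursor marks here are plain offsets."""
    start = 0 if cursor_mark == "*" else int(cursor_mark)
    ids = [f"paper-{i}" for i in range(start, start + count)]
    return ids, str(start + count)


def run_once(state_file, count):
    """Fetch `count` papers, resuming from the saved cursor mark if one exists."""
    state = {"cursor_mark": "*", "papers": []}
    if state_file.exists():
        state = json.loads(state_file.read_text())
    new_ids, next_mark = fake_fetch(state["cursor_mark"], count)
    state["papers"].extend(new_ids)
    state["cursor_mark"] = next_mark  # remember where the next run should start
    state_file.write_text(json.dumps(state))
    return state["papers"]


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "state.json"  # hypothetical file name
    first = run_once(state_file, 2)        # fresh run, starts at "*"
    second = run_once(state_file, 3)       # resumes after the first two papers
```

Because the second run starts from the stored cursor mark, it appends only new papers and does not download the first two again.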