europe_pmc module
- class pygetpapers.repository.europe_pmc.EuropePmc
Bases: RepositoryInterface
Downloads metadata and optionally fulltext from https://europepmc.org
- apipaperdownload(query_namespace)
Takes the query_namespace object and runs the search with the parameters it contains.
- Parameters
query_namespace (dict) – pygetpapers' namespace object containing the queries from argparse
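As a rough illustration of what such a query_namespace might look like, the sketch below parses a few command-line flags with argparse and converts the result to a dict with vars(). The flag names (--query, --limit, --xml) are assumptions chosen for this example and are not a statement of pygetpapers' actual argument parser.

```python
import argparse

# Hypothetical flags for illustration only; not pygetpapers' actual CLI definition.
parser = argparse.ArgumentParser()
parser.add_argument("--query", type=str)
parser.add_argument("--limit", type=int, default=100)
parser.add_argument("--xml", action="store_true")

# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = parser.parse_args(["--query", "lantana", "--limit", "10", "--xml"])

# vars() exposes the argparse.Namespace attributes as a plain dict.
query_namespace = vars(args)
print(query_namespace)
```

A dict of this shape is the kind of object the methods taking query_namespace would receive, assuming the caller converts the argparse result before passing it on.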
- build_and_send_query(maximum_hits_per_page, cursor_mark, query, synonym)
Retrieves metadata from EPMC for given query
- Parameters
maximum_hits_per_page (int) – maximum number of papers to get per page
cursor_mark (string) – cursor mark
query (string) – query
synonym (bool) – whether to get synonyms, defaults to True
- Returns
metadata dictionary
- Return type
dict
- static buildquery(cursormark, page_size, query, synonym=True)
Builds query parameters
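The sketch below shows one way query parameters of this kind could be assembled into a dict. It is not the body of buildquery: the helper name build_search_params is hypothetical, and the keys (query, cursorMark, pageSize, synonym, format) are assumptions modelled on the public Europe PMC REST search service that should be checked against its documentation.

```python
def build_search_params(cursormark, page_size, query, synonym=True):
    """Hypothetical helper: assemble search parameters as a plain dict."""
    return {
        "query": query,
        "cursorMark": cursormark,   # "*" conventionally marks the first page
        "pageSize": page_size,
        "synonym": "true" if synonym else "false",
        "format": "json",           # assumed output format for this sketch
    }


params = build_search_params("*", 25, "lantana")
print(params)
```

Keeping parameter construction in a side-effect-free function like this makes it easy to inspect or test the request before anything is sent.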
- static create_parameters_for_paper_download()
Creates parameters for paper download
- Returns
tuple of parameters for paper download
- Return type
tuple
- get_supplementary_metadata(metadata_dictionary_with_all_papers, getpdf=False, makecsv=False, makehtml=False, makexml=False, references=False, citations=False, supplementary_files=False, zip_files=False)
Gets supplementary metadata
- Parameters
metadata_dictionary_with_all_papers (dict) – metadata dictionary
getpdf (bool, optional) – whether to get pdfs
makecsv (bool, optional) – whether to create csv output
makehtml (bool, optional) – whether to create html output
makexml (bool, optional) – whether to download xml fulltext
references (bool, optional) – whether to download references
citations (bool, optional) – whether to download citations
supplementary_files (bool, optional) – whether to download supplementary_files
zip_files (bool, optional) – whether to download zip_files from the ftp endpoint
- get_urls_to_write_to(identifier_for_paper)
Gets urls to write the metadata to
- Parameters
identifier_for_paper (str) – identifier for paper
- Returns
urls to write the metadata to
- Return type
tuple
- make_html_from_dict(dict_to_write_html_from, url, identifier_for_paper)
Makes html from dict
- Parameters
dict_to_write_html_from (dict) – dict to write html from
url (str) – url to write html to
identifier_for_paper (str) – identifier for paper
- noexecute(query_namespace)
Takes the query_namespace object and runs the query search for the given search parameters, but only prints the output and does not write to disk.
- Parameters
query_namespace (dict) – pygetpapers' namespace object containing the queries from argparse
- query(query, cutoff_size, synonym=True, cursor_mark='*')
Queries Europe PMC for the given query and the given number (cutoff_size) of papers
- Parameters
query (string) – query
cutoff_size (int) – number of papers to get
synonym (bool, optional) – whether to get synonyms, defaults to True
cursor_mark (string, optional) – cursor mark, defaults to '*'
- Returns
list containing the papers
- Return type
list
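The interplay of cutoff_size and cursor_mark can be illustrated with a generic cursor-based pagination loop. The sketch below is an assumption about the general technique, not the implementation of query: fake_fetch_page is a hypothetical stand-in for a real search request, and collect_papers is a name invented for this example.

```python
def fake_fetch_page(cursor_mark, page_size):
    """Stand-in for a real search request; returns (papers, next_cursor_mark)."""
    corpus = [f"paper_{i}" for i in range(7)]  # pretend the server holds 7 papers
    start = 0 if cursor_mark == "*" else int(cursor_mark)
    page = corpus[start:start + page_size]
    return page, str(start + len(page))


def collect_papers(cutoff_size, page_size=3, cursor_mark="*"):
    """Fetch pages until cutoff_size papers are collected or results run out."""
    papers = []
    while len(papers) < cutoff_size:
        page, next_mark = fake_fetch_page(cursor_mark, page_size)
        if not page:  # an empty page means there are no more results
            break
        papers.extend(page)
        cursor_mark = next_mark
    # Trim the overshoot from the last page and report the last cursor mark seen.
    return papers[:cutoff_size], cursor_mark


papers, last_mark = collect_papers(cutoff_size=5)
print(papers, last_mark)
```

In this sketch the returned cursor mark points past the last fetched page rather than the last returned paper, so a caller that resumes from it would skip any papers trimmed by the cutoff.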
- restart(query_namespace)
Restarts query to add new metadata for existing papers
- Parameters
query_namespace (dict) – pygetpapers' namespace object
- run_eupmc_query_and_get_metadata(query, cutoff_size, update=None, onlymakejson=False, getpdf=False, makehtml=False, makecsv=False, makexml=False, references=False, citations=False, supplementary_files=False, synonym=True, zip_files=False)
- update(query_namespace)
If there is a previously existing corpus, this function reads in the ‘cursor mark’ from the previous run, increments it, and adds new papers for the given parameters to the existing corpus.
- Parameters
query_namespace (dict) – pygetpaper’s namespace object containing the queries from argparse
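Resuming from a stored cursor mark can be sketched as follows. This is a minimal illustration under stated assumptions, not how update persists its state: the file name cursor_state.json, its JSON layout, and both helper functions are hypothetical.

```python
import json
import tempfile
from pathlib import Path

STATE_FILE = "cursor_state.json"  # hypothetical file name for this sketch


def save_cursor_mark(corpus_dir, cursor_mark):
    """Store the cursor mark of the latest run inside the corpus directory."""
    state = {"cursor_mark": cursor_mark}
    (Path(corpus_dir) / STATE_FILE).write_text(json.dumps(state))


def load_cursor_mark(corpus_dir, default="*"):
    """Return the stored cursor mark, or the default when no previous run exists."""
    state_file = Path(corpus_dir) / STATE_FILE
    if not state_file.exists():
        return default
    return json.loads(state_file.read_text())["cursor_mark"]


with tempfile.TemporaryDirectory() as corpus_dir:
    first_run_mark = load_cursor_mark(corpus_dir)   # no state yet: start at "*"
    save_cursor_mark(corpus_dir, "AoE/abc")         # made-up cursor value
    resumed_mark = load_cursor_mark(corpus_dir)     # a later run picks it up

print(first_run_mark, resumed_mark)
```

A later run would pass the loaded value as the starting cursor mark so that only papers beyond the previous run are requested.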