scholarly module

scholarly.py

class scholarly._scholarly._Scholarly[source]

Bases: object

Class that manages the API for scholarly

static _bin_citations_by_year(cites_per_year: Dict[int, int], year_end)[source]
_citedby_long(object: scholarly.data_types.Publication, years)[source]
_construct_url(baseurl: str, patents: bool = True, citations: bool = True, year_low: int = None, year_high: int = None, sort_by: str = 'relevance', include_last_year: str = 'abstracts', start_index: int = 0) → str[source]

Construct URL from requested parameters.

bibtex(object: scholarly.data_types.Publication) → str[source]

Returns a bibtex entry for a publication that has either Scholar source or citation source

Parameters:object (Publication) – The Publication object to export as BibTeX
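A minimal sketch of exporting several publications at once. It assumes `client` is the scholarly object (typically obtained with `from scholarly import scholarly`); `export_bibtex` is a hypothetical helper name, not part of the library.

```python
def export_bibtex(client, publications):
    """Return the BibTeX entries for `publications`, separated by blank lines.

    `client` is assumed to expose bibtex(publication) -> str as documented above.
    """
    entries = [client.bibtex(pub).strip() for pub in publications]
    return "\n\n".join(entries)
```

The result is a single string that can be written to a `.bib` file.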
citedby(object: scholarly.data_types.Publication) → scholarly.publication_parser._SearchScholarIterator[source]

Searches Google Scholar for other articles that cite this Publication and returns a Publication generator.

Parameters:object (Publication) – The Publication object whose citing articles are searched
download_mandates_csv(filename: str, overwrite: bool = False, include_links: bool = True)[source]

Download the CSV file of the current mandates.

fill(object: dict, sections=[], sortby: str = 'citedby', publication_limit: int = 0) → scholarly.data_types.Author[source]

Fills the object according to its type. If the container type is Author, it fills the additional author fields. If it is Publication, it fills the publication fields accordingly.

Parameters:
  • object (Author or Publication) – the Author or Publication object that needs to get filled
  • sections (list) – the sections that the user wants filled for an Author object. This can be: [‘basics’, ‘indices’, ‘counts’, ‘coauthors’, ‘publications’, ‘public_access’]
  • sortby (string) – if the object is an author, select the order of the citations in the author page. Either by ‘citedby’ or ‘year’. Defaults to ‘citedby’.
  • publication_limit (int) – if the object is an author, select the max number of publications you want to fill for the author. Defaults to no limit.

Note: For Author objects, if ‘public_access’ is filled prior to ‘publications’, only the total counts from the Public Access section of the author’s profile page are filled. If ‘public_access’ is filled along with ‘publications’ or afterwards for the first time, the publication entries are also marked according to whether they satisfy public access mandates.
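The parameters above can be combined as in the following sketch. It assumes `client` is the scholarly object; `fill_publications_by_year` is a hypothetical helper name used only for illustration.

```python
def fill_publications_by_year(client, author, limit=5):
    """Fill only the 'publications' section of an Author, ordered by year.

    `limit` is forwarded as publication_limit, so at most `limit`
    publications are requested for the author.
    """
    return client.fill(
        author,
        sections=["publications"],  # one of the section names listed above
        sortby="year",              # 'citedby' is the documented default
        publication_limit=limit,
    )
```

Restricting `sections` keeps the call limited to the data that is actually needed.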

get_journal_categories()[source]

Get a dict of journal categories and subcategories.

get_journals(category='English', subcategory=None, include_comments: bool = False) → Dict[int, scholarly.data_types.Journal][source]

Get the journals for a given category and subcategory, returned as a dict of Journal objects keyed by integer.
journal_categories
pprint(object: scholarly.data_types.Author) → None[source]

Pretty print an Author or Publication container object

Parameters:object (Author or Publication) – Publication or Author container object
save_journals_csv(filename, category='English', subcategory=None, include_comments=False)[source]

Save a list of journals to a file in CSV format.

search_author(name: str)[source]

Search by author name and return a generator of Author objects

Example::
search_query = scholarly.search_author('Marty Banks, Berkeley')
scholarly.pprint(next(search_query))
Output::
{'affiliation': 'Professor of Vision Science, UC Berkeley',
 'citedby': 21074,
 'email_domain': '@berkeley.edu',
 'filled': False,
 'interests': ['vision science', 'psychology', 'human factors', 'neuroscience'],
 'name': 'Martin Banks',
 'scholar_id': 'Smr99uEAAAAJ',
 'source': 'SEARCH_AUTHOR_SNIPPETS',
 'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=Smr99uEAAAAJ'}
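Calling `next` directly on the generator raises `StopIteration` when the search has no results. A small sketch of a safer lookup, assuming `client` is the scholarly object; `first_author_or_none` is a hypothetical helper name.

```python
def first_author_or_none(client, name):
    """Return the first Author matching `name`, or None if there are no results."""
    results = client.search_author(name)  # generator of Author objects
    # next() with a default avoids StopIteration on an empty result set.
    return next(results, None)
```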
search_author_by_organization(organization_id: int)[source]

Search for authors in an organization and return a generator of Authors

organization_id can be found from the organization name using search_org. Alternatively, it can be found in the Author object.

The returned authors are typically in decreasing order of total citations. The authors must have a verified email address and set their affiliation appropriately to appear on this list.

Parameters:organization_id (integer) – unique integer id for each organization
search_author_custom_url(url: str) → scholarly.data_types.Author[source]

Search by custom URL and return a generator of Author objects. The URL should be of the form ‘/citation?q=…’

Parameters:url (string) – url for the custom author url
search_author_id(id: str, filled: bool = False, sortby: str = 'citedby', publication_limit: int = 0) → scholarly.data_types.Author[source]

Search by author id and return a single Author object.

Parameters:
  • sortby (string) – select the order of the citations in the author page. Either by ‘citedby’ or ‘year’. Defaults to ‘citedby’.
  • publication_limit (int) – select the max number of publications you want to fill for the author. Defaults to no limit.

Example::
search_query = scholarly.search_author_id('EmD_lTEAAAAJ')
scholarly.pprint(search_query)
Output::
{'affiliation': 'Institut du radium, University of Paris',
 'citedby': 2208,
 'filled': False,
 'interests': [],
 'name': 'Marie Skłodowska-Curie',
 'scholar_id': 'EmD_lTEAAAAJ',
 'source': 'AUTHOR_PROFILE_PAGE',
 'url_picture': 'https://scholar.googleusercontent.com/citations?view_op=view_photo&user=EmD_lTEAAAAJ&citpid=3'}
search_citedby(publication_id: Union[int, str], **kwargs)[source]

Searches by Google Scholar publication id and returns a generator of Publication objects.

Parameters:publication_id (int or str) – Google Scholar publication id

For the remaining parameters, see documentation of search_pubs.
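One possible way to obtain a publication id is to read it from the `citedby_url` field of a Publication, which carries a `cites` query parameter. The sketch below uses only the standard library. That the `cites` value is the id expected by search_citedby is an assumption, and `cites_id_from_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse, parse_qs

def cites_id_from_url(citedby_url):
    """Extract the value of the `cites` query parameter, or None if absent."""
    query = parse_qs(urlparse(citedby_url).query)
    values = query.get("cites")
    return values[0] if values else None
```

Under that assumption, the returned string could be passed as `publication_id`.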

search_keyword(keyword: str)[source]

Search by keyword and return a generator of Author objects

Parameters:keyword (str) – keyword to be searched
Example::
search_query = scholarly.search_keyword('Haptics')
scholarly.pprint(next(search_query))
Output::
{'affiliation': 'Postdoctoral research assistant, University of Bremen',
 'citedby': 56666,
 'email_domain': '@collision-detection.com',
 'filled': False,
 'interests': ['Computer Graphics',
               'Collision Detection',
               'Haptics',
               'Geometric Data Structures'],
 'name': 'Rene Weller',
 'scholar_id': 'lHrs3Y4AAAAJ',
 'source': 'SEARCH_AUTHOR_SNIPPETS',
 'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=lHrs3Y4AAAAJ'}
search_keywords(keywords: List[str])[source]

Search by keywords and return a generator of Author objects

Parameters:keywords – a list of keywords to be searched
Example::
search_query = scholarly.search_keywords(['crowdsourcing', 'privacy'])
scholarly.pprint(next(search_query))
Output::

search_org(name: str, fromauthor: bool = False) → list[source]

Search by organization name and return a list of possible disambiguations

Example::

Output::

search_pubs(query: str, patents: bool = True, citations: bool = True, year_low: int = None, year_high: int = None, sort_by: str = 'relevance', include_last_year: str = 'abstracts', start_index: int = 0) → scholarly.publication_parser._SearchScholarIterator[source]

Searches by query and returns a generator of Publication objects

Parameters:
  • query (str) – terms to be searched
  • patents (bool, optional) – Whether or not to include patents, defaults to True
  • citations (bool, optional) – Whether or not to include citations, defaults to True
  • year_low (int, optional) – minimum year of publication, defaults to None
  • year_high (int, optional) – maximum year of publication, defaults to None
  • sort_by (string, optional) – ‘relevance’ or ‘date’, defaults to ‘relevance’
  • include_last_year (string, optional) – ‘abstracts’ or ‘everything’, defaults to ‘abstracts’ and only applies if ‘sort_by’ is ‘date’
  • start_index (int, optional) – starting index of list of publications, defaults to 0
Returns:

Generator of Publication objects

Return type:

Iterator[Publication]

Example::
search_query = scholarly.search_pubs('Perception of physical stability and center of mass of 3D objects')
scholarly.pprint(next(search_query)) # in order to pretty print the result
Output::
{'author_id': ['4bahYMkAAAAJ', 'ruUKktgAAAAJ', ''],
 'bib': {'abstract': 'Humans can judge from vision alone whether an object is '
                     'physically stable or not. Such judgments allow observers '
                     'to predict the physical behavior of objects, and hence '
                     'to guide their motor actions. We investigated the visual '
                     'estimation of physical stability of 3-D objects (shown '
                     'in stereoscopically viewed rendered scenes) and how it '
                     'relates to visual estimates of their center of mass '
                     '(COM). In Experiment 1, observers viewed an object near '
                     'the edge of a table and adjusted its tilt to the '
                     'perceived critical angle, ie, the tilt angle at which '
                     'the object',
         'author': ['SA Cholewiak', 'RW Fleming', 'M Singh'],
         'pub_year': '2015',
         'title': 'Perception of physical stability and center of mass of 3-D '
                  'objects',
         'venue': 'Journal of vision'},
 'citedby_url': '/scholar?cites=15736880631888070187&as_sdt=5,33&sciodt=0,33&hl=en',
 'eprint_url': 'https://jov.arvojournals.org/article.aspx?articleID=2213254',
 'filled': False,
 'gsrank': 1,
 'num_citations': 23,
 'pub_url': 'https://jov.arvojournals.org/article.aspx?articleID=2213254',
 'source': 'PUBLICATION_SEARCH_SNIPPET',
 'url_add_sclib': '/citations?hl=en&xsrf=&continue=/scholar%3Fq%3DPerception%2Bof%2Bphysical%2Bstability%2Band%2Bcenter%2Bof%2Bmass%2Bof%2B3D%2Bobjects%26hl%3Den%26as_sdt%3D0,33&citilm=1&json=&update_op=library_add&info=K8ZpoI6hZNoJ&ei=QhqhX66wKoyNy9YPociEuA0',
 'url_scholarbib': '/scholar?q=info:K8ZpoI6hZNoJ:scholar.google.com/&output=cite&scirp=0&hl=en'}
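A sketch of reducing search results to a compact summary, based on the fields shown in the output above. It assumes `client` is the scholarly object; `summarize_pubs` is a hypothetical helper name, and any extra keyword arguments are forwarded to search_pubs unchanged.

```python
from itertools import islice

def summarize_pubs(client, query, limit=5, **search_kwargs):
    """Return (title, pub_year, num_citations) tuples for the first `limit` results."""
    results = client.search_pubs(query, **search_kwargs)
    rows = []
    for pub in islice(results, limit):  # stop after `limit` items
        bib = pub.get("bib", {})
        rows.append((bib.get("title"), bib.get("pub_year"), pub.get("num_citations")))
    return rows
```

For instance, `summarize_pubs(scholarly, 'some query', year_low=2010, sort_by='date')` would restrict the search using the parameters documented above.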
search_pubs_custom_url(url: str) → scholarly.publication_parser._SearchScholarIterator[source]

Search by custom URL and return a generator of Publication objects. The URL should be of the form ‘/scholar?q=…’

A typical use case is to generate the URL by first typing in search parameters in the Advanced Search dialog box and then use the URL here to programmatically fetch the results.

Parameters:url (string) – custom url to search for the publication
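Instead of copying the URL from the browser, a relative URL of the documented form can also be assembled with the standard library. This is a sketch only: `build_scholar_url` is a hypothetical helper name, and which extra query parameters Google Scholar accepts is not specified here.

```python
from urllib.parse import urlencode

def build_scholar_url(query, **extra):
    """Build a relative '/scholar?q=...' URL.

    Extra keyword arguments are appended as additional query parameters,
    URL-encoded in the order given.
    """
    params = {"q": query, **extra}
    return "/scholar?" + urlencode(params)
```

The resulting string could then be passed to search_pubs_custom_url.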
search_single_pub(pub_title: str, filled: bool = False) → scholarly.publication_parser.PublicationParser[source]

Search by scholar query and return a single Publication container object

Parameters:
  • pub_title (string) – Title of the publication to search
  • filled (bool) – Whether the publication should be filled with additional information
set_logger(enable: bool)[source]

Enable or disable the logger for Google Scholar. Enabled by default.

set_retries(num_retries: int) → None[source]

Sets the number of retries in case of errors

Parameters:num_retries (int) – the number of retries
set_timeout(timeout: int)[source]

Set timeout period in seconds for scholarly
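The two setters can be applied together before issuing any queries. A minimal sketch, assuming `client` is the scholarly object; `configure` is a hypothetical helper name and the default values are illustrative, not the library's defaults.

```python
def configure(client, retries=3, timeout=10):
    """Apply retry and timeout settings to `client` and return it."""
    client.set_retries(retries)   # number of retries in case of errors
    client.set_timeout(timeout)   # timeout period in seconds
    return client
```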

use_proxy(proxy_generator: scholarly._proxy_generator.ProxyGenerator, secondary_proxy_generator: scholarly._proxy_generator.ProxyGenerator = None) → None[source]

Select which proxy method to use.

See the available ProxyGenerator methods.

proxy_generator is used to fetch pages that have strong anti-bot prevention. secondary_proxy_generator is used for other pages that do not have strong anti-bot prevention. If it is not set, free proxies are used.

Parameters:
  • proxy_generator (ProxyGenerator) – a proxy generator object, typically set up with a premium proxy service (ScraperAPI or Luminati)
  • secondary_proxy_generator (ProxyGenerator) – a second proxy generator object, optional
Example::
pg = ProxyGenerator()
pg.ScraperAPI(YOUR_SCRAPER_API_KEY)
scholarly.use_proxy(pg)