Usage

Download and process ORCID in bulk.

class Record(*, orcid: str, name: str, homepage: str | None = None, locale: str | None = None, countries: list[CountryAlpha2] = <factory>, aliases: list[str] = <factory>, xrefs: dict[str, str] = <factory>, works: list[Work] = <factory>, employments: list[Affiliation] = <factory>, educations: list[Affiliation] = <factory>, memberships: list[Affiliation] = <factory>, emails: list[str] = <factory>, keywords: list[str] = <factory>, commons_image: str | None = None)

The orcid argument must match the pattern ^\d{4}-\d{4}-\d{4}-\d{3}(\d|X)$. Arguments shown with a <factory> default are initialized to a new empty list (or, for xrefs, an empty dict) rather than None.

A model representing a person.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.
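The orcid field is validated only against the pattern above. The last character of an ORCID iD (which is why the pattern allows X) is an ISO 7064 MOD 11-2 check character, and the pattern alone does not verify it. A minimal sketch of a stricter check follows; is_valid_orcid and orcid_check_character are illustrative helpers, not part of this package.

```python
import re

# Same pattern as the Record.orcid field; the final character may be "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}(\d|X)$")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the 15 base digits."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as "X"
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the field pattern and then the trailing check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:-1]) == digits[-1]
```

For example, is_valid_orcid("0000-0003-4423-4370") (the example value from the field metadata) returns True, while changing its last digit to 1 makes it return False even though the pattern still matches.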

property commons_image_url: str | None

Get the Wikimedia Commons image URL, if available.

is_high_quality() → bool

Return whether the record is high quality.

property email: str | None

Get the first email, if available.

property country: str | None

Get the first country (an ISO 3166-1 alpha-2 code), if available.
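The email and country properties return the first element of a list field that defaults to empty. A minimal sketch of that "first item or None" behavior, using a hypothetical dataclass PersonSketch in place of Record (this is not the package's code):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PersonSketch:
    """Hypothetical stand-in for Record, keeping only the list fields used here."""

    emails: list[str] = field(default_factory=list)
    countries: list[str] = field(default_factory=list)

    @property
    def email(self) -> str | None:
        # First element, or None when the list is empty
        return self.emails[0] if self.emails else None

    @property
    def country(self) -> str | None:
        # Same rule for the ISO 3166-1 alpha-2 country codes
        return self.countries[0] if self.countries else None
```

Because the lists default to empty rather than None, the properties only need to check for emptiness.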

property github: str | None

Get the researcher’s GitHub username, if available.

property linkedin: str | None

Get the researcher’s LinkedIn username, if available.

property loop: str | None

Get the researcher’s Loop identifier, if available.

property wos: str | None

Get the researcher’s Web of Science identifier, if available.

property dblp: str | None

Get the researcher’s DBLP identifier, if available.

property scopus: str | None

Get the researcher’s Scopus identifier, if available.

property google: str | None

Get the researcher’s Google Scholar identifier, if available.

property wikidata: str | None

Get the researcher’s Wikidata identifier, if available.

property mastodon: str | None

Get the researcher’s Mastodon handle, if available.
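This page does not say where these identifiers are stored. A plausible source is the xrefs field ("Database Cross-references"), and the sketch below assumes, hypothetically, that xrefs uses keys named after the properties. It turns a few such identifiers into profile links using each site's public URL format; URL_TEMPLATES and profile_urls are illustrative names, not part of the package.

```python
from __future__ import annotations

# Public profile URL formats for some of the identifiers exposed as
# properties. The xrefs key names are an assumption, not documented here.
URL_TEMPLATES = {
    "github": "https://github.com/{}",
    "linkedin": "https://www.linkedin.com/in/{}",
    "wikidata": "https://www.wikidata.org/wiki/{}",
    "google": "https://scholar.google.com/citations?user={}",
    "scopus": "https://www.scopus.com/authid/detail.uri?authorId={}",
    "dblp": "https://dblp.org/pid/{}",
}


def profile_urls(xrefs: dict[str, str]) -> dict[str, str]:
    """Build profile URLs for whichever known identifiers are present and non-empty."""
    return {
        key: template.format(xrefs[key])
        for key, template in URL_TEMPLATES.items()
        if xrefs.get(key)
    }
```

Keys without a known template are ignored, so unrecognized cross-references pass through harmlessly.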

property current_affiliation_ror: str | None

Guess the current affiliation and return its ROR identifier, if available.
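The page does not describe how the current affiliation is guessed. One plausible heuristic, sketched here purely as an assumption, is to take the employment with no end date and the latest start year that has a ROR identifier. AffiliationSketch and its ror, start, and end fields are hypothetical stand-ins for Affiliation, whose fields are not documented on this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AffiliationSketch:
    """Hypothetical stand-in for Affiliation; the real field names may differ."""

    ror: str | None
    start: int | None = None  # start year
    end: int | None = None  # end year; None is treated as "ongoing"


def guess_current_ror(employments: list[AffiliationSketch]) -> str | None:
    """Return the ROR of the ongoing employment with the latest start year."""
    candidates = [a for a in employments if a.end is None and a.ror]
    if not candidates:
        return None
    # Unknown start years sort before any known year
    best = max(candidates, key=lambda a: a.start if a.start is not None else -1)
    return best.ror
```

Records whose employments all have end dates yield None, which matches the "if available" wording of the property.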

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[Dict[str, FieldInfo]]

Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

The fields are:

- orcid: str, required. Title "Open Researcher and Contributor". Must match ^\d{4}-\d{4}-\d{4}-\d{3}(\d|X)$ (example: 0000-0003-4423-4370). Its semantics are derived from the orcid entry (https://bioregistry.io/orcid) in the Bioregistry (https://bioregistry.io), a registry of semantic web and linked open data compact URI (CURIE) prefixes and URI prefixes. Description: "ORCID (Open Researcher and Contributor ID) is an open, non-profit, community-based effort to create and maintain a registry of unique identifiers for individual researchers. ORCID records hold non-sensitive information such as name, email, organization name, and research activities." Bioregistry mappings: bartoc 2021, biocontext ORCID, biolink ORCID, fairsharing FAIRsharing.nx58jg, go orcid, miriam orcid, n2t orcid, wikidata P496.
- name: str, required.
- homepage, locale, commons_image: str | None, default None.
- countries: list[CountryAlpha2], default empty list. The ISO 3166-1 alpha-2 country codes (uppercase).
- aliases, emails, keywords: list[str], default empty list.
- works: list[Work], default empty list.
- employments, educations, memberships: list[Affiliation], default empty list.
- xrefs: dict[str, str], default empty dict. Title "Database Cross-references".

ensure_summaries() → Path

Ensure the ORCID summaries file (32+ GB) is downloaded, and return its path.
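The download URL and mechanism are not described on this page. The general "download once, then reuse" pattern can be sketched as follows; ensure_file and its download callback are hypothetical names, and writing to a temporary .part file first is a design choice of this sketch, not something the documentation states.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def ensure_file(path: Path, download: Callable[[Path], None]) -> Path:
    """Call download() only if path does not exist yet, then return path."""
    if not path.is_file():
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first so that an interrupted transfer of
        # a very large file is not mistaken for a complete one next time.
        partial = path.with_name(path.name + ".part")
        download(partial)
        partial.replace(path)
    return path
```

Repeated calls return immediately once the file exists, which matters for a file of this size.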

get_records(*, force: bool = False) → dict[str, Record]

Parse the ORCID summary XML files and return the records as a dictionary. Parsing takes about an hour.
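Since a full parse takes about an hour, caching the result is a natural fit. The sketch below assumes, hypothetically, that the dictionary is keyed by ORCID iD and that force=True means "rebuild even if a cache exists"; neither is confirmed by this page. get_records_cached is an illustrative name, and records are plain dicts here instead of Record objects.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable, Iterable


def get_records_cached(
    cache_path: Path,
    iter_records: Callable[[], Iterable[dict]],
    *,
    force: bool = False,
) -> dict[str, dict]:
    """Return records keyed by ORCID iD, rebuilding the cache if missing or forced."""
    if cache_path.is_file() and not force:
        return json.loads(cache_path.read_text())
    # Slow path: consume the (possibly hour-long) iterator once
    records = {record["orcid"]: record for record in iter_records()}
    cache_path.write_text(json.dumps(records))
    return records
```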

ground_researcher(name: str) → list[gilda.ScoredMatch]

Ground a name against ORCID names and aliases, returning all scored matches.

ground_researcher_unambiguous(name: str) → str | None

Ground a name against ORCID names and aliases, returning the matched identifier only when the match is unambiguous, and None otherwise.
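Both grounding functions use gilda and its scored matches. The sketch below swaps in a much simpler technique, exact matching on normalized strings with no scoring, only to illustrate the difference between "all matches" and "unambiguous match". It assumes "unambiguous" means exactly one candidate ORCID iD; build_index, ground, and ground_unambiguous are hypothetical names.

```python
from __future__ import annotations

import unicodedata
from collections import defaultdict


def normalize(name: str) -> str:
    """Strip accents, case-fold, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_marks.casefold().split())


def build_index(records: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized name or alias to the ORCID iDs that carry it."""
    index: dict[str, set[str]] = defaultdict(set)
    for orcid, names in records.items():
        for name in names:
            index[normalize(name)].add(orcid)
    return index


def ground(index: dict[str, set[str]], name: str) -> list[str]:
    """Return every ORCID iD whose name or alias matches, in sorted order."""
    return sorted(index.get(normalize(name), ()))


def ground_unambiguous(index: dict[str, set[str]], name: str) -> str | None:
    """Return the ORCID iD only if exactly one record matches."""
    matches = ground(index, name)
    return matches[0] if len(matches) == 1 else None
```

A shared name such as "Jane Doe" attached to two records appears in ground() but makes ground_unambiguous() return None.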

iter_records(*, force: bool = False, records_path: Path | None = None, desc: str = 'Loading ORCID') → Iterable[Record]

Iterate over records parsed from the ORCID summary XML files. A full pass takes about an hour.
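A minimal sketch of streaming records out of the summaries dump, assuming the dump is a (compressed) tar archive of per-record XML files in the ORCID v3.0 message schema. The namespace URIs and element paths below are assumptions about that schema, and iter_summary_names is a hypothetical helper that extracts only the iD and display name, not a full Record.

```python
from __future__ import annotations

import tarfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterator

# Assumed namespaces of the ORCID v3.0 record summary XML
NS = {
    "common": "http://www.orcid.org/ns/common",
    "person": "http://www.orcid.org/ns/person",
    "personal-details": "http://www.orcid.org/ns/personal-details",
}
NAME = "person:person/person:name/personal-details:"


def iter_summary_names(archive: Path) -> Iterator[tuple[str, str]]:
    """Yield (ORCID iD, display name) pairs from XML files inside a tar archive."""
    with tarfile.open(archive) as tar:  # compression is auto-detected
        for member in tar:  # members are read one at a time, not all at once
            if not member.isfile() or not member.name.endswith(".xml"):
                continue
            handle = tar.extractfile(member)
            if handle is None:
                continue
            with handle:
                root = ET.parse(handle).getroot()
            orcid = root.findtext("common:orcid-identifier/common:path", namespaces=NS)
            given = root.findtext(NAME + "given-names", namespaces=NS)
            family = root.findtext(NAME + "family-name", namespaces=NS)
            name = " ".join(part for part in (given, family) if part)
            if orcid and name:
                yield orcid, name
```

Yielding one record at a time keeps memory use flat across the full dump; a dictionary of all records can then be built from the iterator when needed.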