Persistent Identifiers 101

Persistent Identifiers • DOI • dPID

Persistent Identifiers

A Persistent Identifier, or PID, is a unique identifier for a specific object - much like a driver's license or social security number. These identifiers act as a long-lasting reference to an object. The vast majority of entries in the scientific record are accompanied by a PID, typically a Digital Object Identifier (DOI). A DOI is a PID that resolves to the publisher's page for the specific resource in question.
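As a rough illustration of what "resolves" means here, the sketch below turns a DOI into the URL a browser would follow. It assumes the public doi.org proxy and the usual "10.<registrant>/<suffix>" shape of a DOI; the helper name doi_to_url is hypothetical, and the check is deliberately loose rather than a full validator.

```python
from urllib.parse import quote

# Assumption: the public DOI proxy is used as the resolver.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build the resolver URL for a DOI (hypothetical helper, loose validation)."""
    doi = doi.strip()
    prefix, slash, suffix = doi.partition("/")
    # A DOI has a prefix starting with "10." and a non-empty suffix after the first "/".
    if not (prefix.startswith("10.") and slash and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/" intact.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("10.1000/182"))
```

Following the resulting URL redirects to whatever address the registrant has recorded for that DOI, which is normally a publisher landing page rather than the file itself.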

Brief history of the DOI system

Digital Object Identifiers (DOIs) emerged in response to the need for cataloguing and interoperability between scientific publishers. The system served several important business priorities:

  • Cataloguing ownership of copyright

  • Metadata for digital distribution management and content repurposing

  • Maintaining control over accessing the data (e.g. through paywalls)

  • Preserving stable URLs when, following an acquisition, content is transferred to new IT infrastructure owned by a different entity

These business constraints came with substantial tradeoffs. DOIs rest on a social contract: the registry maintains the lookup table, and the registrant promises to keep the registered URL pointing at the right resource on its proprietary server. As a result, DOIs are neither truly 'persistent' nor securely mapped to their underlying content, and inconsistent resolution is a major obstacle to the goals set by the FAIR principles.

Though one of the best solutions at the time, shortcomings of the DOI system include:

  • Not persistent: content can change, either intentionally or not. There is no versioning scheme for DOIs. DOIs need to be crawled for broken links and are expensive to maintain.

  • Fragmented: DOIs lack support for Linked Data - leading to the need to mint a PID for every digital object. This is not efficient and causes fragmentation of our knowledge graphs.

  • Inconsistent resolution: DOIs rarely resolve directly to the content itself, which makes machine-readable access extremely arduous.
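The lookup-table model and the "not persistent" shortcoming can be sketched with a toy registry. Everything below is a hypothetical stand-in rather than a real DOI API: two dictionaries play the registry and the publisher's server, the identifier and URL are made up, and a SHA-256 digest is used only to show that the bytes behind an unchanged identifier have changed.

```python
import hashlib

# Hypothetical in-memory stand-ins (not a real registry or server).
registry = {}  # identifier -> URL, kept by the registry maintainer
server = {}    # URL -> bytes, kept by the registrant


def register(identifier: str, url: str) -> None:
    """Record the identifier-to-URL mapping in the lookup table."""
    registry[identifier] = url


def resolve(identifier: str) -> bytes:
    """Return whatever the registrant currently serves at the registered URL."""
    url = registry[identifier]
    return server[url]  # KeyError if the registrant no longer serves this URL


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest, used here only to detect that content changed."""
    return hashlib.sha256(content).hexdigest()


url = "https://publisher.example/article/1"  # made-up URL
doi = "10.1234/example"                      # made-up identifier
server[url] = b"original dataset"
register(doi, url)
before = fingerprint(resolve(doi))

# The registrant edits the file in place; the identifier stays the same.
server[url] = b"silently revised dataset"
after = fingerprint(resolve(doi))
content_changed = before != after

# The registrant retires the URL; the registry entry still exists but is dead.
del server[url]
try:
    resolve(doi)
    link_broken = False
except KeyError:
    link_broken = True

print(content_changed, link_broken)
```

Nothing in the registry entry records which version of the content was registered, so neither the silent edit nor the dead link is detectable from the identifier alone. That is why such links have to be crawled to be checked.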

Now

Fast forward twenty years: the DOI is used almost universally and is the de facto primary key of the scientific record.

In the meantime, the industry's content monetisation strategy and the needs of primary research content consumers have shifted radically. We live in the era of open access, and the OSTP Nelson memo and Plan S have called into question the business imperatives of gating content access.

Simultaneously, there is increasing demand for access to interactive research files such as models, datasets and notebooks. Because this demand often comes from analysts interested in reusing the information, access formats that make reuse easy are popular, for example convenient web apps or API calls integrated into the researcher's computational workflow.

This has an effect on the incentives of publishers: Why pay for proprietary storage infrastructure when access cannot be monetised? Why incur high maintenance costs on data cataloguing?

Transitioning to digital content has been prohibitively expensive for small publishers, largely because of the labour costs associated with the bespoke IT infrastructure needed to organise content and gate access to it.

New requirements on the horizon - such as FAIR data storage, high-quality metadata, machine resolution of PIDs, and cloud computing - are set to drive IT costs sharply higher, which will aggravate the industry's ownership consolidation by favouring large players over small and medium operations. These costs will inevitably be passed on to researchers and funders through rising article processing charges (APCs).

A new PID system

With these large systemic changes in mind, there is a window of opportunity to fix the primary key of the scientific record and rethink architectural requirements from first principles. DeSci Lab is building a PID system that lies at the heart of our vision for a truly Open and Decentralized repository for knowledge, and that aims to solve four main challenges.

You can learn more about our PID system and the Open State Repository here.

An upgrade - not a new standard

We can preserve the DOI system as the topmost overlay on this new PID system, essentially augmenting the DOI system without changing it fundamentally. Compatibility is important because we want to prevent the proliferation of standards, preserve familiarity, and lower adoption costs.
