Phenopackets - Concepts and Technology

Phenopackets Are …

A Vision

An open standard for sharing disease and phenotype information will improve our ability to understand, diagnose, and treat both rare and common diseases. A Phenopacket links detailed phenotype descriptions with disease, patient, and genetic information, enabling clinicians, biologists, and disease and drug researchers to build more complete models of disease. The standard is designed to encourage wide adoption and synergy between the people, organizations and systems that comprise the joint effort to address human disease and biological understanding.

A Technology

Phenopackets are represented as PXF (Phenotype Exchange Format) files, which may be encoded in JSON or YAML. Each packet associates a list of phenotypic abnormalities with a disease and a patient, including details about age, sex, onset, and evidence. PXF uses standard ontologies to ensure interoperability between diverse sources and consumers, to simplify text mining, and to enable machine reasoning. Software libraries supporting PXF have been written for Java, Python, and JavaScript, and the open standard makes it easy to adapt to other languages, systems and applications.
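To make the shape of such a packet concrete, here is a minimal sketch in Python that builds one as a plain dictionary and serializes it to JSON. The field names (`patient`, `disease`, `phenotypes`, `onset`, `evidence`), the helper functions, and the evidence text are assumptions chosen for illustration and do not reproduce the official PXF schema. The ontology identifiers are real HPO and OMIM terms.

```python
import json

# Illustrative packet: one patient, one disease, a list of phenotypes.
# NOTE: all key names below are hypothetical, not the official PXF schema.
packet = {
    "id": "example-packet-1",
    "patient": {"id": "patient-1", "sex": "female", "age": "P4Y"},
    "disease": {"id": "OMIM:154700", "label": "Marfan syndrome"},
    "phenotypes": [
        {
            "term": {"id": "HP:0001166", "label": "Arachnodactyly"},
            "onset": {"id": "HP:0003577", "label": "Congenital onset"},
            "evidence": "clinical examination (placeholder text)",
        },
        {
            "term": {"id": "HP:0001083", "label": "Ectopia lentis"},
            "onset": None,  # onset not recorded for this finding
            "evidence": "clinical examination (placeholder text)",
        },
    ],
}


def to_json(pkt: dict) -> str:
    """Serialize a packet to indented JSON with stable key order."""
    return json.dumps(pkt, indent=2, sort_keys=True)


def phenotype_ids(pkt: dict) -> list:
    """Return the ontology identifiers of all phenotypes, in order."""
    return [p["term"]["id"] for p in pkt["phenotypes"]]


if __name__ == "__main__":
    print(to_json(packet))
    print(phenotype_ids(packet))
```

Because every phenotype and disease is referenced by an ontology identifier rather than free text, a consumer can compare or search packets without parsing labels.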

A Community

Phenopackets is an evolving standard jointly developed by researchers, clinicians, curators and authors. Journals, model organism databases, medical data repositories and commercial efforts are encouraged to adopt Phenopackets as a way to improve their use and publication of detailed and computable characterizations of disease. This will enable new modes of treatment and drug discovery, including translational research, precision medicine, and automated pipelines revealing knowledge in existing publications and databases.

The Vision of Phenopackets

Using Phenopackets to communicate bioinformation ensures that knowledge is liberated and usable by existing and nascent computational pipelines, databases and journals. This enables new possibilities for research, diagnosis and treatment. Some of the features of Phenopackets are listed below.

Searchable and Comparable
Accessible outside paywalls and private data sources
Attributable and Citable
Interoperable and Computable
Exchangeable across contexts and disciplines


The Technology of Phenopackets

Phenopackets are defined using a protobuf schema that allows implementations to be automatically generated for many languages. The phenopacket-schema build process automatically produces language bindings for Java, Python and C++.
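As a rough illustration of what schema-derived bindings give a developer (typed message classes plus serialization), the sketch below uses Python dataclasses as a stand-in. It is not generated by the protobuf compiler and does not use the phenopacket-schema definitions. The class names `OntologyTerm` and `Packet` and all field names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class OntologyTerm:
    """Hypothetical stand-in for a generated ontology-term message."""
    id: str
    label: str


@dataclass
class Packet:
    """Hypothetical stand-in for a generated packet message."""
    subject_id: str
    disease: OntologyTerm
    phenotypes: List[OntologyTerm] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts nested dataclasses to dicts.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Packet":
        # Rebuild the typed objects from their plain-dict form.
        raw = json.loads(text)
        return cls(
            subject_id=raw["subject_id"],
            disease=OntologyTerm(**raw["disease"]),
            phenotypes=[OntologyTerm(**p) for p in raw["phenotypes"]],
        )


if __name__ == "__main__":
    original = Packet(
        subject_id="patient-1",
        disease=OntologyTerm("OMIM:154700", "Marfan syndrome"),
        phenotypes=[OntologyTerm("HP:0001166", "Arachnodactyly")],
    )
    restored = Packet.from_json(original.to_json())
    print(restored == original)
```

Real generated bindings add what this sketch lacks, such as a compact binary encoding and identical message definitions across languages.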

Source code and examples of Phenopackets technology may be found in the following GitHub repositories:

Phenopackets organization

Phenopacket schema

Phenopacket schema documentation

The Community of Phenopackets

Phenopackets were designed by, and are intended for use by, the diverse community of researchers, data modelers, computer scientists, bioinformaticians, environmental scientists, and clinicians dedicated to maximizing the value of existing and new data.


Documentation and References

Phenopackets organization

Phenopacket schema

Phenopacket schema documentation

Processing phenotype data using Phenopackets-API and PXFTools (BOSC 2016; slides and video)
Phenopackets: Making phenotype profiles FAIR++ for disease diagnosis and discovery (Force2016; slides)
