An open standard for sharing disease and phenotype information will improve our ability to understand, diagnose, and treat both rare and common diseases. A Phenopacket links detailed phenotype descriptions with disease, patient, and genetic information, enabling clinicians, biologists, and disease and drug researchers to build more complete models of disease. The standard is designed to encourage wide adoption and synergy between the people, organizations and systems that comprise the joint effort to address human disease and biological understanding.
Phenopackets are represented as PXF (Phenotype Exchange Format) files, which may be encoded in JSON or YAML. Each packet associates a list of phenotypic abnormalities with a disease and a patient, including details about age, sex, onset, and evidence. PXF uses standard ontologies to ensure interoperability between diverse sources and consumers, to simplify text mining, and to enable machine reasoning. Software libraries supporting PXF have been written for Java, Python, and JavaScript, and the open standard makes it easy to adapt to other languages, systems, and applications.
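As a rough illustration of the idea, the sketch below builds a packet-like structure in Python and serializes it to JSON. It is not the official PXF schema: the field names (patient, disease, phenotypes, onset, evidence), the age encoding, and the helper functions are all assumptions chosen for readability, and the ontology identifiers are included only as examples of the kind of terms a packet would reference.

```python
import json

# Hypothetical packet layout. Every field name here is an illustrative
# assumption, not the official PXF schema.
packet = {
    "id": "example-packet-1",
    "patient": {
        "id": "patient-1",
        "sex": "female",
        "age": "P34Y",  # assumed encoding: ISO 8601 duration
    },
    "disease": {"id": "OMIM:154700", "label": "Marfan syndrome"},
    "phenotypes": [
        {
            "type": {"id": "HP:0001166", "label": "Arachnodactyly"},
            "onset": "childhood",
            "evidence": "clinical examination",
        },
        {
            "type": {"id": "HP:0001083", "label": "Ectopia lentis"},
            "onset": "adult",
            "evidence": "clinical examination",
        },
    ],
}


def to_json(pkt):
    """Serialize a packet dictionary to a JSON string."""
    return json.dumps(pkt, indent=2, sort_keys=True)


def phenotype_ids(pkt):
    """Return the ontology term IDs of all phenotypes, in packet order."""
    return [entry["type"]["id"] for entry in pkt["phenotypes"]]


if __name__ == "__main__":
    print(to_json(packet))
    print(phenotype_ids(packet))
```

Because each phenotype is identified by an ontology term ID rather than free text, a consumer can compare or aggregate packets from different sources without parsing prose descriptions.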
Phenopackets is an evolving standard jointly developed by researchers, clinicians, curators and authors. Journals, model organism databases, medical data repositories and commercial efforts are encouraged to adopt Phenopackets as a way to improve their use and publication of detailed and computable characterizations of disease. This will enable new modes of treatment and drug discovery, including translational research, precision medicine, and automated pipelines revealing knowledge in existing publications and databases.
Using Phenopackets to communicate bioinformation makes that knowledge accessible to, and usable by, existing and emerging computational pipelines, databases, and journals. This opens new possibilities for research, diagnosis, and treatment. Some of the features of Phenopackets are listed below.
Phenopackets are defined using a protobuf schema that allows implementations to be automatically generated for many languages. The phenopacket-schema build process automatically produces language bindings for Java, Python and C++.
Source code and examples of Phenopackets technology may be found in the following GitHub repositories:
Phenopackets were designed by, and are intended for use by, the diverse community of researchers, data modelers, computer scientists, bioinformaticians, environmental scientists, and clinicians dedicated to maximizing the value of existing and new data.