RESOURCES
Training material
Coming soon…
Presentations
Coming soon…
Minutes Consortium meetings
Minutes Executive Board meetings
Frequently Asked Questions
General
On our website (to be developed), we will have a dedicated space for the Open Community Forum, where anyone can make suggestions or requests.
The Data Portal services will span all pathogens determined to be of interest to NIH-NIAID, gradually also capturing host, intermediate host and vector species of relevance to the listed pathogen species and groups.
The PDN comprises a large consortium with wide expertise. However, we welcome suggestions and expertise from others, both through the Open Community Forum and through the Community of Practice, where we will reach out to domain experts. PDN will also liaise with other initiatives relevant to this topic (e.g. PHA4GE, GMI, ELIXIR…).
Training and Events
We will have a dedicated website that provides regular updates on BRC releases and events. We will also make use of social media (e.g. LinkedIn, X). The Open Community Forum will be accessible from the website.
All the training material will be freely accessible and comply with FAIR principles.
FAIRness of data and resources
The Pathogens Portal hosted at EMBL-EBI will let investigators access and search the data. Users may run BLAST searches against the data through the ENA portal.
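As an illustration of programmatic search, the sketch below builds a query against the ENA Portal API search endpoint and parses the tab-separated response. It is a minimal example under stated assumptions, not a description of the Pathogens Portal's own interface: the function names are hypothetical, the choice of the `read_run` result type and the listed fields is ours, and the taxon ID used in the usage note (2697049, SARS-CoV-2) is only an example.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# ENA Portal API search endpoint (assumed here as one route to the data).
ENA_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_search_url(taxon_id,
                     fields=("run_accession", "scientific_name", "fastq_ftp"),
                     limit=10):
    """Return a search URL for sequencing runs under a taxon and its descendants."""
    params = {
        "result": "read_run",                # result type: raw read runs
        "query": f"tax_tree({taxon_id})",    # taxon plus all child taxa
        "fields": ",".join(fields),          # columns to return
        "format": "tsv",
        "limit": str(limit),
    }
    return f"{ENA_SEARCH}?{urlencode(params)}"


def parse_tsv(text):
    """Parse a tab-separated response into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def search_runs(taxon_id, **kwargs):
    """Fetch and parse matching run records (performs a network request)."""
    with urlopen(build_search_url(taxon_id, **kwargs), timeout=30) as resp:
        return parse_tsv(resp.read().decode("utf-8"))
```

Calling `search_runs(2697049, limit=5)` would return up to five records as dictionaries; `build_search_url` and `parse_tsv` can be used on their own when the request is made with another HTTP client.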
Yes, the analysis pipelines will be made publicly available in code repositories; in this case, users will need to set up and operate the appropriate workflow management environments for their work.
PDN will cover diverse biodata types, including host and pathogen genomics, transcriptomics, proteins, pathways and networks, imaging and cohorts.
The data should be submitted to INSDC. The Pathogens Portal (https://www.pathogensportal.org/), a resource hosted and developed by EMBL-EBI that enables researchers, clinicians and policymakers to access the latest and most comprehensive datasets on pathogens, offers the possibility to request a private data hub to host non-sensitive data and share them privately within a group of collaborators for a limited time (up to 2 years in the first instance, but extensible). After this time, data are expected to be released fully openly in accordance with best open data practice.
Data will be routed into PDN through data submission tools and services appropriate for the data type concerned. This routing will ensure that data type-appropriate validation, standards compliance, curation and integration are applied to data available through PDN services. The Portal will not host data directly; rather, it will provide a fully synchronous view of the underlying data resources whose data it serves. The core data resource, ENA, is refreshed on a daily basis: as soon as data have been through submission services, they are released into ENA and indexed daily, and hence become available for indexing in the Portal.
The current bulk download system is described here. It will evolve throughout the project in response to user requirements. Several pipelines will be integrated into the Pathogen Analysis System (PAS). For the data types and species covered by the PAS, pre-computed datasets will also be made available; for others, primary (but validated) data will be presented.
Bulk downloads will be supported. Scaling of downloads and analytics to many hundreds of thousands of datasets, or indeed entire collections, will follow the approach implemented on the COVID-19 Data Portal during the pandemic.
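To give a sense of what a scripted bulk download might look like on the user side, here is a minimal sketch. It does not describe the PDN bulk download system itself. It assumes file locations are supplied as semicolon-separated paths without a URL scheme (the form in which ENA search results commonly report the `fastq_ftp` field) and that prefixing `ftp://` yields a fetchable URL; the function names, the `downloads` directory and the worker count are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.request import urlretrieve


def field_to_urls(file_field, scheme="ftp"):
    """Split a semicolon-separated path list into full URLs (scheme is an assumption)."""
    return [f"{scheme}://{path}" for path in file_field.split(";") if path]


def download_one(url, out_dir):
    """Download one file, skipping it if a complete copy is already present."""
    target = Path(out_dir) / url.rsplit("/", 1)[-1]
    if target.exists():
        return target
    partial = target.with_name(target.name + ".part")
    urlretrieve(url, str(partial))   # write to a temporary name first
    partial.replace(target)          # rename only once the transfer finished
    return target


def download_all(urls, out_dir="downloads", workers=4):
    """Fetch many files with a small, bounded pool of concurrent transfers."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: download_one(u, out_dir), urls))
```

Keeping the pool small and skipping files that already exist lets an interrupted run be restarted without refetching everything, which matters more as the number of datasets grows.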
All the resources developed within PDN will be free and accessible to investigators (via web interfaces or by providing open source code under permissive licences on open repositories). All data will also be available for download, subject to data protection restrictions.
No. The appropriate routing for this information is via the owners of the genome records, who can consider incoming experimental data for inclusion into their annotations; we therefore recommend in these cases that experimentalists work with data owners to drive these annotation updates.
We will provide and support the submission tools and services that ENA provides. There will not be PDN-specific tools, but the development cycles for the existing tools will be informed by what we hear and learn from their use by the PDN data provider community. In addition, PDN includes a Community of Practice in which knowledge sharing on data brokering will be a central element, providing further streamlined options for those submitting their data to PDN.
Analyses
The Pathogens Portal hosted at EMBL-EBI will let investigators access and search the data. Users may run BLAST searches against the data through the ENA portal.
The PDN Community of Practice and Open Community Forum will be starting places to get help. The Pathogen Analysis System will also pre-compute datasets based on standard pipelines.
The current bulk download system is described here. It will evolve throughout the project in response to user requirements. Several pipelines will be integrated into the Pathogen Analysis System (PAS). For the data types and species covered by the PAS, pre-computed datasets will also be made available; for others, primary (but validated) data will be presented.
Bulk downloads will be supported. Scaling of downloads and analytics to many hundreds of thousands of datasets, or indeed entire collections, will follow the approach implemented on the COVID-19 Data Portal during the pandemic.
The tools integrated into the Pathogen Analysis System will run automatically for the matching data types and species. A standardised workflow template based on Nextflow DSL2 will also be available for users of local data hubs to run these workflows on their own data. We are considering providing a workflow repository where analysis pipelines can be deposited, shared and developed collaboratively, but we do not anticipate command-line or click-and-drag functions operating directly on arbitrary (user-selected) data sets in the Portal.