RESOURCES

Pathogens Portal

The Pathogens Portal, launched in July 2023, is an invaluable resource for researchers, clinicians, and policymakers who need access to the latest and most comprehensive datasets on pathogens. The portal is a collaborative effort between the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) and partners. The Pathogens Portal is based on the European COVID-19 Data Portal. A manuscript describing the European COVID-19 Data Portal was published in 2021: https://europepmc.org/article/MED/34048576 .

Pathogens Portal Nodes

Pathogens Portal Nodes are locally deployed resources that:

  • Provide a one-stop portal for local pathogen data and projects.
  • Foster collaborations and synergies by creating a forum for discussion.
  • Provide documentation and training material on FAIR research data management, especially for pathogen omics data.
  • Showcase local resources that contribute to pathogen data sharing, access, analysis, and interpretation.
  • Also offer content in the local language(s).

Pathogen Data Platform Capacity Framework

Starting from work done within the ELIXIR-CONVERGE and BY-COVID projects, we will develop a Capacity Framework for Pathogen Data Platforms.

Training material

Coming soon…

Presentations

Coming soon…

Minutes Consortium meetings

Minutes Executive Board meetings

Minutes – 22 Nov 2024 – PDN All-Hands Meeting (2)

Minutes – 20 Sept 2024 – PDN All-Hands Meeting (kick-off)

Forty-two PDN members, representing all 12 institutions, were present at the kick-off meeting on 20th September 2024.

Minutes – 20 December 2024 – PDN Executive Board meeting

The PDN Executive Board gathered to discuss and make decisions on several strategic priorities, including the renaming of portals, highlighting impactful activities, refining the data-sharing model, and reviewing budget allocations. Below is a summary of the key...

Minutes – 25 October 2024 – PDN Executive Board meeting

The PDN Executive Board meeting took place virtually on 25th October with 12 PIs/Key Personnel, covering updates, decisions, and planning initiatives across several areas: Restructuring into Workstreams: As presented and agreed during the PDN All Hands kick-off...

Frequently Asked Questions

General

On our website (to be developed), we will have a dedicated space for the Open Community Forum, where anyone can make suggestions or requests.

The Data Portal services will span all pathogens determined to be of interest to NIH-NIAID, gradually also capturing host, intermediate host and vector species of relevance to the listed pathogen species and groups.

The PDN comprises a large consortium with wide expertise. However, we welcome suggestions and expertise from others, both through the Open Community Forum and through the Community of Practice, where we will reach out to domain experts. PDN will also liaise with other initiatives relevant to this topic (e.g. PHA4GE, GMI, ELIXIR…).

Training and Events

We will have a dedicated website with regular updates on BRC releases and events. We will also make use of social media (e.g. LinkedIn, X). The Open Community Forum will be accessible from the website.

All training material will be freely accessible and will comply with the FAIR principles.

FAIRness of data and resources

The Pathogens Portal hosted at EMBL-EBI will let investigators access and search the data. Users may run BLAST searches against the data through the ENA portal.
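As an illustration of programmatic searching, the sketch below builds a query URL for the ENA Portal API, the search service behind the ENA portal. It covers metadata search only, not BLAST. The chosen result type, field list, and example taxon are assumptions made for illustration and do not describe an agreed PDN interface. The code only constructs the URL; it sends no request.

```python
from urllib.parse import urlencode

# Search endpoint of the ENA Portal API.
ENA_SEARCH_URL = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_ena_search_url(taxon_id: int, limit: int = 10) -> str:
    """Build a search URL for sequencing runs under a taxon and its descendants."""
    params = {
        "result": "read_run",                # search sequencing runs
        "query": f"tax_tree({taxon_id})",    # the taxon plus all descendant taxa
        "fields": "run_accession,scientific_name,fastq_ftp",  # illustrative choice
        "format": "json",
        "limit": limit,
    }
    return f"{ENA_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # 2697049 is the NCBI Taxonomy ID of SARS-CoV-2, used here only as an example.
    print(build_ena_search_url(2697049, limit=5))
```

The resulting URL can be opened in a browser or passed to any HTTP client.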

Yes, the analysis pipelines will be made publicly available in code repositories; users who run them will need to set up and operate the appropriate workflow management environments themselves.

PDN will cover diverse biodata types, including host and pathogen genomics, transcriptomics, proteins, pathways and networks, imaging and cohorts.

The data should be submitted to INSDC. The Pathogens Portal offers the possibility to request a private data hub to host non-sensitive data, and share them privately within a group of collaborators for a limited time (up to 2 years in the first instance, but extensible). After this time, data are expected to be released fully openly in accordance with best open data practice.

Data will be routed into PDN through data submission tools and services appropriate for the data type concerned. This routing will ensure that data type-appropriate validation, standards compliance, curation and integration are applied to data available through PDN services. The Portal will not host data directly; rather, it will provide a fully synchronous view of the underlying data resources whose data it serves. The core data resource, ENA, is refreshed daily: once data have passed through the submission services, they are released into ENA and indexed daily, and hence become available for indexing in the Portal.

The current bulk download system is described here. It will evolve throughout the project in response to user requirements. Several pipelines will be integrated into the Pathogen Analysis System (PAS). For the data types and species covered by the PAS, pre-computed datasets will also be made available; for others, primary (but validated) data will be presented.

Bulk downloads will be supported. Scaling of downloads and analytics to many hundreds of thousands of datasets, or indeed entire collections, will follow what was implemented on the COVID-19 Data Portal during the pandemic.
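For readers who script their own bulk downloads, the following minimal sketch shows one way to prepare a download list. It assumes file locations arrive in the form ENA uses for its `fastq_ftp` field (semicolon-separated paths without a protocol prefix) and splits the resulting URLs into batches. The host names, accessions, and batch size are placeholders, and this is not the Portal's own bulk download mechanism.

```python
from typing import Iterator


def expand_ftp_field(fastq_ftp: str, scheme: str = "ftp") -> list[str]:
    """Split a semicolon-separated file list and prefix each path with a scheme."""
    return [f"{scheme}://{path}" for path in fastq_ftp.split(";") if path]


def batched(urls: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


if __name__ == "__main__":
    # Placeholder records shaped like rows of a metadata search result.
    records = [
        {"run_accession": "RUN0001",
         "fastq_ftp": "host.example/RUN0001_1.fastq.gz;host.example/RUN0001_2.fastq.gz"},
        {"run_accession": "RUN0002",
         "fastq_ftp": "host.example/RUN0002.fastq.gz"},
    ]
    all_urls = [u for r in records for u in expand_ftp_field(r["fastq_ftp"])]
    for number, batch in enumerate(batched(all_urls, size=2), start=1):
        print(f"batch {number}: {batch}")
```

Working in batches keeps each download job small enough to retry on its own if a transfer fails.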

All the resources developed within PDN will be free and accessible to investigators (via web interfaces or by providing open source code under permissive licences on open repositories). All data will also be available for download, subject to data protection restrictions.

No. The appropriate routing for this information is via the owners of the genome records, who can consider incoming experimental data for inclusion into their annotations; we therefore recommend in these cases that experimentalists work with data owners to drive these annotation updates.

We will provide and support the submission tools and services that ENA provides. There will not be PDN-specific tools, but the development cycles for the existing tools will be informed by what we hear and learn from their use by the PDN data provider community. In addition, PDN includes a Community of Practice where knowledge sharing on data brokering will be a central element, providing further streamlined options for those providing their data into PDN.

Analyses

The Pathogens Portal hosted at EMBL-EBI will let investigators access and search the data. Users may run BLAST searches against the data through the ENA portal.

The PDN Community of Practice and Open Community Forum will be starting places to get help. The Pathogen Analysis System will also pre-compute datasets based on standard pipelines.

The current bulk download system is described here. It will evolve throughout the project in response to user requirements. Several pipelines will be integrated into the Pathogen Analysis System (PAS). For the data types and species covered by the PAS, pre-computed datasets will also be made available; for others, primary (but validated) data will be presented.

Bulk downloads will be supported. Scaling of downloads and analytics to many hundreds of thousands of datasets, or indeed entire collections, will follow what was implemented on the COVID-19 Data Portal during the pandemic.

The tools integrated into the Pathogen Analysis System will run automatically for the matching data types and species. A standardised workflow template based on Nextflow DSL2 will also be available for users of local data hubs to run these workflows on their own data. We are considering providing a workflow repository where analysis pipelines can be deposited, shared and developed collaboratively, but we do not anticipate command-line or click-and-drag functions operating directly on arbitrary (user-selected) datasets in the Portal.
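To give a sense of what running such a workflow locally might involve, here is a minimal sketch that assembles a standard `nextflow run` command line. The pipeline name and parameters file are hypothetical placeholders, since the PDN workflow template has not been published; only the generic Nextflow options `-profile`, `-params-file` and `-resume` are used.

```python
import shlex


def build_nextflow_command(
    pipeline: str,
    params_file: str,
    profile: str = "docker",
    resume: bool = True,
) -> list[str]:
    """Assemble the argument list for a `nextflow run` invocation."""
    cmd = [
        "nextflow", "run", pipeline,
        "-profile", profile,          # configuration profile defined by the pipeline
        "-params-file", params_file,  # JSON or YAML file with pipeline parameters
    ]
    if resume:
        cmd.append("-resume")         # reuse cached results from earlier runs
    return cmd


if __name__ == "__main__":
    # "example-org/pathogen-template" is a made-up pipeline name for illustration.
    cmd = build_nextflow_command("example-org/pathogen-template", "params.json")
    print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run` on a machine where Nextflow and the chosen container engine are installed.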