The vision of PSDI

 

The data needs of research are growing at previously unimaginable rates, and the need for collaboration around data has never been clearer. Data cannot be considered simply an output of research, as it is itself a driver of further discovery. Experiments, observations, computations and simulations all generate data, and data flows form the very fabric of research in and across the physical sciences. But, for the most part, each physical science research infrastructure, from laboratory to large facility, has essentially its own data infrastructure.

Whilst centralised, data-centric infrastructures for collecting and reusing data can act as community hubs and drive new methods and discoveries, the current diversity of data infrastructures enables each platform to be tailored to the specific needs of its field. PSDI therefore aims to provide an additional layer of infrastructure that enables sharing of existing resources whilst ensuring that each can remain dedicated to its specific application.

 

PSDI will form a socio-technical data infrastructure that connects many of these systems across existing experimental and computational facilities. It aims to:

  • Support multiscale modelling and multimodal research
  • Leverage simulation data to drive experimental science and vice versa
  • Surface data from many sources
  • Provide reference-quality data
  • Standardise, normalise and aggregate data and metadata
  • Enable data to be exploited by AI methods
  • Support workflows that automate data processing
  • Provide a common platform to run models and codes from different sources
  • Seamlessly access performance compute for scaling up
  • Enable software curation and publication
  • Be a place for curation of legacy beyond individual projects

Statement of Need

The PSDI project was initially discussed as part of the EPSRC Large Infrastructure Investments Statement of Need (SoN) call, which ran from late 2020 to early 2021. During this SoN exercise a project team from STFC and the University of Southampton developed the outline plan for the PSDI. This included commentary on the ambition of the project and the strategic importance of investment in infrastructure for the physical sciences. The SoN exercise was well supported across the physical sciences community: contributions and backing from a wide range of projects and initiatives demonstrated community need and support for such an initiative.

The full text for our large infrastructure SoN can be found linked below:

The Statement of Need confirmed a widespread consensus in the community that investment in research data infrastructure is lagging behind investment in data sources, and identified an urgent need for integration of data and computational infrastructures. It also identified four ‘pillars’ of user communities that would benefit from the proposed PSDI.

Pillar 1. Facilities, Institutes and Hubs – significant centralised national facilities and activities that serve a large number of researchers based on a common need.

Pillar 2. National Research Facilities – medium-scale centralised facilities operating at a world-leading level to perform research that cannot be addressed in a standard laboratory.

Pillar 3. Computational Initiatives – uniting those performing simulations with the communities and tools required to do so.

Pillar 4. Research Institutions, research groups and laboratories – community of institutions

Figure: Community pillars identified in the SoN, with non-exhaustive examples of initiatives within each pillar.

Pilot Phase

Following the SoN application, EPSRC asked the PSDI team to complete a proposal for a short pilot project. This pilot project was funded through the UKRI Digital Research Infrastructure (DRI) programme. The PSDI as a whole is a large undertaking, involving a wide range of stakeholders both within our proposed pillars and in the wider community, including data managers, data providers, system architects and many more roles that are crucial to underpinning our national data landscape. The pilot was intended to expand on the ambitions set out in the SoN, and it undertook wide-ranging community consultation on the scope and requirements of the PSDI. It also developed a roadmap for future investment in PSDI.

More information about the structure of the pilot project can be found on the pilot page and the individual work package pages. The pilot activities ran as a short, intense phase, and this portion of the work was completed on 31 March 2022. Both the Pilot report and the Recommendations from the pilot phase can be found on the website.

Phase 1

Phase 1 of PSDI ran from October 2022 to September 2023 and built upon the scoping work carried out in the Pilot Phase and the recommendations and outputs generated through that work. This 12-month phase focused on activities towards the establishment of PSDI whilst continuing community engagement. The phase was split into five work packages: Management and Governance, Communications and Training, Platform Development and Operation, First Pathfinders, and Future Pathfinders.

Phase 2

Phase 2 of PSDI, launched in January 2024, focuses on finalising the initial 30-month development to deliver PSDI version 1 by March 2025. Following the launch, PSDI will operate in a dual mode, maintaining version 1 services while developing additional capabilities. The work is structured across seven work packages covering Gateway development, secure infrastructure management, data services, content integration, governance, project management, and community building.

Phase 2 also introduces eight pathfinder projects in areas such as experimental data capture, computational workflows, and advanced materials processing. These projects aim to create exemplar systems for future PSDI expansions, supporting diverse data types, techniques, and user communities in the physical sciences.
