
HDF5 Developments

Introduction

NeXus is developed as an international standard by scientists and programmers representing major scientific facilities in Europe, Asia, Australia, and North America, in order to facilitate greater cooperation in the analysis and visualization of neutron, x-ray, and muon data. Several of our organisations have committed to the NeXus file format, and others like the idea but have reservations because of limitations in HDF5, the file format that underpins most NeXus files. Following a successful collaboration between Dectris, PSI, DESY and The HDF Group last year, we are aiming to lead further development over the coming year, with the goal of adapting the HDF5 library to overcome some of these shortcomings, and thereby promote the use of NeXus on top of HDF5 across all domains.

Last year's developments focussed on compression performance. The Dectris/PSI-funded development allows detector developers to compress data outside the library and feed the pre-compressed chunks directly in to be written. The DESY-funded development allows users to provide their own pluggable filters for specialist compression methods.
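
The direct chunk write path was released in the HDF5 high-level library as H5DOwrite_chunk. The short sketch below shows the idea; the one-chunk-per-frame layout and the assumption that the buffer was compressed by the detector pipeline are illustrative, not part of the API.

    /* Minimal sketch: writing an externally compressed chunk with
     * H5DOwrite_chunk. The chunk layout (one frame per chunk) is an
     * illustrative assumption; only the H5DO call is the released API. */
    #include "hdf5.h"
    #include "hdf5_hl.h"

    herr_t write_compressed_frame(hid_t dset, hsize_t frame,
                                  const void *buf, size_t nbytes)
    {
        hsize_t offset[3] = {frame, 0, 0}; /* chunk origin in dataset coords */
        uint32_t filter_mask = 0;          /* 0 => all pipeline filters applied */

        /* Bypasses the filter pipeline entirely: the library writes
         * 'nbytes' of already-compressed data straight to disk. */
        return H5DOwrite_chunk(dset, H5P_DEFAULT, filter_mask,
                               offset, nbytes, buf);
    }

Because the library never touches the bytes, the compression cost moves out of the HDF5 write path and can run in parallel on the detector side.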

Developments

This year we are proposing two developments: Single Writer Multiple Reader and Parallel Compressed Writing.

Single Writer Multiple Reader

Single Writer Multiple Reader (abbreviated to SWMR and pronounced "swimmer") allows data analysis programs to read an HDF5 file while it is still being written. As detectors get faster, the traditional approach of writing one data file per frame leads to thousands or millions of files, often in a single directory. To mitigate this, data is written as data cubes, with the third axis being a frame number or time axis. However, in the current version of HDF5, files cannot be read while they are still being written, so data analysis and display must be postponed until a whole series of frames has been acquired. This is unsatisfactory and is being addressed by the SWMR development. A side effect of this development, which in some use cases may be more important, is that SWMR is implemented in a way that ensures the file on disk is always consistent. In existing versions of HDF5, file consistency is only guaranteed when the file is closed, so the file can be corrupted if the writing process does not terminate cleanly.
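
For a flavour of the intended usage, here is a minimal sketch of the writer and reader sides using the API from the SWMR design documents (it later shipped as part of HDF5 1.10); the file and dataset names are illustrative.

    /* SWMR usage sketch; file and dataset names are illustrative. */
    #include "hdf5.h"

    void writer_side(void)
    {
        /* SWMR requires the latest file-format features. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
        hid_t file = H5Fcreate("scan.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* ... create all extendible datasets here, before enabling SWMR ... */

        /* From this point readers may open the file while we append. */
        H5Fstart_swmr_write(file);

        /* ... per frame: extend the dataset, write, then H5Dflush() ... */

        H5Fclose(file);
        H5Pclose(fapl);
    }

    void reader_side(void)
    {
        /* Open for concurrent reading while the writer is active. */
        hid_t file = H5Fopen("scan.h5",
                             H5F_ACC_RDONLY | H5F_ACC_SWMR_READ,
                             H5P_DEFAULT);
        hid_t dset = H5Dopen2(file, "/entry/data", H5P_DEFAULT);

        H5Drefresh(dset);   /* pick up the writer's latest frames */

        H5Dclose(dset);
        H5Fclose(file);
    }

The key property is that the writer only ever updates file metadata in an order that leaves the file consistent, which is what gives the crash resilience described above.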

The current draft of a set of requirements and use cases describing this functionality is available here.

At the moment, prototypes of SWMR exist, and Diamond Light Source is funding The HDF Group to carry out a design study to determine the effort and cost required to bring the prototype to a working release.

Update 1 July 2013

The HDF Group has now completed the design study. You may download the documents and the source from ftp://ftp.hdfgroup.uiuc.edu/pub/outgoing/SWMR/.

Update 17 October 2013

The HDF Group has posted updated design documents; again, see ftp://ftp.hdfgroup.uiuc.edu/pub/outgoing/SWMR/.

 

HDF5 Virtual Dataset

This project was originally designed to overcome the problem that datasets written using parallel HDF5 could not be compressed, and so was called "Parallel Compressed Writing". In the course of fleshing out the requirements we found that parallel HDF5 had lower ultimate throughput than the same number of independent HDF5 processes, so we proposed that the writing processes write multiple files independently, which could then be read as a single dataset of an upper-level file via an extension of the HDF5 dataset definition. The original set of use cases and requirements is described in a draft RFC issued by Diamond to The HDF Group in April 2013:

After consideration by The HDF Group, this has evolved into a proposal called "Virtual Dataset", but this document probably contains the clearest statement of requirements available at the moment. The concept of the Virtual Dataset came about when The HDF Group generalised the proposal to define the linkage as a series of mappings from the upper-level "Virtual Dataset" onto physical datasets in the lower-level files. This is a very flexible architecture: it has a number of use cases beyond simple parallel compressed writing, and it also allows higher ultimate throughput on modern parallel file systems. The current document defining the requirements and use cases is available:

Update 1 October 2013

The proposal has now moved to the feasibility stage and Diamond Light Source has commissioned a design study to generate a costed design for the proposal. At this stage we would encourage anyone who wants to add to the RFC to contact us as soon as possible. Additional funding for the implementation phase has been secured from DESY and the Percival detector project.

 

Update 26 May 2015

The VDS development project, contracted to The HDF Group, is now well under way. Further information, use cases and example usage of VDS can be found on The HDF Group's Confluence pages and website.
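
To make the mapping concept concrete, the sketch below stitches two independently written source files into one virtual dataset using the H5Pset_virtual call delivered by this project; the file names, dataset names and shapes are illustrative assumptions.

    /* VDS sketch: map two 100-frame source files onto one 200-frame
     * virtual dataset. Names and shapes are illustrative. */
    #include "hdf5.h"

    void build_virtual_dataset(void)
    {
        hsize_t vdims[3] = {200, 512, 512};   /* combined view      */
        hsize_t sdims[3] = {100, 512, 512};   /* each writer's file */
        hid_t vspace = H5Screate_simple(3, vdims, NULL);
        hid_t sspace = H5Screate_simple(3, sdims, NULL);
        hid_t dcpl   = H5Pcreate(H5P_DATASET_CREATE);

        /* Writer 0 supplies frames 0-99 of the virtual dataset... */
        hsize_t start[3] = {0, 0, 0};
        H5Sselect_hyperslab(vspace, H5S_SELECT_SET, start, NULL, sdims, NULL);
        H5Pset_virtual(dcpl, vspace, "writer-0.h5", "/data", sspace);

        /* ...and writer 1 supplies frames 100-199. */
        start[0] = 100;
        H5Sselect_hyperslab(vspace, H5S_SELECT_SET, start, NULL, sdims, NULL);
        H5Pset_virtual(dcpl, vspace, "writer-1.h5", "/data", sspace);

        hid_t file = H5Fcreate("master.h5", H5F_ACC_TRUNC,
                               H5P_DEFAULT, H5P_DEFAULT);
        hid_t vds  = H5Dcreate2(file, "/data", H5T_NATIVE_USHORT,
                                vspace, H5P_DEFAULT, dcpl, H5P_DEFAULT);

        /* Readers see /data in master.h5 as one contiguous data cube,
         * while each writer compressed and wrote its own file. */
        H5Dclose(vds); H5Fclose(file); H5Pclose(dcpl);
        H5Sclose(sspace); H5Sclose(vspace);
    }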

Support

Organisations can support these developments via a number of routes:

Support contracts

The first route is a support contract raised separately between each large facility and The HDF Group. This would guarantee support for bug fixes, small developments, testing and specific, small, synchrotron-based requirements (such as The HDF Group maintaining specific compression plug-ins, so that users are guaranteed access to the plug-ins after a given HDF5 release). It would also contribute to what The HDF Group calls General Maintenance, Quality Assurance and Support (GMQS): the cost of ongoing support of the library, which in the current HDF Group cost model is funded by 37.5% of every contract. By generating a separate support revenue stream we hope to reduce the project development costs. The annual cost is of the order of 10k monetary units (£/€/$, take your pick, and probably depending on the size of your organisation). We are discussing the details of this sort of support with The HDF Group; please contact them for further information.

Project development support

The second route is through supporting targeted project work. The cost of this will be of the order of (£/€/$)100k for a phase of development and will deliver functionality that benefits the community as a whole. One organisation would be responsible for funding each phase of development, but we could coordinate development across the community by agreeing requirements through an RFC process. At present, if you are interested in contributing to a phase of development of either SWMR or the Virtual Dataset (formerly Parallel Compressed Writing), please contact Nick Rees (Nick.Rees at diamond.ac.uk).

Requirements development support

Of course, we also need active parties to review the requirements documentation. These documents will be published on this website and any comments are welcome. However, if your organisation has particular needs that require significant resource to develop, you will also be expected to support the development financially.

 
