is likely to be unrepresentative of most research
collaborations. Although funding bodies such as the
EJP RD (European Joint Programme on Rare Diseases)
encourage collaborations over distances such as these,
local relationships may still predominate precisely
because of the logistical challenges involved in
transferring data, processing libraries and hardware
over such large distances (though the same challenges
no doubt exist within regions too, for instance between
individual European countries whose national
infrastructures are at varying stages of development).
Distance also matters because regional jurisdictions
differ in their data sharing laws. Whilst the security
of the data in transit has been addressed in this
proposal, there must also be legal agreement on the
usage and privacy laws that apply at each endpoint of
the network. Again, this is a technical proposal that
attempts to be general in scope, but this specific
consideration would always be relevant in a network
like the one proposed here. Equivalence with the
European GDPR legislation is generally considered the
gold standard in this regard, and Australia is among
the developed nations pursuing this equivalence
nationally (Review of the Privacy Act, 2020).
In terms of generalised re-usability, most, but not
all, aspects are covered in this proposal. It builds
upon the idea presented by GA4GH of a generalised ML
workflow, made accessible by specifying Docker
execution scripts in YAML. The other prominent feature
supporting sharing and repeatability is the
presentation of all internal data and metadata through
JSON-LD interfaces, making the data accessible
according to the FAIR principles. These
standardisations are untested, and even once fully
implemented their adoption may be limited. Only time
and re-use will show whether they succeed.
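As an illustration of the JSON-LD presentation described above, the following is a minimal sketch only, not the project's actual interface. It assumes the schema.org vocabulary and its `Dataset` type as the context; the function name `dataset_to_jsonld`, the identifier, and all field values are hypothetical placeholders.

```python
import json


def dataset_to_jsonld(identifier, name, description, licence_url, keywords):
    """Wrap basic dataset metadata in a JSON-LD document.

    Assumption: schema.org terms are used as the vocabulary; a real
    deployment might use a different or extended context.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": identifier,
        "name": name,
        "description": description,
        "license": licence_url,
        "keywords": list(keywords),
    }


# Hypothetical record: every value below is a placeholder.
record = dataset_to_jsonld(
    identifier="https://example.org/datasets/sample-001",
    name="Example metabolomics sample set",
    description="Placeholder description of a shared dataset.",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["metabolomics", "rare disease"],
)

# Serialise for exposure through an interface or file.
serialised = json.dumps(record, indent=2)
print(serialised)
```

Because the `@context` maps each key to a published term, a consumer that understands schema.org could interpret the record without project-specific documentation, which is the sense in which such interfaces support the Findable and Interoperable aspects of FAIR.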
It is also the case that concrete features such as
commercial CDN usage and processing hardware are not
easy to generalise. The most that can be provided is
the set of “gateway” schema descriptions that allow
these to be integrated into a project as easily as
possible. However, the apparently reasonable cost of
CDN usage does appear to represent a significant step
change in the mode of operation of high-volume data
research projects, one that appears to be generally
unreported. There may be unforeseen barriers or
consequences to the use of these commercial offerings
that will only become apparent as wider-scale usage
increases. Nevertheless, this option appears to serve
the specific needs of the Hypox-PD project well in the
short to medium term, and could perhaps be submitted
for consideration as a step towards a general
“Research CDN”.
7  CONCLUSIONS 
This paper has presented a novel mechanism for
exchanging data, algorithms and processing in a
multi-centre clinical/bioinformatics research project.
It has the potential to significantly improve the
feasibility of transferring data, analyses and results
between geographically dispersed partner nodes, but it
is potentially limited by budget (for the content
delivery network) and by the complexity of
orchestration and synchronisation. As the Hypox-PD
project progresses, the development of this
infrastructure will continue to be reported as a
research outcome, in addition to the clinical and
bioinformatics outputs of metabolite identification.
ACKNOWLEDGEMENTS 
The members of the Hypox-PD consortium acknowledge the
support obtained during the development of the
proposal for the European Joint Programme on Rare
Diseases (EJP RD) through the European Advanced
Translational Research Infrastructure in Medicine
(EATRIS).
REFERENCES 
Suetake H., Tanjo T., Ishii M., et al., (2022), Sapporo: A
workflow execution service that encourages the reuse
of workflows in various languages in bioinformatics,
F1000Research, 11:889
O'Connor B. D., Yuen D., Chung V., Duncan A. G., Liu X. K.,
Patricia J., Paten B., Stein L., Ferretti V., (2017), The
Dockstore: enabling modular, community-focused
sharing of Docker-based genomics tools and
workflows, F1000Research, 6:52
Yuen D., et al., (2021), The Dockstore: enhancing a
community platform for sharing reproducible and
accessible computational protocols, Nucleic Acids
Research, Volume 49, Issue W1, 2 July 2021, Pages
W624–W632
Rehm H. L., et al., (2021), GA4GH: International policies
and standards for data sharing across genomic
research and healthcare, Cell Genomics, Volume 1,
Issue 2, 100029, ISSN 2666-979X