SC14 BOF Community Input
The following page is from the SuperComputing 2014 Conference Birds of a Feather (BOF) session on "Genomics Sequencing".
The following question was posed, the audience was polled in real time, and the answers are below:

What do you think are the biggest computational or data challenges facing genomic sequencing? Tell us now via this webpage and/or in person at SC14 where we will have an open discussion with scientists and infrastructure specialists like you. In this BOF, three experts will answer the questions with the most votes below, followed by audience discussion. Our goal is to share best practices, and multiply our efforts towards longer-term challenges.

Question (total votes)
POSIX semantics become brittle and unwieldy for large-scale NGS. How do we get our scientists and pipeline developers to embrace object and API access? (15 votes)
6 answers
  • data virtualization
  • "fake" filesystems that abstract real filesystems below
  • GPFS
  • iRODS
  • Nirvana
  • iRODS
How do we encourage development of node-level parallelism in order to effectively utilize manycore technology? (12 votes)
1 answer
  • Foster closer interaction between bioinformaticians, who know their data and the science, and HPC-specific software developers, who know parallel operations and can write robust parallel code
How are data provenance, security, and latency solved in cloud-based genomics data workflows? (11 votes)
2 answers
  • p2p data distribution network
  • Nirvana
What are some strategies to improve performance for the vast number of tiny files created by the pipeline? (11 votes)
1 answer
  • re-engineer software to use a more sensible on-disk structure; e.g. a database
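
One way to picture the "database instead of tiny files" suggestion is the minimal Python sketch below. It is an illustration only, not something proposed at the BOF: the choice of SQLite, the table name `blobs`, and the helper names `pack_small_files` and `read_blob` are all assumptions made for the example. It stores each small file as one row in a single database file, so the filesystem sees one large file rather than many small ones.

```python
import sqlite3
import tempfile
from pathlib import Path


def pack_small_files(src_dir: Path, db_path: Path) -> int:
    """Store every regular file under src_dir as one row in a SQLite table.

    Hypothetical helper: returns the number of files packed.
    """
    conn = sqlite3.connect(str(db_path))
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS blobs ("
                "name TEXT PRIMARY KEY, data BLOB NOT NULL)"
            )
            rows = [
                (p.relative_to(src_dir).as_posix(), p.read_bytes())
                for p in sorted(src_dir.rglob("*"))
                if p.is_file()
            ]
            conn.executemany(
                "INSERT OR REPLACE INTO blobs (name, data) VALUES (?, ?)", rows
            )
        return len(rows)
    finally:
        conn.close()


def read_blob(db_path: Path, name: str) -> bytes:
    """Fetch one packed file by its relative path (hypothetical helper)."""
    conn = sqlite3.connect(str(db_path))
    try:
        row = conn.execute(
            "SELECT data FROM blobs WHERE name = ?", (name,)
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        raise KeyError(name)
    return row[0]


if __name__ == "__main__":
    # Tiny demonstration with made-up file names and contents.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "chunks"
        src.mkdir()
        (src / "read_0001.txt").write_bytes(b"ACGT")
        db = Path(tmp) / "packed.sqlite"  # kept outside src so it is not packed
        print(pack_small_files(src, db), "file(s) packed")
        print(read_blob(db, "read_0001.txt"))
```

Whether this helps in practice depends on the pipeline: tools that expect real paths would need an adapter, and concurrent writers from many nodes would need a different design than a single SQLite file.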
Traditional pipeline workflows can overwhelm a resource manager with job status requests. How can we introduce a lightweight mechanism for job management? (9 votes)
0 answers
What technologies will move NGS software development away from modular, rapid prototyping and into HPC-style optimization? (9 votes)
2 answers
  • is this shift really a good idea / necessary to achieve performance?
  • education and close support from software engineers
How can we address the expectation of unlimited runtimes for bioinformatics codes? (5 votes)
1 answer
  • Develop predictive failure methods to be included in user codes
Why is it important to increase the efficiency and speed of NGS pipelines? What science or medicine is enabled by single-sample speed? (5 votes)
0 answers
How do we get advanced computational tools into the hands of the "general" or "non-computational" researcher? (3 votes)
0 answers
Can we adapt time-tested data management practices from the 'hard' sciences and engineering to NGS data management? (2 votes)
0 answers