Lossless Compression
Robust, high-performance FASTQ.gz and BAM compression. Reduce the footprint of genomics datasets in FASTQ.gz and BAM by between 60% and 90% while preserving the original files bit-for-bit. The MD5 checksum is preserved not just for the raw data inside the files, but also for the container files themselves (the .gz or BAM representation, which is already internally compressed).
Compression is fast, at 290+ MB/s for FASTQ on a 4-core i7, and uses only 3 GB of RAM. Unlike CRAM, all data is fully preserved, and you do not need to specify a reference for compression or decompression, not even for BAM. The species is detected automatically, for simple and optimal compression.
PetaSuite incorporates full validation and MD5 checksums for detecting and handling bit-errors that may occur from corruption events in long-term archival storage, so that it can safely restore the remaining data unaffected by the isolated bit-errors.
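The bit-for-bit claim can be checked independently by comparing the MD5 digest of an original file with the digest of the same file after a compress and restore round trip. The following is a minimal sketch using only the Python standard library. The function names are illustrative and are not part of PetaSuite.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read in chunks so multi-gigabyte files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_bit_identical(original_path, restored_path):
    """True when both files have the same MD5 digest."""
    return md5_of_file(original_path) == md5_of_file(restored_path)
```

Calling `is_bit_identical` on the original .gz or BAM container and its restored copy checks the container-level claim. Hashing the decompressed contents instead would only check the internal raw data.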
Faster Transfer
PetaGene’s improved file compression can cut the time spent waiting for network data transfer by between 60% and 90%: this could turn a 10 hour transfer into about 1 hour, or reduce transfer times from more than a week to less than a day. PetaSuite can therefore drastically reduce the time-to-completion of scientists’ workflows, giving a significant boost to the productivity of the entire organisation.
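The transfer figures above follow from simple arithmetic, sketched here in Python. The sketch assumes that transfer time scales linearly with file size, which ignores latency, protocol overhead and the time spent compressing or decompressing. The function name is illustrative.

```python
def transfer_time_after_compression(original_hours, size_reduction):
    """Estimate transfer time once the file is smaller by `size_reduction`.

    `size_reduction` is a fraction, e.g. 0.90 for a 90% smaller file.
    Assumes transfer time is proportional to file size.
    """
    if not 0.0 <= size_reduction < 1.0:
        raise ValueError("size_reduction must be in the range [0, 1)")
    return original_hours * (1.0 - size_reduction)


# Print the estimate for the two scenarios and both ends of the 60-90% range.
for label, hours in [("10 hour transfer", 10.0), ("one week transfer", 7 * 24.0)]:
    for reduction in (0.60, 0.90):
        remaining = transfer_time_after_compression(hours, reduction)
        print(f"{label}, {reduction:.0%} smaller: {remaining:.1f} h")
```

Under this assumption, a one week transfer (168 hours) with a 90% size reduction takes about 16.8 hours, which is consistent with the "less than a day" figure.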
PetaGene allows a customer to perform streaming compression directly to/from AWS, Azure, GCP or any other local/hybrid cloud.
Streaming compression enables FASTQ.gz or BAM files to be compressed, transferred and decompressed in a streaming fashion. PetaLink can be used to accelerate random access to BAM files over a WAN, for example for interactive genome browsers. Smaller files from BayesCal and PetaLink enable faster transfers more generally.
Transparent usage
PetaLink gives access to compressed files in their native format, as BAM or FASTQ.gz at the exact same filename and path (or cloud location) as before on existing storage.
Our compression even preserves access control permissions, extended attributes and timestamps. Your tools and pipelines won’t even know that anything has changed.
PetaGene’s software lets researchers and clinicians continue using their FASTQ and BAM files in their existing tools and pipelines. It integrates into existing storage infrastructures to provide transparent compression and access.
Speeds up Analysis
Transfer times and access times for genomic sequencing data are among the main bottlenecks in sequence data analysis. Almost all genomic analysis occurs on clusters of servers operating on shared storage resources. Since disk I/O and network bandwidth to and from storage are limited and shared by all servers, bottlenecks arise when multiple servers attempt to access stored data at the same time.
These I/O bottlenecks slow down overall performance, sometimes reducing a cluster to a fraction of its full computational capacity. By compressing the data, systems can achieve faster load times, even when the data has to be decompressed on the fly: the negligible runtime overhead of the decompression itself is more than offset by the faster reads from disk. The PetaView command line file access system is lightweight, so the I/O reductions dominate. Using PetaView’s on-the-fly, random-access, client-side decompression can therefore actually speed up your analysis, tools and pipelines, especially in HPC environments.
NGS quality score refinement
Customers can optionally use BayesCal, PetaGene’s revolutionary Bayesian approach to NGS quality score refinement for FASTQ and BAM files. It calculates a more complete posterior estimate of sequencer error, which preserves genotyping accuracy at all points on the ROC curve and gives a significant net increase in genotyping accuracy. BayesCal can be enabled on a file-by-file basis at no extra charge. As an additional benefit, it increases the compressibility of the data by a further 30-70%. For example, a GATK BAM file that PetaGene’s software losslessly compresses by a factor of 10.9x is instead losslessly compressed by a factor of 15.5x if BayesCal is used.
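To put the 10.9x and 15.5x example in context, the two ratios can be compared in two ways: as a gain in compression ratio, or as a further cut in compressed file size. The sketch below computes both. The text does not say which measure the 30-70% range refers to, so both are shown, and the function name is illustrative.

```python
def compare_ratios(base_ratio, improved_ratio):
    """Compare two compression ratios (original size / compressed size).

    Returns (ratio_gain, further_size_cut), both as fractions:
      ratio_gain       - how much larger the improved ratio is
      further_size_cut - how much smaller the improved compressed file is
    """
    ratio_gain = improved_ratio / base_ratio - 1.0
    further_size_cut = 1.0 - base_ratio / improved_ratio
    return ratio_gain, further_size_cut


# Figures from the GATK BAM example in the text.
gain, cut = compare_ratios(10.9, 15.5)
print(f"compression ratio gain: {gain:.1%}")
print(f"further size reduction: {cut:.1%}")
```

For this example the compression ratio rises by roughly 42%, and the compressed file shrinks by roughly a further 30%.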
No lock-in
We believe that customers shouldn’t be locked in by software, and for this reason we make all decompression free with a perpetual license, whether using the binary or using our PetaLink access shim.
We encourage customers to distribute any PetaGene-compressed content. We freely allow anyone to use PetaLink to access and use PetaGene-compressed files as BAM virtual files or FASTQ virtual files.
Easy IT Deployment
There are no restrictions on the number of parallel instances or the number of users within your organization. Administrator privileges are not essential to install the compression software. The PetaLink LD_PRELOAD library is user-mode, so any user can run it in their own shell to get transparent access to the compressed files as if they were uncompressed. Since the LD_PRELOAD library is user-mode, there are no security or kernel issues for sysadmins to consider.
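As an illustration of how a user-mode LD_PRELOAD library is typically enabled for a single command, the Python sketch below sets the standard Linux `LD_PRELOAD` environment variable for a child process only. The library path is a placeholder, since the real install location is not given here, and the helper names are illustrative rather than part of PetaLink.

```python
import os
import subprocess

# Placeholder path: the actual location of the PetaLink library is an assumption.
PETALINK_LIBRARY = "/path/to/petalink.so"


def preload_env(library_path, base_env=None):
    """Return a copy of the environment with `library_path` added to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # Keep anything already preloaded; the dynamic linker accepts a
    # colon-separated list of libraries.
    env["LD_PRELOAD"] = f"{library_path}:{existing}" if existing else library_path
    return env


def run_with_preload(command, library_path=PETALINK_LIBRARY):
    """Run `command` (a list of arguments) with the library preloaded."""
    return subprocess.run(command, env=preload_env(library_path), check=True)
```

A call such as `run_with_preload(["samtools", "view", "-H", "sample.bam"])` would then run an unmodified tool with the library loaded into that one process. No root access or kernel module is involved, and the change does not affect other users or processes.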
Cost Savings
Long-term storage of genomic data is one of the main expenses of sequencing experiments, and is set to become the single largest expense within the next five years. Compressed files decrease the per-file cost of storage and thus form an integral part of a cost saving strategy.
Unlike generic storage software, PetaSuite understands the internals of genomics files. For lossless storage, PetaSuite offers size reductions of up to 10:1 compared to BAM or gzipped FASTQ files. That corresponds to a 90% reduction in your storage bills, and a 96% reduction compared to raw FASTQ files.
Reduced file size leads to a reduction in overall network traffic. Besides time savings, this results in substantial cost savings for transfer to and from cloud storage, where traffic is typically billed by volume. The reduction in traffic volume can also give big cost savings for data transfers directly across the internet to collaborators, customers, and repositories.
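The relationship between a compression ratio and a percentage saving can be sketched in a few lines of Python. The helper names are illustrative. The gzip ratio derived at the end is an inference from the two quoted figures, on the assumption that both refer to the same data. It is not a number stated in the text.

```python
def reduction_from_ratio(ratio):
    """Fractional size reduction for a compression ratio, e.g. 10 -> 0.90."""
    return 1.0 - 1.0 / ratio


def ratio_from_reduction(reduction):
    """Compression ratio for a fractional size reduction, e.g. 0.96 -> 25."""
    return 1.0 / (1.0 - reduction)


# 10:1 against BAM or gzipped FASTQ, as quoted in the text.
vs_gzip_ratio = 10.0
print(f"10:1 ratio -> {reduction_from_ratio(vs_gzip_ratio):.0%} smaller")

# 96% reduction against raw FASTQ, as quoted in the text.
vs_raw_ratio = ratio_from_reduction(0.96)
print(f"96% smaller -> {vs_raw_ratio:.0f}:1 ratio")

# Inferred, not stated: the gzip ratio that would make both figures agree.
implied_gzip_ratio = vs_raw_ratio / vs_gzip_ratio
print(f"implied gzip ratio on raw FASTQ: {implied_gzip_ratio:.1f}:1")
```

Storage billed by volume scales the same way, so a 10:1 ratio on gzipped FASTQ or BAM corresponds to a bill of about one tenth of the original for that data.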