Got a question about PetaSuite?
We’ve created this FAQ page to help you find answers to your questions about PetaSuite genomic data compression software for FASTQ and BAM files.
We also have a glossary of terms and file formats that you will encounter when using PetaSuite compression software.
If you cannot find the answer to your query here or on the glossary page, please use the contact us form to send us a message.
A: PetaSuite is a command-line tool for explicit conversion to and from the PetaGene formats. PetaLink is a user-mode library that instantly extends existing applications to handle the PetaGene formats.
A: License checking requires HTTPS access to a specific domain, so all license compliance checks run over encrypted TLS connections. Alternative arrangements are possible, according to client needs.
A: PetaSuite can currently be installed on Debian-based or Red Hat-based operating systems. We also support Integrative Genomics Viewer (IGV) for Windows and Mac.
A: Yes, admin privileges are not necessary for installing PetaSuite.
A: PetaSuite uses a corpus to help maximise compression and decompression. We recommend that you install at least the human corpus. Seventy other species corpuses are available. These corpuses work independently of the reference used for alignment, so it does not matter which reference was used to align your data, and they even work for compressing de novo aligned data.
A: Yes, PetaSuite works with data from any species, even if no specific corpus for the target species is available. That includes de-novo aligned data. PetaSuite can also auto-detect the closest matching corpus for optimising compression.
A: Yes, if md5match lossless compression is selected, the compressed data can be restored as bit-for-bit identical to the original BAM and FASTQ.gz files. Therefore, we recommend that the original data is deleted once verification is complete.
A: Yes, quick validation of the first one million reads is enabled as the default. You can also choose to directly check MD5 checksums.
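As an illustration of the direct MD5 check mentioned above, here is a minimal Python sketch that compares the checksum of an original file with that of a restored copy. It uses only the standard `hashlib` module and does not call PetaSuite. The function names `md5_of_file` and `files_match` and the example file paths are assumptions made for this example, not part of the product.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(original_path, restored_path):
    """True when both files have the same MD5 digest (hypothetical helper)."""
    return md5_of_file(original_path) == md5_of_file(restored_path)
```

A call such as `files_match("sample.bam", "restored/sample.bam")` (hypothetical paths) returns `True` only when the two files have identical MD5 digests. Both files are read in full, so on large BAM files this takes noticeably longer than a quick validation of the first reads.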
A: No, the PetaLink user mode library ensures that your pipelines will access the compressed virtual versions, with no modifications.
A: Yes, this is the equivalent of concatenating FASTQ.gz files.
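For background on the comparison with concatenating FASTQ.gz files: the gzip format allows several compressed members to be joined back to back, and decompressing the result yields the original contents in order. The sketch below demonstrates only that gzip property with Python's standard `gzip` module. It does not show how PetaSuite works internally, and the helper name `concat_gzip_members` and the two tiny FASTQ records are made up for the example.

```python
import gzip


def concat_gzip_members(parts):
    """Join already-compressed gzip byte strings without recompressing them."""
    return b"".join(parts)


# Two tiny, made-up FASTQ records, each compressed on its own.
lane1 = gzip.compress(b"@read1\nACGT\n+\nIIII\n")
lane2 = gzip.compress(b"@read2\nTTGA\n+\nIIII\n")

# Byte-level concatenation of the two gzip streams.
merged = concat_gzip_members([lane1, lane2])

# gzip.decompress reads every member in turn, so the records appear in order.
records = gzip.decompress(merged)
```

Here `records` holds the first record followed by the second, exactly as if the two uncompressed FASTQ files had been concatenated before compression.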
A: PetaSuite supports load balancing using Slurm and similar compatible utilities when processing multiple files.
A: AWS, Google Cloud, Azure and S3-compatible platforms (e.g. Alibaba, Oracle), as well as hybrid and private clouds, are supported transparently. PetaSuite CE treats cloud destinations as though they were regular directories. The compression operation streams the file from the cloud, compresses it locally and streams it back to the cloud, writing the output to the same or a different destination. PetaLink also streams decompression operations from cloud platforms.
A: We have an ongoing program of formal testing for popular analysis applications. This list shows the applications we have successfully tested so far. If you do not see an application you use, please contact us to arrange an evaluation of our software.
| Tool | Version(s) tested | Application type |
| --- | --- | --- |
| samtools | 1.0–1.9 | Toolkit |
| bamtools | 2.4.0–2.5.1 | Toolkit |
| bcftools | 1.4–1.9 | Variant caller |
| bedtools | 2.18.2–2.28.0 | Toolkit |
| BWA-MEM | 0.7.13–0.7.17 | Mapper |
| bwa-mem2 | 2.0pre1 | Mapper |
| GATK 3 | 3.8-1-0 | Pipeline |
| GATK 4 | 4.1.2.0 | Pipeline |
| Manta | 1.5.0 | Variant caller |
| Picard | 2.9.5–2.20.3 | Toolkit |
| PySAM | 0.15.3 | Toolkit |
| Sambamba | 0.5.1–0.7.0 | Toolkit |
| seqtk | 1.1–1.3 | Toolkit |
| Strelka2 | 2.9.10 | Variant caller |