Component | Minimum requirement
EPI2ME analysis | Ethernet: HTTPS/port 443. TCP access to AWS eu-west-1 IP ranges: http://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html
Software updates | HTTPS/port 443 to 178.79.175.200 and 96.126.99.215 (outbound-only access), or DNS rule for cdn.oxfordnanoportal.com
Telemetry
MinKNOW collects telemetry information during sequencing runs as per the Terms and Conditions to allow monitoring of device
performance and enable remote troubleshooting. Some of this information comes from free-form text entry fields, so do not enter
personally identifiable information in those fields. We do not collect any sequence data.
EPI2ME analysis
The EPI2ME platform is hosted within AWS and provides cloud-based analysis solutions for multiple applications. Users upload
sequence data in FASTQ format via the EPI2ME Agent, which processes the data through defined pipelines within the EPI2ME Portal.
Downloads from EPI2ME are either in Data+Telemetry or Telemetry form. The EPI2ME portal uses telemetry information to populate
reports.
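As a rough illustration of how the AWS eu-west-1 requirement might be turned into firewall rules, the sketch below filters a range list by region. It assumes the JSON layout that AWS publishes at https://ip-ranges.amazonaws.com/ip-ranges.json (a `prefixes` list whose entries carry `ip_prefix` and `region`). The function name `prefixes_for_region` and the sample data are hypothetical, and the sample addresses are documentation ranges, not real AWS ranges.

```python
import ipaddress

def prefixes_for_region(ip_ranges, region="eu-west-1"):
    """Return the sorted, de-duplicated IPv4 CIDR blocks listed for one region."""
    blocks = {
        ipaddress.ip_network(entry["ip_prefix"])
        for entry in ip_ranges.get("prefixes", [])
        if entry.get("region") == region
    }
    return [str(net) for net in sorted(blocks)]

# Hypothetical sample in the assumed ip-ranges.json shape (documentation addresses only).
SAMPLE = {
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "region": "eu-west-1", "service": "AMAZON"},
        {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "EC2"},
        {"ip_prefix": "192.0.2.0/24", "region": "eu-west-1", "service": "EC2"},
        {"ip_prefix": "203.0.113.0/25", "region": "eu-west-1", "service": "S3"},
    ]
}

eu_west_1_blocks = prefixes_for_region(SAMPLE)
```

In practice the dictionary would come from `json.load` on a downloaded copy of the published file. The list changes over time, so any rules derived from it would need refreshing.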
Software updates
Depending on your geographical region, only one of 178.79.175.200 or 96.126.99.215 will be used to provide updates to device
software. Updates are pulled by the device rather than pushed to it, so only outbound access is required.
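One way to check the outbound rule from the sequencing device is to attempt a plain TCP connection on port 443 to each update endpoint named above. This is a minimal sketch, not an Oxford Nanopore tool: the helper names `can_connect` and `check_hosts` are hypothetical, and a successful connection only shows that the port is open. It does not prove that HTTPS traffic passes through a proxy or inspection layer.

```python
import socket

# Endpoints named in this document; port 443 is HTTPS.
UPDATE_HOSTS = ["178.79.175.200", "96.126.99.215", "cdn.oxfordnanoportal.com"]

def can_connect(host, port=443, timeout=5.0):
    """Return True if an outbound TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False

def check_hosts(hosts, port=443, timeout=5.0):
    """Map each host to the result of one outbound connection attempt."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Calling `check_hosts(UPDATE_HOSTS)` returns a dictionary of host to `True` or `False`. Because only one of the two IP addresses is used in a given region, a failure on the other address may not matter for your site.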
File types
Nanopore sequencing data is stored in three file types: POD5, FASTQ and BAM. Basecalling summary information is stored in a
sequencing_summary.txt file:
POD5 is an Oxford Nanopore-developed file format which stores nanopore data in an accessible way and replaces the legacy
.fast5 format. POD5 is also faster to read and write, uses less compute, and has a smaller raw data file size than .fast5.
POD5 files are generated in batches every 10 minutes. The files can be split by barcode if barcoding is used, but splitting by
barcode is off by default.
.fast5 is a legacy file format based upon the .hdf5 file type, which contains all information needed for analysing nanopore
sequencing data and tracking it back to its source. A .fast5 file contains data from multiple reads (4000 reads by default), and is
several hundred MB in size.
FASTQ is a text-based sequence storage format, containing both the sequence of DNA/RNA and its quality scores. FASTQ files are
generated in batches by time. The interval can be set to 10 minutes (the default), one hour, or one file generated at the end of
the run. You can also batch the reads based on the number of reads per file.
BAM files are output if you perform alignment or modified base calling on the basecalled dataset. BAM file generation options are
the same as for FASTQ files. BAM files are off by default and switched on automatically if alignment or modified base calling is
used.
sequencing_summary.txt contains metadata about all basecalled reads from an individual run. Information includes read ID,
sequence length, per-read q-score, duration, etc. The size of a sequencing summary file will depend on the number of reads
sequenced.
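To show the kind of per-run figures that can be derived from a sequencing summary, the sketch below reads a tab-separated summary and reports read count, total bases, read N50, and the arithmetic mean of the per-read q-scores. It assumes the file has a header row and uses the column names `sequence_length_template` and `mean_qscore_template`. Treat those names as assumptions and check them against the header of your own file. The function names and the four-read example are hypothetical.

```python
import csv
import io

def read_n50(lengths):
    """Return the read N50: the length L such that reads of length >= L
    contain at least half of all sequenced bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no reads

def summarise(handle, length_col="sequence_length_template",
              qscore_col="mean_qscore_template"):
    """Summarise a tab-separated sequencing summary read from an open file handle."""
    lengths, qscores = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        lengths.append(int(row[length_col]))
        qscores.append(float(row[qscore_col]))
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "n50": read_n50(lengths),
        # Simple arithmetic mean of the per-read q-scores.
        "mean_qscore": sum(qscores) / len(qscores) if qscores else 0.0,
    }

# Hypothetical four-read example in the assumed layout.
EXAMPLE = (
    "read_id\tsequence_length_template\tmean_qscore_template\n"
    "read-a\t1000\t12.0\n"
    "read-b\t4000\t14.0\n"
    "read-c\t2000\t10.0\n"
    "read-d\t3000\t16.0\n"
)

stats = summarise(io.StringIO(EXAMPLE))
```

For a real run, pass an open file instead of the in-memory example, for instance `with open("sequencing_summary.txt", newline="") as fh: summarise(fh)`.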
Example file sizes below are based on different throughputs from an individual flow cell, with a run saving POD5, FASTQ, and BAM files
with a read N50 of 23 kb.
Flow cell output (Gbases) | POD5 storage (Gbytes) | FASTQ.gz storage (Gbytes) | Unaligned BAM with modifications (Gbytes)
10 | 70 | 6.5 | 6
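For rough capacity planning, the 10 Gbase row above can be converted into per-Gbase rates and scaled to another expected output. The sketch below assumes that storage scales linearly with output, which the source does not state. Actual sizes will vary with read length and run settings. The names `GBYTES_PER_GBASE` and `estimate_storage` are hypothetical.

```python
# Per-Gbase storage derived from the 10 Gbase row of the table (Gbytes per Gbase).
GBYTES_PER_GBASE = {
    "pod5": 70 / 10,
    "fastq_gz": 6.5 / 10,
    "bam_unaligned_mods": 6 / 10,
}

def estimate_storage(output_gbases):
    """Estimate storage in Gbytes per file type, assuming linear scaling."""
    sizes = {
        name: round(rate * output_gbases, 1)
        for name, rate in GBYTES_PER_GBASE.items()
    }
    sizes["total"] = round(sum(sizes.values()), 1)
    return sizes

estimate_10 = estimate_storage(10)
estimate_50 = estimate_storage(50)
```

Under this assumption, a 50 Gbase run saving all three file types would need about 412.5 Gbytes, of which 350 Gbytes is POD5.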