Synopsis

Usage:

mirUtils operation [options] <files or arguments>

Operation:

mbaseMirStats
Generate miRNA statistics from miRBase-aligned BAM or SAM file(s).
 mirUtils mbaseMirStats --organism=hsa mirdata.bam
filterAligns
Extract 'good fit' alignments from miRBase-aligned BAM or SAM file(s).
 mirUtils filterAligns --min-overlap=18 --margin=2 mirdata.bam
mbaseMirInfo
Summarize mirUtils miRBase metadata for the specified organism(s) in searchable tab-delimited output format.
 mirUtils mbaseMirInfo --version=v20 hsa mmu
mbaseRefFa
Make a cDNA fasta file of hairpin miRNAs for the specified organism(s).
 mirUtils mbaseRefFa --version=v21 ath zma

Commands

mirUtils mbaseMirStats [options] bamFile(s)

Generate miRNA statistics for one or more miRBase-aligned BAM or SAM file(s).

Options:

--show-only
Show what would be done only.
--organism=<miRBase_organism_prefix>

miRBase prefix for the organism annotations to use. Default is hsa (human). Can be the prefix for any .gff3 file in the implied mirbase/<version>/genomes subdirectory.

See miRBase artifacts for more information on miRBase components included with mirUtils.

--version=<miRBase_version>

miRBase version to use. One of: v19, v20, v21. Default v21. This must correspond to the version of miRBase used to align the input BAM/SAM file(s).

--min-overlap=<number_of_bases>
Minimum number of bases of overlap with a mature miRNA locus that an alignment must have in order to be considered a 'good fit' to the mature locus. Default 13.
--margin=<number_of_bases>
Distance in bases before the start and after the end of a mature miRNA locus that defines the boundaries of the extended mature locus. An alignment must fall entirely within this region in order to be considered a 'good fit' to the mature locus. Default 5.
--cluster-distance=<number_of_bases>
Inter-hairpin distance used to define genomic clusters for mirUtils cluster statistics. Default 10000 bases.
--out-prefix=<string>
Prefix for the output files generated. Default is based on the input file name(s).
--cmb-prefix=<string>
Prefix for combined output files if additional 'grand total' statistics are desired when multiple input alignment files are supplied.
--bam-flags=<'quoted string'>
Flags and options given to samtools for filtering input alignments. Be sure to enclose these options in single quotes since they contain spaces. Default is '-F 0x4' (all aligned records). Not applicable for SAM files.
--bam-locs=<'quoted string'>
One or more region specifications to be given to samtools for filtering input BAM alignments. In the context of miRBase-aligned BAMs, these are usually miRNA hairpin precursor names (e.g. hsa-mir-98). Multiple names may be provided if enclosed in single quotes and separated by spaces. Requires a sorted BAM file with associated BAM index (.bai file). Not applicable for SAM files.

Arguments: One or more miRBase-aligned BAM or SAM files. The samtools program must be installed in order to process BAM files.

Output: For each input BAM or SAM file, a set of output files is generated. If 'grand total' statistics are requested by supplying a --cmb-prefix option and providing more than one BAM/SAM file, an additional set of output files is generated combining counts from all the input files.

The generated files can be divided into several broad categories.

  • Alignment counts based on miRNA hairpin precursors, individually and at various taxonomy levels such as groups, families and genomic clusters. See mirUtils hairpin statistics files for details.
  • Alignment counts based on mature miRNAs, by both mature miRNA loci and by mature miRNA sequences. See mirUtils mature statistics files for details.
  • Per-base coverage and start position counts for all positions in each miRNA hairpin precursor locus. See mirUtils per-base count files for details.
  • Metadata output files for miRNA hairpin precursor and mature miRNA loci. See mirUtils metadata output for details.

Examples:

Generate statistics for a single-end read BAM aligned to miRBase v21 human miRNA hairpins, using default parameters.

 mirUtils mbaseMirStats mirdata.bam

Generate statistics for a single-end read BAM aligned to miRBase v20 mouse miRNAs.

 mirUtils mbaseMirStats --organism=mmu --version=v20 mousemirs.bam

Generate statistics for a paired-end read BAM aligned to miRBase v21 human miRNAs. Here the --bam-flags '-F 0x4 -f 0x40' are provided so that only R1 (read 1) alignments are considered in order to avoid double-counting of sequenced fragments.

 mirUtils mbaseMirStats --bam-flags='-F 0x4 -f 0x40' pe_mirdata.bam
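The effect of these samtools filter options can be sketched in a few lines of Python. This illustrates only the SAM FLAG bit tests, not mirUtils code, and the function name passes_filter is hypothetical. With samtools view, -f keeps records that have all of the given bits set, and -F drops records that have any of the given bits set.

```python
# Illustrative sketch of SAM FLAG filtering (not part of mirUtils).
UNMAPPED = 0x4   # read is unmapped
READ1 = 0x40     # first read of a pair (R1)


def passes_filter(flag, require=0, exclude=0):
    """Mimic 'samtools view -f <require> -F <exclude>' for one FLAG value."""
    has_all_required = (flag & require) == require
    has_no_excluded = (flag & exclude) == 0
    return has_all_required and has_no_excluded


# FLAG 99  = paired, proper pair, mate reverse, read 1 (0x1+0x2+0x20+0x40)
# FLAG 147 = paired, proper pair, read reverse, read 2 (0x1+0x2+0x10+0x80)
# FLAG 69  = paired, unmapped, read 1 (0x1+0x4+0x40)
for flag in (99, 147, 69):
    print(flag, passes_filter(flag, require=READ1, exclude=UNMAPPED))
```

Only the mapped R1 record (FLAG 99) passes, which is why each sequenced fragment is counted once.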

Generate statistics for two replicate single-end read BAMs aligned to miRBase v20 corn miRNAs. Output report sets will be generated for both cornmirs_b1 and cornmirs_b2. Additionally, since a --cmb-prefix was specified, a set of cmb_cornmirs output files will be generated with 'grand total' counts combining counts from both replicates.

 mirUtils mbaseMirStats --organism=zma --version=v20 --cmb-prefix=cmb_cornmirs \
	cornmirs_b1.bam cornmirs_b2.bam

Quickly generate statistics for two miRNA precursor hairpins of interest represented in a large BAM aligned to miRBase v21 human miRNAs. The input BAM file is sorted and indexed.

 mirUtils mbaseMirStats --bam-locs='hsa-mir-20a hsa-mir-30a' big_mirdata.bam

mirUtils filterAligns [options] bamFile(s)

Split one or more miRBase-aligned BAM or SAM file(s) into output SAM files based on the mature locus 'good fit' parameters specified.

Options:

--show-only
Show what would be done only.
--organism=<miRBase_organism_prefix>

miRBase prefix for the organism annotations to use. Default is hsa (human). Can be the prefix for any .gff3 file in the implied mirbase/<version>/genomes subdirectory.

See miRBase artifacts for more information on miRBase components included with mirUtils.

--version=<miRBase_version>

miRBase version to use. One of: v19, v20, v21. Default v21. This must correspond to the version of miRBase used to align the input BAM/SAM file(s).

--min-overlap=<number_of_bases>
Minimum number of bases of overlap with a mature miRNA locus that an alignment must have in order to be considered a match. Default 13.
--margin=<number_of_bases>
Distance in bases before the start and after the end of a mature miRNA locus that defines the boundaries of the extended mature locus. An alignment must fall entirely within this region in order to be considered a match. Default 5.
--out-prefix=<string>
Prefix for the output SAM files generated. Default is based on the input file name(s).
--bam-flags=<'quoted string'>
Flags and options given to samtools for filtering input alignments. Be sure to enclose these options in single quotes since they contain spaces. Default is '-F 0x4' (all aligned records). Not applicable for SAM files.
--bam-locs=<'quoted string'>
One or more region specifications to be given to samtools for filtering input BAM alignments. In the context of miRBase-aligned BAMs, these are usually miRNA hairpin precursor names (e.g. hsa-mir-98). Multiple names may be provided if enclosed in single quotes and separated by spaces. Requires a sorted BAM file with associated BAM index (.bai file). Not applicable for SAM files.

Arguments: One or more miRBase-aligned BAM or SAM file(s). The samtools program must be installed in order to process BAM files.

Output: Two output SAM files are written for each input file. Note that unmapped alignment records are not included in either output file.

  • A file of alignment records that match the filter parameters, with file name of the form <out-prefix>.match.sam.
  • A file of alignment records that do not match the filter parameters, with file name of the form <out-prefix>.other.sam.
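The match test implied by the --min-overlap and --margin options can be sketched as below. This is a minimal illustration, not the mirUtils implementation: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive, the mature locus position used in the demonstration is made up, and treating the extended locus boundaries as inclusive is also an assumption.

```python
# Illustrative 'good fit' test (not part of mirUtils).

def overlap_bases(a_start, a_end, b_start, b_end):
    """Number of positions shared by two 1-based inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def is_good_fit(aln_start, aln_end, mat_start, mat_end,
                min_overlap=13, margin=5):
    """True if the alignment overlaps the mature locus by at least
    min_overlap bases and lies entirely within the extended locus."""
    enough_overlap = overlap_bases(aln_start, aln_end,
                                   mat_start, mat_end) >= min_overlap
    within_extended = (aln_start >= mat_start - margin and
                       aln_end <= mat_end + margin)
    return enough_overlap and within_extended


# Hypothetical mature locus at hairpin positions 10-31,
# tested with --min-overlap=18 --margin=3.
for aln in ((8, 30), (5, 30), (20, 34)):
    verdict = is_good_fit(aln[0], aln[1], 10, 31, min_overlap=18, margin=3)
    print(aln, "match" if verdict else "other")
```

In this sketch the second alignment fails because it starts before the extended locus, and the third because it overlaps the mature locus by only 12 bases.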

Examples:

From a BAM file aligned to miRBase v20, extract alignments that overlap a mature miRNA locus by at least 18 bases and that start and end within the extended mature locus region defined as 3 margin positions before the annotated start to 3 margin positions after the annotated end.

 mirUtils filterAligns --version=v20 --min-overlap=18 --margin=3 mirdata.bam

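Convert the SAM file for matching alignments to a sorted, indexed BAM file using samtools. The sort command below uses the older output-prefix form; samtools 1.3 and later no longer accept it, so with those versions use samtools sort -o mirdata.sorted.bam - instead.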

 samtools view -bS mirdata.match.sam | samtools sort - mirdata.sorted
 samtools index mirdata.sorted.bam

mirUtils mbaseMirInfo [options] organism(s)

Summarize mirUtils miRBase metadata for the specified organism(s) in searchable tab-delimited output format.

Options:

--show-only
Show what would be done only.
--version=<miRBase_version>
miRBase version to use. One of: v19, v20, v21. Default v21.
--cluster-distance=<number_of_bases>
Inter-hairpin distance used to define genomic clusters for mirUtils cluster statistics. Default 10000 bases.

Arguments: One or more miRBase prefixes for the organism annotations to summarize (e.g. hsa for human). Can be the prefix for any .gff3 file in the implied mirbase/<version>/genomes subdirectory. See miRBase artifacts for more information on miRBase components included with mirUtils.

Output: Two output files are generated for each organism. See mirUtils metadata output files for more information on the format of these files.

  • Hairpin locus hpInfo file containing one entry for each miRBase miRNA precursor hairpin locus, with columns describing its genomic location, GFF ID, miRBase family, genomic clusters, and so forth.
  • Mature locus matInfo file containing one entry for each miRBase mature miRNA locus, with columns describing its genomic location, GFF ID, display name, mature sequence name, and so forth.
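The genomic cluster assignment recorded in the hpInfo file can be illustrated with a short sketch. This is not mirUtils code: the function names are hypothetical, and measuring the inter-hairpin distance from the end of the current cluster to the start of the next hairpin is an assumption. The coordinates are the miRBase v20 (hg19) example loci quoted in the hpInfo column descriptions.

```python
# Illustrative single-pass clustering of hairpin loci (not part of mirUtils).
from collections import defaultdict


def assign_clusters(hairpins, cluster_distance=10000):
    """hairpins: iterable of (name, chrom, start, end), 1-based inclusive.
    Returns a list of clusters, each a dict with chrom, start, end, members."""
    by_chrom = defaultdict(list)
    for name, chrom, start, end in hairpins:
        by_chrom[chrom].append((start, end, name))
    clusters = []
    for chrom in sorted(by_chrom):
        current = None
        for start, end, name in sorted(by_chrom[chrom]):
            # Assumption: a hairpin joins the cluster when the gap to the
            # cluster's current end is less than cluster_distance.
            if current is not None and start - current["end"] < cluster_distance:
                current["end"] = max(current["end"], end)
                current["members"].append(name)
            else:
                current = {"chrom": chrom, "start": start,
                           "end": end, "members": [name]}
                clusters.append(current)
    return clusters


def display_name(cluster):
    """Format a cluster in the style of the mirUtils display names."""
    return "cluster({chrom}:{start}-{end})[{n}]".format(
        n=len(cluster["members"]), **cluster)


loci = [
    ("hsa-mir-3120", "chr1", 172107948, 172108028),
    ("hsa-mir-214", "chr1", 172107938, 172108047),
    ("hsa-mir-199a-2", "chr1", 172113675, 172113784),
]
for cluster in assign_clusters(loci, cluster_distance=10000):
    print(display_name(cluster))   # cluster(chr1:172107938-172113784)[3]
```

With --cluster-distance=5000 the same three loci would fall into two clusters under this sketch, since hsa-mir-199a-2 starts 5628 bases after the end of hsa-mir-214. Strand-specific clusters would follow from running the same grouping on the hairpins of one strand only.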

Examples:

Generate mirUtils miRBase v20 metadata summary files for Homo sapiens using an inter-hairpin distance of 5000 bases to define clusters.

 mirUtils mbaseMirInfo --version=v20 --cluster-distance=5000 hsa

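Use the resulting hpInfo metadata to count the number of chr9 clusters containing more than one hairpin. Each cluster appears once per member hairpin, so duplicate lines are collapsed with sort -u before counting.

 cut -f 1,13,14 hsa_v20_cluster5000.hpInfo | \
	grep -P -v '\t1$' | grep -P '^chr9\t' | sort -u | wc -l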
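The same count can be sketched in Python, collecting cluster names in a set so that each cluster is counted once however many member hairpins it has. The function name is hypothetical, and the columns are taken by position (1 chrom, 13 cluster, 14 clct) as documented for the hpInfo file. The demonstration rows are made-up placeholders, not real miRBase data.

```python
# Illustrative hpInfo query (not part of mirUtils).
import csv
import io


def count_multi_hairpin_clusters(handle, chrom):
    """Count distinct clusters on chrom that contain more than one hairpin."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)                      # skip the header line
    clusters = set()
    for row in reader:
        row_chrom, cluster, clct = row[0], row[12], int(row[13])
        if row_chrom == chrom and clct > 1:
            clusters.add(cluster)
    return len(clusters)


def fake_row(chrom, cluster, clct):
    """Build a 21-column placeholder row for the demonstration."""
    row = ["."] * 21
    row[0], row[12], row[13] = chrom, cluster, str(clct)
    return "\t".join(row)


demo = "\n".join([
    "\t".join("col%d" % i for i in range(1, 22)),
    fake_row("chr9", "cluster(chr9:100-900)[2]", 2),
    fake_row("chr9", "cluster(chr9:100-900)[2]", 2),
    fake_row("chr9", "cluster(chr9:50000-50080)[1]", 1),
    fake_row("chr1", "cluster(chr1:10-500)[2]", 2),
    fake_row("chr1", "cluster(chr1:10-500)[2]", 2),
]) + "\n"

print(count_multi_hairpin_clusters(io.StringIO(demo), "chr9"))   # 1
```

For a real file, the handle would come from open("hsa_v20_cluster5000.hpInfo", newline="") in place of the in-memory demonstration text.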

Use the resulting matInfo metadata to list the mature miRNA sequences that are common to 3 or more miRNA precursor hairpins.

 tail -n +2 hsa_v20_cluster5000.matInfo | cut -f 10,11 | \
	perl -ane 'print $_ if $F[1] >= 3;' | \
	sort | uniq | sort -k 2 -n -r

mirUtils mbaseRefFa [options] organism(s)

Generate a cDNA fasta file for the specified organism(s) using the all-organism hairpin.fa RNA fasta file provided with each miRBase version. The resulting fasta file can be used to build an index for performing sequence alignments to miRBase hairpins.

Options:

--show-only
Show what would be done only.
--version=<miRBase_version>

miRBase version to use. One of: v19, v20, v21. Default v21.

Arguments: One or more miRBase prefixes for the desired organisms (e.g. hsa for human). Can be the prefix for any .gff3 file in the implied mirbase/<version>/genomes subdirectory. See miRBase artifacts for more information on miRBase components included with mirUtils.

Output: For each supplied organism, a cDNA fasta file is written containing miRBase miRNA hairpin precursor sequences for the organism. Each file has a file name of the form hairpin_cDNA_<organism>.fa.
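The conversion can be pictured with the following sketch. It is not the mirUtils implementation: the function name is hypothetical, and it assumes that "cDNA" here means the same-strand sequence with U replaced by T and that an organism's records are those whose fasta header begins with the organism prefix followed by a hyphen. The record names and sequences in the demonstration are made up.

```python
# Illustrative RNA-to-cDNA fasta conversion (not part of mirUtils).
import io


def rna_to_cdna_fasta(hairpin_fa, organism):
    """Yield (header, sequence) for records of one organism, with U -> T."""
    prefix = ">" + organism + "-"
    keep = False
    header, seq = None, []
    for line in hairpin_fa:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if keep:
                yield header, "".join(seq)
            keep = line.startswith(prefix)
            header, seq = line, []
        elif keep:
            seq.append(line.upper().replace("U", "T"))
    if keep:
        yield header, "".join(seq)


# Made-up records standing in for the all-organism hairpin.fa file.
demo_fa = ">hsa-test-1 demo\nUGAGGUAG\nUAGG\n>mmu-test-1 demo\nACGU\n"
for header, sequence in rna_to_cdna_fasta(io.StringIO(demo_fa), "hsa"):
    print(header)
    print(sequence)
```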

Examples:

Generate cDNA fasta files for Arabidopsis thaliana and Zea mays based on miRBase version v21 miRNA hairpin fasta.

 mirUtils mbaseRefFa --version=v21 ath zma

Use the two output fasta files to build indexes for the bwa aligner.

 bwa index -a is hairpin_cDNA_ath.fa
 bwa index -a is hairpin_cDNA_zma.fa

mirUtils output file formats

mirUtils metadata output files

When mirUtils mbaseMirStats or mbaseMirInfo is run, two metadata output files are generated for the specified organism(s), one with miRNA precursor hairpin locus metadata and one with mature miRNA locus metadata.

Both files are simple tab-delimited text files with descriptive headers, designed to be opened by spreadsheet programs such as Excel, loaded into database tables such as those of MySQL, or searched with command line utilities such as grep, awk and/or perl.

The metadata in these files describes the relationship among the taxonomy categories for which mirUtils mbaseMirStats reports statistics. These categories are either directly defined by miRBase annotations (such as miRNA precursor hairpins or hairpin families) or are assigned by mirUtils based on interpreting those annotations (such as groups of related precursor hairpins).

One important distinction to keep in mind is that between genomic miRNA loci and miRNA sequences. For example, a gene duplication event might create two miRNA precursor hairpin loci in the genome that share the same hairpin sequence. Similarly, two different miRNA hairpin loci with different mature miRNA locus annotations might give rise to identical or highly similar mature miRNA sequences.

See Taxonomy for a discussion of miRBase terminology and how miRBase annotations are used by mirUtils. The remainder of this section describes the format of these metadata files.

mirUtils miRNA precursor hairpin locus metadata
The miRNA precursor hairpin locus metadata file (hpInfo file for short) has a filename of the form <organism>_<version>_cluster<distance>.hpInfo (for example, hsa_v21_cluster10000.hpInfo). The name incorporates the miRBase organism and version from which the metadata was derived, along with the user-defined inter-hairpin cluster distance.

This tab-delimited file has a header followed by one line for each miRNA precursor hairpin locus from the organism's miRBase <organism>.gff3 file. Note that the corresponding GFF entries have type miRNA_primary_transcript.

Col # Name Description
1 chrom Name of genomic chromosome for this miRNA precursor hairpin locus (e.g. chr9).
2 strand Strand for this hairpin locus (+ or -).
3 start 1-based start position.
4 end 1-based end position.
5 length Length of hairpin locus in bases.
6 hpid Unique miRBase identifier for this hairpin locus (e.g. MI0000060 for hsa-let-7a-1). Corresponds to the hairpin locus GFF entry's ID attribute.

Note that genomic duplicates are indicated with an hpid of the form MInnnnnnn_n. For example, MI0003127_2 for the 2nd copy of hsa-mir-511 in miRBase v20.

7 name miRBase name for this hairpin (e.g. hsa-let-7a-1). Corresponds to the hairpin locus GFF entry's Name attribute.
8 dname The mirUtils display name for this hairpin locus. This is usually the same as the name except for genomic duplicates (e.g. hsa-mir-511(dup1) and hsa-mir-511(dup2) for the two copies of hsa-mir-511 in miRBase v20).
9 group Name of the group of related miRNA precursor hairpins to which this hairpin belongs. mirUtils assigns this group membership by interpreting the miRBase hairpin name as follows.

For organisms other than plants, miRBase hairpin names that share a prefix and differ only in a trailing -N, where N is a number, are assigned to the same group by mirUtils. For example, hsa-let-7a-1, hsa-let-7a-2 and hsa-let-7a-3 are assigned to a group called hsa-let-7a[3], where the number in brackets is the number of group members.

For plants, miRBase hairpin names with the same prefix but that end in different lowercase letters are assigned to the same mirUtils group. For example, ath-MIR5014a and ath-MIR5014b are assigned to a group called ath-MIR5014[2].
10 grpct The total count of miRNA precursor hairpins in this hairpin's mirUtils group.
11 family Display name of the miRBase family to which this hairpin belongs, based on the miRBase miFam.dat file.

For miRNA precursor hairpins that are listed in a family block in miFam.dat, the family name prefix corresponds to the family's unique ID. For example, hsa-let-7a-1 and hsa-mir-98 belong to the miRBase family with ID let-7a. mirUtils gives this family the display name let-7a[12], where the number in brackets is the number of hsa organism hairpins in the family. Note that this miRBase-derived family name does not start with an organism prefix, since miRBase families include miRNAs from all annotated organisms.

For miRNA hairpins that are not listed in miFam.dat, the family name prefix is the same as the hairpin name. For example, hsa-mir-3658[unk] is the mirUtils display name for the hsa-mir-3658 "family", with the [unk] suffix and the presence of an organism prefix indicating that this designation is not based on miRBase family annotations.
12 famct The total number of miRNA precursor hairpins in this hairpin's miRBase family, counting only hairpins from this organism.
13 cluster Display name of the genomic cluster to which this hairpin belongs. A cluster is a region where the distance separating precursor hairpin loci is less than the --cluster-distance parameter used to generate this metadata file (10000 bases by default).

Hairpins located on either strand may be part of the same cluster.

For example, in miRBase v20, where genomic loci for hsa are given in UCSC hg19 coordinates, the cluster with mirUtils display name cluster(chr1:172107938-172113784)[3] includes one plus strand hairpin, hsa-mir-3120 (chr1:172107948-172108028), and two minus strand hairpins, hsa-mir-214 (chr1:172107938-172108047) and hsa-mir-199a-2 (chr1:172113675-172113784).
14 clct The total count of miRNA precursor hairpins in this hairpin's cluster, including both plus and minus strand genes.
15 cluster+- Display name of the plus strand or minus strand genomic cluster to which this hairpin belongs. A cluster+ is a region where the distance separating plus strand precursor hairpin loci is less than the --cluster-distance parameter used to generate this metadata file (10000 bases by default). A cluster- is similarly defined for minus strand hairpins.

For example, in miRBase v20, where genomic loci for hsa are given in UCSC hg19 coordinates, the cluster with mirUtils display name cluster-(chr1:172107938-172113784)[2] includes two minus strand hairpins, hsa-mir-214 (chr1:172107938-172108047) and hsa-mir-199a-2 (chr1:172113675-172113784).

16 cl+-ct The total count of miRNA precursor hairpins in this hairpin's plus or minus strand cluster, including only genes on the appropriate strand.
17 matseq5p The mirUtils display name for this hairpin's 5' mature miRNA sequence, if one is annotated. The prefix of this name corresponds to the GFF Name attribute of the first 5' mature miRNA derived from this miRNA precursor hairpin.

For example, the mirUtils 5' mature sequence display name for the hsa-let-7a-1 hairpin is hsa-let-7a-5p[3], where the number in brackets indicates that this mature sequence is common to three miRNA precursor hairpins.

Not all hairpins have both 5' and 3' mature miRNAs annotated. Some have only one or the other, with the name not reflecting which was the source arm of the hairpin. In this case mirUtils designates the mature miRNA as 5p or 3p depending on its location relative to its parent hairpin.

For example, the hsa-mir-325 minus strand hairpin locus (chrX:77005404-77005501) has only one annotated mature miRNA, designated hsa-miR-325 in miRBase v21. Because its annotated coordinates (chrX:77005404-77005501) are in the 5' "half" of the hairpin, mirUtils marks it as the 5' mature miRNA for the hsa-mir-325 hairpin, and gives it the display name hsa-miR-3251[1].
18 matseq3p The mirUtils display name for this hairpin's 3' mature miRNA sequence, if one is annotated. The prefix of this name corresponds to the GFF Name attribute of the last 3' mature miRNA derived from this miRNA precursor hairpin.

For example, the mirUtils 3' mature sequence display name for the hsa-let-7a-1 hairpin is hsa-let-7a-3p[2], where the number in brackets indicates that this mature sequence is common to two miRNA precursor hairpins.

Not all hairpins have both 5' and 3' mature miRNAs annotated. Some have only one or the other, with the name not reflecting which was the source arm of the hairpin. In this case mirUtils designates the mature miRNA as 5p or 3p depending on its location relative to its parent hairpin.

For example, the hsa-mir-429 plus strand hairpin locus (chr1:1169005-1169087) has only one annotated mature miRNA, designated hsa-miR-429 in miRBase v21. Because its annotated coordinates (chr1:1169055-1169076) are in the 3' "half" of the hairpin, mirUtils marks it as the 3' mature miRNA for the hsa-mir-429 hairpin, and gives it the display name hsa-miR-4291[1].
19 mat5pid The unique miRBase mature locus identifier for this hairpin's 5' mature miRNA, if one is annotated. Corresponds to the GFF ID attribute of the first 5' mature miRNA derived from this miRNA precursor hairpin.

For example, the mat5pid for hsa-let-7a-1 is MIMAT0000062_2.

20 mat3pid The unique miRBase mature locus identifier for this hairpin's 3' mature miRNA, if one is annotated. Corresponds to the GFF ID attribute of the last 3' mature miRNA derived from this miRNA precursor hairpin.

For example, the mat3pid for hsa-let-7a-1 is MIMAT0004481_1.

21 hpfa The RNA fasta sequence for this miRNA precursor, from the miRBase hairpin.fa file.
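
The group naming rule described for column 9 can be sketched as follows. This is an illustration, not mirUtils code: the function names and the is_plant switch are hypothetical, and the way a copy-number suffix is recognised (a numeric fourth hyphen-separated field, so that a name such as hsa-mir-98 is left intact) is an assumption. The bracketed count shown for a single-member group is likewise assumed.

```python
# Illustrative hairpin group naming (not part of mirUtils).
import re
from collections import Counter


def group_prefix(name, is_plant=False):
    """Strip the part of a hairpin name that distinguishes group members."""
    if is_plant:
        # Plants: drop trailing lowercase letters that follow a digit.
        return re.sub(r"(?<=\d)[a-z]+$", "", name)
    parts = name.split("-")
    # Assumption: only a numeric fourth (or later) field is a copy suffix.
    if len(parts) >= 4 and parts[-1].isdigit():
        return "-".join(parts[:-1])
    return name


def group_display_names(names, is_plant=False):
    """Map each hairpin name to a display name of the form prefix[count]."""
    prefixes = {n: group_prefix(n, is_plant) for n in names}
    sizes = Counter(prefixes.values())
    return {n: "%s[%d]" % (p, sizes[p]) for n, p in prefixes.items()}


animal = group_display_names(
    ["hsa-let-7a-1", "hsa-let-7a-2", "hsa-let-7a-3", "hsa-mir-98"])
plant = group_display_names(["ath-MIR5014a", "ath-MIR5014b"], is_plant=True)
print(animal["hsa-let-7a-1"])   # hsa-let-7a[3]
print(plant["ath-MIR5014a"])    # ath-MIR5014[2]
```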
mirUtils mature miRNA locus metadata
The mature miRNA locus metadata file (matInfo file for short) has a filename of the form <organism>_<version>_cluster<distance>.matInfo (for example, hsa_v21_cluster10000.matInfo). The name incorporates the miRBase organism and version from which the metadata was derived, along with the inter-hairpin cluster distance.

This tab-delimited file has a header followed by one line for each mature miRNA locus from the organism's miRBase <organism>.gff3 file. Note that the corresponding GFF entries have type miRNA.

Col # Name Description
1 chrom Name of genomic chromosome for this mature miRNA locus. E.g. chr9.
2 strand Strand for this mature miRNA locus (+ or -).
3 start 1-based start position.
4 end 1-based end position.
5 length Length of mature miRNA locus in bases.
6 matlocid The unique miRBase mature locus identifier for this mature miRNA. Corresponds to the GFF mature miRNA's ID attribute.

For example, the matlocid for the hsa-let-7a-5p mature miRNA is MIMAT0000062_2. Note that the _2 suffix of this miRBase identifier indicates that the mature miRNA sequence associated with this locus is shared with other miRNA loci.

7 dname The mirUtils display name for this mature miRNA locus. To make this display name unique it has two parts: the name of the miRNA precursor hairpin from which this mature miRNA locus derives, and the name of the miRNA locus itself. The latter corresponds to the GFF mature miRNA's Name attribute.

For example, the mirUtils display name for the 5' mature miRNA locus derived from the hsa-let-7a-1 hairpin is hsa-let-7a-1(hsa-let-7a-5p). The GFF Name hsa-let-7a-5p is not sufficient to uniquely identify this mature locus, because that name (and the mature sequence) is shared by three different hairpin precursors.

When the mature sequence name does not specify whether it is 5' or 3', mirUtils designates it as 5p or 3p depending on its location relative to its parent hairpin.

For example, the hsa-mir-1302-2 plus strand hairpin locus (chr1:30366-30503) has only one annotated mature miRNA, designated hsa-miR-1302 in miRBase v21. Because its annotated coordinates (chr1:30438-30458) are in the 3' "half" of the hairpin, mirUtils marks it as a 3p mature miRNA, and gives it the mature miRNA locus display name hsa-mir-1302-2(hsa-miR-1302(3p)).
8 matseqid The miRBase mature sequence identifier for this mature miRNA. Corresponds to the GFF mature miRNA's Alias attribute.

For example, the matseqid for the 5' mature miRNA locus of the hsa-let-7a-1 precursor hairpin is MIMAT0000062, which is the same as that for the hsa-let-7a-2 and hsa-let-7a-3 precursor hairpins.

Note the relationship between this miRBase mature sequence identifier, which is the same for all mature miRNA loci that share the sequence, and the miRBase mature locus identifier (e.g. MIMAT0000062_2) which uniquely identifies a specific genomic locus giving rise to the sequence.
9 name The miRBase mature sequence name for this mature miRNA locus. Corresponds to the GFF mature miRNA's Name attribute.
10 matseq The mirUtils display name for this mature miRNA locus' mature sequence. The prefix of this name corresponds to the GFF mature miRNA's Name attribute.

For example, the mirUtils 3' mature sequence display name for the hsa-let-7a-1 hairpin is hsa-let-7a-3p[2], where the number in brackets indicates that this mature sequence is common to two miRNA precursor hairpins.

11 msct The total count of mature miRNA loci that share this locus' mature sequence.
12 hpid Unique miRBase identifier for the miRNA precursor hairpin locus from which this mature miRNA locus derives (e.g. MI0000060 for the hsa-let-7a-1(hsa-let-7a-5p) mature miRNA locus).

Note that this identifier matches the hpid field for the parent precursor hairpin in the mirUtils .hpInfo hairpin locus metadata file.

13 hairpin Unique mirUtils display name for the miRNA precursor hairpin locus from which this mature miRNA locus derives (e.g. hsa-let-7a-1 for the hsa-let-7a-1(hsa-let-7a-5p) mature miRNA locus).

Note that this display name matches the dname field for the parent precursor hairpin in the mirUtils .hpInfo hairpin locus metadata file.

14 matfa The RNA fasta sequence for this mature miRNA, from the miRBase mature.fa file.
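
The relationship between the GFF attributes and the matlocid, matseqid and name columns can be sketched as below. This is an illustration only, with hypothetical function names. The attribute string is assembled from the identifier values quoted in this section rather than copied from a miRBase GFF file, and the reading of a trailing _n as a copy number is taken from the identifier examples above.

```python
# Illustrative parsing of GFF column 9 attributes (not part of mirUtils).

def parse_gff_attributes(attr_field):
    """Split 'key=value;key=value' into a dict."""
    attrs = {}
    for item in attr_field.strip().split(";"):
        if item:
            key, _, value = item.partition("=")
            attrs[key] = value
    return attrs


def split_locus_id(locus_id):
    """Separate a locus identifier into (base identifier, copy number)."""
    base, sep, copy = locus_id.partition("_")
    return base, (int(copy) if sep else None)


attrs = parse_gff_attributes(
    "ID=MIMAT0000062_2;Alias=MIMAT0000062;Name=hsa-let-7a-5p")
print(attrs["ID"])     # matlocid: identifies one genomic locus
print(attrs["Alias"])  # matseqid: shared by all loci with this sequence
print(attrs["Name"])   # name: the miRBase mature sequence name
print(split_locus_id(attrs["ID"]))
```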
mirUtils statistics output files

When the mirUtils mbaseMirStats function is invoked, one or more sets of output files are produced. The generated files can be divided into several broad categories.

  • Alignment counts based on miRNA hairpin precursors, individually and at various taxonomy levels such as groups, families and genomic clusters.
  • Alignment counts based on mature miRNAs, by both mature miRNA locus and by mature miRNA sequence.
  • Per-base coverage and start position counts for all positions in each miRNA hairpin precursor locus.
  • Metadata output files for miRNA hairpin precursor and mature miRNA loci.

The format of the (single set of) metadata output files is described above. The three sections below describe the other output file types. Multiple files of each output type are produced, but since all files of a type share the same format, each type is described once.

See Taxonomy for a discussion of how mirUtils uses miRBase annotations to define the taxonomy levels at which mirUtils reports statistics. See Alignment count reporting for an overview of how mirUtils rolls up counts from these different taxonomy levels and how it handles alignments to mature miRNA loci.

mirUtils per-base count files

mirUtils generates two per-position count files in each report set. Both report per-hairpin-position counts, with slightly different interpretations.

  • The coverage file, with file name of form <output_prefix>.coverage, reports the total number of aligned bases covering each hairpin locus position.

    For example if an alignment to a particular hairpin locus is given as extending from positions 4 through 25, it would contribute one to the coverage counts in position columns 4-25.

  • The starts file, with file name of form <output_prefix>.starts, reports the total number of alignments that start at each hairpin locus position.

    For example if an alignment to a particular hairpin locus is given as extending from positions 4 through 25, it would contribute one to the starts count in position column 4.
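
The two counting rules above can be sketched as follows. This is an illustration only, not mirUtils code: the function name per_base_counts is hypothetical, and alignments are assumed to be given as 1-based inclusive (start, end) positions relative to the hairpin and to lie within it.

```python
# Illustrative per-position coverage and start counts (not part of mirUtils).

def per_base_counts(alignments, hairpin_length):
    """Return (coverage, starts) lists indexed by 1-based hairpin position.
    Index 0 is unused so that list index equals position."""
    coverage = [0] * (hairpin_length + 1)
    starts = [0] * (hairpin_length + 1)
    for start, end in alignments:
        starts[start] += 1                  # one count at the start position
        for pos in range(start, end + 1):   # one count per covered position
            coverage[pos] += 1
    return coverage, starts


coverage, starts = per_base_counts([(4, 25)], hairpin_length=30)
print(coverage[1:])
print(starts[1:])
```

The single alignment from position 4 through 25 adds one to coverage positions 4-25 and one to the starts count at position 4.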

These are tab-delimited files with a header and one record for each miRNA precursor hairpin encountered in the input. The first 11 columns, described below, contain information about the hairpin. The remaining columns (numbered 1 up to the length of the longest hairpin) contain counts of either all alignments starting at that position or all alignment bases covering that position.

Col # Name Description
1 hairpin Name of this miRNA hairpin precursor (e.g. hsa-mir-30a).
2 rank The rank, by total read count, of this hairpin among those reported.
3 reads Total count of reads aligned to this hairpin.
4 bases Total count of aligning bases from all aligned reads.
5 strand Genomic strand of this hairpin.
6 5pPos1 1-based start offset of the 5' mature miRNA locus, if one is annotated.
7 5pPos2 1-based end offset of the 5' mature miRNA locus, if one is annotated.
8 3pPos1 1-based start offset of the 3' mature miRNA locus, if one is annotated.
9 3pPos2 1-based end offset of the 3' mature miRNA locus, if one is annotated.
10 length Length of this hairpin in bases. Only position fields 1 through this value can contain count values.
11 seq The RNA fasta sequence for this miRNA hairpin precursor, from the miRBase hairpin.fa file.

mirUtils hairpin-related count files

mirUtils generates six count files in each report set that are based on alignments to miRNA precursors, each at a different taxonomy level.

  • The hairpin file, with file name of form <output_prefix>.hairpin.hist, reports alignments to individual miRNA hairpin precursors.

  • The group file, with file name of form <output_prefix>.group.hist, reports alignments to miRNA hairpin groups, related by sequence similarity of their mature miRNA products.

  • The family file, with file name of form <output_prefix>.family.hist, reports alignments to miRBase-defined miRNA hairpin families, related largely by their common functional targets.

  • The cluster+ file, with file name of form <output_prefix>.cluster+.hist, reports alignments to miRNA hairpins belonging to the same genomic plus-strand cluster.

  • The cluster- file, with file name of form <output_prefix>.cluster-.hist, reports alignments to miRNA hairpins belonging to the same genomic minus-strand cluster.

  • The cluster file, with file name of form <output_prefix>.cluster.hist, reports alignments to miRNA hairpins belonging to the same genomic cluster, including both plus and minus strand genes.

These are tab-delimited files with a header and one record for each reporting-level entry (e.g. individual hairpin, group, family or cluster). Field definitions are common to all files in the hairpin-related statistics subset and are described below. The Alignment count reporting section provides important background for understanding many of these count fields.

Col # Name Description
1 name mirUtils display name for the reported entity (hairpin, group, family, or cluster).
2 rank The rank, by total read count, of this entry among those reported.
3 count Total count of alignments to this entry in the input.
4 dup Count of alignment records that had the duplicate SAM flag set (0x400). Only significant if the input BAM/SAM had duplicates marked.
5 oppStrand Count of alignments to the strand opposite the one expected. miRBase alignments are made to 5' to 3' miRNA hairpin precursor sequences, so correct alignments are normally reported as plus strand; this field counts alignments reported as minus strand.
6 mm0 Count of alignments with no mismatches to the reference hairpin sequence.
7 mm1 Count of alignments with one mismatch to the reference hairpin sequence.
8 mm2 Count of alignments with two mismatches to the reference hairpin sequence.
9 mm3p Count of alignments with three or more mismatches to the reference hairpin sequence.
10 indel Count of alignments with insertions or deletions with respect to the reference hairpin sequence.
11 mq0 Count of alignments marked as having mapping quality 0, indicating that the alignment cannot be uniquely assigned to its reported location.
12 mq1-19 Count of alignments marked as having mapping quality between 1 and 19.
13 mq20-29 Count of alignments marked as having mapping quality between 20 and 29.
14 mq30p Count of alignments marked as having mapping quality 30 and above.
15 5pOnly Count of 'good fit' alignments to the hairpin's 5' mature locus, if one is annotated. A 'good fit' alignment overlaps the annotated 5' mature locus by at least the specified minimum overlap and falls entirely within the extended mature miRNA locus, which runs from the specified margin distance before the annotated mature locus start to the specified margin distance after its annotated end. Both minimum overlap and margin are parameters of the statistics reporting run.
16 5pPlus Count of alignments that sufficiently overlap the hairpin's 5' mature locus but do not sufficiently overlap the 3' mature locus.
17 3pOnly Count of 'good fit' alignments to the hairpin's 3' mature locus, if one is annotated. A 'good fit' alignment overlaps the annotated 3' mature locus by at least the specified minimum overlap and falls entirely within the extended mature miRNA locus, which runs from the specified margin distance before the annotated mature locus start to the specified margin distance after its annotated end. Both minimum overlap and margin are parameters of the statistics reporting run.
18 3pPlus Count of alignments that sufficiently overlap the hairpin's 3' mature locus but do not sufficiently overlap the 5' mature locus.
19 5and3p Count of alignments that sufficiently overlap both the annotated 5' and 3' mature loci.
20 totBase Total count of bases that aligned to the hairpin.
21 5pBase Count of bases that aligned to the annotated 5' locus.
22 3pBase Count of bases that aligned to the annotated 3' locus.
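
The mismatch and mapping quality columns amount to simple binning of per-alignment values, which can be sketched as below. The function names are hypothetical and this is not mirUtils code. How the mismatch count of an alignment is obtained is not covered here, so the sketch takes it as a given integer, and the demonstration values are made up.

```python
# Illustrative binning for the mm* and mq* columns (not part of mirUtils).
from collections import Counter


def mismatch_bin(n_mismatches):
    """Column name for an alignment with the given number of mismatches."""
    return "mm3p" if n_mismatches >= 3 else "mm%d" % n_mismatches


def mapq_bin(mapq):
    """Column name for an alignment with the given mapping quality."""
    if mapq == 0:
        return "mq0"
    if mapq < 20:
        return "mq1-19"
    if mapq < 30:
        return "mq20-29"
    return "mq30p"


# Made-up (mismatches, mapping quality) pairs for three alignments.
tally = Counter()
for n_mismatches, mapq in [(0, 37), (1, 0), (4, 25)]:
    tally[mismatch_bin(n_mismatches)] += 1
    tally[mapq_bin(mapq)] += 1
print(sorted(tally.items()))
```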
mirUtils mature-related count files

mirUtils generates two count files relating to mature miRNAs in each report set.

  • The mature file, with file name of form <output_prefix>.mature.hist, reports alignments to individual mature miRNA loci.

  • The mature sequence file, with file name of form <output_prefix>.matseq.hist, reports alignments to mature miRNA sequences that may be derived from more than one hairpin precursor.

These are tab-delimited files with a header and one record for each mature miRNA locus or sequence. Field definitions are a subset of those for hairpin-related count files, and are described below.

Note that unlike hairpin-related count files, where the total count includes all alignments from the input, mature miRNA count totals include only 'good fit' alignments to the mature miRNA locus, as defined by the minimum overlap and margin parameters specified for the mirUtils reporting run.

Col # Name Description
1 name mirUtils display name for the mature miRNA locus or sequence.
2 rank The rank, by read count, of this mature miRNA among those reported.
3 count Count of 'good fit' alignments to the mature locus.
4 dup Count of alignment records that had the duplicate SAM flag set (0x400). Only significant if the input BAM/SAM had duplicates marked.
5 oppStrand Count of alignments to the strand opposite the one expected. miRBase alignments are made to 5' to 3' miRNA hairpin precursor sequences, so correct alignments are normally reported as plus strand; this field counts alignments reported as minus strand.
6 mm0 Count of alignments with no mismatches to the mature locus sequence.
7 mm1 Count of alignments with one mismatch to the mature locus sequence.
8 mm2 Count of alignments with two mismatches to the mature locus sequence.
9 mm3p Count of alignments with three or more mismatches to the mature locus sequence.
10 indel Count of alignments with insertions or deletions with respect to the mature locus sequence.
11 mq0 Count of alignments marked as having mapping quality 0, indicating that the alignment cannot be uniquely assigned to its reported location.
12 mq1-19 Count of alignments marked as having mapping quality between 1 and 19.
13 mq20-29 Count of alignments marked as having mapping quality between 20 and 29.
14 mq30p Count of alignments marked as having mapping quality 30 and above.
15 totBase Total count of bases that aligned to the mature locus.