1. Make the Binned Bed file.
The bed file should be done with as follows:
a. Divide the gene into equal 100 bins,
b. Select 300 bps up or down stream of the gene coordinates and divide them into equal 10 bins.
The first part can be done with perl or bedtools.
Attention: The genes on the - strand should be reversed when calculate the average reads of all genes for metaplot.
perl Gene_bins.pl Gene_ff3 > Gene_bin
the Input Gene_ff3 file:
Chr1 2903 10817 LOC_Os01g01010 +
Chr1 11218 12435 LOC_Os01g01019 +
Chr1 12648 15915 LOC_Os01g01030 +
Chr1 16292 20323 LOC_Os01g01040 +
Chr1 22841 26971 LOC_Os01g01050 +
Chr1 27136 28651 LOC_Os01g01060 +
Chr1 29818 34493 LOC_Os01g01070 +
Chr1 35581 41180 LOC_Os01g01080 +
Chr1 44595 47526 LOC_Os01g01090 +
Chr1 47856 53412 LOC_Os01g01100 -
The generated Gene_bin file:
Chr1 2603 2633 LOC_Os01g01010 + 1
Chr1 2633 2663 LOC_Os01g01010 + 2
Chr1 2663 2693 LOC_Os01g01010 + 3
Chr1 2693 2723 LOC_Os01g01010 + 4
Chr1 2723 2753 LOC_Os01g01010 + 5
Chr1 2753 2783 LOC_Os01g01010 + 6
Chr1 2783 2813 LOC_Os01g01010 + 7
Chr1 2813 2843 LOC_Os01g01010 + 8
Chr1 2843 2873 LOC_Os01g01010 + 9
Chr1 2873 2903 LOC_Os01g01010 + 10
2. Call reads for each bin by bedtools
./bedtools multicov -bams WT_1_rmp.bam -bed Gene_bin >Gene_bin_read_counts
#bedtools should be the latest version.
less -S less -S Gene_read_final.txt
Chr1 2603 2633 LOC_Os01g01010 + 1 30
Chr1 2633 2663 LOC_Os01g01010 + 2 29
Chr1 2663 2693 LOC_Os01g01010 + 3 29
Chr1 2693 2723 LOC_Os01g01010 + 4 29
Chr1 2723 2753 LOC_Os01g01010 + 5 26
Chr1 2753 2783 LOC_Os01g01010 + 6 29
Chr1 2783 2813 LOC_Os01g01010 + 7 39
Chr1 2813 2843 LOC_Os01g01010 + 8 46
Chr1 2843 2873 LOC_Os01g01010 + 9 42
Chr1 2873 2903 LOC_Os01g01010 + 10 42
Chr1 2903 2982 LOC_Os01g01010 + 1 59
Chr1 2982 3061 LOC_Os01g01010 + 2 50
Chr1 3061 3140 LOC_Os01g01010 + 3 55
Chr1 3140 3219 LOC_Os01g01010 + 4 62
Chr1 3219 3298 LOC_Os01g01010 + 5 67
Chr1 3298 3377 LOC_Os01g01010 + 6 69
Chr1 3377 3456 LOC_Os01g01010 + 7 61
Chr1 3456 3535 LOC_Os01g01010 + 8 66
Chr1 3535 3614 LOC_Os01g01010 + 9 69
Chr1 3614 3693 LOC_Os01g01010 + 10 64
Chr1 3693 3772 LOC_Os01g01010 + 11 51
3. Transform the data from a long data into a short data.
perl READS_reformat.pl Gene_bin_read_counts > Gene_read_final.txt
less -S Gene_read_final.txt
LOC_Os06g23090 8 9 8 7 6 5 7 7 7 6 6 8 9 9 10 8 7
LOC_Os01g39830 23 24 24 22 22 23 23 23 26 29 32 17 22 15 20 22 24
LOC_Os02g21850 2 3 3 2 2 2 4 3 3 4 4 4 4 5 5 5 5
LOC_Os04g24190 14 11 10 9 10 12 15 11 10 13 15 12 12 9 9 7 8
LOC_Os06g12180 4 4 5 5 7 7 9 9 6 6 5 5 6 8 7 7 7
LOC_Os10g19150 7 9 8 7 3 1 0 2 2 2 0 0 0 0 0 0 0
LOC_Os01g73720 2 3 5 6 6 7 10 8 6 8 8 9 9 9 8 8 8
LOC_Os08g15288 141 144 121 111 83 59 58 75 84 106 110 115 121 131 130 124 124
LOC_Os01g13810 11 10 9 7 5 4 3 3 2 0 0 2 2 4 4 7 8
LOC_Os04g10214 12 11 11 10 11 14 15 17 17 22 21 26 28 28 28 28 29
4. Calculate the average bin reads of all genes, then make the metaplot . This can be done in Excel.
image.pngNote that: The reads can be normalized with the scaling factor which was value of the total mapped reads/1 million.
网友评论