Bioinformatics/base count: Difference between revisions
Revision as of 21:58, 25 November 2019
Given this string representing ordered DNA bases:
CGTAAAAAATTACAACGTCCTTTGGCTATCTCTTAAACTCCTGCTAAATG
CTCGTGCTTTCCAATTATGTAAGCGTTCCGAGACGGGGTGGTCGATTCTG
AGGACAAAGGTCAAGATGGAGCGCATCGAACGCAATAAGGATCATTTGAT
GGGACGTTTCGTCGACAAAGTCTTGTTTCGAGAGTAACGGCTACCGTCTT
CGATTCTGCTTATAACACTATGTTCTTATGAAATGGATGTTCTGAGTTGG
TCAGTCCCAATGTGCGGGGTTTCTTTTAGTACGTCGGGAGTGGTATTATA
TTTAATTTTTCTATATAGCGATCTGTATTTAAGCAATTCATTTAGGTTAT
CGCCGCGATGCTCGGTTCGGACCGCCAAGCATCTGGCTCCACTGCTAGTG
TCCTAAATTTGAATGGCAAACACAAATAAGATTTAGCAATTCGTGTAGAC
GACCGGGGACTTGCATGATGGGAGCAGCTTTGTTAAACTACGAACGTAAT
- "Pretty print" the sequence, followed by a summary of the counts of each base (A, C, G, and T) in the sequence, as well as the total count of bases in the string.
Python
<lang python>from collections import Counter

def basecount(dna):
    # (base, count) pairs, sorted by base letter
    return sorted(Counter(dna).items())

def seq_split(dna, n=50):
    # cut the sequence into consecutive chunks of n bases
    return [dna[i: i+n] for i in range(0, len(dna), n)]

def seq_pp(dna, n=50):
    # print each chunk prefixed by its starting offset, then the counts
    for i, part in enumerate(seq_split(dna, n)):
        print(f"{i*n:>5}: {part}")
    print("\n BASECOUNT:")
    tot = 0
    for base, count in basecount(dna):
        print(f" {base:>3}: {count}")
        tot += count
    base, count = 'TOT', tot
    print(f" {base:>3}= {count}")

if __name__ == '__main__':
    print("SEQUENCE:")
    sequence = '''\
CGTAAAAAATTACAACGTCCTTTGGCTATCTCTTAAACTCCTGCTAAATG\
CTCGTGCTTTCCAATTATGTAAGCGTTCCGAGACGGGGTGGTCGATTCTG\
AGGACAAAGGTCAAGATGGAGCGCATCGAACGCAATAAGGATCATTTGAT\
GGGACGTTTCGTCGACAAAGTCTTGTTTCGAGAGTAACGGCTACCGTCTT\
CGATTCTGCTTATAACACTATGTTCTTATGAAATGGATGTTCTGAGTTGG\
TCAGTCCCAATGTGCGGGGTTTCTTTTAGTACGTCGGGAGTGGTATTATA\
TTTAATTTTTCTATATAGCGATCTGTATTTAAGCAATTCATTTAGGTTAT\
CGCCGCGATGCTCGGTTCGGACCGCCAAGCATCTGGCTCCACTGCTAGTG\
TCCTAAATTTGAATGGCAAACACAAATAAGATTTAGCAATTCGTGTAGAC\
GACCGGGGACTTGCATGATGGGAGCAGCTTTGTTAAACTACGAACGTAAT'''
    seq_pp(sequence)
</lang>
- Output:
SEQUENCE:
    0: CGTAAAAAATTACAACGTCCTTTGGCTATCTCTTAAACTCCTGCTAAATG
   50: CTCGTGCTTTCCAATTATGTAAGCGTTCCGAGACGGGGTGGTCGATTCTG
  100: AGGACAAAGGTCAAGATGGAGCGCATCGAACGCAATAAGGATCATTTGAT
  150: GGGACGTTTCGTCGACAAAGTCTTGTTTCGAGAGTAACGGCTACCGTCTT
  200: CGATTCTGCTTATAACACTATGTTCTTATGAAATGGATGTTCTGAGTTGG
  250: TCAGTCCCAATGTGCGGGGTTTCTTTTAGTACGTCGGGAGTGGTATTATA
  300: TTTAATTTTTCTATATAGCGATCTGTATTTAAGCAATTCATTTAGGTTAT
  350: CGCCGCGATGCTCGGTTCGGACCGCCAAGCATCTGGCTCCACTGCTAGTG
  400: TCCTAAATTTGAATGGCAAACACAAATAAGATTTAGCAATTCGTGTAGAC
  450: GACCGGGGACTTGCATGATGGGAGCAGCTTTGTTAAACTACGAACGTAAT

 BASECOUNT:
   A: 129
   C: 97
   G: 119
   T: 155
 TOT= 500
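The totals reported above can be cross-checked independently of `Counter`. The following is a minimal sketch, not part of the original entry: the names `SEQUENCE` and `count_bases` are hypothetical, and it assumes the alphabet is exactly A, C, G, and T. It counts each base with `str.count` and reports how many characters fall outside that alphabet, which should be zero for a clean sequence.

```python
# Hypothetical cross-check (names are illustrative, not from the original entry).
SEQUENCE = (
    "CGTAAAAAATTACAACGTCCTTTGGCTATCTCTTAAACTCCTGCTAAATG"
    "CTCGTGCTTTCCAATTATGTAAGCGTTCCGAGACGGGGTGGTCGATTCTG"
    "AGGACAAAGGTCAAGATGGAGCGCATCGAACGCAATAAGGATCATTTGAT"
    "GGGACGTTTCGTCGACAAAGTCTTGTTTCGAGAGTAACGGCTACCGTCTT"
    "CGATTCTGCTTATAACACTATGTTCTTATGAAATGGATGTTCTGAGTTGG"
    "TCAGTCCCAATGTGCGGGGTTTCTTTTAGTACGTCGGGAGTGGTATTATA"
    "TTTAATTTTTCTATATAGCGATCTGTATTTAAGCAATTCATTTAGGTTAT"
    "CGCCGCGATGCTCGGTTCGGACCGCCAAGCATCTGGCTCCACTGCTAGTG"
    "TCCTAAATTTGAATGGCAAACACAAATAAGATTTAGCAATTCGTGTAGAC"
    "GACCGGGGACTTGCATGATGGGAGCAGCTTTGTTAAACTACGAACGTAAT"
)


def count_bases(dna, alphabet="ACGT"):
    """Return ({base: occurrences}, number of characters outside alphabet)."""
    counts = {base: dna.count(base) for base in alphabet}
    unexpected = len(dna) - sum(counts.values())
    return counts, unexpected


counts, unexpected = count_bases(SEQUENCE)
print(counts)
print("characters outside ACGT:", unexpected)
```

Making one `str.count` pass per base is less general than `Counter`, but it makes the assumed alphabet explicit and surfaces stray characters instead of silently counting them as extra bases.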
REXX
<lang rexx>/*REXX program finds the number of each base in a DNA string (along with a total). */
parse arg dna .
if dna=='' | dna==","  then dna= CGTAAAAAATTACAACGTCCTTTGGCTATCTCTTAAACTCCTGCTAAATG ,
                                 CTCGTGCTTTCCAATTATGTAAGCGTTCCGAGACGGGGTGGTCGATTCTG ,
                                 AGGACAAAGGTCAAGATGGAGCGCATCGAACGCAATAAGGATCATTTGAT ,
                                 GGGACGTTTCGTCGACAAAGTCTTGTTTCGAGAGTAACGGCTACCGTCTT ,
                                 CGATTCTGCTTATAACACTATGTTCTTATGAAATGGATGTTCTGAGTTGG ,
                                 TCAGTCCCAATGTGCGGGGTTTCTTTTAGTACGTCGGGAGTGGTATTATA ,
                                 TTTAATTTTTCTATATAGCGATCTGTATTTAAGCAATTCATTTAGGTTAT ,
                                 CGCCGCGATGCTCGGTTCGGACCGCCAAGCATCTGGCTCCACTGCTAGTG ,
                                 TCCTAAATTTGAATGGCAAACACAAATAAGATTTAGCAATTCGTGTAGAC ,
                                 GACCGGGGACTTGCATGATGGGAGCAGCTTTGTTAAACTACGAACGTAAT
dna= space(dna, 0); upper dna                    /*elide blanks from DNA; uppercase it. */
say '────────length of the DNA string: ' length(dna)
@.=0                                             /*initialize the count for all bases.  */
w= 1                                             /*the maximum width of a base count.   */
$=                                               /*a placeholder for the names of bases.*/
     do j=1 for length(dna)                      /*traipse through the DNA string.      */
     _= substr(dna, j, 1)                        /*obtain a base name from the DNA str. */
     if pos(_, $)==0 then $=$ || _               /*if not found before, add it to list. */
     @._= @._ + 1                                /*bump the count of this base.         */
     w= max(w, length(@._) )                     /*compute the maximum width number.    */
     end /*j*/
say
     do k=0 for 255; z= d2c(k)                   /*traipse through all possibilities.   */
     if pos(z, $)==0 then iterate                /*Was this base found? No, then skip.  */
     say ' base ' z " has a basecount of: " right(@.z, w)
     @.tot= @.tot + @.z                          /*add to a grand total to verify count.*/
     end /*k*/
say                                              /*stick a fork in it, we're all done.  */
say '────────total for all basecounts:' right(@.tot, w+1)</lang>
- output when using the default input:
────────length of the DNA string: 500

base A has a basecount of: 129
base C has a basecount of: 97
base G has a basecount of: 119
base T has a basecount of: 155

────────total for all basecounts: 500
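For readers less familiar with REXX, the same steps (strip blanks, uppercase, tally every character seen, then print the counts right-justified to a common width) can be sketched in Python. This is an illustrative rendering under assumptions, not a translation endorsed by the entry: `base_report` is a hypothetical name, the message texts are simplified, and the short input is made-up sample data rather than the task's sequence.

```python
def base_report(dna):
    """Build report lines in the spirit of the REXX program above (a sketch)."""
    dna = "".join(dna.split()).upper()        # elide blanks, then uppercase
    counts = {}
    for base in dna:                          # tally every character encountered
        counts[base] = counts.get(base, 0) + 1
    # widest count, used to right-justify the numbers (at least 1, like w= 1)
    width = max((len(str(n)) for n in counts.values()), default=1)
    lines = [f"length of the DNA string: {len(dna)}"]
    total = 0
    for base in sorted(counts):               # character-code order, as in the k loop
        lines.append(f"base {base} has a basecount of: {counts[base]:>{width}}")
        total += counts[base]                 # grand total, to verify the count
    lines.append(f"total for all basecounts: {total:>{width + 1}}")
    return lines


# Made-up sample input (lowercase with a blank) to exercise the clean-up step.
for line in base_report("gat taca"):
    print(line)
```

Returning the lines instead of printing them directly keeps the formatting easy to check; the grand total is accumulated separately so it can be compared against the reported string length.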