FASTA format: Difference between revisions
Revision as of 15:04, 5 April 2013
In bioinformatics, long character strings are often encoded in a format called FASTA. A FASTA file can contain several strings, each identified by a name marked by a “>” character at the beginning of the line.
Write a program that reads a FASTA file such as:
>Rosetta_Example_1
THERECANBENOSPACE
>Rosetta_Example_2
THERECANBESEVERAL
LINESBUTTHEYALLMUST
BECONCATENATED
And prints the following output:
Rosetta_Example_1: THERECANBENOSPACE
Rosetta_Example_2: THERECANBESEVERALLINESBUTTHEYALLMUSTBECONCATENATED
Note that a high-quality implementation will not hold the entire file in memory at once; real FASTA files can be multiple gigabytes in size.
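As a language-neutral illustration of the streaming requirement, here is a minimal Python sketch. It is not one of the task's submitted solutions; the names `format_fasta` and `print_fasta` are hypothetical, and skipping blank lines is an assumption the task description does not state. The sketch processes one line at a time, so memory use is bounded by the longest line rather than by the file size.

```python
import io
from typing import Iterable, Iterator, TextIO


def format_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield output chunks for a FASTA stream, one input line at a time."""
    seen_header = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            # A new record starts: finish the previous record's line first.
            if seen_header:
                yield "\n"
            seen_header = True
            yield line[1:] + ": "
        else:
            # Sequence data is concatenated with no separator.
            yield line
    if seen_header:
        yield "\n"


def print_fasta(source: TextIO, sink: TextIO) -> None:
    """Copy formatted chunks to the sink without buffering the whole input."""
    for chunk in format_fasta(source):
        sink.write(chunk)
```

To use it on a real file, pass an open file object as `source` and `sys.stdout` as `sink`. Because Python file objects iterate lazily by line, only the current line is held in memory.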
D
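<lang d>
import std.exception;
import std.file;
import std.stdio;

void main(string[] args)
{
    enforce(args.length >= 2, "You must specify a file.");
    enforce(exists(args[1]), "File not found: " ~ args[1]);

    bool seenHeader = false;
    // byLine reads one line at a time, so the whole file is never in memory.
    foreach (line; File(args[1]).byLine)
    {
        if (line.length == 0)
            continue; // skip blank lines; line[0] would be out of range
        if (line[0] == '>')
        {
            // End the previous record's output line before starting a new one.
            if (seenHeader)
                writeln();
            seenHeader = true;
            write(line[1 .. $], ": ");
        }
        else
        {
            write(line);
        }
    }
    writeln();
}
</lang>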
Perl 6
This is not the most elegant way to do it, but it is a start. Note that it matches against an in-memory string rather than streaming a file:
<lang Perl 6>say "{.[0]}: {.[1]>>.comb(/\N+/).join}" for ">Rosetta_Example_1
THERECANBENOSPACE
>Rosetta_Example_2
THERECANBESEVERAL
LINESBUTTHEYALLMUST
BECONCATENATED".comb: / '>' (\N+)\n (<!before '>'>\N+\n?)+ /, :match</lang>
Tcl
<lang tcl>proc fastaReader {filename} {
    set f [open $filename]
    set sep ""
    while {[gets $f line] >= 0} {
        if {[string match >* $line]} {
            puts -nonewline "$sep[string range $line 1 end]: "
            set sep "\n"
        } else {
            puts -nonewline $line
        }
    }
    puts ""
    close $f
}

fastaReader ./rosettacode.fas</lang>
- Output:
Rosetta_Example_1: THERECANBENOSPACE
Rosetta_Example_2: THERECANBESEVERALLINESBUTTHEYALLMUSTBECONCATENATED