FASTA format: Difference between revisions

From Rosetta Code
Revision as of 15:04, 5 April 2013

FASTA format is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

In bioinformatics, long character strings are often encoded in a format called FASTA. A FASTA file can contain several strings, each identified by a name marked by a “>” character at the beginning of the line.

Write a program that reads a FASTA file such as:

>Rosetta_Example_1
THERECANBENOSPACE
>Rosetta_Example_2
THERECANBESEVERAL
LINESBUTTHEYALLMUST
BECONCATENATED

And prints the following output:

Rosetta_Example_1: THERECANBENOSPACE
Rosetta_Example_2: THERECANBESEVERALLINESBUTTHEYALLMUSTBECONCATENATED

Note that a high-quality implementation will not hold the entire file in memory at once; real FASTA files can be multiple gigabytes in size.
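The streaming requirement above can be sketched as follows. This is a minimal illustration in Python, not one of the submitted solutions: the function name `print_fasta` and its `out` parameter are hypothetical, and skipping blank lines is an assumption the task text does not state. Each line is written as soon as it is read, so memory use stays bounded by the longest single line rather than by the file or record size.

```python
import io
import sys


def print_fasta(lines, out=sys.stdout):
    """Stream FASTA lines to `out` as 'name: sequence', one record per line.

    `lines` is any iterable of text lines (an open file works), so the
    whole input is never held in memory at once.
    """
    sep = ""  # nothing before the first record, a newline before later ones
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            # Header line: close the previous record, then start a new one.
            out.write(sep + line[1:] + ": ")
            sep = "\n"
        else:
            # Sequence line: append directly to the current record.
            out.write(line)
    if sep:
        out.write("\n")  # terminate the last record


if __name__ == "__main__":
    sample = io.StringIO(
        ">Rosetta_Example_1\n"
        "THERECANBENOSPACE\n"
        ">Rosetta_Example_2\n"
        "THERECANBESEVERAL\n"
        "LINESBUTTHEYALLMUST\n"
        "BECONCATENATED\n"
    )
    print_fasta(sample)
```

To process a real file, pass an open file object instead of the in-memory sample, for example `with open(path) as f: print_fasta(f)`, where `path` is whatever file name the caller supplies.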

D

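<lang d>import std.exception;
import std.file;
import std.stdio;

void main(string[] args)
{
    if (args.length < 2)
    {
        throw new Exception("You must specify a file.");
    }
    enforce(exists(args[1]), "File not found: " ~ args[1]);

    // True once the first header has been printed.
    bool seenHeader = false;

    // byLine reads one line at a time, so the file is never fully in memory.
    foreach (line; File(args[1]).byLine)
    {
        if (line.length == 0)
        {
            continue; // an empty line would make line[0] go out of range
        }
        if (line[0] == '>')
        {
            if (seenHeader)
            {
                writeln(); // end the previous record
            }
            else
            {
                seenHeader = true;
            }
            write(line[1 .. $], ": ");
        }
        else
        {
            write(line);
        }
    }
    writeln();
}</lang>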

Perl 6

Certainly not the most elegant way to do it, but it is a start:
<lang Perl 6>say "{.[0]}: {.[1]>>.comb(/\N+/).join}" for
">Rosetta_Example_1
THERECANBENOSPACE
>Rosetta_Example_2
THERECANBESEVERAL
LINESBUTTHEYALLMUST
BECONCATENATED".comb: / '>' (\N+)\n (<!before '>'>\N+\n?)+ /, :match</lang>

Tcl

<lang tcl>proc fastaReader {filename} {
    set f [open $filename]
    set sep ""
    while {[gets $f line] >= 0} {
        if {[string match >* $line]} {
            puts -nonewline "$sep[string range $line 1 end]: "
            set sep "\n"
        } else {
            puts -nonewline $line
        }
    }
    puts ""
    close $f
}

fastaReader ./rosettacode.fas</lang>

Output:
Rosetta_Example_1: THERECANBENOSPACE
Rosetta_Example_2: THERECANBESEVERALLINESBUTTHEYALLMUSTBECONCATENATED