Strip block comments

A block comment begins with a beginning delimiter and ends with an ending delimiter; the comment includes both delimiters. These delimiters are often multi-character sequences.

Task: Strip block comments from program text (of a programming language much like classic C). Your demos should at least handle simple, non-nested and multiline block comment delimiters. The beginning delimiter is the two-character sequence “/*” and the ending delimiter is “*/”.

Sample text for stripping:

  /**
   * Some comments
   * longer comments here that we can parse.
   *
   * Rahoo 
   */
   function subroutine() {
    a = /* inline comment */ b + c ;
   }
   /*/ <-- tricky comments */

   /**
    * Another comment.
    */
    function something() {
    }

Extra credit: Ensure that the stripping code is not hard-coded to the particular delimiters described above, but instead allows the caller to specify them. (If your language supports them, optional parameters may be useful for this.)
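As a rough illustration of the task and the extra credit, here is a minimal Python sketch. The function name `strip_block_comments` and its parameter names are chosen for this example only. It assumes non-nested comments and treats an unterminated comment as running to the end of the input, which the task does not specify.

```python
def strip_block_comments(text, opener="/*", closer="*/"):
    """Remove non-nested block comments delimited by opener/closer."""
    out = []
    pos = 0
    while True:
        start = text.find(opener, pos)
        if start == -1:
            out.append(text[pos:])
            break
        out.append(text[pos:start])
        # Look for the closer only after the whole opener, so that "/*/"
        # is not read as an opener immediately followed by a closer.
        end = text.find(closer, start + len(opener))
        if end == -1:
            # Assumption: an unterminated comment runs to the end of input.
            break
        pos = end + len(closer)
    return "".join(out)


print(strip_block_comments("a = /* inline comment */ b + c ;"))
print(strip_block_comments("x (* note *) y", "(*", "*)"))
```

The default arguments play the role of the optional parameters mentioned above: callers that want C-style comments pass only the text.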

Cf. Strip comments from a string

Ada

strip.adb: <lang Ada>with Ada.Strings.Fixed;
with Ada.Strings.Unbounded;
with Ada.Text_IO;
with Ada.Command_Line;

procedure Strip is

  use Ada.Strings.Unbounded;
  procedure Print_Usage is
  begin
     Ada.Text_IO.Put_Line ("Usage:");
     Ada.Text_IO.New_Line;
     Ada.Text_IO.Put_Line ("   strip <file> [<opening> [<closing>]]");
     Ada.Text_IO.New_Line;
     Ada.Text_IO.Put_Line ("      file: file to strip");
     Ada.Text_IO.Put_Line ("      opening: string for opening comment");
     Ada.Text_IO.Put_Line ("      closing: string for closing comment");
     Ada.Text_IO.New_Line;
  end Print_Usage;

  Opening_Pattern : Unbounded_String := To_Unbounded_String ("/*");
  Closing_Pattern : Unbounded_String := To_Unbounded_String ("*/");
  Inside_Comment  : Boolean          := False;

  function Strip_Comments (From : String) return String is
     use Ada.Strings.Fixed;
     Opening_Index : Natural;
     Closing_Index : Natural;
     Start_Index   : Natural := From'First;
  begin
     if Inside_Comment then
        Start_Index :=
           Index (Source => From, Pattern => To_String (Closing_Pattern));
        if Start_Index < From'First then
           return "";
        end if;
        Inside_Comment := False;
        Start_Index    := Start_Index + Length (Closing_Pattern);
     end if;
     Opening_Index :=
        Index
          (Source  => From,
           Pattern => To_String (Opening_Pattern),
           From    => Start_Index);
     if Opening_Index < From'First then
        return From (Start_Index .. From'Last);
     else
        Closing_Index :=
           Index
             (Source  => From,
              Pattern => To_String (Closing_Pattern),
              From    => Opening_Index + Length (Opening_Pattern));
        if Closing_Index > 0 then
           return From (Start_Index .. Opening_Index - 1) &
                  Strip_Comments
                     (From (
              Closing_Index + Length (Closing_Pattern) .. From'Last));
        else
           Inside_Comment := True;
           return From (Start_Index .. Opening_Index - 1);
        end if;
     end if;
  end Strip_Comments;

  File : Ada.Text_IO.File_Type;

begin

  if Ada.Command_Line.Argument_Count < 1
    or else Ada.Command_Line.Argument_Count > 3
  then
     Print_Usage;
     return;
  end if;
  if Ada.Command_Line.Argument_Count > 1 then
     Opening_Pattern := To_Unbounded_String (Ada.Command_Line.Argument (2));
     if Ada.Command_Line.Argument_Count > 2 then
        Closing_Pattern :=
           To_Unbounded_String (Ada.Command_Line.Argument (3));
     else
        Closing_Pattern := Opening_Pattern;
     end if;
  end if;
  Ada.Text_IO.Open
    (File => File,
     Mode => Ada.Text_IO.In_File,
     Name => Ada.Command_Line.Argument (1));
  while not Ada.Text_IO.End_Of_File (File => File) loop
     declare
        Line : constant String := Ada.Text_IO.Get_Line (File);
     begin
        Ada.Text_IO.Put_Line (Strip_Comments (Line));
     end;
  end loop;
  Ada.Text_IO.Close (File => File);

end Strip;</lang> output:

  





   function subroutine() {
    a =  b + c ;
   }
   

   


    function something() {
    }

AutoHotkey

<lang AutoHotkey>code =
(
/**
  * Some comments
  * longer comments here that we can parse.
  *
  * Rahoo 
  */
  function subroutine() {
   a = /* inline comment */ b + c ;
  }
  /*/ <-- tricky comments */

  /**
   * Another comment.
   */
   function something() {
   }

)

; Open-Close Comment delimiters
openC:="/*"
closeC:="*/"

; Make it "Regex-Safe"
openC:=RegExReplace(openC,"(\*|\^|\?|\\|\+|\.|\!|\{|\}|\[|\]|\$|\|)","\$0")
closeC:=RegExReplace(closeC,"(\*|\^|\?|\\|\+|\.|\!|\{|\}|\[|\]|\$|\|)","\$0")

; Display final result
MsgBox % sCode := RegExReplace(code,"s)(" . openC . ").*?(" . closeC . ")")</lang>


   function subroutine() {
    a =  b + c ;
   }
   

   
    function something() {
    }
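The same idea (escape the delimiters, then remove the shortest match in "dot matches newline" mode) can be sketched in Python. This is only an illustration; the function name `strip_with_regex` is made up for this example.

```python
import re


def strip_with_regex(text, opener="/*", closer="*/"):
    # re.escape makes characters such as * and / literal in the pattern.
    pattern = re.escape(opener) + r".*?" + re.escape(closer)
    # re.DOTALL lets "." match newlines, so multi-line comments are covered;
    # the lazy ".*?" stops at the first closer instead of the last one.
    return re.sub(pattern, "", text, flags=re.DOTALL)


print(strip_with_regex("a /* one\ntwo */ b /*/ tricky */ c"))
```

Escaping with a library routine rather than a hand-written character list avoids missing a metacharacter when the caller supplies unusual delimiters.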

BBC BASIC

Works with: BBC BASIC for Windows

<lang bbcbasic> infile$ = "C:\sample.c"

     outfile$ = "C:\stripped.c"
     
     PROCstripblockcomments(infile$, outfile$, "/*", "*/")
     END
     
     DEF PROCstripblockcomments(infile$, outfile$, start$, finish$)
     LOCAL infile%, outfile%, comment%, test%, A$
     
     infile% = OPENIN(infile$)
     IF infile%=0 ERROR 100, "Could not open input file"
     outfile% = OPENOUT(outfile$)
     IF outfile%=0 ERROR 100, "Could not open output file"
     
     WHILE NOT EOF#infile%
       A$ = GET$#infile% TO 10
       REPEAT
         IF comment% THEN
           test% = INSTR(A$, finish$)
           IF test% THEN
             A$ = MID$(A$, test% + LEN(finish$))
             comment% = FALSE
           ENDIF
         ELSE
           test% = INSTR(A$, start$)
           IF test% THEN
             BPUT#outfile%, LEFT$(A$, test%-1);
             A$ = MID$(A$, test% + LEN(start$))
             comment% = TRUE
           ENDIF
         ENDIF
       UNTIL test%=0
       IF NOT comment% BPUT#outfile%, A$
     ENDWHILE
     
     CLOSE #infile%
     CLOSE #outfile%
     ENDPROC</lang>

Output file:

  
   function subroutine() {
    a =  b + c ;
   }
   

   
    function something() {
    }
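The line-by-line approach, where an "inside comment" flag is carried from one line to the next, can be sketched in Python as follows. The generator name `strip_lines` is hypothetical. Unlike the BBC BASIC program, this sketch yields an empty string for lines that lie entirely inside a comment.

```python
def strip_lines(lines, opener="/*", closer="*/"):
    """Yield each line with block comments removed, keeping state across lines."""
    in_comment = False
    for line in lines:
        kept = []
        pos = 0
        while True:
            if in_comment:
                end = line.find(closer, pos)
                if end == -1:
                    break  # the comment continues on the next line
                pos = end + len(closer)
                in_comment = False
            else:
                start = line.find(opener, pos)
                if start == -1:
                    kept.append(line[pos:])
                    break
                kept.append(line[pos:start])
                pos = start + len(opener)
                in_comment = True
        yield "".join(kept)


for stripped in strip_lines(["a = 1 /* begin", "still comment", "end */ b = 2"]):
    print(repr(stripped))
```

Because only one line is held at a time, this style suits inputs that are too large to load as a single string.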

C

<lang C>#include <stdio.h>

#include <string.h>
#include <stdlib.h>

const char *ca = "/*", *cb = "*/";
int al = 2, bl = 2;

char *loadfile(const char *fn) {
    FILE *f = fopen(fn, "rb");
    long l;
    char *s = NULL;   /* stays NULL if the file cannot be opened */

    if (f != NULL) {
        fseek(f, 0, SEEK_END);
        l = ftell(f);
        s = malloc(l + 1);
        rewind(f);
        if (s) {
            l = fread(s, 1, l, f);
            s[l] = '\0';   /* terminate so strstr/strlen are safe */
        }
        fclose(f);
    }
    return s;
}

void stripcomments(char *s) {
    char *a, *b;

    while ((a = strstr(s, ca)) != NULL) {
        b = strstr(a + al, cb);
        if (b == NULL)
            break;
        b += bl;
        memmove(a, b, strlen(b) + 1);   /* shift the tail, including the NUL */
    }
}

int main(int argc, char **argv) {
    const char *fn = "input.txt";
    char *s;

    if (argc >= 2)
        fn = argv[1];
    s = loadfile(fn);
    if (s == NULL) {
        fprintf(stderr, "Could not read %s\n", fn);
        return 1;
    }
    if (argc == 4) {
        al = strlen(ca = argv[2]);
        bl = strlen(cb = argv[3]);
    }
    stripcomments(s);
    puts(s);
    free(s);
    return 0;
}</lang>

Usage

Specify an input file via the first command line argument, and optionally specify comment opening and closing delimiters with the next two args, or defaults of /* and */ are assumed.

Output


   function subroutine() {
    a =  b + c ;
   }



    function something() {
    }

C++

<lang cpp>#include <string>

#include <iostream>
#include <iterator>
#include <fstream>
#include <boost/regex.hpp>

int main( ) {

   std::ifstream codeFile( "samplecode.txt" ) ;
   if ( codeFile ) {
      boost::regex commentre( "/\\*.*?\\*/" ) ;//comment start and end, and as few characters in between as possible
      std::string my_erase( "" ) ;             //erase them
      std::string stripped ;
      std::string code( (std::istreambuf_iterator<char>( codeFile ) ) ,

std::istreambuf_iterator<char>( ) ) ;

      codeFile.close( ) ;
      stripped = boost::regex_replace( code , commentre , my_erase ) ;
      std::cout << "Code unstripped:\n" << stripped << std::endl ;
      return 0 ;
   }
   else {
      std::cout << "Could not find code file!" << std::endl ;
      return 1 ;
   }

}</lang> Output:

Code unstripped:

   function subroutine() {
    a =  b + c ;
   }
   

   
    function something() {
    }

C#

<lang Csharp>using System;

   class Program
   {
       private static string BlockCommentStrip(string commentStart, string commentEnd, string sampleText)
       {
           while (sampleText.IndexOf(commentStart) > -1 && sampleText.IndexOf(commentEnd, sampleText.IndexOf(commentStart) + commentStart.Length) > -1)
           {
               int start = sampleText.IndexOf(commentStart);
               int end = sampleText.IndexOf(commentEnd, start + commentStart.Length);
               sampleText = sampleText.Remove(
                   start,
                   (end + commentEnd.Length) - start
                   );
           }
           return sampleText;
       }
   }</lang>

Clojure

<lang Clojure>(defn comment-strip [txt & args]

  (let [args (conj {:delim ["/*" "*/"]} (apply hash-map args)) ; This is the standard way of doing keyword/optional arguments in Clojure
        [opener closer] (:delim args)]
    (loop [out "", txt txt, delim-count 0] ; delim-count is needed to handle nested comments
      (let [[hdtxt resttxt] (split-at (count opener) txt)] ; This splits "/* blah blah */" into hdtxt="/*" and restxt="blah blah */"
        (printf "hdtxt=%8s resttxt=%8s out=%8s txt=%16s delim-count=%s\n" (apply str hdtxt) (apply str resttxt) out (apply str txt) delim-count)
        (cond
          (empty? hdtxt)               (str out (apply str txt))
          (= (apply str hdtxt) opener) (recur out resttxt (inc delim-count))
          (= (apply str hdtxt) closer) (recur out resttxt (dec delim-count))
          (= delim-count 0)            (recur (str out (first txt)) (rest txt) delim-count)
          true                         (recur out (rest txt) delim-count))))))</lang>

user> (comment-strip "This /* is */ some /* /* /* */ funny */ */ text")
hdtxt=      Th resttxt=is /* is */ some /* /* /* */ funny */ */ text out=         txt=This /* is */ some /* /* /* */ funny */ */ text delim-count=0
hdtxt=      hi resttxt=s /* is */ some /* /* /* */ funny */ */ text out=       T txt=his /* is */ some /* /* /* */ funny */ */ text delim-count=0
hdtxt=      is resttxt= /* is */ some /* /* /* */ funny */ */ text out=      Th txt=is /* is */ some /* /* /* */ funny */ */ text delim-count=0
hdtxt=      s  resttxt=/* is */ some /* /* /* */ funny */ */ text out=     Thi txt=s /* is */ some /* /* /* */ funny */ */ text delim-count=0
hdtxt=       / resttxt=* is */ some /* /* /* */ funny */ */ text out=    This txt= /* is */ some /* /* /* */ funny */ */ text delim-count=0
hdtxt=      /* resttxt= is */ some /* /* /* */ funny */ */ text out=   This  txt=/* is */ some /* /* /* */ funny */ */ text delim-count=0
hdtxt=       i resttxt=s */ some /* /* /* */ funny */ */ text out=   This  txt= is */ some /* /* /* */ funny */ */ text delim-count=1
hdtxt=      is resttxt= */ some /* /* /* */ funny */ */ text out=   This  txt=is */ some /* /* /* */ funny */ */ text delim-count=1
hdtxt=      s  resttxt=*/ some /* /* /* */ funny */ */ text out=   This  txt=s */ some /* /* /* */ funny */ */ text delim-count=1
hdtxt=       * resttxt=/ some /* /* /* */ funny */ */ text out=   This  txt= */ some /* /* /* */ funny */ */ text delim-count=1
hdtxt=      */ resttxt= some /* /* /* */ funny */ */ text out=   This  txt=*/ some /* /* /* */ funny */ */ text delim-count=1
hdtxt=       s resttxt=ome /* /* /* */ funny */ */ text out=   This  txt= some /* /* /* */ funny */ */ text delim-count=0
hdtxt=      so resttxt=me /* /* /* */ funny */ */ text out=  This   txt=some /* /* /* */ funny */ */ text delim-count=0
hdtxt=      om resttxt=e /* /* /* */ funny */ */ text out= This  s txt=ome /* /* /* */ funny */ */ text delim-count=0
hdtxt=      me resttxt= /* /* /* */ funny */ */ text out=This  so txt=me /* /* /* */ funny */ */ text delim-count=0
hdtxt=      e  resttxt=/* /* /* */ funny */ */ text out=This  som txt=e /* /* /* */ funny */ */ text delim-count=0
hdtxt=       / resttxt=* /* /* */ funny */ */ text out=This  some txt= /* /* /* */ funny */ */ text delim-count=0
hdtxt=      /* resttxt= /* /* */ funny */ */ text out=This  some  txt=/* /* /* */ funny */ */ text delim-count=0
hdtxt=       / resttxt=* /* */ funny */ */ text out=This  some  txt= /* /* */ funny */ */ text delim-count=1
hdtxt=      /* resttxt= /* */ funny */ */ text out=This  some  txt=/* /* */ funny */ */ text delim-count=1
hdtxt=       / resttxt=* */ funny */ */ text out=This  some  txt= /* */ funny */ */ text delim-count=2
hdtxt=      /* resttxt= */ funny */ */ text out=This  some  txt=/* */ funny */ */ text delim-count=2
hdtxt=       * resttxt=/ funny */ */ text out=This  some  txt= */ funny */ */ text delim-count=3
hdtxt=      */ resttxt= funny */ */ text out=This  some  txt=*/ funny */ */ text delim-count=3
hdtxt=       f resttxt=unny */ */ text out=This  some  txt= funny */ */ text delim-count=2
hdtxt=      fu resttxt=nny */ */ text out=This  some  txt=funny */ */ text delim-count=2
hdtxt=      un resttxt=ny */ */ text out=This  some  txt= unny */ */ text delim-count=2
hdtxt=      nn resttxt=y */ */ text out=This  some  txt=  nny */ */ text delim-count=2
hdtxt=      ny resttxt= */ */ text out=This  some  txt=   ny */ */ text delim-count=2
hdtxt=      y  resttxt=*/ */ text out=This  some  txt=    y */ */ text delim-count=2
hdtxt=       * resttxt=/ */ text out=This  some  txt=      */ */ text delim-count=2
hdtxt=      */ resttxt= */ text out=This  some  txt=      */ */ text delim-count=2
hdtxt=       * resttxt=  / text out=This  some  txt=         */ text delim-count=1
hdtxt=      */ resttxt=    text out=This  some  txt=         */ text delim-count=1
hdtxt=       t resttxt=     ext out=This  some  txt=            text delim-count=0
hdtxt=      te resttxt=      xt out=This  some   txt=            text delim-count=0
hdtxt=      ex resttxt=       t out=This  some  t txt=             ext delim-count=0
hdtxt=      xt resttxt=         out=This  some  te txt=              xt delim-count=0
hdtxt=       t resttxt=         out=This  some  tex txt=               t delim-count=0
hdtxt=         resttxt=         out=This  some  text txt=                 delim-count=0
"This  some  text"

D

<lang d>import std.algorithm, std.regex;

string[2] separateComments(in string txt,

                          in string cpat0, in string cpat1) {
   int[2] plen; // to handle /*/
   int i, j; // cursors
   bool inside; // is inside comment?

   // pre-compute regex here if desired
   //auto r0 = regex(cpat0);
   //auto r1 = regex(cpat1);
   //enum rct = ctRegex!(r"\n|\r");

   bool advCursor() {
       auto mo = match(txt[i .. $], inside ? cpat1 : cpat0);
       if (mo.empty)
           return false;
       plen[inside] = max(0, plen[inside], mo.front[0].length);
       j = i + mo.pre.length; // got comment head
       if (inside)
           j += mo.front[0].length; // or comment tail

       // special adjust for \n\r
       if (!match(mo.front[0], r"\n|\r").empty)
           j--;
       return true;
   }

   string[2] result;
   while (true) {
       if (!advCursor())
           break;
       result[inside] ~= txt[i .. j]; // save slice of result

       // handle /*/ pattern
       if (inside && (j - i < plen[0] + plen[1])) {
           i = j;
           if (!advCursor())
               break;
           result[inside] ~= txt[i .. j]; // save result again
       }

       i = j; // advance cursor
       inside = !inside; // toggle search type
   }

   if (inside)
       throw new Exception("Mismatched Comment");
   result[inside] ~= txt[i .. $]; // save rest(non-comment)
   return result;

}

void main() {

   import std.stdio;

   static void showResults(in string e, in string[2] pair) {
       writeln("===Original text:\n", e);
       writeln("\n\n===Text without comments:\n", pair[0]);
       writeln("\n\n===The stripped comments:\n", pair[1]);
   }

   // First example ------------------------------
   immutable ex1 = `  /**
  * Some comments
  * longer comments here that we can parse.
  *
  * Rahoo
  */
  function subroutine() {
   a = /* inline comment */ b + c ;
  }
  /*/ <-- tricky comments */

  /**
   * Another comment.
   */
   function something() {
   }`;

   showResults(ex1, separateComments(ex1, `/\*`, `\*/`));

   // Second example ------------------------------
   writeln("\n");
   immutable ex2 = "apples, pears # and bananas
apples, pears; and bananas ";  // test for line comment

   showResults(ex2, separateComments(ex2, `#|;`, `[\n\r]|$`));

}</lang>

Output:

===Original text:
  /**
   * Some comments
   * longer comments here that we can parse.
   *
   * Rahoo
   */
   function subroutine() {
    a = /* inline comment */ b + c ;
   }
   /*/ <-- tricky comments */

   /**
    * Another comment.
    */
    function something() {
    }


===Text without comments:
  
   function subroutine() {
    a =  b + c ;
   }
   

   
    function something() {
    }


===The stripped comments:
/**
   * Some comments
   * longer comments here that we can parse.
   *
   * Rahoo
   *//* inline comment *//*/ <-- tricky comments *//**
    * Another comment.
    */


===Original text:
apples, pears # and bananas
apples, pears; and bananas 


===Text without comments:
apples, pears 
apples, pears


===The stripped comments:
# and bananas; and bananas
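Returning the stripped comments alongside the remaining code, as the D entry does, can be sketched in Python like this. The sketch uses literal delimiters rather than the regular-expression patterns the D version accepts, and the name `separate_comments` is made up for this example.

```python
def separate_comments(text, opener="/*", closer="*/"):
    """Return (code, comments); raise ValueError on an unterminated comment."""
    code, comments = [], []
    pos = 0
    while True:
        start = text.find(opener, pos)
        if start == -1:
            code.append(text[pos:])
            return "".join(code), "".join(comments)
        end = text.find(closer, start + len(opener))
        if end == -1:
            raise ValueError("Mismatched Comment")
        end += len(closer)
        code.append(text[pos:start])      # text before the comment
        comments.append(text[start:end])  # the comment, delimiters included
        pos = end


code, comments = separate_comments("a = /* inline comment */ b + c ;")
print(code)
print(comments)
```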

F#

Using .NET's regex balancing-group (counter) feature to match nested comments. Nested comments must be correctly balanced. <lang fsharp>open System
open System.Text.RegularExpressions

let balancedComments opening closing =

   new Regex(
       String.Format("""

{0} # An outer opening delimiter

   (?>                   # efficiency: no backtracking here
       {0} (?<LEVEL>)    # An opening delimiter, one level down
       | 
       {1} (?<-LEVEL>)   # A closing delimiter, one level up
       |
       (?! {0} | {1} ) . # With negative lookahead: Anything but delimiters
   )*                    # As many times as we see these
   (?(LEVEL)(?!))        # Fail, unless on level 0 here

{1} # Outer closing delimiter """, Regex.Escape(opening), Regex.Escape(closing)),

       RegexOptions.IgnorePatternWhitespace ||| RegexOptions.Singleline)

[<EntryPoint>] let main args =

   let sample = """
   /**
   * Some comments
   * longer comments here that we can parse.
   *
   * Rahoo 
   */
   function subroutine() {
   a = /* inline comment */ b + c ;
   }
   /*/ <-- tricky comments */

   /**
   * Another comment.
   * /* nested balanced
   */ */
   function something() {
   }
   """
   let balancedC = balancedComments "/*" "*/"
   printfn "%s" (balancedC.Replace(sample, ""))
   0</lang>

Output


    function subroutine() {
    a =  b + c ;
    }



    function something() {
    }
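Without a regex engine that can count, the same balanced-nesting behaviour can be approximated with an explicit depth counter. The following Python sketch is illustrative only. The name `strip_nested` is hypothetical, and it assumes that a closing delimiter seen outside any comment is ordinary text.

```python
def strip_nested(text, opener="/*", closer="*/"):
    """Remove block comments that may nest; delimiters must balance."""
    out = []
    depth = 0  # number of comments currently open
    i = 0
    while i < len(text):
        if text.startswith(opener, i):
            depth += 1
            i += len(opener)
        elif depth > 0 and text.startswith(closer, i):
            depth -= 1
            i += len(closer)
        else:
            if depth == 0:
                out.append(text[i])  # keep text only outside all comments
            i += 1
    if depth != 0:
        raise ValueError("unbalanced comment delimiters")
    return "".join(out)


print(strip_nested("This /* is */ some /* /* /* */ funny */ */ text"))
```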

Go

For the extra credit: No optional parameters in Go, but documented below is an efficient technique for letting the caller specify the delimiters. <lang go>package main

import (

   "fmt"
   "strings"

)

// idiomatic to name a function newX that allocates an object, initializes it,
// and returns it ready to use.  the object in this case is a closure.
func newStripper(start, end string) func(string) string {

   // default to c-style block comments
   if start == "" || end == "" {
       start, end = "/*", "*/"
   }
   // closes on variables start, end.
   return func(source string) string {
       for {
           cs := strings.Index(source, start)
           if cs < 0 {
               break
           }
           // use the delimiter lengths rather than assuming two characters
           ce := strings.Index(source[cs+len(start):], end)
           if ce < 0 {
               break
           }
           source = source[:cs] + source[cs+len(start)+ce+len(end):]
       }
       return source
   }

}

func main() {

   // idiomatic is that zero values indicate to use meaningful defaults
   stripC := newStripper("", "")

   // strip function now defined and can be called any number of times
   // without respecifying delimiters
   fmt.Println(stripC(`  /**
  * Some comments
  * longer comments here that we can parse.
  *
  * Rahoo
  */
  function subroutine() {
   a = /* inline comment */ b + c ;
  }
  /*/ <-- tricky comments */

  /**
   * Another comment.
   */
   function something() {
   }`))

}</lang>
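The closure technique described for Go carries over to other languages. As a hedged Python sketch (the names `new_stripper`, `strip_c` and `strip_pascal` are invented for this example), a factory captures the delimiters once and returns a function that can be reused:

```python
def new_stripper(start="/*", end="*/"):
    """Return a function that strips comments delimited by start/end."""
    def strip(source):
        while True:
            cs = source.find(start)
            if cs < 0:
                break
            ce = source.find(end, cs + len(start))
            if ce < 0:
                break  # unterminated comment is left in place, as in the Go code
            source = source[:cs] + source[ce + len(end):]
        return source
    return strip


strip_c = new_stripper()                  # defaults to C-style delimiters
strip_pascal = new_stripper("(*", "*)")   # caller-specified delimiters
print(strip_c("a /* b */ c"))
print(strip_pascal("x (* y *) z"))
```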

Groovy

<lang groovy>def code = """

 /**
  * Some comments
  * longer comments here that we can parse.
  *
  * Rahoo
  */
  function subroutine() {
   a = /* inline comment */ b + c ;
  }
  /*/ <-- tricky comments */

  /**
   * Another comment.
   */
   function something() {
   }

"""

println ((code =~ "(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)").replaceAll(''))</lang>

Haskell

THE FOLLOWING SOLUTION IS WRONG, as it does not take string literals into account. For example: <lang Haskell>test = "This {- is not the beginning of a block comment" -- Do your homework properly -}</lang> Comment delimiters can be changed by calling stripComments with different start and end parameters. <lang Haskell>import Data.List

stripComments :: String -> String -> String -> String
stripComments start end = notComment

   where notComment :: String -> String
         notComment "" = ""
         notComment xs
           | start `isPrefixOf` xs = inComment $ drop (length start) xs
           | otherwise             = head xs:(notComment $ tail xs)
         inComment :: String -> String
         inComment "" = ""
         inComment xs
           | end `isPrefixOf` xs = notComment $ drop (length end) xs
           | otherwise           = inComment $ tail xs

main = interact (stripComments "/*" "*/")</lang> Output:

   function subroutine() {
    a =  b + c ;
   }
   

   
    function something() {
    }
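The string-literal problem noted at the top of this entry can be addressed by copying literals verbatim before looking for comment delimiters. The Python sketch below is one possible approach, not a fix of the Haskell code. It assumes that strings are delimited by a double quote and use backslash escapes, and it ignores character literals; `strip_outside_strings` is a made-up name.

```python
def strip_outside_strings(text, opener="/*", closer="*/", quote='"'):
    """Strip block comments, but leave delimiters inside string literals alone."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        if text[i] == quote:
            # Copy the whole string literal, skipping backslash-escaped characters.
            j = i + 1
            while j < n and text[j] != quote:
                j += 2 if text[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(text[i:j])
            i = j
        elif text.startswith(opener, i):
            end = text.find(closer, i + len(opener))
            if end == -1:
                break  # assumption: an unterminated comment runs to end of input
            i = end + len(closer)
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


src = 'test = "This {- is not the beginning of a block comment" -- Do your homework properly -}'
print(strip_outside_strings(src, "{-", "-}") == src)
```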

Icon and Unicon

If one is willing to concede that the program file will fit in memory, then the following code works: <lang Icon>procedure main()

  every (unstripped := "") ||:= !&input || "\n"   # Load file as one string
  write(stripBlockComment(unstripped,"/*","*/"))

end

procedure stripBlockComment(s1,s2,s3) #: strip comments between s2-s3 from s1

  result := ""
  s1 ? {
     while result ||:= tab(find(s2)) do {
        move(*s2)
        tab(find(s3)|0)   # or end of string 
        move(*s3)
        }
     return result || tab(0)
     }

end</lang> Otherwise, the following handles an arbitrary length input: <lang Icon>procedure main()

  every writes(stripBlockComment(!&input,"/*","*/"))

end

procedure stripBlockComment(s,s2,s3)

   static inC          # non-null when inside comment
   (s||"\n") ?  while not pos(0) do {
           if /inC then 
               if inC := 1(tab(find(s2))\1, move(*s2)) then suspend inC
               else return tab(0)
           else if (tab(find(s3))\1,move(*s3)) then inC := &null
           else fail
           }

end</lang>

J

<lang j>strip=:#~1 0 _1*./@:(|."0 1)2>4{"1(5;(0,"0~".;._2]0 :0);'/*'i.a.)&;:

)</lang> Example data: <lang j>example=: 0 :0

 /**
  * Some comments
  * longer comments here that we can parse.
  *
  * Rahoo 
  */
  function subroutine() {
   a = /* inline comment */ b + c ;
  }
  /*/ <-- tricky comments */

  /**
   * Another comment.
   */
   function something() {
   }

)</lang> Example use: <lang j> strip example

  function subroutine() {
   a =  b + c ;
  }

   function something() {
   }</lang>

Here is a version which allows the delimiters to be passed as an optional left argument as a pair of strings: <lang j>stripp=:3 :0

 ('/*';'*/') stripp y
:
 'open close'=. x
 marks=. (+./(-i._1+#open,close)|."0 1 open E. y) - close E.&.|. y
 y #~  -. (+._1&|.) (1 <. 0 >. +)/\.&.|. marks

)</lang>

Java

<lang java>import java.io.*;

public class StripBlockComments {

    public static String readFile(String filename) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(filename));
        try {
            StringBuilder fileContents = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            // Append only the characters actually read on each pass.
            while ((n = reader.read(buffer, 0, buffer.length)) > 0) {
                fileContents.append(buffer, 0, n);
            }
            return fileContents.toString();
        } finally {
            reader.close();
        }
    }

    public static String stripComments(String beginToken, String endToken,
                                       String input) {
        StringBuilder output = new StringBuilder();
        while (true) {
            int begin = input.indexOf(beginToken);
            int end = input.indexOf(endToken, begin + beginToken.length());
            if (begin == -1 || end == -1) {
                output.append(input);
                return output.toString();
            }
            output.append(input.substring(0, begin));
            input = input.substring(end + endToken.length());
        }
    }

    public static void main(String[] args) {
        if (args.length < 3) {
            System.out.println("Usage: BeginToken EndToken FileToProcess");
            System.exit(1);
        }

        String begin = args[0];
        String end = args[1];
        String input = args[2];

        try {
            System.out.println(stripComments(begin, end, readFile(input)));
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
}</lang>

jq

Note: A version of jq with gsub/3 is required to compile the function defined in this section.

The filter strip_block_comments/2 as defined here does not attempt to recognize comments-within-comments. <lang jq>def strip_block_comments(open; close):

 def deregex:
   reduce ("\\\\", "\\*", "\\^", "\\?", "\\+", "\\.", 
           "\\!", "\\{", "\\}", "\\[", "\\]", "\\$", "\\|" ) as $c
     (.; gsub($c; $c));
 # "?" => reluctant, "m" => multiline
 gsub( (open|deregex) + ".*?" + (close|deregex); ""; "m") ;

strip_block_comments("/*"; "*/")</lang> Invocation:

$ jq -s -R -r -f Strip_block_comments.jq sample_text_for_stripping.txt

Liberty BASIC

<lang lb>global CRLF$
CRLF$ =chr$( 13) +chr$( 10)

sample$ =" /**"+CRLF$+_
" * Some comments"+CRLF$+_
" * longer comments here that we can parse."+CRLF$+_
" *"+CRLF$+_
" * Rahoo "+CRLF$+_
" */"+CRLF$+_
" function subroutine() {"+CRLF$+_
" a = /* inline comment */ b + c ;"+CRLF$+_
" }"+CRLF$+_
" /*/ <-- tricky comments */"+CRLF$+_
""+CRLF$+_
" /**"+CRLF$+_
" * Another comment."+CRLF$+_
" */"+CRLF$+_
" function something() {"+CRLF$+_
" }"+CRLF$

startDelim$  ="/*"
finishDelim$ ="*/"

print "________________________________"
print sample$
print "________________________________"
print blockStripped$( sample$, startDelim$, finishDelim$)
print "________________________________"

end

function blockStripped$( in$, s$, f$)

   for i =1 to len( in$) -len( s$)
       if mid$( in$, i, len( s$)) =s$ then
           i =i +len( s$)
           do
               if mid$( in$, i, 2) =CRLF$ then blockStripped$ =blockStripped$ +CRLF$
               i =i +1
           loop until ( mid$( in$, i, len( f$)) =f$) or ( i =len( in$) -len( f$))
           i =i +len( f$) -1
       else
           blockStripped$ =blockStripped$ +mid$( in$, i, 1)
       end if
   next i

end function</lang>






function subroutine() {
a = b + c ;
}





function something() {
}

Lua

It is assumed that the code is in the file "Text1.txt". <lang lua>filename = "Text1.txt"

fp = io.open( filename, "r" )
str = fp:read( "*all" )
fp:close()

stripped = string.gsub( str, "/%*.-%*/", "" )
print( stripped )</lang>

Mathematica

<lang Mathematica>StringReplace[a,"/*"~~Shortest[___]~~"*/" -> ""]

->

  function subroutine() {
   a =  b + c ;
  }

   function something() {
   }</lang>

MATLAB / Octave

<lang Matlab>function str = stripblockcomment(str,startmarker,endmarker)

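   % Default to C-style delimiters when the caller omits them.
   if nargin < 2, startmarker = '/*'; end;
   if nargin < 3, endmarker = '*/'; end;
   while(1) 
      ix1 = strfind(str, startmarker);
      if isempty(ix1) return; end;
      ix2 = strfind(str(ix1(1)+length(startmarker):end),endmarker);
      if isempty(ix2) 
         str = str(1:ix1(1)-1);
         return;
      else
         % ix2 is relative to the text that follows the start marker.
         str = [str(1:ix1(1)-1),str(ix1(1)+length(startmarker)+ix2(1)+length(endmarker)-1:end)];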
     end; 
  end;

end;</lang> Output:

>>t = '  /**\n   * Some comments\n   * longer comments here that we can parse.\n   *\n   * Rahoo \n   */\n   function subroutine() {\n    a = /* inline comment */ b + c ;\n   }\n   /*/ <-- tricky comments */\n\n   /**\n    * Another comment.\n    */\n    function something() {\n    }\n'
>>printf(t);
>>printf('=============\n');
>>printf(stripblockcomment(t));
  /**
   * Some comments
   * longer comments here that we can parse.
   *
   * Rahoo 
   */
   function subroutine() {
    a = /* inline comment */ b + c ;
   }
   /*/ <-- tricky comments */

   /**
    * Another comment.
    */
    function something() {
    }
===============
  
   function subroutine() {
    a =  b + c ;
   }
   

   
    function something() {
    }

Nim

Translation of: Python

<lang nim>import strutils

proc commentStripper(txt: string; delim: tuple[l,r: string] = ("/*", "*/")): string =
  let i = txt.find(delim.l)
  if i < 0:
    return txt

  result = if i > 0: txt[0 ..< i] else: ""
  # Pass delim along so custom delimiters also apply to the rest of the text.
  let tmp = commentStripper(txt[i+delim.l.len .. txt.high], delim)
  let j = tmp.find(delim.r)
  assert j >= 0
  result &= tmp[j+delim.r.len .. tmp.high]

echo "NON-NESTED BLOCK COMMENT EXAMPLE:"
echo commentStripper("""/**

  * Some comments
  * longer comments here that we can parse.
  *
  * Rahoo 
  */
  function subroutine() {
   a = /* inline comment */ b + c ;
  }
  /*/ <-- tricky comments */

  /**
   * Another comment.
   */
   function something() {
   }""")

echo "\nNESTED BLOCK COMMENT EXAMPLE:"
echo commentStripper(""" /**

  * Some comments
  * longer comments here that we can parse.
  *
  * Rahoo 
  *//*
  function subroutine() {
   a = /* inline comment */ b + c ;
  }
  /*/ <-- tricky comments */
  */
  /**
   * Another comment.
   */
   function something() {
   }""")</lang>

Output:

NON-NESTED BLOCK COMMENT EXAMPLE:

   function subroutine() {
    a =  b + c ;
   }
   
 
   
    function something() {
    }

NESTED BLOCK COMMENT EXAMPLE:
  
   
    function something() {
    }

Perl

<lang Perl>#!/usr/bin/perl -w
use strict ;
use warnings ;

open( FH , "<" , "samplecode.txt" ) or die "Can't open file!$!\n" ;
my $code = "" ;
{
   local $/ ;
   $code = <FH> ; #slurp mode
}
close FH ;
$code =~ s,/\*.*?\*/,,sg ;
print $code . "\n" ;</lang> Output:

function subroutine() {
    a =  b + c ;
   }
   

   
    function something() {
    }

Perl 6

<lang perl6>sample().split(/ '/*' .+? '*/' /).print;

sub sample { ' /**

   * Some comments
   * longer comments here that we can parse.
   *
   * Rahoo
   */
   function subroutine() {
    a = /* inline comment */ b + c ;
   }
   /*/ <-- tricky comments */

   /**
    * Another comment.
    */
   function something() {
   }

'}</lang> Output:

   
    function subroutine() {
     a =  b + c ;
    }
    

    
    function something() {
    }

PicoLisp

<lang PicoLisp>(in "sample.txt"

  (while (echo "/*")
     (out "/dev/null" (echo "*/")) ) )</lang>

Output:


   function subroutine() {
    a =  b + c ;
   }
   

   
    function something() {
    }

PL/I

<lang PL/I>/* A program to remove comments from text. */
strip: procedure options (main);                     /* 8/1/2011 */

  declare text character (80) varying;
  declare (j, k) fixed binary;

  on endfile (sysin) stop;

  do forever;
     get edit (text) (L);
     do until (k = 0);
        k = index(text, '/*');
        if k > 0 then /* we have a start of comment. */
           do;
              /* Look for end of comment. */
              j = index(text, '*/', k+2);
              if j > 0 then
                 do;
                    text = substr(text, 1, k-1) ||
                           substr(text, j+2, length(text)-(j+2)+1);
                 end;
              else
                 do; /* The comment continues onto the next line. */
                    put skip list ( substr(text, 1, k-1) );

more: get edit (text) (L);

                    j = index(text, '*/');
                    if j = 0 then do; put skip; go to more; end;
                    text = substr(text, j+2, length(text) - (j+2) + 1);
                 end;
           end;
     end;
     put skip list (text);
  end;

end strip;</lang>

PureBasic

Solution using regular expressions. A procedure, stripBlocks(), is defined that strips comments between any two delimiters. <lang PureBasic>Procedure.s escapeChars(text.s)

 Static specialChars.s = "[\^$.|?*+()"
 Protected output.s, nextChar.s, i, countChar = Len(text)
 For i = 1 To countChar
   nextChar = Mid(text, i, 1)
   If FindString(specialChars, nextChar, 1)
     output + "\" + nextChar
   Else
     output + nextChar
   EndIf 
 Next
 ProcedureReturn output

EndProcedure

Procedure.s stripBlocks(text.s, first.s, last.s)

 Protected delimter_1.s = escapeChars(first), delimter_2.s = escapeChars(last)
 Protected expNum = CreateRegularExpression(#PB_Any, delimter_1 + ".*?" + delimter_2, #PB_RegularExpression_DotAll)
 Protected output.s = ReplaceRegularExpression(expNum, text, "")
 FreeRegularExpression(expNum)
 ProcedureReturn output

EndProcedure

Define source.s
source.s = " /**" + #CRLF$
source.s + " * Some comments" + #CRLF$
source.s + " * longer comments here that we can parse." + #CRLF$
source.s + " *" + #CRLF$
source.s + " * Rahoo " + #CRLF$
source.s + " */" + #CRLF$
source.s + " function subroutine() {" + #CRLF$
source.s + " a = /* inline comment */ b + c ;" + #CRLF$
source.s + " }" + #CRLF$
source.s + " /*/ <-- tricky comments */" + #CRLF$
source.s + "" + #CRLF$
source.s + " /**" + #CRLF$
source.s + " * Another comment." + #CRLF$
source.s + " */" + #CRLF$
source.s + " function something() {" + #CRLF$
source.s + " }" + #CRLF$

If OpenConsole()

 PrintN("--- source ---")
 PrintN(source)
 PrintN("--- source with block comments between '/*' and '*/' removed ---")
 PrintN(stripBlocks(source, "/*", "*/"))
 PrintN("--- source with block comments between '*' and '*' removed ---")
 PrintN(stripBlocks(source, "*", "*"))
  
 Print(#CRLF$ + #CRLF$ + "Press ENTER to exit"): Input()
 CloseConsole()

EndIf</lang> Sample output:

--- source ---
  /**
   * Some comments
   * longer comments here that we can parse.
   *
   * Rahoo
   */
   function subroutine() {
    a = /* inline comment */ b + c ;
   }
   /*/ <-- tricky comments */

   /**
    * Another comment.
    */
    function something() {
    }

--- source with block comments between '/*' and '*/' removed ---

   function subroutine() {
    a =  b + c ;
   }



    function something() {
    }

--- source with block comments between '*' and '*' removed ---
  /
    longer comments here that we can parse.
    Rahoo
    inline comment / <-- tricky comments  Another comment.
    */
    function something() {
    }

Python

The code takes the comment delimiters as an argument and will also strip nested block comments. <lang python>def _commentstripper(txt, delim):

   'Strips first nest of block comments'
   
   deliml, delimr = delim
   out = ''
   if deliml in txt:
       indx = txt.index(deliml)
       out += txt[:indx]
       txt = txt[indx+len(deliml):]
       txt = _commentstripper(txt, delim)
       assert delimr in txt, 'Cannot find closing comment delimiter in ' + txt
       indx = txt.index(delimr)
       out += txt[(indx+len(delimr)):]
   else:
       out = txt
   return out

def commentstripper(txt, delim=('/*', '*/')):

   'Strips nests of block comments'
   
   deliml, delimr = delim
   while deliml in txt:
       txt = _commentstripper(txt, delim)
   return txt</lang>

Tests and sample output

<lang python>def test():

   print('\nNON-NESTED BLOCK COMMENT EXAMPLE:')
   sample = '''  /**
  * Some comments
  * longer comments here that we can parse.
  *
  * Rahoo 
  */
  function subroutine() {
   a = /* inline comment */ b + c ;
  }
  /*/ <-- tricky comments */

  /**
   * Another comment.
   */
   function something() {
   }'''
   print(commentstripper(sample))

   print('\nNESTED BLOCK COMMENT EXAMPLE:')
   sample = '''  /**
  * Some comments
  * longer comments here that we can parse.
  *
  * Rahoo 
  *//*
  function subroutine() {
   a = /* inline comment */ b + c ;
  }
  /*/ <-- tricky comments */
  */
  /**
   * Another comment.
   */
   function something() {
   }'''
   print(commentstripper(sample))

if __name__ == '__main__':

   test()</lang>

NON-NESTED BLOCK COMMENT EXAMPLE:
  
   function subroutine() {
    a =  b + c ;
   }
   

   
    function something() {
    }

NESTED BLOCK COMMENT EXAMPLE:
  
   
    function something() {
    }
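
When nesting is not required, the same delimiter-pair argument can drive a single non-greedy regular expression, as several other entries on this page do. The following is a minimal sketch, not part of the original solution; the name strip_block_comments is chosen here for illustration. It relies only on re.escape and the re.DOTALL flag from the standard library.

```python
import re

def strip_block_comments(text, delim=('/*', '*/')):
    """Remove non-nested block comments with a non-greedy regular expression."""
    opener, closer = delim
    # re.escape makes the delimiters literal; DOTALL lets '.' match newlines,
    # so a comment may span several lines.
    pattern = re.compile(re.escape(opener) + '.*?' + re.escape(closer), re.DOTALL)
    return pattern.sub('', text)

if __name__ == '__main__':
    print(strip_block_comments('a = /* inline comment */ b + c ;'))
```

Unlike commentstripper above, this version treats the first closing delimiter after an opening one as the end of the comment, so nested comments are not handled.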

Racket

<lang racket>#lang at-exp racket

;; default delimiters (strings -- not regexps)

(define comment-start-str "/*")
(define comment-end-str "*/")

(define (strip-comments text [rx1 comment-start-str] [rx2 comment-end-str])

 (regexp-replace* (~a (regexp-quote rx1) ".*?" (regexp-quote rx2))
                  text ""))

((compose1 displayln strip-comments)

@~a{/**
     * Some comments
     * longer comments here that we can parse.
     *
     * Rahoo
     */
     function subroutine() {
      a = /* inline comment */ b + c ;
     }
     /*/ <-- tricky comments */

     /**
      * Another comment.
      */
      function something() {
      }
   })

</lang>

(Outputs the expected text...)

REXX

<lang rexx>/* REXX ***************************************************************
* Split comments
* This program ignores comment delimiters within literal strings
* such as, e.g., in b = "--' O'Connor's widow --";
* it does not (yet) take care of -- comments (ignore rest of line)
* also it does not take care of say 667/*yuppers*/77 (REXX specialty)
*   courtesy GS discussion!
* 12.07.2013 Walter Pachl
**********************************************************************/
fid='in.txt'                         /* input text                 */
oic='oc.txt'; 'erase' oic            /* will contain comments      */
oip='op.txt'; 'erase' oip            /* will contain program parts */
oim='om.txt'; 'erase' oim            /* oc.txt merged with op.txt  */
cmt=0                                /* comment nesting            */
str=''                               /* ' or " when in a string    */
Do ri=1 By 1 While lines(fid)>0      /* loop over input            */

 l=linein(fid)                        /* an input line              */
 oc=''                                /* initialize line for oc.txt */
 op=''                                /* initialize line for op.txt */
 i=1                                  /* start at first character   */
 Do While i<=length(l)                /* loop through input line    */
   If cmt=0 Then Do                   /* we are not in a comment    */
     If str<>'' Then Do               /* we are in a string         */
       If substr(l,i,1)=str Then Do   /* string character           */
         If substr(l,i+1,1)=str Then Do /* another one              */
           Call app 'P',substr(l,i,2) /* add '' or "" to op         */
           i=i+2                      /* increase input pointer     */
           Iterate                    /* proceed in input line      */
           End
         Else Do                      /* end of literal string      */
           Call app 'P',substr(l,i,1) /* add ' or " to op           */
           str=' '                    /* no longer in string        */
           i=i+1                      /* increase input pointer     */
           Iterate                    /* proceed in input line      */
           End
         End
       End
     End
   Select
     When str='' &,                   /* not in a string            */
          substr(l,i,2)='/*' Then Do  /* start of comment           */
       cmt=cmt+1                      /* increase comment nesting   */
       Call app 'C','/*'              /* copy to oc                 */
       i=i+2                          /* increase input pointer     */
       End
     When cmt=0 Then Do               /* not in a comment           */
       If str=' ' Then Do             /* not in a string            */
         If pos(substr(l,i,1),'''"')>0 Then /* string delimiter     */
           str=substr(l,i,1)          /* remember that              */
         End
       Call app 'P',substr(l,i,1)     /* copy to op                 */
       i=i+1                          /* increase input pointer     */
       End
     When substr(l,i,2)='*/' Then Do  /* end of comment             */
       cmt=cmt-1                      /* decrement nesting depth    */
       Call app 'C','*/'              /* copy to oc                 */
       i=i+2                          /* increase input pointer     */
       End
     Otherwise Do                     /* any other character        */
       Call app 'C',substr(l,i,1)     /* copy to oc                 */
       i=i+1                          /* increase input pointer     */
       End
     End
   End
 Call oc                              /* Write line oc              */
 Call op                              /* Write line op              */
 End

Call lineout oic                     /* Close File oic             */
Call lineout oip                     /* Close File oip             */

Do ri=1 To ri-1 /* merge program with comments*/

 op=linein(oip)
 oc=linein(oic)
 Do i=1 To length(oc)
   If substr(oc,i,1)<>'' Then
     op=overlay(substr(oc,i,1),op,i,1)
   End
 Call lineout oim,op
 End

Call lineout oic
Call lineout oip
Call lineout oim
Exit

app: Parse Arg which,string
/* add str to oc or op                                               */
/* and corresponding blanks to the other (op or oc)                  */
If which='C' Then Do

 oc=oc||string
 op=op||copies(' ',length(string))
 End

Else Do

 op=op||string
 oc=oc||copies(' ',length(string))
 End

Return

oc: Return lineout(oic,oc)
op: Return lineout(oip,op)</lang> Input:

/**
   * Some comments
   * longer comments here that we can parse.
   *
   * Rahoo
   */
   function subroutine() {
    a = /* inline comment */ b + c ;
    b = "*/' O'Connor's widow /*";
   }
   /*/ <-- tricky comments */

   /**
    * Another comment.
    */
    function something() {
    }

Program:







   function subroutine() {
    a =                      b + c ;
    b = "*/' O'Connor's widow /*";
   }





    function something() {
    }

Comments:

/**
   * Some comments
   * longer comments here that we can parse.
   *
   * Rahoo
   */

        /* inline comment */


   /*/ <-- tricky comments */

   /**
    * Another comment.
    */
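
The idea stated in the program header, ignoring comment delimiters that appear inside literal strings while tracking comment nesting, can be sketched in Python as follows. This is an illustrative reimplementation under assumptions, not a translation of the REXX program: the name split_comments is hypothetical, it works on one string instead of files, and it does not pad the two outputs with blanks to keep columns aligned.

```python
def split_comments(text, opener='/*', closer='*/'):
    """Return (program, comments); delimiters inside quoted strings are ignored."""
    program, comments = [], []
    depth = 0    # current comment nesting depth
    quote = ''   # active quote character, or '' when outside a string literal
    i = 0
    while i < len(text):
        if depth == 0 and quote:
            # Inside a string literal: copy characters until the quote closes.
            # A doubled quote closes and immediately reopens the string,
            # which leaves the state unchanged.
            ch = text[i]
            program.append(ch)
            if ch == quote:
                quote = ''
            i += 1
        elif text.startswith(opener, i):
            depth += 1
            comments.append(opener)
            i += len(opener)
        elif depth == 0:
            ch = text[i]
            if ch in '\'"':
                quote = ch          # remember which quote opened the string
            program.append(ch)
            i += 1
        elif text.startswith(closer, i):
            depth -= 1
            comments.append(closer)
            i += len(closer)
        else:
            comments.append(text[i])  # any other character inside a comment
            i += 1
    return ''.join(program), ''.join(comments)
```

Applied to the line containing O'Connor's widow from the input above, the whole line is returned as program text and the comment part stays empty, because both delimiters sit inside the double-quoted string.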

Ruby

<lang ruby>def remove_comments!(str, comment_start='/*', comment_end='*/')

 while start_idx = str.index(comment_start) 
   end_idx = str.index(comment_end, start_idx + comment_start.length) + comment_end.length - 1
   str[start_idx .. end_idx] = "" 
 end
 str

end

def remove_comments(str, comment_start='/*', comment_end='*/')

 remove_comments!(str.dup, comment_start, comment_end)

end

example = <<END_OF_STRING

 /**
  * Some comments
  * longer comments here that we can parse.
  *
  * Rahoo 
  */
  function subroutine() {
   a = /* inline comment */ b + c ;
  }
  /*/ <-- tricky comments */

  /**
   * Another comment.
   */
   function something() {
   }

END_OF_STRING

puts remove_comments example</lang> outputs

  
   function subroutine() {
    a =  b + c ;
   }
   

   
    function something() {
    }

Scala

<lang Scala>import java.util.regex.Pattern.quote
def strip1(x: String, s: String = "/*", e: String = "*/") =

 x.replaceAll("(?s)"+quote(s)+".*?"+quote(e), "")</lang>

<lang Scala>def strip2(x: String, s: String = "/*", e: String = "*/"): String = {

 val a = x indexOf s
 val b = x indexOf (e, a + s.length)
 if (a == -1 || b == -1) x
 else strip2(x.take(a) + x.drop(b + e.length), s, e)

}</lang> <lang Scala>def strip3(x: String, s: String = "/*", e: String = "*/"): String = x.indexOf(s) match {

 case -1 => x
 case i => x.indexOf(e, i + s.length) match {
   case -1 => x
   case j => strip3(x.take(i) + x.drop(j + e.length), s, e)
 }

}</lang>

Seed7

The function replace2 can be used to replace unnested comments.

<lang seed7>$ include "seed7_05.s7i";

const proc: main is func

 local
   const string: stri is "\
       \  /**\n\
       \   * Some comments\n\
       \   * longer comments here that we can parse.\n\
       \   *\n\
       \   * Rahoo\n\
       \   */\n\
       \   function subroutine() {\n\
       \    a = /* inline comment */ b + c ;\n\
       \   }\n\
       \   /*/ <-- tricky comments */\n\
       \\n\
       \   /**\n\
       \    * Another comment.\n\
       \    */\n\
       \    function something() {\n\
       \    }";
 begin
   writeln(replace2(stri, "/*", "*/", " "));
 end func;</lang>

Output:

   
   function subroutine() {
    a =   b + c ;
   }
    

    
    function something() {
    }

Sidef

For extra credit, it allows the caller to redefine the delimiters. <lang ruby>func strip_block_comments(code, beg='/*', end='*/') {

   var re = Regex.new(beg.escape + '.*?' + end.escape, 's');
   code.gsub(re, '');

}

say strip_block_comments(ARGF.slurp);</lang>

Tcl

<lang tcl>proc stripBlockComment {string {openDelimiter "/*"} {closeDelimiter "*/"}} {

   # Convert the delimiters to REs by backslashing all non-alnum characters
   set openAsRE [regsub -all {\W} $openDelimiter {\\&}]
   set closeAsRE [regsub -all {\W} $closeDelimiter {\\&}]

   # Now remove the blocks using a dynamic non-greedy regular expression
   regsub -all "$openAsRE.*?$closeAsRE" $string ""

}</lang> Demonstration code: <lang tcl>puts [stripBlockComment " /**

  * Some comments
  * longer comments here that we can parse.
  *
  * Rahoo 
  */
  function subroutine() {
   a = /* inline comment */ b + c ;
  }
  /*/ <-- tricky comments */

  /**
   * Another comment.
   */
   function something() {
   }

"]</lang> Output:

  
   function subroutine() {
    a =  b + c ;
   }
   

   
    function something() {
    }

TUSCRIPT

<lang tuscript>$$ MODE DATA
$$ script=*

 /**
  * Some comments
  * longer comments here that we can parse.
  *
  * Rahoo
  */
  function subroutine() {
   a = /* inline comment */ b + c ;
  }
  /*/ <-- tricky comments  */

  /**
   * Another comment.
   */
   function something() {
   }

$$ MODE TUSCRIPT
ERROR/STOP CREATE ("testfile",SEQ-E,-std-)
ERROR/STOP CREATE ("destfile",SEQ-E,-std-)
FILE "testfile" = script
BUILD S_TABLE commentbeg=":/*:"
BUILD S_TABLE commentend=":*/:"

ACCESS t: READ/STREAM "testfile" s.z/u,a/commentbeg+t+e/commentend,typ
ACCESS d: WRITE/STREAM "destfile" s.z/u,a+t+e
LOOP
READ/EXIT t
IF (typ==3) CYCLE
t=SQUEEZE(t)
WRITE/ADJUST d
ENDLOOP
ENDACCESS/PRINT t
ENDACCESS/PRINT d
d=FILE("destfile")
TRACE *d
</lang> Output:

TRACE *    38    -*TUSTEP.EDT
d            = *
           1 =
           2 = function subroutine() { a =
           3 = b + c ; }
           4 =
           5 = function something() { }

zkl

<lang zkl>fcn stripper(text,a="/*",b="*/"){

  while(xy:=text.span(a,b,True)){ x,y:=xy; text=text[0,x]+text[x+y,*]} 
  text

}</lang> The span method takes two tokens and finds a balanced match between them: the shortest one, or the longest if the third argument is True. It assumes there are no escape characters (such as \ or ""). So we just repeatedly strip out the longest balanced comment until there aren't any left (span returns the empty list). If a comment is unbalanced, span fails, but this code doesn't check for that and just assumes there are no more matches.

Output:

The input is in a file because I'm too lazy to type it in:

stripper(File("text.txt").read().text);
  
   function subroutine() {
    a =  b + c ;
   }
   

   
    function something() {
    }
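
The find-and-cut loop described above can be sketched in Python. This is a rough illustration under assumptions, not a description of how zkl implements span: find_span is a hypothetical helper written here that counts nesting depth, and unlike the zkl code it raises an error when a comment is never closed.

```python
def find_span(text, opener, closer):
    """Return (start, length) of the first balanced opener..closer span, or None."""
    start = text.find(opener)
    if start == -1:
        return None
    depth, i = 0, start
    while i < len(text):
        if text.startswith(opener, i):
            depth += 1
            i += len(opener)
        elif text.startswith(closer, i):
            depth -= 1
            i += len(closer)
            if depth == 0:
                return start, i - start
        else:
            i += 1
    # Reached the end of the text with an open comment: report it.
    raise ValueError('unbalanced comment starting at index %d' % start)

def stripper(text, opener='/*', closer='*/'):
    """Repeatedly cut out balanced comment spans until none are left."""
    while True:
        span = find_span(text, opener, closer)
        if span is None:
            return text
        start, length = span
        text = text[:start] + text[start + length:]
```

Raising on an unclosed comment is a design choice made for this sketch; returning the text unchanged, as the zkl version effectively does, would be an equally reasonable policy.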