Rep-string: Difference between revisions

From Rosetta Code
Content added Content deleted
m (→‎version 2: added a comment in the REXX section header. -- ~~~~)
(Report longest, shortest or all reps.)
Line 7: Line 7:


The task is to:
The task is to:
* Write a function/subroutine/method/... that takes a string and returns an indication of if it is a rep-string and the repeated string. (Either the string that is repeated, or the number of repeated characters would suffice). If there are multiple solutions, return the longest.
* Write a function/subroutine/method/... that takes a string and returns an indication of if it is a rep-string and the repeated string. (Either the string that is repeated, or the number of repeated characters would suffice).
* There may be multiple sub-strings that make a string a rep-string - in that case an indication of all, the longest, or the shortest would suffice.
* Use the function to indicate the repeating substring if any, in the following:
* Use the function to indicate the repeating substring if any, in the following:
<dl><dd><pre>'1001110011'
<dl><dd><pre>'1001110011'

Revision as of 04:07, 14 May 2013

Rep-string is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

Given a series of ones and zeroes in a string, define a repeated string or rep-string as a string which is created by repeating a substring of the first N characters of the string truncated on the right to the length of the input string, and in which the substring appears repeated at least twice in the original.

For example, the string '10011001100' is a rep-string as the leftmost four characters of '1001' are repeated three times and truncated on the right to give the original string.

Note that the requirement for having the repeat occur two or more times means that the repeating unit is never longer than half the length of the input string.

The task is to:

  • Write a function/subroutine/method/... that takes a string and returns an indication of if it is a rep-string and the repeated string. (Either the string that is repeated, or the number of repeated characters would suffice).
  • There may be multiple sub-strings that make a string a rep-string - in that case an indication of all, the longest, or the shortest would suffice.
  • Use the function to indicate the repeating substring if any, in the following:
'1001110011'
'1110111011'
'0010010010'
'1010101010'
'1111111111'
'0100101101'
'0100100'
'101'
'11'
'00'
'1'
  • Show your output on this page.

J

Here's a test:

<lang j>isRepStr=: +./@((] -: $@] $ $)"0 1~1+[:i.<.@-:@#)</lang>

Example use:

<lang j> isRepStr'1001110011' 1

  isRepStr'1110111011'

1

  isRepStr'0010010010'

1

  isRepStr'1010101010'

1

  isRepStr'1111111111'

1

  isRepStr'0100101101'

0

  isRepStr'0100100'

1

  isRepStr'101'

0

  isRepStr'11'

1

  isRepStr'00'

1

  isRepStr'1'

0</lang>

We could also report the lengths of the repeated prefix, though this seems more arbitrary:

<lang j>nRepStr=: 0 -.~ (([ * ] -: $@] $ $)"0 1~1+[:i.<.@-:@#)</lang>

With the above examples:

<lang j> nRepStr'1001110011' 5

  nRepStr'1110111011'

4

  nRepStr'0010010010'

3

  nRepStr'1010101010'

2 4

  nRepStr'1111111111'

1 2 3 4 5

  nRepStr'0100101101'
  nRepStr'0100100'

3

  nRepStr'101'
  nRepStr'11'

1

  nRepStr'00'

1

  nRepStr'1'

</lang>

Here, the "non-str-rep" cases are indicated by an empty list of prefix lengths.

NetRexx

Translation of: REXX

<lang NetRexx>/* NetRexx */ options replace format comments java crossref symbols nobinary

/* REXX ***************************************************************

  • 11.05.2013 Walter Pachl
                                                                                                                                            • /

runSample(arg) return

/**

* Test for rep-strings
* @param s_str a string to check for rep-strings
* @return Rexx string: boolean indication of reps, length, repeated value
*/

method repstring(s_str) public static

 s_str_n = s_str.length()
 rep_str = 
 Loop lx = s_str.length() % 2 to 1 By -1
   If s_str.substr(lx + 1, lx) = s_str.left(lx) Then Leave lx
   End lx
 If lx > 0 Then Do label reps
   rep_str = s_str.left(lx)
   Loop ix = 1 By 1
     If s_str.substr(ix * lx + 1, lx) <> rep_str Then
       Leave ix
     End ix
   If rep_str.copies(s_str_n).left(s_str.length()) <> s_str Then
     rep_str = 
   End reps
 Return (rep_str.length() > 0) rep_str.length() rep_str

-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ method runSample(arg) public static

 parse arg samples
 if samples =  then -
   samples = -
     '1001110011' -
     '1110111011' -
     '0010010010' -
     '1010101010' -
     '1111111111' -
     '0100101101' -
     '0100100' -
     '101' -
     '11' -
     '00' -
     '1'
 loop w_ = 1 to samples.words()
   in_str = samples.word(w_)
   parse repstring(in_str) is_rep_str rep_str_len rep_str
   sq = 'in_str'
   tstrlen = sq.length().max(20)
   sq=sq.right(tstrlen)
   if is_rep_str then
     Say sq 'has a repetition length of' rep_str_len "i.e. '"rep_str"'"
   else
     Say sq 'is not a repeated string'
   end w_
 return

</lang>

Output:
        '1001110011' has a repetition length of 5 i.e. '10011'
        '1110111011' has a repetition length of 4 i.e. '1110'
        '0010010010' has a repetition length of 3 i.e. '001'
        '1010101010' has a repetition length of 4 i.e. '1010'
        '1111111111' has a repetition length of 5 i.e. '11111'
        '0100101101' is not a repeated string
           '0100100' has a repetition length of 3 i.e. '010'
               '101' is not a repeated string
                '11' has a repetition length of 1 i.e. '1'
                '00' has a repetition length of 1 i.e. '0'
                 '1' is not a repeated string

Perl

<lang perl>foreach (qw(1001110011 1110111011 0010010010 1010101010 1111111111 0100101101 0100100 101 11 00 1)) {

   print "$_\n";
   if (/^(.+)\1+(.*$)(?(?{ substr($1, 0, length $2) eq $2 })|(?!))/) {
       print ' ' x length $1, "$1\n\n";
   } else {
       print " (no repeat)\n\n";
   }

}</lang>

Output:
1001110011
     10011

1110111011
    1110

0010010010
   001

1010101010
    1010

1111111111
     11111

0100101101
 (no repeat)

0100100
   010

101
 (no repeat)

11
 1

00
 0

1
 (no repeat)

Perl 6

<lang perl6>for <1001110011 1110111011 0010010010 1010101010 1111111111 0100101101 0100100 101 11 00 1> {

   if /^ (.+) $0+: (.*$) <?{ $0.substr(0,$1.chars) eq $1 }> / {

my $rep = $0.chars; say .substr(0,$rep), .substr($rep,$rep).trans('01' => '𝟘𝟙'), .substr($rep*2);

   }
   else {

say "$_ (no repeat)";

   }

}</lang>

Output:
10011𝟙𝟘𝟘𝟙𝟙
1110𝟙𝟙𝟙𝟘11
001𝟘𝟘𝟙0010
1010𝟙𝟘𝟙𝟘10
11111𝟙𝟙𝟙𝟙𝟙
0100101101 (no repeat)
010𝟘𝟙𝟘0
101 (no repeat)
1𝟙
0𝟘
1 (no repeat)

Here's a technique that relies on the fact that XORing the shifted binary number should set all the lower bits to 0 if there are repeats. (The cool thing is that shift will automatically throw away the bits on the right that you want thrown away.) This produces the same output as above. <lang perl6>sub repstr(Str $s) {

   my $bits = :2($s);
   for reverse 1 .. $s.chars div 2 -> $left {

my $right = $s.chars - $left; return $left if $bits +^ ($bits +> $left) == $bits +> $right +< $right;

   }

}


for '1001110011 1110111011 0010010010 1010101010 1111111111 0100101101 0100100 101 11 00 1'.words {

   if repstr $_ -> $rep {

say .substr(0,$rep), .substr($rep,$rep).trans('01' => '𝟘𝟙'), .substr($rep*2);

   }
   else {

say "$_ (no repeat)";

   }

}</lang>

Python

Python: Procedural

<lang python>def is_repeated(text):

   'check if the first part of the string is repeated throughout the string'
   for x in range(len(text)//2, 0, -1):
       if text.startswith(text[x:]): return x
   return 0

matchstr = """\ 1001110011 1110111011 0010010010 1010101010 1111111111 0100101101 0100100 101 11 00 1 """ for line in matchstr.split():

   ln = is_repeated(line)
   print('%r has a repetition length of %i i.e. %s' 
          % (line, ln, repr(line[:ln]) if ln else '*not* a rep-string'))</lang>
Output:
'1001110011' has a repetition length of 5 i.e. '10011'
'1110111011' has a repetition length of 4 i.e. '1110'
'0010010010' has a repetition length of 3 i.e. '001'
'1010101010' has a repetition length of 4 i.e. '1010'
'1111111111' has a repetition length of 5 i.e. '11111'
'0100101101' has a repetition length of 0 i.e. *not* a rep-string
'0100100' has a repetition length of 3 i.e. '010'
'101' has a repetition length of 0 i.e. *not* a rep-string
'11' has a repetition length of 1 i.e. '1'
'00' has a repetition length of 1 i.e. '0'
'1' has a repetition length of 0 i.e. *not* a rep-string

Python: Functional

This returns all the possible repeated substrings <lang python>>>> def reps(text):

   return [text[:x] for x in range(1, 1 + len(text) // 2)
           if text.startswith(text[x:])]

>>> matchstr = """\ 1001110011 1110111011 0010010010 1010101010 1111111111 0100101101 0100100 101 11 00 1 """ >>> print('\n'.join('%r has reps %r' % (line, reps(line)) for line in matchstr.split())) '1001110011' has reps ['10011'] '1110111011' has reps ['1110'] '0010010010' has reps ['001'] '1010101010' has reps ['10', '1010'] '1111111111' has reps ['1', '11', '111', '1111', '11111'] '0100101101' has reps [] '0100100' has reps ['010'] '101' has reps [] '11' has reps ['1'] '00' has reps ['0'] '1' has reps [] >>> </lang>

Python: Regexp

This version, inspired by the Perl 6 entry uses the regexp substitute where what the match is substituted with is returned by a function. <lang python>import re

matchstr = """\ 1001110011 1110111011 0010010010 1010101010 1111111111 0100101101 0100100 101 11 00 1"""

def _checker(matchobj):

   g0, (g1, g2, g3, g4) = matchobj.group(0), matchobj.groups()
   if not g4 and g1 and g1.startswith(g3):
       return '%r repeats %r' % (g0, g1)
   return '%r is not a rep-string' % (g0,)

def checkit(txt):

   print(re.sub(r'(.+)(\1+)(.*)|(.*)', _checker, txt))

checkit(matchstr)</lang>

Output:
'1001110011' repeats '10011'
'1110111011' repeats '1110'
'0010010010' repeats '001'
'1010101010' repeats '1010'
'1111111111' repeats '11111'
'0100101101' is not a rep-string
'0100100' repeats '010'
'101' is not a rep-string
'11' repeats '1'
'00' repeats '0'
'1' is not a rep-string

REXX

version 1

<lang rexx>/* REXX ***************************************************************

  • 11.05.2013 Walter Pachl
                                                                                                                                            • /

Call repstring '1001110011' Call repstring '1110111011' Call repstring '0010010010' Call repstring '1010101010' Call repstring '1111111111' Call repstring '0100101101' Call repstring '0100100' Call repstring '101' Call repstring '11' Call repstring '00' Call repstring '1' Exit

repstring: Parse Arg s sq='s' n=length(s) Do l=length(s)%2 to 1 By -1

 If substr(s,l+1,l)=left(s,l) Then Leave
 End

If l>0 Then Do

 rep_str=left(s,l)
 Do i=1 By 1
   If substr(s,i*l+1,l)<>rep_str Then
     Leave
   End
 If left(copies(rep_str,n),length(s))=s Then
   Say sq 'has a repetition length of' length(rep_str),
                                                'i.e.' 'rep_str'
 Else
   Say sq 'is not a repeated string'
 End

Else

 Say sq 'is not a repeated string'

Return</lang> Output:

'1001110011' has a repetition length of 5 i.e. '10011'
'1110111011' has a repetition length of 4 i.e. '1110'
'0010010010' has a repetition length of 3 i.e. '001'
'1010101010' has a repetition length of 4 i.e. '1010'
'1111111111' has a repetition length of 5 i.e. '11111'
'0100101101' is not a repeated string
'0100100' has a repetition length of 3 i.e. '010'
'101' is not a repeated string
'11' has a repetition length of 1 i.e. '1'
'00' has a repetition length of 1 i.e. '0'  
'1' is not a repeated string

version 2

No checking is done to validate if the strings are binary strings (an easy-peasy-lemon-squeezy check)).
This version can handle any characters in the string except blanks (which could be used of a stemmed array would be used). <lang rexx>/*REXX pgm determines if a string is a repString, returns min len repStr*/ s=1001110011 1110111011 0010010010 1010101010 1111111111 0100101101 0100100 101 11 00 1

    do k=1  for words(s); _=word(s,k) /*process all the binary strings.*/
    say  right(_,20)  rep_string(_)   /*show the original & the result.*/
    end   /*k*/

exit /*stick a fork in it, we're done.*/ /*───────────────────────────────────REP_STRING subroutine──────────────*/ rep_string: procedure; parse arg x; L=length(x); r_=' rep string='

  do j=1  for L-1;    p=left(x,j);    if j>L%2  then iterate
  if left(copies(p,L*L),L)==x  then   return r_ left(p,15) '(length' j")"
  end   /*j*/

return ' (no repetitions)' /*(sigh)∙∙∙ a failure to find rep*/</lang> output

          1001110011   rep string= 10011           (length 5)
          1110111011   rep string= 1110            (length 4)
          0010010010   rep string= 001             (length 3)
          1010101010   rep string= 10              (length 2)
          1111111111   rep string= 1               (length 1)
          0100101101     (no repetitions)
             0100100   rep string= 010             (length 3)
                 101     (no repetitions)
                  11   rep string= 1               (length 1)
                  00   rep string= 0               (length 1)
                   1     (no repetitions)

Tcl

<lang tcl>proc repstring {text} {

   set len [string length $text]
   for {set i [expr {$len/2}]} {$i > 0} {incr i -1} {

set sub [string range $text 0 [expr {$i-1}]] set eq [string repeat $sub [expr {int(ceil($len/double($i)))}]] if {[string equal -length $len $text $eq]} { return $sub }

   }
   error "no repetition"

}</lang> Demonstrating: <lang tcl>foreach sample {

   "1001110011" "1110111011" "0010010010" "1010101010" "1111111111"
   "0100101101" "0100100" "101" "11" "00" "1"

} {

   if {[catch {

set rep [repstring $sample] puts [format "\"%s\" has repetition (length: %d) of \"%s\"" \ $sample [string length $rep] $rep]

   }]} {

puts [format "\"%s\" is not a repeated string" $sample]

   }

}</lang>

Output:
"1001110011" has repetition (length: 5) of "10011"
"1110111011" has repetition (length: 4) of "1110"
"0010010010" has repetition (length: 3) of "001"
"1010101010" has repetition (length: 4) of "1010"
"1111111111" has repetition (length: 5) of "11111"
"0100101101" is not a repeated string
"0100100" has repetition (length: 3) of "010"
"101" is not a repeated string
"11" has repetition (length: 1) of "1"
"00" has repetition (length: 1) of "0"
"1" is not a repeated string