Rep-string
Given a series of ones and zeroes in a string, define a repeated string or rep-string as a string which is created by repeating a substring of the first N characters of the string truncated on the right to the length of the input string, and in which the substring appears repeated at least twice in the original.
For example, the string '10011001100'
is a rep-string as the leftmost four characters of '1001'
are repeated three times and truncated on the right to give the original string.
Note that the requirement for having the repeat occur two or more times means that the repeating unit is never longer than half the length of the input string.
The task is to:
- Write a function/subroutine/method/... that takes a string and returns an indication of if it is a rep-string and the repeated string. (Either the string that is repeated, or the number of repeated characters would suffice).
- There may be multiple sub-strings that make a string a rep-string - in that case an indication of all, or the longest, or the shortest would suffice.
- Use the function to indicate the repeating substring if any, in the following:
'1001110011' '1110111011' '0010010010' '1010101010' '1111111111' '0100101101' '0100100' '101' '11' '00' '1'
- Show your output on this page.
J
Here's a test:
<lang j>isRepStr=: +./@((] -: $@] $ $)"0 1~1+[:i.<.@-:@#)</lang>
Example use:
<lang j> isRepStr'1001110011' 1
isRepStr'1110111011'
1
isRepStr'0010010010'
1
isRepStr'1010101010'
1
isRepStr'1111111111'
1
isRepStr'0100101101'
0
isRepStr'0100100'
1
isRepStr'101'
0
isRepStr'11'
1
isRepStr'00'
1
isRepStr'1'
0</lang>
We could also report the lengths of the repeated prefix, though this seems more arbitrary:
<lang j>nRepStr=: 0 -.~ (([ * ] -: $@] $ $)"0 1~1+[:i.<.@-:@#)</lang>
With the above examples:
<lang j> nRepStr'1001110011' 5
nRepStr'1110111011'
4
nRepStr'0010010010'
3
nRepStr'1010101010'
2 4
nRepStr'1111111111'
1 2 3 4 5
nRepStr'0100101101'
nRepStr'0100100'
3
nRepStr'101'
nRepStr'11'
1
nRepStr'00'
1
nRepStr'1'
</lang>
Here, the "non-str-rep" cases are indicated by an empty list of prefix lengths.
NetRexx
<lang NetRexx>/* NetRexx */ options replace format comments java crossref symbols nobinary
/* REXX ***************************************************************
- 11.05.2013 Walter Pachl
- /
runSample(arg) return
/**
* Test for rep-strings * @param s_str a string to check for rep-strings * @return Rexx string: boolean indication of reps, length, repeated value */
method repstring(s_str) public static
s_str_n = s_str.length() rep_str = Loop lx = s_str.length() % 2 to 1 By -1 If s_str.substr(lx + 1, lx) = s_str.left(lx) Then Leave lx End lx If lx > 0 Then Do label reps rep_str = s_str.left(lx) Loop ix = 1 By 1 If s_str.substr(ix * lx + 1, lx) <> rep_str Then Leave ix End ix If rep_str.copies(s_str_n).left(s_str.length()) <> s_str Then rep_str = End reps Return (rep_str.length() > 0) rep_str.length() rep_str
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ method runSample(arg) public static
parse arg samples if samples = then - samples = - '1001110011' - '1110111011' - '0010010010' - '1010101010' - '1111111111' - '0100101101' - '0100100' - '101' - '11' - '00' - '1'
loop w_ = 1 to samples.words() in_str = samples.word(w_) parse repstring(in_str) is_rep_str rep_str_len rep_str
sq = 'in_str' tstrlen = sq.length().max(20) sq=sq.right(tstrlen) if is_rep_str then Say sq 'has a repetition length of' rep_str_len "i.e. '"rep_str"'" else Say sq 'is not a repeated string' end w_ return
</lang>
- Output:
'1001110011' has a repetition length of 5 i.e. '10011' '1110111011' has a repetition length of 4 i.e. '1110' '0010010010' has a repetition length of 3 i.e. '001' '1010101010' has a repetition length of 4 i.e. '1010' '1111111111' has a repetition length of 5 i.e. '11111' '0100101101' is not a repeated string '0100100' has a repetition length of 3 i.e. '010' '101' is not a repeated string '11' has a repetition length of 1 i.e. '1' '00' has a repetition length of 1 i.e. '0' '1' is not a repeated string
Perl
<lang perl>foreach (qw(1001110011 1110111011 0010010010 1010101010 1111111111 0100101101 0100100 101 11 00 1)) {
print "$_\n"; if (/^(.+)\1+(.*$)(?(?{ substr($1, 0, length $2) eq $2 })|(?!))/) { print ' ' x length $1, "$1\n\n"; } else { print " (no repeat)\n\n"; }
}</lang>
- Output:
1001110011 10011 1110111011 1110 0010010010 001 1010101010 1010 1111111111 11111 0100101101 (no repeat) 0100100 010 101 (no repeat) 11 1 00 0 1 (no repeat)
Perl 6
<lang perl6>for <1001110011 1110111011 0010010010 1010101010 1111111111 0100101101 0100100 101 11 00 1> {
if /^ (.+) $0+: (.*$) <?{ $0.substr(0,$1.chars) eq $1 }> / {
my $rep = $0.chars; say .substr(0,$rep), .substr($rep,$rep).trans('01' => 'ππ'), .substr($rep*2);
} else {
say "$_ (no repeat)";
}
}</lang>
- Output:
10011πππππ 1110ππππ11 001πππ0010 1010ππππ10 11111πππππ 0100101101 (no repeat) 010πππ0 101 (no repeat) 1π 0π 1 (no repeat)
Here's a technique that relies on the fact that XORing the shifted binary number should set all the lower bits to 0 if there are repeats. (The cool thing is that shift will automatically throw away the bits on the right that you want thrown away.) This produces the same output as above. <lang perl6>sub repstr(Str $s) {
my $bits = :2($s); for reverse 1 .. $s.chars div 2 -> $left {
my $right = $s.chars - $left; return $left if $bits +^ ($bits +> $left) == $bits +> $right +< $right;
}
}
for '1001110011 1110111011 0010010010 1010101010 1111111111 0100101101 0100100 101 11 00 1'.words {
if repstr $_ -> $rep {
say .substr(0,$rep), .substr($rep,$rep).trans('01' => 'ππ'), .substr($rep*2);
} else {
say "$_ (no repeat)";
}
}</lang>
Python
Python: Procedural
<lang python>def is_repeated(text):
'check if the first part of the string is repeated throughout the string' for x in range(len(text)//2, 0, -1): if text.startswith(text[x:]): return x return 0
matchstr = """\ 1001110011 1110111011 0010010010 1010101010 1111111111 0100101101 0100100 101 11 00 1 """ for line in matchstr.split():
ln = is_repeated(line) print('%r has a repetition length of %i i.e. %s' % (line, ln, repr(line[:ln]) if ln else '*not* a rep-string'))</lang>
- Output:
'1001110011' has a repetition length of 5 i.e. '10011' '1110111011' has a repetition length of 4 i.e. '1110' '0010010010' has a repetition length of 3 i.e. '001' '1010101010' has a repetition length of 4 i.e. '1010' '1111111111' has a repetition length of 5 i.e. '11111' '0100101101' has a repetition length of 0 i.e. *not* a rep-string '0100100' has a repetition length of 3 i.e. '010' '101' has a repetition length of 0 i.e. *not* a rep-string '11' has a repetition length of 1 i.e. '1' '00' has a repetition length of 1 i.e. '0' '1' has a repetition length of 0 i.e. *not* a rep-string
Python: Functional
This returns all the possible repeated substrings <lang python>>>> def reps(text):
return [text[:x] for x in range(1, 1 + len(text) // 2) if text.startswith(text[x:])]
>>> matchstr = """\ 1001110011 1110111011 0010010010 1010101010 1111111111 0100101101 0100100 101 11 00 1 """ >>> print('\n'.join('%r has reps %r' % (line, reps(line)) for line in matchstr.split())) '1001110011' has reps ['10011'] '1110111011' has reps ['1110'] '0010010010' has reps ['001'] '1010101010' has reps ['10', '1010'] '1111111111' has reps ['1', '11', '111', '1111', '11111'] '0100101101' has reps [] '0100100' has reps ['010'] '101' has reps [] '11' has reps ['1'] '00' has reps ['0'] '1' has reps [] >>> </lang>
Python: Regexp
This version, inspired by the Perl 6 entry uses the regexp substitute where what the match is substituted with is returned by a function. <lang python>import re
matchstr = """\ 1001110011 1110111011 0010010010 1010101010 1111111111 0100101101 0100100 101 11 00 1"""
def _checker(matchobj):
g0, (g1, g2, g3, g4) = matchobj.group(0), matchobj.groups() if not g4 and g1 and g1.startswith(g3): return '%r repeats %r' % (g0, g1) return '%r is not a rep-string' % (g0,)
def checkit(txt):
print(re.sub(r'(.+)(\1+)(.*)|(.*)', _checker, txt))
checkit(matchstr)</lang>
- Output:
'1001110011' repeats '10011' '1110111011' repeats '1110' '0010010010' repeats '001' '1010101010' repeats '1010' '1111111111' repeats '11111' '0100101101' is not a rep-string '0100100' repeats '010' '101' is not a rep-string '11' repeats '1' '00' repeats '0' '1' is not a rep-string
REXX
version 1
<lang rexx>/* REXX ***************************************************************
- 11.05.2013 Walter Pachl
- /
Call repstring '1001110011' Call repstring '1110111011' Call repstring '0010010010' Call repstring '1010101010' Call repstring '1111111111' Call repstring '0100101101' Call repstring '0100100' Call repstring '101' Call repstring '11' Call repstring '00' Call repstring '1' Exit
repstring: Parse Arg s sq='s' n=length(s) Do l=length(s)%2 to 1 By -1
If substr(s,l+1,l)=left(s,l) Then Leave End
If l>0 Then Do
rep_str=left(s,l) Do i=1 By 1 If substr(s,i*l+1,l)<>rep_str Then Leave End If left(copies(rep_str,n),length(s))=s Then Say sq 'has a repetition length of' length(rep_str), 'i.e.' 'rep_str' Else Say sq 'is not a repeated string' End
Else
Say sq 'is not a repeated string'
Return</lang> Output:
'1001110011' has a repetition length of 5 i.e. '10011' '1110111011' has a repetition length of 4 i.e. '1110' '0010010010' has a repetition length of 3 i.e. '001' '1010101010' has a repetition length of 4 i.e. '1010' '1111111111' has a repetition length of 5 i.e. '11111' '0100101101' is not a repeated string '0100100' has a repetition length of 3 i.e. '010' '101' is not a repeated string '11' has a repetition length of 1 i.e. '1' '00' has a repetition length of 1 i.e. '0' '1' is not a repeated string
version 2
No checking is done to validate if the strings are binary strings (an easy-peasy-lemon-squeezy check)).
This version can handle any characters in the string except blanks (which could be used if a stemmed array would be used).
<lang rexx>/*REXX pgm determines if a string is a repString, returns min len repStr*/
s=1001110011 1110111011 0010010010 1010101010 1111111111 0100101101 0100100 101 11 00 1
do k=1 for words(s); _=word(s,k) /*process all the binary strings.*/ say right(_,20) rep_string(_) /*show the original & the result.*/ end /*k*/
exit /*stick a fork in it, we're done.*/ /*βββββββββββββββββββββββββββββββββββREP_STRING subroutineββββββββββββββ*/ rep_string: procedure; parse arg x; L=length(x); r_=' rep string='
do j=1 for L-1; p=left(x,j); if j>L%2 then iterate if left(copies(p,L*L),L)==x then return r_ left(p,15) '(length' j")" end /*j*/
return ' (no repetitions)' /*(sigh)βββ a failure to find rep*/</lang> output
1001110011 rep string= 10011 (length 5) 1110111011 rep string= 1110 (length 4) 0010010010 rep string= 001 (length 3) 1010101010 rep string= 10 (length 2) 1111111111 rep string= 1 (length 1) 0100101101 (no repetitions) 0100100 rep string= 010 (length 3) 101 (no repetitions) 11 rep string= 1 (length 1) 00 rep string= 0 (length 1) 1 (no repetitions)
Tcl
<lang tcl>proc repstring {text} {
set len [string length $text] for {set i [expr {$len/2}]} {$i > 0} {incr i -1} {
set sub [string range $text 0 [expr {$i-1}]] set eq [string repeat $sub [expr {int(ceil($len/double($i)))}]] if {[string equal -length $len $text $eq]} { return $sub }
} error "no repetition"
}</lang> Demonstrating: <lang tcl>foreach sample {
"1001110011" "1110111011" "0010010010" "1010101010" "1111111111" "0100101101" "0100100" "101" "11" "00" "1"
} {
if {[catch {
set rep [repstring $sample] puts [format "\"%s\" has repetition (length: %d) of \"%s\"" \ $sample [string length $rep] $rep]
}]} {
puts [format "\"%s\" is not a repeated string" $sample]
}
}</lang>
- Output:
"1001110011" has repetition (length: 5) of "10011" "1110111011" has repetition (length: 4) of "1110" "0010010010" has repetition (length: 3) of "001" "1010101010" has repetition (length: 4) of "1010" "1111111111" has repetition (length: 5) of "11111" "0100101101" is not a repeated string "0100100" has repetition (length: 3) of "010" "101" is not a repeated string "11" has repetition (length: 1) of "1" "00" has repetition (length: 1) of "0" "1" is not a repeated string