Lyndon word: Difference between revisions

From Rosetta Code
Content added Content deleted
imported>CosmiaNebula
(initial page)
 
 
(13 intermediate revisions by 8 users not shown)
Line 5: Line 5:
A [[wp:Lyndon_word|Lyndon word]] is a non-empty string that is ''strictly'' lower in lexicographic order than all its circular rotations. In particular, if a string is equal to a circular rotation, then it is not a Lyndon word. For example, since 0101 = 0101 (rotation by 2), it is not a Lyndon word.
A [[wp:Lyndon_word|Lyndon word]] is a non-empty string that is ''strictly'' lower in lexicographic order than all its circular rotations. In particular, if a string is equal to a circular rotation, then it is not a Lyndon word. For example, since 0101 = 0101 (rotation by 2), it is not a Lyndon word.


The first Lyndon words on the binary alphabet {0, 1} are:<blockquote>
The first Lyndon words on the binary alphabet {0, 1} are:

: 0, 1, 01, 001, 011, 0001, 0011, 0111, 00001, 00011, 00101, 00111, 01011, 01111, ...'''Task:''' Given a positive number and an ordered list of alphabet, all Lyndon words in the lexicographic order.
: 0, 1, 01, 001, 011, 0001, 0011, 0111, 00001, 00011, 00101, 00111, 01011, 01111, ...
Duval (1988) provides an efficient algorithm for listing the Lyndon words of length at most <math>n</math> with a given alphabet size <math>s</math> in [[wp:Lexicographic_order|lexicographic order]]. If <math>w</math> is one of the words in the sequence, then the next word after <math>w</math> can be found by the following steps:
:

'''Task:''' Given a positive number <math>n</math> and an ordered list of alphabet, list all Lyndon words of length at most <math>n</math>, in the lexicographic order.

Duval (1988) provides an efficient algorithm. If <math>w</math> is one of the words in the sequence, then the next word after <math>w</math> can be found by the following steps:


# Repeat <math>w</math> and truncate it to a new word <math>x</math> of length exactly <math>n</math>.
# Repeat <math>w</math> and truncate it to a new word <math>x</math> of length exactly <math>n</math>.
Line 19: Line 24:
# Increment the last symbol to obtain <math>x = 00111\;01 </math>.
# Increment the last symbol to obtain <math>x = 00111\;01 </math>.


=={{header|EasyLang}}==
== [[Python]] ==
{{trans|Python}}
<syntaxhighlight>
func$ nextword n w$ alpha$ .
alpha$[] = strchars alpha$
while len x$ < n
x$ &= w$
.
x$[] = strchars substr x$ 1 n
while len x$[] > 0 and x$[len x$[]] = alpha$[len alpha$[]]
len x$[] -1
.
lx = len x$[]
if lx > 0
repeat
i += 1
until alpha$[i] = x$[lx]
.
x$[lx] = alpha$[i + 1]
.
return strjoin x$[]
.
proc lyndon n alpha$ . .
w$ = substr alpha$ 1 1
while len w$ <= n and len w$ > 0
print w$
w$ = nextword n w$ alpha$
.
.
lyndon 5 "01"
</syntaxhighlight>

=={{header|jq}}==
{{Works with|jq}}

'''Works with gojq, the Go implementation of jq'''

This entry defines a filter, `lyndon`, for finding the Lyndon word of the input string, if any, but is otherwise adapted from the [[#Wren|Wren]]
entry.
<syntaxhighlight lang="jq">
# Generic stream-oriented min function
# Assume the stream s is not empty and does contain null
def min(s):
reduce s as $x (null; if . == null then $x elif . < $x then . else $x end);

# Input: a string
# Assume 0 <= $n <= length
def rotate($n):
.[$n:] + .[:$n];

# Emit the Lyndon word corresponding to the input string, else empty.
def lyndon:
def circular:
. as $in
| any( range(1; length/2 + 1); . as $n | $in | rotate($n) == .) ;

if circular then empty
else min(range(0;length) as $i | rotate($i))
end;

# Input: a Lyndon word
# Output: the next Lyndon word relative to $alphabet
def nextWord($n; $alphabet):
def a($i): $alphabet[$i:$i+1];
((($n/length)|floor + 1) * .)[:$n]
| until (. == "" or .[-1:] != $alphabet[-1:]; .[:-1])
| if . == "" then .
else .[-1:] as $lastChar
| ($alphabet|index($lastChar) + 1) as $nextCharIndex
| .[0:-1] + a($nextCharIndex)
end ;

def lyndon_words($n):
. as $alphabet
| .[:1]
| while ( length <= $n and . != "";
nextWord($n; $alphabet) ) ;

"01" | lyndon_words(5)
</syntaxhighlight>
{{output}}
As for [[#Wren|Wren]].

=={{header|Julia}}==
{{trans|Python}}
<syntaxhighlight lang="julia">function nextword(n, w, alphabet)
x = (w ^ ((n ÷ length(w)) + 1))[begin:n]
while x != "" && x[end] == alphabet[end]
x = x[begin:end-1]
end
if x != ""
last_char = x[end]
next_char_index = something(findfirst(==(last_char), alphabet), 0) + 1
x = x[begin:end-1] * alphabet[next_char_index]
end
return x
end

function generate_lyndon_words(n, alphabet)
lwords = String[]
w = string(alphabet[begin])
while length(w) <= n
push!(lwords, w)
w = nextword(n, w, alphabet)
w == "" && break
end
return lwords
end

const lyndon_words = generate_lyndon_words(5, "01")
foreach(println, lyndon_words)
</syntaxhighlight>{{out}} Same as Python example.

== {{header|Phix}} ==
<syntaxhighlight lang="phix">
with javascript_semantics
function generate_lyndon_words(integer maxlen, string alphabet="01")
sequence res = {}
string w = alphabet[1..1]
while length(w) do
res = append(res,w)
while length(w)<maxlen do
w &= w
end while
w = trim_tail(w[1..maxlen],alphabet[$])
if length(w) then
-- w[$] += 1 -- not eg "0123456789ABCDEF":
w[$] = alphabet[find(w[$],alphabet)+1]
end if
end while
return res
end function

printf(1,"%s\n",join(generate_lyndon_words(5,"01"),"\n"))
</syntaxhighlight>
{{out}}
<pre>
0
00001
0001
00011
001
00101
0011
00111
01
01011
011
0111
01111
1
</pre>

== {{header|Python}} ==
<syntaxhighlight lang="python3">
<syntaxhighlight lang="python3">
def next_word(n, w, alphabet):
def next_word(n, w, alphabet):
Line 58: Line 216:


</syntaxhighlight>
</syntaxhighlight>

=={{header|Raku}}==
{{trans|Julia}}
<syntaxhighlight lang="raku" line># 20240211 Raku programming solution

sub nextword($n, $w, $alphabet) {
my $x = ($w x ($n div $w.chars + 1)).substr(0, $n);
while $x.Bool && $x.substr(*-1) eq $alphabet.substr(*-1) {
$x.substr-rw(*-1) = ''
}
if $x.Bool {
my $next_char_index = ($alphabet.index($x.substr(*-1)) // 0) + 1;
$x.substr-rw(*-1, 1) = $alphabet.substr($next_char_index, 1);
}
return $x;
}

.say for sub ($n, $alphabet) {
my $w = $alphabet.substr(0, 1);
return gather while $w.chars <= $n {
take $w;
last unless $w = nextword($n, $w, $alphabet);
}
}(5, '01');</syntaxhighlight>
{{out}} Same as Julia example.
You may [https://ato.pxeger.com/run?1=fVLLToQwFE1c8hVnQQaqwMDCxARn449MOlKEDBZti2Vi-BI3s9Cf8mtsedSMjpLQEO49j3t6394F3XfH40enyvjm84LIbgfOeqVbUYQ-j-Br89LmqaI7pghePQCPB_g9Ngh9jd6cHEX9YjqT-4oKiStkhCSGSSoRpgbOSW5huqobZpDJXds2WK3s59x1GWcE7Plb6aQwiprH9cdCT5UNgsAWB3vUpSNfENapHWdrnW1rXrAeo3GnM_4LT50QrNdIiR0k_0M6wqj-y-9PNduXO4eCqU5wQ5Z7g-clkh5QtgI29CnsM0HrczKp450pH6iqmFgiXm7i1kC5C0PRva0tIzVUKnS8YVJOKv_c-zzCEF5HCNIsIPm0MfPiLAv0BQ Attempt This Online!]

=={{header|RPL}}==
<code>REVSTR</code> is defined at [[Reverse a string#RPL|Reverse a string]].
{{works with|HP|48G}}
« OVER <span style="color:blue">REVSTR</span> HEAD → w alphabet n lastsymbol
« IF w lastsymbol == '''THEN''' ""
'''ELSE'''
w
'''WHILE''' DUP SIZE n ≤ '''REPEAT''' w + '''END'''
1 n SUB <span style="color:blue">REVSTR</span> <span style="color:grey">''@ reversing string to use HEAD and TAIL rather than SUB''</span>
'''WHILE''' DUP HEAD lastsymbol == '''REPEAT''' TAIL '''END'''
TAIL LASTARG HEAD
alphabet DUP ROT POS 1 + DUP SUB SWAP +
<span style="color:blue">REVSTR</span>
'''END'''
» » '<span style="color:blue">NEXTLYNDON</span>' STO <span style="color:grey">''@ ( "word" "alphabet" n → "nextword" )''</span>
« → alphabet n
« { } alphabet HEAD
'''WHILE''' DUP SIZE '''REPEAT'''
SWAP OVER + SWAP alphabet n <span style="color:blue">NEXTLYNDON</span>
'''END''' DROP
» » '<span style="color:blue">TASK</span>' STO

"01" 5 <span style="color:blue">TASK</span>
{{out}}
<pre>
1: { "0" "00001" "0001" "00011" "001" "00101" "0011" "00111" "01" "01011" "011" "0111" "01111" "1" }
</pre>

=={{header|Wren}}==
{{trans|Python}}
<syntaxhighlight lang="wren">var alphabet = "01"

var nextWord = Fn.new { |n, w|
var x = (w * ((n/w.count).floor + 1))[0...n]
while (x != "" && x[-1] == alphabet[-1]) x = x[0...-1]
if (x != "") {
var lastChar = x[-1]
var nextCharIndex = alphabet.indexOf(lastChar) + 1
x = x[0...-1] + alphabet[nextCharIndex]
}
return x
}

var generateLyndonWords = Fiber.new { |n|
var w = alphabet[0]
while (w.count <= n) {
Fiber.yield(w)
w = nextWord.call(n, w)
if (w == "") break
}
}

while (true) {
var word = generateLyndonWords.call(5)
if (!word) break
System.print(word)
}</syntaxhighlight>

{{out}}
<pre>
0
00001
0001
00011
001
00101
0011
00111
01
01011
011
0111
01111
1
</pre>

Latest revision as of 19:14, 9 March 2024

Task
Lyndon word
You are encouraged to solve this task according to the task description, using any language you may know.

Given a finite alphabet, we can lexicographically order all strings in the alphabet. If two strings have different lengths, then pad the shorter string on the right with the smallest letter. For example, we have 01 > 001, but 01 = 010. As we see, this order is a total preorder, but not a total order, since it identifies different strings.

A Lyndon word is a non-empty string that is strictly lower in lexicographic order than all its circular rotations. In particular, if a string is equal to a circular rotation, then it is not a Lyndon word. For example, since 0101 = 0101 (rotation by 2), it is not a Lyndon word.

The first Lyndon words on the binary alphabet {0, 1} are:

0, 1, 01, 001, 011, 0001, 0011, 0111, 00001, 00011, 00101, 00111, 01011, 01111, ...

Task: Given a positive number and an ordered list of alphabet, list all Lyndon words of length at most , in the lexicographic order.

Duval (1988) provides an efficient algorithm. If is one of the words in the sequence, then the next word after can be found by the following steps:

  1. Repeat and truncate it to a new word of length exactly .
  2. As long as the final symbol of is the last symbol in the sorted ordering of the alphabet, remove it, producing a shorter word.
  3. Replace the final remaining symbol of by its successor in the sorted ordering of the alphabet.

For example, suppose we have generated the binary Lyndon words of length up to 7, and we have generated up to , then the steps are:

  1. Repeat and truncate to get
  2. Since the last symbol is 0, it is not the final symbol.
  3. Increment the last symbol to obtain .

EasyLang

Translation of: Python
func$ nextword n w$ alpha$ .
   alpha$[] = strchars alpha$
   while len x$ < n
      x$ &= w$
   .
   x$[] = strchars substr x$ 1 n
   while len x$[] > 0 and x$[len x$[]] = alpha$[len alpha$[]]
      len x$[] -1
   .
   lx = len x$[]
   if lx > 0
      repeat
         i += 1
         until alpha$[i] = x$[lx]
      .
      x$[lx] = alpha$[i + 1]
   .
   return strjoin x$[]
.
proc lyndon n alpha$ . .
   w$ = substr alpha$ 1 1
   while len w$ <= n and len w$ > 0
      print w$
      w$ = nextword n w$ alpha$
   .
.
lyndon 5 "01"

jq

Works with: jq

Works with gojq, the Go implementation of jq

This entry defines a filter, `lyndon`, for finding the Lyndon word of the input string, if any, but is otherwise adapted from the Wren entry.

# Generic stream-oriented min function
# Assume the stream s is not empty and does contain null
def min(s):
  reduce s as $x (null; if . == null then $x elif . < $x then . else $x end);

# Input: a string
# Assume 0 <= $n <= length
def rotate($n):
  .[$n:] + .[:$n];

# Emit the Lyndon word corresponding to the input string, else empty.
def lyndon:
  def circular:
    . as $in
    | any( range(1; length/2 + 1);  . as $n | $in | rotate($n) == .) ;

  if circular then empty
  else min(range(0;length) as $i | rotate($i))
  end;

# Input: a Lyndon word
# Output: the next Lyndon word relative to $alphabet
def nextWord($n; $alphabet):
  def a($i): $alphabet[$i:$i+1];
  ((($n/length)|floor + 1) * .)[:$n]
  | until (. == "" or .[-1:] != $alphabet[-1:]; .[:-1])
  | if . == "" then .
    else .[-1:] as $lastChar
    | ($alphabet|index($lastChar) + 1) as $nextCharIndex
    | .[0:-1] + a($nextCharIndex)
    end ;

def lyndon_words($n):
  . as $alphabet
  | .[:1]
  | while ( length <= $n and . != "";
      nextWord($n; $alphabet) ) ;

"01" | lyndon_words(5)
Output:

As for Wren.

Julia

Translation of: Python
function nextword(n, w, alphabet)
    x = (w ^ ((n ÷ length(w)) + 1))[begin:n]
    while x != "" && x[end] == alphabet[end]
        x = x[begin:end-1]
    end
    if x != ""
        last_char = x[end]
        next_char_index = something(findfirst(==(last_char), alphabet), 0) + 1
        x = x[begin:end-1] * alphabet[next_char_index]
    end
    return x
end

function generate_lyndon_words(n, alphabet)
    lwords = String[]
    w = string(alphabet[begin])
    while length(w) <= n
        push!(lwords, w)
        w = nextword(n, w, alphabet)
        w == "" && break
    end
    return lwords
end

const lyndon_words = generate_lyndon_words(5, "01")
foreach(println, lyndon_words)
Output:

Same as Python example.

Phix

with javascript_semantics
function generate_lyndon_words(integer maxlen, string alphabet="01")
    sequence res = {}
    string w = alphabet[1..1]
    while length(w) do
        res = append(res,w)
        while length(w)<maxlen do
            w &= w
        end while
        w = trim_tail(w[1..maxlen],alphabet[$])
        if length(w) then
--          w[$] += 1 -- not eg "0123456789ABCDEF":
            w[$] = alphabet[find(w[$],alphabet)+1]
        end if
    end while
    return res
end function

printf(1,"%s\n",join(generate_lyndon_words(5,"01"),"\n"))
Output:
0
00001
0001
00011
001
00101
0011
00111
01
01011
011
0111
01111
1

Python

def next_word(n, w, alphabet):
    x = (w * ((n // len(w)) + 1))[:n]
    while x and x[-1] == alphabet[-1]:
        x = x[:-1]
    if x:
        last_char = x[-1]
        next_char_index = alphabet.index(last_char) + 1
        x = x[:-1] + alphabet[next_char_index]
    return x

def generate_lyndon_words(n, alphabet):
    w = alphabet[0]
    while len(w) <= n:
        yield w
        w = next_word(n, w, alphabet)
        if not w: break

lyndon_words = generate_lyndon_words(5, [str(i) for i in range(2)])
for word in lyndon_words:
    print(word)

Output:

0
00001
0001
00011
001
00101
0011
00111
01
01011
011
0111
01111
1

Raku

Translation of: Julia
# 20240211 Raku programming solution

sub nextword($n, $w, $alphabet) {
   my $x = ($w x ($n div $w.chars + 1)).substr(0, $n);
   while $x.Bool && $x.substr(*-1) eq $alphabet.substr(*-1) {
      $x.substr-rw(*-1) = ''
   }
   if $x.Bool {
      my $next_char_index  = ($alphabet.index($x.substr(*-1)) // 0) + 1;
      $x.substr-rw(*-1, 1) = $alphabet.substr($next_char_index, 1);
   }
   return $x;
}

.say for sub ($n, $alphabet) {
   my $w = $alphabet.substr(0, 1);
   return gather while $w.chars <= $n {
      take $w;
      last unless $w = nextword($n, $w, $alphabet);
   }
}(5, '01');
Output:

Same as Julia example.

You may Attempt This Online!

RPL

REVSTR is defined at Reverse a string.

Works with: HP version 48G
« OVER REVSTR HEAD → w alphabet n lastsymbol
  « IF w lastsymbol == THEN ""
    ELSE 
       w
       WHILE DUP SIZE n ≤ REPEAT w + END
       1 n SUB REVSTR                                 @ reversing string to use HEAD and TAIL rather than SUB
       WHILE DUP HEAD lastsymbol == REPEAT TAIL END
       TAIL LASTARG HEAD
       alphabet DUP ROT POS 1 + DUP SUB SWAP +
       REVSTR
    END
» » 'NEXTLYNDON' STO                                  @ ( "word" "alphabet" n → "nextword" )

« → alphabet n 
  « { } alphabet HEAD 
    WHILE DUP SIZE REPEAT 
       SWAP OVER + SWAP alphabet n NEXTLYNDON
    END DROP
» » 'TASK' STO
"01" 5 TASK
Output:
1: { "0" "00001" "0001" "00011" "001" "00101" "0011" "00111" "01" "01011" "011" "0111" "01111" "1" }

Wren

Translation of: Python
var alphabet = "01"

var nextWord = Fn.new { |n, w|
    var x = (w * ((n/w.count).floor + 1))[0...n]
    while (x != "" && x[-1] == alphabet[-1]) x = x[0...-1]
    if (x != "") {
        var lastChar = x[-1]
        var nextCharIndex = alphabet.indexOf(lastChar) + 1
        x = x[0...-1] + alphabet[nextCharIndex]
    }
    return x
}

var generateLyndonWords = Fiber.new { |n|
    var w = alphabet[0]
    while (w.count <= n) {
        Fiber.yield(w)
        w = nextWord.call(n, w)
        if (w == "") break
    }
}

while (true) {
    var word = generateLyndonWords.call(5)
    if (!word) break
    System.print(word)
}
Output:
0
00001
0001
00011
001
00101
0011
00111
01
01011
011
0111
01111
1