Lyndon word

Revision as of 20:34, 2 February 2024 by Petelomax (talk | contribs) (Added Phix)

Given a finite alphabet, we can lexicographically order all strings in the alphabet. If two strings have different lengths, then pad the shorter string on the right with the smallest letter. For example, we have 01 > 001, but 01 = 010. As we see, this order is a total preorder, but not a total order, since it identifies different strings.

Task
Lyndon word
You are encouraged to solve this task according to the task description, using any language you may know.

A Lyndon word is a non-empty string that is strictly lower in lexicographic order than all its circular rotations. In particular, if a string is equal to a circular rotation, then it is not a Lyndon word. For example, since 0101 = 0101 (rotation by 2), it is not a Lyndon word.

The first Lyndon words on the binary alphabet {0, 1} are:

0, 1, 01, 001, 011, 0001, 0011, 0111, 00001, 00011, 00101, 00111, 01011, 01111, ...

Task: Given a positive number and an ordered list of alphabet, all Lyndon words in the lexicographic order.

Duval (1988) provides an efficient algorithm for listing the Lyndon words of length at most with a given alphabet size in lexicographic order. If is one of the words in the sequence, then the next word after can be found by the following steps:

  1. Repeat and truncate it to a new word of length exactly .
  2. As long as the final symbol of is the last symbol in the sorted ordering of the alphabet, remove it, producing a shorter word.
  3. Replace the final remaining symbol of by its successor in the sorted ordering of the alphabet.

For example, suppose we have generated the binary Lyndon words of length up to 7, and we have generated up to , then the steps are:

  1. Repeat and truncate to get
  2. Since the last symbol is 0, it is not the final symbol.
  3. Increment the last symbol to obtain .

Phix

with javascript_semantics
function generate_lyndon_words(integer maxlen, string alphabet="01")
    sequence res = {}
    string w = alphabet[1..1]
    while length(w) do
        res = append(res,w)
        while length(w)<maxlen do
            w &= w
        end while
        w = trim_tail(w[1..maxlen],alphabet[$])
        if length(w) then
--          w[$] += 1 -- not eg "0123456789ABCDEF":
            w[$] = alphabet[find(w[$],alphabet)+1]
        end if
    end while
    return res
end function

printf(1,"%s\n",join(generate_lyndon_words(5,"01"),"\n"))
Output:
0
00001
0001
00011
001
00101
0011
00111
01
01011
011
0111
01111
1

Python

def next_word(n, w, alphabet):
    x = (w * ((n // len(w)) + 1))[:n]
    while x and x[-1] == alphabet[-1]:
        x = x[:-1]
    if x:
        last_char = x[-1]
        next_char_index = alphabet.index(last_char) + 1
        x = x[:-1] + alphabet[next_char_index]
    return x

def generate_lyndon_words(n, alphabet):
    w = alphabet[0]
    while len(w) <= n:
        yield w
        w = next_word(n, w, alphabet)
        if not w: break

lyndon_words = generate_lyndon_words(5, [str(i) for i in range(2)])
for word in lyndon_words:
    print(word)

Output:

0
00001
0001
00011
001
00101
0011
00111
01
01011
011
0111
01111
1