Word break problem: Difference between revisions
m
→{{header|Wren}}: Minor tidy
m (→{{header|Wren}}: Minor tidy) |
|||
(6 intermediate revisions by 4 users not shown) | |||
Line 9:
{{trans|D}}
<
String val
[String] parsed
Line 44:
d = [‘abc’, ‘a’, ‘ac’, ‘b’, ‘c’, ‘cb’, ‘d’]
process(d, [‘abcd’, ‘abbc’, ‘abcbcd’, ‘acdbc’, ‘abcdd’])</
{{out}}
Line 78:
=={{header|Aime}}==
<
wordbreak(record dict, text phrase, integer p, list words)
{
Line 133:
return 0;
}</
{{Out}}
<pre>abcd: a b cd
Line 140:
acdbc: a cd bc
abcdd: can't break</pre>
=={{header|ALGOL 68}}==
{{Trans|Nim}}
<syntaxhighlight lang="algol68"># utility functions: #
PROC includes = ([]STRING row, STRING s) BOOL:
# whether row includes s #
BEGIN
FOR i TO UPB row DO IF s = row[i] THEN true FI OD;
FALSE EXIT
true: TRUE
END;
PRIO & = 6;
OP & = (STRING s, []STRING row) []STRING:
# returns row with s prepended to it #
BEGIN
[UPB row + 1]STRING result;
result[2 : UPB result] := row;
result[1] := s;
result
END;
PROC add = (REF FLEX[]FLEX[]STRING rows, []STRING row) VOID:
# adds row to the end of rows (in place) #
BEGIN
[UPB rows+1]FLEX[0]STRING tmp;
tmp[:UPB rows] := rows;
tmp[UPB tmp] := row;
rows := tmp
END;
# actual solution: #
PROC word breaks = ([]STRING dict, STRING word) [][]STRING:
# build recursively the list of breaks for a word, using the given dictionary #
BEGIN
FLEX[0]FLEX[0]STRING result;
FOR last TO UPB word - 1 DO
STRING part1 := word[1 : last];
IF includes(dict, part1) THEN
STRING part2 := word[last + 1 : ];
IF includes(dict, part2) THEN add(result, (part1, part2)) FI;
[][]STRING sub results = word breaks(dict, part2);
FOR i FROM LWB sub results TO UPB sub results DO
add(result, part1 & sub results[i])
OD
FI
OD;
result
END;
PROC break word = ([]STRING dict, STRING word) VOID:
# find the ways to break a word and display the result #
BEGIN
print((word, ":", newline));
[][]STRING word seqs = word breaks(dict, word);
IF UPB word seqs = 0 THEN
print((" <no break possible>", newline))
ELSE
FOR i FROM LWB word seqs TO UPB word seqs DO
[]STRING seq = word seqs[i];
print(" ");
FOR j FROM LWB seq TO UPB seq DO
print((" ", seq[j]))
OD;
print(newline)
OD
FI
END;
[]STRING dict = ("a", "bc", "abc", "cd", "b");
[]STRING words = ("abcd", "abbc", "abcbcd", "acdbc", "abcdd");
FOR i TO UPB words DO
break word(dict, words[i])
OD</syntaxhighlight>
{{out}}
<pre>abcd:
a b cd
abbc:
a b bc
abcbcd:
a bc b cd
abc b cd
acdbc:
a cd bc
abcdd:
<no break possible>
</pre>
=={{header|C++}}==
{{trans|Rust}}
<
#include <iostream>
#include <optional>
Line 229 ⟶ 321:
return 0;
}
</syntaxhighlight>
{{out}}
Line 238 ⟶ 330:
=={{header|D}}==
{{trans|Java}}
<
import std.range;
import std.stdio;
Line 297 ⟶ 389:
parsed = p;
}
}</
{{out}}
<pre>String = aab, Dictionary = ["a", "aa", "b", "ab", "aab"]. Solutions = 4
Line 330 ⟶ 422:
{{libheader| System.Generics.Collections}}
{{Trans|Go}}
<syntaxhighlight lang="delphi">
program Word_break_problem;
Line 418 ⟶ 510:
Readln;
end.</
=={{header|Go}}==
<
import (
Line 471 ⟶ 563:
}
}
}</
{{out}}
<pre>
Line 483 ⟶ 575:
=={{header|Haskell}}==
{{Trans|Javascript}}
<
import Data.Tree (Tree(..))
Line 516 ⟶ 608:
main :: IO ()
main = (putStrLn . unlines) $ wordBreaks ws <$> texts</
{{Out}}
<pre>abcd:
Line 537 ⟶ 629:
With such short sentences we can find the partition sets, then check that all are words.
<syntaxhighlight lang="j">
all_partitions=: <@(<;.1)"1 _~ (1,.[:#:[:i.2^<:@:#) NB. all_partitions 'abcd'
word_break=: ([ #~ 0 = [: #&>@:] -.L:_1 _)~ all_partitions
main=: (] , (;:inv L:_1@:word_break >))"_ 0 boxopen
</syntaxhighlight>
<pre>
Line 595 ⟶ 687:
=={{header|Java}}==
Accept string to be parsed, and dictionary of words, as method arguments.
<syntaxhighlight lang="java">
import java.util.ArrayList;
import java.util.Arrays;
Line 665 ⟶ 757:
}
</syntaxhighlight>
{{out}}
<pre>
Line 700 ⟶ 792:
Composing a solution from generic functions.
{{Trans|Python}}
<
'use strict';
Line 811 ⟶ 903:
// MAIN ---
return main();
})();</
{{Out}}
<pre>abcd:
Line 842 ⟶ 934:
used to determine the number of possible parses of the string "totalreturn".
<syntaxhighlight lang="jq">
def words: ["a", "bc", "abc", "cd", "b"];
def strings: ["abcd", "abbc", "abcbcd", "acdbc", "abcdd"];
Line 888 ⟶ 980:
demo1, " ", demo2, "", demo3
</syntaxhighlight>
{{out}}
<pre>
Line 908 ⟶ 1,000:
=={{header|Julia}}==
Some extra loops to record and print all solutions.<
words = ["a", "bc", "abc", "cd", "b"]
strings = ["abcd", "abbc", "abcbcd", "acdbc", "abcdd"]
Line 935 ⟶ 1,027:
end
wordbreak()</
{{output}}<pre>
1 Solution(s) for abcd:
Line 950 ⟶ 1,042:
=={{header|Kotlin}}==
I've downloaded the free dictionary at http://www.puzzlers.org/pub/wordlists/unixdict.txt for this task. All single letters from 'a' to 'z' are considered to be words by this dictionary but 'bc' and 'cd' which I'd have expected to be present are not.
<
import java.io.File
Line 997 ⟶ 1,089:
println()
}
}</
{{out}}
Line 1,023 ⟶ 1,115:
=={{header|Lua}}==
<
-- possible candidates for this particalur problem
function genDict(ws)
Line 1,088 ⟶ 1,180:
for i=1,#test do
print(test[i],wordbreak(test[i]))
end</
{{out}}
<pre>abcd a b cd
Line 1,099 ⟶ 1,191:
We provide two ways to initialize the dictionary: either explicitly by providing the list of words or by providing a file name from which extract the words. In the second case, it is possible to specify the minimal length of the words to keep. This is useful mainly to eliminate all the one letter words we get with “unixdict.txt”.
<
type
Line 1,146 ⟶ 1,238:
dict = initDict("unixdict.txt", 2)
dict.breakWord("because")
dict.breakWord("software")</
{{out}}
Line 1,174 ⟶ 1,266:
=={{header|Perl}}==
<
use warnings;
Line 1,186 ⟶ 1,278:
@matches = $word =~ /^ ($one_of) ($one_of)? ($one_of)? ($one_of)? $/x; # sub-optimal: limited number of matches
return join(' ', grep {$_} @matches) || "(not possible)";
}</
{{out}}
<pre>a: a
Line 1,203 ⟶ 1,295:
=={{header|Phix}}==
The distributed version also contains a few experiments with applying a real dictionary, largely unsuccessful.
<!--<
<span style="color: #000080;font-style:italic;">--
-- demo\rosetta\Word_break_problem.exw
Line 1,316 ⟶ 1,408:
<span style="color: #0000FF;">{}</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">wait_key</span><span style="color: #0000FF;">()</span>
<!--</
{{out}}
<pre>
Line 1,332 ⟶ 1,424:
===Non-determinism using member/2===
<code>s/2</code> checks all possible combinations. It uses <code>member/2</code> to select some element from Dict and backtracks if no solution is found.
<
Tests = [["aab", ["a", "aa", "b", "ab", "aab"]],
["abba", ["a", "aa", "b", "ab", "aab"]],
Line 1,344 ⟶ 1,436:
foreach([String,Dict] in Tests)
println([string=String,dict=Dict]),
All = findall(S, S =
println(All),
nl
Line 1,356 ⟶ 1,448:
S := S ++ [D]
end,
S.flatten==String.</
{{out}}
Line 1,385 ⟶ 1,477:
===Non-determinism using member/2 and append/3===
<code>s2/2</code> is more efficient. It uses <code>append/3</code> to extract the found prefix from the string.
<
String2 = copy_term(String),
S = [],
Line 1,400 ⟶ 1,492:
S := S ++ [D]
end,
S.flatten==String.</
===Using recursion===
<code>s3/2</code> use the same idea as <code>s2/2</code> but use recursion. Neater and more efficient.
<
s3(Dict,String,S).
Line 1,411 ⟶ 1,503:
member(E,Dict),
append(E,String2,String),
s3(Dict,String2,S).</
===Some tests===
Here is the test engine.
<
garbage_collect(300_000_000),
_ = random2(),
Line 1,435 ⟶ 1,527:
end,
println(len=All.len),
nl.</
As can be seen by this summary, <code>s3/2</code> is the most efficient of these versions. <code>s/2</code> is too slow but for the simplest examples.
Line 1,476 ⟶ 1,568:
=={{header|PicoLisp}}==
<
(setq *Dict2
(quote
Line 1,516 ⟶ 1,608:
(println (word "samsungandmango" *Dict2))
(println (word "samsungandmangok" *Dict2))
(println (word "ksamsungandmango" *Dict2))</
{{out}}
<pre>
Line 1,538 ⟶ 1,630:
{{Works with|Python|3.7}}
<
from itertools import (chain)
Line 1,640 ⟶ 1,732:
# MAIN ---
if __name__ == '__main__':
main()</
{{Out}}
<pre>abcd:
Line 1,657 ⟶ 1,749:
This returns all the possible splits (and null list if none is possible). Who's to say which is the best?
<
(define render-phrases pretty-print)
Line 1,688 ⟶ 1,780:
(render-phrases (word-splits "samsungandmango" dict-2))
(render-phrases (word-splits "samsungandmangok" dict-2))
(render-phrases (word-splits "ksamsungandmango" dict-2)))</
{{out}}
<pre>'("a b cd")
Line 1,710 ⟶ 1,802:
This implementation does not necessarily find ''every'' combination, it returns the one with the longest matching tokens.
<syntaxhighlight lang="raku"
my $regex = @words.join('|');
put "$_: ", word-break($_) for <abcd abbc abcbcd acdbc abcdd>;
sub word-break (Str $word) { ($word ~~ / ^ (<$regex>)+ $ /)[0] // "Not possible" }</
{{out}}
<pre>abcd: a b cd
Line 1,725 ⟶ 1,817:
=={{header|REXX}}==
This REXX version allows the words to be tested (and the dictionary words) to be specified on the command line.
<
parse arg a '/' x; a=space(a); x=space(x) /*get optional args; elide extra blanks*/
if a=='' | a=="," then a= 'abcd abbc abcbcd acdbc abcdd' /*Not specififed? Use default*/
Line 1,759 ⟶ 1,851:
exit /*stick a fork in it, we're all done. */
/*──────────────────────────────────────────────────────────────────────────────────────*/
fw: parse arg z,L; do k=L by -1 for L; ?=left(z,k); if @.? then leave; end; return k</
{{out|output|text= when using the default inputs:}}
<pre>
Line 1,772 ⟶ 1,864:
=={{header|Ring}}==
<
# Project : Word break problem
Line 1,837 ⟶ 1,929:
next
next
</syntaxhighlight>
Output:
<pre>
Line 1,844 ⟶ 1,936:
abcbcd = abc b cd
acdbc = a cd bc
</pre>
=={{header|Ruby}}==
<syntaxhighlight lang="ruby">
def split_text_with_dict(text, dict, splited=[])
solutions = []
dict.each do |word|
if text.start_with? word
new_text = text.delete_prefix word
new_splited = splited.dup<< word
if new_text.empty?
solutions << new_splited
else
sols = split_text_with_dict(new_text, dict, new_splited)
sols.each { |s| solutions << s }
end
end
end
return solutions
end
</syntaxhighlight>
{{out}}
<pre>
dict = ["a", "abc", "b", "bc", "cd"]
abcd => a b cd
abbc => a b bc
abcbcd
=> a bc b cd
=> abc b cd
acdbc => a cd bc
abcdd => No solution
</pre>
=={{header|Rust}}==
===Dynamic programming===
<
fn create_string(s: &str, v: Vec<Option<usize>>) -> String {
let mut idx = s.len();
Line 1,893 ⟶ 2,020:
set.insert("b");
println!("{:?}", word_break("abcd", set).unwrap());
}</
{{output}}
<pre>"a b cd"</pre>
Line 1,900 ⟶ 2,027:
===First solution===
Finds all possible solutions recursively, using a trie representation of the dictionary:
<
def add(s: String): TrieNode = s match {
case "" => copy(isWord = true)
Line 1,933 ⟶ 2,060:
def wordBreak(s: String, dict: TrieNode): List[List[String]] = {
wordBreakRec(s, dict, dict, "")
}</
Calling it with some example strings:
<
val testCases = List("abcd", "abbc", "abcbcd", "acdbc", "abcdd")
for (s <- testCases) {
Line 1,943 ⟶ 2,070:
println("\t" + words.mkString(" "))
}
}</
{{out}}
<pre>abcd has 1 solution(s):
Line 1,957 ⟶ 2,084:
===Combined solution===
{{Out}}Best seen running in your browser either by [https://scalafiddle.io/sf/49YwsD5/1 ScalaFiddle (ES aka JavaScript, non JVM)] or [https://scastie.scala-lang.org/L47OzgmjSkOnQ5wfrZ5snQ Scastie (remote JVM)].
<
val dict = buildTrie("a", "bc", "abc", "cd", "b")
lazy val empty = TrieNode(isWord = false, Map.empty) // lazy or in a companion object
Line 2,006 ⟶ 2,133:
})
}</
=={{header|Seed7}}==
<
const func boolean: wordBreak (in string: stri, in array string: words, in string: resultList) is func
Line 2,042 ⟶ 2,169:
end if;
end for;
end func;</
{{out}}
Line 2,056 ⟶ 2,183:
=={{header|Sidef}}==
{{trans|zkl}}
<
var r = ->(str, arr=[]) {
Line 2,078 ⟶ 2,205:
for str in (strs) {
printf("%11s: %s\n", str, word_break(str, words) \\ '(not possible)')
}</
{{out}}
<pre>
Line 2,095 ⟶ 2,222:
{{trans|Rust}}
<
@inlinable
Line 2,166 ⟶ 2,293:
print("\(test):")
print(" \(wordBreak(str: test, dict: words) ?? "did not parse with given words")")
}</
{{out}}
Line 2,183 ⟶ 2,310:
=={{header|Tailspin}}==
Does a depth-first search (after generating all possibilities on the current level). We could stop looking further after one is found, just add a condition to do nothing if a done-flag is set.
<
templates wordBreaks&{dict:}
composer starts&{with:}
Line 2,208 ⟶ 2,335:
'ababab' -> wordBreaks&{dict: ['a', 'ab', 'bab']} -> !OUT::write
'abcbab' -> wordBreaks&{dict: ['a', 'ab', 'bab']} -> !OUT::write
</syntaxhighlight>
{{out}}
<pre>
Line 2,222 ⟶ 2,349:
=={{header|Visual Basic .NET}}==
{{trans|Java}}
<
Structure Node
Line 2,294 ⟶ 2,421:
End Sub
End Module</
{{out}}
<pre>String = aab, Dictionary = System.Collections.Generic.List`1[System.String]. Solutions = 4
Line 2,327 ⟶ 2,454:
{{trans|Go}}
{{libheader|Wren-fmt}}
<
class Prefix {
Line 2,366 ⟶ 2,493:
System.print("can't break")
}
}</
{{out}}
Line 2,378 ⟶ 2,505:
=={{header|zkl}}==
<
words=words.split(" "); // to list of words
r:=fcn(str,words,sink){ // recursive search, easy to collect answer
Line 2,392 ⟶ 2,519:
if(False==r) return("not possible");
r.reverse().concat(" ")
}</
<
println(text,": ",wordBreak(text,"a bc abc cd b"))
}</
{{out}}
<pre>
|