N-grams

From Rosetta Code
Revision as of 14:21, 21 April 2023 by Proton2 (talk | contribs) (Created task)

An N-gram is a sequence of N contiguous elements of a given text. Although N-grams sometimes refer to words or syllables, this task considers only sequences of characters. Given a text and an integer N, the size of the desired N-grams, find all the distinct contiguous sequences of N characters, together with the number of times each one appears in the text. For example, the 2-grams of the text "Live and let live" (shown in upper case, so that "Li" and "li" are counted together) are:

 "LI" - 2
 "IV" - 2
 "VE" - 2
 " L" - 2
 "E " - 1
 " A" - 1
 "AN" - 1
 "ND" - 1
 "D " - 1
 "LE" - 1
 "ET" - 1
 "T " - 1

Note that spaces and other non-alphanumeric characters are taken into account.
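
One possible way to approach the task is sketched below in Python. It is a minimal illustration, not a reference solution: the function name `ngram_counts` is made up for this example, and folding the text to upper case is an assumption drawn from the sample output above rather than a stated requirement. The sketch slides a window of N characters across the text and tallies each window with `collections.Counter`.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count every contiguous run of n characters in text, ignoring case."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Assumption: case is folded, as the sample output suggests.
    folded = text.upper()
    # Slide a window of width n over the text. Spaces and punctuation are
    # kept, and a text shorter than n simply yields no N-grams.
    return Counter(folded[i:i + n] for i in range(len(folded) - n + 1))


if __name__ == "__main__":
    # Print each 2-gram with its count, most frequent first.
    for gram, count in ngram_counts("Live and let live", 2).most_common():
        print(f'"{gram}" - {count}')
```

A text of length L contains L - N + 1 windows of size N, so the 17-character sample yields 16 2-grams in total, 12 of them distinct.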