Jaro-Winkler distance

The Jaro-Winkler distance is a metric for measuring the edit distance between words. It is similar to the more basic Levenstein distance but the Jaro distance also accounts for transpositions between letters in the words. With the Winkler modification to the Jaro metric, the Jaro-Winkler distance also adds an increase in similarity for words which start with the same letters (prefix).

The Jaro-Winkler distance is a modification of the Jaro similarity metric, which measures the similarity between two strings. The Jaro similarity is 1.0 when strings are identical and 0 when strings have no letters in common. Distance measures such as the Jaro distance or Jaro-Winkler distance, on the other hand, are 0 when strings are identical and 1 when they have no letters in common.

The Jaro similarity between two strings s1 and s2, sim_j, is defined as

sim_j = 0 if m is 0.

sim_j = ( (m / length of s1) + (m / length of s2) + (m - t) / m ) / 3 otherwise.

Where:

$m$ is the number of matching characters (the same character within max(|s1|, |s2|)/2 - 1 of one another);
$t$ is half the number of transpositions (a shared character placed in different positions).

The Winkler modification to Jaro is to check for identical prefixes of the strings.

If we define the number of initial (prefix) characters in common as:

l = the length of a common prefix between strings, up to 4 characters

and, additionally, select a multiplier (Winkler suggested 0.1) for the relative importance of the prefix for the word similarity:

p = 0.1

The Jaro-Winkler similarity can then be defined as

sim_w = sim_j + lp(1 - sim_j)

Where:

sim_j is the Jaro similarity.
l is the number of matching characters at the beginning of the strings, up to 4.
p is a factor to modify the amount to which the prefix similarity affects the metric.

Winkler suggested this be 0.1.

The Jaro-Winkler distance between strings, which is 0.0 for identical strings, is then defined as

d_w = 1 - sim_w

String metrics such as Jaro-Winkler distance are useful in applications such as spelling checkers, because letter transpositions are common typing errors and humans tend to misspell the middle portions of words more often than their beginnings. This may help a spelling checker program to generate better alternatives for misspelled word replacement.

The task

Using a dictionary of your choice and the following list of 9 commonly misspelled words:

"accomodate", "definately", "goverment", "occured", "publically", "recieve", "seperate", "untill", "wich"

Calculate the Jaro-Winkler distance between the misspelled word and words in the dictionary.

Use this distance to list close alternatives (at least two per word) to the misspelled words.

Show the calculated distances between the misspelled words and their potential replacements.

See also

Wikipedia page: Jaro–Winkler distance.
Comparing string similarity algorithms. Comparison of algorithms on Medium

11l

Translation of: Python

<lang 11l>V WORDS = File(‘linuxwords.txt’).read_lines() V MISSPELLINGS = [‘accomodate’,

                 ‘definately’,
                 ‘goverment’]

F jaro_winkler_distance(=st1, =st2)

  I st1.len < st2.len
     (st1, st2) = (st2, st1)
  V len1 = st1.len
  V len2 = st2.len
  I len2 == 0
     R 0.0
  V delta = max(0, len2 I/ 2 - 1)
  V flag = (0 .< len2).map(_ -> 0B)
  [Char] ch1_match
  L(ch1) st1
     V idx1 = L.index
     L(ch2) st2
        V idx2 = L.index
        I idx2 <= idx1 + delta & idx2 >= idx1 - delta & ch1 == ch2 & !(flag[idx2])
           flag[idx2] = 1B
           ch1_match.append(ch1)
           L.break
  V matches = ch1_match.len
  I matches == 0
     R 1.0
  V transpositions = 0
  V idx1 = 0
  L(ch2) st2
     V idx2 = L.index
     I flag[idx2]
        transpositions += (ch2 != ch1_match[idx1])
        idx1++
  V jaro = (Float(matches) / len1 + Float(matches) / len2 + (matches - transpositions / 2) / matches) / 3.0
  V commonprefix = 0
  L(i) 0 .< min(4, len2)
     commonprefix += (st1[i] == st2[i])
  R 1.0 - (jaro + commonprefix * 0.1 * (1 - jaro))

F within_distance(maxdistance, stri, maxtoreturn)

  V arr = :WORDS.filter(w -> jaro_winkler_distance(@stri, w) <= @maxdistance)
  arr.sort(key' x -> jaro_winkler_distance(@stri, x))
  R I arr.len <= maxtoreturn {arr} E arr[0 .< maxtoreturn]

L(STR) MISSPELLINGS

  print("\nClose dictionary words ( distance < 0.15 using Jaro-Winkler distance) to \" "STR" \" are:\n        Word   | Distance")
  L(w) within_distance(0.15, STR, 5)
     print(‘#14 | #.4’.format(w, jaro_winkler_distance(STR, w)))</lang>

Output:


Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " accomodate " are:
        Word   | Distance
   accommodate | 0.0182
  accommodated | 0.0333
  accommodates | 0.0333
 accommodating | 0.0815
 accommodation | 0.0815

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " definately " are:
        Word   | Distance
    definitely | 0.0400
     defiantly | 0.0422
        define | 0.0800
      definite | 0.0850
     definable | 0.0872

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " goverment " are:
        Word   | Distance
    government | 0.0533
        govern | 0.0667
   governments | 0.0697
      movement | 0.0810
  governmental | 0.0833

Elm

Author: zh5 <lang Elm>module JaroWinkler exposing (similarity)

commonPrefixLength : List a -> List a -> Int -> Int commonPrefixLength xs ys counter =

   case ( xs, ys ) of
       ( x :: xs_, y :: ys_ ) ->
           if x == y then
               commonPrefixLength xs_ ys_ (counter + 1)

           else
               counter

       _ ->
           counter

similarity : String -> String -> Float similarity s1 s2 =

   let
       chars1 =
           String.toList s1

       chars2 =
           String.toList s2

       jaroScore =
           jaro chars1 chars2

       l =
           toFloat <| min (commonPrefixLength chars1 chars2 0) 4

       p =
           0.1
   in
   jaroScore + (l * p * (1.0 - jaroScore))

containtsInNextN : Int -> a -> List a -> Bool containtsInNextN i a items =

   case ( i, items ) of
       ( 0, _ ) ->
           False

       ( _, [] ) ->
           False

       ( _, item :: rest ) ->
           if item == a then
               True

           else
               containtsInNextN (i - 1) a rest

exists : Int -> Int -> List a -> a -> Bool exists startAt endAt items i =

   if endAt < startAt then
       False

   else if startAt == 0 then
       case items of
           first :: rest ->
               if i == first then
                   True

               else
                   exists 0 (endAt - 1) rest i

           [] ->
               False

   else
       exists 0 (endAt - startAt) (List.drop startAt items) i

existsInWindow : a -> List a -> Int -> Int -> Bool existsInWindow item items offset radius =

   let
       startAt =
           max 0 (offset - radius)

       endAt =
           min (offset + radius) (List.length items - 1)
   in
   exists startAt endAt items item

transpositions : List a -> List a -> Int -> Int transpositions xs ys counter =

   case ( xs, ys ) of
       ( [], _ ) ->
           counter

       ( _, [] ) ->
           counter

       ( x :: xs_, y :: ys_ ) ->
           if x /= y then
               transpositions xs_ ys_ (counter + 1)

           else
               transpositions xs_ ys_ counter

commonItems : List a -> List a -> Int -> List a commonItems items1 items2 radius =

   items1
       |> List.indexedMap
           (\index item ->
               if existsInWindow item items2 index radius then
                   [ item ]

               else
                   []
           )
       |> List.concat

jaro : List Char -> List Char -> Float jaro chars1 chars2 =

   let
       minLenth =
           min (List.length chars1) (List.length chars2)

       matchRadius =
           minLenth // 2 + (minLenth |> modBy 2)

       c1 =
           commonItems chars1 chars2 matchRadius

       c2 =
           commonItems chars2 chars1 matchRadius

       c1length =
           toFloat (List.length c1)

       c2length =
           toFloat (List.length c2)

       mismatches =
           transpositions c1 c2 0

       transpositionScore =
           (toFloat mismatches + abs (c1length - c2length)) / 2.0

       s1length =
           toFloat (List.length chars1)

       s2length =
           toFloat (List.length chars2)

       tLength =
           max c1length c2length

       result =
           (c1length / s1length + c2length / s2length + (tLength - transpositionScore) / tLength) / 3.0
   in
   if isNaN result then
       0.0

   else
       result

</lang>

ALGOL 68

Works with: ALGOL 68G version Any - tested with release 2.8.3.win32

Translation of: Wren

- the actual distance routines are translated from the Wren sample, the file reading and asociative arrays etc. are based on similar Algol 68 task solutions.

Uses unixdict.txt - possibly a different version to those used by some other solutions, as this finds a slightly different list of matches for "seperate" (assuming I got the translation correct!).
Prints the 6 closest matches regarddless of their distance (i.e. we don't restrict it to matches closer that 0.15). <lang algol68>PROC jaro sim = ( STRING sp1, sp2 )REAL:

    IF   STRING s1 = sp1[ AT 0 ];
         STRING s2 = sp2[ AT 0 ];
         INT le1   = ( UPB s1 - LWB s1 ) + 1;
         INT le2   = ( UPB s2 - LWB s2 ) + 1;
         le1 < 1 AND le2 < 1
    THEN # both strings are empty #         1
    ELIF le1 < 1 OR  le2 < 1
    THEN # one of the strings is empty #    0
    ELSE # both strings are non-empty #
       INT dist := IF le2 > le1 THEN le2 ELSE le1 FI;
       dist OVERAB 2 -:= 1;
       [ 0 : le1 ]BOOL matches1; FOR i FROM LWB matches1 TO UPB matches1 DO matches1[ i ] := FALSE OD;
       [ 0 : le2 ]BOOL matches2; FOR i FROM LWB matches2 TO UPB matches2 DO matches2[ i ] := FALSE OD;
       INT matches  := 0;
       INT transpos := 0;
       FOR i FROM LWB s1 TO UPB s1 DO
           INT start := i - dist;
           IF  start < 0 THEN start := 0 FI;
           INT end   := i + dist + 1;
           IF  end > le2 THEN end := le2 FI;
           FOR k FROM start TO end - 1
           WHILE IF matches2[ k ]
                 THEN TRUE
                 ELIF s1[ i ] /= s2[ k ]
                 THEN TRUE
                 ELSE
                     matches2[ k ] := matches1[ i ] := TRUE;
                     matches +:= 1;
                     FALSE
                 FI
           DO SKIP OD
       OD;
       IF matches = 0
       THEN 0
       ELSE
           INT k := 0;
           FOR i FROM LWB s1 TO UPB s1 DO
               IF matches1[ i ] THEN
                   WHILE NOT matches2[ k ] DO k +:= 1 OD;
                   IF s1[ i ] /= s2[ k ] THEN transpos +:= 1 FI;
                   k +:= 1
               FI
           OD;
           transpos OVERAB 2;
           ( ( matches / le1 )
           + ( matches / le2 )
           + ( ( matches - transpos ) / matches )
           ) / 3
       FI
    FI # jaro sim # ;

PROC jaro winkler dist = ( STRING sp, tp )REAL:

    BEGIN
       STRING s  = sp[ AT 0 ];
       STRING t  = tp[ AT 0 ];
       INT  ls = ( UPB s - LWB s ) + 1;
       INT  lt = ( UPB t - LWB t ) + 1;
       INT  l max := IF ls < lt THEN ls ELSE lt FI;
       IF   l max > 4 THEN l max := 4 FI;
       INT  l := 0;
       FOR  i FROM 0 TO l max - 1 DO IF s[ i ] = t[ i ] THEN l +:= 1 FI OD;
       REAL js = jaro sim( s, t );
       REAL p  = 0.1;
       REAL ws = js + ( l * p * ( 1 - js ) );
       1 - ws
    END # jaro winkler dist # ;

include the Associative Array code #

PR read "aArray.a68" PR

test cases #

[]STRING misspelt = ( "accomodate", "definately", "goverment", "occured", "publically", "recieve", "seperate", "untill", "wich" ); IF FILE input file;

   STRING file name = "unixdict.txt";
   open( input file, file name, stand in channel ) /= 0

THEN

   # failed to open the file #
   print( ( "Unable to open """ + file name + """", newline ) )

ELSE

   # file opened OK #
   BOOL at eof := FALSE;
   # set the EOF handler for the file #
   on logical file end( input file, ( REF FILE f )BOOL:
                                    BEGIN
                                        # note that we reached EOF on the #
                                        # latest read #
                                        at eof := TRUE;
                                        # return TRUE so processing can continue #
                                        TRUE
                                    END
                      );
   REF AARRAY words := INIT LOC AARRAY;
   STRING word;
   WHILE NOT at eof
   DO
       STRING word;
       get( input file, ( word, newline ) );
       words // word := ""
   OD;
   # close the file #
   close( input file );
   # look for near matches to the misspelt words #
   INT max closest = 6; # max number of closest matches to show #
   FOR m pos FROM LWB misspelt TO UPB misspelt DO
       [ max closest ]STRING closest word;
       [ max closest ]REAL   closest jwd;
       FOR i TO max closest DO closest word[ i ] := ""; closest jwd[ i ] := 999 999 999 OD; 
       REF AAELEMENT e := FIRST words;
       WHILE e ISNT nil element DO
           STRING word = key OF e;
           REAL jwd = jaro winkler dist( misspelt[ m pos ], word );
           BOOL found better match := FALSE;
           FOR i TO max closest WHILE NOT found better match DO
               IF jwd <= closest jwd[ i ] THEN
                   # found a new closer match #
                   found better match := TRUE;
                   # shuffle the others down 1 and insert the new match #
                   FOR j FROM max closest BY - 1 TO i + 1 DO
                       closest word[ j ] := closest word[ j - 1 ];
                       closest jwd[  j ] := closest jwd[  j - 1 ]
                   OD;
                   closest word[ i ] := word;
                   closest jwd[  i ] := jwd
               FI
           OD;
           e := NEXT words
       OD;
       print( ( "Misspelt word: ", misspelt[ m pos ], ":", newline ) );
       FOR i TO max closest DO
           print( ( fixed( closest jwd[ i ], -8, 4 ), " ", closest word[ i ], newline ) )
       OD;
       print( ( newline ) )
   OD

FI</lang>

Output:

Misspelt word: accomodate:
  0.0182 accommodate
  0.1044 accordant
  0.1136 accolade
  0.1219 acclimate
  0.1327 accompanist
  0.1333 accost

Misspelt word: definately:
  0.0800 define
  0.0850 definite
  0.0886 defiant
  0.1200 definitive
  0.1219 designate
  0.1267 deflate

Misspelt word: goverment:
  0.0667 govern
  0.1167 governor
  0.1175 governess
  0.1330 governance
  0.1361 coverlet
  0.1367 sovereignty

Misspelt word: occured:
  0.0250 occurred
  0.0571 occur
  0.0952 occurrent
  0.1056 occlude
  0.1217 concurred
  0.1429 cure

Misspelt word: publically:
  0.0800 public
  0.1325 pullback
  0.1327 publication
  0.1400 pull
  0.1556 pulley
  0.1571 publish

Misspelt word: recieve:
  0.0333 receive
  0.0667 relieve
  0.0762 reeve
  0.0852 recessive
  0.0852 receptive
  0.0905 recipe

Misspelt word: seperate:
  0.0708 desperate
  0.0917 separate
  0.1042 temperate
  0.1048 repartee
  0.1167 sewerage
  0.1167 selenate

Misspelt word: untill:
  0.0333 until
  0.1111 till
  0.1333 huntsville
  0.1357 instill
  0.1422 unital
  0.1511 unilateral

Misspelt word: wich:
  0.0533 witch
  0.0533 winch
  0.0600 which
  0.0857 wichita
  0.1111 twitch
  0.1111 switch

C++

Translation of: Swift

<lang cpp>#include <algorithm>

include <cstdlib>
include <fstream>
include <iomanip>
include <iostream>
include <string>
include <vector>

auto load_dictionary(const std::string& path) {

   std::ifstream in(path);
   if (!in)
       throw std::runtime_error("Cannot open file " + path);
   std::string line;
   std::vector<std::string> words;
   while (getline(in, line))
       words.push_back(line);
   return words;

}

double jaro_winkler_distance(std::string str1, std::string str2) {

   size_t len1 = str1.size();
   size_t len2 = str2.size();
   if (len1 < len2) {
       std::swap(str1, str2);
       std::swap(len1, len2);
   }
   if (len2 == 0)
       return len1 == 0 ? 0.0 : 1.0;
   size_t delta = std::max(size_t(1), len1/2) - 1;
   std::vector<bool> flag(len2, false);
   std::vector<char> ch1_match;
   ch1_match.reserve(len1);
   for (size_t idx1 = 0; idx1 < len1; ++idx1) {
       char ch1 = str1[idx1];
       for (size_t idx2 = 0; idx2 < len2; ++idx2) {
           char ch2 = str2[idx2];
           if (idx2 <= idx1 + delta && idx2 + delta >= idx1
               && ch1 == ch2 && !flag[idx2]) {
               flag[idx2] = true;
               ch1_match.push_back(ch1);
               break;
           }
       }
   }
   size_t matches = ch1_match.size();
   if (matches == 0)
       return 1.0;
   size_t transpositions = 0;
   for (size_t idx1 = 0, idx2 = 0; idx2 < len2; ++idx2) {
       if (flag[idx2]) {
           if (str2[idx2] != ch1_match[idx1])
               ++transpositions;
           ++idx1;
       }
   }
   double m = matches;
   double jaro = (m/len1 + m/len2 + (m - transpositions/2.0)/m)/3.0;
   size_t common_prefix = 0;
   len2 = std::min(size_t(4), len2);
   for (size_t i = 0; i < len2; ++i) {
       if (str1[i] == str2[i])
           ++common_prefix;
   }
   return 1.0 - (jaro + common_prefix * 0.1 * (1.0 - jaro));

}

auto within_distance(const std::vector<std::string>& words,

                    double max_distance, const std::string& str,
                    size_t max_to_return) {
   using pair = std::pair<std::string, double>;
   std::vector<pair> result;
   for (const auto& word : words) {
       double jaro = jaro_winkler_distance(word, str);
       if (jaro <= max_distance)
           result.emplace_back(word, jaro);
   }
   std::stable_sort(result.begin(), result.end(),
       [](const pair& p1, const pair& p2) { return p1.second < p2.second; });
   if (result.size() > max_to_return)
       result.resize(max_to_return);
   return result;

}

int main() {

   try {
       auto words(load_dictionary("linuxwords.txt"));
       std::cout << std::fixed << std::setprecision(4);
       for (auto str : {"accomodate", "definately", "goverment",
                           "occured", "publically", "recieve",
                           "seperate", "untill", "wich"}) {
           std::cout << "Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to '"
               << str << "' are:\n        Word   |  Distance\n";
           for (const auto& pair : within_distance(words, 0.15, str, 5)) {
               std::cout << std::setw(14) << pair.first << " | "
                   << std::setw(6) << pair.second << '\n';
           }
           std::cout << '\n';
       }
   } catch (const std::exception& ex) {
       std::cerr << ex.what() << '\n';
       return EXIT_FAILURE;
   }
   return EXIT_SUCCESS;

}</lang>

Output:

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'accomodate' are:
        Word   |  Distance
   accommodate | 0.0182
  accommodated | 0.0333
  accommodates | 0.0333
 accommodating | 0.0815
 accommodation | 0.0815

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'definately' are:
        Word   |  Distance
    definitely | 0.0400
     defiantly | 0.0422
        define | 0.0800
      definite | 0.0850
     definable | 0.0872

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'goverment' are:
        Word   |  Distance
    government | 0.0533
        govern | 0.0667
   governments | 0.0697
      movement | 0.0810
  governmental | 0.0833

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'occured' are:
        Word   |  Distance
      occurred | 0.0250
         occur | 0.0571
      occupied | 0.0786
        occurs | 0.0905
      accursed | 0.0917

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'publically' are:
        Word   |  Distance
      publicly | 0.0400
        public | 0.0800
     publicity | 0.1044
   publication | 0.1327
    biblically | 0.1400

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'recieve' are:
        Word   |  Distance
       receive | 0.0333
      received | 0.0625
      receiver | 0.0625
      receives | 0.0625
       relieve | 0.0667

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'seperate' are:
        Word   |  Distance
     desperate | 0.0708
      separate | 0.0917
     temperate | 0.1042
     separated | 0.1144
     separates | 0.1144

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'untill' are:
        Word   |  Distance
         until | 0.0333
         untie | 0.1067
      untimely | 0.1083
          till | 0.1111
      Antilles | 0.1264

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'wich' are:
        Word   |  Distance
         witch | 0.0533
         which | 0.0600
        switch | 0.1111
        twitch | 0.1111
       witches | 0.1143

F#

This task uses Jaro Distance (F#) <lang fsharp> // Calculate Jaro-Winkler Similarity of 2 Strings. Nigel Galloway: August 7th., 2020 let Jw P n g=let L=float(let i=Seq.map2(fun n g->n=g) n g in (if Seq.length i>4 then i|>Seq.take 4 else i)|>Seq.takeWhile id|>Seq.length)

            let J=J n g in J+P*L*(1.0-J)

let dict=System.IO.File.ReadAllLines("linuxwords.txt") let fN g=let N=Jw 0.1 g in dict|>Array.map(fun n->(n,1.0-(N n)))|>Array.sortBy snd ["accomodate";"definately";"goverment";"occured";"publically";"recieve";"seperate";"untill";"wich"]|>

 List.iter(fun n->printfn "%s" n;fN n|>Array.take 5|>Array.iter(fun n->printf "%A" n);printfn "\n")

</lang>

Output:

accomodate
("accommodate", 0.01818181818)("accommodated", 0.03333333333)("accommodates", 0.03333333333)("accommodation", 0.08153846154)("accommodating", 0.08153846154)

definately
("definitely", 0.04)("defiantly", 0.04222222222)("define", 0.08)("definite", 0.085)("definable", 0.08722222222)

goverment
("government", 0.05333333333)("govern", 0.06666666667)("governments", 0.0696969697)("governmental", 0.08333333333)("governs", 0.09523809524)

occured
("occurred", 0.025)("occur", 0.05714285714)("occupied", 0.07857142857)("occurs", 0.09047619048)("cured", 0.09523809524)

publically
("publicly", 0.04)("public", 0.08)("publicity", 0.1044444444)("publication", 0.1327272727)("politically", 0.1418181818)

recieve
("receive", 0.03333333333)("received", 0.0625)("receives", 0.0625)("receiver", 0.0625)("relieve", 0.07619047619)

seperate
("desperate", 0.0787037037)("separate", 0.09166666667)("separated", 0.1143518519)("separates", 0.1143518519)("temperate", 0.1157407407)

untill
("until", 0.03333333333)("untie", 0.1066666667)("untimely", 0.1083333333)("till", 0.1111111111)("Huntsville", 0.1333333333)

wich
("witch", 0.05333333333)("which", 0.06)("switch", 0.1111111111)("twitch", 0.1111111111)("witches", 0.1142857143)

Go

This uses unixdict and borrows code from the Jaro_distance#Go task. Otherwise it is a translation of the Wren entry. <lang go>package main

import (

   "bytes"
   "fmt"
   "io/ioutil"
   "log"
   "sort"

)

func jaroSim(str1, str2 string) float64 {

   if len(str1) == 0 && len(str2) == 0 {
       return 1
   }
   if len(str1) == 0 || len(str2) == 0 {
       return 0
   }
   match_distance := len(str1)
   if len(str2) > match_distance {
       match_distance = len(str2)
   }
   match_distance = match_distance/2 - 1
   str1_matches := make([]bool, len(str1))
   str2_matches := make([]bool, len(str2))
   matches := 0.
   transpositions := 0.
   for i := range str1 {
       start := i - match_distance
       if start < 0 {
           start = 0
       }
       end := i + match_distance + 1
       if end > len(str2) {
           end = len(str2)
       }
       for k := start; k < end; k++ {
           if str2_matches[k] {
               continue
           }
           if str1[i] != str2[k] {
               continue
           }
           str1_matches[i] = true
           str2_matches[k] = true
           matches++
           break
       }
   }
   if matches == 0 {
       return 0
   }
   k := 0
   for i := range str1 {
       if !str1_matches[i] {
           continue
       }
       for !str2_matches[k] {
           k++
       }
       if str1[i] != str2[k] {
           transpositions++
       }
       k++
   }
   transpositions /= 2
   return (matches/float64(len(str1)) +
       matches/float64(len(str2)) +
       (matches-transpositions)/matches) / 3

}

func jaroWinklerDist(s, t string) float64 {

   ls := len(s)
   lt := len(t)
   lmax := lt
   if ls < lt {
       lmax = ls
   }
   if lmax > 4 {
       lmax = 4
   }
   l := 0
   for i := 0; i < lmax; i++ {
       if s[i] == t[i] {
           l++
       }
   }
   js := jaroSim(s, t)
   p := 0.1
   ws := js + float64(l)*p*(1-js)
   return 1 - ws

}

type wd struct {

   word string
   dist float64

}

func main() {

   misspelt := []string{
       "accomodate", "definately", "goverment", "occured", "publically",
       "recieve", "seperate", "untill", "wich",
   }
   b, err := ioutil.ReadFile("unixdict.txt")
   if err != nil {
       log.Fatal("Error reading file")
   }
   words := bytes.Fields(b)
   for _, ms := range misspelt {
       var closest []wd
       for _, w := range words {
           word := string(w)
           if word == "" {
               continue
           }
           jwd := jaroWinklerDist(ms, word)
           if jwd < 0.15 {
               closest = append(closest, wd{word, jwd})
           }
       }
       fmt.Println("Misspelt word:", ms, ":")
       sort.Slice(closest, func(i, j int) bool { return closest[i].dist < closest[j].dist })
       for i, c := range closest {
           fmt.Printf("%0.4f %s\n", c.dist, c.word)
           if i == 5 {
               break
           }
       }
       fmt.Println()
   }

}</lang>

Output:

Misspelt word: accomodate :
0.0182 accommodate
0.1044 accordant
0.1136 accolade
0.1219 acclimate
0.1327 accompanist
0.1333 accord

Misspelt word: definately :
0.0800 define
0.0850 definite
0.0886 defiant
0.1200 definitive
0.1219 designate
0.1267 deflate

Misspelt word: goverment :
0.0667 govern
0.1167 governor
0.1175 governess
0.1330 governance
0.1361 coverlet
0.1367 sovereignty

Misspelt word: occured :
0.0250 occurred
0.0571 occur
0.0952 occurrent
0.1056 occlude
0.1217 concurred
0.1429 cure

Misspelt word: publically :
0.0800 public
0.1327 publication
0.1400 pull
0.1492 pullback

Misspelt word: recieve :
0.0333 receive
0.0667 relieve
0.0762 reeve
0.0852 receptive
0.0852 recessive
0.0905 recife

Misspelt word: seperate :
0.0708 desperate
0.0917 separate
0.1042 temperate
0.1167 selenate
0.1167 sewerage
0.1167 sept

Misspelt word: untill :
0.0333 until
0.1111 till
0.1333 huntsville
0.1357 instill
0.1422 unital

Misspelt word: wich :
0.0533 winch
0.0533 witch
0.0600 which
0.0857 wichita
0.1111 switch
0.1111 twitch

Java

Translation of: C++

<lang java>import java.io.*; import java.util.*;

public class JaroWinkler {

   public static void main(String[] args) {
       try {
           List<String> words = loadDictionary("linuxwords.txt");
           String[] strings = {
               "accomodate", "definately", "goverment", "occured",
               "publically", "recieve", "seperate", "untill", "wich"
           };
           for (String string : strings) {
               System.out.printf("Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to '%s' are:\n"
                                   + "        Word   |  Distance\n", string);
               for (StringDistance s : withinDistance(words, 0.15, string, 5)) {
                   System.out.printf("%14s | %.4f\n", s.word, s.distance);
               }
               System.out.println();
           }
       } catch (Exception e) {
           e.printStackTrace();
       }
   }

   private static class StringDistance implements Comparable<StringDistance> {
       private StringDistance(String word, double distance) {
           this.word = word;
           this.distance = distance;
       }
       public int compareTo(StringDistance s) {
           return Double.compare(distance, s.distance);
       }
       private String word;
       private double distance;
   }

   private static List<StringDistance> withinDistance(List<String> words,
                       double maxDistance, String string, int max) {
       List<StringDistance> result = new ArrayList<>();
       for (String word : words) {
           double distance = jaroWinklerDistance(word, string);
           if (distance <= maxDistance)
               result.add(new StringDistance(word, distance));
       }
       Collections.sort(result);
       if (result.size() > max)
           result = result.subList(0, max);
       return result;
   }

   private static double jaroWinklerDistance(String string1, String string2) {
       int len1 = string1.length();
       int len2 = string2.length();
       if (len1 < len2) {
           String s = string1;
           string1 = string2;
           string2 = s;
           int tmp = len1;
           len1 = len2;
           len2 = tmp;
       }
       if (len2 == 0)
           return len1 == 0 ? 0.0 : 1.0;
       int delta = Math.max(1, len1 / 2) - 1;
       boolean[] flag = new boolean[len2];
       Arrays.fill(flag, false);
       char[] ch1Match = new char[len1];
       int matches = 0;
       for (int i = 0; i < len1; ++i) {
           char ch1 = string1.charAt(i);
           for (int j = 0; j < len2; ++j) {
               char ch2 = string2.charAt(j);
               if (j <= i + delta && j + delta >= i && ch1 == ch2 && !flag[j]) {
                   flag[j] = true;
                   ch1Match[matches++] = ch1;
                   break;
               }
           }
       }
       if (matches == 0)
           return 1.0;
       int transpositions = 0;
       for (int i = 0, j = 0; j < len2; ++j) {
           if (flag[j]) {
               if (string2.charAt(j) != ch1Match[i])
                   ++transpositions;
               ++i;
           }
       }
       double m = matches;
       double jaro = (m / len1 + m / len2 + (m - transpositions / 2.0) / m) / 3.0;
       int commonPrefix = 0;
       len2 = Math.min(4, len2);
       for (int i = 0; i < len2; ++i) {
           if (string1.charAt(i) == string2.charAt(i))
               ++commonPrefix;
       }
       return 1.0 - (jaro + commonPrefix * 0.1 * (1.0 - jaro));
   }

   private static List<String> loadDictionary(String path) throws IOException {
       try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
           List<String> words = new ArrayList<>();
           String word;
           while ((word = reader.readLine()) != null)
               words.add(word);
           return words;
       }
   }

}</lang>

Output:

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'accomodate' are:
        Word   |  Distance
   accommodate | 0.0182
  accommodated | 0.0333
  accommodates | 0.0333
 accommodating | 0.0815
 accommodation | 0.0815

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'definately' are:
        Word   |  Distance
    definitely | 0.0400
     defiantly | 0.0422
        define | 0.0800
      definite | 0.0850
     definable | 0.0872

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'goverment' are:
        Word   |  Distance
    government | 0.0533
        govern | 0.0667
   governments | 0.0697
      movement | 0.0810
  governmental | 0.0833

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'occured' are:
        Word   |  Distance
      occurred | 0.0250
         occur | 0.0571
      occupied | 0.0786
        occurs | 0.0905
      accursed | 0.0917

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'publically' are:
        Word   |  Distance
      publicly | 0.0400
        public | 0.0800
     publicity | 0.1044
   publication | 0.1327
    biblically | 0.1400

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'recieve' are:
        Word   |  Distance
       receive | 0.0333
      received | 0.0625
      receiver | 0.0625
      receives | 0.0625
       relieve | 0.0667

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'seperate' are:
        Word   |  Distance
     desperate | 0.0708
      separate | 0.0917
     temperate | 0.1042
     separated | 0.1144
     separates | 0.1144

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'untill' are:
        Word   |  Distance
         until | 0.0333
         untie | 0.1067
      untimely | 0.1083
          till | 0.1111
      Antilles | 0.1264

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'wich' are:
        Word   |  Distance
         witch | 0.0533
         which | 0.0600
        switch | 0.1111
        twitch | 0.1111
       witches | 0.1143

jq

Works with: jq

Works with gojq, the Go implementation of jq

This entry, which uses unixdict.txt, borrows the implementation in jq of the Jaro similarity measure as defined at Jaro_similarity#jq; since it is quite long, it is not repeated here. <lang jq># See Jaro_similarity#jq for the implementation of jaro/2

def length_of_common_prefix($s1; $s2):

 if ($s1|length) > ($s2|length) then length_of_common_prefix($s2; $s1)
 else ($s1|explode) as $x1
 | ($s2|explode) as $x2
 | first( range(0;$x1|length) | select( $x1[.] != $x2[.] )) // ($x1|length)
 end;

Output: the Jaro-WInkler distance using 0.1 as the common-prefix multiplier

def jaro_winkler($s1; $s2):

 if $s1 == $s2 then 0
 else jaro($s1; $s2) as $j
 | length_of_common_prefix($s1[:4]; $s2[:4]) as $l
 | 1 - ($j + 0.1 * $l * (1 - $j))
 end ;

Input: an array of words
Output: [[match, distance] ...]

def candidates($word; $threshold):

 map(jaro_winkler($word; . ) as $x | select($x <= $threshold) | [., $x] );

def lpad($len): tostring | ($len - length) as $l | (" " * $l)[:$l] + .;

def task:

 [inputs] # the dictionary
 | ("accomodate", "definately", "goverment", "occured", "publically", "recieve", "seperate", "untill", "wich") as $word
 | candidates($word; 0.15) | sort_by(.[-1]) | .[:5]
 | "Matches for \($word|lpad(10)): Distance",
   (.[] | "\(.[0] | lpad(21)) : \(.[-1] * 1000 | round / 1000)") ;

task</lang>

Output:

Invocation: jq -rRn -f program.jq unixdict.txt

Matches for accomodate: Distance
          accommodate : 0.018
            accordant : 0.104
             accolade : 0.114
            acclimate : 0.122
          accompanist : 0.133
Matches for definately: Distance
               define : 0.08
             definite : 0.085
              defiant : 0.089
           definitive : 0.12
              deflate : 0.127
Matches for goverment: Distance
               govern : 0.08
             governor : 0.13
            governess : 0.133
           governance : 0.149
Matches for    occured: Distance
             occurred : 0.025
                occur : 0.057
            occurrent : 0.095
              occlude : 0.106
            concurred : 0.122
Matches for publically: Distance
               public : 0.08
          publication : 0.133
Matches for   recieve: Distance
              receive : 0.063
                reeve : 0.1
              relieve : 0.105
               recife : 0.108
               recipe : 0.108
Matches for   seperate: Distance
            desperate : 0.079
             separate : 0.092
            temperate : 0.116
                 sept : 0.117
              septate : 0.131
Matches for     untill: Distance
                until : 0.033
                 till : 0.111
           huntsville : 0.133
               unital : 0.142
Matches for      wich: Distance
                winch : 0.107
                witch : 0.107
                which : 0.12
              wichita : 0.126

Julia

<lang julia># download("http://users.cs.duke.edu/~ola/ap/linuxwords", "linuxwords.txt") const words = read("linuxwords.txt", String) |> split .|> strip

function jarowinklerdistance(s1, s2)

   if length(s1) < length(s2)
       s1, s2 = s2, s1
   end
   len1, len2 = length(s1), length(s2)
   len2 == 0 && return 0.0
   delta = max(0, len2 ÷ 2 - 1)
   flag = zeros(Bool, len2)  # flags for possible transpositions, begin as false
   ch1_match = eltype(s1)[]
   for (i, ch1) in enumerate(s1)
       for (j, ch2) in enumerate(s2)
           if (j <= i + delta) && (j >= i - delta) && (ch1 == ch2) && !flag[j]
               flag[j] = true
               push!(ch1_match, ch1)
               break
           end
       end
   end
   matches = length(ch1_match)
   matches == 0 && return 1.0
   transpositions, i = 0, 0
   for (j, ch2) in enumerate(s2)
       if flag[j]
           i += 1
           transpositions += (ch2 != ch1_match[i])
       end
   end
   jaro = (matches / len1 + matches / len2 + (matches - transpositions/2) / matches) / 3.0
   commonprefix = count(i -> s1[i] == s2[i], 1:min(len2, 4))
   return 1 - (jaro + commonprefix * 0.1 * (1 - jaro))

end

function closewords(s, maxdistance, maxtoreturn)

   jw = 0.0
   arr = [(w, jw) for w in words if (jw = jarowinklerdistance(s, w)) <= maxdistance]
   sort!(arr, lt=(x, y) -> x[2] < y[2])
   return length(arr) <= maxtoreturn ? arr : arr[1:maxtoreturn]

end

for s in ["accomodate", "definately", "goverment", "occured", "publically",

   "recieve", "seperate", "untill", "wich"]
   println("\nClose dictionary words ( distance < 0.15 using Jaro-Winkler distance) to '$s' are:")
   println("    Word      |  Distance")
   for (w, jw) in closewords(s, 0.15, 5)
       println(rpad(w, 14), "| ", Float16(jw))
   end

end

</lang>

Output:

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'accomodate' are:
    Word      |  Distance
accommodate   | 0.01819
accommodated  | 0.03333
accommodates  | 0.03333
accommodating | 0.08154
accommodation | 0.08154

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'definately' are:
    Word      |  Distance
definitely    | 0.04
defiantly     | 0.04224
define        | 0.08
definite      | 0.085
definable     | 0.0872

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'goverment' are:
    Word      |  Distance
government    | 0.05334
govern        | 0.06665
governments   | 0.0697
movement      | 0.081
governmental  | 0.0833

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'occured' are:
    Word      |  Distance
occurred      | 0.025
occur         | 0.05713
occupied      | 0.07855
occurs        | 0.09045
accursed      | 0.0917

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'publically' are:
    Word      |  Distance
publicly      | 0.04
public        | 0.08
publicity     | 0.10443
publication   | 0.1327
biblically    | 0.14

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'recieve' are:
    Word      |  Distance
receive       | 0.03333
received      | 0.0625
receiver      | 0.0625
receives      | 0.0625
relieve       | 0.06665

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'seperate' are:
    Word      |  Distance
desperate     | 0.07086
separate      | 0.0917
temperate     | 0.1042
separated     | 0.1144
separates     | 0.1144

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'untill' are:
    Word      |  Distance
until         | 0.03333
untie         | 0.1067
untimely      | 0.10834
Antilles      | 0.1263
untidy        | 0.1333

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'wich' are:
    Word      |  Distance
witch         | 0.05334
which         | 0.06
witches       | 0.11426
rich          | 0.11664
wick          | 0.11664

Nim

Translation of: Go

<lang Nim>import lenientops

func jaroSim(s1, s2: string): float =

 if s1.len == 0 and s2.len == 0: return 1
 if s1.len == 0 or s2.len == 0: return 0

 let matchDistance = max(s1.len, s2.len) div 2 - 1
 var s1Matches = newSeq[bool](s1.len)
 var s2Matches = newSeq[bool](s2.len)
 var matches = 0
 for i in 0..s1.high:
   for j in max(0, i - matchDistance)..min(i + matchDistance, s2.high):
     if not s2Matches[j] and s1[i] == s2[j]:
       s1Matches[i] = true
       s2Matches[j] = true
       inc matches
       break
 if matches == 0: return 0

 var transpositions = 0.0
 var k = 0
 for i in ..s1.high:
   if not s1Matches[i]: continue
   while not s2Matches[k]: inc k
   if s1[i] != s2[k]: transpositions += 0.5
   inc k

 result = (matches / s1.len + matches / s2.len + (matches - transpositions) / matches) / 3

func jaroWinklerDist(s, t: string): float =

 let ls = s.len
 let lt = t.len
 var lmax = if ls < lt: ls else: lt
 if lmax > 4: lmax = 4
 var l = 0
 for i in 0..<lmax:
   if s[i] == t[i]: inc l
 let js = jaroSim(s, t)
 let p = 0.1
 let ws = js + float(l) * p * (1 - js)
 result = 1 - ws

when isMainModule:

 import algorithm, sequtils, strformat

 type Wd = tuple[word: string; dist: float]

 func `<`(w1, w2: Wd): bool =
   if w1.dist < w2.dist: true
   elif w1.dist == w2.dist: w1.word < w2.word
   else: false

 const Misspelt = ["accomodate", "definately", "goverment", "occured",
                   "publically", "recieve", "seperate", "untill", "wich"]

 let words = toSeq("unixdict.txt".lines)
 for ms in Misspelt:
   var closest: seq[Wd]
   for word in words:
     if word.len == 0: continue
     let jwd = jaroWinklerDist(ms, word)
     if jwd < 0.15:
       closest.add (word, jwd)
   echo "Misspelt word: ", ms, ":"
   closest.sort()
   for i, c in closest:
     echo &"{c.dist:0.4f} {c.word}"
     if i == 5: break
   echo()</lang>

Output:

Misspelt word: accomodate:
0.0182 accommodate
0.1044 accordant
0.1136 accolade
0.1219 acclimate
0.1327 accompanist
0.1333 accord

Misspelt word: definately:
0.0800 define
0.0850 definite
0.0886 defiant
0.1200 definitive
0.1219 designate
0.1267 deflate

Misspelt word: goverment:
0.0667 govern
0.1167 governor
0.1175 governess
0.1330 governance
0.1361 coverlet
0.1367 sovereignty

Misspelt word: occured:
0.0250 occurred
0.0571 occur
0.0952 occurrent
0.1056 occlude
0.1217 concurred
0.1429 cure

Misspelt word: publically:
0.0800 public
0.1327 publication
0.1400 pull
0.1492 pullback

Misspelt word: recieve:
0.0333 receive
0.0667 relieve
0.0762 reeve
0.0852 receptive
0.0852 recessive
0.0905 recife

Misspelt word: seperate:
0.0708 desperate
0.0917 separate
0.1042 temperate
0.1167 selenate
0.1167 sept
0.1167 sewerage

Misspelt word: untill:
0.0333 until
0.1111 till
0.1333 huntsville
0.1357 instill
0.1422 unital

Misspelt word: wich:
0.0533 winch
0.0533 witch
0.0600 which
0.0857 wichita
0.1111 switch
0.1111 twitch

Perl

<lang perl>use strict; use warnings; use List::Util qw(min max head);

sub jaro_winkler {

   my($s, $t) = @_;
   my(@s_matches, @t_matches, $matches);

   return 0 if $s eq $t;

   my $s_len = length $s; my @s = split //, $s;
   my $t_len = length $t; my @t = split //, $t;

   my $match_distance = int (max($s_len,$t_len)/2) - 1;

   for my $i (0 .. $#s) {
       my $start = max(0, $i - $match_distance);
       my $end   = min($i + $match_distance, $t_len - 1);
       for my $j ($start .. $end) {
           next if $t_matches[$j] or $s[$i] ne $t[$j];
           ($s_matches[$i], $t_matches[$j]) = (1, 1);
           $matches++ and last;
       }
   }
   return 1 unless $matches;

   my($k, $transpositions) = (0, 0);

   for my $i (0 .. $#s) {
       next unless $s_matches[$i];
       $k++ until  $t_matches[$k];
       $transpositions++ if $s[$i] ne $t[$k];
       $k++;
   }

   my $prefix = 0;
   $s[$_] eq $t[$_] and ++$prefix for 0 .. -1 + min 5, $s_len, $t_len;

   my $jaro = ($matches / $s_len + $matches / $t_len +
       (($matches - $transpositions / 2) / $matches)) / 3;

   1 - ($jaro + $prefix * .1 * ( 1 - $jaro) )

}

my @words = split /\n/, `cat ./unixdict.txt`;

for my $word (<accomodate definately goverment occured publically recieve seperate untill wich>) {

   my %J;
   $J{$_} = jaro_winkler($word, $_) for @words;
   print "\nClosest 5 dictionary words with a Jaro-Winkler distance < .15 from '$word':\n";
   printf "%15s : %0.4f\n", $_, $J{$_}
        for head 5, sort { $J{$a} <=> $J{$b} or $a cmp $b } grep { $J{$_} < 0.15 } keys %J;

}</lang>

Output:

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'accomodate':
    accommodate : 0.0152
      accordant : 0.1044
    accompanist : 0.1106
       accolade : 0.1136
  accompaniment : 0.1183

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'definately':
         define : 0.0667
       definite : 0.0708
        defiant : 0.0886
     definitive : 0.1000
      designate : 0.1219

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'goverment':
         govern : 0.0556
       governor : 0.0972
      governess : 0.0979
     governance : 0.1108
       coverlet : 0.1167

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'occured':
       occurred : 0.0208
          occur : 0.0476
      occurrent : 0.0794
        occlude : 0.1056
      occurring : 0.1217

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'publically':
         public : 0.0667
    publication : 0.1106
        publish : 0.1310
           pull : 0.1400
       pullback : 0.1492

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'recieve':
        receive : 0.0333
        relieve : 0.0571
          reeve : 0.0667
      receptive : 0.0852
      recessive : 0.0852

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'seperate':
      desperate : 0.0708
       separate : 0.0786
       sewerage : 0.1000
       repartee : 0.1083
       repeater : 0.1083

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'untill':
          until : 0.0278
           till : 0.1111
           tilt : 0.1111
     huntsville : 0.1333
        instill : 0.1357

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'wich':
          winch : 0.0533
          witch : 0.0533
          which : 0.0600
        wichita : 0.0857
         switch : 0.1111

Phix

Uses jaro() from Jaro_distance#Phix (reproduced below for your convenience) and the standard unix_dict()

function jaro(string str1, str2)
    str1 = trim(upper(str1))
    str2 = trim(upper(str2))
    integer len1 = length(str1),
            len2 = length(str2),
            match_distance = floor(max(len1,len2)/2)-1,
            match_count = 0,
            half_transposed = 0
 
    if len1==0 then return len2==0 end if
 
    -- count the number of matches
    sequence m1 = repeat(false,len1),
             m2 = repeat(false,len2)
    for i=1 to len1 do
        for k=max(1,i-match_distance)
           to min(len2,i+match_distance) do
            if not m2[k] then
                if str1[i]=str2[k] then
                    m1[i] = true
                    m2[k] = true
                    match_count += 1
                    exit
                end if
            end if
        end for
    end for
 
    if match_count==0 then return 0 end if
 
    -- count the number of half-transpositions
    integer k = 1
    for i=1 to len1 do
        if m1[i] then
            while not m2[k] do k += 1 end while
            half_transposed += (str1[i]!=str2[k])
            k += 1
        end if
    end for
    integer transpositions = floor(half_transposed/2),
            not_transposed = match_count - transpositions
    --
    -- return the average of:
    --   percentage/fraction of the first string matched,
    --   percentage/fraction of the second string matched, and
    --   percentage/fraction of matches that were not transposed.
    --
    return (match_count/len1 + 
            match_count/len2 + 
            not_transposed/match_count)/3
end function

with javascript_semantics
function jaroWinklerDist(string s, t)
    integer lm = min({length(s),length(t),4}),
            l = sum(sq_eq(s[1..lm],t[1..lm]))
    return (1-jaro(s, t))*(1-l*0.1)
end function
 
constant mispelt = {"accomodate", "definately", "goverment", "occured", 
                    "publically", "recieve", "seperate", "untill", "wich"},
         words = unix_dict()
sequence jwds = repeat(0,length(words))
for m=1 to length(mispelt) do
    string ms = mispelt[m]
    printf(1,"\nMisspelt word: %s :\n", ms)
    for w=1 to length(words) do
        jwds[w] = jaroWinklerDist(ms,words[w])
    end for
    sequence tags = custom_sort(jwds,tagset(length(words)))
    for j=1 to 6 do
        integer tj = tags[j]
--      if jwds[tj]>0.15 then exit end if
        printf(1,"%0.4f %s\n", {jwds[tj], words[tj]})
    end for
end for

Output identical to ~~Go/Wren~~ Algol68

Python

<lang python>""" Test Jaro-Winkler distance metric. linuxwords.txt is from http://users.cs.duke.edu/~ola/ap/linuxwords """

WORDS = [s.strip() for s in open("linuxwords.txt").read().split()] MISSPELLINGS = [

   "accomodate",
   "definately",
   "goverment",
   "occured",
   "publically",
   "recieve",
   "seperate",
   "untill",
   "wich",

]

def jaro_winkler_distance(st1, st2):

   """
   Compute Jaro-Winkler distance between two strings.
   """
   if len(st1) < len(st2):
       st1, st2 = st2, st1
   len1, len2 = len(st1), len(st2)
   if len2 == 0:
       return 0.0
   delta = max(0, len2 // 2 - 1)
   flag = [False for _ in range(len2)]  # flags for possible transpositions
   ch1_match = []
   for idx1, ch1 in enumerate(st1):
       for idx2, ch2 in enumerate(st2):
           if idx2 <= idx1 + delta and idx2 >= idx1 - delta and ch1 == ch2 and not flag[idx2]:
               flag[idx2] = True
               ch1_match.append(ch1)
               break

   matches = len(ch1_match)
   if matches == 0:
       return 1.0
   transpositions, idx1 = 0, 0
   for idx2, ch2 in enumerate(st2):
       if flag[idx2]:
           transpositions += (ch2 != ch1_match[idx1])
           idx1 += 1

   jaro = (matches / len1 + matches / len2 + (matches - transpositions/2) / matches) / 3.0
   commonprefix = 0
   for i in range(min(4, len2)):
       commonprefix += (st1[i] == st2[i])

   return 1.0 - (jaro + commonprefix * 0.1 * (1 - jaro))

def within_distance(maxdistance, stri, maxtoreturn):

   """
   Find words in WORDS of closeness to stri within maxdistance, return up to maxreturn of them.
   """
   arr = [w for w in WORDS if jaro_winkler_distance(stri, w) <= maxdistance]
   arr.sort(key=lambda x: jaro_winkler_distance(stri, x))
   return arr if len(arr) <= maxtoreturn else arr[:maxtoreturn]

for STR in MISSPELLINGS:

   print('\nClose dictionary words ( distance < 0.15 using Jaro-Winkler distance) to "',
         STR, '" are:\n        Word   |  Distance')
   for w in within_distance(0.15, STR, 5):
       print('{:>14} | {:6.4f}'.format(w, jaro_winkler_distance(STR, w)))

</lang>

Output:

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " accomodate " are:
        Word   |  Distance
   accommodate | 0.0182
  accommodated | 0.0333
  accommodates | 0.0333
 accommodating | 0.0815
 accommodation | 0.0815

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " definately " are:
        Word   |  Distance
    definitely | 0.0400
     defiantly | 0.0422
        define | 0.0800
      definite | 0.0850
     definable | 0.0872

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " goverment " are:
        Word   |  Distance
    government | 0.0533
        govern | 0.0667
   governments | 0.0697
      movement | 0.0810
  governmental | 0.0833

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " occured " are:
        Word   |  Distance
      occurred | 0.0250
         occur | 0.0571
      occupied | 0.0786
        occurs | 0.0905
      accursed | 0.0917

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " publically " are:
        Word   |  Distance
      publicly | 0.0400
        public | 0.0800
     publicity | 0.1044
   publication | 0.1327
    biblically | 0.1400

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " recieve " are:
        Word   |  Distance
       receive | 0.0333
      received | 0.0625
      receiver | 0.0625
      receives | 0.0625
       relieve | 0.0667

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " seperate " are:
        Word   |  Distance
     desperate | 0.0708
      separate | 0.0917
     temperate | 0.1042
     separated | 0.1144
     separates | 0.1144

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " untill " are:
        Word   |  Distance
         until | 0.0333
         untie | 0.1067
      untimely | 0.1083
      Antilles | 0.1264
        untidy | 0.1333

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " wich " are:
        Word   |  Distance
         witch | 0.0533
         which | 0.0600
       witches | 0.1143
          rich | 0.1167
          wick | 0.1167

Raku

Works with: Rakudo version 2020.07

A minor modification of the Jaro distance task entry.

using the unixdict.txt file from www.puzzlers.org

<lang perl6>sub jaro-winkler ($s, $t) {

   return 0 if $s eq $t;

   my $s_len = + my @s = $s.comb;
   my $t_len = + my @t = $t.comb;

   my $match_distance = ($s_len max $t_len) div 2 - 1;

   my @s_matches;
   my @t_matches;
   my $matches = 0;

   for ^@s -> $i {

       my $start = 0 max $i - $match_distance;
       my $end = $i + $match_distance min ($t_len - 1);

       for $start .. $end -> $j {
           @t_matches[$j] and next;
           @s[$i] eq @t[$j] or next;
           @s_matches[$i] = 1;
           @t_matches[$j] = 1;
           $matches++;
           last;
       }
   }

   return 1 if $matches == 0;

   my $k              = 0;
   my $transpositions = 0;

   for ^@s -> $i {
       @s_matches[$i] or next;
       until @t_matches[$k] { ++$k }
       @s[$i] eq @t[$k] or ++$transpositions;
       ++$k;
   }

   my $prefix = 0;

   ++$prefix if @s[$_] eq @t[$_] for ^(min 4, $s_len, $t_len);

   my $jaro = ($matches / $s_len + $matches / $t_len +
       (($matches - $transpositions / 2) / $matches)) / 3;

   1 - ($jaro + $prefix * .1 * ( 1 - $jaro) )

}

my @words = './unixdict.txt'.IO.slurp.words;

for <accomodate definately goverment occured publically recieve seperate untill wich>

  -> $word {

  my %result = @words.race.map: { $_ => jaro-winkler($word, $_) };

  say "\nClosest 5 dictionary words with a Jaro-Winkler distance < .15 from $word:";

  printf "%15s : %0.4f\n", .key, .value for %result.grep({ .value < .15 }).sort({+.value, ~.key}).head(5);

}</lang>

Output:

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from accomodate:
    accommodate : 0.0182
      accordant : 0.1044
       accolade : 0.1136
      acclimate : 0.1219
    accompanist : 0.1327

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from definately:
         define : 0.0800
       definite : 0.0850
        defiant : 0.0886
     definitive : 0.1200
      designate : 0.1219

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from goverment:
         govern : 0.0667
       governor : 0.1167
      governess : 0.1175
     governance : 0.1330
       coverlet : 0.1361

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from occured:
       occurred : 0.0250
          occur : 0.0571
      occurrent : 0.0952
        occlude : 0.1056
      concurred : 0.1217

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from publically:
         public : 0.0800
    publication : 0.1327
           pull : 0.1400
       pullback : 0.1492

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from recieve:
        receive : 0.0333
        relieve : 0.0667
          reeve : 0.0762
      receptive : 0.0852
      recessive : 0.0852

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from seperate:
      desperate : 0.0708
       separate : 0.0917
      temperate : 0.1042
       selenate : 0.1167
           sept : 0.1167

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from untill:
          until : 0.0333
           till : 0.1111
     huntsville : 0.1333
        instill : 0.1357
         unital : 0.1422

Closest 5 dictionary words with a Jaro-Winkler distance < .15 from wich:
          winch : 0.0533
          witch : 0.0533
          which : 0.0600
        wichita : 0.0857
         switch : 0.1111

Rust

Translation of: Python

<lang rust>use std::fs::File; use std::io::{self, BufRead};

fn load_dictionary(filename: &str) -> std::io::Result<Vec<String>> {

   let file = File::open(filename)?;
   let mut dict = Vec::new();
   for line in io::BufReader::new(file).lines() {
       dict.push(line?);
   }
   Ok(dict)

}

fn jaro_winkler_distance(string1: &str, string2: &str) -> f64 {

   let mut st1 = string1;
   let mut st2 = string2;
   let mut len1 = st1.chars().count();
   let mut len2 = st2.chars().count();
   if len1 < len2 {
       std::mem::swap(&mut st1, &mut st2);
       std::mem::swap(&mut len1, &mut len2);
   }
   if len2 == 0 {
       return if len1 == 0 { 0.0 } else { 1.0 };
   }
   let delta = std::cmp::max(1, len1 / 2) - 1;
   let mut flag = vec![false; len2];
   let mut ch1_match = vec![];
   for (idx1, ch1) in st1.chars().enumerate() {
       for (idx2, ch2) in st2.chars().enumerate() {
           if idx2 <= idx1 + delta && idx2 + delta >= idx1 && ch1 == ch2 && !flag[idx2] {
               flag[idx2] = true;
               ch1_match.push(ch1);
               break;
           }
       }
   }
   let matches = ch1_match.len();
   if matches == 0 {
       return 1.0;
   }
   let mut transpositions = 0;
   let mut idx1 = 0;
   for (idx2, ch2) in st2.chars().enumerate() {
       if flag[idx2] {
           transpositions += (ch2 != ch1_match[idx1]) as i32;
           idx1 += 1;
       }
   }
   let m = matches as f64;
   let jaro =
       (m / (len1 as f64) + m / (len2 as f64) + (m - (transpositions as f64) / 2.0) / m) / 3.0;
   let mut commonprefix = 0;
   for (c1, c2) in st1.chars().zip(st2.chars()).take(std::cmp::min(4, len2)) {
       commonprefix += (c1 == c2) as i32;
   }
   1.0 - (jaro + commonprefix as f64 * 0.1 * (1.0 - jaro))

}

fn within_distance<'a>(

   dict: &'a Vec<String>,
   max_distance: f64,
   stri: &str,
   max_to_return: usize,

) -> Vec<(&'a String, f64)> {

   let mut arr: Vec<(&String, f64)> = dict
       .iter()
       .map(|w| (w, jaro_winkler_distance(stri, w)))
       .filter(|x| x.1 <= max_distance)
       .collect();
   // The trait std::cmp::Ord is not implemented for f64, otherwise
   // we could just do this:
   // arr.sort_by_key(|x| x.1);
   let compare_distance = |d1, d2| {
       use std::cmp::Ordering;
       if d1 < d2 {
           Ordering::Less
       } else if d1 > d2 {
           Ordering::Greater
       } else {
           Ordering::Equal
       }
   };
   arr.sort_by(|x, y| compare_distance(x.1, y.1));
   arr[0..std::cmp::min(max_to_return, arr.len())].to_vec()

}

fn main() {

   match load_dictionary("linuxwords.txt") {
       Ok(dict) => {
           for word in &[
               "accomodate",
               "definately",
               "goverment",
               "occured",
               "publically",
               "recieve",
               "seperate",
               "untill",
               "wich",
           ] {
               println!("Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to '{}' are:", word);
               println!("        Word   |  Distance");
               for (w, dist) in within_distance(&dict, 0.15, word, 5) {
                   println!("{:>14} | {:6.4}", w, dist)
               }
               println!();
           }
       }
       Err(error) => eprintln!("{}", error),
   }

}</lang>

Output:

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'accomodate' are:
        Word   |  Distance
   accommodate | 0.0182
  accommodated | 0.0333
  accommodates | 0.0333
 accommodating | 0.0815
 accommodation | 0.0815

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'definately' are:
        Word   |  Distance
    definitely | 0.0400
     defiantly | 0.0422
        define | 0.0800
      definite | 0.0850
     definable | 0.0872

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'goverment' are:
        Word   |  Distance
    government | 0.0533
        govern | 0.0667
   governments | 0.0697
      movement | 0.0810
  governmental | 0.0833

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'occured' are:
        Word   |  Distance
      occurred | 0.0250
         occur | 0.0571
      occupied | 0.0786
        occurs | 0.0905
      accursed | 0.0917

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'publically' are:
        Word   |  Distance
      publicly | 0.0400
        public | 0.0800
     publicity | 0.1044
   publication | 0.1327
    biblically | 0.1400

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'recieve' are:
        Word   |  Distance
       receive | 0.0333
      received | 0.0625
      receiver | 0.0625
      receives | 0.0625
       relieve | 0.0667

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'seperate' are:
        Word   |  Distance
     desperate | 0.0708
      separate | 0.0917
     temperate | 0.1042
     separated | 0.1144
     separates | 0.1144

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'untill' are:
        Word   |  Distance
         until | 0.0333
         untie | 0.1067
      untimely | 0.1083
          till | 0.1111
      Antilles | 0.1264

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'wich' are:
        Word   |  Distance
         witch | 0.0533
         which | 0.0600
        switch | 0.1111
        twitch | 0.1111
       witches | 0.1143

Swift

Translation of: Rust

<lang swift>import Foundation

func loadDictionary(_ path: String) throws -> [String] {

   let contents = try String(contentsOfFile: path, encoding: String.Encoding.ascii)
   return contents.components(separatedBy: "\n")

}

func jaroWinklerDistance(string1: String, string2: String) -> Double {

   var st1 = Array(string1)
   var st2 = Array(string2)
   var len1 = st1.count
   var len2 = st2.count
   if len1 < len2 {
       swap(&st1, &st2)
       swap(&len1, &len2)
   }
   if len2 == 0 {
       return len1 == 0 ? 0.0 : 1.0
   }
   let delta = max(1, len1 / 2) - 1
   var flag = Array(repeating: false, count: len2)
   var ch1Match: [Character] = []
   ch1Match.reserveCapacity(len1)
   for idx1 in 0..<len1 {
       let ch1 = st1[idx1]
       for idx2 in 0..<len2 {
           let ch2 = st2[idx2]
           if idx2 <= idx1 + delta && idx2 + delta >= idx1 && ch1 == ch2 && !flag[idx2] {
               flag[idx2] = true
               ch1Match.append(ch1)
               break
           }
       }
   }
   let matches = ch1Match.count
   if matches == 0 {
       return 1.0
   }
   var transpositions = 0
   var idx1 = 0
   for idx2 in 0..<len2 {
       if flag[idx2] {
           if st2[idx2] != ch1Match[idx1] {
               transpositions += 1
           }
           idx1 += 1
       }
   }
   let m = Double(matches)
   let jaro =
       (m / Double(len1) + m / Double(len2) + (m - Double(transpositions) / 2.0) / m) / 3.0
   var commonPrefix = 0
   for i in 0..<min(4, len2) {
       if st1[i] == st2[i] {
           commonPrefix += 1
       }
   }
   return 1.0 - (jaro + Double(commonPrefix) * 0.1 * (1.0 - jaro))

}

func withinDistance(words: [String], maxDistance: Double, string: String,

                   maxToReturn: Int) -> [(String, Double)] {
   var arr = Array(words.map{($0, jaroWinklerDistance(string1: string, string2: $0))}
       .filter{$0.1 <= maxDistance})
   arr.sort(by: { x, y in return x.1 < y.1 })
   return Array(arr[0..<min(maxToReturn, arr.count)])

}

func pad(string: String, width: Int) -> String {

   if string.count >= width {
       return string
   }
   return String(repeating: " ", count: width - string.count) + string

}

do {

   let dict = try loadDictionary("linuxwords.txt")
   for word in ["accomodate", "definately", "goverment", "occured",
                "publically", "recieve", "seperate", "untill", "wich"] {
       print("Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to '\(word)' are:")
       print("        Word   |  Distance")
       for (w, dist) in withinDistance(words: dict, maxDistance: 0.15,
                                       string: word, maxToReturn: 5) {
           print("\(pad(string: w, width: 14)) | \(String(format: "%6.4f", dist))")
       }
       print()
   }

} catch {

   print(error.localizedDescription)

}</lang>

Output:

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'accomodate' are:
        Word   |  Distance
   accommodate | 0.0182
  accommodated | 0.0333
  accommodates | 0.0333
 accommodating | 0.0815
 accommodation | 0.0815

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'definately' are:
        Word   |  Distance
    definitely | 0.0400
     defiantly | 0.0422
        define | 0.0800
      definite | 0.0850
     definable | 0.0872

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'goverment' are:
        Word   |  Distance
    government | 0.0533
        govern | 0.0667
   governments | 0.0697
      movement | 0.0810
  governmental | 0.0833

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'occured' are:
        Word   |  Distance
      occurred | 0.0250
         occur | 0.0571
      occupied | 0.0786
        occurs | 0.0905
      accursed | 0.0917

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'publically' are:
        Word   |  Distance
      publicly | 0.0400
        public | 0.0800
     publicity | 0.1044
   publication | 0.1327
    biblically | 0.1400

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'recieve' are:
        Word   |  Distance
       receive | 0.0333
      received | 0.0625
      receiver | 0.0625
      receives | 0.0625
       relieve | 0.0667

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'seperate' are:
        Word   |  Distance
     desperate | 0.0708
      separate | 0.0917
     temperate | 0.1042
     separated | 0.1144
     separates | 0.1144

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'untill' are:
        Word   |  Distance
         until | 0.0333
         untie | 0.1067
      untimely | 0.1083
          till | 0.1111
      Antilles | 0.1264

Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'wich' are:
        Word   |  Distance
         witch | 0.0533
         which | 0.0600
        switch | 0.1111
        twitch | 0.1111
       witches | 0.1143

Wren

Library: Wren-fmt

Library: Wren-sort

This uses unixdict and borrows code from the Jaro_distance#Wren task. <lang ecmascript>import "io" for File import "/fmt" for Fmt import "/sort" for Sort

var jaroSim = Fn.new { |s1, s2|

   var le1 = s1.count
   var le2 = s2.count
   if (le1 == 0 && le2 == 0) return 1
   if (le1 == 0 || le2 == 0) return 0
   var dist = (le2 > le1) ? le2 : le1
   dist = (dist/2).floor - 1
   var matches1 = List.filled(le1, false)
   var matches2 = List.filled(le2, false)
   var matches = 0
   var transpos = 0
   for (i in 0...s1.count) {
       var start = i - dist
       if (start < 0) start = 0
       var end = i + dist + 1
       if (end > le2) end = le2
       var k = start
       while (k < end) {
           if (!(matches2[k] || s1[i] != s2[k])) {
               matches1[i] = true
               matches2[k] = true
               matches = matches + 1
               break
           }
           k = k + 1
       }
   }
   if (matches == 0) return 0
   var k = 0
   for (i in 0...s1.count) {
       if (matches1[i]) {
           while(!matches2[k]) k = k + 1
           if (s1[i] != s2[k]) transpos = transpos + 1
           k = k + 1
       }
   }
   transpos = transpos / 2
   return (matches/le1 + matches/le2 + (matches - transpos)/matches) / 3

}

var jaroWinklerDist = Fn.new { |s, t|

   var ls = s.count
   var lt = t.count
   var lmax = (ls < lt) ? ls : lt
   if (lmax > 4) lmax = 4
   var l = 0
   for (i in 0...lmax) {
       if (s[i] == t[i]) l = l + 1
   }
   var js = jaroSim.call(s, t)
   var p = 0.1
   var ws = js + l*p*(1 - js)
   return 1 - ws

}

var misspelt = ["accomodate", "definately", "goverment", "occured", "publically", "recieve", "seperate", "untill", "wich"] var words = File.read("unixdict.txt").split("\n").map { |w| w.trim() }.where { |w| w != "" } for (ms in misspelt) {

   var closest = []
   for (word in words) {
      var jwd = jaroWinklerDist.call(ms, word)
      if (jwd < 0.15) closest.add([word, jwd])
   }
   System.print("Misspelt word: %(ms):")
   var cmp = Fn.new { |n1, n2| (n1[1]-n2[1]).sign }
   Sort.insertion(closest, cmp)
   for (c in closest.take(6)) Fmt.print("$0.4f $s", c[1], c[0])
   System.print()

}</lang>

Output:

Misspelt word: accomodate:
0.0182 accommodate
0.1044 accordant
0.1136 accolade
0.1219 acclimate
0.1327 accompanist
0.1333 accord

Misspelt word: definately:
0.0800 define
0.0850 definite
0.0886 defiant
0.1200 definitive
0.1219 designate
0.1267 deflate

Misspelt word: goverment:
0.0667 govern
0.1167 governor
0.1175 governess
0.1330 governance
0.1361 coverlet
0.1367 sovereignty

Misspelt word: occured:
0.0250 occurred
0.0571 occur
0.0952 occurrent
0.1056 occlude
0.1217 concurred
0.1429 cure

Misspelt word: publically:
0.0800 public
0.1327 publication
0.1400 pull
0.1492 pullback

Misspelt word: recieve:
0.0333 receive
0.0667 relieve
0.0762 reeve
0.0852 receptive
0.0852 recessive
0.0905 recife

Misspelt word: seperate:
0.0708 desperate
0.0917 separate
0.1042 temperate
0.1167 selenate
0.1167 sept
0.1167 sewerage

Misspelt word: untill:
0.0333 until
0.1111 till
0.1333 huntsville
0.1357 instill
0.1422 unital

Misspelt word: wich:
0.0533 winch
0.0533 witch
0.0600 which
0.0857 wichita
0.1111 switch
0.1111 twitch