Description
Levenshtein distance (LD) is a measure of the similarity between two strings: it is the minimum number of single-character deletions, insertions, or substitutions needed to transform one string into the other. The smaller the distance, the more similar the strings are, which makes it a useful algorithm for comparing two strings.
Algorithm
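As shown on the algorithm table, build a distance table with n+1 rows and m+1 columns, where n and m are the lengths of the two strings.
Next, initialize the first column to 0..n and the first row to 0..m.
Fill each remaining cell with the minimum of three values: the cell to the left + 1, the cell above + 1, and the cell diagonally above and to the left + cost, where cost is 0 if the two characters being compared are equal and 1 otherwise.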
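The fill rule above can also be written as a recurrence. This is a sketch in the notation of the Java code further down, assuming s has length n, t has length m, and d is the distance table; the symbol names are borrowed from that code and are not part of the algorithm table itself.

```latex
\begin{aligned}
d_{i,0} &= i \quad (0 \le i \le n), \qquad d_{0,j} = j \quad (0 \le j \le m),\\
\mathrm{cost}_{i,j} &=
  \begin{cases}
    0 & \text{if } s_i = t_j,\\
    1 & \text{otherwise,}
  \end{cases}\\
d_{i,j} &= \min\bigl(d_{i-1,j} + 1,\; d_{i,j-1} + 1,\; d_{i-1,j-1} + \mathrm{cost}_{i,j}\bigr)
  \quad (1 \le i \le n,\ 1 \le j \le m),\\
\mathrm{LD}(s,t) &= d_{n,m}.
\end{aligned}
```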
Example (GUMBO & GAMBOL)
Step 7
The final Levenshtein distance is the value in the lower right-hand corner of the matrix, which is 2 in this case.
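To inspect the intermediate cells for this example, the table can be printed as text. The following is a minimal sketch, not the author's code: the class name LevenshteinTable and the helpers build and render are hypothetical names introduced here for illustration, and GUMBO is assumed to run down the rows with GAMBOL across the columns.

```java
public class LevenshteinTable {

    // Build the full (n+1) x (m+1) distance table for s (rows) and t (columns).
    static int[][] build(String s, String t) {
        int n = s.length();
        int m = t.length();
        int[][] d = new int[n + 1][m + 1];

        // First column is 0..n, first row is 0..m.
        for (int i = 0; i <= n; i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j;
        }

        // Each remaining cell is the minimum of left + 1, above + 1,
        // and diagonal + cost (0 when the characters match, 1 otherwise).
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = (s.charAt(i - 1) == t.charAt(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        return d;
    }

    // Render the table as text, one row per line, with character labels.
    static String render(String s, String t) {
        int[][] d = build(s, t);
        // Six spaces: the row-label column plus the column for the empty prefix.
        StringBuilder sb = new StringBuilder("      ");
        for (int j = 0; j < t.length(); j++) {
            sb.append(String.format("%3s", t.charAt(j)));
        }
        sb.append('\n');
        for (int i = 0; i <= s.length(); i++) {
            sb.append(String.format("%3s", i == 0 ? ' ' : s.charAt(i - 1)));
            for (int j = 0; j <= t.length(); j++) {
                sb.append(String.format("%3d", d[i][j]));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("GUMBO", "GAMBOL"));
    }
}
```

Running main prints one row per character of GUMBO; the last number on the last row is the distance, 2, matching the result stated above.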
Java Code
public class Distance {

    //****************************
    // Get minimum of three values
    //****************************
    private int Minimum (int a, int b, int c) {
        int mi;

        mi = a;
        if (b < mi) {
            mi = b;
        }
        if (c < mi) {
            mi = c;
        }
        return mi;
    }

    //*****************************
    // Compute Levenshtein distance
    //*****************************
    public int LD (String s, String t) {
        int d[][]; // matrix
        int n;     // length of s
        int m;     // length of t
        int i;     // iterates through s
        int j;     // iterates through t
        char s_i;  // ith character of s
        char t_j;  // jth character of t
        int cost;  // cost

        // Step 1
        n = s.length ();
        m = t.length ();
        if (n == 0) {
            return m;
        }
        if (m == 0) {
            return n;
        }
        d = new int[n+1][m+1];

        // Step 2
        for (i = 0; i <= n; i++) {
            d[i][0] = i;
        }
        for (j = 0; j <= m; j++) {
            d[0][j] = j;
        }

        // Step 3
        for (i = 1; i <= n; i++) {
            s_i = s.charAt (i - 1);

            // Step 4
            for (j = 1; j <= m; j++) {
                t_j = t.charAt (j - 1);

                // Step 5
                if (s_i == t_j) {
                    cost = 0;
                } else {
                    cost = 1;
                }

                // Step 6
                d[i][j] = Minimum (d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1] + cost);
            }
        }

        // Step 7
        return d[n][m];
    }
}