You are given two strings `$S1`

and `$S2`

.

Write a script to find out the minimum operations required to convert `$S1`

into `$S2`

. The operations can be insert, remove or replace a character. Please check out [the] Wikipedia page for more information.

## Overview

What we are asked to do, is to calculate the *Levenshtein distance* between two strings. If we have two strings, `$source`

and `$dest`

, we can recursively define how to go from `$source`

to `$dest`

using inserts, deletes, and replacements. Let `head ($str)`

be the first character of a string `$str`

, and `tail ($str)`

be the string, except the first character. Then:

- If
`$source`

is the empty string, we add each of the characters of`$dest`

. - If
`$dest`

is the empty string, we delete each of the characters of`$source`

. - If
`head ($source)`

equals`head ($dest)`

, then we recurse with`tail ($source)`

and`tail ($dest)`

. - Otherwise, we either:
- add
`head ($dest)`

, and recurse with`$source`

and`tail ($dest)`

. - delete
`head ($dest)`

, and recurse with`tail ($source)`

and`$dest`

. - replace
`head ($source)`

with`head ($tail)`

, and recurse with`tail ($source)`

and`tail ($dest)`

.

- add

Since we just want to number of operations, we can translate this. Let be the *Levenshtein distance* between strings and . Let be the length of the string , the first character of the string and the string minus the first character. Then:

We can rearrange this. Define as:

That is, is `1`

if the first characters differ, and 0 if they are equal. Then:

We can use this to construct a recursive algorithm. However, this would require recursive calls. Instead, we will be using an iterative algorithm, the Wagner Fischer algorithm. This is a more or less straight forward implementation of the above definition — except that it works the other way around. Instead of iterating through the strings, finding the shortest number of edits from the current positions to the end of the strings, it keeps track of the shortest path from the beginning to the current position. Which boils down to the same thing, it’s just finding the shortest number of edits of the reversed strings.

We will be calculating a 2-dimensional array `distance`

. `distance [i] [j]`

will be the minimum number of edits required to change the first `i`

characters of the source string (`s`

) to the first `j`

characters of the destination string (`d`

). We then have, in pseudo-code:

```
for j = 1 .. |d| do
distance [0] [j] = j for j = 1 .. |d|
end
for i = 1 .. |s| do
distance [i] [0] = i for i = 1 .. |s|
done
for i = 1 .. |s| do
for j = 1 .. |d| do
cost = s [i] == d [j] ? 0 : 1
distance [i] [j] = min (distance [i - 1] [j] + 1,
distance [i] [j - 1] + 1,
distance [i - 1] [j - 1] + cost)
done
done
return distance [|s|] [|d|]
```

Note that strings are indexed `1`

-based.

This uses memory. We can improve on this. Note that we are using the *current* and *previous* rows of the array, that is, in the inner loop, we only index by `i`

and `i - 1`

. With a little rearrangement and deletion of rows we no longer use, we can reduce the memory to :

```
for i = 0 .. |s| do
for j = 0 .. |d| do
if i == 0 || j == 0
then
distance [i] [j] = i + j
else
cost = s [i] == d [j] ? 0 : 1
distance [i] [j] = min (distance [i - 1] [j] + 1,
distance [i] [j - 1] + 1,
distance [i - 1] [j - 1] + cost)
end
end
if i > 0
then
delete distance [i]
end
end
return distance [|s|] [|d|]
```

By just using two one dimensional arrays, and swapping the strings if the destination string is larger than the source string, we could reduce the memory usage to , but we decided not to bother.

Implementation of the algorithm above in various languages is now easy.

# Solutions

## Perl

```
sub LevenshteinDistance ($first, $second) {
my $distance;
for (my $i = 0; $i <= length ($first); $i ++) {
for (my $j = 0; $j <= length ($second); $j ++) {
$$distance [$i] [$j] =
$i == 0 || $j == 0 ? $i + $j
: min ($$distance [$i - 1] [$j] + 1,
$$distance [$i] [$j - 1] + 1,
$$distance [$i - 1] [$j - 1] +
(substr ($first, $i - 1, 1) eq
substr ($second, $j - 1, 1) ? 0 : 1))
}
undef $$distance [$i - 1] if $i;
}
$$distance [-1] [-1];
}
```

Find the full program on GitHub.

## AWK

```
{
for (i = 0; i <= length ($1); i ++) {
for (j = 0; j <= length ($2); j ++) {
distance [i, j] = i == 0 || j == 0 ? i + j\
: min(distance [i - 1, j] + 1,
distance [i, j - 1] + 1,
distance [i - 1, j - 1] +\
(substr ($1, i, 1) == substr ($2, j, 1) ? 0 : 1))
}
}
print distance [length ($1), length ($2)]
}
```

Find the full program on GitHub.

## C

```
size_t min3 (size_t a, size_t b, size_t c) {
return a < b ? a < c ? a : c
: b < c ? b : c;
}
size_t LevenshteinDistance (char * first, size_t n,
char * second, size_t m) {
size_t ** distance;
if ((distance = malloc ((n + 1) * sizeof (size_t *))) == NULL) {
fprintf (stderr, "Out of memory\n");
exit (1);
}
for (size_t i = 0; i <= n; i ++) {
if ((distance [i] = malloc ((m + 1) * sizeof (size_t))) == NULL) {
fprintf (stderr, "Out of memory\n");
exit (1);
}
for (size_t j = 0; j <= m; j ++) {
distance [i] [j] = i == 0 || j == 0 ? i + j
: min3 (distance [i - 1] [j] + 1,
distance [i] [j - 1] + 1,
distance [i - 1] [j - 1] +
(first [i - 1] == second [j - 1] ? 0 : 1));
}
if (i) {
free (distance [i - 1]);
}
}
size_t out = distance [n] [m];
free (distance [n]);
free (distance);
return (out);
}
```

Find the full program on GitHub.

## Lua

```
function LevenshteinDistance (first, second)
local n = first : len ()
local m = second : len ()
distance = {}
for i = 0, n do
distance [i] = {}
for j = 0, m do
if i == 0 or j == 0
then
distance [i] [j] = i + j
else
local cost = 1
if string . sub (first, i, i) ==
string . sub (second, j, j)
then
cost = 0
end
distance [i] [j] = math . min (
distance [i - 1] [j] + 1,
distance [i] [j - 1] + 1,
distance [i - 1] [j - 1] + cost
)
end
end
if i > 0
then
distance [i - 1] = nil
end
end
return distance [n] [m]
end
```

Find the full program on GitHub.

## Node.js

```
function LevenshteinDistance (strings) {
let first = strings [0] . split ("");
let second = strings [1] . split ("");
let distance = [];
let N = first . length;
let M = second . length;
for (let i = 0; i <= N; i ++) {
distance [i] = [];
for (let j = 0; j <= M; j ++) {
distance [i] [j] =
i == 0 || j == 0 ? i + j
: Math . min (distance [i - 1] [j] + 1,
distance [i] [j - 1] + 1,
distance [i - 1] [j - 1] +
(first [i - 1] ==
second [j - 1] ? 0 : 1))
}
if (i) {
distance [i - 1] = 0
}
}
return distance [N] [M]
}
```

Find the full program on GitHub.

## Python

```
def LevenshteinDistance (first, second):
n = len (first)
m = len (second)
distance = []
for i in range (n + 1):
distance . append ([])
for j in range (m + 1):
distance [i] . append (i + j if i == 0 or j == 0 else
min (distance [i - 1] [j] + 1,
distance [i] [j - 1] + 1,
distance [i - 1] [j - 1] +
(0 if first [i - 1] ==
second [j - 1] else 1)))
return distance [n] [m]
```

Find the full program on GitHub.

## Ruby

```
def LevenshteinDistance (first, second)
n = first . length
m = second . length
distance = []
for i in 0 .. n do
distance [i] = []
for j in 0 .. m do
distance [i] [j] = i == 0 || j == 0 ? i + j
: [distance [i - 1] [j] + 1,
distance [i] [j - 1] + 1,
distance [i - 1] [j - 1] +
(first [i - 1] == second [j - 1] ? 0 : 1)] . min
end
if i > 1
distance [i - 1] = nil
end
end
return distance [n] [m]
end
```

Find the full program on GitHub.