# Edit Distance / Levenshtein Distance

Mar 13, 2022·

Tag: Leetcode Hard

# Edit Distance

Edit distance between two words is the minimum number of operations , i.e character insertions, deletions or substitutions, to transform one word to another word. This operation has cost associated with it.

# Levenshtein Distance

Levenshtein Distance is edit distance with `cost = 1` for the operation , it comes from the family of distance metrics.

# Definition

## Example

• Edit Distance between EXECUTE and EXPIRE is 4.
• EXECUTE → EXPCUTE → EXPIUTE → EXPRTE→ EXPIRE
• Better representation using "gap representation"
• `E X E C U T E`
• `E X P I R _ E`
• `0 0 1 1 1 1 0`
• Sum the Cost = 4
• " empty space `_` " to denote insertion or deletion.

## Recurrence Relation

``````ED(s1,s2,i,j) where i - s1[0...m-1] , j - s2[0...n-1]

Insertion: Recur for i and j+1
Deletion: Recur for i+1 and j
Substitution: Recur for i+1 and j+1
``````

# Algorithm

## Brute Force Approach

Brute Force approach is to enumerate all possible matches for edit operations until we find match between two strings. While comparing the characters, when the character match we do not have to perform any operation , so the cost becomes 0. If they do not match, then we check for possibilities of each insertion ,deletion and substitution operation, with cost 1.

### Code

``````    public int minDistance(String word1, String word2) {
return ED(word1,word2,0,0);
}

private int ED(String s1, String s2, int i, int j){
if(i==s1.length())
return s2.length()-j; // remaining size difference

if(j==s2.length())
return s1.length()-i; // remaining size difference

if(s1.charAt(i) == s2.charAt(j))
return ED(s1,s2,i+1,j+1); //cost = 0
else
return Math.min(Math.min(
ED(s1,s2,i,j+1), //insertion
ED(s1,s2,i+1,j)), //deletion
ED(s1,s2,i+1,j+1)) //substitution
+ 1; //cost = 1 for Levenshtein
}
``````

### Visualize Recursion Tree

Notice :

• When there is a Prefix Match between two strings, the tree progresses only in one direction. e.g. `ED(0,0) and ED(1,1)`.
• Otherwise three children (one for each operation) are generated. e.g `ED(2,2)`.
• Also, notice repeating sub-problems e.g. `ED(3,3)`.

During recursion, we are trying to breakdown our original problem into sub-problems. We keep recursing until we hit empty string. This hints to be a dynamic programming problem.

At a high level, dynamic programming approaches includes below, and we will walk through them in the upcoming blogs for this series.

Let us first find out the complexity of the code above.

### Complexity Analysis

• Time complexity:

• at least : O(3 min(m,n) )
• at worst case: O(3n) , occurs when m=n
• Space complexity:

• O(n)

# Applications of Edit Distance

• diff (Unix)
• stemming (NLP)
• spelling correction
• DNA sequence

# UpNext

The complexity for the brute force approach came up to being exponential. How did we come up with the analysis? Stay tune for the next blog to understand the complexity analysis over a recursion tree.

Below is a Map of things we have written so far. Also, we would like to hang with people who are doing these algorithms. Join us at Discord or follow us over Twitter/Instagram.