Tag: Leetcode Hard

# Edit Distance

Edit distance between two words is the minimum number of operations , i.e character insertions, deletions or substitutions, to transform one word to another word. This operation has cost associated with it.

# Levenshtein Distance

Levenshtein Distance is edit distance with `cost = 1`

for the operation , it comes from the family of distance metrics.

# Family of Distance Metrics

# Definition

Levenshtein distance - Wikipedia

## Example

- Edit Distance between EXECUTE and EXPIRE is 4.
- EXECUTE → EXPCUTE → EXPIUTE → EXPRTE→ EXPIRE
- Better representation using "gap representation"
`E X E C U T E`

`E X P I R _ E`

`0 0 1 1 1 1 0`

**Sum the Cost = 4**- " empty space
`_`

" to denote insertion or deletion.

## Recurrence Relation

```
ED(s1,s2,i,j) where i - s1[0...m-1] , j - s2[0...n-1]
Insertion: Recur for i and j+1
Deletion: Recur for i+1 and j
Substitution: Recur for i+1 and j+1
```

# Algorithm

## Brute Force Approach

Brute Force approach is to **enumerate all possible matches** for edit operations until we find match between two strings. While comparing the characters, when the character match we do not have to perform any operation , so the cost becomes 0. If they do not match, then we check for possibilities of each insertion ,deletion and substitution operation, with cost 1.

### Code

```
public int minDistance(String word1, String word2) {
return ED(word1,word2,0,0);
}
private int ED(String s1, String s2, int i, int j){
if(i==s1.length())
return s2.length()-j; // remaining size difference
if(j==s2.length())
return s1.length()-i; // remaining size difference
if(s1.charAt(i) == s2.charAt(j))
return ED(s1,s2,i+1,j+1); //cost = 0
else
return Math.min(Math.min(
ED(s1,s2,i,j+1), //insertion
ED(s1,s2,i+1,j)), //deletion
ED(s1,s2,i+1,j+1)) //substitution
+ 1; //cost = 1 for Levenshtein
}
```

### Visualize Recursion Tree

**Notice** :

- When there is a
**Prefix Match**between two strings, the tree progresses only in one direction. e.g.`ED(0,0) and ED(1,1)`

. - Otherwise three children (one for each operation) are generated. e.g
`ED(2,2)`

. - Also, notice repeating sub-problems e.g.
`ED(3,3)`

.

During recursion, we are trying to breakdown our original problem into sub-problems. We keep recursing until we hit empty string. *This hints to be a dynamic programming problem.*

At a high level, dynamic programming approaches includes below, and we will walk through them in the upcoming blogs for this series.

- Recursion and Memoization
- Tabulation aka Wagner–Fischer Algorithm

Let us first find out the complexity of the code above.

### Complexity Analysis

Time complexity:

- at least : O(3
^{min(m,n)}) - at worst case: O(3
^{n}) , occurs when m=n

- at least : O(3
Space complexity:

- O(n)

# Applications of Edit Distance

- diff (Unix)
- stemming (NLP)
- spelling correction
- DNA sequence

# UpNext

The complexity for the brute force approach came up to being ** exponential**. How did we come up with the analysis?

*Stay tune for the next blog to understand the complexity analysis over a recursion tree.*

Below is a Map of things we have written so far. Also, we would like to hang with people who are doing these algorithms. Join us at Discord or follow us over Twitter/Instagram.