The edit distance of two strings commonly refers to the Levenshtein Distance of two strings- the amount of work it takes to transform one string to the other by either inserting, deleting, or changing one character at a time.
For example, changing sitting -> kitten takes 3
moves: sitting -> kitting, kitting -> kittin, and finally kittin -> kitten.
The algorithm to solve this in reasonable time takes dynamic programming, and has many applications such as word checking, and typing speed tests. Algorithm below:
public static int editDistance(String s, String t){
int m=s.length();
int n=t.length();
int[][]d=new int[m+1][n+1];
for(int i=0;i<=m;i++){
d[i][0]=i;
}
for(int j=0;j<=n;j++){
d[0][j]=j;
}
for(int j=1;j<=n;j++){
for(int i=1;i<=m;i++){
if(s.charAt(i-1)==t.charAt(j-1)){
d[i][j]=d[i-1][j-1];
}
else{
d[i][j]=min((d[i-1][j]+1),(d[i][j-1]+1),(d[i-1][j-1]+1));
}
}
}
return(d[m][n]);
}
public static int min(int a,int b,int c){
return(Math.min(Math.min(a,b),c));
}
It is important to note that there are only three possibilities one string can get to another: either deleting, inserting, or changing.
int m=s.length();
int n=t.length();
int[][]d=new int[m+1][n+1];
for(int i=0;i<=m;i++){
d[i][0]=i;
}
for(int j=0;j<=n;j++){
d[0][j]=j;
}
for(int j=1;j<=n;j++){
for(int i=1;i<=m;i++){
if(s.charAt(i-1)==t.charAt(j-1)){
d[i][j]=d[i-1][j-1];
}
else{
d[i][j]=min((d[i-1][j]+1),(d[i][j-1]+1),(d[i-1][j-1]+1));
}
}
}
return(d[m][n]);
}
public static int min(int a,int b,int c){
return(Math.min(Math.min(a,b),c));
}
It is important to note that there are only three possibilities one string can get to another: either deleting, inserting, or changing.
Thus, if we let L(i,j) indicate the cost it takes from changing i to j, if s and t are the original strings, the cost is:
Math.min(1+L(i-1,j-1),1+L(i,j-1),1+L(i-1,j)) unless the last characters of the strings are the same, then the minimum is L(i-1,j-1). This solves the problem, but note that it is very slow because of all the recursive calls, which repeat many similiar problems.
Thus, we use dynamic programming: Construct a matrix d[m][n] with sizes s.length() and t.length(). instead of recursive calls, we will use a matrix lookup which will store the smallest current cost to change a smaller i into j.
Thus, with he java algorithm code for edit distance above, the speed is O(mn) where m and n are the sizes of the strings.
0 comments:
Post a Comment