Sunday, October 25, 2015

Algorithms and Data Structures : Part 4.1 String : Substring : Subset : Permutation : Combination


A lot of questions comes on string or sets. 

But if we now the basics , then we can solve most of the problems.

For strings, we should know how to find SUBSTRINGS.

For Sets, we should know how to find  SUBSETS, PERMUTATIONS, COMBINATIONS. 


Lets see one problem on  STRING. 

 Given a string, find all the possible palindrome in it. 

Now if you know how you can find all the substring, then you can check each for the palindrome test. 

Now you can find all the substring using 3 loops.


Now we can use this code to work on variety of question, but if someone ask to provide 6th substring in hexa order, then first finding total substring and then sorting them, will consume space and increase complexity.....a better way is SUFFIX TREE.

The funda behind SUFFIX TREE is that , for a given string you find all the suffixes and then arrange them in ascending order.  Now you store this order only.

Ex : For string  "tree", we have

0 1 2 3
t  r e e

suffixes :
0 tree
1 ree
2 ee
3 e

On arranging as per lexicographical order, we have
 3 e
 2 ee
 1 ree
 0 tree

So suffix array will be
0 1 2 3
3 2 1 0

How we read suffix array is like this : Suppose we have to find suffix whose order is 3rd in lexicographic order, in suffix array what index is at 3rd position, it is 0.

No in given string start from 0th index and go till the end of string...we have tree :)


Now suppose you have suffix array and you want to find  5th SUBSTRING.


Now what are possible substring of tree

tree,tre,ree,tr,re,ee,t,r,e,e

Now the suffix we have are   tree,ree,ee,e ,

Now to get substring from suffix array, we find prefixes of each suffix , so

from tree we have     tree,tre,tr,t
from ree                    ree,re,r
from ee                     ee,e
from e                       e

if you see we have all SUBSTRING.

Further if you NOTE , all substring derived from  REE will have order greater than TREE.

So I know that my suffix order is

e      from this I can have 1 substring
ee    from this  2 substring
ree   from this 3 substring
tree  from this 4 substring


Now for the 5th substring we will have it in 1+2+3...so in 3rd suffix substrings

We have to also pay attention to DUPLICATE, for that we have LCS matrix. It value is derived by checking how many characters of two adjacent suffix in suffix array matches from start.

Ex for
               LCS
0  3 e        0
1  2 ee      1          // this means 1 substring will be common in e and ee
2  1 ree     0
3  0 tree    0

Other example  for  "chtch"
                          LCS                         SUBSTRING
0   ch                 0                               ch,c
1   chtch            2                               chtch,chtc,cht,ch,c          See TWO common SUBSTRING
2   h                   0                               h
3   htch              1                               htch,htc,ht,h                    SEE ONE common SUBSTRING
4   tch                0                               tch,tc,t

chtch :  chtch , htch, tch, ch, h

So substring in lexicographic order is

c,ch,cht,chtc,chtch,h,ht,htc,htch,t,tc,tch   12 SUBSTRING

possible are  5*6/2   : since STRING LENGTH is 5 , so  n(n+1)/2 ...but as you see 3 are duplicates


No comments:

Post a Comment