Autocomplete feature using Trie data structure and sample implementation in python.
Dictionary meaning of “Autocomplete” is completion of word without user needing to type in the full word.
So, to implement this feature in python we can use a data structure called Trie.
Trie, also known as digital tree or prefix tree, is a type of search tree, used to locate the specific keys from within a set.
As Trie is a tree data structure, it is composed of nodes and each node contains link to its child node. In this tree , each node here is an individual letter of word. There are two special kind of nodes, “root” and “end” nodes. There is only one root node and possibly many end nodes, specifying end of the word.
So, for example we are going to insert following words in our word store(Trie tree structure).
- dis
- dismi
- dismip
- eat
It will be stored as follows:
In the above picture we can see root has two children ‘d’ and ‘e’, as we are storing words starting with ‘d’ and ‘e’. The special character ‘#’ specifies the end of the word. As we have a full word ‘dis’ in the list of words to be stored, that’s why below ‘s’ in the traversal , root → d → i →s, we have #, which specifies end of the word ‘dis’. If in the list of letters, we have special characters like ‘#’, we can use any other symbol or full word like ‘END’ to specify terminal node. In this tree structure each node points to its child node.
So, for example user enters key ‘di’ , then we can autocomplete the word di using following approach:
- First check if ‘d’ is there in the tree, if its there, then traverse to child elements of ‘d’ and check if ‘i’ is there, in the list of the child elements of ‘d’. If ‘i’ is present in the list of child elements of ‘d’, then retrieve all the child nodes of ‘i’. If ‘i’ is not present in the list of child elements ‘d’ , then that means there is no words starting with ‘di’.
- The above approach can be specified in pseudocode as:
As now we understand the basics of the approach used, lets write this in python.
Python code to insert into the Trie tree structure:
If the _insert_word of the above code is called with word ‘dis’, it will iterate through each letter of the word ‘dis’ and populate _letter_indices as follows:
{‘d’:{‘i’:{‘s’:{‘#’:’END’}}}}.
So, we can see the value of each element is a dict specifying child node and child node of ‘s’ is ‘#’ specifying the end of word ‘dis’.
Now if we call _insert_word with another word ‘dism’, the above _letter_indices will be updated as:
{‘d’:{‘i’:{‘s’:{‘#’:’END’, ’m’:{‘#’:’END’}}}}}}
So, we can see in the dict of ‘s’, a new key ‘m’ gets added, along with already existing ‘#’, this specifies that, in the store there full word ‘dis’ and another word starting with ‘dis’, in this case ‘dism’.
Python code to find words in the Trie tree structure:
Here in the above code _find_words first looks for existence of words, starting with search_prefix. If there are no letters in sequence as in the search_prefix, the search is stopped there. If all letters present in search_prefix are found in same sequence as in search_prefix, then all letters present in the tree, below the last letter of the search_prefix is returned as search_result. search_prefix is defaulted to None, so if no search prefix is passed it will return all letters present in the Tree.
Lets run the above code as follows:
The output of the above code will be as follows:
Hope this article helped you to understand the basics of Trie structure and implementation in python.