LeetCode 820 Compressed encoding of words

Mondo Technology Updated on 2024-01-30

Given a list of words, we encode the list into an index string s and an index list a.

For example, if this list is ["time", "me", "bell"], we can then express it as s ="time#bell#"and indexes = [0, 2, 5].

For each index, we can read the string by starting from the position of the index in the string s until"#"to restore our previous list of words.

So what is the minimum string length to successfully encode a given list of words?

Example: Enter: words = [.]"time", "me", "bell"]

Output: 10

Description: s ="time#bell#" , indexes = [0, 2, 5] 。

Hint: 1 < = wordslength <= 2000

1 <= words[i].length <= 7

Each word is a lowercase letter.

After reading the question, Ali Byte found that if each word in the list is reversed, it is a suffix tree problem. For example:["time", "me", "bell"]Reverse order is followed by ["emit", "em", "lleb"The result we are asking for is nothing more than that"emit"length of +"llem"length of +"##"(em and emit have a common prefix, just calculate one).

Therefore, the intuitive idea is to use the form of prefix tree + reverse insertion to simulate a suffix tree.

The following ** looks complicated, but I use this template for many topics, and I can ac with a little adjustment of the details.

The main APIs for the prefix tree are as follows:

insert(word): Insert a wordsearch(word): Find if a word existsstartwith(word): Find out if there are words prefixed with word, where startwith is the core usage of the prefix tree, and its name prefix tree comes from here. You can start with 208 questions to familiarize yourself with the prefix tree before trying other questions.

A prefix tree looks something like this:

As shown in the figure, each node stores a character, and then adds a control information to indicate whether it is the end of the word, and the actual use process may be slightly different, but it does not change much.

This problem needs to consider the edge case, for example, this list is ["time", "time", "me", "bell"In this case of duplicate elements, here I use hashset to deduplicate.

The prefix tree is deduplicated.

class trie: def __init__(self): """ initialize your data structure here. """ self.trie = {}def insert(self, word): """ inserts a word into the trie. :type word: str :rtype: void """ curr = self.trie for w in word: if w not in curr: curr[w] = {}curr = curr[w] curr['#'] = 1 def search(self, word): """ returns if the word is in the trie. :type word: str :rtype: bool """ curr = self.trie for w in word: curr = curr[w] # len(curr) == 1 means we meet '## when we search 'em'(which reversed from 'me') # the result is len(curr) >1 # cause the curr look like } return len(curr) == 1class solution: def minimumlengthencoding(self, words: list[str]) int: trie = trie() cnt = 0 words = set(words) for word in words: trie.insert(word[::1]) for word in words: if trie.search(word[::1]):cnt += len(word) +1 return cnt
Complexity analysisTime complexity: $o(n)$, where n is the total number of characters in the word length list, such as ["time", "me"], that's 4 + 2 = 6. Spatial complexity: $o(n)$, where n is the total number of characters in the word-length list, such as ["time", "me"], that's 4 + 2 = 6.

Related Pages