Problem
How do you find the number of unique words from a collection of letters contain duplicates?
Word here represents a specific orderings of letters.
Example: if a collection of letters is [a,b,b], possible words are aab, aba, baa.
Examples
Input: RRD
Output: 3, options are RRD, RDR, DRR
Input: RRDD
Output: 6, RDDR, DRRD, RDRD, DRDR, RRDD, DDRR
Why Can’t We Use Other Formulas?
Let’s suppose the collection of letters is: RRD.
To differentiate between the two Rs, let’s denote them as R1 and R2.
Permutation Formula (n!)
It doesn’t make sense to use the permutation formula, because even though R1-R2-D is a different permutation from R2-R1-D, it is the same word (using our definition from before).
Combination Formula (nCk)
But can we use the combinations formula? Kind of, we will use a related formula, but we can’t use it directly at the moment. Why?
Because what are we choosing? How does this formula work for repeated/duplicated letters? We’re not choosing 2 letters from all the letters, we want to make unique words!
So How To Think About This Problem?
Defining the formula
Let’s think about it this way, C is the number of unique letter combinations, or unique words.
But remember, for each C (aka unique word), if there are repeated letters, there can be more that one permutation which will still yield the same word. We need to account for that.
Next, there is a calculable total number of permutations, which can be found by the permutation formula, n!, where n is the size of the collection of letters.
So with that in mind, think about following equation and validate it to yourself:
(Number of unique letter combinations, or C) x (All of the permutations of C which result in the same word) = Total number of permutations
or
C * Number_Perms_For_Each = n!
Number of Permutations for Each Unique Word
Let’s think about the collection of letters as [D1, D2, R1, R2] and word DDRR.
We want to come up with the number of permutations of the letters that yield the same word.
The valid permutations: [D2, D1, R1, R2], [D1, D2, R2, R1], [D1, D2, R2, R1], [D2, D1, R2, R1].
One way to think about this is that the number of ways to switch letters and have the same word is to perform a permutation for each letter group i, which is equal to numLetters(i) and then multiplying all of these together.
Back to the formula
Our unknown is C, we can calculate the n!, and finally we can find the number of permutations of C because we know it will contain a specified number of letters types and their frequency.
So the formula turns into:
C = n! / Number_Perms_For_Each
Example
Say we have RRDD, we know R should show up twice and D should show up twice.
For each combination, there are many permutations of letters that result in the same word, this will be 2!2!
C = 4! / ( 2! * 2!) = 4 * 3 / 2 = 2 * 3 = 6
Which makes sense, since the valid six unique words are: RRDD, DDRR, RDDR, DRRD, RDRD, DRDR.
I want to think about this a bit more.
Though, honestly this is just a multinomial expansion, but this is just another way of looking at it.