Entropy in Passwords

I recently found one of my throwaway passwords in a random corner of the internet. It was sitting there in plain text, with my username right next to it, along with around 33,000 other username-password pairs. Beyond the minor scare, I found it fascinating that huge files like this exist publicly. I immediately downloaded the file, scrolled through some of the passwords and noticed some obvious patterns. That made me wonder about the algorithms used to detect such patterns, so I looked around and found some interesting references.

One thing that struck me was that the entropy in a password affects its strength. That got me thinking about entropy itself: was it possible to create artifacts from the entropy in passwords?
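
The post doesn’t say how entropy would be measured. One common rough estimate (an assumption here, not necessarily what the references use) treats every character as drawn uniformly from the pool of character types that appear in the password, giving L · log2(N) bits for a password of length L and pool size N. A minimal sketch, assuming the four character types used later in this post and taking "symbols" to be Python’s 32-character `string.punctuation`:

```python
import math
import string

# Assumed character pools; "symbols" is taken to be string.punctuation
# (the 32 printable ASCII punctuation characters).
POOLS = (
    string.ascii_lowercase,
    string.ascii_uppercase,
    string.digits,
    string.punctuation,
)


def naive_entropy_bits(password: str) -> float:
    """Estimate entropy as len(password) * log2(pool size).

    The pool size is the total size of every character class that appears
    at least once in the password. This assumes each character is chosen
    uniformly at random, which real passwords rarely are, so it tends to
    overestimate their strength.
    """
    pool = sum(len(p) for p in POOLS if any(c in p for c in password))
    if pool == 0:
        return 0.0
    return len(password) * math.log2(pool)


for pw in ("password", "Passw0rd!", "tr0ub4dor&3"):
    print(f"{pw!r}: {naive_entropy_bits(pw):.1f} bits")
```

By this measure a predictable substitution such as "Passw0rd!" scores well even though it follows an obvious pattern, which is the kind of weakness a leaked list makes visible.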

I was reminded of The Code Book by Simon Singh, which I had read a while back. The book discusses frequency analysis of character pairs and triplets. I also found this page on letter frequency by Freek Dijkstra.

That led me to this page, in which they calculated the statistical distribution of characters up to the third order (sequences of three characters). They used the following books as their corpus:

  1. Origin of Species (Charles Darwin)
  2. The Voyage of the Beagle (Charles Darwin)
  3. Jane Eyre (Charlotte Bronte)
  4. Wuthering Heights (Emily Bronte)
  5. Tarzan of the Apes (Edgar Rice Burroughs)
  6. The Return of Tarzan (Edgar Rice Burroughs)
  7. Paradise Lost (John Milton)
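
The pair and triplet statistics described above can be rebuilt from any corpus, which is also what would make swapping the data set easy. A minimal sketch, assuming the same normalization described below (everything lowercased, every run of non-letters collapsed to a single space); `build_ngram_tables` and `normalize` are hypothetical names, and this is not the code behind the linked page:

```python
import re
from collections import Counter, defaultdict


def normalize(text: str) -> str:
    """Lowercase the text and collapse every run of non-letters to one space."""
    return re.sub(r"[^a-z]+", " ", text.lower()).strip()


def build_ngram_tables(text: str, order: int = 3) -> dict:
    """Count which character follows each context of length 1..order-1.

    Returns {context_length: {context: Counter(next_char -> count)}}, so
    tables[1] holds second-order (pair) statistics and tables[2] holds
    third-order (triplet) statistics.
    """
    text = normalize(text)
    tables = {}
    for k in range(1, order):
        table = defaultdict(Counter)
        for i in range(k, len(text)):
            table[text[i - k:i]][text[i]] += 1
        tables[k] = table
    return tables


corpus = "The quick brown fox jumps over the lazy dog. Then the fox ran."
tables = build_ngram_tables(corpus)
print(tables[1]["t"].most_common(3))   # characters most likely to follow "t"
print(tables[2]["th"].most_common(3))  # characters most likely to follow "th"
```

With the seven books above, `corpus` would simply be their concatenated text.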

I agree that this particular set might be dated, but I figured the outcome would still be interesting. Besides, swapping in a different data set should be fairly straightforward in this case.

I then wrote a program that follows a few rules.

Given a phrase, the program first looks up the second- and third-order databases (from the link above). The database I’m using converts all text to lowercase letters and spaces. That is a problem for passwords, but it can still produce interesting results. Based on where the actual next character falls in the range from the most likely to the least likely next character, it gets a score between 0 and 179: the more likely a character is to follow the previous one, the straighter the line. Since the database also counts spaces as characters, and I chose to ignore spaces (passwords rarely contain them), the most likely next character is often a space that never appears in the password, so the line can almost never be a single straight line. The distance between points is relative to the size of the canvas. The rotation starts clockwise, and the direction reverses at every change in character type (lowercase, uppercase, number or symbol).
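
The rules above can be sketched roughly as follows. This is not the author’s code, which isn’t public, and several details are assumptions: the score is a linear scale between the most and least frequent candidate next characters, the score is used directly as a turn angle in degrees, each step is a fixed fraction of the canvas size, the third-order table is tried before falling back to the second order, and a character with no data gets the maximum score. The names `char_class`, `turn_score`, `trace` and `tiny_tables` are hypothetical.

```python
import math
from collections import Counter, defaultdict

MAX_SCORE = 179  # 0 = most likely next character, 179 = least likely


def char_class(c: str) -> str:
    """Classify a character as lowercase, uppercase, digit or symbol."""
    if c.islower():
        return "lower"
    if c.isupper():
        return "upper"
    if c.isdigit():
        return "digit"
    return "symbol"


def turn_score(counts: Counter, actual: str) -> int:
    """Score `actual` from 0 to 179 by where it falls between the most and
    least frequent candidates in `counts` (assumed linear scale).

    Spaces stay in `counts`, so a letter can score above 0 even when it is
    the most likely letter. Unseen characters get the maximum score.
    """
    if actual not in counts:
        return MAX_SCORE
    hi, lo = max(counts.values()), min(counts.values())
    if hi == lo:  # every candidate is equally likely
        return 0
    return round(MAX_SCORE * (hi - counts[actual]) / (hi - lo))


def trace(password: str, tables: dict, canvas_size: float = 500.0,
          step_fraction: float = 0.05) -> list:
    """Return the (x, y) points of a password's line drawing.

    `tables` maps a context length (1 = second order, 2 = third order) to
    {context: Counter(next_char -> count)}. Coordinates are screen-style
    (y grows downward), so a positive turn is clockwise.
    """
    step = canvas_size * step_fraction  # assumed: step is a fixed share of the canvas
    x = y = canvas_size / 2
    heading = 0.0   # degrees; 0 points right
    direction = 1   # +1 = clockwise, -1 = counter-clockwise
    prev_class = None
    points = [(x, y)]
    for i, c in enumerate(password):
        cls = char_class(c)
        if prev_class is not None and cls != prev_class:
            direction = -direction  # reverse rotation on every change of type
        prev_class = cls
        if i > 0:
            # Try the third-order table first, then fall back to second order.
            counts = Counter()
            if i >= 2:
                counts = tables.get(2, {}).get(password[i - 2:i].lower(), Counter())
            if not counts:
                counts = tables.get(1, {}).get(password[i - 1].lower(), Counter())
            heading += direction * turn_score(counts, c.lower())
        x += step * math.cos(math.radians(heading))
        y += step * math.sin(math.radians(heading))
        points.append((x, y))
    return points


def tiny_tables(text: str) -> dict:
    """Hypothetical helper: build second- and third-order counts from a string."""
    tables = {1: defaultdict(Counter), 2: defaultdict(Counter)}
    for k in (1, 2):
        for i in range(k, len(text)):
            tables[k][text[i - k:i]][text[i]] += 1
    return tables


tables = tiny_tables("the cat sat on the mat and the rat ate the hat")
for px, py in trace("TheCat42", tables):
    print(f"{px:7.1f} {py:7.1f}")
```

The resulting points can be handed to any drawing library as a polyline; keeping the geometry separate from the rendering makes it easy to try other scoring rules.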

The program also converts non-letter characters used in leetspeak (such as 0 for o) to the letters they stand for.
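
A minimal sketch of that conversion, using a hypothetical substitution table, since the post doesn’t list the mapping it actually uses. Some substitutions are ambiguous (1 can stand for i or l), so any fixed table is a judgment call:

```python
# Hypothetical leetspeak table; the mapping used by the original program
# is not published. "1" is mapped to "l" here, though it could also be "i".
LEET = str.maketrans({
    "0": "o", "1": "l", "3": "e", "4": "a", "5": "s",
    "7": "t", "8": "b", "@": "a", "$": "s", "!": "i",
})


def unleet(password: str) -> str:
    """Replace common leetspeak substitutions with the letters they stand for."""
    return password.translate(LEET)


print(unleet("p@55w0rd"))  # password
```

One possible design choice is to look up the frequency tables with the converted text while keeping the original characters for deciding when the character type changes.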

Here are some of the passwords visualized.
[Images: line drawings of selected passwords from the list]


The code that generated these drawings is not yet public on GitHub; I still have some changes and cleanup to do. I’ll update this page when I put it up.



One thought on “Entropy in Passwords”

  1. Mithru, this is a really fascinating topic to tackle. Thanks for documenting your concept and process so thoroughly. Your question of how entropy in passwords affects their strength, and your exploration of how to visualize that entropy, seem more and more relevant as people rely so heavily on databases and cloud-based accounts.

    One question I had was what you mean by “The distance between points is relative to the size of the canvas.” Does the size of the line drawings have to do with the length of the password, or are you normalizing the drawing so that it fills up the size of your canvas?

    Seeing multiple password/line drawing combinations is really effective for understanding how your algorithm works. Another way to show it in action could be to animate the line drawings, revealing the letters of each password one by one. This project also made me think that reverse-engineering, or creating passwords from sketches, could be an interesting thing to try. Great work!
