5/09/2017

Counting letters in English text.


Let's write a program that determines how often each letter occurs in English text.
We start with this short text (from the Sherlock Holmes story 'A Scandal in Bohemia').

I will comment on this code tomorrow.

#letters_1.py
# -*- coding: utf-8 -*-

s = '''
To Sherlock Holmes she is always the woman. I have seldom heard him mention her under 
any other name. In his eyes she eclipses and predominates the whole of her sex. 
It was not that he felt any emotion akin to love for Irene Adler. All emotions, 
and that one particularly, were abhorrent to his cold, precise but admirably balanced mind. 
He was, I take it, the most perfect reasoning and observing machine that the world has seen, 
but as a lover he would have placed himself in a false position. '''

# chars[k] holds the number of occurrences of the character with code k+1
chars = [0] * 255

for letter in s:
    indeks = ord(letter) - 1
    chars[indeks] += 1

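d = len(chars)
X = []    # letters that occur in the text
Y = []    # how many times each of them occurs

for i in range(d):
    # keep only the small letters 'a'..'z' (codes 97..122) that occur at least once
    if chars[i] > 0 and 97 <= i + 1 <= 122:
        X.append(chr(i + 1))
        Y.append(chars[i])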

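sum_y = sum(Y)
# each print gets one pre-formatted string, so it runs on Python 2 and Python 3
print('All small letters in the text:  %d' % sum_y)
print('\nThe frequency of letters in %:\n ')

for i in range(len(X)):
    # turn the count into a percentage rounded to one decimal place
    Y[i] = round(100.0*Y[i]/sum_y, 1)
    print('%5s %10.1f' % (X[i], Y[i]))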

The program prints something like this:
All small letters in the text:  382

The frequency of letters in %:

    a        9.2
    b        1.6
    c        2.4
    d        3.9
    e       14.4
    f        1.6
    g        0.5
    h        6.8
    i        5.8
    k        0.8
    l        5.8
    m        3.7
    n        7.3
    o        7.9
    p        1.8
    r        5.8
    s        6.8
    t        7.6
    u        1.3
    v        1.3
    w        2.1
    x        0.3
    y        1.6

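As we can see, the most common letter in this text is 'e' (14.4%). The letters j, q and z are missing from the list because they do not occur in this text.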
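
The program above counts only small letters, so a capital 'T' or 'I' is skipped. As a possible variation, here is a minimal sketch that folds capitals to small letters first and lets collections.Counter from the standard library do the counting. The file name letters_counter.py, the function letter_frequencies and the short sample string are my own choices for this illustration, not part of letters_1.py, and the percentages it gives will differ a little from the table above because capitals are now included.

```python
# letters_counter.py  (hypothetical file name, used only for this sketch)
from collections import Counter
import string


def letter_frequencies(text):
    """Return {letter: percentage} for the letters a-z found in text.

    Assumption: capital letters are folded to small ones before counting,
    which letters_1.py does not do.
    """
    counts = Counter(c for c in text.lower() if c in string.ascii_lowercase)
    total = sum(counts.values())
    if total == 0:
        # no letters at all, so there is nothing to report
        return {}
    return {letter: 100.0 * n / total for letter, n in counts.items()}


if __name__ == '__main__':
    sample = 'To Sherlock Holmes she is always the woman.'
    freq = letter_frequencies(sample)
    # list the letters from the most frequent to the least frequent
    for letter in sorted(freq, key=freq.get, reverse=True):
        print('%5s %10.1f' % (letter, freq[letter]))
```

To try it on the whole paragraph, pass the string s from letters_1.py to letter_frequencies instead of the one-sentence sample.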
