read-text and see
and the variable punc. The function read-text reads one
word at a time, the function see updates the count of how many times
a word has been seen, and punc specifies the punctuation characters that
should be treated as separate words.
bad] and one *good* for the words from the good email file
[called good by Paul Graham].
I have removed the call to string-downcase in the
function read-text so that words keep their original
case. I have also added characters to the list in punc,
since I want them to be stored as separate symbols.
You are free to add more characters if needed.
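To make the division of labour concrete, here is a minimal sketch of case-preserving tokenizing and counting. It is not the article's read-text, see, or punc: the names split-words, count-word and *punc-chars* are hypothetical, the character list is only an example, and the sketch assumes words are stored as strings in a hash table created with :test #'equal.

```lisp
;; Hypothetical sketch: these names are illustrative and are not the
;; read-text, see and punc definitions discussed above.
(defparameter *punc-chars* '(#\. #\, #\! #\? #\; #\:)
  "Characters that are emitted as one-character words of their own.")

(defun split-words (line)
  "Split LINE into words, keeping the original case.
Each character in *PUNC-CHARS* becomes a separate word."
  (let ((words '())
        (current '()))
    (flet ((flush ()
             ;; Turn the characters collected so far into one word.
             (when current
               (push (coerce (reverse current) 'string) words)
               (setf current '()))))
      (loop for ch across line
            do (cond ((member ch *punc-chars*)
                      (flush)
                      (push (string ch) words))
                     ((member ch '(#\Space #\Tab))
                      (flush))
                     (t (push ch current))))
      (flush))
    (nreverse words)))

(defun count-word (word table)
  "Increment the number of times WORD has been seen in TABLE."
  (incf (gethash word table 0)))
```

With these definitions, (split-words "Free money, FREE!") yields the five words "Free", "money", ",", "FREE" and "!", so "Free" and "FREE" are counted separately.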
; ngood = number of good messages -- good = hash table of good words
(let ((g (* 2 (or (gethash word good) 0)))
      (b (or (gethash word bad) 0)))
  (unless (< (+ g b) 5)
    (max .01
         (min .99 (float (/ (min 1 (/ b nbad))
                            (+ (min 1 (/ g ngood))
                               (min 1 (/ b nbad)))))))))
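The expression above refers to word, good, bad, ngood and nbad as free variables. As a sketch for trying it out in isolation, it can be wrapped in a function that takes them as arguments. The name word-spam-probability and the parameter order are my own assumptions, and the tables are assumed to be keyed by strings.

```lisp
;; Hypothetical wrapper: the function name and argument list are assumptions;
;; the body is the expression shown above, unchanged.
(defun word-spam-probability (word good bad ngood nbad)
  "Return the spam probability of WORD, or NIL if it was seen too rarely."
  (let ((g (* 2 (or (gethash word good) 0)))   ; good counts are doubled
        (b (or (gethash word bad) 0)))
    (unless (< (+ g b) 5)                      ; ignore rarely seen words
      (max .01                                 ; clamp to [.01, .99]
           (min .99 (float (/ (min 1 (/ b nbad))
                              (+ (min 1 (/ g ngood))
                                 (min 1 (/ b nbad))))))))))
```

A worked example: with 100 good and 100 bad messages, a word seen 2 times in good mail and 10 times in bad mail gives g = 4 and b = 10, so the result is (10/100) / (4/100 + 10/100) = 5/7, or about 0.714. A word seen only in bad mail is clamped to .99, and a word for which g + b is below 5 returns NIL.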
;; call them probs
(let ((prod (apply #'* probs)))
  (/ prod (+ prod (apply #'* (mapcar #'(lambda (x) (- 1 x))
                                     probs)))))
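Written out, the expression computes p1*...*pn / (p1*...*pn + (1-p1)*...*(1-pn)). The sketch below wraps it in a function so the arithmetic can be checked by hand; the name combine-probabilities is hypothetical, and exact rationals are used in the example only to avoid floating-point rounding.

```lisp
;; Hypothetical wrapper: the function name is an assumption; the body is
;; the expression shown above.
(defun combine-probabilities (probs)
  "Combine the per-word probabilities in the list PROBS into one value."
  (let ((prod (apply #'* probs)))
    (/ prod (+ prod (apply #'* (mapcar #'(lambda (x) (- 1 x))
                                       probs))))))

;; Two strongly spammy words and one strongly innocent word:
;;   prod             = 99/100 * 99/100 * 1/100 = 9801/1000000
;;   complement prod  = 1/100 * 1/100 * 99/100  = 99/1000000
;;   result           = 9801/9900               = 99/100
(combine-probabilities '(99/100 99/100 1/100))
```

Because apply passes every element of probs as a separate argument, the list must stay shorter than the implementation's call-arguments-limit; (reduce #'* probs) would avoid that limit if a long list were ever needed.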