Spam Identification using ANN

Ananya Bal

Spam identification and segregation is an integral part of mail and message management systems. It matters even more for official email, where users do not want to encounter unwanted messages or advertising content. Neural networks can capture non-linear relationships among features of the message body (such as the frequencies of specific words or sequences of capital letters) and classify messages as spam or not spam. They are especially helpful when the dataset and the number of attributes are large.

Project status: Under Development

Artificial Intelligence

Overview / Usage

Classifying messages as spam or not spam is a classic classification problem. However, the attributes chosen as input differ from classifier to classifier and can significantly affect the results. The dataset I used records, for each e-mail, the percentage of words that match a given word from a set of words, the percentage of special characters, the average length of uninterrupted sequences of capital letters, the length of the longest uninterrupted sequence of capital letters, and the total number of capital letters. There are 57 attributes and 4601 records in total.
The classifier achieved an overall precision of 94% and a recall of 94%. These results are good enough for the classifier to be used in practical software.
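
To make the attribute types described above concrete, here is a minimal sketch of how such features could be computed from raw message text. The function names, the tokenization rule, and the example word and character lists are hypothetical and chosen only for illustration; the dataset used in this project already ships with these values precomputed, so its exact definitions may differ.

```python
import re


def word_percentage(text, word):
    """Percentage of tokens in `text` that equal `word` (case-insensitive).

    Tokenization on alphanumeric runs is an assumption made for this sketch.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    return 100.0 * tokens.count(word.lower()) / len(tokens)


def char_percentage(text, ch):
    """Percentage of characters in `text` that equal the character `ch`."""
    if not text:
        return 0.0
    return 100.0 * text.count(ch) / len(text)


def capital_run_features(text):
    """Return (average run length, longest run length, total capitals)
    over uninterrupted sequences of capital letters in `text`."""
    runs = [len(run) for run in re.findall(r"[A-Z]+", text)]
    if not runs:
        return 0.0, 0, 0
    return sum(runs) / len(runs), max(runs), sum(runs)


def extract_features(text, words, chars):
    """Build one feature vector: word percentages, character percentages,
    then the three capital-letter statistics."""
    features = [word_percentage(text, w) for w in words]
    features += [char_percentage(text, c) for c in chars]
    features += list(capital_run_features(text))
    return features


# Hypothetical word and character lists, for illustration only.
example_words = ["free", "money"]
example_chars = ["!", "$"]
vector = extract_features("FREE money NOW!!", example_words, example_chars)
```

With two example words and two example characters, the resulting vector has seven entries; the project's dataset follows the same pattern with 57 attributes.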

Methodology / Approach

After the attributes and the class variable are randomly split into training and test sets, a Multi-Layer Perceptron with three hidden layers is fitted to the training set. The number of nodes per layer and the maximum number of iterations are adjusted to reach convergence and high precision.
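
The approach described above can be sketched with scikit-learn roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the data is a synthetic stand-in with the same shape as the dataset (4601 records, 57 attributes), and the layer sizes, test split ratio, iteration limit, and feature scaling step are assumptions that would need tuning on the real data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the spam dataset (same shape, NOT the real data).
X, y = make_classification(
    n_samples=4601, n_features=57, n_informative=20, random_state=0
)

# Random split of attributes and class variable into training and test sets.
# The 70/30 ratio is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Scaling is an added assumption; MLPs usually train better on scaled inputs.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Multi-Layer Perceptron with three hidden layers.
# The node counts and max_iter are placeholder values to be adjusted.
clf = MLPClassifier(hidden_layer_sizes=(30, 30, 30), max_iter=500, random_state=0)
clf.fit(X_train_scaled, y_train)

# Evaluate on the held-out test set.
y_pred = clf.predict(X_test_scaled)
precision = precision_score(y_test, y_pred, average="weighted")
recall = recall_score(y_test, y_pred, average="weighted")
print(f"precision={precision:.3f} recall={recall:.3f}")
```

If training stops before converging, scikit-learn emits a ConvergenceWarning; raising `max_iter` or changing the layer sizes is the kind of adjustment the methodology refers to. The scores printed here describe the synthetic data only and say nothing about the results on the real dataset.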

Technologies Used

scikit-learn
MLPClassifier

Repository

https://github.com/Anniebbb/SpamC
