Get Your Data Science Assignment Done at AllHomeworkAssignments

  • The k-Nearest Neighbor (KNN) algorithm is a machine learning technique that classifies a data point by finding the k most similar points in the training data and assigning the label that is most common among them. KNN can also be used for regression, but it is more widely used for classification. Like any classification model, it must be trained on data whose target values are known. Many data scientists set these target values (positive, negative or neutral) by hand, but there are other options: for example, the Python library TextBlob can compute a sentiment polarity score for each tweet automatically, and that score can be mapped to a label in just a few lines of code. The data set is then divided into two parts: a training set and a testing set. Both sets must be transformed into numeric vectors before they are fed to the model, because the model cannot work with raw text directly. This can be done with scikit-learn (sklearn), a Python library that contains many classification models as well as several text encoders. The most commonly used encoding methods include:
    1: Count vectorizer (sklearn's CountVectorizer)
    2: TF-IDF vectorizer (sklearn's TfidfVectorizer)
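    The steps above (turn each text into a vector, then let k-NN vote among the nearest training examples) can be sketched in plain Python. This is a minimal, hypothetical illustration, not how scikit-learn implements it: the helper names and the four toy reviews are made up, tokenization is a naive whitespace split, and the IDF formula ln((1 + n) / (1 + df)) + 1 followed by L2 normalization mirrors the defaults documented for scikit-learn's `TfidfVectorizer`. In real projects you would use `CountVectorizer` or `TfidfVectorizer` together with `KNeighborsClassifier`.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplification: lowercase and split on whitespace (no punctuation handling).
    return text.lower().split()


def build_vocabulary(docs):
    # Sorted list of every distinct token in the training documents.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def count_vector(doc, vocab):
    # Bag of words: how often each vocabulary term occurs in the document.
    # Tokens that are not in the vocabulary are ignored.
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def idf_weights(docs, vocab):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1,
    # where n is the number of documents and df the number containing the term.
    n = len(docs)
    doc_sets = [set(tokenize(d)) for d in docs]
    weights = []
    for term in vocab:
        df = sum(term in s for s in doc_sets)
        weights.append(math.log((1 + n) / (1 + df)) + 1)
    return weights


def tfidf_vector(doc, vocab, idf):
    # Term count times IDF weight, then scaled to unit (L2) length.
    vec = [c * w for c, w in zip(count_vector(doc, vocab), idf)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train_vecs, train_labels, query_vec, k=3):
    # Rank training examples by similarity to the query, then take a
    # majority vote among the k most similar ones.
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine_similarity(pair[0], query_vec),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training data.
train_docs = [
    "great movie loved it",
    "loved the acting great fun",
    "terrible plot hated it",
    "boring and terrible film",
]
train_labels = ["positive", "positive", "negative", "negative"]

vocab = build_vocabulary(train_docs)
idf = idf_weights(train_docs, vocab)
train_vecs = [tfidf_vector(d, vocab, idf) for d in train_docs]

query_vec = tfidf_vector("great fun loved it", vocab, idf)
print(knn_predict(train_vecs, train_labels, query_vec, k=3))  # positive
```

    The query shares several words with both positive reviews and only "it" with one negative review, so two of its three nearest neighbors are positive. This sketch ranks neighbors by cosine similarity, a common choice for text; scikit-learn's `KNeighborsClassifier` uses Euclidean (Minkowski, p = 2) distance by default, which gives the same ranking on unit-length vectors like these.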
    Alternatively, you can use deep learning to build a sentiment analyser. In that case, libraries such as Keras, TensorFlow and Theano can be helpful.
    Model evaluation
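    One of the most common and appropriate techniques for evaluating a classifier is the confusion matrix. For a binary classifier it is a 2 × 2 table that compares actual classes with predicted classes and counts true positives (TP: positive examples predicted as positive), false positives (FP: negative examples predicted as positive), false negatives (FN: positive examples predicted as negative) and true negatives (TN: negative examples predicted as negative). From these counts we can derive the general evaluation metrics, which include:
    Accuracy: the fraction of all predictions that the classifier got right. The formula for accuracy is:
    Accuracy = (TP + TN) / (TP + TN + FP + FN)
    Precision: how often the classifier is actually right when it predicts the positive class. The formula for precision is:
    Precision = TP / (TP + FP)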
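    These definitions translate directly into code. A minimal sketch for the binary case, assuming the labels are strings and that "positive" is the positive class; the function names and the five example labels are hypothetical. scikit-learn offers ready-made versions in `sklearn.metrics` (`confusion_matrix`, `accuracy_score`, `precision_score`).

```python
def confusion_counts(y_true, y_pred, positive="positive"):
    # Binary confusion matrix: `positive` is the positive class,
    # every other label is treated as negative.
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    # Share of all predictions that were correct.
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    # Share of positive predictions that were actually positive.
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


# Hypothetical labels for five test examples.
y_true = ["positive", "positive", "negative", "negative", "positive"]
y_pred = ["positive", "negative", "positive", "negative", "positive"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)            # 2 1 1 1
print(accuracy(tp, fp, fn, tn))  # 0.6
print(precision(tp, fp))         # 0.6666666666666666
```

    Returning 0.0 when the classifier makes no positive predictions is a convention chosen for this sketch, since precision is undefined in that case; `precision_score` exposes the same choice through its `zero_division` parameter.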
    Text classification is a technique in which we assign targets, or categories, to textual data according to its content. It is one of the fundamental NLP techniques. Sentiment analysis is an application of text classification; other applications include spam detection, and a faster emergency-response system could be built by classifying panicked conversations on social media. Textual data is everywhere: emails, websites, social media, books and chats. Wherever you look, there is some form of unstructured text. This data becomes useful only if we know how to extract it and find useful patterns in it. Structuring such large volumes of data takes careful effort, but that effort can bring a lot of benefit to an individual or an organization. Almost every text classification technique can be expressed in the following steps:
    Data Collection
    This is the first, essential step in building any machine learning model, since every machine learning algorithm needs data to train on. What data to collect depends entirely on the problem at hand. For example, sentiment analysis, an application of text classification, needs raw text annotated with target labels such as positive, negative and neutral. Depending on the problem, the data can take many forms: reviews of an organization's products, genre-labeled songs, and so on.
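    Once annotated data has been collected, it is split into a training set and a testing set before vectorization. A minimal sketch, assuming the annotations are stored as (text, label) pairs with three sentiment targets; the eight example sentences and the `split_dataset` helper are hypothetical. scikit-learn provides the same operation as `sklearn.model_selection.train_test_split`.

```python
import random

# Hypothetical annotated examples: (text, label) pairs using three
# sentiment targets (positive, negative, neutral).
labeled_data = [
    ("the battery lasts all day", "positive"),
    ("the screen cracked after a week", "negative"),
    ("it arrived on tuesday", "neutral"),
    ("excellent value for money", "positive"),
    ("customer support never replied", "negative"),
    ("the box contains a charger", "neutral"),
    ("fast shipping and great quality", "positive"),
    ("stopped working after two days", "negative"),
]


def split_dataset(pairs, test_fraction=0.25, seed=0):
    # Shuffle a copy (leaving the input untouched) with a fixed seed so the
    # split is reproducible, then cut off the first part as the test set.
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train_set, test_set = split_dataset(labeled_data)
print(len(train_set), len(test_set))  # 6 2
```

    With very small data sets a random split can leave a class out of the test set entirely; `train_test_split` accepts a `stratify` argument that keeps the class proportions the same in both parts.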
