K-Nearest Neighbors Algorithm
The KNN algorithm is a robust and versatile classifier that is often used as a benchmark for more complex classifiers such as Artificial Neural Networks (ANN) and Support Vector Machines (SVM). Despite its simplicity, KNN can outperform more powerful classifiers and is used in a variety of applications such as economic forecasting, data compression and genetics.
5.1 Text mining
The KNN algorithm is one of the most popular algorithms for text categorization or text mining. Some of the most recent works on this topic are for instance. Different numbers of nearest neighbors are used for different classes in this approach, rather than a fixed number across all classes. In this way, the only parameter that needs to be chosen by the user when using KNN, the K value, becomes less sensible and hence it does not need to be carefully chosen as in the standard algorithm. Indeed, the probability that an unknown sample belongs to a class is computed by using only some top Kn nearest neighbors for that class. The Kn value is derived from K according to the size of the corresponding class in the training set. This modified KNN was efficient and less sensible to the K values when applied to text mining problems.
In general, KNN is applied less than other data mining techniques in agriculture related fields. It has been applied, for instance, for simulating daily precipitations and other weather variables. Another interesting application is the evaluation of forest inventories and for estimating forest variables. In these applications, satellite imagery is used, with the aim of mapping the land cover and land use with few discrete classes. The other applications of the k-NN method in agriculture include climate forecasting and estimating soil water parameters.
Data mining as a process of discovering useful patterns and correlations has its own niche in financial modeling. Similar to other computational methods almost every data mining method and technique has been used in financial modeling. An incomplete list includes a variety of linear and nonlinear models multi-layer neural networks, k-means and hierarchical clustering, k-nearest neighbors, decision tree analysis, regression (logistic regression, general multiple regression), ARIMA, principal component analysis, and Bayesian learning. Stock market forecasting is one of the most core financial tasks of KNN. Stock market forecasting includes uncovering market trends, planning investment strategies, identifying the best time to purchase the stocks, and what stocks to purchase. Some of other applications of KNN in finance are mentioned below: Forecasting stock market: Predict the price of a stock, on the basis of company performance measures and economic data. Currency exchange rate Bank bankruptcies Understanding and managing financial risk Trading futures Credit rating Loan management Bank customer profiling Money laundering analyses
Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements for that patient. Estimate the amount of glucose in the blood of a diabetic person, from the infrared absorption spectrum of that person’s blood. Identify the risk factors for prostate cancer, based on clinical and demographic variables. The KNN algorithm has been also applied for analyzing micro-array gene expression data, where the KNN algorithm has been coupled with genetic algorithms, which are used as a search tool. Other applications include the prediction of solvent accessibility in protein molecules, the detection of intrusions in computer systems, and the management of databases of moving objects such as computer with wireless connections.