K-Nearest Neighbors (KNN) Algorithm Explained in Simple and Detailed Words
Introduction
K-Nearest Neighbors, also known as KNN, is one of the most popular machine learning algorithms. It is used in both classification and regression. The special thing about KNN is that it is based on the idea of similarity. When a new data point arrives, the algorithm looks at the existing data and finds the most similar points. Based on those points, it decides the output.
KNN does not build a mathematical model. It does not create rules. It simply stores the dataset and waits for new input. Because of this, KNN is often called a lazy learner.
Let's break it down in simpler words.
What is KNN?
KNN is a supervised learning algorithm. Supervised learning means the model is trained using labelled data. For every example in the training dataset, we know the input as well as the correct output.
KNN predicts the output of a new data point based on the K nearest data points in the dataset. These “nearest” points are selected using distance formulas. The basic logic is that similar things exist close to each other.
If a new fruit is similar to apples in terms of weight, color, and texture, then the model will label it as an apple.
How Does KNN Work?
KNN follows a sequence of steps.
1. Choose the value of K
You must decide how many neighbours the algorithm will consider for making predictions. K = 3, 5, 7 are very common.
2. Calculate distance
The algorithm will measure how far each point is from the new unknown point. Distance is usually calculated using formulas like Euclidean distance.
3. Sort all distances
After calculating the distance, the algorithm sorts the points from smallest to largest distance.
4. Pick K points with minimum distance
The algorithm now selects the K nearest neighbours.
5. Majority voting (for classification)
If the majority of neighbours belong to a particular class, the new point is assigned to that class.
6. Average value (for regression)
If KNN is being used for regression, the final prediction is the average of the target values of the K neighbours.
This method makes KNN very flexible. It can solve both number prediction (regression) and category prediction (classification) problems.
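The six steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the scikit-learn implementation; the toy dataset and the function name knn_predict are invented for the example:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    # Steps 1-2: compute the Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_X, train_y)
    ]
    # Steps 3-4: sort by distance and keep the k nearest neighbours
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Step 5: majority vote among the k neighbour labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D dataset: two well-separated clusters
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, query=(2, 2), k=3))  # A
print(knn_predict(train_X, train_y, query=(9, 9), k=3))  # B
```

For regression (step 6), the only change would be replacing the majority vote with the mean of the k neighbours' target values.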
Example to Understand KNN Clearly
Imagine you have data about students' study habits. Based on hours studied and attendance percentage, you want to classify whether a new student will pass or fail.
You take K = 5.
You find the 5 nearest neighbours of the new student.
If among those 5 neighbours:
Pass = 4
Fail = 1
Your model will predict → Pass.
This idea is called majority voting.
This example shows that KNN is simple, logical, and close to real-life decision-making.
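The majority vote from the student example can be reproduced with Python's Counter. The neighbour labels below are the hypothetical ones from the example above:

```python
from collections import Counter

# Hypothetical labels of the 5 nearest neighbours of the new student
neighbour_labels = ["Pass", "Pass", "Fail", "Pass", "Pass"]

votes = Counter(neighbour_labels)        # Counter({'Pass': 4, 'Fail': 1})
prediction = votes.most_common(1)[0][0]  # label with the most votes
print(prediction)  # Pass
```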
Why KNN Is Called a Lazy Learner
Lazy learner means KNN does not learn patterns in advance. It does not build a model at training time. Instead:
1. It stores all training data
2. When a new data point comes, it performs heavy calculations
3. It delays generalization
This is the opposite of algorithms like logistic regression or decision trees, which learn rules beforehand.
Because of this property, KNN has almost zero training time but high prediction time.
Distance Formulas Used in KNN
KNN depends on distance measurement. Some common distance measures are:
1. Euclidean Distance
The most commonly used formula. It finds the straight-line distance between two points.
2. Manhattan Distance
Used when you can only move in horizontal and vertical directions. Useful in grid-like data.
Distance measure has a major impact on the accuracy of KNN.
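Both distance measures are simple to write out. As a sketch (the point pair is made up for illustration):

```python
import math

def euclidean(p, q):
    # Straight-line distance: square root of the summed squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # Grid distance: sum of absolute differences along each axis
    return sum(abs(a - b) for a, b in zip(p, q))

p, q = (1, 2), (4, 6)
print(euclidean(p, q))  # 5.0 (a 3-4-5 right triangle)
print(manhattan(p, q))  # 7   (3 horizontal + 4 vertical steps)
```

The same pair of points gives different distances under the two measures, which is why the choice affects which neighbours count as "nearest".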
Importance of Feature Scaling in KNN
Feature scaling means normalizing or standardizing the values of features. It is very important in KNN because KNN is distance-based. If one feature has large values and the other feature has small values, large-value features dominate distance calculation.
Example:
• Age = 20, 30
• Salary = 20,000, 40,000
Salary will dominate distance because of large values.
To fix this, scaling is applied using StandardScaler or MinMaxScaler.
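A quick sketch with StandardScaler on the age/salary numbers from the example above (the two-row dataset is purely illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: one column for age, one for salary
X = np.array([[20, 20_000],
              [30, 40_000]])

scaled = StandardScaler().fit_transform(X)
print(scaled)
# Each column now has mean 0 and unit variance,
# so salary no longer dominates the distance calculation.
```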
Choosing the Right Value of K
Selecting K is a crucial step.
Small K → More sensitive to noise, unstable
Large K → Too smooth, may ignore important patterns
The commonly used practice is:
• Try multiple K values
• Use cross-validation to choose the best
Odd values are often chosen to avoid tie situations in classification.
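The "try several K values with cross-validation" practice can be sketched with scikit-learn's cross_val_score on the Iris dataset (the candidate list of odd K values is just an example choice):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try several odd K values and record the mean 5-fold CV accuracy of each
scores = {}
for k in [1, 3, 5, 7, 9, 11]:
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

# Keep the K with the best cross-validated accuracy
best_k = max(scores, key=scores.get)
print("Best K:", best_k, "with accuracy:", round(scores[best_k], 3))
```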
Advantages of KNN (Elaborated)
KNN has several strong benefits:
1. Very easy to understand
2. No training phase
3. Works well for small datasets
4. Works for both regression and classification
5. No assumptions about data
6. Good performance when data is well structured
7. Easy to update with new data
8. Performs well when classes are clearly separated
Limitations of KNN
Even though KNN is simple, it has some disadvantages:
1. Slow for large datasets because it calculates distance for every point
2. Sensitive to irrelevant features
3. High memory requirement
4. Needs feature scaling
5. Cannot handle high-dimensional data well
6. Affected by outliers
7. Performance decreases when data is imbalanced
Real-World Applications of KNN
KNN is used in many practical applications:
1. Recommender Systems
Finds similar users and suggests movies or products.
2. Credit Risk Detection
Finds past customers with similar patterns to predict whether a loan should be approved.
3. Image Classification
Classifies images based on pixel similarity.
4. Disease Prediction
Checks patient symptoms and finds similar patients to predict diseases.
5. Fraud Detection
Compares transactions with past fraud patterns.
6. Customer Segmentation
Groups similar customers together.
KNN is especially effective when similarity plays a major role.
Python Code Example
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the built-in Iris dataset
data = load_iris()
X = data.data
y = data.target

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Consider the 5 nearest neighbours of each query point
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

pred = model.predict(X_test)
accuracy = accuracy_score(y_test, pred)
print("Predicted values:", pred)
print("Accuracy:", accuracy)
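Since KNN also handles regression, here is a parallel sketch with KNeighborsRegressor on scikit-learn's built-in diabetes dataset (the dataset choice is ours for illustration; in practice you would also scale the features first):

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import r2_score

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Each prediction is the average target value of the 5 nearest neighbours
reg = KNeighborsRegressor(n_neighbors=5)
reg.fit(X_train, y_train)

r2 = r2_score(y_test, reg.predict(X_test))
print("R2 on test data:", round(r2, 3))
```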
Conclusion
KNN is one of the simplest algorithms in machine learning, but it is extremely powerful when used correctly.
It teaches core ML ideas like distance, similarity, voting, and neighbourhood patterns.
Even though it becomes slow on large datasets, it is still one of the best algorithms for learning ML basics.
If you are a beginner, KNN is a perfect algorithm to understand the foundation of predictive analytics.
#KNNAlgorithm #MachineLearning #SupervisedLearning #DataScienceBasics #MLModels #ClassificationAlgorithm #TechEducation #LearnMachineLearning #MLTheory #DataScienceBlog
