Question

FKUOJU The Asker · Computer Science



K-means clustering and normalization

kmtest.csv:

2.000000 4.000000
3.000000 3.000000
3.000000 4.000000
3.000000 5.000000
4.000000 3.000000
4.000000 5.000000
9.000000 4.000000
9.000000 5.000000
9.000000 9.000000
9.000000 10.000000
10.000000 4.000000
10.000000 5.000000
10.000000 9.000000
10.000000 10.000000
11.000000 10.000000
15.000000 4.000000
15.000000 5.000000
15.000000 6.000000
16.000000 4.000000
16.000000 5.000000
16.000000 6.000000
Transcribed Image Text:

Implement the basic K-means algorithm to cluster the sample data set. Do not use a built-in function such as kmeans for clustering. A sample data set (kmtest) is provided. Make sure your program works with data set A.

What to do:

1. Clustering with the K-means algorithm for the kmtest dataset.
   a. Without normalization, cluster the dataset with K = 2, 3, 4, 5. Plot the results for each K value, showing each cluster in a different color together with the cluster centers.
   b. With normalization, cluster the dataset with K = 2, 3, 4, 5. Create the cluster centers and the clustering input from the normalized data. Use Z-score normalization: first normalize the data, then apply clustering to the normalized data. Plot the results for each K value, showing each cluster in a different color together with the cluster centers.
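Part (b) asks for Z-score normalization before clustering. A minimal sketch of that step follows, assuming NumPy and the population standard deviation (ddof=0); the function name zscore_normalize and the four sample rows (taken from kmtest.csv) are illustrative choices, not part of the assignment.

```python
import numpy as np

def zscore_normalize(X):
    """Scale each column to zero mean and unit standard deviation.

    Returns (X_norm, mu, sigma) so new points or centroids can be mapped
    with the same statistics.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)      # population std (ddof=0); an assumption
    sigma[sigma == 0] = 1.0    # leave constant columns unscaled
    return (X - mu) / sigma, mu, sigma

# Four rows from kmtest.csv, used only as sample input
X = np.array([[2.0, 4.0], [3.0, 3.0], [9.0, 9.0], [16.0, 6.0]])
X_norm, mu, sigma = zscore_normalize(X)
print(X_norm)
```

To plot centers on the original axes after clustering the normalized data, a centroid c can be mapped back with c * sigma + mu.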
Community Answer
HARVN6 The First Answerer

(a) Without normalization

# coding: utf-8

import numpy as np
import matplotlib.pyplot as plt
import random


# Initialize centroids by picking k rows of X at random
def kMeansInitCentroids(X, k):
    # replace=False samples without replacement, so the k starting
    # centroids are distinct rows of X
    idx = np.random.choice(np.shape(X)[0], k, replace=False)
    centroids = X[idx, :]
    return centroids


# Assign each point to its closest centroid (squared Euclidean distance)
def findClosestCentroids(X, centroids):
    k = np.shape(centroids)[0]
    m = np.shape(X)[0]

    idx = np.zeros((m, 1))

    for i in range(m):
        # Start with centroid 0 as the best candidate, then check the rest
        min_dist = sum((X[i, :] - centroids[0, :]) ** 2)
        idx[i] = 0
        for j in range(1, k):
            dist = sum((X[i, :] - centroids[j, :]) ** 2)
            if dist < min_dist:
                min_dist = dist
                idx[i] = j
    return idx


# Recompute each centroid as the mean of the points assigned to it
def computeCentroids(X, idx, k):
    m, n = np.shape(X)

    centroids = np.zeros((k, n))

    for i in range(k):
        cnt = 0
        for j in range(m):
            if idx[j] == i:
                cnt += 1
                centroids[i, :] += X[j, :]
        # NumPy division by zero yields nan instead of raising,
        # so guard on the member count explicitly
        if cnt > 0:
            centroids[i, :] /= cnt
        else:
            print("cluster", i, "is empty; centroid left at zero")
    return centroids

...
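The rest of the answer is cut off, so the loop that ties these steps together is not shown. As a rough, self-contained sketch (not the answerer's code), the assign/update cycle could be run until the centroids stop moving. The name run_kmeans, the max_iters and seed parameters, and the vectorized distance computation are assumptions made for this illustration.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) per point."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def run_kmeans(X, k, max_iters=100, seed=0):
    """Alternate assignment and centroid update until the centroids are stable."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)
    for _ in range(max_iters):
        idx = assign(X, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[idx == j]
            if len(members) > 0:   # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment, consistent with the returned centroids
    return centroids, assign(X, centroids)

# The 21 points of kmtest.csv
X = np.array([[2, 4], [3, 3], [3, 4], [3, 5], [4, 3], [4, 5], [9, 4],
              [9, 5], [9, 9], [9, 10], [10, 4], [10, 5], [10, 9], [10, 10],
              [11, 10], [15, 4], [15, 5], [15, 6], [16, 4], [16, 5], [16, 6]],
             dtype=float)

for k in (2, 3, 4, 5):
    centroids, idx = run_kmeans(X, k)
    print(k, centroids.round(2).tolist())
```

The result depends on the random starting points, so different seeds can give different clusterings for the same K. For the plots the assignment asks for, one option is to scatter X colored by idx and overlay the centroids with a distinct marker.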