HIERARCHICAL CLUSTERING
Research Paper 2 (H6016)
Prepared By
Debashis Rout
ITBB00066133
Table of Contents
1.0 ABSTRACT
1.1 Problem Statement
2.0 Introduction to Cluster Analysis
3.0 Hierarchical Clustering
3.1 Hierarchical Clustering Theory
Types of Dendrograms
3.2 Clustering Methods
3.2.1 Agglomerative Clustering
3.2.2 Divisive Clustering
3.3 Distance Metrics
a. Unweighted Pair Group Method with Arithmetic Mean
b. Weighted Pair Group Method with Arithmetic Mean
c. Single Linkage
d. Complete Linkage
e. Centroid
f. Average Link
4.0 Hierarchical Clustering Method Algorithms
4.1 Agglomerative Method
4.2 Divisive Method
5.0 Uses of Hierarchical Clustering
5.1 Strengths of Hierarchical Clustering
5.2 Weaknesses of Hierarchical Clustering
CONCLUSIONS
REFERENCES
1.0 ABSTRACT
The hierarchical clustering approach is one of the cluster analysis methods broadly utilized in diverse parts of the world. The result of this strategy forms a tree structure, and we use the method to find clusters at different granularity levels. In this paper, we discuss hierarchical cluster analysis approaches and their respective algorithms. Additionally, we examine current advancements and their use in society.
1.1 Problem Statement
Humongous volumes of data are being exchanged in networks and the internet in general; without clustering/cluster analysis methodologies, it would be relatively difficult to group multivariate objects into meaningful groups of associated data. If we learn the variables within a data set, we can group the objects into homogeneous clusters. Cluster analysis methodologies are used in many application domains, and the development of such algorithms for use in artificial intelligence has proven to be promising.
In this paper we imagine that we have a set of data objects contained in one dataset for analysis. In our analysis of this data we consider data points or groups of data having no class labels (unlabeled data). This situation is not uncommon in many databases and datasets, because the process of assigning class labels to individual tuples can be quite costly and time-consuming. With this hypothetical dataset, our problem is to group these objects into naturally occurring groups of data with similar traits, thereby clustering the data into meaningful groups that we can then use.
This paper outlines the requirements of clustering methods for data sets and databases. We also explain how to compute dissimilarities between objects by observing multiple attributes/variables of the objects in question. We give an overview of the hierarchical clustering technique, since that is the primary focus of this paper.
Keywords: hierarchical clustering, heat maps, data mining, linkage metrics, agglomerative method, divisive method, dendrogram, Euclidean distance
2.0 Introduction to Cluster Analysis
Cluster analysis refers to a wide variety of procedures that attempt to reorganize a large non-homogeneous data set into smaller homogeneous groups using clustering algorithms.
Cluster analysis is an unsupervised learning procedure for segmenting a large heterogeneous data set into smaller natural groupings. Clustering algorithms work by identifying the objects with the greatest similarity between them and grouping those objects together. Each group contains objects with similar characteristics, so every data point falls into a meaningful group. Cluster analysis algorithms evaluate important attributes of a cluster and calculate a proximity measure between objects in the cluster. Different cluster analysis algorithms use distinctive notions of closeness.
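The proximity measure mentioned above is commonly the Euclidean distance (listed among this paper's keywords). A minimal sketch, using two made-up 2-D points:

```python
# A minimal sketch of the proximity measure most clustering algorithms
# rely on: the Euclidean distance between two data points.
# The sample points below are hypothetical, chosen only for illustration.
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

a = (1.0, 2.0)
b = (4.0, 6.0)
print(euclidean(a, b))  # 5.0 (a 3-4-5 right triangle)
```

Any such measure can stand in for "closeness" in the methods discussed below; different algorithms simply plug in different notions of distance.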
Data may be in one of the following forms:
- quantitative: referring to an amount
- binary: one or zero
- nominal: digits 1 through 9 used as category codes
- ordinal: ordered categories
Clustering analysis is generally divided into different categories: partition-based, hierarchical, density-based, grid-based and model-based techniques.
[Figure: a taxonomy diagram of cluster analysis methods — Partition methods (K-Means), Hierarchical methods (Agglomerative, Divisive), Density-based, Grid-based and Model-based (Expectation) methods.]
Figure 1. Cluster analysis methods
The diagram above shows the different methods for cluster analysis; this paper's primary focus is the hierarchical subset of clustering methodologies.
3.0 Hierarchical Clustering
Hierarchical clustering is a family of two related methods: agglomerative and divisive cluster analysis.
3.1 Hierarchical Clustering Theory
Hierarchical clustering arranges items in the data set by forming a hierarchy based on the distance or similarity between attributes of the objects contained in the data set. A dendrogram is a tree-like graph that represents the hierarchy of objects in a set of data.
Types of dendrograms
Row dendrograms:
A row dendrogram draws the hierarchy of similarity or distance between variables of objects in the data set; rows represent clusters, and the dendrogram shows the specific node to which each row belongs as a result of clustering.
Column dendrograms:
A column dendrogram displays the hierarchy obtained from the measured distance, or from comparing similarity, between variables of objects. An example of (a) a row dendrogram and (b) a nested cluster diagram is shown below.
Figure 3.1.1 An example of a dendrogram and a nested cluster diagram
We can carry out hierarchical clustering in one of two ways: using a dedicated hierarchical clustering tool, or by performing hierarchical clustering on a heat map visualization.
3.2 Clustering Methods
The algorithm starts by calculating distances between all possible pairs of objects, for example by measuring the distance between their centroids. We then use these calculated distances to estimate the distance between the clusters formed from the data during clustering. There are several clustering methods that we can use:
3.2.1 Agglomerative clustering:
This strategy combines groups through a bottom-up approach. At the start of the analysis, each row is an individual group. Through an iterative process, individual groups are merged to form progressively larger groups until all of them are combined into one big cluster.
3.2.2 Divisive clustering:
The divisive method is the exact opposite of the agglomerative approach and takes a top-down route. With divisive cluster analysis, we consider all objects or data points and put them all together into one huge cluster to begin the procedure. Repeated iterations of splitting the data produce smaller and smaller clusters until a stage is reached where each group contains a single object.
3.3 DISTANCE METRICS
We should remember that distances are a measure of the dissimilarity or closeness between two data objects in the data set. Hierarchical clustering routines work by ordering the distances from one cluster to its neighboring clusters. Below we briefly examine the most commonly used distance metrics, which we also refer to as linkage metrics.
a. Unweighted Pair Group Method with Arithmetic Mean
Suppose we split our dataset into three clusters, identified as C_{1}, C_{2} and C_{3} and made up of n_{1}, n_{2} and n_{3} rows/columns respectively. Clusters C_{2} and C_{3} are merged to form a new cluster that we name C_{4}.

We can then calculate the distance between clusters C_{1} and C_{4} as follows:

distance_{C1,C4} = a × distance_{C1,C2} + b × distance_{C1,C3}

where

a = n_{2} / (n_{2} + n_{3})
b = n_{3} / (n_{2} + n_{3})
b. Weighted Pair Group Method with Arithmetic Mean
For this case, let us consider three clusters C_{1}, C_{2} and C_{3}, which contain n_{1}, n_{2} and n_{3} rows respectively. Clusters C_{2} and C_{3} are combined to form a larger cluster that we name C_{4}.

We calculate the distance between cluster C_{1} and the new cluster C_{4} as follows:

distance_{C1,C4} = ½ × (distance_{C1,C2} + distance_{C1,C3})
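The update rules in (a) and (b) can be sketched side by side; the distances and cluster sizes below are hypothetical, chosen only for illustration:

```python
# Sketch of the UPGMA and WPGMA distance-update rules described above.
# d_c1c2 and d_c1c3 are the known distances from C1 to C2 and to C3;
# n2 and n3 are the sizes of C2 and C3, which merge into C4.

def upgma_update(d_c1c2, d_c1c3, n2, n3):
    # Unweighted: every original row contributes equally, so the merged
    # distance is a size-weighted average of the two old distances.
    a = n2 / (n2 + n3)
    b = n3 / (n2 + n3)
    return a * d_c1c2 + b * d_c1c3

def wpgma_update(d_c1c2, d_c1c3):
    # Weighted: the two merged clusters contribute equally,
    # regardless of how many rows each contains.
    return 0.5 * (d_c1c2 + d_c1c3)

# Hypothetical distances: C1-C2 = 10, C1-C3 = 20, with |C2| = 3, |C3| = 1.
print(upgma_update(10, 20, 3, 1))  # 12.5
print(wpgma_update(10, 20))        # 15.0
```

With equal cluster sizes (n_{2} = n_{3}) the two rules coincide; they diverge as the merged clusters become unbalanced.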
c. Single linkage:
Single linkage is a distance measure also known as the nearest-neighbor approach. We consider it the simplest of its counterparts for computing relative distances between clusters: all pairwise distances are computed, and the smallest distance is taken as the cluster distance.
Figure 3.3.1 Single linkage
Distance(Cluster 1, Cluster 2) = d(B, M), where Cluster 1 = {A, B} and Cluster 2 = {M, O, T, K}
d. Complete Linkage:
This distance measure is also called the farthest-neighbor approach. It calculates the greatest relative distance between one cluster and the next: all pairwise distances are computed, and the greatest distance is taken as the cluster distance.
Figure 3.3.2 Complete linkage
Distance(Cluster 1, Cluster 2) = d(E, A), where Cluster 1 = {M, A, K} and Cluster 2 = {E, L, P}
e. Centroid:
To calculate the relative separation between a cluster and its neighbor using the centroid cluster analysis approach, we determine the distance between their centroids. Generally the squared Euclidean distance between the centroids is used.
Figure 3.3.3 Centroid
f. Average link:
The average link calculation computes the average distance over all pairs of objects, one from each cluster. Hence, if there are x objects in cluster 1 and y objects in cluster 2, we have x·y distances to compute and sum; the final step is to divide by x·y.
Figure 3.3.4 Average link distance
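The four linkage metrics in (c)–(f) can be sketched as small functions over two hypothetical clusters of 2-D points:

```python
# Sketch of the single, complete, average and centroid linkage metrics
# described above, computed over two small hypothetical 2-D clusters.
import math

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_link(c1, c2):      # smallest pairwise distance
    return min(dist(p, q) for p in c1 for q in c2)

def complete_link(c1, c2):    # largest pairwise distance
    return max(dist(p, q) for p in c1 for q in c2)

def average_link(c1, c2):     # sum of all x*y pairwise distances / (x*y)
    return sum(dist(p, q) for p in c1 for q in c2) / (len(c1) * len(c2))

def centroid_link(c1, c2):    # distance between the two cluster centroids
    cen = lambda c: tuple(sum(v) / len(c) for v in zip(*c))
    return dist(cen(c1), cen(c2))

c1 = [(0.0, 0.0), (0.0, 2.0)]
c2 = [(3.0, 0.0), (3.0, 2.0)]
print(single_link(c1, c2))    # 3.0
print(complete_link(c1, c2))  # ~3.606 (sqrt(13))
print(average_link(c1, c2))   # ~3.303
print(centroid_link(c1, c2))  # 3.0
```

For this symmetric example the single-link and centroid distances coincide at 3.0, while complete and average linkage are pulled upward by the diagonal pairs.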
4.0 Hierarchical Clustering method algorithms

4.1 Agglomerative Method
The agglomerative strategy is a simple bottom-up methodology for cluster analysis, as explained earlier, and involves these steps:

- The agglomerative method operates by successively merging clusters.
- It allocates each data point to a cluster of its own; in the beginning, the number of clusters is as many as the number of objects, N.
- At each successive stage, it merges the two most similar groups to form a new cluster.
- It continues this process until a stopping point is reached (as the number of clusters reduces) and all subgroups are combined into a single group.
- The final result of this approach is displayed as a dendrogram tree.
We will explain this process with the example below:
The data set represents six cities; the distance between any two cities is given in the cell where their row and column cross. We use the single link method.

 
          GOA   MUMBAI  PUNE  DEL   BANGLORE  SURAT
DEL       670   900     400   0     255       1000
GOA       0     300     270   670   470       400
MUMBAI    300   0       560   900   750       140
BANGLORE  470   750     220   255   0         870
PUNE      270   560     0     400   220       670
SURAT     400   140     670   1000  870       0

Table 4.1.1 Cities and distances in between
The distance from MUMBAI to SURAT, 140, is the shortest between any two cities, so SURAT and MUMBAI are combined to form the "SURAT/MUMBAI" group. We add the new group SURAT/MUMBAI to the table:

 
              GOA   SURAT/MUMBAI  PUNE  DEL   BANGLORE
DEL           670   900           400   0     255
GOA           0     300           270   670   470
SURAT/MUMBAI  300   0             560   900   750
BANGLORE      470   750           220   255   0
PUNE          270   560           0     400   220

Table 4.1.2 1st Iteration
This iteration results in BANGLORE and PUNE having the shortest distance, 220:

min d(i,j) = d(C_{a}, C_{b}) = 220, where C_{a} is BANGLORE and C_{b} is PUNE.

As a result, BANGLORE and PUNE are merged into a new cluster that we name PUNE/BANGLORE.

 
               SURAT/MUMBAI  DEL   GOA   PUNE/BANGLORE
PUNE/BANGLORE  560           255   270   0
GOA            300           670   0     270
SURAT/MUMBAI   0             900   300   560

Table 4.1.3 2nd Iteration
DEL to PUNE/BANGLORE is now the shortest distance between two clusters:

min d(i,j) = d(C1, C2) = 255, where C1 is DEL and C2 is PUNE/BANGLORE.

DEL is merged with PUNE/BANGLORE into the new cluster DEL/PUNE/BANGLORE.
  
                   DEL/PUNE/BANGLORE  GOA   SURAT/MUMBAI
SURAT/MUMBAI       560                300   0
DEL/PUNE/BANGLORE  0                  270   560
GOA                270                0     300

Table 4.1.4 3rd Iteration
At this point the shortest distance, 270, is between GOA and DEL/PUNE/BANGLORE, so these two clusters are merged into DEL/GOA/BANGLORE/PUNE.
  
                       SURAT/MUMBAI  DEL/GOA/BANGLORE/PUNE
SURAT/MUMBAI           0             300
DEL/GOA/BANGLORE/PUNE  300           0

Table 4.1.5 4th Iteration
Now we merge the two remaining clusters, at distance 300, into a single group. The result is shown graphically in the following hierarchical tree:
Figure 4.1.1 Dendrogram
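The whole agglomerative walkthrough above can be replayed in code. The sketch below is a naive O(n^3) single-link implementation (not an optimized library routine) run on the distances from Table 4.1.1:

```python
# Replays the single-link agglomerative example above on the city
# distance matrix from Table 4.1.1, printing each merge as it happens.
cities = ["DEL", "GOA", "MUMBAI", "BANGLORE", "PUNE", "SURAT"]
D = {
    ("DEL", "GOA"): 670, ("DEL", "MUMBAI"): 900, ("DEL", "BANGLORE"): 255,
    ("DEL", "PUNE"): 400, ("DEL", "SURAT"): 1000,
    ("GOA", "MUMBAI"): 300, ("GOA", "BANGLORE"): 470,
    ("GOA", "PUNE"): 270, ("GOA", "SURAT"): 400,
    ("MUMBAI", "BANGLORE"): 750, ("MUMBAI", "PUNE"): 560,
    ("MUMBAI", "SURAT"): 140,
    ("BANGLORE", "PUNE"): 220, ("BANGLORE", "SURAT"): 870,
    ("PUNE", "SURAT"): 670,
}

def d(a, b):  # order-insensitive lookup of the city-to-city distance
    return D[(a, b)] if (a, b) in D else D[(b, a)]

def single_link(c1, c2):  # smallest distance between any two members
    return min(d(a, b) for a in c1 for b in c2)

clusters = [frozenset([c]) for c in cities]
merges = []
while len(clusters) > 1:
    # find the closest pair of clusters and merge them
    i, j = min(
        ((i, j) for i in range(len(clusters))
                for j in range(i + 1, len(clusters))),
        key=lambda p: single_link(clusters[p[0]], clusters[p[1]]),
    )
    dist = single_link(clusters[i], clusters[j])
    merged = clusters[i] | clusters[j]
    merges.append((dist, sorted(merged)))
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

for dist, members in merges:
    print(dist, "/".join(members))
```

It prints the merges at distances 140, 220, 255, 270 and 300, matching the merge order derived in Tables 4.1.1–4.1.5 above.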
4.2 Divisive Method
The divisive method is just the opposite of the agglomerative method. It works by successive splitting, dividing larger non-homogeneous groups into smaller clusters. There are two variants:
- Monothetic: splits a cluster by comparing/calculating similarity on one attribute at a time.
- Polythetic: splits a cluster based on a comparison of all attributes at once.
Step 1. Start with a single group (one single cluster).
Step 2. Divide the first group into two groups such that the objects in one subgroup are as far as possible from the objects in the other group.
Step 3. Continue until there are N groups, each with a single object.
Step 4. Display the results on a dendrogram tree.
The splitting process can select the next cluster to split using any one of the following criteria:
- the group with the biggest variation within it
- the group having the largest number of rows
- the group having a sequential arrangement (if any)
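Step 2 above can be sketched as a simplified polythetic split: seed the two subgroups with the farthest pair of objects, then assign every other object to the nearer seed. This seeding rule is an illustrative simplification, not the exact splinter-group procedure of divisive algorithms such as DIANA, and the sample points are hypothetical:

```python
# A simplified sketch of one polythetic divisive split: seed the two
# sub-clusters with the farthest pair of objects, then assign every
# remaining object to the nearer seed.
import math

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def split(cluster):
    # pick the two objects that are farthest apart as seeds
    s1, s2 = max(
        ((p, q) for i, p in enumerate(cluster) for q in cluster[i + 1:]),
        key=lambda pair: dist(*pair),
    )
    g1, g2 = [s1], [s2]
    for p in cluster:
        if p in (s1, s2):
            continue
        (g1 if dist(p, s1) <= dist(p, s2) else g2).append(p)
    return g1, g2

# Hypothetical 2-D data forming two obvious groups.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2)]
left, right = split(data)
print(sorted(left))   # the three points near the origin
print(sorted(right))  # the two points near (5, 5)
```

Applied recursively to the larger of the two resulting groups (per one of the criteria above), this yields the top-down hierarchy that divisive clustering produces.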
5.0 Uses Of Hierarchical Clustering
People and businesses use hierarchical cluster analysis in many application domains and fields, such as banking, universities, marketing campaigns, medicine, archaeological studies, finance and astronomy.

- Marketing: Market segmentation and profiling can help marketers carry out selective marketing or target specific market groups for effective use of an organization's resources.
- Land use: GIS software and agricultural analysts utilize clustering methodologies to create soil profiles, land characteristics and other properties that they then use in their planning activities.
- Insurance: Insurance companies create profiles of target groups of policyholders with specific needs. The generated clusters can be used to suggest other beneficial services to clients.
- City planning: Clustering can be used to identify groups of houses, buildings or structures. This kind of information is very helpful in city planning.
5.1 Strengths of Hierarchical Clustering
- The number of clusters is not a major concern: it does not have to be specified in advance.
- Meaningful taxonomies/classifications may be represented by the resulting hierarchy.
5.2 Weaknesses of Hierarchical Clustering:
- Greedy: once a decision to join two groups is made, it cannot be undone.
- No "global objective function" is directly minimized.
- Depending on the approach, a clustering algorithm may be limited with regard to:
  - sensitivity to noise
  - poor handling of clusters of different sizes and non-globular shapes
  - breaking big clusters, or chaining them together
CONCLUSIONS
In this paper, we examined distinct hierarchical techniques as a component of cluster analysis. In hierarchical techniques, we saw that there is no compelling reason to specify the total number of clusters in advance. These methods are conceptually simple and easy to implement. However, there are certain drawbacks to pay attention to when applying them: the time taken to complete a hierarchical clustering can be shown to be O(n^{3}); the distance matrix requires O(n^{2}) space, which becomes enormous for a large set of objects; and a merge or split carried out at one step cannot be undone.
REFERENCES
Aghagolzadeh, M., Soltanian-Zadeh, H. & Araabi, B. N. 2011. Information Theoretic Hierarchical Clustering. Entropy, vol. 13, 450–465.
Chaudhuri, K. & McGregor, A. 2008. Finding metric structure in information theoretic clustering. In Conference on Learning Theory (COLT), Helsinki, Finland, July 2008.
Davis, J. V., Kulis, B., Jain, P., Sra, S. & Dhillon, I. S. 2007. Information-theoretic metric learning. In Proceedings of the 24th International Conference on Machine Learning (ICML '07), New York, 209–216.
Gokcay, E. & Principe, J. C. 2002. Information theoretic clustering. IEEE Trans. Pattern Anal. Mach. Intell., 24, 158–171.
Gupta, G. K. 2012. Data Mining with Case Studies. PHI Learning Private Limited, New Delhi.
Kraskov, A. & Grassberger, P. 2008. MIC: Mutual information based hierarchical clustering. In Information Theory and Statistical Learning. Springer: New York, NY, USA, 101–123.
Nilsson, Nils J. 1998. Introduction to Machine Learning.
Parzen, E. 1962. On estimation of a probability density function and mode. Ann. Math. Statist., 33, 1065–1076.
Principe, J. C., Xu, D. & Fisher, J. 2000. Information theoretic learning. In Unsupervised Adaptive Filtering, Haykin, S., Ed. Wiley: New York, NY, USA, 265–319.
Roweis, S. & Ghahramani, Z. 1999. A unifying review of linear Gaussian models. Neural Comput., 11, 305–345.
Tan, P.-N., Kumar, V. & Steinbach, M. 2012. Introduction to Data Mining. Pearson Education, New Delhi.
Vercellis, C. 2009. Business Intelligence. John Wiley & Sons Ltd., New Delhi.
Zhou, X., Wang, X., Dougherty, E. R., Russ, D. & Suh, E. 2004. Gene clustering based on cluster-wide mutual information. J. Comput. Biol., 11, 147–161.