HIERARCHICAL CLUSTERING


Research Paper-2 (H6016)

Prepared By

Debashis Rout

ITB-B00066133

Table of Contents

1.0 ABSTRACT

1.1 Problem statement

2.0 Introduction to Cluster Analysis

3.0 Hierarchical Clustering

3.1 Hierarchical Clustering Theory

Types of dendrograms

3.2 Clustering Methods

3.2.1 Agglomerative clustering

3.2.2 Divisive clustering

3.3 DISTANCE METRICS

a. Unweighted Pair-Group Method with Arithmetic mean (UPGMA)

b. Weighted Pair-Group Method with Arithmetic mean (WPGMA)

c. Single linkage

d. Complete Linkage

e. Centroid

f. Average-link

4.0 Hierarchical Clustering method algorithms

4.1 Agglomerative Method

4.2 Divisive Method

5.0 Uses Of Hierarchical Clustering

5.1 Strengths of Hierarchical Clustering

5.2 Weaknesses of Hierarchical Clustering

CONCLUSIONS

REFERENCES

1.0 ABSTRACT

The hierarchical clustering approach is one of the cluster analysis methods broadly utilized in diverse domains. The result of this strategy forms a tree structure, which lets us find clusters at different levels of granularity. In this paper, we discuss hierarchical cluster analysis approaches and their respective algorithms. Additionally, we examine current developments and their use in society.

1.1 Problem statement

Humongous volumes of data are being exchanged over networks and the internet in general; without clustering/cluster analysis methodologies it would be very difficult to group multivariate objects into meaningful groups of associated data. If we learn the variables within a data set, we can group the objects into homogeneous clusters. Cluster analysis methodologies are used in many application domains, and the development of clustering algorithms for use in artificial intelligence has proven promising.

In this paper we imagine we have a set of data objects contained in one dataset for analysis. In our analysis of this data we consider data points, or groups of data, that have no class labels (unlabeled data). This situation is not uncommon in real-world databases and datasets because assigning class labels to individual tuples can be quite costly and time-consuming. With this hypothetical dataset, our dilemma is to group these objects into naturally occurring groups of data with similar traits, thereby clustering the data into meaningful groups that we can then use.

This paper outlines the requirements of clustering methods for data sets and databases. We also explain how to compute dissimilarities between objects by observing multiple attributes/variables of the objects in question, and we give an overview of the hierarchical clustering technique, which is the primary focus of this paper.

Keywords: hierarchical clustering, heat maps, data mining, linkage metrics, agglomerative method, divisive method, dendrogram, Euclidean distance

2.0 Introduction to Cluster Analysis

Cluster analysis can be used to describe a wide variety of procedures that attempt to reorganize a large non-homogeneous data set into smaller homogeneous groups using clustering algorithms.

The cluster analysis method is an unsupervised learning procedure for segmenting a large heterogeneous data set into smaller natural groupings. Clustering algorithms work by identifying the objects with the largest similarity between them and grouping those objects together. Each group contains objects with similar characteristics, and we can also say that each data point falls under a meaningful group. Cluster analysis algorithms evaluate important attributes of a cluster and calculate the proximity measure of objects in the cluster. Different cluster analysis algorithms have distinctive notions of closeness.

Data may be in one of the following forms:

  • quantitative: numeric values referring to amounts

  • binary: one or zero

  • nominal: categorical codes (e.g., digits 1 through 9)

  • ordinal: ordered categories.

Cluster analysis is, for the most part, divided into different categories such as partition, hierarchical, density, grid, and model-based techniques.

Figure 1. Cluster analysis methods: Partition (K-Means), Hierarchical (Agglomerative, Divisive), Density-Based, Grid-Based, and Model-Based (Expectation) techniques

The diagram above shows the different methods for cluster analysis; this paper's primary focus is the hierarchical subset of clustering methodologies.

3.0 Hierarchical Clustering

The hierarchical clustering method is a family of two different approaches: agglomerative and divisive cluster analysis.

3.1 Hierarchical Clustering Theory

Hierarchical clustering arranges the items in the data set by forming a hierarchy based on the distance or similarity between attributes of the objects contained in the data set. A dendrogram is a tree-like structured graph that represents the hierarchy the objects take in a set of data.

Types of dendrograms

Row dendrograms:

A row dendrogram draws the hierarchy of similarity or distance between the variables of objects in the data set; the rows represent clusters, and the dendrogram shows the specific node to which each row belongs as a result of clustering.

Column dendrograms:

A column dendrogram displays the hierarchy derived from the measured distance, or from comparing similarity, between the variables of objects. An example of (a) a row dendrogram and (b) a nested cluster diagram is shown below.

Figure 3.1.1 An example of a dendrogram and a nested cluster diagram

We can carry out hierarchical clustering in one of two ways: using a hierarchical clustering tool, or by performing hierarchical clustering on a heat map visualization.
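To make the second option concrete, the following is a minimal sketch of hierarchical clustering attached to a heat map visualization, assuming the pandas, seaborn, and matplotlib libraries are installed; the data frame, object names, and output file name are purely illustrative.

import pandas as pd
import seaborn as sns

# Hypothetical measurements: rows are objects, columns are attributes.
data = pd.DataFrame(
    {"attr1": [1.0, 1.2, 8.0, 7.9], "attr2": [2.0, 2.1, 9.0, 8.7]},
    index=["obj_a", "obj_b", "obj_c", "obj_d"],
)

# clustermap draws a heat map with row and column dendrograms attached,
# clustering rows and columns hierarchically (average linkage, Euclidean
# distance by default).
grid = sns.clustermap(data, method="average", metric="euclidean")
grid.savefig("clustermap.png")

The attached row and column dendrograms are exactly the row and column dendrograms described in Section 3.1.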

3.2 Clustering Methods

The algorithm starts by calculating the distances between all possible pairs of objects, for example by measuring the distance between their centroids. We then use the distances we have calculated to estimate the distance between all remaining clusters that were previously formed from the data during clustering. There are several clustering methods that we can use:

3.2.1 Agglomerative clustering:

This strategy combines groups through a bottom-up approach. When we start the analysis, each row is an individual group. Through an iterative process, individual groups are merged to form progressively larger groups until all of them are combined into one big cluster.

3.2.2 Divisive clustering:

The divisive method is the exact opposite of the agglomerative approach and takes a top-down route. With divisive cluster analysis, we consider all objects or data points and put them all together into one huge cluster to begin the methodology. Repeated iterations of splitting the data produce smaller and smaller clusters until a stage is reached where each cluster contains only one object.
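As a brief illustration of the bottom-up strategy of Section 3.2.1, the sketch below uses scikit-learn's AgglomerativeClustering on a few made-up points; the choice of single linkage and two clusters is arbitrary and only meant to show the interface.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Each point starts as its own cluster; the two closest clusters are merged
# repeatedly until the requested number of clusters remains.
model = AgglomerativeClustering(n_clusters=2, linkage="single")
labels = model.fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1]; the label values themselves are arbitrary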

3.3 DISTANCE METRICS

We should remember that distances are a measure of the difference or closeness between two data objects in the data set. Hierarchical clustering routines work by ordering the distances from one cluster to its neighboring clusters. Below we briefly examine the most commonly used distance measurements, which we also treat as linkage metrics; a short code sketch comparing them follows item (f) below.
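As a concrete starting point, the sketch below computes every pairwise distance between a few illustrative objects, which is the raw material all of the following linkage metrics work from; it assumes NumPy and SciPy are available.

import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three illustrative objects described by two attributes each.
points = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])

# pdist returns the condensed vector of all pairwise Euclidean distances;
# squareform expands it into the full symmetric distance matrix.
condensed = pdist(points, metric="euclidean")
matrix = squareform(condensed)
print(matrix)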

a. Unweighted Pair-Group Method with Arithmetic mean (UPGMA).

Suppose that we can split our dataset into three clusters, which we identify as C1, C2, and C3, made up of n1, n2, and n3 rows/columns respectively. We aggregate clusters C2 and C3 to form a new cluster, and we name this C4.

Now we can calculate the distance between cluster C1 and C4 as follows:

distance(C1, C4) = a × distance(C1, C2) + b × distance(C1, C3)

where

a = n2 / (n2 + n3)

b = n3 / (n2 + n3)
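A small sketch of this unweighted update rule, written as a Python function; the function name and arguments are illustrative.

def upgma_distance(d_c1_c2, d_c1_c3, n2, n3):
    """Distance from C1 to the cluster C4 formed by merging C2 and C3."""
    a = n2 / (n2 + n3)
    b = n3 / (n2 + n3)
    return a * d_c1_c2 + b * d_c1_c3

# Example: C2 has 2 rows at distance 10 from C1, C3 has 3 rows at distance 20,
# so the merged cluster C4 sits at 0.4 * 10 + 0.6 * 20 = 16 from C1.
print(upgma_distance(10.0, 20.0, n2=2, n3=3))  # 16.0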

b. Weighted Pair-Group Method with Arithmetic mean (WPGMA).

For this case, let us consider three clusters, C1, C2, and C3, which contain n1, n2, and n3 rows respectively. We combine clusters C2 and C3 to form a larger cluster that we name C4.

We calculate the distance between cluster C1 and the new cluster C4 as follows:

distance(C1, C4) = ½ × (distance(C1, C2) + distance(C1, C3))
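For comparison with the unweighted rule, here is the same kind of sketch for the weighted update, which simply averages the two distances regardless of cluster sizes; the names are again illustrative.

def wpgma_distance(d_c1_c2, d_c1_c3):
    """Distance from C1 to the cluster C4 formed by merging C2 and C3."""
    return 0.5 * (d_c1_c2 + d_c1_c3)

# With the distances from the UPGMA example, the result is (10 + 20) / 2 = 15,
# independent of how many rows C2 and C3 contain.
print(wpgma_distance(10.0, 20.0))  # 15.0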

c. Single linkage:

Single linkage is a distance-measuring technique that is also known as the nearest-neighbor approach. We consider it the least complex calculation, compared with its counterparts, for computing relative distances between clusters. All pairwise distances between the two clusters are computed, and the smallest distance is identified as the distance between them.

Figure 3.3.1 Single linkage

distance(Cluster 1, Cluster 2) = d(B, M), where Cluster 1 = {A, B} and Cluster 2 = {M, O, T, K}

d. Complete Linkage:

We also call this distance-measuring technique the farthest-neighbor approach. It is a calculation of the greatest relative distance between one cluster and the next. All pairwise distances between the two clusters are computed, and the greatest distance is identified as the distance between them.

Figure 3.3.2 Complete linkage

distance(Cluster 1, Cluster 2) = d(E, A), where Cluster 1 = {M, A, K} and Cluster 2 = {E, L, P}

e. Centroid:

To calculate the separation between a cluster and its neighbor using the centroid cluster analysis approach, we determine the distance between their centroids. Generally the squared Euclidean distance between the centroids is used.

Figure 3.3.3 Centroid

f. Average-link:

The average-link calculation computes the average distance over all pairs of objects, one taken from each cluster. Hence, if there are x objects in cluster 1 and y objects in cluster 2, we have x·y distances to compute and add up; the last step is to divide by x·y.

Figure 3.3.4 Average-link distance
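The sketch below compares the linkage metrics of this section on the same illustrative points using SciPy, which names UPGMA "average", WPGMA "weighted", and the centroid metric "centroid"; the data points are made up.

import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [9.0, 9.0]])

for method in ("single", "complete", "average", "weighted", "centroid"):
    # Each row of Z records one merge: the two cluster indices, the distance
    # at which they were merged, and the size of the new cluster.
    Z = linkage(X, method=method, metric="euclidean")
    print(method, Z[:, 2])  # merge distances under this linkage metric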

4.0 Hierarchical Clustering method algorithms

4.1 Agglomerative Method

The agglomerative strategy is a simple bottom-up methodology for cluster analysis, as explained earlier, and involves these steps:

  • The agglomerative method operates by successively merging clusters.

  • It allocates each data point to a cluster of its own; at the beginning, the number of clusters is as many as the number of objects N.

  • At each successive stage, it merges the two most similar groups to form a new cluster.

  • It continues this process, reducing the number of clusters, until a stopping point is reached or all subgroups are combined into a single group.

  • The final result of this approach is displayed as a dendrogram tree.

We will explain this process with the example below:

The data set represents six cities; the distance between any two cities is provided in the cell where the two cities cross.

We use the single-link method.

            GOA    MUMBAI   PUNE   DEL    BANGALORE   SURAT
DEL         670    900      400    0      255         1000
GOA         0      300      270    670    470         400
MUMBAI      300    0        560    900    750         140
BANGALORE   470    750      220    255    0           870
PUNE        270    560      0      400    220         670
SURAT       400    140      670    1000   870         0

Table 4.1.1 Cities and distances in between

The distance from MUMBAI to SURAT, 140, is the shortest between any two cities, so SURAT and MUMBAI are put together to form the "SURAT/MUMBAI" group.

We add the new group SURAT/MUMBAI to the table:

              GOA    SURAT/MUMBAI   PUNE   DEL    BANGALORE
DEL           670    900            400    0      255
GOA           0      300            270    670    470
SURAT/MUMBAI  300    0              560    900    750
BANGALORE     470    750            220    255    0
PUNE          270    560            0      400    220

Table 4.1.2 1st Iteration

This iteration results in BANGALORE and PUNE having the shortest distance of 220.

min d(i, j) = d(Ca, Cb) = 220

where Ca is BANGALORE and Cb is PUNE

As a result, BANGALORE and PUNE are merged into a new cluster, which we name PUNE/BANGALORE.

                SURAT/MUMBAI   DEL    GOA    PUNE/BANGALORE
PUNE/BANGALORE  560            255    270    0
GOA             300            670    0      270
SURAT/MUMBAI    0              900    300    560

Table 4.1.3 2nd Iteration

DEL to PUNE/BANGALORE is now the shortest distance measurable between two clusters.

min d(i, j) = d(C1, C2) = 255

where C1 is DEL and C2 is PUNE/BANGALORE

DEL is merged with PUNE/BANGALORE into DEL/PUNE/BANGALORE as a new cluster.

                    DEL/PUNE/BANGALORE   GOA    SURAT/MUMBAI
SURAT/MUMBAI        560                  300    0
DEL/PUNE/BANGALORE  0                    270    560
GOA                 270                  0      300

Table 4.1.4 3rd Iteration

In this iteration, GOA is closest to DEL/PUNE/BANGALORE, at a distance of 270, so GOA is merged into that cluster to form DEL/GOA/PUNE/BANGALORE.

                        SURAT/MUMBAI   DEL/GOA/BANGALORE/PUNE
SURAT/MUMBAI            0              300
DEL/GOA/BANGALORE/PUNE  300            0

Table 4.1.5 4th Iteration

Now we create a single group from the two remaining clusters, merging them at a distance of 300.

The result is shown graphically in the following hierarchical tree:

Figure 4.1.1 Dendrogram
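The worked example above can be reproduced with SciPy, which is a convenient way to check the merge order and to draw the dendrogram; this sketch assumes SciPy and matplotlib are installed, and the matrix simply restates Table 4.1.1.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

cities = ["GOA", "MUMBAI", "PUNE", "DEL", "BANGALORE", "SURAT"]
D = np.array([
    [0,    300,  270,  670,  470,  400],    # GOA
    [300,  0,    560,  900,  750,  140],    # MUMBAI
    [270,  560,  0,    400,  220,  670],    # PUNE
    [670,  900,  400,  0,    255,  1000],   # DEL
    [470,  750,  220,  255,  0,    870],    # BANGALORE
    [400,  140,  670,  1000, 870,  0],      # SURAT
], dtype=float)

# Single-link agglomerative clustering on the condensed distance matrix.
Z = linkage(squareform(D), method="single")
print(Z[:, 2])  # merge heights: 140, 220, 255, 270, 300, as in the iterations above

dendrogram(Z, labels=cities)
plt.show()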

4.2 Divisive Method

The divisive method is just the opposite of the agglomerative method. It works by successive splitting: we split larger non-homogeneous groups into smaller clusters. Two variants are commonly distinguished:

  • Monothetic: this algorithm splits a cluster by comparing/calculating the similarity of one attribute at a time.

  • Polythetic: this algorithm splits a cluster based on a comparison of all attributes at once.

Step 1. Start with a single group (one single cluster).

Step 2. Divide the first group into two groups such that the objects in one subgroup are as far as possible from the objects in the other group.

Step 3. Continue until there are N groups, each with a single object.

Step 4. Display the results on a dendrogram tree.

At each step, the splitting process can choose which cluster to split next using any one of the following approaches (a rough code sketch follows this list):

The group with the biggest variation (variance) inside it

The group having the largest number of rows

The group having a successive arrangement (if any).
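Since common libraries do not ship a classical divisive (DIANA) routine, the rough sketch below approximates the top-down procedure with repeated two-way k-means splits, always splitting the cluster with the largest number of rows; the helper function and the data are illustrative, and the k-means split stands in for a true "as far apart as possible" split.

import numpy as np
from sklearn.cluster import KMeans

def divisive_split(X, n_clusters=3, random_state=0):
    """Split the data top-down until n_clusters groups remain."""
    clusters = [np.arange(len(X))]        # Step 1: one cluster holding everything
    while len(clusters) < n_clusters:     # Step 3: stop at the target count
        # Pick the cluster with the largest number of rows to split next.
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)
        # Step 2: split it into two subgroups (k-means approximates a
        # well-separated split).
        labels = KMeans(n_clusters=2, n_init=10,
                        random_state=random_state).fit_predict(X[members])
        clusters.append(members[labels == 0])
        clusters.append(members[labels == 1])
    return clusters                        # Step 4 would plot these on a dendrogram

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [9, 9], [9, 10.0]])
for group in divisive_split(X, n_clusters=3):
    print(X[group])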

5.0 Uses Of Hierarchical Clustering

People and businesses use hierarchical cluster analysis in many application domains and fields such as banking, universities, marketing campaigns, medicine, archaeological study, finance, astronomy, etc.

  • Marketing: Market segmentation and profiling can help marketers carry out selective marketing or target specific market groups for effective use of an organization's resources.

  • Land use: GIS software and agricultural analysts utilize clustering methodologies to create soil profiles, land characteristics, and other properties that they then use in their planning activities.

  • Insurance: Insurance companies create profiles of target groups of policyholders with their specific needs. The clusters generated can be used to suggest to clients other services that may be beneficial to them.

  • City planning: Clustering can be used to identify groups/clusters of houses, buildings, or structures. This kind of information is very helpful in city planning.

5.1 Strengths of Hierarchical Clustering

Strengths of hierarchical clustering methodologies include:

  • The number of clusters is not of major concern and does not have to be specified in advance.

  • Taxonomies/classifications may be represented meaningfully as a dendrogram.

5.2 Weaknesses of Hierarchical Clustering

  • Greedy: once a decision has been made to join two groups, it cannot be undone.

  • No "global objective function" is directly minimized.

  • Depending on the linkage approach used, a clustering algorithm may be limited with regard to:

Noise sensitivity

Handling clusters of different sizes and non-convex (curved) shapes

Breaking large clusters or chaining them together

CONCLUSIONS

In this paper, we examined the different hierarchical techniques used as a component of cluster analysis. In hierarchical methods, we observed that there is no compelling reason to specify the total number of clusters in advance. These methods are conceptually simple, and we can implement them without much effort. However, there are certain disadvantages that we have to pay attention to when using them. The time taken to complete a hierarchical clustering can be shown to be O(n^3). Additionally, the distance matrix requires O(n^2) space and becomes enormous for a large set of objects. Likewise, a merge or split carried out at one step cannot be undone later.

REFERENCES

Aghagolzadeh, Mehdi & Soltanian-Zadeh, Hamid & Araabi, Babak Nadjar 2011. Information Theoretic Hierarchical Clustering, Entropy, vol. 13, 450-465.

Chaudhuri, K. & McGregor, A. Finding metric structure in information theoretic clustering. In Conference on Learning Theory, COLT, Helsinki, Finland, July 2008.

Davis, J.V., Kulis, B., Jain, P., Sra, S. & Dhillon, I.S. Information-theoretic metric learning. In Proceedings of the 24th International Conference on Machine Learning (ICML '07), New York, 209-216.

Gokcay, E. & Principe, J.C. Information theoretic clustering. IEEE Trans. Patt. Anal. Mach. Int. 2002, 24, 158-171.

Gupta, G K 2012. Data Mining with Case Studies, PHI Learning Private Limited, New Delhi.

Kraskov, A. & Grassberger, P. MIC: mutual information based hierarchical clustering. In Information Theory and Statistical Learning, Springer: New York, NY, USA, 2008, 101-123.

Nilsson, Nils J. 1998. Introduction to Machine Learning.

Parzen, E. On estimation of a probability density function and mode. Ann. Math. Statist. 1962, 33, 1065-1076.

Principe, J.C., Xu, D. & Fisher, J. Information theoretic learning. In Unsupervised Adaptive Filtering, Haykin, S., Ed. Wiley: New York, NY, USA, 2000, 265-319.

Roweis, S. & Ghahramani, Z. A unifying review of linear Gaussian models. Neural Comput. 1999, 11, 305-345.

Tan, Pang-Ning & Kumar, Vipin & Steinbach, Michael 2012. Introduction to Data Mining, Pearson Education, New Delhi.

Vercellis, Carlo 2009. Business Intelligence, John Wiley & Sons Ltd., New Delhi.

Zhou, X., Wang, X., Dougherty, E.R., Russ, D. & Suh, E. Gene clustering based on clusterwide mutual information. J. Comput. Biol. 2004, 11, 147-161.