Data Warehousing and Data Mining



The importance of technology cannot be overstated as far as enhancing the speed and quality of services is concerned. Indeed, contemporary society has seen the advent of numerous technologies and innovations that enhance service provision. At the heart of these innovations are computers, which primarily undertake the collection, storage and dissemination of data when required (Boorman, 2011). As a result, these technological innovations have been incorporated into almost every facet of human society. However, there have been immense concerns regarding the capability of managers on varied fronts to use the available technology. This is the main concern with Big Data.

The term Big Data is a collective term describing the exponential growth and availability of both structured and unstructured data. Big Data underlines three specific aspects: the variety, velocity and volume of data. Recent times have seen an increase in the volume of data, including growing amounts of sensor and machine-to-machine data. This, however, introduces the problem of how to determine the relevance of large data volumes, and how analytics may be used to create value from the relevant data. Similarly, data today comes in numerous formats, including structured data, unstructured data, audio, video and stock ticker data, among others (LaValle et al., 2010). Managing, merging and governing these varying formats is a daunting task for a large number of organizations. On the same note, organizations are always grappling with the need to react fast enough to deal with data velocity.

There are numerous challenges that managers face in the age of Big Data. First, they are required to meet speed requirements. Companies not only need to gather and analyze data, they also have to find it fast. Visualization allows organizations to undertake analysis and make decisions at a faster pace, yet managers have to grapple with the need to go through immense volumes of data and reach the necessary level of detail at high speed. Secondly, managers are required to comprehend the data that they are presenting before they can use visualization as a component of data analysis. This necessitates an understanding of context, without which visualization tools are likely to have less value to the user (LaValle et al., 2010). Third, managers face the challenge of ensuring that the data they present is of good quality. This usually involves ensuring the accuracy and timeliness of data. Quality is a challenge for any data analysis, and it is compounded by the volume of information in Big Data. The quality of data determines the value of data visualization in the long term (Webster, 2011). On the same note, managers have to grapple with the need to exhibit meaningful results. Plotting points on bars or graphs for analysis may be difficult when one is dealing with immensely large amounts and categories of information (Webster, 2011). Lastly, managers often have to confront the problem of outliers. Outliers and trends can be communicated far faster in graphical representations than in tables of numbers and text, and users are more likely to spot issues that need rectifying with a simple glance at a chart. More often than not, however, outliers represent between 1 and 5 percent of data, and such small percentages are difficult to see when one is working with immense volumes of data.

To meet these challenges, managers need to make some fundamental decisions. First, managers must put information management and data governance processes in place so as to ensure the cleanliness of data (Nagabhushana, 2006). Such a management process would ensure that the data used in creating any visualization is accurate and is provided in a timely manner so as to safeguard its relevance. This is the only way that the quality of data can be assured in both the short term and the long term (Nagabhushana, 2006).
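
As an illustration of what one such governance check might look like, the following is a minimal Python sketch, not a prescribed process. It assumes that records arrive as dictionaries, treats completeness of required fields as a rough proxy for accuracy, and treats a record as timely if it was updated within a chosen window. The field names, the one-day window and the function name `split_by_quality` are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical field names and freshness window, chosen only for illustration.
REQUIRED_FIELDS = ("customer_id", "amount", "updated_at")
MAX_AGE = timedelta(days=1)


def split_by_quality(records, now, required_fields=REQUIRED_FIELDS, max_age=MAX_AGE):
    """Separate records that are complete and recent from those that are not."""
    clean, rejected = [], []
    for record in records:
        # Completeness (a rough proxy for accuracy): every required field
        # must be present and non-empty.
        complete = all(record.get(f) not in (None, "") for f in required_fields)
        # Timeliness: the record must have been updated within max_age.
        updated = record.get("updated_at")
        timely = updated is not None and now - updated <= max_age
        (clean if complete and timely else rejected).append(record)
    return clean, rejected


now = datetime(2014, 9, 22, 12, 0)
records = [
    {"customer_id": 1, "amount": 40.0, "updated_at": datetime(2014, 9, 22, 9, 0)},
    {"customer_id": 2, "amount": None, "updated_at": datetime(2014, 9, 22, 9, 0)},  # incomplete
    {"customer_id": 3, "amount": 15.5, "updated_at": datetime(2014, 9, 1, 9, 0)},   # stale
]
clean, rejected = split_by_quality(records, now)
print(len(clean), "clean;", len(rejected), "rejected")
```

Only the records that pass both checks would feed a visualization; the rejected ones would go back to whoever owns the data source for correction.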

In addition, it is imperative that managers ensure that they have the appropriate hardware and equipment to tackle the need for speed. Some vendors use powerful parallel processing and increased memory to crunch large volumes of data quickly (Wang, 2008). Similarly, managers can hold data in memory using a grid-computing approach, in which a large number of machines are used to solve a problem. This would allow managers to explore immense volumes of data and obtain business insights in real time (Nagabhushana, 2006).

With regard to comprehension of the immense volumes of data, it is imperative that managers establish proper domain expertise. This would involve ensuring that the individuals who carry out data analysis have a deep understanding of the sources of data and of the audience that will be consuming it, as well as how this audience would interpret the information (Wang, 2008). In the case of outliers, managers may bin the results so as to allow both the distribution and the outliers to be seen. As much as outliers may not be representative of the data, they have the capacity to reveal potentially valuable and previously unseen insights (Nagabhushana, 2006).
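
The binning idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a method taken from the cited sources: the bin width and the sample values are invented, and the rule used to flag outliers (values more than 1.5 interquartile ranges beyond the quartiles, often called Tukey's fences) is one common convention chosen here for concreteness.

```python
import statistics
from collections import Counter


def bin_values(values, bin_width):
    """Count how many values fall into each fixed-width bin, keyed by the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def flag_outliers(values, k=1.5):
    """Return values lying more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Invented sample: nine typical readings and one extreme one.
readings = [10, 12, 11, 13, 12, 14, 11, 13, 12, 95]

bins = bin_values(readings, bin_width=10)
outliers = flag_outliers(readings)

# A crude text histogram: the bulk of the data and the lone extreme bin are both visible.
for lower_edge, count in bins.items():
    print(f"{lower_edge:>3}-{lower_edge + 9:<3} {'#' * count}")
print("outliers:", outliers)
```

Summarizing by bin keeps the overall shape of the distribution readable at any volume, while the separate outlier list ensures that the small share of unusual values is not lost in the summary.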


Boorman, C. (2011). Why Data Mining Is the Next Frontier for Social Media Marketing. Mashable Business. Retrieved 22 September 2014, from

LaValle, S., Lesser, E., Shockley, R., Hopkins, M. and Kruschwitz, N. (2010). Big Data, Analytics and the Path From Insights to Value. MIT Sloan Management Review. December. Retrieved 22 September 2014, from

Nagabhushana, S. (2006). Data warehousing: OLAP and data mining. New Delhi, India: New Age International.

Wang, J. (2008). Data warehousing and mining: Concepts, methodologies, tools, and applications. Hershey, PA: Information Science Reference.

Webster, J. (2011). Understanding Big Data. Retrieved 22 September 2014, from