Tectonic Shifts and Disruptive Changes in Business Analytics

Fig 1: Dominant Themes in Data Mining
In this paper we review select critical changes in the business environment and how business analytics has addressed them. We also identify the areas where the response has been inadequate and requires urgent attention. The paper starts by defining and scoping key terms, followed by a review of recent developments.

Big Data Deluge
The data avalanche has led to prefixing the term Big to Data itself, creating a new term, Big Data. What has happened all of a sudden to create such massive amounts of data? It is not because of any abrupt change in business activities. A number of factors, such as cheap commodity storage, data captured by machines or sensors, and user-generated data, have resulted in this data deluge.

Fig 2: Sources of Digitized Data
Clicking photos, creating videos and recording audio have become extremely easy. Blogs, forums, social media sites and e-communities generate humongous amounts of text, images, videos and other forms of data. An increasing number of devices with GPS relay their geographical locations to a central server. Malls, airports, parking lots, toll gates on highways, condominiums, offices etc. are monitored using CCTV cameras, leaving a humongous video trail which can be used for monitoring activities to enhance security or for studying changing behavioral patterns. Earlier, log data was used for monitoring purposes only and was discarded if no anomaly was observed. Activities on the internet, including emails, blogging, gaming, web browsing, watching videos and listening to music, all create data which can be used for drawing deeper insights into the behaviors, preferences and opinions of people. According to statistics available on the YouTube website, 100 hours of video are uploaded to YouTube every minute. Every minute, more than three million likes and three million pieces of content are shared on Facebook. The integral presence of cameras on mobile phones, with click-and-shoot capture and the option to back up to the cloud and share on social media, has created massive growth in personal videos and photos. Gadgets are no longer limited to external devices but are now wearable and even ingestible. There are headbands collecting information on brain activity, wristbands measuring physical activity levels, and smart pills that, once ingested, relay vital information on sleep, respiration, heartbeat etc. These are some examples of the data sources which have resulted in the data explosion. In fact, micro- and nano-electronic devices are increasingly used in every sphere of life, generating tremendous amounts of data. This ubiquitous data is brought to the fingertips by cloud infrastructure, and is now subjected to analysis for drawing insights.

Data Management
The data thus collected was not only large in volume but also came in great variety and was created at very high speed. This posed challenges in terms of the structure and format of storage, scalability, and the ability to provide timely search and querying. The volume of data was addressed by the creation of Hadoop, which builds on the MapReduce model (Dean and Ghemawat, 2004), and Sector/Sphere (Gu and Grossman, 2009), which provided the necessary scaling and persistence of data. While the RDBMS ruled the last two decades as a storage format that economized on storage capacity, Hadoop economized on availability and agility, allowing huge storage with timely response. The variety of data, together with not knowing in advance how it would be analyzed, led to storage in unstructured and semi-structured formats. NoSQL formats beyond the RDBMS, such as column stores, document stores, key-value stores and graph databases, came into existence to meet the new requirements. The data sources are so widely spread geographically that cloud storage became a necessity. An entire collection, storage and access infrastructure evolved over the cloud to meet the new requirements. Today there is more data available for analysis than there are skills and techniques available to analyze it.
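To make the scaling idea concrete, the MapReduce programming model described by Dean and Ghemawat (2004), which Hadoop implements, can be sketched in a few lines of Python. This is only a single-process illustration under simplifying assumptions: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and are not part of any Hadoop API, and a real cluster would distribute the map and reduce calls across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence (hypothetical driver)."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["big data big insights", "data deluge"]
    print(word_count(docs))
```

Because each map call looks at one document and each reduce call looks at one key, both phases can in principle run in parallel, which is what allows this style of processing to scale with data volume.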

Data Science
A large volume of various types of data, captured and stored in unstructured formats, posed the next natural challenge of analyzing and drawing insights from data. It is extremely important to have appropriate scientific ways to enable business to extract value from data. Just as measurement, collection, storage and accessibility have metamorphosed to create a new-age data infrastructure, data science has also undergone phenomenal change. One of the most prominent terms to emerge on the scene is Data Mining. Data mining is the process of discovering insightful, interesting, and novel patterns, as well as descriptive, understandable, and predictive models from large-scale data (Zaki and Meira 2014). Data mining spans exploratory data analysis, frequent pattern mining, clustering and classification. The shortage of necessary tools and the limitations of techniques were felt as early as 1983 (Lovell 1983), where the inadequacy of regression procedures in the absence of tightly structured theory was highlighted. Similarly, Breiman (2001), in his seminal paper, discusses how an assumed data model can give misleading conclusions vis-a-vis an algorithmic model. He makes a case for the adoption of diverse tools beyond Statistics, such as Bayesian methods, Markov chain Monte Carlo simulation, neural networks and Random Forest, among many others. New methods originating in Physics, Computer Science and other engineering disciplines, which have interesting applications in speech recognition, image recognition, handwriting recognition etc., are increasingly adopted. The Shape Recognition Forest (Amit and Geman 1997) and the Support Vector Machine (Vapnik 1998) extract the small amounts of information stored in a large number of possibly multi-collinear variables by reducing data dimensionality. Piatetsky-Shapiro (2007) presented a review of macro changes in data science between 1996 and 2005 using simple text mining.
That review observed that social network analysis, Web and multimedia mining, Support Vector Machines and advanced algorithms for association rules were widely used, and it registered the growing popularity of Weka and R up to 2005. It also found that there is a tremendous shortage of data mining skills and methods. Liao, Chu and Hsiao (2012) presented a rigorous review of data mining techniques and applications in diverse disciplines during 2000-2011, organized into three categories, viz. knowledge types, analysis types and architecture types. Similarly, a CRM-specific review of data mining techniques from 2000-2006, covering seven functions, viz. Association, Classification, Clustering, Forecasting, Regression, Sequence Discovery and Visualization, was done by Ngai, Xiu and Chau (2009). Chen, Chiang and Storey (2012) presented a bibliometric study of critical Business Intelligence and Analytics publications and research topics and characterized them in a six-dimensional research framework. The diagram below highlights, in a nutshell, changes in various aspects in recent times.

Fig 3: Changes in various aspects of Business Analytics
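Of the data mining tasks listed in the Data Science section, frequent pattern mining is perhaps the easiest to illustrate in code. The sketch below counts how often each item and each pair of items occurs across a set of transactions and keeps those that meet a minimum support count. It is a brute-force illustration, not the Apriori algorithm or any method from the cited works, and the function name frequent_itemsets and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets (sorted tuples) occurring in at least min_support transactions."""
    counts = Counter()
    for transaction in transactions:
        # De-duplicate and sort so that each itemset has one canonical form.
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical market baskets, for illustration only.
    baskets = [
        ["bread", "milk"],
        ["bread", "butter"],
        ["bread", "milk", "butter"],
        ["milk"],
    ]
    for itemset, support in sorted(frequent_itemsets(baskets, 2).items()):
        print(itemset, support)
```

Enumerating every combination is only workable for small baskets; practical algorithms prune candidates whose subsets are already known to be infrequent.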

Machine Learning

Since a huge volume of data of many varieties is being gathered at an extremely high pace, it is not possible for humans to analyze it unaided. This has given rise to automation and machine learning. Machine learning has evolved from artificial intelligence over half a century. Among the first books on machine learning, Michalski, Carbonell and Mitchell (1986) is a significant contribution. Wojtusiak and Kaufman (2010) reviewed the contribution of Dr. Michalski to the evolution of natural induction, learnable evolution models and inductive databases. Andrieu et al. (2003) introduced probabilistic machine learning using modern Markov chain Monte Carlo simulation. Bishop (2007), Hastie et al. (2009) and Abu-Mostafa et al. (2012) comprehensively cover concepts and theories in machine learning and pattern recognition. Many paradigmatic changes are happening in data science, well beyond conventional statistics. A new, evolving area of work is machine reasoning, which is broadly defined as algebraically manipulating previously acquired knowledge in order to answer a new question (Bottou 2014). The NIPS conferences provide an excellent platform for developing statistical machine learning, with hundreds of papers published every year in Advances in Neural Information Processing Systems.
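As a rough illustration of the Markov chain Monte Carlo simulation mentioned above, the following sketch implements a random-walk Metropolis-Hastings sampler using only the Python standard library. It is a minimal example under stated assumptions, not the specific algorithms covered in the cited tutorial: the function name, the Gaussian proposal, the step size and the standard normal target are all choices made here for illustration.

```python
import math
import random


def metropolis_hastings(log_density, start, n_samples, step=1.0, seed=0):
    """Draw samples from an unnormalized density using a Gaussian random walk."""
    rng = random.Random(seed)
    current = start
    current_logp = log_density(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_logp = log_density(proposal)
        # The proposal is symmetric, so the acceptance probability is
        # min(1, p(proposal) / p(current)).
        accept_prob = math.exp(min(0.0, proposal_logp - current_logp))
        if rng.random() < accept_prob:
            current, current_logp = proposal, proposal_logp
        samples.append(current)
    return samples


if __name__ == "__main__":
    # Assumed target: standard normal, specified up to a constant.
    draws = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)
    mean = sum(draws) / len(draws)
    variance = sum((d - mean) ** 2 for d in draws) / len(draws)
    print(round(mean, 2), round(variance, 2))
```

The sampler needs the target density only up to a normalizing constant, which is why the approach is attractive for Bayesian models whose posterior cannot be normalized analytically.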
Data Visualization
Data Visualization is the last link in the chain of analytics. It is well known that a chain is only as good as its weakest link. This last mile in analytics concerns the ability to present the insights drawn in a meaningful way. This is where analytics meets the real world. It requires a very different expertise: rendering the results in a comprehensible and graphical way. It is interdisciplinary in nature, in that it involves the use of human psychology, graphic design and technology. Fayyad et al. (2002) present a collection of over 30 research papers on data and model visualization. Some very advanced packages such as lattice (Gentleman et al. 2008), ggplot (Wickham 2006) and ggplot2 (Wickham 2009) have been developed in R. There is an increasing need for spatio-temporal data visualization such as heat maps, 3D visualization and time series simulation. Bach et al. (2014) present a conceptual framework for multi-dimensional visualization techniques such as Geo-Temporal Visualization, Dynamic Networks, Time-Evolving Scatterplots and Videos. Dedicated commercial software such as Tableau, QlikView, SAP Lumira and many more have emerged on the business analytics scene in recent times.

Summary
In a nutshell, Data Science can be represented as a combination of Computer Science and Statistics in a business context, as shown below.
Fig 4: Data Science in a nutshell
The discussion so far touches upon the fast-evolving context of business analytics and the rapid changes in response to it. Many areas have been addressed, and many are still catching up. A famous quote of Chuck Dickens, “Every time computing power increases by a factor of ten we should totally rethink how and what we compute”, can give direction for the future of business analytics. Given the pace of growth of data, we expect a surge in new methodologies, tools and techniques to analyze data. One of the key challenges remains quick adoption through skilled talent to add value to the business. Mechanisms are needed to quickly move innovations and research from university laboratories to the business world. The last decade has already witnessed rapid adoption of technology and methodologies in businesses.

References
Andrieu C, de Freitas N, Doucet A, Jordan M I. 2003. An Introduction to MCMC for Machine Learning. Machine Learning. 50. pp. 5-43.
Bach B, Dragicevic P, Archambault D, Hurter C, Carpendale S. 2014. A Review of Temporal Data Visualizations Based on Space-Time Cube Operations. Eurographics Conference on Visualization.
Beller M J, Barnett A. 2009. Next Generation Business Analytics. Lightship Partners LLC.
Bishop C M. 2007. Pattern Recognition and Machine Learning. Springer. ISBN 0-387-31073-8.
Bottou L. 2014. From Machine Learning to Machine Reasoning. Machine Learning. 94. pp. 133-149.
Breiman L. 2001. Statistical Modeling: The Two Cultures. Statistical Science. 16 (3). pp. 199-231.
Chen H, Chiang R, Storey V. 2012. Business Intelligence and Analytics: From Big Data to Big Impact. MIS Quarterly. 36 (4). pp. 1165-1188.
Davenport T H. 2006. Competing on Analytics. Harvard Business Review. 84 (1). pp. 98-107.
Davenport T H. 2014. Big Data at Work: Dispelling the Myths, Uncovering the Opportunities. Harvard Business Press Books.
Dean J, Ghemawat S. 2004. MapReduce: Simplified Data Processing on Large Clusters. Proc. of the 6th Symposium on Operating Systems Design and Implementation, San Francisco CA.
Fayyad U, Wierse A, Grinstein G. 2002. Information Visualization in Data Mining and Knowledge Discovery. Academic Press. ISBN 1-55860-689-0.
Gentleman R, Hornik K, Parmigiani G. 2008. Use R!. Springer Science+Business Media LLC. ISBN 978-0-387-75968-5.
Gu Y, Grossman R. 2009. Sector and Sphere: The Design and Implementation of a High Performance Data Cloud. Theme Issue of the Philosophical Trans. Royal Soc. A: Crossing Boundaries: Computational Science. 367 (1897). pp. 2429-2445.
Hastie T, Tibshirani R, Friedman J. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer. ISBN 0387848576.
Liao S H, Chu P H, Hsiao P Y. 2012. Data Mining Techniques and Applications: A Decade Review from 2000 to 2011. Expert Systems with Applications. 39. pp. 11303-11311.
Lovell M. 1983. Data Mining. The Review of Economics and Statistics. 65 (1). pp. 1-12.
Michalski R S, Carbonell J G, Mitchell T M. 1986. Machine Learning: An Artificial Intelligence Approach Vol. II. Los Altos, CA, Morgan Kaufmann Publishers, Inc.
Ngai E W T, Xiu L, Chau D C K. 2009. Application of Data Mining Techniques in Customer Relationship Management: A Literature Review and Classification. Expert Systems with Applications. 36. pp. 2592-2602.
Piatetsky-Shapiro G. 2007. Data mining and knowledge discovery 1996 to 2005: overcoming the hype and moving from “university” to “business” and “analytics”. Data Mining and Knowledge Discovery. 15. pp. 99-105.
Vapnik V. 1998. Statistical Learning Theory. Wiley New York.
Wickham H. 2006. ggplot: An Implementation of the Grammar of Graphics. R Package Version 0.4.
Wickham H. 2009. ggplot2: Elegant Graphics for Data Analysis. Springer Science+Business Media LLC. ISBN 978-0-387-98140-6.
Wojtusiak J, Kaufman K A. 2010. Ryszard S. Michalski: The Vision and Evolution of Machine Learning. Advances in Machine Learning. pp. 3-22. Springer-Verlag.
Amit Y, Geman D. 1997. Shape Quantization and Recognition with Random Trees. Neural Computation. 9. pp. 1545-1588.
Abu-Mostafa Y S, Magdon-Ismail M, Lin H T. 2012. Learning from Data. AMLBook. ISBN 1600490069.
Zaki M J, Meira W Jr. 2014. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press. ISBN 978-0-521-76633-3.
