{"id":20238,"date":"2019-03-04T10:15:25","date_gmt":"2019-03-04T10:15:25","guid":{"rendered":"http:\/\/www.ceo-na.com\/?p=20238"},"modified":"2020-01-10T20:20:15","modified_gmt":"2020-01-10T20:20:15","slug":"the-importance-of-data-mining","status":"publish","type":"post","link":"http:\/\/servidor-mxigen1.com\/ceona-antiguo\/business\/innovation-business\/the-importance-of-data-mining\/","title":{"rendered":"The importance of data mining"},"content":{"rendered":"<div class=\"page\" title=\"Page 19\">\n<div class=\"section\">\n<div class=\"layoutArea\">\n<div class=\"column\">\n<p>Information often lies.\u00a0Data mining can open up this valuable seam and derive valuable business intelligence from it.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p><!--more--><\/p>\n<div class=\"page\" title=\"Page 20\">\n<div class=\"section\">\n<div class=\"layoutArea\">\n<div class=\"column\">\n<p>By <strong><em>Raoul Jetley<\/em><\/strong><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>It is estimated that up to 80% of all information held by organizations is stored in an unstructured text format. 
This information includes customer requirements, sales dossiers, technical specifications, maintenance reports, and stakeholder feedback.<\/p>\n<p>Traditional data analysis methods struggle to extract business intelligence from such disparate data, so text-based data mining, or text mining, is used instead.<\/p>\n<div class=\"page\" title=\"Page 20\">\n<div class=\"section\">\n<div class=\"layoutArea\">\n<div class=\"column\">\n<p>Simply put, text mining is the set of processes required to transform unstructured text documents or resources into meaningful, structured information.<\/p>\n<p>[ihc-hide-content ihc_mb_type=&#8221;show&#8221; ihc_mb_who=&#8221;3,4,5,6&#8243; ihc_mb_template=&#8221;3&#8243; ]<\/p>\n<p>The structured information can then be used to automatically discover hidden patterns and predict future outcomes using a combination of statistical, linguistic, and pattern-recognition techniques.<\/p>\n<p>Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics, and computational linguistics.<\/p>\n<p>These techniques are used to discover and present knowledge (facts, business rules, and relationships) that is otherwise locked in textual form, impenetrable to automated processing.<\/p>\n<p><img decoding=\"async\" class=\"aligncenter size-large wp-image-20255\" src=\"http:\/\/www.ceo-na.com\/wp-content\/uploads\/2019\/02\/DATA-2-1024x576.jpg\" alt=\"\" width=\"1024\" height=\"576\" srcset=\"http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/DATA-2-200x113.jpg 200w, http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/DATA-2-300x169.jpg 300w, http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/DATA-2-400x225.jpg 400w, http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/DATA-2-500x281.jpg 500w, 
http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/DATA-2-600x338.jpg 600w, http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/DATA-2-700x394.jpg 700w, http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/DATA-2-768x432.jpg 768w, http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/DATA-2-800x450.jpg 800w, http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/DATA-2-1024x576.jpg 1024w, http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/DATA-2-1200x675.jpg 1200w, http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/DATA-2.jpg 1920w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/p>\n<div class=\"page\" title=\"Page 20\">\n<div class=\"section\">\n<div class=\"layoutArea\">\n<div class=\"column\">\n<h2>A typical text mining process includes the following steps:<\/h2>\n<p>\u2022 <strong>Identify and preprocess the text to be mined.<\/strong> This step involves cleaning up the text to remove unnecessary information, splitting it into individual tokens (i.e., smaller components), and identifying parts of speech based on the grammar of the language used.<\/p>\n<p>\u2022 <strong>Extract relevant information and transform it into structured data.<\/strong> Information is retrieved by searching through the tokenized text, and the results are stored in a more structured, organized form that is amenable to further analysis.<\/p>\n<p>\u2022 <strong>Select important features to build concept and category models.<\/strong> The number of concepts present in unstructured data is typically very large. 
The key to this step is to identify the most relevant features and use these to build meaningful models based on data categories and relationships.<\/p>\n<p>\u2022 <strong>Analyze the structured data to discover relationships between the concepts.<\/strong> At this point, the text mining process merges with the traditional data mining process. Classic data mining techniques, such as clustering, prediction, and classification, can be used on the structured data resulting from the previous steps.<\/p>\n<p>Common applications resulting from these analyses include recognition of named entities, automatic summarization, categorization based on relevant features, and mining for customer sentiments and opinions expressed within the text.<\/p>\n<p><img decoding=\"async\" class=\"aligncenter size-full wp-image-20241\" src=\"http:\/\/www.ceo-na.com\/wp-content\/uploads\/2019\/02\/Captura-de-pantalla-2019-02-06-a-las-16.32.25.png\" alt=\"\" width=\"776\" height=\"256\" srcset=\"http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/Captura-de-pantalla-2019-02-06-a-las-16.32.25-200x66.png 200w, http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/Captura-de-pantalla-2019-02-06-a-las-16.32.25-300x99.png 300w, http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/Captura-de-pantalla-2019-02-06-a-las-16.32.25-400x132.png 400w, http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/Captura-de-pantalla-2019-02-06-a-las-16.32.25-500x165.png 500w, http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/Captura-de-pantalla-2019-02-06-a-las-16.32.25-600x198.png 600w, http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/Captura-de-pantalla-2019-02-06-a-las-16.32.25-700x231.png 700w, http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/Captura-de-pantalla-2019-02-06-a-las-16.32.25-768x253.png 768w, 
http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-content\/uploads\/2019\/02\/Captura-de-pantalla-2019-02-06-a-las-16.32.25.png 776w\" sizes=\"(max-width: 776px) 100vw, 776px\" \/><\/p>\n<div class=\"page\" title=\"Page 20\">\n<div class=\"section\">\n<div class=\"layoutArea\">\n<div class=\"column\">\n<p><em><strong>About the author:<\/strong> Raoul Jetley is Senior Principal Scientist at <a href=\"https:\/\/new.abb.com\/about\/technology\/corporate-research-centers\" target=\"_blank\" rel=\"noopener\">ABB Corporate Research<\/a>, Bangalore, India. He can be reached at: raoul.jetley@in.abb.com<\/em><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>[\/ihc-hide-content]<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Information often lies.\u00a0Data mining can open up this valuable seam  [&#8230;]<\/p>\n","protected":false},"author":8,"featured_media":20254,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[46,1337],"tags":[99,1385,1795,14,1484,1796,150],"class_list":["post-20238","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-innovation-business","category-primezone","tag-ceo","tag-ceo-northam","tag-data-mining","tag-innovation","tag-market","tag-mining","tag-printed-version"],"_links":{"self":[{"href":"http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-json\/wp\/v2\/posts\/20238","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-json\/wp\/v2\/comments?post=20238"}],"version-history":[{"count":5,"hre
f":"http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-json\/wp\/v2\/posts\/20238\/revisions"}],"predecessor-version":[{"id":22561,"href":"http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-json\/wp\/v2\/posts\/20238\/revisions\/22561"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-json\/wp\/v2\/media\/20254"}],"wp:attachment":[{"href":"http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-json\/wp\/v2\/media?parent=20238"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-json\/wp\/v2\/categories?post=20238"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/servidor-mxigen1.com\/ceona-antiguo\/wp-json\/wp\/v2\/tags?post=20238"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}