Data Analytics A Practical Guide To Data Analytics For Business, Beginner To Expert 1547156996

Understand Data Analytics and Implement it in Your Business Today Do you want improve your revenue and stop missing ou


English Pages 51 Year 2017



Data Analysis

Table of Contents

Chapter 1 What is data analysis?
Chapter 2 Why use data analysis?
Chapter 3 The significance of data analysis for your business
Chapter 4 Types of and explanation of data analytics
  Descriptive analytics
  Diagnostic analytics
  Predictive analytics
  Prescriptive analytics
Chapter 5 Collecting Data
Chapter 6 Mistakes to avoid
Conclusion

Introduction

I want to congratulate you and thank you for downloading this book. Data analysis is one of the most common buzzwords on the internet today, yet it can be difficult to understand what it actually is. There are many definitions of data analysis. The most important one is this: data analysis is the procedure in which a business transforms, models, cleans, and inspects data, with the ultimate goal of using it in the decision-making process.

The fields of data analysis and predictive analytics are vast and have many sub-branches, which are themselves extensive. One of these branches is prescriptive analysis, which I will cover briefly in this book. I have covered only the fundamentals of these fields here. Large industries use these methods to determine what is likely to happen in the future, and they use that information to prevent events or to make them happen.

I think you will enjoy this book. It should motivate you to manage your data and carry out predictive analysis to maximize your profits. Thank you again for downloading this book; I hope you like it.

Copyright 2017 by ______________________ - All rights reserved.

This document aims to provide reliable and exact information on the topic and issue covered. The publication is sold with the understanding that the publisher is not required to render accounting, legal, or other qualified services. If legal or professional advice is necessary, an experienced person in the profession should be consulted.

- From a Statement of Principles which was accepted and approved equally by a committee of the American Bar Association and a committee of publishers and associations.

It is not legal to reproduce, duplicate, or transmit any portion of this document in printed form or by electronic means. Recording of this publication is strictly prohibited, and storage of this document is not permitted without permission from the publisher.

The information is stated to be consistent and truthful. Under no circumstances will any legal blame or responsibility be held against the publisher for any damage, reparation, or monetary loss due to the information stated here. Respective authors own all copyrights not held by the publisher. The information is presented for informational purposes only, without contract or any kind of warranty or assurance. All brands and trademarks are used for descriptive purposes only and are not affiliated with this document. The publication of a trademark is without the backing of the trademark owner.

Chapter 1 What is data analysis?

Data analysis is the procedure in which raw data is collected and transformed into information that stakeholders can use to make good decisions.

Stages Involved in Data Analysis

The stages below are common to all companies, though the data needs of each firm may differ.

Phase 1: Decide on the Objective

In this phase you have to set a clear objective, one that is precise, measurable, and framed as a question. For example, suppose your company’s product is struggling to get off the shelves because of a competitor’s product. The questions you might ask are:

- “Is my product overpriced?”
- “What is unique about the competitor’s product?”
- “Who is the target audience for the competitor’s product?”
- “Is my technology or process outdated?”

Why is asking these questions upfront important? Because your data collection depends on the kinds of questions you ask. For example, if your question is “What is unique about the competitor’s product?”, you would have to analyze the specifications of the product and collect feedback from customers about what they like about it. Alternatively, if your question is “Is my process or technology outdated?”, you would need to survey the technology used by others in the same industry and also audit the current processes and skills at your own organization. In short, the kind of questions you ask will affect the nature of the data collected. Given that data analysis is a tedious procedure, it is essential that you do not waste your data science team’s time on gathering useless data. Ask the right questions!

Phase 2: Set Measurement Priorities

Once you have decided on your objectives, you need to establish measurement priorities. This is done in the following two steps.

Decide what to measure

You have to decide what type of data you need to answer your question. For instance, if your question is about reducing the number of workers without compromising the quality of your product or service, then the data you need right away are as follows:

- The cost of employing the current number of staff members
- The number of staff members employed by the company
- The proportion of time and effort that the current staff members spend on current processes

Once you have the above data, you will have to ask other questions that are ancillary to the main one, for example: “Is there any process that can be changed to improve the efficiency of the staff?” “Are my staff not being used to their fullest potential?” “Will the company be able to meet increased demand in the future despite the reduction in manpower?” The data collected in connection with these ancillary questions will help you make better decisions. These ancillary questions are as important as the main objective question.

Decide how to measure

It is highly important to decide on the parameters that will be used to measure your data before you start collecting it, because how you measure your data plays an important role when you analyze it in the later stages. Some of the questions you need to ask yourself at this stage are as follows:

- What is the unit of measure? For instance, if your product has worldwide markets and you are required to determine its pricing, you need to arrive at a base price in one specific currency and then extrapolate from it. In this case, choosing that base currency is the answer.
- What is the time frame for completing the analysis?
- What are the factors that you need to include? This again depends on the question you asked in Phase 1. In the case of the staff reduction question, you need to decide which factors to take into consideration with respect to the cost of employment, for example whether you will use the gross salary package or the net annual salary drawn by each employee.

Phase 3: Collecting the Data

Since you have already set your priorities and measurement parameters, it will be easier to collect data in a phased way. Here are a few pointers to bear in mind before you gather data:

- We saw the different sources of data in the preceding chapter. Take stock of the data already available before you collect more. For example, in the case of the staff reduction question, you can simply look at the payroll to find the number of employees. This saves you the time needed to collect that data again. Similarly, all available information needs to be organized.
- If you intend to collect information from external sources in the form of a questionnaire, spend a good amount of time deciding on the questions you want to ask. Circulate the questionnaire only when you are satisfied with it and believe that it serves your main objective. If you keep circulating different questionnaires, you will end up with mixed data that cannot be compared.
- Ensure that you keep proper logs when you enter the collected data, as this will help you evaluate trends in the market. For example, assume you are conducting a survey about your product over a period of two months. People’s shopping habits change drastically during holiday seasons, more than at any other time of the year. If you do not include the date and time in your data records, you will end up with skewed figures, and this will influence your decisions considerably.
- Check the budget assigned for data collection. Based on the available budget, you will be able to identify the data collection methods that are cost effective. For instance, if you have a tight budget, you can opt for free online survey tools rather than printed questionnaires included in the product packaging. Similarly, you can make the best use of social networking sites to conduct mini surveys and collect the data you need. If you have a sufficient budget, you can go for printed, well-designed questionnaires that are distributed along with the package or circulated at retail outlets. You can also organize competitions, which collect data and market your product at the same time. You can set up drop boxes at nearby cafes and malls where customers can drop off their completed surveys.

Phase 4: Data Cleaning

Data cleaning is crucial for ensuring that worthless data does not find its way into the analysis phase. For instance, correcting the spelling errors in the collected surveys before feeding them into your system is nothing but data cleaning. Junk data in the system will affect the quality of your decisions. For example, assume 50 out of 100 people responded to your survey, but 10 of the forms you received are incomplete. You cannot count these ten forms for the purpose of analysis. In reality, you have obtained only a 40% response to your questionnaire, not 50%. These numbers make a big difference to the management who will be making the decisions.

Similarly, you need to be extra cautious at this stage if you are conducting a region-wide survey, since most people tend not to reveal their correct addresses in surveys. You will not be able to catch these errors unless you have a fair idea of the population of each region. Why is it important to catch them? Assume that your survey results show that the majority (say 70%) of your customer base is from region X. In reality, the population of that region does not amount to even 30% of your customer base. Now assume that you make a decision based solely on your survey results and launch a special marketing drive in this region. The marketing drive will not improve your sales as expected, because even if everyone in the region buys your product, they make up only 30% of your customer base, not the 70% you imagined. These little statistics play a significant role when it comes to big and costly decisions.

As you can see, improving the quality of data is highly important for making better decisions. Because this process takes a lot of time, you should automate it. For instance, to identify fake addresses, you can have the computer detect entries with invalid or incomplete zip codes. This can be done easily if you store your data in an Excel sheet.
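The two cleaning checks described in this phase, discarding incomplete forms before computing the response rate and flagging suspicious zip codes, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`name`, `zip`, `rating`), the sample records, and the US-style zip code pattern are all hypothetical and are not taken from the book.

```python
import re

# Assumption: US-style zip codes (12345 or 12345-6789).
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")


def is_complete(response, required_fields):
    """Return True when every required field holds a non-empty value."""
    for field in required_fields:
        value = response.get(field)
        if value is None or str(value).strip() == "":
            return False
    return True


def has_valid_zip(response):
    """Return True when the 'zip' field matches the assumed pattern."""
    zip_code = str(response.get("zip") or "").strip()
    return ZIP_PATTERN.match(zip_code) is not None


def usable_responses(responses, required_fields):
    """Keep only the forms that can be counted in the analysis."""
    return [r for r in responses if is_complete(r, required_fields)]


# Hypothetical data: 100 surveys sent, 50 returned, 10 of those incomplete.
REQUIRED = ("name", "zip", "rating")
surveys_sent = 100
returned = [{"name": f"person {i}", "zip": "12345", "rating": 4} for i in range(40)]
returned += [{"name": f"person {i}", "zip": "12345", "rating": ""} for i in range(40, 50)]

raw_rate = len(returned) / surveys_sent                                  # counts every form
usable_rate = len(usable_responses(returned, REQUIRED)) / surveys_sent   # counts complete forms only
suspect = [r for r in returned if not has_valid_zip(r)]                  # entries to review by hand
```

With this sample data the raw rate is 50% and the usable rate is 40%, which is the gap the example in the text warns about.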
Otherwise, if you use custom software for entering and storing data, you can ask the software developer to put algorithms in place to take care of such data.

Phase 5: Analysis of Data

Now that you have collected the required data, it is time to process it. You can use different techniques to analyze your data. Some of the methods are as follows:

Exploratory data analysis

In this method, data sets are analyzed with a view to summarizing their main characteristics. This method was promoted by John W. Tukey. In his view, too much emphasis was being placed on statistical hypothesis testing, which is nothing but confirmatory data analysis. He saw the need to use data to suggest hypotheses to test. The key objectives of exploratory data analysis are as follows:

(i) Assessing the assumptions on which statistical inference will be based
(ii) Suggesting hypotheses about the causes of the phenomena in question
(iii) Supporting the choice of appropriate statistical techniques and tools
(iv) Providing a basis for further data collection through such means as surveys or experiments

Several techniques established by exploratory data analysis are widely used in the fields of data mining and data analysis. These methods also form part of certain curricula intended to teach students statistical thinking. As you perform exploratory data analysis, you will need to clean up more data. In some cases, you will need to collect more data to complete the study and to make sure the analysis is supported by meaningful and complete data.

Descriptive statistics

In this method, data is analyzed to identify and describe the main features or characteristics of the data collected. This differs from inferential statistics, where the collected data is analyzed to learn from the sample, and the findings are then generalized to the wider population. Descriptive statistics, by contrast, aims only to summarize and describe the data collected. These descriptions can be either quantitative or visual. Such summaries may be just the start of your data analysis process; they can form the base on which further analysis is carried out.

To understand this well, let us look at an example. The shooting percentage in basketball is nothing but a descriptive statistic. It indicates the performance of a player or team and is calculated by dividing the number of shots made by the number of shots taken. For example, if a basketball player’s shooting percentage is 50%, it means that he makes one shot in every two. Other tools used in descriptive statistics include the range, variance, mean, median, mode, standard deviation, etc.

Data visualization

Data visualization is nothing but the representation of data in a visual form. This can be done with the help of such tools as plots, statistical graphics, informational graphics, charts, and tables. The objective of data visualization is to communicate data effectively. When you are able to represent data effectively in visual form, it helps in analyzing the data and in reasoning about data and evidence. Even complex data can be understood and analyzed by people when it is put in visual form. Visual representations also make comparison easy. For example, if you are given the job of comparing the performance of your product with that of your competitor, you will be able to do so easily if all the related data is represented in visual form. All your data team needs to do is take the data for parameters such as price, number of units sold, and specifications, and put it in graphic form. This way, you will be able to assess the raw data with little effort. You will also be able to establish relationships between the different parameters and make decisions accordingly. For example, if you notice that your sales are lower than your competitor’s and your price is higher than your competitor’s, then you know where the problem lies. The reduced sales can be attributed to the higher price, and this can be addressed by reworking your prices.

Apart from these three main methods, you can also use software available on the market. Prominent software currently available for data analysis includes Minitab, Stata, and Visio. Let us not forget the versatile Excel.

Phase 6: Interpreting the Results

Once you have analyzed your data, it is time to interpret your results. Here are a few questions that you need to ask:

- Does the analyzed data answer your key question? If yes, how so?
- If there were any objections to begin with, did your data help you defend against them? If yes, how so?
- Do you think there are any limitations to your results? Are there any angles that you did not consider while setting priorities?
- Do you have trained people to interpret the data correctly?

If the analyzed data satisfies all the above questions, then your analysis is final. This information can now be used for the purpose of decision-making.
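The descriptive statistics named in Phase 5 (shooting percentage, range, variance, mean, median, mode, and standard deviation) can be computed with Python’s standard `statistics` module. The sketch below is illustrative only: the `points_per_game` figures are made up, and `shooting_percentage` is a hypothetical helper written for this example.

```python
import statistics


def shooting_percentage(shots_made, shots_taken):
    """Shots made divided by shots taken, expressed as a percentage."""
    if shots_taken == 0:
        raise ValueError("shots_taken must be greater than zero")
    return 100 * shots_made / shots_taken


# Hypothetical points scored by one player over five games.
points_per_game = [12, 18, 18, 25, 7]

summary = {
    "mean": statistics.mean(points_per_game),
    "median": statistics.median(points_per_game),
    "mode": statistics.mode(points_per_game),
    "range": max(points_per_game) - min(points_per_game),
    "variance": statistics.variance(points_per_game),  # sample variance
    "stdev": statistics.stdev(points_per_game),        # sample standard deviation
}

print(shooting_percentage(5, 10))  # one shot made in every two
print(summary)
```

Note that `statistics.variance` and `statistics.stdev` compute the sample versions; `statistics.pvariance` and `statistics.pstdev` would be the choice if the data covered the whole population rather than a sample.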

Chapter 2 Why use data analysis?

The importance of interpreting data accurately cannot be emphasized enough. Your website, company, etc., must have skilled professionals who know how to take raw data and interpret the results properly. For example, let us say your company finds it necessary to analyze data from two of the most popular social media platforms: Facebook and Twitter. Your company cannot depend on an untrained person to respond effectively to your “likes” or “tweets” on a minute-by-minute basis. Most companies today hire a Social Media Manager to manage their social platforms. These individuals are trained to know the details of each social platform and respond effectively to your customers in a way that represents your brand.

It is necessary to hire specialists who have the training needed to take unstructured and otherwise random data and structure it in a comprehensible manner. This will change the dynamics of how your firm operates and of what decisions need to be made based on the data. Trained professionals can take all of your customers’ “likes” on your business Facebook page and trace those customers’ behavior with respect to your product, following their decision-making process. The customers like your product, then what? Do they read the product description? Do they reap the benefits of using your product? What makes your firm’s product better than the rest of the competition? Is your product reasonably priced compared with your competitors’ prices?

Trained data analysts will be able to trace these questions and analyze the path that your customers take. They can analyze the data from the customer’s initial “like” all the way to the purchase on your website. The right people, who have the training to follow and analyze this process, can help your company generate increased product sales. They take this information and distribute it to the appropriate team members throughout the company. Having meaningful and properly interpreted data may be the difference between your company expanding its influence and shutting down because of misinterpreted statistics.

One example of how you can analyze tweets is by interpreting historical tweets and distinguishing the substantial “tweet” from the casual “tweet.” Data interpreters are able to analyze historical data from previous company “tweets” and the effect of such “tweets” on customer buying habits. These experts can tell which “tweets” are merely social and which are substantial. The analyst can trace the effect of the original message posted on Twitter on the customer’s initial mindset, and on whether he or she will go on to fulfil the company’s core objective of buying the product. Which message is more substantial than others? Why is it more effective? Do images with the “tweets” tend to convince your customer base to purchase your product? Which “tweets” work best in which regions of the world? Which “tweets” work best with which age group?

These are important questions that can be answered by data. The answers identify which marketing strategies are working best on each platform, and they show why it is important to have analysts review the data. Analysts can make sense of large amounts of data with the use of visual graphs showing statistical figures. These can be given to the appropriate departments so that they can make decisions to improve the overall sales experience for your customers.

Chapter 3 The significance of data analysis for your business

Data can improve the efficiency of your business in several ways. Here is a flavor of how data can play a significant role in upping your game.

Improving Your Marketing Strategies

Based on the data collected, it is easier for a company to come up with inventive and attractive marketing strategies. It is also easier for a firm to alter its current marketing strategies and policies so that they are in line with current trends and customer expectations.

Identifying Pain Points

If your business is driven by predetermined processes and patterns, data can help you identify any deviations from the usual. These small deviations might be the reason behind a sudden increase in customer complaints, a decrease in sales, or a decrease in productivity. With the help of data, you will be able to catch these little mishaps early and take corrective action.

Detecting Fraud

When you have the numbers in hand, it will be easier for you to notice any fraud that is being committed. For example, if you have a purchase invoice for 100 tins and then see from your sales reports that only 90 tins have been sold, then ten tins are missing from your inventory, and you know where to look. Most companies are silent victims of fraud because they are unaware that it is being committed in the first place. One significant reason for this is the lack of proper data management, which could have helped detect the fraud easily in its early stages.
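The tins example amounts to a simple reconciliation: units purchased, minus units sold, minus units still in stock should come to zero. The sketch below shows one way to express that check in Python. The function name `missing_units`, the item names, and the optional `on_hand` stock count are assumptions made for illustration; the book’s example implicitly treats the stock on hand as zero.

```python
def missing_units(purchased, sold, on_hand=None):
    """Return {item: units unaccounted for} for every item with a shortfall.

    purchased, sold, on_hand: dicts mapping item name to unit count.
    on_hand is optional and is treated as zero stock when omitted.
    """
    on_hand = on_hand or {}
    gaps = {}
    for item, bought in purchased.items():
        unaccounted = bought - sold.get(item, 0) - on_hand.get(item, 0)
        if unaccounted > 0:
            gaps[item] = unaccounted
    return gaps


# Hypothetical figures: 100 tins bought, 90 sold, none left in stock.
gaps = missing_units(
    purchased={"tins": 100, "jars": 40},
    sold={"tins": 90, "jars": 25},
    on_hand={"jars": 15},
)
print(gaps)
```

Here the jars reconcile (40 bought, 25 sold, 15 in stock), while the ten missing tins are reported for follow-up.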

Recognizing Data Breaches

The explosion of complex data streams in the past few years has brought a new set of difficulties in the area of fraudulent practices, which have become subtle and comprehensive. Their negative effects can adversely impact your company’s retail, payroll, accounting, and other business systems. In other words, data hackers are becoming more devious in their attacks on business data systems. By using data analytics and triggers, your company can stop fraudulent data compromises in your systems, which could otherwise severely cripple your business. Data analytics tools allow your firm to develop data testing procedures that detect early signs of fraudulent activity in your data systems. Standard fraud testing may not be feasible in certain circumstances. If this is the case for your company, specially tailored tests can be developed and used to trace any possible fraudulent activity in your data processes.

Traditionally, companies have waited until their operations were affected financially before investigating fraud and implementing breach prevention strategies. This is no longer viable in today’s rapidly changing, data-saturated world. With information being disseminated globally so quickly, undetected fraudulent activity can cripple a firm and its subsidiaries worldwide in no time. Conversely, data analytics testing can stop potential data destruction by revealing indicators that fraud has begun to seep into data systems. If these data analytics tests are applied periodically, fraud can be stopped quickly for a company and its partners worldwide.
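The text does not say what a data analytics test or trigger looks like in practice. As one possible illustration only, the sketch below flags values that lie far from the historical average, measured in standard deviations. The function name `flag_unusual`, the threshold of 3, and the sample transaction counts are all assumptions; a real fraud test would be tailored to the company’s own data.

```python
import statistics


def flag_unusual(history, recent, threshold=3.0):
    """Return the recent values that deviate strongly from the history.

    A value is flagged when its distance from the historical mean exceeds
    `threshold` sample standard deviations. `history` needs at least two values.
    """
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        # No variation in the history: anything different is unusual.
        return [value for value in recent if value != mean]
    return [value for value in recent if abs(value - mean) / spread > threshold]


# Hypothetical daily transaction counts.
history = [100, 102, 98, 101, 99, 100, 103, 97]
recent = [101, 250, 99]

flagged = flag_unusual(history, recent)
print(flagged)
```

Running such a check on a schedule is one way to apply the periodic testing the text recommends, so that an unusual day is reviewed before the damage shows up financially.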

Improving Customer Experience

As I mentioned earlier, data also includes the feedback provided by customers. Based on their feedback, you will be able to work on areas that can help you improve the quality of your product or service and so satisfy the customer. Similarly, when you have a repository of customer feedback, you will be able to tailor your product or service more effectively. For instance, there are firms that send out customized personal emails to their customers. This sends out the message that the firm genuinely cares about its customers and would like to satisfy them. This is possible solely because of effective data management.

Decision Making

Data is crucial for making important business decisions. For instance, if you want to launch a new product in the market, it is important that you first collect data about current market trends, the pricing of competitors, the size of the consumer base, etc. If the decisions made by a company are not driven by data, it can cost the company a lot. For instance, if your firm decides to launch a product without looking at the price of the competitor’s product, there is a chance that your product will be overpriced. As is the case with most overpriced products, the company would then have trouble growing its sales figures.

When I say decisions, I do not just refer to decisions about the product or service offered by the company. Data can also be valuable in making decisions about the role of departments, manpower management, etc. For example, data can help you assess the number of workers required for the effective functioning of a department, in line with business requirements. This information can help you decide whether a certain department is overstaffed or understaffed.
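As a rough illustration of the staffing decision, the sketch below compares a department’s current headcount with the headcount implied by its workload. It assumes that the workload can be expressed in hours per month and that every employee contributes the same number of productive hours; both assumptions, and the name `staffing_gap`, are hypothetical simplifications not taken from the book.

```python
import math


def staffing_gap(workload_hours, hours_per_employee, current_headcount):
    """Return current headcount minus the headcount the workload requires.

    Positive result: overstaffed by that many people.
    Negative result: understaffed by that many people.
    """
    if hours_per_employee <= 0:
        raise ValueError("hours_per_employee must be positive")
    required = math.ceil(workload_hours / hours_per_employee)
    return current_headcount - required


# Hypothetical department: 1,600 hours of work per month, 160 hours per person.
print(staffing_gap(1600, 160, current_headcount=12))  # 2 more people than required
print(staffing_gap(1700, 160, current_headcount=10))  # 1 person short
```

The required headcount is rounded up because a fraction of a person cannot be hired; a fuller model would also account for leave, part-time staff, and seasonal peaks.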

Hiring Process

Using data to choose the right personnel seems to be a neglected practice in business life. It is critical to place the most competent person in the right job in your company. You want your business to be highly successful in every facet of its operation, and using data to hire the right person is a sure way to put the best individual in the job. What kinds of data would you use to hire a professional?

Big companies, which have very large budgets, use big data to locate and select the most skilled people for the right jobs. Start-ups and small firms would benefit hugely from using big data to hire the right group of people and make their recruitment successful from the start. This avenue of collecting data for hiring purposes has proven effective for hiring the right fit in organizations of all sizes. Here again, companies can use their data scientists to extract and interpret the specific data required by human resource departments.

Using Social Media Platforms to Recruit

Social media platforms (Twitter, Facebook, and LinkedIn, to name a few) are hidden gold mines as a data source for finding high-profile candidates for the right positions within firms. Take Twitter, for example: company recruiters can follow people who tweet specifically about their industry. Through this process, a company can find and recruit ideal candidates based on their knowledge of a specific industry or of a job within that industry. Do their “tweets” inspire new ideas and possibly new innovations for the business? If so, you have a whole pool of prospective job applicants.

Facebook is another option for collecting data on potential job applicants. Remember, these avenues are virtually free, and corporations can use them as part of a major cost-effective strategy. Facebook is all about gathering social networking data for firms looking to expand their staff or fill an open position. Company recruiters can join industry niche groups or niche job groups. “Liking” and following group members’ comments will establish the firm’s presence within the group and allow targeted job ads to be posted there. The company can increase views, thereby widening the pool of prospective job candidates. It is easy to build a timeline that brands the firm as an innovative and cutting-edge place to work. You establish your presence by engaging with friends and followers who are in the same industry as your company. For a minimal promotion fee, you can promote your job advertisement, and by doing so you multiply your reach among potential job seekers. If your firm publishes highly effective job posts, you will reel in a larger number of highly qualified job seekers and so greatly increase the proportion of people who are a perfect fit for your job.

Niche Social Groups

Niche social groups are specialized groups that you can join on social and web platforms to help you find specific skill sets. For example, if you are looking to hire a human resources manager, what better place to find a prospective recruit than a dedicated human resources social group? You may find the right person for your human resources position by locating social connections within that group and then posting descriptive but appealing job posts. Even if your firm does not find the right person, group members will certainly have referrals. Again, approaching these groups is a very cost-effective way to promote your job posts.

Innovative Data Collection Methods for the Hiring Process

Why not think outside the hiring process box and try new methods of data collection to hire the right professional? Use social websites that collect data, such as Google+, LinkedIn, Facebook, and Twitter. Your company can search these sites and extract relevant data from posts made by prospective job candidates. Such data can be used to help your company connect with highly capable job applicants.

A very good data pool to use is keywords. Keywords are used on the internet for every type of search imaginable. Why not use the most prominent keywords in your online job description? By doing this, your firm can greatly increase the number of views that your job posting attracts.

You can also use computers and software to find the right job candidate for your firm. Traditionally, these data sources have been used either to dismiss an employee or to analyze whether a current employee is the right fit for another job within the company. Why not try a whole new data gathering system? This system would be based on a set of standards different from the usual IQ tests, skill tests, or physical exams. Those are still valuable tools for measuring candidates, but they are limiting. Another focus can be the strong personality traits an applicant may possess. Is the individual negative? Is the person an isolationist who does not get along with other people? Is the person argumentative? People of this kind can be identified through such a personality trait database and then filtered out as potential team members. When correctly interpreted, this type of data will save the firm time, resources, and training costs by eliminating the gap between the expectations the company has for the job and those of the individual who would fill it.

Another advantage of this data gathering system is that the results will identify not only skilled people for the right jobs but also people with the right personality to fit in with the current company culture. It is imperative that a person is sociable and able to engage with other staff to produce the most effective working relationships. Generally, the healthier the working environment, the more effective the firm’s production.

Gamification This is an exclusive data tool that isn’t presently in extensive use. It does motivate applicants to press in and place forth their finest effort in the job selection procedure. You provide persons with “badges” and additional virtual goods that will inspire them to persevere over the process. In turn, their skills in being capable to perform the job necessities will be eagerly obvious. This also creates the job application a fun experience in place of a typical boring task. Job Previews Pre-planning the job employing process with precise data about what the job requirements are will make the job seeker to know whatever to expect if he or she is appointed for the position. It is said that lots of learning on the work is by trial and error, which extends the learning procedure. This takes more time toward get the worker up to speed to function competently as a valuable

resource inside the company. Including job preview data in the hiring process shortens the learning curve and helps the employee become effective much sooner. These are some inventive data-gathering approaches companies can use to streamline the hiring process. They also help human resources departments choose the most skilled people to fill their employment needs. Hence, data is vital in helping businesses make effective decisions. These are some of the reasons why data is crucial to the effective functioning of a business. Now that we have had a glimpse of the significance of data, let us get into the other aspects of data analysis in the upcoming chapters.

Chapter 4 Types and explanation of data analytics

Descriptive analytics

What is Descriptive Analysis?

Descriptive analysis is the oldest and most commonly used kind of analysis in business. It is frequently referred to as business intelligence, because it provides the knowledge required to make future forecasts, similar to what intelligence organizations do for governments. This category of analysis involves analyzing data from the past through data aggregation and data mining methods to determine what has occurred so far, which can then be used to estimate what is likely to occur in the future. As the name suggests, descriptive analysis quite literally describes past events. By using various data mining methods and processing these statistics, we can turn such data into facts and figures that are comprehensible to humans, which allows us to use the data for planning future events. Descriptive analytics allows us to learn from past events, whether they happened a day or a year ago, and to use this data to anticipate how they might affect future behavior. For instance, if we know the average number of product sales we made per month in the previous three years and can see trends such as rising or falling numbers, we can anticipate how these trends will influence future sales. Alternatively, we may see that the numbers are going down. This means we will need to change something to get sales back up, whether that means re-branding, expanding our team, or introducing

new products. Most of the statistics businesses use in their daily operations fall into the category of descriptive analysis. Statisticians collect descriptive statistics from the past and then convert them into a language comprehensible to management and employees. Using descriptive analysis lets businesses see such things as what proportion of product sales goes to expenses, how much they spend on average on various expenses, and how much is pure profit. All of these allow us to cut costs and make more revenue in the end, which is the name of the game in business.

How Can We Use Descriptive Analysis?

Descriptive statisticians usually turn data into understandable output, such as reports with charts that show, in a simple graphical way, what kind of trends a company has seen in the past, thus allowing the company to anticipate the future. Other data include those regarding a particular market, the overall international market, or consumer spending power. A good example of descriptive analysis is a table of average salaries in the USA in a given year. A table like this can be used by various businesses for many purposes. This particular example of statistical analysis allows for deep insight into American society and individuals’ spending power and has a vast array of possible implications. For instance, from such a table, we could see that dentists earn three times more money than police officers, and such data could be useful in a political campaign or in determining your target audience for a given product. A fledgling business, for example, could make a vast number of decisions about its business plan on the basis of this table.

Values in Descriptive Analysis

There are two main ways of describing data: measures of central tendency and measures of variability or dispersion. When we talk about measuring central tendency, we basically mean finding the mean value, or average, of a given data set. The mean is determined by summing up all the data values and dividing the sum by the number of values, giving an average that can be used in various ways. Another measure of central tendency, which is perhaps even more useful, is the median. Unlike the mean, the median takes into consideration only the middle value of a data set arranged from lowest to highest. For instance, in a sorted list of nine numbers, the fifth number is the median. The median will often be a more reliable value than the mean because there could be outliers at either end of the spectrum, which distort the mean. Outliers are extremely small or large numbers that make the mean unrepresentative, so the median is more useful in cases where there are outliers.

Measuring the dispersion or variability allows us to see how spread out the data is around a central value such as the mean. The values used to measure dispersion are the range, the variance, and the standard deviation. The range is the simplest measure of dispersion. It is calculated by subtracting the smallest number from the largest. This value is very sensitive to outliers, as you could have an extremely small or large number at the ends of your data spectrum. Variance is the measure of deviation that tells us the average squared distance of the data values from the mean. Variance is typically used to calculate the standard deviation, and by itself it is harder to interpret. Variance is calculated by finding the mean, subtracting the mean from each data value, squaring each of these differences so that all values are positive, and then summing the squares. Once we have this number, we will divide it by the total number of data points in the set, and we will have our calculated

variance. Standard deviation is the most popular measure of dispersion, as it indicates the typical distance of the data values from the mean. Both the variance and the standard deviation will be high in instances where the data is highly spread out. You find the standard deviation by calculating the variance and then taking its square root. The standard deviation is expressed in the same unit as the original data, which makes it easier to interpret than the variance. All of these values used to calculate the central tendency and the dispersion of data can be employed to make various inferences, which can help with future predictions made by predictive analytics.

Inferential Statistics

Inferential statistics is the part of analysis that allows us to make inferences based on the data collected from descriptive analysis. These inferences can be applied to the general population or to any group that is larger than our study group. For instance, if we conducted a study that measured the levels of stress in a high-pressure situation among teenagers, we could use the data we collect from this study to anticipate general levels of stress among other teenagers in similar situations. Further inferences could be made, such as possible levels of stress in older or younger populations, by adding data from other studies. While these inferences could be faulty, they could still be used with some degree of credibility.
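The measures of central tendency and dispersion described in this section can be sketched in a few lines of Python. This is only a minimal illustration: the monthly sales figures are invented for the example, and the variance divides by the total number of data points, as in the description above (the population variance).

```python
import statistics

# Hypothetical monthly sales figures; the last value is a deliberate outlier.
sales = [12, 15, 11, 14, 13, 12, 16, 15, 100]

# Central tendency
mean = statistics.mean(sales)      # sum of the values divided by their count
median = statistics.median(sales)  # middle value of the sorted data

# Dispersion
data_range = max(sales) - min(sales)  # largest value minus smallest value


def population_variance(values):
    """Average of the squared differences from the mean (divides by N)."""
    m = sum(values) / len(values)
    squared_differences = [(v - m) ** 2 for v in values]
    return sum(squared_differences) / len(values)


variance = population_variance(sales)
std_dev = variance ** 0.5  # square root of the variance, same unit as the data

print("mean:", mean)
print("median:", median)
print("range:", data_range)
print("variance:", variance)
print("standard deviation:", std_dev)
```

With this made-up data, the single outlier pulls the mean above 23 while the median stays at 14, which is why the median is often the more reliable value when outliers are present.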

Diagnostic analytics

Diagnostic analytics is a form of advanced analytics that examines data or content to answer the question “Why did it happen?” It is characterized by methods such as drill-down, data discovery, data mining, and correlations. Diagnostic analytics takes a deeper look at data to try to understand the causes of events and behaviors.

What Are The Benefits of Diagnostic Analytics?

Diagnostic analytics lets you understand your data faster in order to answer critical workforce questions. One example of such a tool is Cornerstone View, which is intended to give organizations a fast and simple way to gain more meaningful insight into their employees and solve complex workforce issues. Interactive data visualization tools allow managers to easily search, filter, and compare people by centralizing information from across the Cornerstone unified talent management suite. For example, users can find the right candidate to fill a position, select high-potential employees for succession, and quickly compare succession metrics and performance reviews across selected employees to reveal meaningful insights about talent pools. Filters also allow for a snapshot of employees across multiple categories such as location, division, performance, and tenure.

Predictive analytics

We have seen how data analysis is vital to the effective operation of a business. Let us look at another extension of data mining that plays a dynamic role in the development of a business. In this section, I will take you through the various aspects of predictive analytics and help you appreciate the role it plays in facilitating the effective operation of a business.

What is Predictive Analytics?

In simple terms, predictive analytics is the art of obtaining information from collected data and using it to predict behavior patterns and trends. With the help of predictive analytics, you can predict unknown factors, not just in the future but also in the present and past. For example, predictive analytics can be used to identify the suspects in a crime that has already been committed. It can also be used to detect fraud as it is being committed.

What are the Types of Predictive Analytics?

Predictive analytics can also be referred to as predictive modeling. In simple terms, it is the act of combining data with predictive models to arrive at a conclusion. Let us look at the models used in predictive analytics.

Predictive Models

Predictive models are models of the relationship between the specific performance of a particular unit in a sample and a few known attributes of that unit. This model aims at evaluating the probability that an

equivalent unit from a different sample will exhibit the same performance. It is widely used in marketing, where predictive models are used to identify subtle patterns, which are then used to identify customers’ preferences. These models are capable of performing calculations as and when a particular transaction is occurring, i.e., on live transactions. For instance, they can evaluate the opportunity or risk associated with a certain deal for a given client, thereby helping the client decide whether he or she wants to enter into the deal. Given advances in computing speed, individual agent modeling systems have been designed to simulate human reactions or behavior in certain situations.

Predictive Models in Relation to Crime Scenes

3D technology has brought big data to crime scene investigation, helping police departments and criminal investigators reconstruct crime scenes without compromising the integrity of the evidence. There are two kinds of laser scanners used by crime scene specialists:

Time-of-flight laser scanner

The scanner shoots out a beam of light that bounces off the targeted object. Different data points are measured as the light returns to the sensor. It is capable of measuring 50,000 points per second.

Phase shift 3D laser scanners

These scanners are much more effective but also much more expensive. They measure 976,000 data points per second. These scanners use infrared laser technology.

These laser scanners make crime scene reconstruction much easier. The

process takes a lot less time than traditional crime scene reconstruction. The benefit of 3D technology is that investigators can re-visit the crime scene from anywhere. Investigators can now do this while they are at home, in their offices, or out in the field. This makes their job more mobile, as they can visit the crime scene virtually anywhere. They no longer have to rely on notes or their memories to recall the details of the crime scene. Furthermore, they visit the crime scene one time, and that is it. They have all the data images recorded on their scanner. Investigators are able to re-visit crime scenes by viewing the images on computers or iPads. The distance between objects (such as weapons) can be reviewed. The beauty of this technology is that crime experts do not have to second-guess the information gathered from crime scenes. The original crime scene is reconstructed right there in the scanner images. It’s as if the crime was committed right there on the scanned images. The images help tell the story of who the perpetrators were and how they carried out the crime. Investigators can look at crime scenes long after they are released. Nothing in the crime scene will be disturbed or compromised. Any evidence that is compromised is inadmissible in court and cannot be used. All evidence must be left in its original state and should not be tampered with. This is no problem when the evidence is recorded in the data scanners. Law enforcement engineers are able to reconstruct the whole crime scene in a courtroom. Forensic evidence is left untouched and intact, which supports a higher rate of convictions.

Forensic Mapping in Crime Scene Reconstruction

The purpose of 3D forensic mapping is to reconstruct every detail of the crime scene holistically. This is a very clean way of reconstructing crime scene evidence. None of the evidence is touched or accidentally thrown away. Investigators do not have to walk in or around the evidence, which avoids the possibility of anything being accidentally dropped or kicked.

Predictive Analysis Techniques

Regression Techniques

These techniques form the basis of predictive analytics. They aim to establish a mathematical equation, which in turn serves as a model representing the interactions among the different variables in question. Depending on the circumstances, different models can be applied to perform predictive analysis. Let us look at several of them in detail below.

Linear Regression Model

This model evaluates the relationship between the dependent variable and the set of independent variables associated with it. This is usually expressed in the form of an equation in which the dependent variable is a linear function of the parameters. These parameters can be adjusted so as to optimize a measure of fit. The objective of this model is to select the parameters that minimize the sum of the squared residuals. This is known as ordinary least squares estimation. Once the model is estimated, the statistical significance of the different coefficients used in the model must be checked. This is where the t-statistic comes into play, whereby you test whether a coefficient is different from zero. The ability of the model to predict the

dependent variable based on the values of the independent variables involved can be tested by using the R² statistic.
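To make the ordinary least squares idea more concrete, here is a minimal sketch for the simplest case of one independent variable. The advertising and sales figures are invented for illustration, and the helper names (fit_line, r_squared, slope_t_statistic) are my own; this is not the only way to estimate such a model, and real projects would normally rely on a statistics package.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """The quantity that ordinary least squares minimizes."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def r_squared(xs, ys, intercept, slope):
    """Share of the variation in y that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sum_squared_residuals(xs, ys, intercept, slope) / ss_tot


def slope_t_statistic(xs, ys, intercept, slope):
    """t-statistic for testing whether the slope differs from zero."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    residual_variance = sum_squared_residuals(xs, ys, intercept, slope) / (n - 2)
    standard_error = math.sqrt(residual_variance / sxx)
    return slope / standard_error


# Hypothetical data: advertising spend (x) and sales (y), in arbitrary units.
ad_spend = [1, 2, 3, 4, 5]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(ad_spend, sales)
r2 = r_squared(ad_spend, sales, intercept, slope)
t_stat = slope_t_statistic(ad_spend, sales, intercept, slope)

print("intercept:", intercept, "slope:", slope)
print("R squared:", r2)
print("t-statistic for the slope:", t_stat)
```

A large t-statistic suggests the slope is unlikely to be zero, and an R² close to 1 suggests the line explains most of the variation in the made-up sales figures.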

Discrete Choice Models

Linear regression models can be used in cases in which the dependent variable is continuous and has an unbounded range. However, there are cases in which the dependent variable is not continuous. In such cases, the dependent variable is discrete.

Logistic Regression

A categorical variable is one that has a fixed number of values. For example, if a variable can take only two values, it is called a binary variable. Categorical variables that have more than two values are referred to as polytomous variables. One example is a person’s blood type. Logistic regression is used to determine and measure the relationship between the categorical variable in the equation and the other independent variables associated with the model. This is done by using the logistic function to estimate the probabilities. It is similar to the linear regression model. However, it has different assumptions associated with it. There are two major differences between the two models. They are as follows:

The linear regression model uses a Gaussian distribution as the conditional distribution, whereas the logistic regression uses a Bernoulli distribution.

The predicted values arrived at in the logistic regression model are probabilities and are restricted to the range between 0 and 1. This is because the logistic regression model predicts the probability of certain outcomes.

Probit Regression

Probit models are used in place of logistic regression for building models of categorical variables. They are used in the case of binary variables, namely, categorical variables that can take only two values. This technique is used in economics to estimate models that use variables that are not only continuous but also binary in nature.
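The logistic function mentioned above can be sketched as follows. This is a simplified illustration, not a production method: the data on hours spent on a website and whether a purchase followed is invented, the function names are my own, and the coefficients are fitted with plain gradient descent rather than the estimation routines a statistics package would use. The probit link is included only to show the alternative curve.

```python
import math


def logistic(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def probit(z):
    """Probit link: the standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y = 1) = logistic(b0 + b1 * x) by simple gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = 0.0
        grad1 = 0.0
        for x, y in zip(xs, ys):
            error = logistic(b0 + b1 * x) - y  # predicted probability minus outcome
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1


# Hypothetical data: hours spent on a website and whether a purchase followed.
hours_on_site = [1, 2, 3, 4, 5, 6, 7, 8]
purchased = [0, 0, 0, 1, 0, 1, 1, 1]  # binary dependent variable

b0, b1 = fit_logistic(hours_on_site, purchased)
probabilities = [logistic(b0 + b1 * h) for h in hours_on_site]

for h, p in zip(hours_on_site, probabilities):
    print(h, "hours -> estimated purchase probability", round(p, 3))
```

Whatever the fitted coefficients turn out to be, every predicted value is a probability between 0 and 1, which is the key difference from the unbounded predictions of a linear regression.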

Machine Learning Techniques

Machine learning is a field of artificial intelligence. It was originally used for the purpose of developing techniques that enable computers to learn. Machine learning consists of an array of statistical methods for classification and regression. It is now being employed in various fields, for example medical diagnostics, credit card fraud detection, face recognition, speech recognition, and stock market analysis.

Neural Networks

Neural networks are sophisticated non-linear modeling techniques capable of modeling complex functions. These techniques are widely used in the fields of neuroscience, finance, cognitive psychology, physics, engineering, and medicine.

Multilayer Perceptron

The multilayer perceptron is made up of an input layer and an output layer. Besides these, there are one or more hidden layers made up of nonlinearly activating nodes, or sigmoid nodes. The weight vector plays an important role in adjusting the weights of the network.

Radial Basis Functions

A radial basis function has a built-in distance criterion with respect to a center. These functions are typically used for the interpolation and smoothing of data. They have also been used as part of neural networks in place of sigmoid functions. In such cases, the network has three layers: the input layer, the hidden layer with the radial basis functions, and the output layer.

Support Vector Machines

Support vector machines are employed for the purpose of detecting and exploiting complex patterns in data. This is accomplished by clustering, classifying, and ranking the data. These learning machines can be used for performing binary classifications and regression estimations. There are numerous types of support vector machines, including linear, polynomial, and sigmoid, to name a few.

Naïve Bayes

This is based on Bayes’ conditional probability rule. It is a simple technique for constructing classifiers. In other words, it is used for classification tasks. This technique is based on the assumption that

the predictors are statistically independent. This independence makes it a great tool for classification.

K-Nearest Neighbors

The nearest neighbor algorithm belongs to the family of pattern recognition statistical methods. Under this technique, no underlying assumptions are made about the distribution from which the sample is drawn. It consists of a training set that has both positive and negative values. When a new sample is drawn, it is classified based on its distance from the nearest neighboring training case.

Geospatial Predictive Modeling

The underlying principle of this technique is the assumption that the occurrences of the events being modeled are limited in terms of distribution. In other words, the occurrences of events are neither random nor uniform in distribution. Instead, spatial environment factors, such as infrastructure, sociocultural, and topographic factors, are involved.

1. Deductive method

This method is based on a subject matter expert or qualitative data. Such data is then used for the purpose of describing the relationship that exists between the occurrence of an event and the factors associated with the environment.

2. Inductive method

This technique is based on the spatial relationship that exists between the occurrences of events and the factors that are associated with the

environment. Every occurrence of an event is first plotted in geographic space.
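Of the machine learning techniques covered earlier in this section, k-nearest neighbors is the easiest to sketch in code. The following is a minimal illustration only: the two-feature training points and their labels are invented, the function name knn_classify is my own, and the choice of k = 3 with straight-line (Euclidean) distance is an assumption rather than a rule.

```python
import math
from collections import Counter


def knn_classify(training_set, new_point, k=3):
    """Label a new point by majority vote among its k nearest training cases."""
    # Distance from the new point to every labeled training case.
    distances = sorted(
        (math.dist(features, new_point), label)
        for features, label in training_set
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training set: (feature vector, label) pairs.
training_set = [
    ((1.0, 1.0), "negative"),
    ((1.5, 2.0), "negative"),
    ((2.0, 1.0), "negative"),
    ((6.0, 6.0), "positive"),
    ((7.0, 5.5), "positive"),
    ((6.5, 7.0), "positive"),
]

print(knn_classify(training_set, (6.2, 6.1)))
print(knn_classify(training_set, (1.2, 1.4)))
```

Note that nothing is assumed about how the data is distributed; the new sample is classified purely by its distance to the labeled training cases.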

Prescriptive analytics

The prescriptive analytics model is the newest branch in the data world. Many are already arguing that it is the only way to go in the analytical world in the future. It doesn’t appear that prescriptive analytics has caught fire in the business world at present, but it will become more widespread in various industries as companies come to understand the benefits of having a model that will suggest answers to future questions. This is the most advanced component of this comparatively new technology.

Chapter 5 Collecting Data

Starter Software to Gather Data

While there is certainly plenty of commercial predictive analytics software on the market today, there are also plenty of free, open source products available that you can get started with right now, for no money down. The following list should give you a good place to start, and if you don’t see the kind of software you are looking for here, odds are a quick online search will point you in the right direction.

Apache Mahout: Apache Mahout, created by the Apache Software Foundation, is a free project offering scalable machine learning algorithms, primarily focused on clustering, classification, and collaborative filtering. It also gives users access to common Java libraries for help with common math operations, including things like statistics and linear algebra. For those who need to use them instead, it also offers access to primitive Java collections. The algorithms that it uses for filtering, classification, and clustering have all been implemented around the Apache Hadoop map/reduce paradigm. As of September 2016, this is still a comparatively new product, which means that while it already has a wide variety of algorithms available, there are still gaps in what it can and cannot offer. Apache Mahout does not need Apache Hadoop in order to function properly. Apache Mahout can be found online for free at mahout.apache.org.

GNU Octave: GNU Octave is a high-level programming language. It was originally intended to help with complex numerical calculations of both the linear and nonlinear variety, as well as other similar numerical experiments. It primarily uses a batch-oriented language that

is mostly compatible with MATLAB. It was created as part of the GNU Project and is free software under the GNU General Public License. The fact that it is largely compatible with MATLAB is notable, since Octave is the main free competitor to MATLAB. Notable GNU Octave features include the ability to type a TAB character on the command line to make Octave try to complete the file name, function, or variable in question. It does this using the text before the cursor position as the basis for what needs to be completed. You can also build your own data structures to a limited degree and look back through your command history. Furthermore, GNU Octave uses short-circuit Boolean logical operators, which are evaluated the way a short circuit would be. It also offers a limited form of support for exception handling that is based on the idea of unwind_protect. GNU Octave is available online at GNU.org/Software/Octave.

KNIME: KNIME, also known as the Konstanz Information Miner, is a platform that focuses on integration, reporting, and data analytics. It brings together various components for data mining and machine learning through a pipeline built on modular data processing. It also has a graphical user interface, which makes assembling the nodes used in data preprocessing a much less complex process. Though its use has been growing over the past decade, KNIME is mostly used in the medical field, and it can also be useful when it comes to studying financial data, business intelligence, and customer data. KNIME is based on the Java platform and built on Eclipse. It also uses an extension mechanism to add plugins for a wider array of functionality than the base program offers. When it comes to data visualization, data

transformation, data analysis, database management, and data integration, the free version includes more than 200 modules to choose from. Using the additional extension available for free to design reports, you can even use KNIME reports to automatically generate a wide variety of clear and informative charts. KNIME can be found online at KNIME.org.

OpenNN: OpenNN, short for Open Neural Networks Library, is a C++ software library licensed under the GNU Lesser General Public License, which means it is free to use under the terms of that license. Overall, it can be thought of as a software package implementing a general-purpose artificial intelligence system. This software is useful in that it combines several layers of processing units in a nonlinear fashion with the objective of supporting supervised learning. Furthermore, its architecture allows it to work with neural networks that have universal approximation properties. It also permits programming through multiprocessing means such as OpenMP for those interested in increasing computational performance. OpenNN also uses data mining algorithms that come bundled as functions. Using the included application programming interface, these can then be embedded in other tools and software of your choosing. When used correctly, this makes it much easier than it might otherwise be to find new ways to integrate predictive analysis into even more tasks. Take note that it does not offer a standard graphical interface, although some visualization tools do support it. OpenNN is available online, for free, at OpenNN.net.

Orange Software: Orange Software is data mining software written in the popular Python programming language. Unlike some of the free options available, it has a front end with a visual component, which makes many types of visualization much easier, as it does programming and data analysis. It also doubles as a Python library for those who are into that kind of thing. This program was created at the University of Ljubljana by its Bioinformatics

Laboratory of the Faculty of Computer and Information Science. The individual components available in Orange Software are referred to as widgets and can be used for almost everything from basic data visualization and subset selection to preprocessing, empirical evaluation, predictive modeling, and algorithm generation. Furthermore, visual programming can be implemented through an interface that allows users to build workflows quickly and easily by linking together a predefined collection of widgets. This doesn’t hold back advanced users, as they are still free to alter widgets and manipulate data through Python. The latest version of Orange Software includes numerous core components that were written in C++ with wrappers written in Python. The default installation includes widget sets for data collection, visualization, classification, regression, and supervised and unsupervised data analysis, as well as numerous other algorithms and preprocessing tools. Orange Software is available online, for free, at Orange.biolab.si.

R: The programming language known simply as R is a great environment for statistical computing and visualization. It is commonly used by data miners and statisticians who are working on developing statistical software or data analysis, and it is supported by the R Foundation for Statistical Computing. R and the libraries connected to it are useful when it comes to using numerous techniques, be they graphical or statistical. These include things like linear modeling, nonlinear modeling, clustering, classification, time series analysis, and various statistical tests. It can create interactive graphics as well as graphs that are publication quality without any additional tweaking required. Additional strengths include its ability to generate static graphics.

Scikit-Learn: Scikit-Learn is a machine learning library written in Python. It features various regression, clustering, and classification algorithms, including several that are less common in the open source space. Algorithms include DBSCAN, k-means, random forests, gradient boosting, and support vector machines. It also works seamlessly with the NumPy and SciPy Python libraries, which cover numerical and scientific computing respectively. Scikit-Learn was created by David Cournapeau as part of the Google Summer of Code, though the code base was later rewritten by other developers. While it was not actively iterated upon for several years, starting in 2015 it is once again under active development, which makes it a great choice for those who do not care for the other open source options in this list. Scikit-Learn can be found online, for free, at Scikit-Learn.org.

WEKA: WEKA, short for Waikato Environment for Knowledge Analysis, is a software suite written in Java at the University of Waikato in New Zealand. Unlike several open source analysis platforms, WEKA has multiple user interfaces that provide access to a wide variety of options. The preprocess panel allows users to import data from databases and filter data using predefined algorithms. It also makes it possible to delete attributes or instances based on predefined criteria. Additionally, the classify panel makes it easy for users to apply either classification or regression algorithms to different data sets and to estimate the accuracy of the models created in the process. The associate panel is useful for users who are interested in gaining access to numerous algorithms that are useful when it comes to determining the

various relationships that certain data points have with one another. The cluster panel is useful for those looking for more options when it comes to clustering: it includes the k-means algorithm as well as the expectation-maximization algorithm, which is of interest if you want to find a mixture of normal distributions and optimize clustering effectively. Lastly, the select attributes panel gives users access to even more algorithms, this time ones related to the most predictive attributes that might be found in a given dataset. The visualize panel is useful for those who are interested in creating scatter plot matrices, which make it easy to analyze scatter plots further based on the information they provide.

Chapter 6 Mistakes to avoid

Choosing a data warehouse based on what you need in the moment: When it comes to selecting the data warehouse that is right for you, it can be easy not to think about the long term. You will end up cursing the limits you accidentally imposed on yourself, because making long-term data decisions with only the short term in mind is an easy way to create problems a few years

down the line. This means that if you want the future to work in your favor, you will want to look five years down the road in terms of where you want your firm to be at that time. Don’t forget to focus as much on the business strategy as on the technical features you will be using.

Considering metadata as an afterthought: While it might seem like something you can easily go back and tinker with at a later date, failing to make the right metadata choices early on can have far-reaching and disastrous implications in the long term. Instead of thinking of it as an afterthought, you should consider it the main integrator when it comes to making different kinds of data models play well with one another. This means you will want to consider long-term data requirements when making your choices and, moreover, document everything along the way. To do this properly, you will want to start by ensuring that every column and table that you create has its own description as well as key phrases. It is astonishing how many problems you can quickly and easily solve in the long term simply by picking a tagging and naming convention early and sticking with it throughout your time with the data warehouse.

Underestimating the usefulness of ad hoc querying: While generating a simple report appears as though it should be, for lack of a better word, simple, in reality it can easily expand significantly, causing bandwidth costs to skyrocket and overall productivity to drop due to the additional strain. This issue can be avoided by simply relying on the metadata layer to

make the reports instead, this issue can be avoided. This will make things go much more smoothly without directly affecting the secure and sanctified nature of the original data. It can also make it easier for less experienced users to get where they need to go and interact with the system appropriately. This, in turn, can make it easier for the system to gain wider buy-in from key individuals.

Letting form supersede function: When it comes to deciding on a reporting layer for the data warehouse that you will, hopefully, be using for years to come, you may be tempted to pick something that is visually appealing without actually considering the long-term implications of that decision. Specifically, it is important that you don't select a visual presentation style that causes the system to run more slowly than it otherwise would. Take a minute to consider how often this system is going to be used, and by how many people, to understand why even a five-second difference in load time can be an enormous time waster. With so many multipliers, it won't take long for five seconds to become hours, if not days. In addition to saving you and your team time, speedier systems will also make it more likely that team members actively engage with the system, because it will be easier to use, if not necessarily prettier. Additionally, the faster that data can be generated, the more likely it is to remain true to the source, which will ensure you have a higher overall quality of data to work with than you otherwise might.

Focusing on cleaning data after it has already been stored: You will want to confirm that any data you actually put into it is going to

be as accurate and as clean as possible once your data warehouse is up and running. There are multiple reasons that this is the right choice. Team members are much more likely to actually fill out pertinent information about topics the business cares about, and to notice errors immediately as opposed to at a later date. Furthermore, the old adage that once something is out of sight it is out of mind as well is just as true with data as it is with anything else. Even if team members have the best intentions when it comes to data, after it has been filed away it is much easier to forget about it and move on to another task. While the occasional piece of unfiltered data slipping through is not going to affect things all that much, if too much unfiltered data gets through then the data as a whole is much more likely to show discrepancies when compared with what really happened. Remember, the longer that a warehouse continues to gather data, the larger the number of incorrect pieces of information it will have collected. This doesn't necessarily mean that there is anything wrong with the current system, but that does not mean you want to add additional issues to the pile. One way to deal with error-ridden data is to include some sort of active monitoring system in the data management system that you use. Quality control can also be governed through common usage and the data coherence plan that your business already has in place.

Allowing data warehouse tasks to be purely the concern of those in IT: While those in the IT department are probably going to be a great source of useful knowledge when it comes to implementing your own data warehouse, it is important that they do not control the whole of the project. So that a feature crucial to your team does not turn out to be especially hard to use, they should be consulted when

it comes to key features, to ensure that the system is not dead in the water within six months.

Treating only certain types of data as relevant: When you are first starting out, it can be easy to envision Big Data as just that: big, uniform, and capable of fitting easily into a one-size-fits-all bin. The reality is that Big Data comes in three primary categories as well as a variety of shapes and sizes. Each of these categories is especially relevant to a certain segment of business, and you will need to determine how you plan on using all of them if you hope to create an effective management system. Data can be unstructured, which is likely audio, images, or video, although simple text can also be unstructured. Alternatively, data can be structured, which includes things like mathematical models, actuarial models, risk models, financial models, sensor data, and machine data. Lastly, in between is semi-structured data, which includes things like software modules, earnings reports, spreadsheets, and emails.

Focusing on data quantity over quality: With Big Data on everyone's minds and lips, it can be easy to get so focused on obtaining as much information as possible that the quantity of data ends up mattering more than the quality of the data that you collect. This is a big mistake, however, as data of a low quality, even if it has been cleaned, can still skew analytics in unwanted ways. This means that not only is it important to know how to process Big Data effectively and to seek out the most useful and relevant information possible at each and every opportunity; it is also extremely important to understand how to improve the quality of data that is being collected in the first place. When it comes to unstructured data, it is important to note that if you are

interested in improving the quality of the data that is being collected, this should be done by improving the libraries that are used for language correction prior to the data being uploaded to the warehouse. If translation is required, then it is best to have a human hand in that process, as the finer points of translation are still lost in most cases on automatic translation programs. When it comes to semi-structured data with either numeric or text values, it is important to run it through the same process of correction that you would apply to more traditional text files. Additionally, you are going to need to plan for lots of user input to make sure the data comes out the other side in its most useful and accurate state. Structured data should generally be in a useful state already and should not require further effort.

Failing to think about granularity: Once again, because of the lumbering nature of Big Data, it is easy for those who are just getting started with analytics to create a data warehouse without taking into account the level of granularity that will ultimately be required from the data in question. While you won't be able to determine exactly how granular you are going to need to go up front, you are going to want to be aware of the fact that it will eventually be required and plan for it in the construction phase. Failing to do so can leave you unable to process the metrics that you are interested in, as well as leaving you foggy on their relevant hierarchies and related metrics. The situation can grow out of control extremely quickly, especially when you are working with either semi-structured or text-based data. As these are the two types of data that you are going to come into contact with the most, it is definitely worth giving granularity some early consideration.

Contextualizing incorrectly: While it is important to contextualize the data that you collect, it is going to be equally important to contextualize it in the

right way for later use. Not only will this make the data more useful in the long term, when the original details have largely faded away, it will also reduce the risk of inaccuracy, which brings with it an increased chance of skewed analytics further down the line. Especially when dealing with text that comes from multiple businesses or multiple disciplines, it is important that you always plan to have a knowledgeable human on hand to ensure that this information makes it into the data warehouse in a manner that is not only useful but also as accurate as possible. Finally, contextualizing is important because, when data is tagged correctly, it becomes easier to pepper the database with additional interconnected topics.
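
Several of the mistakes in this chapter, such as cleaning data only after it has been stored or tagging it inconsistently, come down to checking records before they ever reach the warehouse. As a rough illustration, the Python sketch below checks each record against a few rules and sets aside anything that fails, so a person can review it. Everything specific here is an assumption made for the example: the field names, the list of allowed tags, and the rules themselves would need to match your own data.

```python
from datetime import date

# Hypothetical rules for a single orders table. The field names and the
# allowed tags are assumptions made for this example only.
REQUIRED_FIELDS = {"customer_id", "order_date", "amount", "source_tag"}
ALLOWED_SOURCE_TAGS = {"web", "store", "phone"}


def find_problems(record):
    """Return a list of problems; an empty list means the record looks clean."""
    problems = []
    for field in sorted(REQUIRED_FIELDS - set(record)):
        problems.append(f"missing field: {field}")
    amount = record.get("amount")
    if "amount" in record and (not isinstance(amount, (int, float)) or amount < 0):
        problems.append("amount must be a non-negative number")
    if "order_date" in record and not isinstance(record["order_date"], date):
        problems.append("order_date must be a date")
    if "source_tag" in record and record["source_tag"] not in ALLOWED_SOURCE_TAGS:
        problems.append(f"unknown source_tag: {record['source_tag']}")
    return problems


def split_batch(records):
    """Separate clean records from rejected ones before anything is stored."""
    clean, rejected = [], []
    for record in records:
        problems = find_problems(record)
        if problems:
            rejected.append((record, problems))
        else:
            clean.append(record)
    return clean, rejected


# Made-up batch: one good record, one with bad values, one with a missing field.
batch = [
    {"customer_id": 1, "order_date": date(2017, 3, 1), "amount": 25.0, "source_tag": "web"},
    {"customer_id": 2, "order_date": "yesterday", "amount": -5, "source_tag": "fax"},
    {"customer_id": 3, "amount": 12.5, "source_tag": "store"},
]
clean, rejected = split_batch(batch)
for record, problems in rejected:
    print(record["customer_id"], problems)
```

Only the clean records would go on to be loaded. The rejected ones keep their list of problems attached, which gives the knowledgeable human mentioned above something specific to fix.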

Conclusion

Let’s hope this book was informative and able to provide you with all of the tools you need to achieve your goals both in the near term and for the months and years ahead. Remember, just because you’ve finished this book doesn’t mean there is nothing left to learn on the topic. Becoming an expert at something is a marathon, not a sprint – slow and steady wins the race.