How does exploratory data analysis help statisticians?
- Uncover the underlying data structure
- Maximize insight into a set of data
- Detect anomalies and outliers
- Extract important variables
- Create parsimonious models
- Test underlying assumptions
- Determine optimal factor settings
- Test hypotheses
- Estimate parameters
Tools and techniques used in exploratory data analysis
- Box and whisker plots: In a box and whisker plot, you draw a box spanning the middle half of the data sample, from the lower quartile to the upper quartile. You then draw a line inside the box at the median. Lastly, to complete the plot, draw whiskers from the ends of the box to the smallest and largest data values.
- Stem and leaf display: This technique splits each data value into two parts: a stem (the leading digit or digits) and a leaf (the final digit). A stem and leaf display is often preferred to other data display techniques like bar charts because the original data values can be easily recovered from it.
- Rootogram: A rootogram looks like a histogram except that the height of each bar is the square root of the number of observations falling in that range of a quantitative variable. This display is usually used together with a fitted distribution. The purpose of the square roots is to make the deviations between the fitted curve and the bars roughly comparable in variability, whether the counts in a range are large or small.
- Median polish: This technique is used to fit an additive model to the data in a two-way table. The model represents the contents of each cell as the sum of a common value, a row effect, a column effect, and a residual. Although median polish is similar to two-way ANOVA, the terms of the model are determined using medians rather than means, which makes the fit resistant to outliers.
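The displays and fits described in this list can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation. The function names are hypothetical, the quartiles use one common convention (medians of the lower and upper halves, excluding the middle value when the count is odd), and the stem and leaf helper assumes non-negative integers.

```python
from statistics import median


def five_number_summary(values):
    """Numbers behind a basic box plot: min, lower quartile, median, upper quartile, max.

    Assumes at least two values. Quartiles are medians of the lower and
    upper halves (one of several conventions in use).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]


def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted leaves (final digit).

    Assumes non-negative integers; the original values can be rebuilt
    as stem * 10 + leaf.
    """
    display = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        display.setdefault(stem, []).append(leaf)
    return display


def median_polish(table, iterations=10):
    """Fit cell = common + row effect + column effect + residual using medians."""
    rows, cols = len(table), len(table[0])
    resid = [[float(x) for x in row] for row in table]
    common = 0.0
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols
    for _ in range(iterations):
        # Sweep the median out of each row into the row effects.
        for i in range(rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [x - m for x in resid[i]]
        # Move the median of the column effects into the common value.
        m = median(col_eff)
        common += m
        col_eff = [c - m for c in col_eff]
        # Sweep the median out of each column into the column effects.
        for j in range(cols):
            m = median(resid[i][j] for i in range(rows))
            col_eff[j] += m
            for i in range(rows):
                resid[i][j] -= m
        # Move the median of the row effects into the common value.
        m = median(row_eff)
        common += m
        row_eff = [r - m for r in row_eff]
    return common, row_eff, col_eff, resid


# Example: the leaves can be read back as the original values.
print(stem_and_leaf([12, 15, 21, 27, 27, 34]))  # {1: [2, 5], 2: [1, 7, 7], 3: [4]}
```

Because every step of `median_polish` only moves a median from one term to another, the four terms always add back up to the original cell value. An unusual cell therefore shows up as a large residual rather than distorting the row and column effects.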
Functions performed using exploratory data analysis
- Clustering and dimension reduction, which help you create graphical displays of high-dimensional data containing many variables
- Univariate visualization and analysis of each field present in a raw set of data
- Summary statistics and bivariate visualizations that allow analysts to examine the relationship between each variable in a dataset and the target variable
- Multivariate visualizations to understand and map interactions between various fields in a dataset
- K-means clustering, which assigns each data point to the cluster whose center (mean) is nearest and then recomputes each center as the mean of the points assigned to it
- Predictive modeling
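To make the K-means item in this list concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions rather than a production implementation: the function names are hypothetical, points are tuples of numbers, and the caller supplies the starting centers (in practice these are often chosen at random from the data).

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
        for p in points
    ]


def k_means(points, initial_centers, iterations=100):
    """Alternate between assigning points and recomputing centers.

    Stops early once the centers no longer move.
    Returns (centers, labels), where labels[i] is the cluster of points[i].
    """
    centers = [tuple(c) for c in initial_centers]
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                # New center is the coordinate-wise mean of the members.
                new_centers.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = k_means(points, initial_centers=[(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centers, so a common precaution is to run the procedure from several different starts and keep the solution with the smallest total within-cluster distance.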
Exploratory data smoothing
- Resistant time series smoothing: This method is used by data scientists to better identify trends and patterns in time series. There are a number of nonlinear smoothers that can be used to “smooth” sequential time series data. They include 3RSSH, 3RSS, 3RSR, 5RSSH, and 5RSS. These are applied in the first step of exploratory data analysis to minimize the influence of potential outliers before applying a moving average.
- Scatterplot smoothing: In scatterplot smoothing, the data analyst draws a smooth curve on the scatterplot to summarize the relationship between two variables. The curve is drawn in a manner that makes few assumptions about the strength or form of the relationship. Scatterplot smoothing is closely related to nonparametric regression in that it displays the connection between a response variable and a predictor variable without assuming a particular functional form for that connection. There are several methods to smooth a scatterplot, including:
- Running means
- Running lines
- Locally weighted scatterplot smoothing (LOWESS)
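The difference between a resistant smoother and a running mean can be seen in a short Python sketch. In Tukey's notation, the "3" in names such as 3RSSH stands for running medians of three, "R" for repeating that step until nothing changes, and "H" for hanning (a weighted average with weights 1/4, 1/2, 1/4). The sketch below covers only those pieces plus a plain running mean. The splitting step "S" is omitted, the function names are hypothetical, and end values are simply copied, which is one of several possible end-point rules.

```python
from statistics import median


def running_median3(series):
    """Replace each interior value by the median of itself and its two neighbors."""
    if len(series) < 3:
        return list(series)
    middle = [median(series[i - 1:i + 2]) for i in range(1, len(series) - 1)]
    return [series[0]] + middle + [series[-1]]


def repeated_median3(series):
    """'3R': apply running medians of three until the series stops changing."""
    current = list(series)
    for _ in range(len(current) + 1):  # safety cap on the number of passes
        smoothed = running_median3(current)
        if smoothed == current:
            break
        current = smoothed
    return current


def hanning(series):
    """'H': weighted running average with weights 1/4, 1/2, 1/4."""
    if len(series) < 3:
        return list(series)
    middle = [
        0.25 * series[i - 1] + 0.5 * series[i] + 0.25 * series[i + 1]
        for i in range(1, len(series) - 1)
    ]
    return [series[0]] + middle + [series[-1]]


def running_mean(series, window=3):
    """Centered running mean; the window shrinks at the ends of the series."""
    half = window // 2
    result = []
    for i in range(len(series)):
        chunk = series[max(0, i - half):i + half + 1]
        result.append(sum(chunk) / len(chunk))
    return result


noisy = [1, 2, 100, 4, 5, 6]           # one obvious outlier
print(repeated_median3(noisy))         # [1, 2, 4, 5, 5, 6]
print(running_mean(noisy))             # the outlier leaks into its neighbors
```

The running medians remove the isolated spike entirely, while the running mean spreads it over the neighboring points. This is why the resistant step is applied before any averaging.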
College students taking statistics courses sometimes struggle with projects assigned on this topic and search the web for professional help. If this sounds like you and you would like guidance on any topic related to exploratory data analysis, consider getting exploratory data analysis homework help from our experts.