Importance Of Using Computers For Finding Patterns In A Dataset
SECTION 1
Data is necessary to conduct any kind of study. Data can be qualitative or quantitative in nature. Qualitative data is the type of data that describes characteristics (categories) of the factor of interest. These factors are known as variables, whose values change depending on the situation. Similarly, quantitative data is the type of data that contains numerical values of a variable. A variable that contains qualitative data is known as a qualitative variable, and one that contains quantitative data is known as a quantitative variable. An example of a qualitative variable is the educational level of people, and an example of a quantitative variable is the age of people.
Quantitative variables are of two types: discrete variables and continuous variables. Variables whose values can be counted are discrete variables, and variables whose values are measured on a continuous scale and cannot be counted are continuous variables. An example of a discrete variable is the age of people in completed years, and an example of a continuous variable is the weight of people.
A dataset is a collection of information or data on some variables of interest. Datasets can be summarized in different ways depending on the types of variables involved. For example, two qualitative variables can be compared by evaluating the proportions of each category within the different groups and comparing them. A qualitative and a quantitative variable can be summarized and compared by evaluating the mean of the quantitative variable within each category and comparing the means. Finally, two quantitative variables can be compared with a scatterplot, which shows the nature of the relationship between them.
These comparisons are usually done with computing software, which makes the process faster and requires less effort.
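As a minimal sketch of how software handles the first two kinds of comparison, the Python below computes a proportion per group and a mean per group. The records and the names `records`, `proportion_by_group`, and `mean_by_group` are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (education level, owns a car?, age in years).
records = [
    ("school", "yes", 34),
    ("school", "no", 51),
    ("university", "yes", 29),
    ("university", "yes", 42),
]


def proportion_by_group(rows, target="yes"):
    """Share of rows in each group whose second field equals `target`."""
    groups = defaultdict(list)
    for group, answer, _ in rows:
        groups[group].append(answer == target)
    return {g: sum(flags) / len(flags) for g, flags in groups.items()}


def mean_by_group(rows):
    """Mean of the numeric third field within each group."""
    groups = defaultdict(list)
    for group, _, value in rows:
        groups[group].append(value)
    return {g: mean(values) for g, values in groups.items()}


print(proportion_by_group(records))
print(mean_by_group(records))
```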
SECTION 2
a)
As the distance travelled increases, the selling price decreases. The relationship between distance and selling price is
y = -0.2004x + 20018, where x is the distance travelled (km) and y is the selling price ($).
b) The estimated selling price of a car that has travelled 30,000 km = -0.2004 * 30000 + 20018 = $14,006.
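The arithmetic in part (b) can be checked with a short sketch. It assumes a negative slope, which is what a price that falls with distance and the $14,006 result both imply. The names `SLOPE`, `INTERCEPT`, and `estimated_price` are illustrative only.

```python
# Coefficients of the fitted line from part (a); names are illustrative.
SLOPE = -0.2004      # change in selling price ($) per km travelled
INTERCEPT = 20018    # estimated selling price ($) at 0 km


def estimated_price(distance_km):
    """Estimated selling price from the line y = SLOPE * x + INTERCEPT."""
    return SLOPE * distance_km + INTERCEPT


print(round(estimated_price(30000)))
```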
c) For all the 10,000 estimates,
Average = 14000
Standard Deviation = 392
Required z-score = (14006 – 14000) / 392 = 0.02
d) P (Z < 0.02) = 0.50798
e) For sample 231,
Expected rank = P (Z < z-score) * 10000 = 0.50798 * 10000 = 5080
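The z-score, probability, and expected rank in parts (c) to (e) can be reproduced with the sketch below. It assumes the estimates are approximately normally distributed and rounds the z-score to two decimals, as the working above does. The function name `expected_rank` is an illustrative choice.

```python
from statistics import NormalDist


def expected_rank(estimate, mean, sd, n_samples):
    """Approximate rank of `estimate` among `n_samples` normal estimates."""
    z = round((estimate - mean) / sd, 2)   # rounded to 2 d.p., as in the text
    p_below = NormalDist().cdf(z)          # P(Z < z) for a standard normal
    return z, p_below, round(p_below * n_samples)


z, p_below, rank = expected_rank(14006, 14000, 392, 10000)
print(z, round(p_below, 5), rank)
```

The same function applies to the rank calculations in the later sections by changing the estimate, average, standard deviation, and number of samples.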
SECTION 3
a)
Sample: 231
Count of "Which version? (A or B)"

Version          n      y    Grand Total
A                3     90             93
B               24     94            118
Grand Total     27    184            211

Sample: 231
Count of "Which version? (A or B)", shown as a percentage of each row total

Version               n          y    Grand Total
A                 3.23%     96.77%        100.00%
B                20.34%     79.66%        100.00%
Grand Total      12.80%     87.20%        100.00%
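The row percentages follow directly from the counts for sample 231. A minimal sketch, with the hypothetical names `counts` and `row_percentages`:

```python
# Counts from the pivot table for sample 231: version -> {response: count}.
counts = {
    "A": {"n": 3, "y": 90},
    "B": {"n": 24, "y": 94},
}


def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {k: round(100 * v / total, 2) for k, v in cells.items()}
    return result


percentages = row_percentages(counts)
print(percentages)
```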
b)
c) Version A is preferred by a larger share of respondents (96.77%) than version B (79.66%).
d) For the selected sample 231, the estimated difference in proportion of preference = (0.9677 – 0.7966) = 0.171
Total number of samples = 1000
Selected sample number = 231
Average of the sample estimates = 0.1
Standard deviation of the sample estimates = 0.0505
z-score = (0.171 – 0.1) / 0.0505 = 1.41
P (Z < 1.41) = 0.9207
Expected rank for sample 231 = P (Z < z-score) * 1000 = 0.9207 * 1000 = 921
e)
Let p_{1} be the proportion of people who prefer version A and p_{2} be the proportion of people who prefer version B. Therefore,
H_{0}: p_{1} – p_{2} = 0
H_{1}: p_{1} – p_{2} ≠ 0
The required p-value is 0.0002.
Since the p-value is below the conventional 0.05 significance level, H_{0} is rejected.
The difference between the two proportions is statistically significant, and thus the proportions are not equal to each other.
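The report does not say which test produced the p-value. As one plausible check, the sketch below runs a two-sided pooled two-proportion z-test on the counts for sample 231 (90 of 93 for version A, 94 of 118 for version B). The choice of test and the name `two_proportion_z_test` are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for H0: p1 - p2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                     # common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # both tails
    return z, p_value


z, p_value = two_proportion_z_test(90, 93, 94, 118)
print(round(z, 2), round(p_value, 4))
```

Under these assumptions the p-value rounds to 0.0002, which agrees with the value reported above.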
SECTION 4
a)
Sample: 231
Casino profit from bet ($), by machine

Machine        Count    Average profit ($)    StdDev of profit ($)
A                103           0.184466019             4.519560045
B                 97           0.164948454             1.351545507
Grand Total      200           0.175                   3.369143905
b) The average profit from machine A is $0.18 and the average profit from machine B is $0.16. However, the profit from machine A varies much more (standard deviation $4.52) than the profit from machine B (standard deviation $1.35). Thus, machine B is more reliable than machine A in terms of profit, because its profit is more predictable, even though the average profit of machine A is slightly higher.
c)
For the selected sample 231, the estimated difference in sample means = (0.18 – 0.16) = 0.02
Total number of samples = 1000
Selected sample number = 231
Average of the sample estimates = 0.4
Standard deviation of the sample estimates = 0.46
z-score = (0.02 – 0.4) / 0.46 = -0.83
P (Z < -0.83) = 0.2032
Expected rank for sample 231 = P (Z < z-score) * 1000 = 0.2032 * 1000 = 203
d)
Let µ_{1} be the mean profit from machine A and µ_{2} be the mean profit from machine B. Therefore,
H_{0}: µ_{1} – µ_{2} = 0
H_{1}: µ_{1} – µ_{2} ≠ 0
The required p-value is 0.97.
Since the p-value is far above the conventional 0.05 significance level, H_{0} is not rejected.
There is no statistically significant difference between the average profits from machine A and machine B.
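The report does not state which test was used here either. As a rough check, the sketch below applies a two-sided large-sample test (normal approximation, unequal variances) to the summary statistics in the table for sample 231. The choice of test and the name `two_mean_z_test` are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_z_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided large-sample test for H0: mu1 - mu2 = 0 (unequal variances)."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)       # standard error of the difference
    z = (mean1 - mean2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # both tails
    return z, p_value


# Summary statistics for groups A and B from the table for sample 231.
z, p_value = two_mean_z_test(0.184466019, 4.519560045, 103,
                             0.164948454, 1.351545507, 97)
print(round(z, 3), round(p_value, 2))
```

Under these assumptions the p-value rounds to 0.97, in line with the reported value. A t-based test would give a very similar result with samples of this size.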
SECTION 5
The back-to-back histogram shows the number of students and the number of administrators that fall into each category at the university. Thus, a back-to-back histogram is used to compare the distribution of the same variable across two groups.
In a business, for example, the performance measures of two groups of employees can be compared with the help of a back-to-back histogram.
SECTION 6
a)
Sample: 231
Count of "do you support proposed change?"

no     yes    Grand Total
86     121            207

Sample: 231
Count of "do you support proposed change?", shown as a proportion of the grand total

no             yes            Grand Total
0.415458937    0.584541063              1
b)
Sample Number = 231
Sample Size = 207
Number of people supporting the change = 121
Required proportion = (121/207) = 0.58
c)
Total number of samples = 1000
Average of the sample estimates = 0.6
Standard deviation of the sample estimates = 0.0357
z-score = (0.58 – 0.6) / 0.0357 = -0.56
P (Z < -0.56) = 0.2877
Expected rank for sample 231 = P (Z < z-score) * 1000 = 0.2877 * 1000 = 288
d)
The required 95% confidence interval for the proportion = (0.5128, 0.6472).
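The interval above is consistent with a normal-approximation (Wald) interval computed from the rounded proportion 0.58 and the sample size 207. The sketch below assumes that method. The name `proportion_ci` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% level
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 0.58 is the rounded sample proportion from part (b); 207 is the sample size.
low, high = proportion_ci(0.58, 207)
print(round(low, 4), round(high, 4))
```

Using the unrounded proportion 121/207 instead of 0.58 would shift both limits slightly upward.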