Extra Activity 3
- Activity details:
1. Choose one of the following two options for the dataset:
- Use any one of the datasets from the provided sheet for the experiment. The sheet contains a credit rating dataset of corporate bonds: each row represents a bond and the rating assigned to it by a credit rating agency.
Note: The dataset is synthetically generated and scaled.
- Collect your own data with 1000 rows and 5 to 6 columns, with column data types similar to those of the given data.
2. Transfer the data from the chosen source into a Google Spreadsheet and state the name of the dataset you used from the sheet (e.g., Dataset1). You can either embed the spreadsheet directly in your Student Portfolio page or provide a link to access it.
Create a Google Doc file and complete tasks 3 and 4 in it.
3. (i) Select three numerical variables from your dataset, say X1, X2, and X3. Create a scatterplot for each pair of these variables: X1 vs. X2, X2 vs. X3, and X1 vs. X3.
Then give your insights into the relationship observed in each scatterplot.
(ii) Calculate the covariance for each pair of the chosen variables: Cov(X1, X2), Cov(X2, X3), and Cov(X1, X3). Then analyze and interpret the relationships between these variables based on the computed covariance values.
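A minimal Python sketch of the covariance step, assuming the three chosen columns have already been exported from the spreadsheet as lists of numbers. The names X1, X2, X3 and the five values in each list are hypothetical toy data, not rows from the credit rating sheet. The `sample` flag is an illustrative parameter: dividing by n - 1 corresponds to Google Sheets' `COVARIANCE.S`, and dividing by n corresponds to `COVARIANCE.P`; check which convention your course expects.

```python
# Sketch: pairwise covariance for three numerical columns.
# X1, X2, X3 and their values are hypothetical toy data, for illustration only.
from itertools import combinations


def covariance(xs, ys, sample=True):
    """Covariance of two equal-length lists of numbers.

    sample=True divides by n - 1 (like COVARIANCE.S in Google Sheets);
    sample=False divides by n (like COVARIANCE.P).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the respective means.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1) if sample else total / n


data = {
    "X1": [2.0, 4.0, 6.0, 8.0, 10.0],
    "X2": [1.0, 3.0, 2.0, 5.0, 4.0],
    "X3": [9.0, 7.0, 8.0, 4.0, 2.0],
}

# Visit the three pairs (X1, X2), (X1, X3), (X2, X3) and report the sign.
for a, b in combinations(data, 2):
    c = covariance(data[a], data[b])
    direction = "positive" if c > 0 else "negative" if c < 0 else "zero"
    print(f"Cov({a}, {b}) = {c:.3f} ({direction})")
```

With these toy values, X1 and X2 tend to rise together (positive covariance), while X3 tends to fall as X1 or X2 rises (negative covariance). Only the sign is directly comparable across pairs, because the size of a covariance depends on the units of both variables.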
4. (i) Calculate the mean and standard deviation of each of the three chosen variables. Then interpret the results to describe the characteristics of these variables.
(ii) Apply Chebyshev's inequality to formulate a statement that gives a suitable lower bound and upper bound for the variable. Then interpret this result to explain what the bound says about the variable's behavior and variability.
Example: Suppose it is known that the number of items produced in a factory during a week is a random variable X with mean 50 and variance 25, so its standard deviation is 5. The values 40 and 60 lie k = 2 standard deviations below and above the mean, so Chebyshev’s inequality gives P(|X − 50| ≥ 10) ≤ 1/2² = 0.25. Hence you can be at least 75% sure that this week’s production will be between 40 and 60; that is, the probability that this week’s production will be between 40 and 60 is at least 0.75.
Equivalently, the probability that this week’s production will be at least 60 or at most 40 is at most 0.25.
Note: Perform 4(ii) for each of the three selected numerical variables. Refer to the sample solution doc file for this activity.
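Task 4 can be sketched in the same way. The snippet below is a minimal Python sketch, assuming the same kind of hypothetical toy columns (not the credit rating data): it first reproduces the factory example, then computes each variable's mean and sample standard deviation and states a Chebyshev interval for it. The helper `chebyshev_interval` and the choice k = 2 are illustrative assumptions; any k > 1 gives a non-trivial bound.

```python
# Sketch for task 4: mean, standard deviation, and Chebyshev bounds.
# X1, X2, X3 and their values are hypothetical toy data, for illustration only.
import math
import statistics


def chebyshev_interval(mean, sd, k):
    """Return (lower, upper, p_min) such that P(lower < X < upper) >= p_min.

    Chebyshev's inequality: P(|X - mean| >= k * sd) <= 1 / k**2 for sd > 0
    and k > 0, so the interval mean +/- k * sd holds at least 1 - 1 / k**2
    of the probability. The bound is only informative when k > 1.
    """
    if sd <= 0 or k <= 0:
        raise ValueError("sd and k must both be positive")
    return mean - k * sd, mean + k * sd, max(0.0, 1 - 1 / k**2)


# Factory example from the text: mean 50, variance 25, so sd = 5 and k = 2.
lo, hi, p = chebyshev_interval(50, math.sqrt(25), k=2)
print(f"P({lo:g} < X < {hi:g}) >= {p:.2f}")              # P(40 < X < 60) >= 0.75
print(f"P(X <= {lo:g} or X >= {hi:g}) <= {1 - p:.2f}")   # P(X <= 40 or X >= 60) <= 0.25

data = {
    "X1": [2.0, 4.0, 6.0, 8.0, 10.0],
    "X2": [1.0, 3.0, 2.0, 5.0, 4.0],
    "X3": [9.0, 7.0, 8.0, 4.0, 2.0],
}

K = 2  # number of standard deviations; an arbitrary choice for illustration
for name, values in data.items():
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD (n - 1 divisor), like STDEV.S in Sheets
    lo, hi, p = chebyshev_interval(m, s, K)
    # Share of observed values that actually fall strictly inside the interval.
    share = sum(lo < x < hi for x in values) / len(values)
    print(f"{name}: mean = {m:.3f}, sd = {s:.3f}; at least {p:.0%} of values "
          f"lie in ({lo:.3f}, {hi:.3f}); observed {share:.0%}")
```

Because the sample standard deviation (n - 1 divisor) is never smaller than the population one, the stated interval is at least as wide as the one Chebyshev's inequality guarantees for the data itself, so for non-constant data the observed share should never fall below the bound.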
No | Criteria | Weight | Criteria Description | Absent (0 points) | Sufficient (1 point) | Exemplary (2 points) |
1. | Presence of data and analysis document | 5 | The URL of the Google Sheet and the Google Doc file, or the embedded Google Sheet and Doc file, are present under the section “Extra Activity 3” for Statistics 2 | Either the data is not present or appropriate data is not shared | One of the datasets from the given sheet has been used, along with the Google Doc file | The data presented is valid and different from the given example, along with the Google Doc file |
2. | Presence of scatter plots and interpretation of each plot | 10 | Scatter plots for each pair of selected numerical variables are present and each plot is interpreted in the Google Doc file | Plots and interpretations are absent | Either the scatter plots (or not all three of them) are present without their interpretations, or vice versa | The required scatter plots (at least three) and their interpretations are present |
3. | Computation of covariance and interpretation of the result | 10 | Covariance for each pair of the selected variables is computed and each obtained value is interpreted in the Google Doc file | Calculations and interpretations of covariance (for at least three pairs of variables) are absent | Only the covariance computations are present, without their interpretations | Covariance computations and their interpretations are present for at least three pairs of variables |
4. | Calculation of mean and variance of each variable and interpretation of the result | 5 | Mean and variance for each selected numerical variable are calculated and each calculated mean and variance is interpreted | Calculations and interpretations of mean and standard deviation are absent | Only the calculations of mean and standard deviation (for at least three variables) are present, or calculations and interpretations of mean and variance are present for fewer than three variables | Calculations and interpretations of means and standard deviations are present for at least three variables |
5. | Lower and upper bounds and their interpretations are present | 20 | An upper bound and a lower bound are found by formulating a statement for each selected variable, and each obtained bound is interpreted | None of the bounds and their interpretations are present | Only the calculations of bounds (for at least three variables) are present | Calculations of bounds and their interpretations (for at least three variables) are present |
Submission due date: 19th Feb 2025
Peer review due date: 26th Feb 2025