Statistics 2 Extra Activities : 👇📌
⭕IIT Madras BS in Data Science✅ :
- Stats 2 Extra Activities
Extra Activity 2:-
- Here is the Information of this activity >>
Activity : Perform a Monte Carlo Simulation using Google Colab.
This activity requires you to work on the Python Notebook that was discussed in the Week 0 Part 2 content.
1. Open the Python Notebook shared with you in the supplementary content.
2. Make a copy of this file in your Drive (you are free to organize the Python Notebook within your Drive).
3. Open the file using Google Colab.
4. Go through the code in Google Colab (and also refer to the last video of Week 0 Part 2) to understand how the parameters can be manipulated in the Python Notebook.
5. Select an experiment and event from the various examples seen so far in the lectures (or any other problem that interests you).
6. Add the selected experiment and event to the notebook and perform a Monte Carlo simulation to verify the computed probability.
7. Add the Google Colab notebook to the site under the heading ‘Activity 2’ (make sure that the sharing permission is set to view for everyone in the Onlinedegree domain).
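As a rough illustration of steps 5 and 6, here is a minimal sketch of a Monte Carlo check. The experiment (rolling two fair dice) and the event (the sum equals 7) are chosen only for illustration, and the function name `simulate_two_dice_sum` is hypothetical; it is not taken from the course notebook.

```python
import random

def simulate_two_dice_sum(target_sum=7, trials=100_000, seed=0):
    """Estimate P(sum of two fair dice == target_sum) by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        if roll == target_sum:
            hits += 1
    return hits / trials

# Computed probability: 6 favourable outcomes out of 36 equally likely ones.
exact = 6 / 36
estimate = simulate_two_dice_sum()
print(f"exact = {exact:.4f}, simulated = {estimate:.4f}")
```

With a large number of trials the simulated value should land close to the computed one, which is the comparison the "Analysis" criterion asks for.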
Grading criteria:
No. | Grading criteria | Criteria description | Score for target performance | Score for absent/Not target performance |
1. | Presence of the Colab notebook | The URL of the Colab notebook present under the section “Activity 2” in the Statistics 2 class page. | 10 Marks | 0 Marks |
2. | Description | The details of the simulated experiment and event added to the Google Sites page. | 10 Marks | 0 Marks |
3. | Computation | The calculations of probabilities for the above events are present in the Google Colab notebook (as text). | 30 Marks | 0 Marks |
4. | Simulation | Monte Carlo simulation performed for the above experiment in the Colab notebook. | 30 Marks | 0 Marks |
5. | Analysis | The probabilities computed manually are similar to the value obtained using Monte Carlo simulation. | 20 Marks | 0 Marks |
Add the marks in (1), (2), (3), (4) and (5) above and provide the total mark as your review score. If a student gets 0 marks for all the criteria, then give them a score of 1.
Extra Activity 3:-
- Here is the Information of this Activity>>
1. You have the following two options to consider for the dataset:
- Use any of the data from the sheet for the experiment. (The sheet contains the credit rating dataset of different corporate bonds and the ratings assigned to them. Each row represents a bond and the rating given by the credit rating agency.)
Note: The dataset is synthetically generated and scaled.
- Collect your own data comprising 1000 rows and 5 to 6 columns, with column data types similar to those of the data given above.
2. Transfer the data from the source mentioned above into a Google Spreadsheet and mention the name of the dataset used (e.g. Dataset1) from the sheet. You can either embed the spreadsheet directly into your Student Portfolio page or provide a link to access it.
Create a Google Doc file and execute tasks (3) and (4) within it.
3. (i) Select three numerical variables, such as X1, X2, and X3, from your dataset. Create scatterplots to visualize the relationships between these variable pairs. For instance, generate scatterplots for X1 vs. X2, X2 vs. X3, and so on. Additionally, provide your insights regarding the observed relationships between these variable pairs based on the scatterplot visualizations.
(ii) Calculate the covariance for the chosen pairs of numerical variables, such as Cov(X1, X2), Cov(X2, X3), and Cov(X1, X3). Then, analyze and interpret the relationships between these variables based on the computed covariance values.
4. (i) Calculate the mean and standard deviation for the three chosen numerical variables. Afterward, provide an interpretation of the results to better understand the characteristics of these variables.
(ii) Formulate a statement to find a suitable bound (both an upper bound and a lower bound) by applying Chebyshev's inequality. Subsequently, give an interpretation of this result to understand the significance of the bound in terms of the variable's behavior and variability.
Example: Suppose it is known that the number of items produced in a factory during a week is a random variable with mean 50 and variance 25. The standard deviation is 5, so 40 and 60 lie 2 standard deviations from the mean and Chebyshev’s bound is 1 - 1/2² = 0.75. Then, by Chebyshev’s inequality, you can be at least 75% sure that this week’s production will be between 40 and 60; that is, the probability that this week’s production will be between 40 and 60 is at least 0.75.
And the probability that this week’s production will be at least 60 or at most 40 is at most 0.25.
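The arithmetic in the factory example can be checked with a short sketch. The helper name `chebyshev_interval` is hypothetical and is introduced here only for illustration; it applies the standard form of Chebyshev's inequality, P(|X - mean| < k·sd) >= 1 - 1/k².

```python
import math

def chebyshev_interval(mean, variance, k):
    """Return (lower, upper, min_probability) for the band mean ± k standard deviations."""
    sd = math.sqrt(variance)
    # Chebyshev: P(|X - mean| < k * sd) >= 1 - 1 / k^2
    return mean - k * sd, mean + k * sd, 1 - 1 / k**2

# Factory example: mean 50, variance 25, band of 2 standard deviations.
lower, upper, prob = chebyshev_interval(mean=50, variance=25, k=2)
print(lower, upper, prob)  # 40.0 60.0 0.75
```

The same helper could be applied to each of the three selected variables by plugging in the mean and variance computed from the data and choosing a value of k.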
Note: Please perform 4(ii) for the selected three numerical variables. Please refer to the sample solution doc file for the above activity.
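For tasks 3(ii) and 4(i), the calculations could be sketched in Python as follows. The numbers are hypothetical placeholders standing in for three columns of the sheet, and the helper `sample_covariance` is an illustrative name. It divides by n - 1 (the sample form), which is an assumption; the course material may use the population form that divides by n.

```python
import statistics

def sample_covariance(xs, ys):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    products = ((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)

# Hypothetical values standing in for three numerical columns of the sheet.
columns = {
    "X1": [2.0, 4.0, 6.0, 8.0, 10.0],
    "X2": [1.0, 3.0, 2.0, 5.0, 4.0],
    "X3": [9.0, 7.0, 6.0, 3.0, 1.0],
}

# Task 3(ii): covariance for each pair of variables.
for a, b in [("X1", "X2"), ("X2", "X3"), ("X1", "X3")]:
    print(f"Cov({a}, {b}) = {sample_covariance(columns[a], columns[b]):.2f}")

# Task 4(i): mean and sample standard deviation of each variable.
for name, values in columns.items():
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    print(f"{name}: mean = {mean:.2f}, sd = {sd:.2f}")
```

A positive covariance suggests the two variables tend to move in the same direction and a negative one suggests opposite directions; the magnitude depends on the units of the variables, so it is the sign that is usually interpreted.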
No | Criteria | Weight | Criteria Description | Absent (0 points) | Sufficient (1 point) | Exemplary (2 points) |
1. | Presence of data and analysis document | 5 | The URL of the Google sheet and Google doc file or embedded Google sheet and doc file are present under the section “Extra Activity 3” in the Statistics 2 | Either data is not present or appropriate data is not shared | One of the datasets from the given sheet has been taken, along with the Google doc file | The data presented is valid and different from the given example, along with the Google doc file |
2. | Presence of scatter plots and interpretation of each plot | 10 | Scatter plots for the selected numerical variables are present and an interpretation of each plot is given in the Google doc file | Plots and interpretations are absent | Either the scatter plots (ot all three) are present without their interpretations or vice versa. | Required (at least three) scatter plots and their interpretations are present |
3. | Computation of covariance and interpretation of the result | 10 | Covariance for each pair of the selected variables is computed and an interpretation of each obtained value is given in the Google doc file | Calculations and interpretations of covariance (for at least three pairs of variables) are absent | Only the computations of covariances are present, without their interpretations | Computations of covariances and their interpretations are present for at least three pairs of variables |
4. | Calculations of mean and variance of each variable and interpretation of the result | 5 | Mean and variance for each selected numerical variable are calculated and an interpretation of each calculated mean and variance is given | Calculations and their interpretations for mean and standard deviation are absent | Only calculations of mean and standard deviation (for at least three variables) are present. Calculations and interpretations for mean and variance are present for fewer than three variables. | Calculations and interpretations of means and standard deviations are present for at least three variables. |
5. | Lower and upper bounds and their interpretations are present. | 20 | Find an upper bound and a lower bound by formulating a statement for each selected variable, and interpret each obtained bound. | None of the bounds and their interpretations are present | Only the calculations of bounds (for at least three variables) are present | Calculations of bounds and their interpretations (for at least three variables) are present. |
Submission due date: 19th Feb 2025
Peer review due date: 26th Feb 2025
Extra Activity 4:-
- Here is the Information of this Activity>>
Fitting a distribution (Discrete / Continuous)
Collect a dataset and try to approximate a variable with a discrete or continuous random variable.
OR
Model a dataset with a discrete or continuous random variable and verify it.
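As one possible way to approach the discrete case, the sketch below fits a Poisson distribution to count data and compares the PMF from the formula with the relative frequencies observed in the data. The counts are hypothetical placeholders, and the choice of a Poisson model is an assumption made for illustration; the data you collect may call for a different distribution.

```python
import math
from collections import Counter

# Hypothetical counts (e.g. events observed per interval); replace with collected data.
counts = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0, 2, 1, 4, 1, 0, 2, 1, 3, 2, 1]

# The Poisson parameter is estimated by the sample mean.
lam = sum(counts) / len(counts)
freq = Counter(counts)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with parameter lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Compare the probability obtained from the data with the PMF from the formula.
for k in sorted(freq):
    empirical = freq[k] / len(counts)
    print(f"k={k}: data={empirical:.3f}, Poisson={poisson_pmf(k, lam):.3f}")
```

If the two columns of values are approximately similar, the model is a plausible fit; if not, an alternative model can be tried and compared in the same way.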
Note: An example submission (for Discrete or Continuous) along with the evaluation criteria can be accessed from the following:
- Example submission for Discrete case
- Example submission for Continuous case
Please check the above before you start preparing your submission and subsequently evaluating others. Give 0 marks if the student has copied the example submission.
Instructions for submission:
1. You have to create a Google Spreadsheet with the data and a Google document with the calculations (or uploaded images of handwritten calculations) related to the distribution.
2. Embed/Hyperlink the GSheet and GDoc in your portfolio under the heading ‘Extra activity-4’.
3. Make sure that you provide View permission for the Google Site, G-Doc and G-Sheet for ‘Everyone in Indian Institute of Technology Domain’.
Grading Criteria
Criteria | Weight | Absent (0 points) | Sufficient (1 point) | Exemplary (2 points) |
Presence of data and analysis document | 5 | Either data is not present or appropriate data is not shared (i.e. the peer shared a random image or file instead of data) | Only data is present, without the analysis document | The data presented is valid, along with the analysis document |
Validity of data | 10 | There is no explanation regarding validity of the data | There is a description and mention of why the data is valid | There is a detailed description of why the data is valid, with additional insights |
Goodness of Fit (calculated PDF / PMF using formula and probability obtained from data) | 15 | There are no calculations available | The calculations are provided; however, the two values are not similar. | The calculations are shown and the values are approximately similar |
Appropriateness of steps used in calculations | 15 | The steps of calculation are not shown | Some steps are shown, but are not sufficient to calculate a valid model | The submission shows sufficient steps to calculate a valid model of the data |
Alternative model (to see if there is a better fit) | 5 | There are no alternative models provided, even if it is possible to have at least one alternative model. | At least one alternative model (if possible) has been suggested. However, detailed calculations for the alternative models are not provided to compare with the initial fit. | At least one alternative model has been suggested with detailed calculations. A comparison of the fit (with the earlier model(s)) is also done. |
Important Dates
- Release of submission link: 25th February, 2025
- Student Submission due date: 9th March, 2025
- Peer Review due date: 19th March, 2025
- Students can start collecting data and preparing the calculations. The submission only requires them to share their Google Site (which they have already created).
Note: Completing the required number (at least 5) of peer reviews is mandatory for the extra activity. A student who does not complete the required number of peer reviews will get 0 marks even if they submitted the assignment.