AskSia

Question 1 (a) Have a look at the plot below. Three different Gestalt principle...
May 11, 2024
For question (a):
Proximity, Similarity, Continuity
Key Concept
Gestalt Principles in Data Visualization
Explanation
Proximity: the data points for each city are placed close to one another, indicating that they belong to the same series. Similarity: each city's line uses a consistent style, indicating that its points encode the same type of data (average temperature). Continuity: the unbroken lines guide the eye along the path of temperature change over time.
For question (b):
Variables to Aesthetics Mapping
Key Concept
Visual Encoding in Data Visualization
Explanation
The Date is mapped to the x-axis using a linear scale, the Average Temperature is mapped to the y-axis using a linear scale, and the City is mapped to line color/hue with a nominal scale.
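This mapping can be sketched in matplotlib. The city names, years, and temperatures below are made-up placeholders, since the original plot is not reproduced here; the point is only how each variable lands on an aesthetic:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Hypothetical data standing in for the plot described above:
# Date -> x-axis (linear), Average Temperature -> y-axis (linear),
# City -> line colour/hue (nominal).
years = [1900, 1925, 1950, 1975, 2000]
temps = {"City A": [8.1, 8.3, 8.6, 9.0, 9.5],
         "City B": [11.0, 11.1, 11.4, 11.9, 12.4]}

fig, ax = plt.subplots()
for city, series in temps.items():
    ax.plot(years, series, label=city)  # one hue per city (nominal scale)
ax.set_xlabel("Date")
ax.set_ylabel("Average Temperature (°C)")
ax.legend(title="City")
fig.savefig("temperature_trend.png")
```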
For question (c):
Storytelling with Data
Key Concept
Data Narratives and Chart Modification
Explanation
A possible story could be the trend of rising temperatures for both cities over time, highlighting the impact of industrialization or climate change. To tell this story, one could emphasize the upward trend by annotating key historical events, using color to highlight temperature increases, adjusting the y-axis to zoom in on changes, or smoothing the lines to focus on long-term trends.
[Figure: plots of f(t), g(t), f(t) * g(t), and h(t)]

A data scientist wants to apply g(t) to their time series to compute the derivative. However, they find that their data is too noisy and requires smoothing. They decide to use a frequency-based low-pass filter: transforming the data into the frequency domain, multiplying the coefficients by the filter weights, and then applying the inverse Fourier transform. Should they first convolve with g(t) and then apply the low-pass filter, or first apply the low-pass filter and then convolve with g(t)? Briefly explain your reasoning (max 3 sentences). [3 marks]

(d) Can you think of a way to apply the smoothing and compute the derivative at the same time using only a single operation? Briefly explain your reasoning (max 2 sentences). [3 marks]
Both the derivative filter g(t) and the low-pass filter are linear time-invariant operations: convolution in the time domain is multiplication in the frequency domain, and multiplication commutes. The order therefore does not change the final result; convolving with g(t) and then low-pass filtering yields exactly the same output as filtering first and then convolving. In practice one might still prefer to smooth first so that the intermediate signal is less noisy, but the end result is identical.
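A small numpy sketch of this commutativity (the signal and the cut-off frequency are arbitrary illustrations): differentiation is multiplication by iω in the frequency domain, the low-pass filter is another multiplication, and elementwise multiplications commute.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.cumsum(rng.normal(size=256))            # noisy time series f(t)
omega = 2 * np.pi * np.fft.fftfreq(f.size)     # angular frequencies

deriv = 1j * omega                             # derivative filter in the frequency domain
lowpass = (np.abs(omega) < 0.5).astype(float)  # ideal low-pass filter weights

F = np.fft.fft(f)
a = np.fft.ifft(lowpass * (deriv * F)).real    # differentiate, then smooth
b = np.fft.ifft(deriv * (lowpass * F)).real    # smooth, then differentiate

print(np.allclose(a, b))  # True: the two orders give identical results
```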
Yes: smoothing and differentiation can be combined into a single operation by designing one filter with both characteristics. In the frequency domain, multiply the low-pass filter weights by iω (differentiation is multiplication by iω, where ω is the frequency and i is the imaginary unit); equivalently, in the time domain, convolve once with the derivative of the smoothing kernel, since d/dt (f * h) = f * (dh/dt).
Question 3 We would like to use direct principal component analysis (PCA) or dual PCA to reduce the dimension of an n-sample, d-dimensional dataset X = [x1, · · · , xn] ∈ R^(d×n). We define

X = [ −1  1  2  2  4
       1  1  1  2  2 ]

(a) For direct PCA, what would be the covariance matrix S of X? Please show your derivation to compute this matrix numerically (including any intermediate matrices you calculate). [4 marks]

(b) The eigenvalues and eigenvectors of S are given as follows: λ1 = 10, u1 = [1/√2, 1/√2]^T and λ2 = 2, u2 = [1/√2, −1/√2]^T. Assume we use direct PCA to project X into Y such that the projected dataset Y has the minimum variance. Please work out in this case how to compute Y numerically. Please present your results using the fraction style (not decimal style). [4 marks]

(c) Now, please show your derivation regarding how to recover the dataset X̃ from Y such that it has the same dimension as the original X. Next, please explain (1) why X̃ is not strictly equal to X and (2) how we can recover the exact X from Y. Please present your results using the fraction style (not decimal style). [4 marks]

(d) In dual PCA, we need to compute a similar square matrix to S. What would be this matrix? Please show your derivation to compute this matrix numerically. Next, please prove, in terms of training-data projection, how dual PCA is derived from direct PCA using the concept of singular value decomposition. [4 marks]

(e) Now we want to use direct PCA to reduce the dimension of a new dataset of 10 dimensions (i.e., d = 10). Assume that the first five eigenvalues of the covariance matrix are respectively λ1 = 10, λ2 = 2, λ3 = 0.15, λ4 = 0.05 and λ5 = 0.02, and that the remaining eigenvalues are all zero. Please work out which principal components we should use to project the dataset such that afterwards we can retain at least 95% of the variance of this dataset. Please show your numerical derivation in detail. [4 marks]
(a) For direct PCA, the covariance matrix S of X is calculated as follows:
- First, compute the mean of each dimension (row) of X.
- Then, subtract the mean from each element of the corresponding row to obtain the mean-centered data matrix X_centered.
- Finally, calculate the covariance matrix using the formula S = (1/(n−1)) X_centered X_centered^T.

Given X = [ −1 1 2 2 4 ; 1 1 1 2 2 ], we calculate S:
1. Mean of each row: μ1 = (−1 + 1 + 2 + 2 + 4)/5 = 8/5 and μ2 = (1 + 1 + 1 + 2 + 2)/5 = 7/5.
2. Subtract the means to get X_centered = [ −13/5 −3/5 2/5 2/5 12/5 ; −2/5 −2/5 −2/5 3/5 3/5 ].
3. Compute S = (1/(5−1)) X_centered X_centered^T = (1/4) [ 66/5 14/5 ; 14/5 6/5 ] = [ 33/10 7/10 ; 7/10 3/10 ].
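The hand derivation above can be checked in numpy, whose `np.cov` uses the same 1/(n−1) normalisation:

```python
import numpy as np

X = np.array([[-1, 1, 2, 2, 4],
              [ 1, 1, 1, 2, 2]], dtype=float)

mu = X.mean(axis=1, keepdims=True)   # row means: [8/5, 7/5]
Xc = X - mu                          # mean-centered data matrix
S = Xc @ Xc.T / (X.shape[1] - 1)     # S = (1/(n-1)) Xc Xc^T

print(S)                             # [[3.3, 0.7], [0.7, 0.3]] = [[33/10, 7/10], [7/10, 3/10]]
print(np.allclose(S, np.cov(X)))     # True: matches numpy's covariance (ddof=1)
```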
(b) To compute Y numerically using direct PCA:
- Since we want the projection with the minimum variance, we project onto the eigenvector with the smallest eigenvalue, u2 (λ2 = 2).
- The projection is Y = u2^T X_centered, a 1 × 5 row vector.
- Numerically, with u2 = [1/√2, −1/√2]^T: Y = (1/√2) [ −13/5 + 2/5, −3/5 + 2/5, 2/5 + 2/5, 2/5 − 3/5, 12/5 − 3/5 ] = [ −11/(5√2), −1/(5√2), 4/(5√2), −1/(5√2), 9/(5√2) ].
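A quick numpy check of this projection, taking u2 from the eigenvectors given in the question and projecting onto it as the minimum-variance direction:

```python
import numpy as np

X = np.array([[-1, 1, 2, 2, 4],
              [ 1, 1, 1, 2, 2]], dtype=float)
Xc = X - X.mean(axis=1, keepdims=True)  # mean-centered data

u2 = np.array([1, -1]) / np.sqrt(2)     # eigenvector with the smallest eigenvalue
Y = u2 @ Xc                             # 1-D projection with minimum variance

# Each entry is k/(5*sqrt(2)); printing the numerators k:
print(Y * 5 * np.sqrt(2))               # [-11. -1. 4. -1. 9.]
```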
(c) To recover the dataset X̃ from Y:
- We map Y back to the original space with the retained eigenvector and add the mean back: X̃ = u2 Y + μ, where μ = [8/5, 7/5]^T is the mean vector. X̃ then has the same dimension (2 × 5) as X.
- X̃ is not strictly equal to X because PCA reduced the dimensionality by discarding the component along u1; the information in that direction is lost.
- To recover the exact X from Y, we would need the projections onto all of the eigenvectors: keeping both Y1 = u1^T X_centered and Y2 = u2^T X_centered, we can reconstruct X = u1 Y1 + u2 Y2 + μ exactly.
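The lossy-versus-exact reconstruction can be demonstrated with numpy, again using the orthonormal vectors u1, u2 given in the question:

```python
import numpy as np

X = np.array([[-1, 1, 2, 2, 4],
              [ 1, 1, 1, 2, 2]], dtype=float)
mu = X.mean(axis=1, keepdims=True)
Xc = X - mu

U = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # columns are u1, u2 from the question
Y2 = U[:, 1] @ Xc                             # keep only the u2 component

# Reconstruction from the single retained component (lossy):
X_tilde = np.outer(U[:, 1], Y2) + mu
print(np.allclose(X_tilde, X))                # False: the u1 direction was discarded

# Reconstruction from all components (exact):
Y_full = U.T @ Xc
X_exact = U @ Y_full + mu
print(np.allclose(X_exact, X))                # True: the full basis recovers X
```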
(d) In dual PCA, the matrix we need to compute is the n × n matrix (1/(n−1)) X_centered^T X_centered:
- This matrix plays the same role as the covariance matrix S but in the dual (sample) space; numerically it is the 5 × 5 matrix of inner products between the centered samples, divided by n − 1.
- To derive dual PCA from direct PCA using singular value decomposition (SVD), decompose X_centered = U Σ V^T.
- The columns of U are the eigenvectors of X_centered X_centered^T (direct PCA), the columns of V are the eigenvectors of X_centered^T X_centered (dual PCA), and the non-zero elements of Σ are the square roots of the shared non-zero eigenvalues of both matrices.
- For training-data projection, direct PCA computes U^T X_centered = U^T (U Σ V^T) = Σ V^T, so the projections can be obtained purely from the dual quantities Σ and V; the two sets of eigenvectors are related by u_i = (1/σ_i) X_centered v_i.
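These two claims — shared non-zero eigenvalues, and projections recoverable as Σ V^T — can be verified numerically for the dataset of this question:

```python
import numpy as np

X = np.array([[-1, 1, 2, 2, 4],
              [ 1, 1, 1, 2, 2]], dtype=float)
Xc = X - X.mean(axis=1, keepdims=True)
n = X.shape[1]

# Direct PCA: eigenvalues of the d x d covariance matrix.
S = Xc @ Xc.T / (n - 1)
evals_direct = np.sort(np.linalg.eigvalsh(S))[::-1]

# Dual PCA: eigenvalues of the n x n matrix (1/(n-1)) Xc^T Xc.
D = Xc.T @ Xc / (n - 1)
evals_dual = np.sort(np.linalg.eigvalsh(D))[::-1]

# The non-zero eigenvalues agree (here d = 2, so at most two are non-zero).
print(np.allclose(evals_direct, evals_dual[:2]))          # True

# Via the SVD Xc = U Sigma V^T, the direct-PCA projections U^T Xc
# equal Sigma V^T, i.e. they are computable from the dual quantities alone.
U, sing, Vt = np.linalg.svd(Xc, full_matrices=False)
print(np.allclose(U.T @ Xc, np.diag(sing) @ Vt))          # True
```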
(e) To retain at least 95% of the variance of the new dataset:
- Calculate the cumulative variance explained, i.e., the sum of the selected eigenvalues divided by the sum of all eigenvalues, and select the smallest number of principal components for which it reaches 95%.
- Total variance: 10 + 2 + 0.15 + 0.05 + 0.02 = 12.22.
- First component alone: 10/12.22 ≈ 81.8%, which is below 95%.
- First two components: (10 + 2)/12.22 = 12/12.22 ≈ 98.2%, which is at least 95%.
- Therefore, projecting onto the first two principal components (those with λ1 = 10 and λ2 = 2) retains at least 95% of the variance.
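The selection rule above in a few lines of numpy, using the eigenvalues stated in the question:

```python
import numpy as np

# Eigenvalues of the covariance matrix from the question (the remaining five are zero).
eigvals = np.array([10, 2, 0.15, 0.05, 0.02, 0, 0, 0, 0, 0])

ratios = np.cumsum(eigvals) / eigvals.sum()  # cumulative variance explained
k = int(np.argmax(ratios >= 0.95)) + 1       # smallest k reaching the 95% threshold

print(ratios[:2])  # approx [0.818, 0.982]
print(k)           # 2: use the first two principal components
```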
© 2023 AskSia.AI all rights reserved