AskSia

Question 1 (a) Have a look at the plot below. Three different Gestalt principle...
May 11, 2024
For question (a):
Proximity, Similarity, Continuity
Key Concept
Gestalt Principles in Data Visualization
Explanation
Proximity: the data points for each city are placed close to one another, indicating that they belong to the same series. Similarity: each city's line uses a consistent style, indicating that its points encode the same type of data (average temperature). Continuity: the unbroken lines guide the eye along the path of temperature change over time.
For question (b):
Variables to Aesthetics Mapping
Key Concept
Visual Encoding in Data Visualization
Explanation
The Date is mapped to the x-axis using a linear scale, the Average Temperature is mapped to the y-axis using a linear scale, and the City is mapped to line color/hue with a nominal scale.
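This mapping can be sketched in matplotlib. The city names, years, and temperatures below are made-up placeholders, since the original plot is not reproduced here; the point is only how each variable lands on an aesthetic:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Hypothetical data standing in for the plot described above:
# Date -> x-axis (linear), Average Temperature -> y-axis (linear),
# City -> line colour/hue (nominal).
years = [1900, 1925, 1950, 1975, 2000]
temps = {"City A": [8.1, 8.3, 8.6, 9.0, 9.5],
         "City B": [11.0, 11.1, 11.4, 11.9, 12.4]}

fig, ax = plt.subplots()
for city, series in temps.items():
    ax.plot(years, series, label=city)  # one hue per city (nominal scale)
ax.set_xlabel("Date")
ax.set_ylabel("Average Temperature (°C)")
ax.legend(title="City")
fig.savefig("temperature_trend.png")
```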
For question (c):
Storytelling with Data
Key Concept
Data Narratives and Chart Modification
Explanation
A possible story could be the trend of rising temperatures for both cities over time, highlighting the impact of industrialization or climate change. To tell this story, one could emphasize the upward trend by annotating key historical events, using color to highlight temperature increases, adjusting the y-axis to zoom in on changes, or smoothing the lines to focus on long-term trends.
[Figure: plots of f(t), g(t), f(t) * g(t), and h(t)]

A data scientist wants to apply g(t) to their time series to compute the derivative. However, they find that their data is too noisy and requires smoothing. They decide to use a frequency-based low-pass filter: transforming the data into the frequency domain, multiplying the coefficients by the filter weights, and then applying the inverse Fourier transform. Should they first convolve with g(t) and then apply the low-pass filter, or first apply the low-pass filter and then convolve with g(t)? Briefly explain your reasoning (max 3 sentences). [3 marks]

(d) Can you think of a way to apply the smoothing and compute the derivative at the same time using only a single operation? Briefly explain your reasoning (max 2 sentences). [3 marks]
Both the derivative filter g(t) and the low-pass filter are linear time-invariant operations: convolution in the time domain is multiplication in the frequency domain, and multiplication commutes. The order therefore does not change the final result; convolving with g(t) and then low-pass filtering yields exactly the same output as filtering first and then convolving. In practice one might still prefer to smooth first so that the intermediate signal is less noisy, but the end result is identical.
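A small numpy sketch of this commutativity (the signal and the cut-off frequency are arbitrary illustrations): differentiation is multiplication by iω in the frequency domain, the low-pass filter is another multiplication, and elementwise multiplications commute.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.cumsum(rng.normal(size=256))            # noisy time series f(t)
omega = 2 * np.pi * np.fft.fftfreq(f.size)     # angular frequencies

deriv = 1j * omega                             # derivative filter in the frequency domain
lowpass = (np.abs(omega) < 0.5).astype(float)  # ideal low-pass filter weights

F = np.fft.fft(f)
a = np.fft.ifft(lowpass * (deriv * F)).real    # differentiate, then smooth
b = np.fft.ifft(deriv * (lowpass * F)).real    # smooth, then differentiate

print(np.allclose(a, b))  # True: the two orders give identical results
```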
Yes: smoothing and differentiation can be combined into a single operation by designing one filter with both characteristics. In the frequency domain, multiply the low-pass filter weights by iω (differentiation is multiplication by iω, where ω is the frequency and i is the imaginary unit); equivalently, in the time domain, convolve once with the derivative of the smoothing kernel, since d/dt (f * h) = f * (dh/dt).
Question 3 We would like to use direct principal component analysis (PCA) or dual PCA to reduce the dimension of an n-sample, d-dimensional dataset X = [x1, · · · , xn] ∈ R^(d×n). We define

X = [ −1  1  2  2  4
       1  1  1  2  2 ]

(a) For direct PCA, what would be the covariance matrix S of X? Please show your derivation to compute this matrix numerically (including any intermediate matrices you calculate). [4 marks]

(b) The eigenvalues and eigenvectors of S are given as follows: λ1 = 10, u1 = [1/√2, 1/√2]^T and λ2 = 2, u2 = [1/√2, −1/√2]^T. Assume we use direct PCA to project X into Y such that the projected dataset Y has the minimum variance. Please work out in this case how to compute Y numerically. Please present your results using the fraction style (not decimal style). [4 marks]

(c) Now, please show your derivation regarding how to recover the dataset X̃ from Y such that it has the same dimension as the original X. Next, please explain (1) why X̃ is not strictly equal to X and (2) how we can recover the exact X from Y. Please present your results using the fraction style (not decimal style). [4 marks]

(d) In dual PCA, we need to compute a similar square matrix to S. What would be this matrix? Please show your derivation to compute this matrix numerically. Next, please prove, in terms of training-data projection, how dual PCA is derived from direct PCA using the concept of singular value decomposition. [4 marks]

(e) Now we want to use direct PCA to reduce the dimension of a new dataset of 10 dimensions (i.e., d = 10). Assume that the first five eigenvalues of the covariance matrix are respectively λ1 = 10, λ2 = 2, λ3 = 0.15, λ4 = 0.05 and λ5 = 0.02, and that the remaining eigenvalues are all zero. Please work out which principal components we should use to project the dataset such that afterwards we can retain at least 95% of the variance of this dataset. Please show your numerical derivation in detail. [4 marks]
(a) For direct PCA, the covariance matrix S of X is calculated as follows:
- First, compute the mean of each dimension (row) of X.
- Then, subtract the mean from each element of the corresponding row to obtain the mean-centered data matrix X_centered.
- Finally, calculate the covariance matrix using the formula S = (1/(n−1)) X_centered X_centered^T.

Given X = [ −1 1 2 2 4 ; 1 1 1 2 2 ], we calculate S:
1. Mean of each row: μ1 = (−1 + 1 + 2 + 2 + 4)/5 = 8/5 and μ2 = (1 + 1 + 1 + 2 + 2)/5 = 7/5.
2. Subtract the means to get X_centered = [ −13/5 −3/5 2/5 2/5 12/5 ; −2/5 −2/5 −2/5 3/5 3/5 ].
3. Compute S = (1/(5−1)) X_centered X_centered^T = (1/4) [ 66/5 14/5 ; 14/5 6/5 ] = [ 33/10 7/10 ; 7/10 3/10 ].
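The hand derivation above can be checked in numpy, whose `np.cov` uses the same 1/(n−1) normalisation:

```python
import numpy as np

X = np.array([[-1, 1, 2, 2, 4],
              [ 1, 1, 1, 2, 2]], dtype=float)

mu = X.mean(axis=1, keepdims=True)   # row means: [8/5, 7/5]
Xc = X - mu                          # mean-centered data matrix
S = Xc @ Xc.T / (X.shape[1] - 1)     # S = (1/(n-1)) Xc Xc^T

print(S)                             # [[3.3, 0.7], [0.7, 0.3]] = [[33/10, 7/10], [7/10, 3/10]]
print(np.allclose(S, np.cov(X)))     # True: matches numpy's covariance (ddof=1)
```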
(b) To compute Y numerically using direct PCA:
- Since we want the projection with the minimum variance, we project onto the eigenvector with the smallest eigenvalue, u2 (λ2 = 2).
- The projection is Y = u2^T X_centered, a 1 × 5 row vector.
- Numerically, with u2 = [1/√2, −1/√2]^T: Y = (1/√2) [ −13/5 + 2/5, −3/5 + 2/5, 2/5 + 2/5, 2/5 − 3/5, 12/5 − 3/5 ] = [ −11/(5√2), −1/(5√2), 4/(5√2), −1/(5√2), 9/(5√2) ].
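A quick numpy check of this projection, taking u2 from the eigenvectors given in the question and projecting onto it as the minimum-variance direction:

```python
import numpy as np

X = np.array([[-1, 1, 2, 2, 4],
              [ 1, 1, 1, 2, 2]], dtype=float)
Xc = X - X.mean(axis=1, keepdims=True)  # mean-centered data

u2 = np.array([1, -1]) / np.sqrt(2)     # eigenvector with the smallest eigenvalue
Y = u2 @ Xc                             # 1-D projection with minimum variance

# Each entry is k/(5*sqrt(2)); printing the numerators k:
print(Y * 5 * np.sqrt(2))               # [-11. -1. 4. -1. 9.]
```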
(c) To recover the dataset X̃ from Y:
- We map Y back to the original space with the retained eigenvector and add the mean back: X̃ = u2 Y + μ, where μ = [8/5, 7/5]^T is the mean vector. X̃ then has the same dimension (2 × 5) as X.
- X̃ is not strictly equal to X because PCA reduced the dimensionality by discarding the component along u1; the information in that direction is lost.
- To recover the exact X from Y, we would need the projections onto all of the eigenvectors: keeping both Y1 = u1^T X_centered and Y2 = u2^T X_centered, we can reconstruct X = u1 Y1 + u2 Y2 + μ exactly.
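The lossy-versus-exact reconstruction can be demonstrated with numpy, again using the orthonormal vectors u1, u2 given in the question:

```python
import numpy as np

X = np.array([[-1, 1, 2, 2, 4],
              [ 1, 1, 1, 2, 2]], dtype=float)
mu = X.mean(axis=1, keepdims=True)
Xc = X - mu

U = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # columns are u1, u2 from the question
Y2 = U[:, 1] @ Xc                             # keep only the u2 component

# Reconstruction from the single retained component (lossy):
X_tilde = np.outer(U[:, 1], Y2) + mu
print(np.allclose(X_tilde, X))                # False: the u1 direction was discarded

# Reconstruction from all components (exact):
Y_full = U.T @ Xc
X_exact = U @ Y_full + mu
print(np.allclose(X_exact, X))                # True: the full basis recovers X
```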
(d) In dual PCA, the matrix we need to compute is the n × n matrix (1/(n−1)) X_centered^T X_centered:
- This matrix plays the same role as the covariance matrix S but in the dual (sample) space; numerically it is the 5 × 5 matrix of inner products between the centered samples, divided by n − 1.
- To derive dual PCA from direct PCA using singular value decomposition (SVD), decompose X_centered = U Σ V^T.
- The columns of U are the eigenvectors of X_centered X_centered^T (direct PCA), the columns of V are the eigenvectors of X_centered^T X_centered (dual PCA), and the non-zero elements of Σ are the square roots of the shared non-zero eigenvalues of both matrices.
- For training-data projection, direct PCA computes U^T X_centered = U^T (U Σ V^T) = Σ V^T, so the projections can be obtained purely from the dual quantities Σ and V; the two sets of eigenvectors are related by u_i = (1/σ_i) X_centered v_i.
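These two claims — shared non-zero eigenvalues, and projections recoverable as Σ V^T — can be verified numerically for the dataset of this question:

```python
import numpy as np

X = np.array([[-1, 1, 2, 2, 4],
              [ 1, 1, 1, 2, 2]], dtype=float)
Xc = X - X.mean(axis=1, keepdims=True)
n = X.shape[1]

# Direct PCA: eigenvalues of the d x d covariance matrix.
S = Xc @ Xc.T / (n - 1)
evals_direct = np.sort(np.linalg.eigvalsh(S))[::-1]

# Dual PCA: eigenvalues of the n x n matrix (1/(n-1)) Xc^T Xc.
D = Xc.T @ Xc / (n - 1)
evals_dual = np.sort(np.linalg.eigvalsh(D))[::-1]

# The non-zero eigenvalues agree (here d = 2, so at most two are non-zero).
print(np.allclose(evals_direct, evals_dual[:2]))          # True

# Via the SVD Xc = U Sigma V^T, the direct-PCA projections U^T Xc
# equal Sigma V^T, i.e. they are computable from the dual quantities alone.
U, sing, Vt = np.linalg.svd(Xc, full_matrices=False)
print(np.allclose(U.T @ Xc, np.diag(sing) @ Vt))          # True
```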
(e) To retain at least 95% of the variance of the new dataset:
- Calculate the cumulative variance explained, i.e., the sum of the selected eigenvalues divided by the sum of all eigenvalues, and select the smallest number of principal components for which it reaches 95%.
- Total variance: 10 + 2 + 0.15 + 0.05 + 0.02 = 12.22.
- First component alone: 10/12.22 ≈ 81.8%, which is below 95%.
- First two components: (10 + 2)/12.22 = 12/12.22 ≈ 98.2%, which is at least 95%.
- Therefore, projecting onto the first two principal components (those with λ1 = 10 and λ2 = 2) retains at least 95% of the variance.
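The selection rule above in a few lines of numpy, using the eigenvalues stated in the question:

```python
import numpy as np

# Eigenvalues of the covariance matrix from the question (the remaining five are zero).
eigvals = np.array([10, 2, 0.15, 0.05, 0.02, 0, 0, 0, 0, 0])

ratios = np.cumsum(eigvals) / eigvals.sum()  # cumulative variance explained
k = int(np.argmax(ratios >= 0.95)) + 1       # smallest k reaching the 95% threshold

print(ratios[:2])  # approx [0.818, 0.982]
print(k)           # 2: use the first two principal components
```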
© 2023 AskSia.AI all rights reserved