ITDS Final
Let's hurry up and graduate
2024-12-10
Missing Values
Explain how you would pre-process the data if you wanted to use linear classification/regression methods and the data contained only categorical/nominal attributes. What could we do with missing values in this case?
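One common option is sketched below under stated assumptions. Each nominal attribute is one-hot encoded (one 0/1 column per category), so a linear model does not read a spurious order into integer codes. A missing value is treated as an extra category of its own. The function name `one_hot_encode` and the `MISSING` token are hypothetical. The usual alternatives are imputing the most frequent category or dropping incomplete rows.

```python
def one_hot_encode(rows, missing_token="MISSING"):
    """One-hot encode rows of nominal values; None becomes its own category."""
    n_cols = len(rows[0])
    # Collect the sorted set of categories seen in each column (None -> missing_token)
    categories = []
    for c in range(n_cols):
        values = {missing_token if r[c] is None else r[c] for r in rows}
        categories.append(sorted(values))
    encoded = []
    for r in rows:
        vec = []
        for c, cats in enumerate(categories):
            value = missing_token if r[c] is None else r[c]
            # One indicator column per category of this attribute
            vec.extend(1 if value == cat else 0 for cat in cats)
        encoded.append(vec)
    return categories, encoded


rows = [("red", "small"), ("blue", None), (None, "large")]
cats, X = one_hot_encode(rows)
print(cats)  # [['MISSING', 'blue', 'red'], ['MISSING', 'large', 'small']]
print(X[1])  # [0, 1, 0, 1, 0, 0]
```

For an unregularized model with an intercept, one indicator column per attribute is usually dropped, because the full set of indicators of an attribute always sums to 1 and is therefore collinear with the intercept.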
Evaluation
Explain how internal and external evaluation of clusters work.
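The sketch below contrasts the two kinds of measure. Internal evaluation scores a clustering using only the data and the cluster assignments. Here the measure is the silhouette coefficient, which compares cohesion with separation. External evaluation compares the assignments with ground-truth class labels that were not used for clustering. Here the measure is purity. The two particular measures, the Euclidean distance, and the function names are assumptions. Other common choices are the Davies-Bouldin index (internal) and the adjusted Rand index (external).

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient (internal): uses only points and cluster labels.
    Assumes Euclidean distance and at least two clusters."""
    n = len(points)
    clusters = set(labels)

    def mean_dist(i, cluster):
        # Average distance from point i to the other members of `cluster`
        ds = [math.dist(points[i], points[j])
              for j in range(n) if labels[j] == cluster and j != i]
        return sum(ds) / len(ds)

    total = 0.0
    for i in range(n):
        if labels.count(labels[i]) == 1:
            continue  # singleton cluster: silhouette is 0 by convention
        a = mean_dist(i, labels[i])                                   # cohesion
        b = min(mean_dist(i, c) for c in clusters if c != labels[i])  # separation
        total += (b - a) / max(a, b)
    return total / n


def purity(labels, truth):
    """Purity (external): share of points in the majority true class of their cluster."""
    hits = 0
    for c in set(labels):
        members = [t for lab, t in zip(labels, truth) if lab == c]
        hits += max(members.count(t) for t in set(members))
    return hits / len(labels)


pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
truth = ["a", "a", "b", "b"]
print(silhouette(pts, [0, 0, 1, 1]))  # about 0.86: compact, well-separated clusters
print(silhouette(pts, [0, 1, 0, 1]))  # negative: points lie closer to the other cluster
print(purity([0, 0, 1, 1], truth))    # 1.0
print(purity([0, 1, 0, 1], truth))    # 0.5
```

Purity alone can be misleading. Putting every point in its own cluster gives a purity of 1.0, which is why it is usually reported together with the number of clusters or with another measure.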
Regularized LR
Write down the objective function for regularized linear regression. Explain for which values (high or low) of the regularization hyperparameter the resulting model will overfit or underfit.
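Assume a single feature with no intercept, which is a simplifying assumption. The ridge (L2-regularized) objective is then $\sum_i (y_i - \beta x_i)^2 + \lambda\beta^2$. Setting its derivative to zero gives the minimizer $\hat\beta = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$. The sketch below uses hypothetical function names and shows how a growing $\lambda$ shrinks the coefficient toward 0. With $\lambda = 0$ we get ordinary least squares, which can overfit when there are many features. A very large $\lambda$ forces $\hat\beta \approx 0$, and the model underfits.

```python
def ridge_objective(x, y, beta, lam):
    """Residual sum of squares plus the L2 penalty lam * beta^2 (one feature, no intercept)."""
    rss = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))
    return rss + lam * beta ** 2


def ridge_fit_1d(x, y, lam):
    """Minimizer of ridge_objective, from setting d/d(beta) = 0."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
for lam in (0.0, 14.0, 1e6):
    print(lam, ridge_fit_1d(x, y, lam))
# lam = 0   -> 2.0  (plain least squares, fits these points exactly)
# lam = 14  -> 1.0  (coefficient shrunk by half)
# lam = 1e6 -> ~0   (penalty dominates, the model underfits)
```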
Error
What is the relation between the prediction error on the test set and the model complexity?
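The bias-variance decomposition of the expected squared test error at a point $x$ is a standard way to make this relation precise. It assumes $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and the expectation is taken over training sets and noise:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2 + \operatorname{Var}\big(\hat{f}(x)\big) + \sigma^2
$$

As model complexity increases, the squared-bias term typically falls and the variance term rises. The test error therefore tends to follow a U-shape, with underfitting at low complexity and overfitting at high complexity, while the training error usually keeps decreasing. The $\sigma^2$ term is irreducible.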
Binary
What is a possible way to classify color images of animals into three different classes using binary classification methods? How would you represent the data? How would you do cross-validation in this case (i.e. how would you select the folds)?
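Below is a minimal sketch of one possible setup. All names are hypothetical. Each image is assumed to be an H x W x 3 grid of RGB values, with all images resized to the same size, and is flattened into one feature vector. The three-class problem becomes three one-vs-rest binary problems. At prediction time you pick the class whose binary classifier is most confident. One-vs-one, with three pairwise classifiers and a majority vote, is an equally valid alternative. The folds are stratified so that each fold keeps roughly the class proportions of the whole dataset. Here indices are dealt out in their given order. In practice you would shuffle within each class first.

```python
from collections import defaultdict


def flatten_image(image):
    """Turn an H x W x 3 nested list of RGB values into one feature vector."""
    return [channel for row in image for pixel in row for channel in pixel]


def one_vs_rest_labels(labels, positive):
    """Binary targets for the classifier 'positive class vs. all other classes'."""
    return [1 if y == positive else 0 for y in labels]


def stratified_folds(labels, k):
    """Assign sample indices to k folds so each fold keeps roughly the class proportions."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for cls in sorted(by_class):
        # Deal the indices of each class round-robin across the folds
        for pos, idx in enumerate(by_class[cls]):
            folds[pos % k].append(idx)
    return [sorted(f) for f in folds]


labels = ["cat", "dog", "bird"] * 3
print(flatten_image([[[1, 2, 3], [4, 5, 6]]]))  # [1, 2, 3, 4, 5, 6]
print(one_vs_rest_labels(labels, "cat"))        # [1, 0, 0, 1, 0, 0, 1, 0, 0]
print(stratified_folds(labels, 3))              # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Each of the three binary classifiers is trained and evaluated on the same folds, so that the combined three-class prediction can be scored on each held-out fold.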
2024-12-27
Metric
Verify whether $\mathbf{d(x,y)} = \max(|x - y|, 1)$, a distance measure for two binary strings $x$ and $y$ of equal length, satisfies the properties of a metric.
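The metric axioms can be checked by brute force over all binary strings of a small length. The sketch assumes that $|x - y|$ denotes the Hamming distance, i.e. the number of differing positions. That reading, the string length 3, and the helper names are assumptions. A single counterexample is enough to show that a property fails. Passing on length-3 strings is only evidence, not a proof, for the remaining properties.

```python
from itertools import product


def hamming(x, y):
    """Number of positions where the equal-length strings differ."""
    return sum(a != b for a, b in zip(x, y))


def d(x, y):
    # Assumption: |x - y| is read as the Hamming distance
    return max(hamming(x, y), 1)


def check_metric(dist, length):
    """Test the four metric axioms on every binary string of the given length."""
    s = ["".join(bits) for bits in product("01", repeat=length)]
    return {
        "non_negativity": all(dist(x, y) >= 0 for x in s for y in s),
        "identity": all((dist(x, y) == 0) == (x == y) for x in s for y in s),
        "symmetry": all(dist(x, y) == dist(y, x) for x in s for y in s),
        "triangle": all(dist(x, z) <= dist(x, y) + dist(y, z)
                        for x in s for y in s for z in s),
    }


print(check_metric(d, 3))
# {'non_negativity': True, 'identity': False, 'symmetry': True, 'triangle': True}
print(d("010", "010"))  # 1, so d(x, x) != 0
```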
Linkage
Provide a scenario or a dataset where complete linkage clustering would be less effective and justify your reasoning.
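One scenario that can be checked by hand is an elongated, chain-like cluster next to a small compact one. Complete linkage merges clusters by their largest pairwise distance. It therefore prefers compact groups of similar diameter and may cut the chain. Single linkage merges by the smallest pairwise distance and follows the chain. The naive agglomerative routine below, its 1-D toy data, and its tie-breaking rule (the first closest pair found wins) are illustrative assumptions.

```python
def agglomerative(points, n_clusters, linkage):
    """Naive agglomerative clustering of 1-D points.
    linkage='single' uses the min pairwise distance, 'complete' the max."""
    agg = {"single": min, "complete": max}[linkage]
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = agg(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# A chain 0..5 (spacing 1) next to a compact pair at 6.5 and 6.8
data = [0, 1, 2, 3, 4, 5, 6.5, 6.8]
print(agglomerative(data, 2, "single"))    # [[0, 1, 2, 3, 4, 5], [6.5, 6.8]]
print(agglomerative(data, 2, "complete"))  # [[0, 1, 2, 3], [4, 5, 6.5, 6.8]]
```

Complete linkage splits the chain because keeping it whole would require a cluster of diameter 5. Attaching the chain's right end, {4, 5}, to the compact pair creates a cluster of diameter only 2.8.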
Equation
Given the following equation: $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p}\beta_j^2$, what do the components of this equation represent? Discuss the impact of using very small and very large values of $\lambda$.
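Assume centered data with no intercept, so that $\hat{y}_i = \mathbf{x}_i^\top\boldsymbol\beta$. In matrix form, the first term is the residual sum of squares (data fit), and the second is the L2 (ridge) penalty on the coefficient sizes, weighted by $\lambda$. Setting the gradient to zero gives a closed form that makes both extremes of $\lambda$ visible:

$$
\begin{aligned}
J(\boldsymbol\beta) &= \lVert \mathbf{y} - X\boldsymbol\beta \rVert_2^2 + \lambda \lVert \boldsymbol\beta \rVert_2^2,\\
\nabla_{\boldsymbol\beta} J &= -2X^\top(\mathbf{y} - X\boldsymbol\beta) + 2\lambda\boldsymbol\beta = \mathbf{0}
\;\Longrightarrow\;
\hat{\boldsymbol\beta} = (X^\top X + \lambda I)^{-1} X^\top \mathbf{y}.
\end{aligned}
$$

As $\lambda \to 0$, $\hat{\boldsymbol\beta}$ approaches the ordinary least-squares solution (when $X^\top X$ is invertible), with low bias but possibly high variance, so the model can overfit. As $\lambda \to \infty$, $\hat{\boldsymbol\beta} \to \mathbf{0}$, and the model becomes too simple and underfits.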
Multi-Class
How can logistic regression be modified to perform multi-class classification?
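There are two standard options. One-vs-rest trains one binary logistic regression per class and predicts the class with the highest score. Multinomial logistic regression replaces the sigmoid with a softmax over one linear score per class. The sketch below covers softmax prediction only. The weights and biases are made-up values for illustration, not a trained model. Fitting them, typically by minimizing the cross-entropy loss, is omitted.

```python
import math


def softmax(scores):
    # Subtracting the max score avoids overflow and leaves the result unchanged
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x, weights, biases):
    """Multinomial logistic regression: one weight vector and one bias per class."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(scores)


def predict(x, weights, biases):
    """Index of the class with the highest predicted probability."""
    probs = predict_proba(x, weights, biases)
    return max(range(len(probs)), key=probs.__getitem__)


weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # hypothetical: 3 classes x 2 features
biases = [0.0, 0.0, 0.0]
probs = predict_proba([1.0, 2.0], weights, biases)  # class scores are [1, 2, -3]
print(probs)                                 # three probabilities summing to 1
print(predict([1.0, 2.0], weights, biases))  # 1
```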
k-Fold
Explain the concept of k-fold cross-validation and describe the steps involved in performing it.
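The steps can be sketched as follows. Split the data into k roughly equal folds. For each fold, train on the other k - 1 folds and evaluate on the held-out fold. Average the k scores. Contiguous folds without shuffling, mean squared error, and the trivial mean-predictor model are simplifying assumptions for illustration.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous, nearly equal folds (shuffle first in practice)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(x, y, k, fit, predict):
    """Train on k-1 folds, test on the held-out fold, and average the k test MSEs."""
    errors = []
    for test_idx in k_fold_indices(len(y), k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        mse = sum((y[i] - predict(model, x[i])) ** 2 for i in test_idx) / len(test_idx)
        errors.append(mse)
    return sum(errors) / k


def fit_mean(xs, ys):
    # Trivial model: always predict the mean of the training targets
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


print(k_fold_indices(10, 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(cross_validate([1, 2, 3, 4], [1, 2, 3, 4], 2, fit_mean, predict_mean))  # 4.25
```

Every sample is used for testing exactly once and for training k - 1 times. The averaged score estimates how the model generalizes to unseen data.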