Subject: Multivariate Data Analysis

Scientific Area:

Mathematics

80 Hours

Number of ECTS:

7.5 ECTS

Language:

Portuguese

# Overall objectives:

1 - To provide an extended overview of the most common statistical methodologies for multivariate data analysis, and to raise awareness of the importance of a multivariate approach whenever a large amount of statistical data is available.
2 - To use a statistical package (SPSS) to perform multivariate statistical analyses on the databases supplied in class.
3 - To provide students with the statistical techniques needed to handle databases built from research projects in very diverse areas of science. The theoretical assignment is intended to make students aware of their acquired capacity to carry out research and to interpret, model and communicate the results of a multivariate analysis of statistical data effectively.

# Syllabus:

1 - Two-way Analysis of Variance. Fixed and random effects models. Decomposition of the sum of squares, tests for model parameters, unbalanced samples. Testing model adequacy. Choosing a variance-stabilizing transformation. Multiple comparisons in the two-way analysis of variance. Repeated measures model and nested classification model (mixed model).
2 - Multiple Linear Regression. The general linear model; underlying hypotheses; parameter estimation. Variable selection techniques; residual analysis. Model fit. Tests and confidence intervals for the model coefficients. Qualitative independent variables. Standardized regression coefficients. Precision of the regression equation. Partial correlation coefficients and partial F-tests. Choosing the best regression model. Outliers and influential points.
3 - Logistic Regression. Some basic notions of Epidemiology. The simple model and the multiple model. Reference to polychotomous logistic regression. Odds ratio and relative risk; logit. Model fit, tests and confidence intervals for the model parameters. Interactions in the logistic regression model. Influence diagnostics.
4 - Principal Components Analysis. Calculation of the principal components; principal component scores. The various forms of X'X and the interpretation of the principal components in terms of correlations. Standardized principal components; explained variance; how many principal components to keep; stopping rules. Rotation and interpretation of the principal components.
5 - Cluster Analysis. The probabilistic model. Hierarchical methods and non-hierarchical methods. Proximity measures and distance measures. Cophenetic correlation and validity of the clusters. Choosing the number of clusters. Dendrogram.
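
As an illustration of the quantities named in item 3 (odds ratio, relative risk, logit), here is a minimal Python sketch. The course itself uses SPSS; the function names and the 2x2 counts below are hypothetical, chosen only to show how the measures relate to one another.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b: cases and non-cases among the exposed
    c, d: cases and non-cases among the unexposed
    """
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """Risk among the exposed divided by risk among the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))

# Hypothetical counts, for illustration only (not course data).
a, b, c, d = 30, 70, 10, 90

print("odds ratio:", odds_ratio(a, b, c, d))
print("relative risk:", relative_risk(a, b, c, d))

# The difference of the two group logits equals the log of the odds ratio,
# which is why a logistic regression coefficient for a binary predictor
# is read as a log odds ratio.
logit_diff = logit(a / (a + b)) - logit(c / (c + d))
print("logit difference:", logit_diff)
print("log odds ratio:", math.log(odds_ratio(a, b, c, d)))
```

With these counts the odds ratio is larger than the relative risk, which is the usual pattern when the outcome is not rare.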

# Literature/Sources:

# Assessment methods and criteria:

Classification Type: Quantitative (0-20)

Evaluation Methodology:
Lectures. Theoretical-practical classes in which real-data problems are solved using SPSS. Student presentations. Assessment has two components:
1 - An individual theoretical assignment, with oral presentation and discussion. It aims to show students that they are able to study new subjects on their own, prepare a communication, present it and discuss it. Weight in final grade: 50%.
2 - An individual oral exam: a theoretical-practical assessment consisting of the application of one or more multivariate data analysis techniques to a database provided to the student. Weight in final grade: 50%.