Subject: Multivariate Data Analysis

Scientific Area:

Mathematics

Workload:

80 Hours

Number of ECTS:

7.5 ECTS

Language:

Portuguese

Overall objectives:

1 - The aim of this curricular unit (CU) is to provide a broad overview of the most common statistical methodologies for multivariate data analysis, in order to raise awareness of the importance of a multivariate approach whenever a large amount of statistical data is available.
2 - To use a statistical package (SPSS) to perform multivariate statistical analyses of the databases supplied in class.
3 - To provide students with the statistical techniques needed to deal with databases built from research projects in very diverse areas of science. The theoretical work is intended to make students aware of the capacity they have acquired to carry out research and to interpret, model, and communicate the results of multivariate analysis of statistical data effectively.

Syllabus:

1 - Two-way Analysis of Variance. Fixed and random effects models. Decomposition of the sum of squares, tests for model parameters, unbalanced samples. Testing model adequacy. Choosing a variance-stabilizing transformation. Multiple comparisons in the two-way analysis of variance. Repeated measures model and nested classification model (mixed model).
2 - Multiple Linear Regression. The general linear model; underlying hypotheses; parameter estimation. Variable selection techniques; residual analysis. Model fit. Tests and confidence intervals for the model coefficients. Qualitative independent variables. Standardized regression coefficients. Precision of the regression equation. Partial correlation coefficients and partial F-tests. Choosing the best regression model. Outliers and influential points.
3 - Logistic Regression. Some basic notions of Epidemiology. The simple model and the multiple model. Reference to polychotomous logistic regression. Odds ratio and relative risk; logit. Model fit, tests and confidence intervals for the model parameters. Interactions in the logistic regression model. Influence diagnostics.
4 - Principal Components Analysis. Calculation of the principal components; principal component scores. The various forms of X'X and the interpretation of the principal components in terms of correlations. Standardized principal components; explained variance; how many principal components to keep; stopping rules. Rotation and interpretation of the principal components.
5 - Cluster Analysis. The probabilistic model. Hierarchical methods and non-hierarchical methods. Proximity measures and distance measures. Cophenetic correlation and validity of the clusters. Choosing the number of clusters. Dendrogram.

Literature/Sources:

Montgomery D. C. , 1997 , Design and Analysis of Experiments , Wiley
Boniface D. R. , 1995 , Experimental Design and Statistical Methods for Behavioral and Social Research , Chapman and Hall
Cobb G. W. , 1998 , Introduction to Design and Analysis of Experiments , Springer
Draper N. R., Smith H. , 1998 , Applied Regression Analysis , Wiley
Johnson D. E. , 1998 , Applied Multivariate Methods for Data Analysts , Duxbury Press & Brooks/Cole
Mendenhall W., Sincich T. , 2003 , A Second Course in Statistics: Regression Analysis , Prentice Hall
Hosmer D. W., Lemeshow S. , 1989 , Applied Logistic Regression , Wiley
Mardia K. V., Kent J. T., Bibby J. M. , 1997 , Multivariate Analysis , Academic Press
Wackerly D. D., Mendenhall W., Scheaffer R. L. , 2002 , Mathematical Statistics with Applications , Duxbury
Field A. , 2011 , Discovering Statistics Using SPSS , Cengage Learning
Marôco J. , 2014 , Análise Estatística com o SPSS Statistics , ReportNumber
Pallant J. , 2013 , SPSS Survival Manual: A Step by Step Guide to Data Analysis Using IBM SPSS , McGraw-Hill
Pestana M. H., Gageiro J. N. , 2014 , Análise de Dados para as Ciências Sociais - A Complementaridade do SPSS , Edições Sílabo
Zar J. H. , 1999 , Biostatistical Analysis , Prentice Hall International
Scheffé H. , 1959 , The Analysis of Variance , John Wiley
Casella G., Berger R. L. , 1990 , Statistical Inference , Duxbury
Rohatgi V. K. , 1976 , An Introduction to Probability Theory and Mathematical Statistics , John Wiley and Sons
Kline P. , 2000 , An Easy Guide to Factor Analysis , Routledge
Everitt B. S., Landau S., Leese M. , 2001 , Cluster Analysis , Arnold
Glantz S. A., Slinker B. K. , 2001 , Primer of Applied Regression and Analysis of Variance , McGraw-Hill
Mead R. , 1992 , The Design of Experiments: Statistical Principles for Practical Applications , Cambridge University Press
Cochran W. G., Cox G. M. , 1957 , Experimental Designs , John Wiley & Sons
Armitage P., Berry G. , 1994 , Statistical Methods in Medical Research , Blackwell Science
Marôco J. , 2010 , Análise de Equações Estruturais: Fundamentos Teóricos, Software & Aplicações , ReportNumber

Assessment methods and criteria:

Classification Type: Quantitative (0-20)

Evaluation Methodology:
Lectures. Theoretical-practical classes devoted to solving real data problems using SPSS. Student presentations. One individual theoretical assignment, with oral presentation and discussion, which aims to show students that they are able to study new subjects on their own and to prepare, present, and discuss a communication; its weight in the final grade is 50%. One individual oral exam, a theoretical-practical assessment consisting of the application of one or more multivariate data analysis techniques to a database provided to the student; its weight in the final grade is 50%.

Subject Leader:

Sílvio Filipe Velosa