class: center, middle, inverse, title-slide

# Basics of the Linear Model
## Chapter I - Simple linear regression
### Antoine Bichat - Émilie Lebarbier
### AgroParisTech

---

# Loading the data

```r
chenilles <- read.table("chenilles.txt", sep = "", header = TRUE)
```

--

```r
dim(chenilles)
```

```
[1] 33 12
```

--

```r
head(chenilles)
```

```
  Altitude Pente NbPins Hauteur Diametre Densite Orient HautMax NbStrat
1     1200    22      1     4.0     14.8     1.0    1.1     5.9     1.4
2     1342    28      8     4.4     18.0     1.5    1.5     6.4     1.7
3     1231    28      5     2.4      7.8     1.3    1.6     4.3     1.5
4     1254    28     18     3.0      9.2     2.3    1.7     6.9     2.3
5     1357    32      7     3.7     10.7     1.4    1.7     6.6     1.8
6     1250    27      1     4.4     14.8     1.0    1.7     5.8     1.3
  Melange NbNids  LogNids
1     1.4   2.37  0.86289
2     1.7   1.47  0.38526
3     1.7   1.13  0.12222
4     1.6   0.85 -0.16252
5     1.3   0.24 -1.42712
6     1.4   1.49  0.39878
```

---

# Descriptive analysis of `NbNids`, ...

.pull-left[

```r
range(chenilles$NbNids)
```

```
[1] 0.03 3.00
```

```r
mean(chenilles$NbNids)
```

```
[1] 0.8112121
```

```r
median(chenilles$NbNids)
```

```
[1] 0.67
```

```r
sd(chenilles$NbNids)
```

```
[1] 0.8062287
```

]

--

.pull-right[

```r
hist(chenilles$NbNids)
```

<img src="chap1_files/figure-html/hist_nids-1.png" width="504" style="display: block; margin: auto;" />

]

---

# ... of `LogNids`, ...

.pull-left[

```r
range(chenilles$LogNids)
```

```
[1] -3.50656  1.09861
```

```r
mean(chenilles$LogNids)
```

```
[1] -0.8132803
```

```r
median(chenilles$LogNids)
```

```
[1] -0.40048
```

```r
sd(chenilles$LogNids)
```

```
[1] 1.244941
```

]

--

.pull-right[

```r
hist(chenilles$LogNids)
```

<img src="chap1_files/figure-html/hist_lognids-1.png" width="504" style="display: block; margin: auto;" />

]

---

# ... and of `NbStrat`, but with the {tidyverse}

```r
library(tidyverse)
summarise(chenilles,
          min  = min(NbStrat),
          max  = max(NbStrat),
          mean = mean(NbStrat),
          med  = median(NbStrat),
          sd   = sd(NbStrat))
```

```
  min max     mean med        sd
1 1.1 2.9 1.981818   2 0.5664884
```

--

```r
ggplot(chenilles) +
  aes(x = NbStrat) +
  geom_histogram(fill = "grey80", col = "black", bins = 15)
```

<img src="chap1_files/figure-html/ggplot-1.png" width="504" style="display: block; margin: auto;" />

---

# Scatter plots

.pull-left[

```r
plot(chenilles$NbStrat, chenilles$NbNids)
```

<img src="chap1_files/figure-html/scaterplot_1-1.png" width="504" style="display: block; margin: auto;" />

]

--

.pull-right[

```r
ggplot(chenilles) +
  aes(x = NbStrat, y = LogNids) +
  geom_point()
```

<img src="chap1_files/figure-html/scaterplot_2-1.png" width="504" style="display: block; margin: auto;" />

]

---

# Simple linear regression

```r
reg <- lm(NbNids ~ NbStrat, data = chenilles)
reg
```

```

Call:
lm(formula = NbNids ~ NbStrat, data = chenilles)

Coefficients:
(Intercept)      NbStrat  
      2.605       -0.905  
```

---

# Diagnostic plots

```r
par(mfrow = c(2, 2))
plot(reg)
```

<img src="chap1_files/figure-html/lmsimpleplots-1.png" width="504" style="display: block; margin: auto;" />

---

# Log transformation

```r
reg_log <- lm(LogNids ~ NbStrat, data = chenilles)
par(mfrow = c(2, 2))
plot(reg_log)
```

<img src="chap1_files/figure-html/lmlog-1.png" width="504" style="display: block; margin: auto;" />

---

# Normality of the residuals

```r
res <- residuals(reg_log)
shapiro.test(res)      # Tests normality
```

```

    Shapiro-Wilk normality test

data:  res
W = 0.94118, p-value = 0.07353
```

```r
ks.test(res, "pnorm",  # Tests goodness of fit to a distribution
        mean = mean(res), sd = sd(res))
```

```
Warning in ks.test(res, "pnorm", mean = mean(res), sd = sd(res)): ties should
not be present for the Kolmogorov-Smirnov test
```

```

    One-sample Kolmogorov-Smirnov test

data:  res
D = 0.12694, p-value = 0.6623
alternative hypothesis: two-sided
```

---

# Analysis of variance table

```r
anova(reg_log)
```

```
Analysis of Variance Table

Response: LogNids
          Df Sum Sq Mean Sq F value    Pr(>F)    
NbStrat    1 17.499 17.4987    16.9 0.0002681 ***
Residuals 31 32.097  1.0354                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---

# Results of the analysis

```r
summary(reg_log)
```

```

Call:
lm(formula = LogNids ~ NbStrat, data = chenilles)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.0833 -0.8512  0.2844  0.9167  1.5247 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.7737     0.6537   2.713 0.010780 *  
NbStrat      -1.3054     0.3175  -4.111 0.000268 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.018 on 31 degrees of freedom
Multiple R-squared:  0.3528,    Adjusted R-squared:  0.3319 
F-statistic:  16.9 on 1 and 31 DF,  p-value: 0.0002681
```

---

# Extracting the key statistics

```r
S <- summary(reg_log)
S$coefficients
```

```
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept)  1.773743  0.6537458  2.713200 0.0107797371
NbStrat     -1.305379  0.3175324 -4.111009 0.0002680646
```

```r
S$r.squared
```

```
[1] 0.3528237
```

```r
S$adj.r.squared
```

```
[1] 0.3319471
```

```r
pf(S$fstatistic[1], S$fstatistic[2], S$fstatistic[3], lower.tail = FALSE)
```

```
       value 
0.0002680646 
```

---

# The same, but with {broom}

```r
library(broom)
tidy(reg_log)
```

```
# A tibble: 2 x 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)     1.77     0.654      2.71 0.0108  
2 NbStrat        -1.31     0.318     -4.11 0.000268
```

```r
glance(reg_log)
```

```
# A tibble: 1 x 11
  r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>   <dbl> <int>  <dbl> <dbl> <dbl>
1     0.353         0.332  1.02      16.9 2.68e-4     2  -46.4  98.7  103.
# … with 2 more variables: deviance <dbl>, df.residual <int>
```

---

class: center, middle, inverse

# Any questions?

.footnote[Slides created with the <b><a href="https://github.com/yihui/xaringan" target="_blank">xaringan</a></b> package.]
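
---

# Appendix: F test and t test in simple regression

The ANOVA table and the coefficient table report the same p-value for `NbStrat` (0.000268). With a single predictor this is expected: the global F statistic is the square of the slope's t statistic. The sketch below illustrates this on R's built-in `cars` data set, used here only as a stand-in (an assumption, since `chenilles.txt` is not bundled with the slides). The names `fit`, `S_fit`, `t_slope`, `p_slope`, `f_stat` and `p_f` are illustrative.

```r
# Stand-in data: the built-in `cars` data set (NOT the chenilles data)
fit   <- lm(dist ~ speed, data = cars)
S_fit <- summary(fit)

# t statistic and p-value of the slope
t_slope <- S_fit$coefficients["speed", "t value"]
p_slope <- S_fit$coefficients["speed", "Pr(>|t|)"]

# Global F statistic and its p-value, computed from the summary object
f_stat <- S_fit$fstatistic[["value"]]
p_f    <- pf(f_stat,
             S_fit$fstatistic[["numdf"]],
             S_fit$fstatistic[["dendf"]],
             lower.tail = FALSE)

# With one predictor, F should equal t^2 and the two p-values should agree
all.equal(t_slope^2, f_stat)
all.equal(p_slope, p_f)
```

On the slides' own output the same relation is visible: the slope's t value is -4.111, and its square is approximately 16.9, the F value.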