Functional central limit theorems for single-stage sampling designs

H. Boistard, H.P. Lopuhaä and A. Ruiz-Gazen,

to appear in the Annals of Statistics.

For joint model-based and design-based inference, we establish functional central limit theorems for the Horvitz-Thompson empirical process and the Hájek empirical process, centered by their finite population mean as well as by their super-population mean, in a survey sampling framework. The results apply to generic sampling designs and essentially only require conditions on higher-order correlations. We apply our main results to a Hadamard-differentiable statistical functional and illustrate its limiting behavior by means of a computer simulation.
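Both empirical processes are built from an estimator of the finite population distribution function, evaluated at every threshold t. As a minimal sketch, assuming a sample with known first-order inclusion probabilities pi_k and, for the Horvitz-Thompson (HT) version, a known population size N, the two estimators can be written as below. The function names and the toy data are hypothetical; the empirical processes studied in the paper are suitably normalized differences between such estimators and the finite population (or super-population) distribution function, indexed by t.

```python
from typing import Sequence


def horvitz_thompson_cdf(t: float, y: Sequence[float], pi: Sequence[float], pop_size: int) -> float:
    """HT estimate of the finite population cdf F_N(t) = (1/N) * #{k in U : y_k <= t}."""
    # Each sampled unit with y_k <= t contributes its design weight 1 / pi_k.
    return sum(1.0 / p for yk, p in zip(y, pi) if yk <= t) / pop_size


def hajek_cdf(t: float, y: Sequence[float], pi: Sequence[float]) -> float:
    """Hajek estimate: the HT numerator divided by the HT estimate of N."""
    n_hat = sum(1.0 / p for p in pi)  # HT estimate of the population size
    return sum(1.0 / p for yk, p in zip(y, pi) if yk <= t) / n_hat


# Hypothetical sample of three units drawn from a population of N = 10 units.
y = [2.0, 5.0, 7.0]
pi = [0.5, 0.25, 0.5]
print(horvitz_thompson_cdf(5.0, y, pi, pop_size=10))  # (2 + 4) / 10
print(hajek_cdf(5.0, y, pi))                          # (2 + 4) / 8
```

The Hájek version does not need N and always reaches 1 at the largest sampled value, which is one reason it is often preferred in practice.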

Doubly robust inference for the distribution function in the presence of missing survey data

H. Boistard, G. Chauvet and D. Haziza.

Article published in the Scandinavian Journal of Statistics, vol. 43, n. 3, p. 683–699, 2016.

Item nonresponse in surveys occurs when some, but not all, variables are missing. Unadjusted estimators tend to exhibit some bias, called nonresponse bias, when respondents differ from nonrespondents with respect to the study variables. In this paper, we focus on item nonresponse, which is usually treated by some form of single imputation. We examine the properties of doubly robust imputation procedures, that is, procedures leading to an estimator that remains consistent if either the outcome variable or the nonresponse mechanism is adequately modeled. We establish the double robustness property of the imputed estimator of the finite population distribution function under random hot-deck imputation within classes. We also discuss the links between our approach and that of Chambers and Dunstan (1986). The results of a simulation study support our findings.
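Random hot-deck imputation within classes replaces each missing value by the value of a respondent drawn at random from the same imputation class. The sketch below illustrates only this mechanism, followed by a weighted estimate of the distribution function computed on the imputed data. The use of `None` for missing values, the function names, the toy classes and the weights are assumptions made for illustration; the sketch does not reproduce the paper's estimator or its variance results, nor the Chambers and Dunstan (1986) approach.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def hot_deck_within_classes(y: Sequence[Optional[float]], classes: Sequence[str],
                            rng: random.Random) -> List[float]:
    """Replace each missing value (None) by a randomly drawn donor from the same class."""
    donors: Dict[str, List[float]] = defaultdict(list)
    for yk, c in zip(y, classes):
        if yk is not None:
            donors[c].append(yk)
    imputed: List[float] = []
    for yk, c in zip(y, classes):
        if yk is not None:
            imputed.append(yk)
        elif donors[c]:
            imputed.append(rng.choice(donors[c]))  # donor drawn with replacement
        else:
            raise ValueError(f"class {c!r} has no respondent to act as donor")
    return imputed


def imputed_cdf(t: float, y_imp: Sequence[float], w: Sequence[float]) -> float:
    """Weighted (Hajek-type) estimate of the distribution function at t from imputed data."""
    return sum(wk for yk, wk in zip(y_imp, w) if yk <= t) / sum(w)


rng = random.Random(1)
y = [1.0, None, 3.0, None, 10.0]       # None marks item nonresponse
classes = ["a", "a", "a", "b", "b"]   # hypothetical imputation classes
y_imp = hot_deck_within_classes(y, classes, rng)
print(y_imp, imputed_cdf(3.0, y_imp, [1.0] * 5))
```

Because donors are drawn only within a class, the imputed values reproduce the respondents' distribution class by class, which is what makes the choice of classes central to the bias properties.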

Robust Dickey-Fuller tests based on ranks for time series with additive outliers

V.A. Reisen, C. Lévy-Leduc, M. Bourguignon and H. Boistard,

to appear in Metrika.

In this paper, the unit root tests proposed by Dickey and Fuller (DF) and their rank counterpart suggested by Breitung and Gouriéroux (1997) (BG) are investigated analytically in the presence of additive outlier (AO) contamination. The results show that the limiting distribution of the former test depends on the outliers, while that of the latter does not. The finite-sample size properties of these tests are also investigated in different scenarios involving contaminated unit root processes. In the empirical study, the alternative DF rank test suggested by Granger and Hallman (1991) (GH) is also considered. Fotopoulos and Ahn (2003) investigated these unit root rank tests analytically and empirically and compared them with the DF test, but only for outlier-free processes. The results provided in this paper thus complement these previous studies in the context of time series with additive outliers. Like the DF and GH unit root tests, the BG test is sensitive to AO contamination, but less severely. In practical situations where additive outliers are suspected, the general conclusion is that the DF and GH unit root tests should be avoided, whereas the BG approach can still be used.
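To make the comparison concrete, the sketch below computes the Dickey-Fuller statistic in its coefficient form n(rho_hat - 1), from an OLS regression of the first differences on the lagged level with an intercept, both on the raw series and on its ranks, in the spirit of the GH rank version. This is an illustrative assumption, not the exact statistics of the paper: the BG statistic is not implemented, no critical values are given (the DF limiting distribution is nonstandard), and the outlier positions and size are hypothetical.

```python
import random
from typing import List, Sequence


def df_coefficient_stat(z: Sequence[float]) -> float:
    """m * (rho_hat - 1) from the OLS fit of dz_t = a + (rho - 1) * z_{t-1} + e_t (m = n - 1)."""
    lag = list(z[:-1])
    dz = [z[t] - z[t - 1] for t in range(1, len(z))]
    m = len(lag)
    mean_lag = sum(lag) / m
    mean_dz = sum(dz) / m
    sxy = sum((a - mean_lag) * (b - mean_dz) for a, b in zip(lag, dz))
    sxx = sum((a - mean_lag) ** 2 for a in lag)
    return m * sxy / sxx


def ranks(z: Sequence[float]) -> List[float]:
    """Ranks 1..n of the observations (ties are not expected for continuous data)."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    r = [0.0] * len(z)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r


def rank_df_stat(z: Sequence[float]) -> float:
    """The same DF regression applied to the ranks of the series."""
    return df_coefficient_stat(ranks(z))


rng = random.Random(0)
walk = [0.0]
for _ in range(199):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))   # Gaussian random walk (unit root)
contaminated = list(walk)
for t in (50, 120):            # hypothetical outlier positions
    contaminated[t] += 25.0    # additive outliers of size 25
print("DF  :", df_coefficient_stat(walk), df_coefficient_stat(contaminated))
print("rank:", rank_df_stat(walk), rank_df_stat(contaminated))
```

The rank version depends on the data only through their ordering, so it is unchanged by any strictly increasing transformation of the series; an additive outlier can move an observation by at most n - 1 rank positions, whereas its effect on the raw regression is unbounded.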

Approximation of rejective sampling inclusion probabilities and application to high order correlations

H. Boistard, H.P. Lopuhaä and A. Ruiz-Gazen.

Article published in Electronic Journal of Statistics, vol. 6, p. 1967–1983, 2012.

This paper is devoted to rejective sampling. We provide an expansion of joint inclusion probabilities of any order in terms of the inclusion probabilities of order one, extending previous results by Hájek (1964) and Hájek (1981) and making the remainder term more precise. Following Hájek (1981), the proof is based on Edgeworth expansions. The main result is applied to derive bounds on higher order correlations, which are needed for the consistency and asymptotic normality of several complex estimators.
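Rejective sampling can be described as Poisson sampling with working probabilities p_k, conditioned on the sample having a fixed size n. The sketch below draws such samples by repeated Poisson draws, estimates first- and second-order inclusion probabilities by Monte Carlo, and compares the latter with Hájek's classical approximation pi_ij ≈ pi_i pi_j [1 - (1 - pi_i)(1 - pi_j)/d], where d = sum_k pi_k (1 - pi_k). The working probabilities and function names are hypothetical, and the paper's refined higher-order expansion is not implemented here.

```python
import random
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def rejective_sample(p: Sequence[float], n: int, rng: random.Random) -> List[int]:
    """Draw Poisson samples with working probabilities p until one has exactly n units."""
    while True:
        s = [k for k, pk in enumerate(p) if rng.random() < pk]
        if len(s) == n:
            return s


def monte_carlo_inclusion(p: Sequence[float], n: int, reps: int, rng: random.Random):
    """Monte Carlo estimates of first- and second-order inclusion probabilities."""
    first = [0] * len(p)
    second: Dict[Tuple[int, int], int] = {}
    for _ in range(reps):
        s = rejective_sample(p, n, rng)
        for k in s:
            first[k] += 1
        for pair in combinations(s, 2):   # s is sorted, so each pair satisfies i < j
            second[pair] = second.get(pair, 0) + 1
    pi = [c / reps for c in first]
    pi2 = {pair: c / reps for pair, c in second.items()}
    return pi, pi2


def hajek_pi2(pi: Sequence[float], i: int, j: int) -> float:
    """Hajek's approximation of pi_ij in terms of first-order inclusion probabilities."""
    d = sum(q * (1.0 - q) for q in pi)
    return pi[i] * pi[j] * (1.0 - (1.0 - pi[i]) * (1.0 - pi[j]) / d)


rng = random.Random(2024)
p = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]   # hypothetical working probabilities
pi, pi2 = monte_carlo_inclusion(p, n=3, reps=20000, rng=rng)
print(pi2.get((0, 1), 0.0), hajek_pi2(pi, 0, 1))
```

Hájek's approximation is asymptotic as d grows, so with a population this small the agreement is only rough; the remainder terms studied in the paper quantify exactly this gap.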

Large sample behavior of some well-known robust estimators under long-range dependence

C. Lévy-Leduc, H. Boistard, E. Moulines, M. S. Taqqu and V. A. Reisen.

Article published in Statistics, vol. 45, n. 1, p. 59–71, 2011.

The paper concerns robust location and scale estimators under long-range dependence, focusing on the Hodges-Lehmann location estimator, the Shamos-Bickel scale estimator and the Rousseeuw-Croux scale estimator. The large sample properties of these estimators are reviewed. The paper includes computer simulations to examine how well the estimators perform at finite sample sizes.
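All three estimators are functions of pairwise quantities: the Hodges-Lehmann estimator is the median of the pairwise averages (X_i + X_j)/2, the Shamos-Bickel estimator is a multiple of the median of |X_i - X_j|, and the Rousseeuw-Croux estimator Q_n is a multiple of the k-th smallest |X_i - X_j| with k = C(h, 2) and h = floor(n/2) + 1, all over pairs i < j. The naive O(n^2) sketch below assumes the pairs i < j convention and the usual Gaussian consistency constants (about 1.0483 and 2.2219) without small-sample corrections; it does not use the faster O(n log n) algorithms available for Q_n.

```python
from itertools import combinations
from statistics import median
from typing import Sequence

SHAMOS_CONST = 1.0483  # approx. 1 / (sqrt(2) * Phi^{-1}(3/4)), consistency at the Gaussian
QN_CONST = 2.2219      # Rousseeuw-Croux asymptotic consistency constant at the Gaussian


def hodges_lehmann(x: Sequence[float]) -> float:
    """Median of the pairwise averages (x_i + x_j) / 2 over i < j."""
    return median([(a + b) / 2.0 for a, b in combinations(x, 2)])


def shamos_bickel(x: Sequence[float]) -> float:
    """Scaled median of the pairwise distances |x_i - x_j| over i < j."""
    return SHAMOS_CONST * median([abs(a - b) for a, b in combinations(x, 2)])


def rousseeuw_croux_qn(x: Sequence[float]) -> float:
    """Scaled k-th order statistic of the pairwise distances, k = C(h, 2), h = n // 2 + 1."""
    n = len(x)
    if n < 2:
        raise ValueError("Q_n needs at least two observations")
    h = n // 2 + 1
    k = h * (h - 1) // 2
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    return QN_CONST * diffs[k - 1]


data = [1.0, 2.0, 3.0]
print(hodges_lehmann(data), shamos_bickel(data), rousseeuw_croux_qn(data))
```

Because each estimator is a quantile of a pairwise kernel, its asymptotics can be derived from those of the corresponding U-process, which is how the long-range dependence results are obtained.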

Robust estimation of the scale and of the autocovariance function of Gaussian short and long-range dependent processes

C. Lévy-Leduc, H. Boistard, E. Moulines, M. S. Taqqu and V. A. Reisen.

Article published in Journal of Time Series Analysis, vol. 32, n. 2, p. 135-156, 2011.

A desirable property of an autocovariance estimator is robustness to the presence of additive outliers. It is well known that the sample autocovariance, being based on moments, does not have this property. Hence, an autocovariance estimator that is robust to additive outliers can be very useful for time-series modeling. In this paper, the asymptotic properties of the robust scale and autocovariance estimators proposed by Rousseeuw and Croux (1993) and Ma and Genton (2000) are established for Gaussian processes with either short-range or long-range dependence. In the short-range dependence setting, this robust estimator is shown to be asymptotically normal at the rate \sqrt{n}, where n is the number of observations. An explicit expression of the asymptotic variance is also given and compared with the asymptotic variance of the classical autocovariance estimator. In the long-range dependence setting, the limiting distribution displays the same behavior as that of the classical autocovariance estimator: a Gaussian limit with rate \sqrt{n} when the Hurst parameter H is less than 3/4, and a non-Gaussian limit (belonging to the second Wiener chaos) with a rate depending on H when H\in (3/4, 1). Some Monte Carlo experiments are presented to illustrate our claims, and the Nile River data are analyzed as an application. The theoretical results and the empirical evidence strongly suggest using the robust estimators as an alternative for estimating the dependence structure of Gaussian processes.
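The Ma and Genton (2000) construction rests on the identity Cov(U, V) = [Var(U + V) - Var(U - V)]/4, with the variances replaced by a squared robust scale. As a sketch, assuming the Q_n scale of Rousseeuw and Croux with its asymptotic Gaussian constant and no small-sample correction, the robust autocovariance at lag h uses u = (X_1, ..., X_{n-h}) and v = (X_{h+1}, ..., X_n). The function names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Sequence

QN_CONST = 2.2219  # asymptotic consistency constant of Q_n at the Gaussian


def qn_scale(x: Sequence[float]) -> float:
    """Rousseeuw-Croux Q_n: scaled k-th smallest |x_i - x_j|, k = C(h, 2), h = n // 2 + 1."""
    n = len(x)
    if n < 2:
        raise ValueError("Q_n needs at least two observations")
    h = n // 2 + 1
    k = h * (h - 1) // 2
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    return QN_CONST * diffs[k - 1]


def robust_autocov(x: Sequence[float], lag: int) -> float:
    """Ma-Genton type estimate: (Q(u + v)^2 - Q(u - v)^2) / 4 with u, v the lagged pairs."""
    u = x[: len(x) - lag]
    v = x[lag:]
    q_plus = qn_scale([a + b for a, b in zip(u, v)])
    q_minus = qn_scale([a - b for a, b in zip(u, v)])
    return 0.25 * (q_plus ** 2 - q_minus ** 2)


x = [0.3, -1.2, 0.8, 2.5, -0.4, 1.1, 0.2, -0.7]   # hypothetical observations
gamma0 = robust_autocov(x, 0)
print([robust_autocov(x, h) / gamma0 for h in range(3)])  # robust autocorrelations
```

At lag 0 the difference u - v vanishes and the estimate reduces to Q_n(x)^2, a robust variance, so dividing by it gives a robust autocorrelation.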

Asymptotic properties of U-processes under long-range dependence

C. Lévy-Leduc, H. Boistard, E. Moulines, M. S. Taqqu and V. A. Reisen.

Article published in Annals of Statistics, vol. 39, n. 3, p. 1399-1426, 2011.

Let (X_i)_{i\geq1} be a stationary mean-zero Gaussian process with covariances \rho(k) = E(X_1X_{k+1}) satisfying:
\rho(0)=1 and \rho(k)=k^{-D}L(k), where D is in (0,1) and L is slowly varying at infinity. Consider the U-process \{U_n(r), r \in I\} defined as
U_n(r)=\frac{1}{n(n-1)}\sum_{1\leq i\neq j \leq n}\mathbb{1}_{\{G(X_i,X_j)\leq r\}},
where I is an interval included in \mathbb{R} and G is a symmetric function. In this paper, we provide central and non-central limit theorems for U_n. They are used to derive, in the long-range dependence setting, new properties of several well-known estimators, such as the Hodges-Lehmann robust location estimator, the Wilcoxon signed-rank statistic, the sample correlation integral and an associated robust scale estimator. These robust estimators are shown to have the same asymptotic distribution as the classical location and scale estimators. The limiting distributions are expressed through multiple Wiener-Itô integrals.
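As an illustration of the setting, the sketch below simulates unit-variance fractional Gaussian noise with Hurst parameter H in (1/2, 1), whose autocovariance satisfies rho(k) ~ H(2H - 1) k^{2H-2}, so that D = 2 - 2H, and evaluates U_n(r) for the kernel G(x, y) = (x + y)/2, whose median-type inverse at 1/2 is the Hodges-Lehmann estimator. The exact Cholesky simulation (O(n^3), fine for small n), the kernel choice, the sample size and the function names are assumptions for illustration only.

```python
import math
import random
from typing import Callable, List, Sequence


def fgn_autocov(k: int, hurst: float) -> float:
    """Autocovariance at lag k of unit-variance fractional Gaussian noise."""
    k = abs(k)
    return 0.5 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst) + abs(k - 1) ** (2 * hurst))


def simulate_fgn(n: int, hurst: float, rng: random.Random) -> List[float]:
    """Exact simulation via the Cholesky factor of the n x n covariance matrix."""
    cov = [[fgn_autocov(i - j, hurst) for j in range(n)] for i in range(n)]
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


def u_process(x: Sequence[float], g: Callable[[float, float], float], r: float) -> float:
    """U_n(r): proportion of ordered pairs i != j with g(x_i, x_j) <= r."""
    n = len(x)
    count = sum(1 for i in range(n) for j in range(n) if i != j and g(x[i], x[j]) <= r)
    return count / (n * (n - 1))


rng = random.Random(7)
x = simulate_fgn(120, hurst=0.8, rng=rng)   # long-range dependent, D = 2 - 2H = 0.4
hl_kernel = lambda a, b: (a + b) / 2.0      # Hodges-Lehmann kernel
for r in (-0.5, 0.0, 0.5):
    print(r, u_process(x, hl_kernel, r))
```

For each fixed sample, r -> U_n(r) is a distribution function, which is why quantile-type estimators such as the Hodges-Lehmann estimator inherit limit theorems from those of the U-process.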

Central limit theorem for multiple integrals with respect to the empirical process

Joint work with Eustasio del Barrio. Article published in Statistics and Probability Letters, vol. 79(2), p. 188-195, January 2009.

In this article, we give weak convergence results for multiple integrals with respect to the empirical process. We consider objects of the type
J_{n,m}(h)=\int' h(x_1, \dots, x_m)\,d\mathbb{G}_n(x_1)\cdots d\mathbb{G}_n(x_m),
where h is a symmetric real-valued square-integrable function of m variables, X_1, \dots, X_n is an i.i.d. sample with distribution P, and \mathbb{P}_n=\frac{1}{n}\sum_{i=1}^n\delta_{X_i} and \mathbb{G}_n=\sqrt{n}(\mathbb{P}_n-P) are, respectively, the associated empirical measure and empirical process. The symbol \int' denotes the integral in which the integration over the diagonal is omitted. We include the case of kernels that are non-degenerate with respect to the underlying distribution. Our results are related to earlier results on U-statistics. We introduce a stochastic integral with respect to the Brownian bridge, which allows us to express the limit in a unified way in the degenerate and non-degenerate cases. Using the multiple integral with respect to the empirical process has an advantage over using U-statistics: the central limit theorem we obtain is simpler, since it does not involve the degeneracy of the kernel, and the limit is expressed precisely.
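For m = 2 the off-diagonal integral can be computed directly. Expanding \mathbb{G}_n \otimes \mathbb{G}_n = n(\mathbb{P}_n - P)\otimes(\mathbb{P}_n - P) and assuming P has no atoms (so the diagonal only matters for the \mathbb{P}_n \otimes \mathbb{P}_n term) gives J_{n,2}(h) = n [ n^{-2} \sum_{i\neq j} h(X_i, X_j) - 2 n^{-1} \sum_i h_1(X_i) + \theta ], where h_1(x) = \int h(x, y)\,dP(y) and \theta = \int\int h\,dP\,dP. The sketch below implements this identity; it is a derivation under the stated assumption, not code from the article, and the caller must supply h_1 and \theta.

```python
import random
from typing import Callable, Sequence


def multiple_integral_order2(x: Sequence[float], h: Callable[[float, float], float],
                             h1: Callable[[float], float], theta: float) -> float:
    """J_{n,2}(h) for an atomless P, given h_1(x) = int h(x, y) dP(y) and theta = int int h dP dP."""
    n = len(x)
    off_diag = sum(h(x[i], x[j]) for i in range(n) for j in range(n) if i != j)
    linear = sum(h1(xi) for xi in x)
    return n * (off_diag / n ** 2 - 2.0 * linear / n + theta)


# Degenerate example: h(x, y) = x * y under P = N(0, 1), so h_1 = 0 and theta = 0.
rng = random.Random(3)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
print(multiple_integral_order2(sample, lambda a, b: a * b, lambda a: 0.0, 0.0))
```

In this degenerate example J_{n,2}(h) = (n^{-1/2} \sum_i X_i)^2 - n^{-1} \sum_i X_i^2, which converges in distribution to Z^2 - 1 with Z standard normal, a double Wiener-Itô integral rather than a Gaussian limit.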

PhD Thesis: asymptotic efficiency of tests related to the Wasserstein statistic

PhD thesis supervised by Eustasio del Barrio and Fabrice Gamboa, defended on 16 July 2007 before a committee composed of Professors Jean-Marc Azaïs, Bernard Bercu, Eustasio del Barrio, Fabrice Gamboa and Carlos Matrán.
This thesis is composed of three main parts. The first part studies some asymptotic properties of multiple integrals with respect to the empirical process. The second part is devoted to the asymptotic efficiency of the Wasserstein test. The equivalence of the Wasserstein statistic with a double integral with respect to the empirical process allows us to apply the results of the first part. A simulation study complements the analysis of the asymptotic power. The third part deals with large deviations for L-statistics. A large deviations principle is obtained using the topology of the Wasserstein distance on the space of measures, under conditions on the extremes.
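In one dimension the L2-Wasserstein distance has a closed form in terms of quantile functions, W_2^2(F, G) = \int_0^1 (F^{-1}(t) - G^{-1}(t))^2 dt; for two empirical distributions with the same number of atoms, the optimal coupling simply matches order statistics. The sketch below computes this quantity. It only illustrates the distance underlying the Wasserstein statistic and is not the test statistic studied in the thesis; the function name is hypothetical.

```python
import math
from typing import Sequence


def wasserstein2_equal_size(x: Sequence[float], y: Sequence[float]) -> float:
    """W_2 between two empirical distributions with n atoms each: match sorted values."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    xs, ys = sorted(x), sorted(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))


print(wasserstein2_equal_size([0.0, 1.0], [2.0, 3.0]))   # a pure shift by 2
```

Because the distance is an integral of squared quantile differences, statistics built on it are smooth functionals of the empirical quantile process, which links them to L-statistics.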

Large deviations for L-statistics

Published in Statistics and Decisions, 25(2), p. 89-125.

The purpose of this article is to establish a functional Large Deviations Principle (LDP) for L-statistics under conditions on the extremes. The method is based on Sanov's theorem and the usual tools of large deviations theory. We first prove an LDP under a fairly strong condition on the extremes. We provide a full treatment of the case of the uniform distribution and an example in which the rate function can be calculated very precisely. We then obtain an LDP under weaker conditions on the extremes. The case of the exponential distribution, which does not satisfy the previous integrability conditions, is treated by another method: we give a functional LDP based on the Gärtner-Ellis theorem. Finally, we extend our study to normalized L-statistics under strong conditions on the extremes.
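An L-statistic is a weighted average of order statistics. As a sketch, assuming the common score-function form T_n = n^{-1} \sum_i J(i/(n+1)) X_{(i)} (the normalization used in the article may differ), the code below computes such a statistic for a user-supplied score J. The function names and the data are hypothetical, and nothing here computes large deviation rates.

```python
from typing import Callable, Sequence


def l_statistic(x: Sequence[float], score: Callable[[float], float]) -> float:
    """T_n = (1/n) * sum_i score(i / (n + 1)) * x_(i), with x_(1) <= ... <= x_(n)."""
    n = len(x)
    xs = sorted(x)
    return sum(score(i / (n + 1)) * xi for i, xi in enumerate(xs, start=1)) / n


def trimmed_score(alpha: float) -> Callable[[float], float]:
    """Score 1 / (1 - 2 alpha) on (alpha, 1 - alpha), giving a trimmed-mean-type L-statistic."""
    def score(u: float) -> float:
        return 1.0 / (1.0 - 2.0 * alpha) if alpha < u < 1.0 - alpha else 0.0
    return score


sample = [0.4, 1.3, 0.2, 2.9, 0.7, 5.1, 0.9, 1.8, 0.1]   # hypothetical data
print(l_statistic(sample, lambda u: 1.0))          # constant score: the sample mean
print(l_statistic(sample, trimmed_score(0.1)))     # extremes receive zero weight
```

Scores that vanish near 0 and 1 make the statistic insensitive to the extremes, which is why conditions on the extremes (or on the tails of the distribution) govern the large deviation behavior.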