We present a new non-parametric estimator of the conditional density of the kernel type. It is based on an efficient transformation of the data by quantile transform. By use of the copula representation, it turns out to have a remarkable product form. We study its asymptotic properties and compare its bias and variance to competitors based on nonparametric regression.
arXiv:0709.3192v3 [stat.ME] 12 Jun 2008
A quantile-copula approach to conditional density estimation.
Olivier P. Faugeras
L.S.T.A, Université Paris 6
175, rue du Chevaleret, 75013 Paris, France
Tel: +(33) 1 44 27 85 62
Fax: +(33) 1 44 27 33 42
Abstract
We present a new non-parametric estimator of the conditional density of the kernel type. It is based on an efficient transformation of the data by quantile transform. By use of the copula representation, it turns out to have a remarkable product form. We study its asymptotic properties and compare its bias and variance to competitors based on nonparametric regression. A comparative numerical simulation is provided.
Key words: conditional density, kernel estimation, copula, quantile transform, nonparametric regression
1991 MSC: 62G007, 62M20, 62M10
1 Introduction
1.1 Motivation
Let $((X_i, Y_i);\ i = 1, \ldots, n)$ be an independent identically distributed sample from real-valued random variables $(X, Y)$ sitting on a given probability space. For predicting the response $Y$ of the input variable $X$ at a given location $x$, it is of great interest to estimate not only the conditional mean or regression function $E(Y|X = x)$, but the full conditional density $f(y|x)$. Indeed, estimating the conditional density is much more informative, since it allows one not only to recalculate the conditional expected value $E(Y|X)$ and the conditional variance from the density, but also to provide the general shape of the conditional density. This is especially important for multi-modal or skewed densities, which often arise from nonlinear or non-Gaussian phenomena, where the expected value might be nowhere near a mode, i.e. the most likely value to appear. Moreover, for situations in which confidence intervals are preferred to point estimates, the estimated conditional density is an object of obvious interest.
Email address: olivier.faugeras@gmail.com (Olivier P. Faugeras).
Preprint submitted to Elsevier, November 1, 2018
1.2 Estimation by kernel smoothing
A natural approach to estimate the conditional density $f(y|x)$ of $Y$ given $X = x$ would be to exploit the identity
\[
f(y|x) = \frac{f_{XY}(x, y)}{f_X(x)}, \tag{1}
\]
where $f_{XY}$ and $f_X$ denote the joint density of $(X, Y)$ and the density of $X$, respectively.
By introducing Parzen-Rosenblatt kernel estimators of these densities, namely
\[
\hat f_{n,XY}(x, y) := \frac{1}{n} \sum_{i=1}^{n} K'_{h'}(X_i - x)\, K_h(Y_i - y),
\qquad
\hat f_{n,X}(x) := \frac{1}{n} \sum_{i=1}^{n} K'_{h'}(X_i - x),
\]
where $K_h(\cdot) = (1/h)\,K(\cdot/h)$ and $K'_{h'}(\cdot) = (1/h')\,K'(\cdot/h')$ are (rescaled) kernels with their associated sequences of bandwidths $h = h_n$ and $h' = h'_n$ going to zero as $n \to \infty$, one can construct the quotient
\[
\hat f^{R}_{n}(y|x) := \frac{\hat f_{n,XY}(x, y)}{\hat f_{n,X}(x)}
\]
and obtain an estimator of the conditional density. Such an estimator was first studied by Rosenblatt [26], and more recently by Hyndman et al. [17], who slightly improved on Rosenblatt's kernel based estimator.
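The quotient estimator above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the paper: both kernels $K$ and $K'$ are taken to be Gaussian, the bandwidths are fixed by hand, and the simulated data set and the names `gaussian_kernel`, `conditional_density_ratio`, `h_x` (for $h'$) and `h_y` (for $h$) are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here for both K and K' (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def conditional_density_ratio(x, y, X, Y, h_x, h_y):
    """Double-kernel estimate of f(y|x): joint estimate divided by marginal estimate.

    h_x plays the role of h' (bandwidth in x), h_y the role of h (bandwidth in y).
    """
    kx = gaussian_kernel((X - x) / h_x) / h_x   # K'_{h'}(X_i - x)
    ky = gaussian_kernel((Y - y) / h_y) / h_y   # K_h(Y_i - y)
    f_xy = np.mean(kx * ky)                     # estimate of f_XY(x, y)
    f_x = np.mean(kx)                           # estimate of f_X(x)
    return f_xy / f_x

# Illustrative data (an assumption): Y = sin(2*pi*X) + Gaussian noise.
rng = np.random.default_rng(0)
n = 500
X = rng.uniform(0.0, 1.0, size=n)
Y = np.sin(2.0 * np.pi * X) + 0.3 * rng.normal(size=n)

# Evaluate the estimate of f(.|x = 0.25) on a grid and check its total mass.
y_grid = np.linspace(-4.0, 4.0, 801)
f_hat = np.array(
    [conditional_density_ratio(0.25, y, X, Y, h_x=0.1, h_y=0.2) for y in y_grid]
)
dy = y_grid[1] - y_grid[0]
total_mass = float(np.sum(f_hat) * dy)          # Riemann sum over y
```

With a non-negative kernel that integrates to one, the estimate is a weighted mixture of kernels centred at the $Y_i$, so it is non-negative and its Riemann sum over a wide enough grid is close to 1.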
1.3 Estimation by regression techniques
As pointed out by numerous authors, see e.g. Fan and Yao [7], chapter 6, this approach is equivalent to the one arising from considering this conditional density estimation problem in a regression framework.
Indeed, let $F(y|x)$ be the cumulative conditional distribution function of $Y$ given $X = x$. It stems from the fact that
\[
E\left( 1_{|Y - y| \le h} \,\middle|\, X = x \right) = F(y + h|x) - F(y - h|x) \approx 2h\, f(y|x)
\]
as $h \to 0$, that, if one replaces the expectation in the above expression by its empirical counterpart, one can apply the usual local averaging methods and perform a regression estimation on the synthetic data $\left( (1/2h)\, 1_{|Y_i - y| \le h};\ i = 1, \ldots, n \right)$.
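The approximation $F(y+h|x) - F(y-h|x) \approx 2h\, f(y|x)$ used above can be made explicit by a Taylor expansion. The following is only a sketch, under the additional assumption (not stated in the text) that $t \mapsto f(t|x)$ is twice continuously differentiable in a neighbourhood of $y$; the notation $\partial_y$ for derivatives in the first argument is introduced here for illustration.

```latex
\begin{align*}
F(y+h|x) - F(y-h|x)
  &= \int_{y-h}^{y+h} f(t|x)\,dt \\
  &= \int_{-h}^{h} \Bigl[ f(y|x) + u\,\partial_y f(y|x)
       + \tfrac{u^2}{2}\,\partial_y^2 f(y|x) + o(u^2) \Bigr]\,du \\
  &= 2h\,f(y|x) + \frac{h^3}{3}\,\partial_y^2 f(y|x) + o(h^3).
\end{align*}
```

The odd term in $u$ integrates to zero over the symmetric interval, so after division by $2h$ the synthetic response has conditional mean $f(y|x) + (h^2/6)\,\partial_y^2 f(y|x) + o(h^2)$ under this assumption.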
By a Bochner type theorem, one can even replace the transformed data by its smoothed version
\[
Y'_i := K_h(Y_i - y) := \frac{1}{h}\, K\!\left( \frac{Y_i - y}{h} \right).
\]
In particular, the popular Nadaraya-Watson regression estimator
\[
\hat f^{NW}_{n}(y|x) := \frac{\sum_{i=1}^{n} Y'_i\, K'_{h'}(X_i - x)}{\sum_{i=1}^{n} K'_{h'}(X_i - x)}
\]
reduces to the same estimator of the conditional density of the double kernel type as before:
\[
\hat f^{NW}_{n}(y|x) := \frac{\sum_{i=1}^{n} K_h(Y_i - y)\, K'_{h'}(X_i - x)}{\sum_{i=1}^{n} K'_{h'}(X_i - x)} = \hat f^{R}_{n}(y|x).
\]
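The identity between the regression form and the quotient form can be checked numerically. The sketch below assumes Gaussian kernels for both $K$ and $K'$ and uses simulated data; the names `gauss` and `nadaraya_watson`, the evaluation point and the bandwidths are hypothetical choices made for illustration.

```python
import numpy as np

def gauss(u):
    """Standard normal density, standing in for both K and K' (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(x, X, responses, h_x):
    """Nadaraya-Watson estimate at x: kernel-weighted mean of the responses."""
    w = gauss((X - x) / h_x) / h_x              # K'_{h'}(X_i - x)
    return np.sum(w * responses) / np.sum(w)

# Illustrative data (an assumption): Y = X^2 + Gaussian noise.
rng = np.random.default_rng(1)
X = rng.normal(size=300)
Y = X**2 + rng.normal(size=300)
x0, y0, h_x, h_y = 0.5, 1.0, 0.3, 0.4

# Regression view: regress the smoothed responses Y'_i = K_h(Y_i - y0) on X.
Y_prime = gauss((Y - y0) / h_y) / h_y
f_nw = nadaraya_watson(x0, X, Y_prime, h_x)

# Density view: kernel estimate of the joint density over that of the marginal.
kx = gauss((X - x0) / h_x) / h_x
f_ratio = np.mean(kx * Y_prime) / np.mean(kx)
```

The two quantities agree up to floating-point rounding, because the factors $1/n$ in the numerator and denominator of the quotient cancel.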
Taking advantage of this regression formulation, Fan, Yao and Tong [8] proposed a conditional density estimator which generalizes the kernel one by use of local polynomial techniques. In particular, it makes it possible to tackle the bias issues of kernel smoothing. However, unlike the former, it is no longer guaranteed to take positive values nor to integrate to 1 with respect to $y$.
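The loss of positivity can be seen on a toy example. The sketch below uses a local linear (degree one) fit of the smoothed responses $K_h(Y_i - y)$ on $X_i$, as one simple instance of local polynomial techniques; it is not claimed to reproduce the exact estimator of [8]. The Gaussian kernels, the deterministic design $Y_i = X_i$, the evaluation point at the boundary and the function names are all assumptions chosen to make the effect visible.

```python
import numpy as np

def gauss(u):
    """Standard normal density, standing in for both K and K' (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def local_linear_weights(x, X, h_x):
    """Normalized equivalent weights of a local linear fit at x (they sum to 1)."""
    d = X - x
    k = gauss(d / h_x) / h_x
    s1 = np.sum(k * d)
    s2 = np.sum(k * d**2)
    w = k * (s2 - d * s1)       # can be negative where d * s1 > s2
    return w / np.sum(w)

def local_linear_density(x, y, X, Y, h_x, h_y):
    """Local linear fit (intercept) of the smoothed responses K_h(Y_i - y) at x."""
    y_prime = gauss((Y - y) / h_y) / h_y
    return float(np.sum(local_linear_weights(x, X, h_x) * y_prime))

# Toy design (an assumption): equally spaced X on [0, 1], Y_i = X_i,
# and an evaluation point x = 0 at the left boundary of the design.
X = np.linspace(0.0, 1.0, 50)
Y = X.copy()
w = local_linear_weights(0.0, X, h_x=0.2)
f_at_half = local_linear_density(0.0, 0.5, X, Y, h_x=0.2, h_y=0.02)
```

In this design the observations far from $x = 0$ receive negative weights, and since they are the only ones with $Y_i$ close to $0.5$, the fitted value at $y = 0.5$ is negative. The Nadaraya-Watson weights, by contrast, are always non-negative when the kernel is.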
With these issues in mind, Hyndman and Yao [18] built on local polynomial techniques and suggested two improved methods, the first one based on locally fitting a log-linear model and the second one on constrained local polynomial modeling. An overview can be found in Fan and Yao [7] (chapters 6 and 10).
Very recently, Györfi and Kohler [15] studied a partitioning type estimate and studied its properties in to