Abstract
Data are currently being used, and reused, in ecological research at an unprecedented rate. To ensure appropriate reuse, however, we need to ask: "Are aggregated databases currently providing the right information to enable effective and unbiased reuse?" We investigate this question with a focus on designs that purposefully favor the selection of some sampling locations (upweighting their probability of selection). Such designs are common; examples include stratified designs and designs with uneven inclusion probabilities. We perform a simulation experiment by creating data sets with progressively more uneven inclusion probabilities and examine the resulting estimates of the average number of individuals per unit area (density). The effect of ignoring the survey design can be profound, with biases of up to 250% in density estimates when naive analytical methods are used. This density estimation bias is not reduced by adding more data. Fortunately, the estimation bias can be mitigated by using an appropriate estimator or an appropriate model that incorporates the design information. These remedies are available, however, only when essential information about the survey design is available: the sample location selection process (e.g., inclusion probabilities) and/or the covariates used in their specification. The results suggest that such information must be stored and served with the data to support meaningful inference and data reuse.
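The kind of simulation experiment described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the function name simulate, the covariate x, the linear density model, the weight form (0.5 + x) ** power, the use of Poisson sampling, and the choice of the Hájek (weighted-mean) estimator as one common design-based estimator. It is not expected to reproduce the article's figures, such as the 250% bias.

```python
import random


def simulate(power, n_sites=20000, expected_n=1000, seed=1):
    """Return (true mean density, naive estimate, Hajek estimate).

    All modelling choices here are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Hypothetical habitat covariate in [0, 1]; density rises with it.
    x = [rng.random() for _ in range(n_sites)]
    y = [max(0.0, 5.0 + 20.0 * xi + rng.gauss(0.0, 2.0)) for xi in x]
    true_mean = sum(y) / n_sites

    # Inclusion probabilities proportional to (0.5 + x) ** power.
    # power = 0 gives equal probabilities; larger values favor high-x sites.
    w = [(0.5 + xi) ** power for xi in x]
    total_w = sum(w)
    pi = [min(1.0, expected_n * wi / total_w) for wi in w]

    # Poisson sampling: each site enters independently with probability pi[i].
    sample = [i for i in range(n_sites) if rng.random() < pi[i]]

    # Naive estimator: plain sample mean, ignoring the design.
    naive = sum(y[i] for i in sample) / len(sample)

    # Hajek estimator: mean weighted by inverse inclusion probabilities.
    hajek = (sum(y[i] / pi[i] for i in sample)
             / sum(1.0 / pi[i] for i in sample))
    return true_mean, naive, hajek


if __name__ == "__main__":
    for power in (0, 1, 2, 3):
        truth, naive, hajek = simulate(power)
        print(f"power={power}  naive bias={(naive - truth) / truth:+.1%}  "
              f"Hajek bias={(hajek - truth) / truth:+.1%}")
```

In this sketch the naive bias comes from which sites are selected, not from how many, so enlarging the sample does not remove it. The weighted estimator can only be computed when the inclusion probabilities (or the covariate that determines them) are stored alongside the counts, which is the point the abstract makes about data reuse.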
Original language  English 

Article number  02360 
Journal  Ecological Applications 
Volume  31 
Issue number  6 
Number of pages  8 
ISSN  1051-0761 
DOI  
Status  Published  Sep 2021 
MoE publication type  A1 Journal article-refereed 
Fields of Science
 1181 Ecology, evolutionary biology
 111 Mathematics
 112 Statistics
Projects
 1 Finished

Multivariate Gaussian processes for hierarchical modelling of species distributions
Vanhatalo, J., Kaurila, K. & Numminen, S.
Suomen Akatemia Projektilaskutus
01/09/2018 → 31/08/2022
Project: Research project