Thirty years of progeny from Chao’s inequality: Estimating and comparing richness with incidence data and incomplete sampling
Publisher: Institut d'Estadística de Catalunya
Rights access: Open Access
In the context of capture-recapture studies, Chao (1987) derived an inequality among capture frequency counts to obtain a lower bound for the size of a population based on individuals’ capture/non-capture records for multiple capture occasions. The inequality has been applied to obtain a non-parametric lower bound of species richness of an assemblage based on species incidence (detection/non-detection) data in multiple sampling units. The inequality implies that the number of undetected species can be inferred from the species incidence frequency counts of the uniques (species detected in only one sampling unit) and duplicates (species detected in exactly two sampling units). In their pioneering paper, Colwell and Coddington (1994) gave the name “Chao2” to the estimator for the resulting species richness. (The “Chao1” estimator refers to a similar type of estimator based on species abundance data.) Since then, the Chao2 estimator has been applied to many research fields and led to fruitful generalizations. Here, we first review Chao’s inequality under various models and discuss some related statistical inference questions: (1) Under what conditions is the Chao2 estimator an unbiased point estimator? (2) How many additional sampling units are needed to detect any arbitrary proportion (including 100%) of the Chao2 estimate of asymptotic species richness? (3) Can other incidence frequency counts be used to obtain similar lower bounds? We then show how the Chao2 estimator can also be used to guide a non-asymptotic analysis in which species richness estimators can be compared for equally-large or equally-complete samples via sample-size-based and coverage-based rarefaction and extrapolation. We also review the generalization of Chao’s inequality to estimate species richness under other sampling-without-replacement schemes (e.g. a set of quadrats, each surveyed only once), to obtain a lower bound of undetected species shared between two or multiple assemblages, and to allow inferences about undetected phylogenetic richness (the total length of undetected branches of a phylogenetic tree connecting all species), with associated rarefaction and extrapolation. A small empirical dataset for Australian birds is used for illustration, using online software SpadeR, iNEXT, and PhD.
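The abstract describes the Chao2 estimator only in words, so the following is a minimal sketch of how the uniques and duplicates enter the calculation. It assumes the commonly cited form of the estimator, S_obs + ((T - 1)/T) * Q1^2 / (2 * Q2), with the bias-corrected variant S_obs + ((T - 1)/T) * Q1 * (Q1 - 1) / 2 when there are no duplicates; neither formula appears in the text above. The function name `chao2` and the toy matrix are hypothetical, not taken from the paper or from SpadeR/iNEXT.

```python
from collections import Counter


def chao2(incidence_matrix):
    """Chao2 lower bound of species richness (illustrative sketch).

    incidence_matrix: list of rows, one per species, each row a list of
    0/1 detections across the same T sampling units.
    """
    T = len(incidence_matrix[0])  # number of sampling units
    # Incidence frequency: number of units in which each species was detected.
    freqs = [sum(1 for x in row if x) for row in incidence_matrix]
    s_obs = sum(1 for f in freqs if f > 0)  # species detected at least once
    counts = Counter(freqs)
    q1, q2 = counts[1], counts[2]  # uniques and duplicates
    k = (T - 1) / T
    if q2 > 0:
        undetected = k * q1 ** 2 / (2 * q2)
    else:
        # Assumed fallback when no duplicates are observed.
        undetected = k * q1 * (q1 - 1) / 2
    return s_obs + undetected


# Toy data (made up): 5 candidate species, 4 sampling units.
toy = [
    [1, 0, 0, 0],  # unique
    [0, 1, 0, 0],  # unique
    [1, 1, 0, 0],  # duplicate
    [1, 1, 1, 0],  # detected in three units
    [0, 0, 0, 0],  # never detected, so not counted in s_obs
]
print(chao2(toy))  # 5.5
```

In the toy data, four species are observed, two of them uniques and one a duplicate, so the estimated number of undetected species is (3/4) * 2^2 / (2 * 1) = 1.5 and the lower bound is 5.5.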
Citation: Chao, A.; Colwell, R. K. Thirty years of progeny from Chao’s inequality: Estimating and comparing richness with incidence data and incomplete sampling. "SORT", 21 June 2017, vol. 1, p. 3-54.