Abstract: Species distribution models relate the geographic occurrence pattern of a species to environmental features and are used for a variety of scientific and management purposes. One source of data for building species distribution models is citizen science, in which volunteers report locations where they observed (or did not observe) sets of species. Since volunteers have variable levels of expertise, citizen science data may contain both false positives and false negatives in the location labels (present vs. absent) they provide, but many common modeling approaches for this task do not address these sources of noise explicitly. In this paper, we propose to formulate the species distribution modeling task as a classification problem with class-conditional noise. Our approach builds on other applications of class-conditional noise modeling to crowdsourced data, but we focus on leveraging features of the noise processes that are distinct from the class features. We describe the conditions under which the parameters of our proposed model are identifiable and apply it to simulated data and data from the eBird citizen science project.
Source: Species Distribution Modeling of Citizen Science Data as a Classification Problem with Class-conditional Noise