
A biodiversity simulation framework
We developed a simulation framework modelling biodiversity loss to optimize and validate conservation policies (in this context, decisions about data gathering and area protection across a landscape) using an RL algorithm. We implemented a spatially explicit individual-based simulation to assess future biodiversity changes based on natural processes of mortality, replacement and dispersal. Our framework also incorporates anthropogenic processes such as habitat modification, selective removal of a species, rapid climate change and existing conservation efforts. The simulation can include thousands of species and millions of individuals and track population sizes and species distributions and how they are affected by anthropogenic activity and climate change (for a detailed description of the model and its parameters see Supplementary Methods and Supplementary Table 1).
In our model, anthropogenic disturbance has the effect of altering the natural mortality rates on a species-specific level, which depends on the sensitivity of the species. It also affects the total number of individuals (the carrying capacity) of any species that can inhabit a spatial unit. Because sensitivity to disturbance differs among species, the relative abundance of species in each cell changes after adding disturbance and upon reaching the new equilibrium. The effect of climate change is modelled as locally affecting the mortality of individuals based on species-specific climatic tolerances. As a result, more tolerant or warmer-adapted species will tend to replace sensitive species in a warming environment, thus inducing range shifts, contraction or expansion across species depending on their climatic tolerance and dispersal ability.
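As an illustration only (this is not the CAPTAIN implementation and the functional forms and names, such as base_mortality and climate_stress, are assumptions made for the sketch), species-specific sensitivities could translate disturbance and climate into cell-level mortality rates and carrying capacities along these lines:

import numpy as np

rng = np.random.default_rng(0)
n_species, n_cells = 5, 10                                 # toy dimensions
base_mortality = np.full(n_species, 0.05)                  # natural per-step death probability
sensitivity = rng.uniform(0, 1, n_species)                 # species-specific sensitivity to disturbance
climate_optimum = rng.uniform(10, 20, n_species)           # preferred local temperature
climate_tolerance = rng.uniform(1, 3, n_species)           # width of the climatic niche

disturbance = rng.uniform(0, 1, n_cells)                   # anthropogenic disturbance per cell
temperature = rng.uniform(10, 20, n_cells)                 # current local climate per cell

# disturbance raises mortality in proportion to each species' sensitivity;
# climate raises it with the scaled distance from the species' climatic optimum
climate_stress = ((temperature - climate_optimum[:, None]) / climate_tolerance[:, None]) ** 2
mortality = base_mortality[:, None] * (1 + sensitivity[:, None] * disturbance) * (1 + climate_stress)
mortality = np.clip(mortality, 0, 1)                       # species x cells matrix of death probabilities

# disturbance also lowers the carrying capacity of each cell
k_undisturbed = 100
carrying_capacity = k_undisturbed * (1 - disturbance)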
We use time-forward simulations of biodiversity in time and space, with increasing anthropogenic disturbance through time, to optimize conservation policies and assess their performance. Along with a representation of the natural and anthropogenic evolution of the system, our framework includes an agent (that is, the policy maker) taking two types of actions: (1) monitoring, which provides information about the current state of biodiversity of the system, and (2) protecting, which uses that information to select areas for protection from anthropogenic disturbance. The monitoring policy defines the level of detail and temporal resolution of biodiversity surveys. At a minimal level, these include species lists for each cell, whereas more detailed surveys provide counts of population sizes for each species. The protection policy is informed by the results of monitoring and selects protected areas in which further anthropogenic disturbance is maintained at an arbitrarily low value (Fig. 1). Because the total number of areas that can be protected is limited by a finite budget, we use an RL algorithm42 to optimize how to perform the protecting actions based on the information provided by monitoring, such that species loss or other criteria, depending on the policy, are minimized.
We present a full description of the simulation system in the Supplementary Methods. In the sections below we present the optimization algorithm, describe the experiments carried out to validate our framework and demonstrate its use with an empirical dataset.
Conservation planning within a reinforcement learning framework
In our model we use RL to optimize a conservation policy under a predefined policy objective (for example, to minimize the loss of biodiversity or maximize the extent of protected area). The CAPTAIN framework includes a space of actions, namely monitoring and protecting, that are optimized to maximize a reward R. The reward defines the optimality criterion of the simulation and can be quantified as the cumulative value of species that do not go extinct throughout the timeframe evaluated in the simulation. If the value is set equal across all species, the RL algorithm will minimize overall species extinctions. However, different definitions of value can be used to minimize loss based on the evolutionary distinctiveness of species (for example, minimizing phylogenetic diversity loss), or on their ecosystem or economic value. Alternatively, the reward can be set equal to the amount of protected area, in which case the RL algorithm maximizes the number of cells protected from disturbance, regardless of which species occur there. The amount of area that can be protected through the protecting action is determined by a budget Bt and by the cost of protection $C_t^c$, which can vary across cells c and through time t.
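For instance, the end-of-episode reward could be computed as in the following sketch (a simplified illustration; the array names are hypothetical):

import numpy as np

n_species = 1000
species_value = np.ones(n_species)              # equal values: the reward counts surviving species
# species_value could instead hold evolutionary distinctiveness or economic value,
# or the reward could be the number of protected cells when maximizing protected area
extinct = np.zeros(n_species, dtype=bool)       # flagged True when a species is lost during the episode
extinct[:37] = True                             # e.g. 37 extinctions over the simulated timeframe
reward = float(species_value[~extinct].sum())   # cumulative value of species that did not go extinct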
The granularity of monitoring and protecting actions is based on spatial units that may include one or more cells and which we define as the protection units. In our system, protection units are adjacent, non-overlapping areas of equal size (Fig. 1) that can be protected at a cost that sums the costs of all cells included in the unit.
The monitoring action collects information within each protection unit about the state of the system St, which includes species abundances and geographic distributions:
$$S_t=\{H_t,D_t,F_t,T_t,C_t,P_t,B_t\}$$
(1)
where Ht is the matrix with the number of individuals across species and cells, Dt and Ft are matrices describing anthropogenic disturbance on the system, Tt is a matrix quantifying climate, Ct is the cost matrix, Pt is the current protection matrix and Bt is the available budget (for more details see Supplementary Methods and Supplementary Table 1). We define as feature extraction the result of a function X(St), which returns for each protection unit a set of features summarizing the state of the system in the unit. The number and type of features (Supplementary Methods and Supplementary Table 2) depend on the monitoring policy πX, which is determined a priori in the simulation. A predefined monitoring policy also determines the temporal frequency of this action throughout the simulation, for example, only at the first time step or repeated at each time step. The features extracted for each unit represent the input on which a protecting action can take place, if the budget allows for it, following a protection policy πY. These features (listed in Supplementary Table 2) include the number of species that are not already protected in other units, the number of rare species and the cost of the unit relative to the remaining budget. Different subsets of these features are used depending on the monitoring policy and on the optimality criterion of the protection policy πY.
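A minimal sketch of such a feature extraction step is given below (the feature set, thresholds and function names are illustrative assumptions; the actual features are listed in Supplementary Table 2):

import numpy as np

def extract_features(H, protected_units, cell_cost, budget, unit_of_cell, rare_threshold=10):
    """Return one feature vector per protection unit from the monitored state.
    H: species x cells abundance matrix; protected_units: boolean per unit;
    cell_cost: cost per cell; unit_of_cell: protection-unit index of each cell."""
    protected_cells = protected_units[unit_of_cell]
    already_protected = H[:, protected_cells].sum(axis=1) > 0     # species represented in protected units
    total_abundance = H.sum(axis=1)
    features = []
    for u in range(protected_units.size):
        cells = unit_of_cell == u
        present = H[:, cells].sum(axis=1) > 0
        features.append([
            np.sum(present & ~already_protected),                 # species not yet protected elsewhere
            np.sum(present & (total_abundance < rare_threshold)), # rare species (arbitrary threshold)
            cell_cost[cells].sum() / budget,                      # cost of the unit relative to the budget
            float(protected_units[u]),                            # indicator of already-protected units
        ])
    return np.array(features)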
We do not assume species-specific sensitivities to disturbance (parameters ds, fs in Supplementary Table 1 and Supplementary Methods) to be known features, because a precise estimation of these parameters in an empirical case would require dedicated experiments, which we consider unfeasible across a large number of species. Instead, species-specific sensitivities can be learned from the system through the observation of changes in the relative abundances of species (x3 in Supplementary Table 2). The features tested across different policies are specified in the subsection Experiments below and in the Supplementary Methods.
The protecting action selects a protection unit and resets the disturbance in the included cells to an arbitrarily low level. A protected unit is also immune from future increases in anthropogenic disturbance, but protection does not prevent climate change in the unit. The model can include a buffer area along the perimeter of a protected unit, in which the level of protection is lower than in the centre, to mimic the generally negative edge effects in protected areas (for example, higher vulnerability to extreme weather). Although protecting a disturbed area theoretically allows it to return to its initial biodiversity levels, population growth and species composition of the protected area will still be controlled by the death-replacement-dispersal processes described above, as well as by the state of neighbouring areas. Thus, protecting an area that has already undergone biodiversity loss may not result in the restoration of its original biodiversity levels.
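A simplified sketch of this protecting action (our own illustration, not the CAPTAIN code; the graded buffer is reduced to a single lower protection level and all names are hypothetical):

import numpy as np

def protect_unit(u, disturbance, protected_cells, unit_of_cell,
                 min_disturbance=1e-10, buffer_cells=None, buffer_level=0.05):
    """Reset disturbance in the cells of unit u and flag them as protected."""
    cells = unit_of_cell == u
    disturbance[cells] = min_disturbance          # arbitrarily low residual disturbance
    if buffer_cells is not None:                  # optional edge effect along the unit perimeter
        disturbance[cells & buffer_cells] = buffer_level
    protected_cells |= cells                      # future anthropogenic increases apply only where ~protected_cells
    return disturbance, protected_cells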
The protecting action has a cost determined by the cumulative cost of all cells in the selected protection unit. The cost of protection can be set equal across all cells and constant through time. Alternatively, it can be defined as a function of the current level of anthropogenic disturbance in the cell. The cost of each protecting action is taken from a predetermined finite budget and a unit can be protected only if the remaining budget allows it.
Policy definition and optimization algorithm
We frame the optimization problem as a stochastic control problem in which the state of the system St evolves through time as described in the section above (see also Supplementary Methods), but is also affected by a set of discrete actions determined by the protection policy πY. The protection policy is a probabilistic policy: for a given set of policy parameters and an input state, the policy outputs an array of probabilities associated with all possible protecting actions. While optimizing the model, we sample actions according to the probabilities produced by the policy to ensure that we explore the space of actions. When we run experiments with a fixed policy instead, we choose the action with the highest probability. The input state is transformed by the feature extraction function X(St) defined by the monitoring policy, and the features are mapped to a probability through a neural network with the architecture described below.
In our simulations, we fix the monitoring policy πX, thereby predefining the frequency of monitoring (for example, at each time step or only at the first time step) and the amount of information produced by X(St), and we optimize πY, which determines how to best use the available budget to maximize the reward. Each action A has a cost, defined by the function Cost(A, St), which here we set to zero for the monitoring action (X) across all monitoring policies. The cost of the protecting action (Y) is instead set to the cumulative cost of all cells in the selected protection unit. In the simulations presented here, unless otherwise specified, the protection policy can only add one protected unit at each time step, if the budget allows, that is if Cost(Y, St) < Bt.
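In sketch form, one such budget-gated protecting step could read as follows (an illustrative reimplementation under the assumptions above; names such as protecting_step and unit_cost are hypothetical):

import numpy as np

def protecting_step(action_probs, unit_cost, protected_units, budget, explore=False, rng=None):
    """Apply at most one protecting action per time step, if the budget allows it."""
    if explore:                                           # during optimization: sample to explore actions
        u = (rng or np.random.default_rng()).choice(action_probs.size, p=action_probs)
    else:                                                 # fixed policy: pick the most probable unit
        u = int(np.argmax(action_probs))
    if not protected_units[u] and unit_cost[u] < budget:  # Cost(Y, S_t) < B_t
        protected_units[u] = True
        budget -= unit_cost[u]
    return protected_units, budget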
The protection policy is parametrized as a feed-forward neural network with a hidden layer using a rectified linear unit (ReLU) activation function (Eq. (3)) and an output layer using a softmax function (Eq. (5)). The input of the neural network is a matrix x of J features extracted through the most recent monitoring across U protection units. The output, of size U, is a vector of probabilities, which provides the basis to select a unit for protection. Given a number of nodes L, the hidden layer h(1) is a matrix U × L:
$$h_{ul}^{(1)}=g\left(\sum_{j=1}^{J}x_{uj}W_{jl}^{(1)}\right)$$
(2)
where u ∈ 1, …, U identifies the protection unit, l ∈ 1, …, L indicates the hidden nodes and j ∈ 1, …, J the features, and where
$$g(x)=\max(0,x)$$
(3)
is the ReLU activation function. We indicate with W(1) the matrix of J × L coefficients (shared among all protection units) that we are optimizing. Additional hidden layers can be added to the model between the input and the output layer. The output layer takes h(1) as input and gives an output vector of U variables:
$$h_{u}^{(2)}=\sigma\left(\sum_{l=1}^{L}h_{ul}^{(1)}W_{l}^{(2)}\right)$$
(4)
where σ is a softmax function:
$$\sigma(x_i)=\frac{\exp(x_i)}{\sum_{u}\exp(x_u)}$$
(5)
We interpret the output vector of U variables as the probability of protecting the unit u.
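The following sketch is our own minimal numpy rendering of Eqs. (2)-(5) (not the CAPTAIN source code; the max-shift in the softmax is only for numerical stability):

import numpy as np

def protection_policy(x, W1, W2):
    """x: U x J features from the latest monitoring; W1: J x L weights shared across units;
    W2: L output weights. Returns one protection probability per unit."""
    h1 = np.maximum(0.0, x @ W1)            # Eqs. (2)-(3): ReLU hidden layer, U x L
    z = h1 @ W2                             # Eq. (4): one value per protection unit
    e = np.exp(z - z.max())                 # Eq. (5): softmax across units (shifted for stability)
    return e / e.sum()

U, J, L = 25, 4, 8                          # toy dimensions
rng = np.random.default_rng(1)
probs = protection_policy(rng.normal(size=(U, J)), rng.normal(size=(J, L)), rng.normal(size=L))
# probs sums to 1 and gives the probability of protecting each of the 25 units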
This architecture implements parameter sharing across all protection units when connecting the input nodes to the hidden layer; this reduces the dimensionality of the problem at the cost of losing some spatial information, which we encode in the feature extraction function. The natural next step would be to use a convolutional layer to discover relevant shape and space features instead of using a feature extraction function. To define a baseline for comparisons in the experiments described below, we also define a random protection policy $\hat{\pi}$, which sets a uniform probability to protect units that have not yet been protected. This policy does not include any trainable parameter and relies on feature x6 (an indicator variable for protected units; Supplementary Table 2) to randomly select the proposed unit for protection.
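The random baseline amounts to a few lines (a sketch; already_protected stands for feature x6):

import numpy as np

def random_policy(already_protected):
    """Uniform probability over the units not yet protected (feature x6)."""
    p = (~already_protected).astype(float)
    return p / p.sum()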
The optimization algorithm implemented in CAPTAIN optimizes the parameters of a neural network such that they maximize the expected reward resulting from the protecting actions. With this aim, we implemented a combination of standard algorithms using a genetic strategies algorithm43 and incorporating aspects of classical policy gradient methods such as an advantage function44. Specifically, our algorithm is an implementation of the Parallelized Evolution Strategies43, in which two phases are repeated across several iterations (hereafter, epochs) until convergence. In the first phase, the policy parameters are randomly perturbed and then evaluated by running one full episode of the environment, that is, a full simulation with the system evolving for a predefined number of steps. In the second phase, the results from different runs are combined and the parameters updated following a stochastic gradient estimate43. We performed several runs in parallel on different workers (for example, processing units) and aggregated the results before updating the parameters. To improve the convergence we followed the standard approach used in policy optimization algorithms44, where the parameter update is linked to an advantage function A as opposed to the return alone (Eq. (6)). Our advantage function measures the improvement of the running reward (weighted average of rewards across different epochs) with respect to the last reward. Thus, our algorithm optimizes a policy without the need to compute gradients and allowing for easy parallelization. Each epoch in our algorithm works as:
for every worker p do
$\epsilon_p \leftarrow \mathcal{N}(0,\sigma)$, with diagonal covariance and dimension W + M
for t = 1,…,T do
Rt ← Rt−1 + rt(θ + ϵp)
end for
end for
R ← average of RT across workers
Re ← αR + (1 − α)Re−1
for every coefficient θ in W + M do
θ ← θ + λA(Re, RT, ϵ)
end for
where $\mathcal{N}$ is a normal distribution and W + M is the number of parameters in the model (following the notation in Supplementary Table 1). We indicate with rt the reward at time t and with R the cumulative reward over T time steps. Re is the running average reward calculated as an exponential moving average where α = 0.25 represents the degree of weighting decrease and Re−1 is the running average reward at the previous epoch. λ = 0.1 is a learning rate and A is an advantage function defined as the average of final reward increments with respect to the running average reward Re on every worker p weighted by the corresponding noise ϵp:
$$A(R_e,R_T,\epsilon)=\frac{1}{P}\sum_{p}\left(R_e-R_T^{p}\right)\epsilon_p.$$
(6)
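A compact sketch of one such epoch is given below (our own illustration following the pseudocode and Eq. (6) as written; a sequential loop stands in for the parallel workers and episode_reward is a hypothetical stand-in for a full simulation episode):

import numpy as np

def es_epoch(theta, episode_reward, running_reward, n_workers=8,
             sigma=0.1, alpha=0.25, lam=0.1, rng=None):
    """One epoch of the evolution-strategy optimization described above."""
    rng = rng or np.random.default_rng()
    eps = rng.normal(0.0, sigma, size=(n_workers, theta.size))  # one perturbation per worker
    R_T = np.array([episode_reward(theta + e) for e in eps])    # cumulative reward of one full episode each
    running_reward = alpha * R_T.mean() + (1 - alpha) * running_reward  # R_e, exponential moving average
    A = ((running_reward - R_T)[:, None] * eps).mean(axis=0)    # advantage, Eq. (6)
    return theta + lam * A, running_reward                      # gradient-free parameter update

# illustrative call with a stand-in episode function returning a scalar reward
theta, r_e = np.zeros(5), 0.0
theta, r_e = es_epoch(theta, lambda th: float(np.sum(th)), r_e)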
Experiments
We used our CAPTAIN framework to explore the properties of our model and the effect of different policies through simulations. Specifically, we ran three sets of experiments. The first set aimed at assessing the effectiveness of different policies optimized to minimize species loss based on different monitoring strategies. We ran a second set of simulations to determine how policies optimized to minimize value loss or maximize the amount of protected area may impact species loss. Finally, we compared the performance of the CAPTAIN models against the state-of-the-art method for conservation planning (Marxan25). A detailed description of the settings we used in our experiments is provided in the Supplementary Methods. Additionally, all scripts used to run CAPTAIN and Marxan analyses are provided as Supplementary Information.
Analysis of Madagascar endemic tree diversity
We analysed a recently published33 dataset of 1,517 tree species endemic to Madagascar, for which presence/absence data had been approximated through species distribution models across 22,394 units of 5 × 5 km spanning the entire country (Supplementary Fig. 5a). Their analyses included a spatial quantification of threats affecting the local conservation of species and assumed the cost of each protection unit as proportional to its level of threat (Supplementary Fig. 5b), similarly to how our CAPTAIN framework models protection costs as proportional to anthropogenic disturbance.
We re-analysed these data within a limited budget, allowing for a maximum of 10% of the units with the lowest cost to be protected (that is, 2,239 units). The actual number of protected units can be lower if the optimized solution includes units with higher cost. We did not include temporal dynamics in our analysis, instead choosing to simply monitor the system once to generate the features used by CAPTAIN and Marxan to place the protected units. Because the dataset did not include abundance data, the features only included species presence/absence information in each unit and the cost of the unit.
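Under these assumptions the budget follows directly from the cost map, for example (a sketch; the gamma-distributed costs are only a stand-in for the threat-based unit costs):

import numpy as np

rng = np.random.default_rng(0)
unit_cost = rng.gamma(2.0, 1.0, size=22394)      # stand-in for the threat-based cost of each unit
n_cheapest = int(0.10 * unit_cost.size)          # at most 10% of the 22,394 units (2,239)
budget = np.sort(unit_cost)[:n_cheapest].sum()   # budget = total cost of the 2,239 cheapest units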
Because the presence of a species in the input data represents a theoretical expectation based on species distribution modelling, it does not consider the fact that strong anthropogenic pressure on a unit (for example, clearing a forest) might result in the local disappearance of some of the species. We therefore considered the potential effect of disturbance in the monitoring step. Specifically, in the absence of more detailed data about the actual presence or absence of species, we initialized the sensitivity of each species to anthropogenic disturbance as a random draw from a uniform distribution $d_s \sim \mathcal{U}(0,1)$ and we modelled the presence of a species s in a unit c as a random draw from a binomial distribution with a parameter set equal to $p_s^c = 1 - d_s \times D^c$, where Dc ∈ [0, 1] is the disturbance (or ‘threat’ sensu Carrasco et al.33) in the unit. Under this approach, most of the species expected to live in a unit are considered to be present if the unit is undisturbed. Conversely, many (especially sensitive) species are assumed to be absent from units with high anthropogenic disturbance. This resampled diversity was used for feature extraction in the monitoring steps (Fig. 1c). While this approach is an approximation of how species might respond to anthropogenic pressure, the use of additional empirical data on species-specific sensitivity to disturbance can provide a more realistic input in the CAPTAIN analysis.
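In code, this resampling step amounts to the following (a sketch with toy dimensions; sdm_presence is a random stand-in for the species-distribution-model output):

import numpy as np

rng = np.random.default_rng(123)
n_species, n_units = 200, 1000                             # toy dimensions (the real dataset is 1,517 x 22,394)
sdm_presence = rng.random((n_species, n_units)) < 0.05     # stand-in for the SDM-based presence/absence
disturbance = rng.random(n_units)                          # per-unit threat D^c in [0, 1]

d_s = rng.uniform(0, 1, n_species)                         # sensitivity d_s ~ U(0, 1)
p_obs = 1.0 - d_s[:, None] * disturbance[None, :]          # p_s^c = 1 - d_s x D^c
resampled = sdm_presence & (rng.random((n_species, n_units)) < p_obs)  # binomial draw where the SDM predicts presence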
We repeated this random resampling 50 times and analysed the resulting biodiversity data in CAPTAIN using the one-time protection model, trained through simulations in the experiments described in the previous section and in the Supplementary Methods. We note that it is possible, and perhaps desirable, in principle to train a new model specifically for this empirical dataset or at least fine-tune a model pretrained through simulations (a technique known as transfer learning), for instance, using historical time series and future projections of land use and climate change. Yet, our experiment shows that even a model trained solely using simulated datasets can be successfully applied to empirical data. Following Carrasco et al.33, we set as the target of our policy the protection of at least 10% of each species range. To achieve this in CAPTAIN, we modified the monitoring action such that a species is counted as protected only when at least 10% of its range falls within already protected units. We ran the CAPTAIN analysis for a single step, in which all protection units are established.
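The modified protection target can be checked per species as follows (a sketch; range sizes are counted in units):

import numpy as np

def species_protected(presence, protected_units, target=0.10):
    """presence: boolean species x units matrix; protected_units: boolean mask of protected units.
    A species counts as protected once at least `target` of its range lies in protected units."""
    range_size = presence.sum(axis=1)
    protected_range = presence[:, protected_units].sum(axis=1)
    return protected_range >= target * range_size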
We analysed the same resampled datasets using Marxan with the initial budget used in the CAPTAIN analyses and under two configurations. First, we used a boundary length modifier (BLM = 0.1) to penalize the establishment of non-adjacent protected units, following the settings used in Carrasco et al.33. After some testing, as suggested in Marxan’s manual45, we set penalties on exceeding the budget, such that the cost of the optimized results indeed does not exceed the total budget (THRESHPEN1 = 500, THRESHPEN2 = 10). For each resampled dataset we ran 100 optimizations (with Marxan settings NUMITNS = 1,000,000, STARTTEMP = −1 and NUMTEMP = 10,000 (ref. 45)) and used the best of them as the final result. Second, because the BLM adds a constraint that does not have a direct equivalent in the CAPTAIN model, we also repeated the analyses without it (BLM = 0) for comparison.
To assess the performance of CAPTAIN and compare it with that of Marxan, we computed the fraction of replicates in which the target was met for all species, the average number of species for which the target was missed and the number of protected units (Supplementary Table 4). We also calculated the fraction of each species range included in protected units to compare it with the target of 10% (Fig. 6c,d and Supplementary Fig. 6c,d). Finally, we calculated the frequency at which each unit was selected for protection across the 50 resampled datasets as a measure of its relative importance (priority) in the conservation plan.
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.