How Common Are The Plants Of Southern California?



As of 24 November 2002, we have 53 trails and floras (lists for short in the following) in our Master List, containing 1,980 taxa. We thought it would be fun to see the frequency distribution of those 1,980 taxa among the 53 lists.

Specifically, our database keeps track of the number of lists on which each taxon is found, N for short in the following. In our analysis here, we histogram that number to find how many taxa are found only on a single list out of the 53 lists (N=1); how many are found only on two lists (N=2), etc.

Note in particular that the distribution of N over the lists as a whole is very different from the distribution of N for the taxa on a particular list. An example will make this clear:

Suppose we have 3 lists in our database. On list 1, we find species a, b and c. On list 2, we find species a, b and d. On list 3, we find species a, b and e.

Note that species a and b are found on all three trails, and that each trail contains a third species found only on that trail. Hence 1/3 of all the taxa on each trail are found uniquely on that trail (N=1), with 2/3 of the taxa on each trail found on all three trails (N=3). A histogram of N for a single trail will have a single value of N=1 and two occurrences of N=3, and will thus be "peaked" at N=3.

The histogram of occurrences from the Master List of all taxa will look very different. The Master List contains five taxa: a, b, c, d and e. Three of those taxa have N=1, and two of the taxa have N=3. The histogram from the Master List, which is what we are analyzing in the following, will be peaked at N=1, with 3/5 of all the taxa found only on a single list. This is simply because the "rare" taxa accumulate in the Master List, whereas the large number of observations for the common taxa only increases the value of N for those taxa.

You can see from this example that the average number of unique taxa per trail is simply the {number of taxa in the Master List with N=1} divided by the number of lists.
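The three-list example can be checked with a few lines of Python. This is only a sketch of the bookkeeping described above, not the code behind our database; the names `lists`, `n_per_taxon`, `master_hist` and `list1_hist` are made up for the illustration.

```python
from collections import Counter

# The three toy lists from the example (species a through e).
lists = {
    "list 1": {"a", "b", "c"},
    "list 2": {"a", "b", "d"},
    "list 3": {"a", "b", "e"},
}

# N for each taxon: the number of lists on which it appears.
n_per_taxon = Counter(taxon for taxa in lists.values() for taxon in taxa)

# Master List histogram: how many taxa have each value of N.
master_hist = Counter(n_per_taxon.values())

# Histogram of N restricted to the taxa on a single list.
list1_hist = Counter(n_per_taxon[taxon] for taxon in lists["list 1"])

# Average number of unique taxa per list = (taxa with N=1) / (number of lists).
avg_unique = master_hist[1] / len(lists)

print(sorted(master_hist.items()))  # [(1, 3), (3, 2)]  peaked at N=1
print(sorted(list1_hist.items()))   # [(1, 1), (3, 2)]  peaked at N=3
print(avg_unique)                   # 1.0
```

The same five taxa give a histogram peaked at N=1 for the Master List and at N=3 for any single list.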

One major complication of this analysis is that some of our lists contain no subspecies or variety information, either because the list did not attempt to include that information or because we have not yet been able to determine the subspecies on a given trail. Hence the numbers will be artificially fragmented for some taxa, which will make the histogram of N biased toward lower values.

An example will make this clear. Erigeron foliosus is found on 12 lists, primarily from floras that did not give subspecies. Erigeron foliosus var. foliosus is found on 16 lists. It is very likely that all Erigeron foliosus in our lists are the var. foliosus, so a proper analysis would count this as one taxon, Erigeron foliosus var. foliosus, with N=28. Instead, in our preliminary analysis here, it will be counted as two taxa, Erigeron foliosus with N=12 and Erigeron foliosus var. foliosus with N=16.
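A minimal sketch of the merge a proper analysis would perform is given below. The helper `merge_into` is hypothetical, and simply adding the two counts assumes that no list uses both names, which is what the N=28 above implies.

```python
# Counts quoted in the text: number of lists on which each name appears.
counts = {
    "Erigeron foliosus": 12,
    "Erigeron foliosus var. foliosus": 16,
}

def merge_into(counts, bare_name, full_name):
    """Return a copy of counts with bare_name folded into full_name.

    Assumption: no list carries both names, so the counts can be added.
    """
    merged = dict(counts)
    merged[full_name] = merged.get(full_name, 0) + merged.pop(bare_name, 0)
    return merged

merged = merge_into(
    counts, "Erigeron foliosus", "Erigeron foliosus var. foliosus"
)
print(merged)  # {'Erigeron foliosus var. foliosus': 28}
```

In the histogram, two entries at N=12 and N=16 become a single entry at N=28, which is why fragmentation pushes the distribution toward lower values of N.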

The bias toward lower values of N here is probably the source of the numerical differences between the power law exponent derived here and that derived in Regional Flora of the Santa Monica Mountains. In the referenced analysis, the effect of taxon fragmentation is much smaller, resulting in a histogram which is not as concentrated toward lower values.

Although this complication prevents full analysis of the frequency distribution for now, the frequency distribution turns out to be so heavily weighted toward low values of N that this complication does not seem to affect the primary conclusions of this preliminary analysis. However, readers should still take quite seriously the possible limitations imposed by this complication.

There are at least three other effects which must be kept in mind as possible limitations of this preliminary analysis.

  1. We have not controlled for the location of the lists. The lists in our database result from a largely random selection of available floras, as well as a largely random selection of which trails we have done trail guides for.

  2. We have not controlled for the size or shape of the areas which make up each list. Thus we have far from a homogeneous data set, with some lists coming from short linear trails and some lists coming from large circular areas.

  3. Many of our trail guides are incomplete, especially for annuals, due to the severe drought in 2002 and an insufficient number of visits so far.

Hence readers should pay attention to the following caveat:

Caveat: This is only a preliminary, just-for-fun analysis. This analysis will change in the future when our lists mature and we properly take into account the complications.

Hence the following plots and analysis are just-for-fun now. We'll do a proper analysis in the future when our trail guides are more complete.

Data and Analysis

At the time of this analysis, our database contained 14 Bob Muns floras, 33 trail guides, and 6 other floras, for a total of 53 lists. The following table gives histograms for N, the number of lists on which each taxon is found, for the entire data set of 53 lists, and two subsets, one of the 14 Bob Muns floras, and one of the 33 trail guides. A plot of the histograms follows.

N    Muns   Trails   All Lists
14   -      5        15
15   -      8        18
16   -      6        14
17   -      3        12
18   -      2        18
19   -      3        4
20   -      -        8
21   -      -        5
22   -      -        6
23   -      -        5
24   -      -        3
25   -      -        5
26   -      -        4
27   -      -        4
28   -      -        4
29   -      -        6
30   -      -        2
31   -      -        2
32   -      -        1
33   -      -        2
34   -      -        2
35   -      -        1

In the above plot, three separate histograms of the data are shown, along with a function "fitted by eye" to each data curve. The top curve, represented by blue filled diamonds, gives the histogram for all 53 lists. The middle curve, represented by yellow filled triangles, gives the histogram for the 14 floras of Bob Muns. The bottom curve, represented by aqua crosses, gives the histogram for our 33 trail guides.

The main feature of all histograms is the tremendous concentration of taxa found only on a single list, or on a small number of lists. Extremely few taxa are found on many lists. In the floras of Bob Muns, 453 taxa, 41% of all taxa, are found only in a single flora. Similarly, in our trail guides, 360 taxa, 46% of all taxa, are found only in a single trail guide. We consider these numbers to be in remarkable agreement, especially given the known incompleteness of some of our trail guides.

The inescapable conclusion is that most plant species are uncommon, found only in a few areas. This is one of the most surprising things we have learned from doing our trail guides.

In particular, before we had done many trail guides, we thought at first that it would be easy to do a plant guide for a trail that was a close neighbor of another trail we had done previously. Although it is true that neighboring trails share a lot of species, it was surprising to us that we kept coming across species we had not seen before. The above histogram shows clearly why we kept finding new species.

Remember that although 40-some percent of all taxa are found only on a single list, this does not imply that 40-some percent of each list consists of taxa found only on that list. Using the numbers from the floras of Bob Muns, the average number of unique taxa found on a given list is roughly 453 / 14 = 32. Since Bob's floras typically contain about 200-300 taxa, about 10-16% of the taxa in a given flora are found only in that flora, much lower than the value of 41% for the entire Bob Muns list.
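For readers who want to check the arithmetic, here is a short sketch using only the numbers quoted above. The 200 and 300 are the rough ends of the typical flora size, not measured values.

```python
unique_taxa = 453  # Muns taxa found in only a single flora (from the text)
n_floras = 14      # number of Bob Muns floras

# Average number of unique taxa per flora.
avg_unique = unique_taxa / n_floras
print(round(avg_unique, 1))  # 32.4

# Share of a typical flora made up of taxa unique to it,
# for the rough range of 200 to 300 taxa per flora.
for flora_size in (200, 300):
    share = avg_unique / flora_size
    print(flora_size, f"{share:.1%}")
# 200 16.2%
# 300 10.8%
```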

The other feature of all histograms is the continuous decline in the number of taxa with large N. If there were a significant population of very common species, the histogram could have shown a concentration that produced a bump at large N. This doesn't seem to be the case.

A list of the most common taxa, those with N larger than 25, is given separately in The Most Common Plants of Southern California.

The data are fitted quite closely by a power law:

Number of taxa with N observations = c * N^(-alpha).

In English, the number of taxa with N observations is a constant factor c times N raised to the power -alpha. The minus sign is put before alpha simply to make the values of alpha positive, since the number of taxa is a declining function of N.

The values of the coefficients (beta is from another calculation given below) are:

Data Set   c   alpha   beta

Thus, for example, the fitted function for all lists is:

Number of taxa with N observations = 716 * N^(-1.25) = 716 / (N * sqrt(sqrt(N))), where sqrt is the square root function.

Remember, these functions were simply "fitted by eye", and should not be interpreted too deeply due to the caveats given above. In particular, we would be very hesitant to attribute any significance to the difference between the exponents for the trails and the other distributions.
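To get a feel for how steeply the fitted function for all lists falls off, it can be evaluated directly. The sketch below uses only the c = 716 and alpha = 1.25 quoted above; the function names are made up for the illustration, and the second function is just the square-root form of the same fit.

```python
import math

def fitted_taxa(n, c=716.0, alpha=1.25):
    """Number of taxa with n observations under the fit c * n**(-alpha)."""
    return c * n ** (-alpha)

def fitted_taxa_sqrt(n, c=716.0):
    """The same fit for alpha = 1.25, written as c / (n * sqrt(sqrt(n)))."""
    return c / (n * math.sqrt(math.sqrt(n)))

for n in (1, 2, 5, 10, 20):
    # Both forms agree to within floating-point rounding.
    assert math.isclose(fitted_taxa(n), fitted_taxa_sqrt(n))
    print(n, round(fitted_taxa(n)))
```

By N=10 the fitted number of taxa has already dropped to a small fraction of its value at N=1.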

Just for fun, we can compare these results to a prediction of the power law for the variation in the number of species with area, A:

Number of taxa = b * A^(beta)

It is straightforward to calculate the percentage of taxa that should be found only on a single trail or in a single flora:

Fraction of taxa found on only a single list = M * { 1 - [(M-1)/M]^beta },

where M is the number of lists; multiply by 100 to express it as a percentage. The limiting value of this fraction is simply beta, which is approached very quickly as M gets large.
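One way to arrive at this expression, assuming M lists of equal and non-overlapping area a (an idealization that our actual lists certainly do not satisfy): all M lists together hold b * (M*a)^beta taxa, and any M-1 of them hold b * ((M-1)*a)^beta, so the difference is the number of taxa unique to the remaining list. Multiplying by M and dividing by the total gives the formula. The sketch below evaluates it; the function name and the value beta = 0.4 are illustrative only and are not taken from our table.

```python
def single_list_fraction(m, beta):
    """Fraction of taxa found on only one of m lists.

    Assumption (for illustration): m lists of equal, non-overlapping area
    obeying the species-area law, number of taxa = b * A**beta.
    """
    return m * (1.0 - ((m - 1) / m) ** beta)

beta = 0.4  # illustrative value only
for m in (2, 5, 14, 33, 53, 1000):
    print(m, round(single_list_fraction(m, beta), 3))
```

The printed values decrease toward beta as m grows, which is the limiting behavior described above.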

The values of beta are given in the table above. They are in the ballpark of typical values of beta deduced from the abundance of species world-wide. Again, however, further interpretation of the values is probably not yet warranted.

Stay tuned for a better analysis in a year or two.

Teaser: the Jepson Manual gives an estimate of commonness and rarity for each taxon (see section beginning on their p. 29). Their goal was to roughly denote the most common ~20% of the taxa and the rarest ~20% of the taxa, as well as the middle ~60% of the taxa. It is clear from the histograms above that these two ends of the frequency distribution are very different. We intend in the future to do a detailed comparison between the estimates of commonness in the Jepson Manual and our database.


Copyright © 2002 by Tom Chester and Jane Strong
Permission is freely granted to reproduce any or all of this page as long as credit is given to us at this source:
Comments and feedback: Tom Chester | Jane Strong
Last update: 26 December 2002