In geoscience we’re not all that great with uncertainty. Because everything is uncertain, calculating error margins is often neglected altogether. However, a geologist is perfectly suited to understand Bayes’ theorem, a little tool that statisticians swoon over.

I haven’t been on too many field trips, but most of the time they go like this:

- The subject matter expert, usually a professor or the field trip coordinator, asks “What do you see?”
- Then some student who feels particularly up to it will describe the outcrop in their flavor of geology, maybe structural or sedimentological.
- The expert will acknowledge the parts of the description that fit their narrative and they will tell the audience how this particular outcrop came to be.
- Inconvenient features of the outcrop will be ignored, as they don’t fit the narrative too well. (Optional)

This makes sense. Out of all the information we see across the varied rock types, depositional settings and structural features, we have to start somewhere to analyze the outcrop. Statisticians would say the variance is big. A structural geologist will look at fault lines first, as they fit their expertise. In statistics this is called bias: the preference for a certain explanation over another. Most professions are taught to have a certain bias. That’s why WebMD will always suggest cancer, as it fits the entire spectrum of symptoms, while the human doctor will send you home and tell you to get the flu shot next year.

Machine learners will have recognized where this is leading. There is a trade-off between a targeted idea and the variance of the possibilities: the bias-variance trade-off. It’s a neat concept that, loosely put, states that a strongly biased decision will not account for the entire variance of possibilities.

Geologists have to work in a highly biased decision space. Without knowledge of some regional tectonics, diagenesis, possible volcanic activity even, work in an outcrop gets very hard. It could be anything, so every detail has to be analyzed and weighed.

## Bayes Theorem for Geologists

When we are in the outcrop we collect evidence. We find a healed fracture that could contain Quartz or Calcite. We have tested healed fractures in this area before and found that in limestone almost 95% of the seams are Calcite, whereas in sandstone only 25% are. It is easiest for us to test the seams; a simple scratch test is enough. The surrounding rock is very weathered, and we try not to hammer all the rocks, to preserve the nice geosite. The scratch test reveals it is Calcite after all. Now we can use Bayes’ theorem to calculate the probability that we are looking at a limestone.

We set the probabilities of the rock being sandstone or limestone to be 50/50, as we don’t know better. In statistical terms, we set the “known distribution”, or prior, to be equal: $$ P(Limestone) = P(Sandstone) = 0.5 $$.

We also know that a seam in limestone has a 95% probability of being Calcite. In statistical terms the conditional probability is $$ P(Calcite | Limestone) = 0.95 $$. The same goes for sandstone: $$ P(Calcite | Sandstone) = 0.25 $$.

In fact this is all we need to perform the Bayes trick. One intermediate step helps us understand Bayes even better:

$$ P(Calcite) = P(Limestone) \cdot P(Calcite | Limestone) + P(Sandstone) \cdot P(Calcite | Sandstone ) $$
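As a quick check of this intermediate step, here is a minimal Python sketch of the sum above. The dictionary and function names are my own choices for illustration; the numbers are the ones from the outcrop example.

```python
# Values from the outcrop example in the text.
prior = {"limestone": 0.5, "sandstone": 0.5}            # P(rock)
calcite_given = {"limestone": 0.95, "sandstone": 0.25}  # P(Calcite | rock)


def total_probability(prior, likelihood):
    """Sum P(rock) * P(evidence | rock) over all rock types."""
    return sum(prior[rock] * likelihood[rock] for rock in prior)


p_calcite = total_probability(prior, calcite_given)
print(f"P(Calcite) = {p_calcite:.3f}")  # P(Calcite) = 0.600
```

Writing the sum over a dictionary means a third rock type could be added without touching the function, as long as the priors still add up to 1.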

This gives the total probability of finding Calcite in a seam in the outcrop, whatever the surrounding rock is. Now to the juicy, juicy Bayes itself. We want to find the conditional probability of having a limestone rock surrounding our Calcite seam; in statistics this is $$ P(Limestone | Calcite) $$. You may notice that it’s now turned around. It’s “Is it limestone, because we found Calcite?” instead of “How likely is it to find Calcite in limestone?”.

$$ P(Limestone | Calcite) = \frac{P(Limestone) \cdot P(Calcite | Limestone) }{ P(Calcite) } $$

We have all the numbers to do this:

$$ P(Limestone | Calcite) = \frac{0.5 \cdot 0.95 }{ 0.5 \cdot 0.95 + 0.5 \cdot 0.25 } = \frac{0.5 \cdot 0.95 }{ 0.6 } \approx 0.79 $$

We get a probability of 79% of this being limestone surrounding a Calcite seam. Proudly, we go to our professor and report the number. She finds it interesting but, based on the history of the outcrop, suggests we might adjust our calculation a little bit. She tells the group that there were huge coral reefs in this area and even shows some fossils in another outcrop. Now that you understand Bayes, you can easily go back and adjust your numbers. The reefs made up 65% of the area, and with this expert knowledge you adjust P(Limestone) to 65% and P(Sandstone) to 35%.

$$ P(Limestone | Calcite) = \frac{0.65 \cdot 0.95 }{ 0.65 \cdot 0.95 + 0.35 \cdot 0.25 } = \frac{0.65 \cdot 0.95 }{ 0.705 } \approx 0.876 $$

Your adjusted probability goes up to about 87.6%. We can see that expert knowledge can be used in a Bayesian approach, which is why many people like it these days. Expert knowledge, or bias, skews the results in a certain direction. That is something we can use, but need to use with care.
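To see how strongly the prior steers the result, the whole calculation can be wrapped in a small function. This is only a sketch under the assumptions of the example: two possible rock types, and the 95% and 25% Calcite rates used throughout. The function name and the extra priors of 0.1 and 0.9 are hypothetical, added just to show the trend.

```python
def posterior_limestone(p_limestone, p_calcite_lime=0.95, p_calcite_sand=0.25):
    """P(Limestone | Calcite) for a two-rock outcrop via Bayes' theorem."""
    p_sandstone = 1.0 - p_limestone
    # Total probability of finding Calcite, summed over both rock types.
    p_calcite = p_limestone * p_calcite_lime + p_sandstone * p_calcite_sand
    return p_limestone * p_calcite_lime / p_calcite


# 0.5 and 0.65 are the priors from the text; 0.1 and 0.9 are illustrative.
for p in (0.1, 0.5, 0.65, 0.9):
    print(f"prior {p:.2f} -> posterior {posterior_limestone(p):.3f}")
```

The priors 0.5 and 0.65 reproduce the two worked results above, up to rounding. The other two show the care that is needed: the same scratch test gives a very different answer depending on what we believed before we scratched.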

The fantastic cover picture is Elephant Rocks, NZ by Bernard Spragg. The article was inspired by many conversations with my colleague Sebastian Tølbøll Glavind.

#### Jesper Dramsch
