An Experiment in Assessing End of Year Predictions: How Did They Fare? (2)

Here are the results of our experiment on the evaluation of a sample of 2012 end of year predictions, following up on the post explaining the methodology used (spreadsheet and an interactive version of the charts can be found here).

Let us start with the bad news. As a whole, the percentage of success is relatively low, 27%, i.e. 44 predictions were correct out of the 165 made. However, this global figure hides very different results.
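For readers who want to check the arithmetic, the overall figure can be reproduced in a few lines. This is only a minimal sketch: the function name `success_rate` is an illustrative assumption, and the two counts are the ones given above.

```python
def success_rate(correct: int, total: int) -> float:
    """Return the share of correct predictions as a percentage."""
    if total <= 0:
        raise ValueError("total must be a positive number of predictions")
    return 100.0 * correct / total


# Counts taken from the text: 44 correct predictions out of 165 made.
rate = success_rate(44, 165)
print(f"{rate:.1f}%")  # prints "26.7%", which rounds to the 27% quoted above
```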

In terms of method, as shown below, classical analysis (that may cover the use of other methods or not) obtains the whole range of results, from complete inaccuracy to excellent. The validity of the judgement on the future depends upon the knowledge, understanding and genius of the analyst.

Risk analysis fares better than the overall sample, but is still below 50%. This might be related to the absence of differentiation between likelihood and impact, as explained in the previous post.

Our sole example of scenario is relatively unsuccessful. However, this is also linked to the very specific form and place scenarios have in terms of foresight: fictionalized narratives mainly aim at making one plausible version of the future real for the target audience. They intend to break cognitive biases and other lenses. They must be built upon a coherent model, which can be seen as the principle, the essence, but the unfolding discrete events themselves are only one example of what might happen. In Kant’s understanding, a scenario is a phenomenon, built upon noumena.

Unsurprisingly, analyses that include, to a greater or lesser extent, recommendation and advocacy, which we could see as normative predictions, do not fare very well.

This brief evaluation, however, tells only one part of the story. As explained in the methodological post, we can draw much more interesting conclusions out of an assessment that is less drastic and marks each prediction first according to the plausibility of the content and second to the accuracy of the timing, despite the inherent subjectivity of the approach.

Issues and countries: a conventional view of national security

The first very interesting result this experiment gives us is about the topic of the predictions itself, what was deemed as relevant and interesting enough to be the object of anticipation.

The overwhelming majority of predictions were made by country, whether they focused on economics, political economy, geopolitics or politics. The map below shows the number of predictions made: the brighter the colour, the more numerous the predictions. Some countries were off the radar even though significant events occurred there, for example the coups in Mali and Guinea-Bissau, which were correctly predicted by Jay Ulfelder, whose forecasts were not included in the experiment. This underlines the danger of leaving some countries out when making judgements on the future, because one will automatically tend to focus on those countries where events or problems occurred in the recent past, or on those that were of interest for one reason or another. Limited resources, however, most of the time force such an initial selection, which must thus be made with great care and kept in mind.

Very few assessments concerned other global problems, even though they belong to what is called unconventional national security. Among those identified in our sample, we find: oil, water, gold, the virtual and digital world (although hardly with a cyber-security dimension), augmented reality, and the environment (but only in terms of regime and debates, not in terms of actual natural events and their impacts). Many issues such as most transformational technologies, from nano to biosecurity, health concerns, cyber threats, extreme weather events or resources competition beyond oil were thus left out. One possible explanation is that we are still operating within specializations inherited from the last three centuries, and that for each new issue appearing on the agenda of national security, a new sector of expertise is created, with serious potential adverse consequences on our identification of threats. We may very well become perfect in terms of predictions on old topics, but this will always remain insufficient if interactions and feedbacks with new threats are ignored. For example, International Relations – or geopolitical – analysis must fully include the cyber dimension, and cyber-security in terms of national security cannot be fully understood without the international, geopolitical and political dimensions.

Systematically including horizon scanning for emergence of novel dangers and pluri-disciplinary/multi-expertise work would be needed. Another possible explanation is that those unconventional security issues were left out because they were estimated as beyond the 2012 time horizon. We may only wish this latter hypothesis to be correct.

Inaccurate timing and relatively plausible content

If we now look at the countries, object of predictions, and colour them first according to the plausibility of content of the predictions, and second according to the accuracy of the timing, we have the two following maps. The averaged accuracy of the results goes from deep red (inaccuracy) to deep green (accuracy).
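The averaging behind such maps could be sketched as follows. This is an illustration only, not the spreadsheet's actual procedure: the rows, country labels and marks are invented placeholders, and `average_by_country` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (country, content mark, timing mark), each mark in [0, 1].
# These are NOT values from the evaluation spreadsheet.
rows = [
    ("Country A", 1.0, 0.5),
    ("Country A", 0.5, 0.0),
    ("Country B", 0.0, 0.0),
]


def average_by_country(rows):
    """Group marks by country and return {country: (mean content, mean timing)}."""
    grouped = defaultdict(list)
    for country, content, timing in rows:
        grouped[country].append((content, timing))
    return {
        country: (
            mean(content for content, _ in marks),
            mean(timing for _, timing in marks),
        )
        for country, marks in grouped.items()
    }


for country, (content_avg, timing_avg) in average_by_country(rows).items():
    print(f"{country}: content={content_avg:.2f}, timing={timing_avg:.2f}")
```

Each average can then be mapped onto a red-to-green colour scale, one map for content and one for timing.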


The maps confirm the hunch I wished to test: our capacity to predict timing is less good than our ability to understand content and thus foresee coming evolutions. We know quite well what will most probably happen, but we do not know precisely when.

Interestingly, China, Russia and the U.S. fare relatively badly for both content and timing. This could be explained by strong cognitive and ideological biases existing for those three countries, including, for the U.S., which also ranks first for the number of predictions, those biases related to partisan politics… and analysis. Regarding our initial conclusions on methodology, and considering the lack of explanations given by authors, this shows that we should ideally identify precisely who the author is, who his/her target audience is, and in which context the predictions were made, as underlined by forecaster, futurist and strategist Scott Smith in his “Year-end lists are hazardous to your health”. The category mixing classical and normative analysis would most probably swell as a result.

Timing for Brazil is completely wrong, and this would be even worse if the prediction made for the BRICS (0 on all counts) had been added, while the results would have been less good for all the other BRICS. Again, we are seeing an ideological bias at work, a “pro-BRICS” bias, which also reflects a global power struggle we can see enacted in any international forum.

These results point towards the absolute necessity of struggling against all biases when making judgements on the future, if proper decisions are to result from this foresight (which is most probably not the case with our sample, but we have to consider that many decision-makers also read open source predictions and may be influenced by them, knowingly or not).

Novelty and pace

Finally, let us observe the evaluation for all predictions, without aggregation and average (click here to open the chart in full in another window).


Besides the points we already made, what is most striking is the way various water issues were erroneously foreseen. True enough, only one author is concerned (and he had the merit of selecting this issue when the corresponding U.S. ICA was not yet published), but we can always learn from all mistakes. These erroneous judgements on water security may underline the difficulty of properly estimating issues when those are relatively newly integrated in assessments. First, accumulated knowledge and understanding are insufficient. Second, the eagerness to promote a topic that may still be debated and belittled may lead to overstatement.

The wrong timing on various European countries stems most probably from our very imperfect knowledge of internal political dynamics, as in recent decades mainstream political science has tended to focus on elite politics and public policy – one of the major causes of the warning failure regarding the “Arab Spring” – even more so in the case of the so-called rich countries. Furthermore, time is very rarely an object of research. Finally, we tend collectively to forget that political time is long – even very long – and that many of our (recent) habits, approaches and institutions do not accommodate it… but this will not change the reality of political dynamics.

————-

Nota: The surprising, at first glance, cases when timing gets a better mark than content correspond to predictions that were accurate (or almost accurate) in terms of timing, accompanied by explanation of dynamics that were partly or fully wrong, illogical, or inaccurate.

An Experiment in Assessing End of Year Predictions (1)

Evaluating predictions, or more broadly the end-products resulting from methodologies used to anticipate possible futures, should become the norm rather than the exception, as explained in a previous post. Such an exercise should improve methods and processes and direct our efforts towards further research. Here we shall experiment with assessing a sample of open source predictions for the year 2012. This part will address the methodological problems encountered while creating the evaluation itself, and underline the related lessons learned. The second part (forthcoming) will discuss results.

Actually, there is nothing new here, as estimating results for “predictions” is one of the fundamental principles of science (a scientific theory must have explanatory, descriptive and predictive power). If a theory does not fulfill the predictive criterion, then it must be disqualified. Things are relatively straightforward when dealing with hard science. They are much more complex when we are in the field of social science, where the very possibility of obtaining predictive power is hotly discussed, debated and often discarded. If we consider the family of disciplines, sub-disciplines and methodologies – what we call here strategic foresight and warning (foresight for short) – that deal with future(s)-related analysis, then we are faced with even more challenges. Some methodologies will be considered as scientific, and among them, some are close to hard science, while others belong to the realm of social science. Other approaches will be seen as art and are thus considered as not having to be tested. Furthermore, everyone has her/his own vision of what constitutes good future(s)-related analysis, what should be done and used, what is valuable and what is not.

Despite these difficulties, it is still worth our while evaluating those future(s) related efforts, which had the courage to make an evaluation for the future located on a timeline. This is, of course, a very small exercise and experiment, compared with what is done by The Good Judgement project, led by Philip Tetlock, Barbara Mellers, and Don Moore with funding by the Intelligence Advanced Research Projects Activity (IARPA) and explained in this article by Dan Gardner and Philip Tetlock, “Overcoming our aversion to acknowledging our ignorance.” Nevertheless, hopefully, it will also bring interesting results, and the reflection it imposes, the questions it brings are in themselves a very constructive practice.

For this experiment, the sample used is constituted of open source predictions for the year 2012 posted on the web from December 2011 to January 2012, as presented here.

The result of the evaluation, in a Google spreadsheet, can be downloaded here or viewed below. Explanations and discussion follow.

The sources used to evaluate the foresight are given in the seventh column (except when the answer is obvious or common knowledge, and thus does not necessitate reference to a specific source e.g. the European Union still exists).

The variety of formats and methodologies, which were themselves explained to varying degrees, was a first challenge. How to evaluate consistently “predictions” delivered in ways as varied as classical analysis (e.g. The Financial Times – Beyond BRICS), scenarios (e.g. Tick by Tick Team), risks (e.g. CFR – Preventive Priorities Survey: 2012) or predictions mixed with policy recommendations and advocacy, what could be seen as a version of normative foresight (e.g. Foreign Policy with the International Crisis Group – 10 conflicts to watch in 2012)?

The Council on Foreign Relations, risk and making likelihood explicit

The Council on Foreign Relations’ approach is a perfect example of this hurdle. Its risk list for 2012 was particularly difficult to evaluate considering the way it is formulated and the lack of information regarding the methodology (those challenges have been removed, or at least reduced, with the 2013 version, where we find more detailed explanations and where likelihood and impact are separated). To find out exactly what the CFR meant, I had to turn to a companion article to the risk list published in the Atlantic, “Gauging Top Global Threats in 2012”. There we read:

“The contingencies that were introduced for the first time or elevated in terms of their relative importance and likelihood in 2012 included an intensification of the eurozone crisis, acute political instability in Saudi Arabia that threatens global oil supplies, and heightened unrest in Bahrain that spurs further military action.”

A contingency means “an event (as an emergency) that may but is not certain to occur”.

Thus we can deduce that the CFR saw all the events (their “risks”) listed as possible for the year 2012, if not probable. It is on this basis that the evaluation was made. Hence, all the CFR “risk-statements” were mentally expanded as follows: “a mass casualty attack on the U.S. homeland or on a treaty ally” means, for evaluation, “a mass casualty attack on the U.S. homeland or on a treaty ally” in 2012 is possible and would have a major impact for US National Interest (according to the tier to which the risk belongs), or “a major military incident with China involving U.S. or allied forces” means “a major military incident with China involving U.S. or allied forces” is possible in 2012 and would have a major impact, etc.

Making the “risk-statements” more explicit for evaluation (without, however, transforming the statement itself) immediately underlines how the fusion of likelihood and impact that most commonly exists in the idea of risk (until the concept itself was revised by the new ISO 31000:2009 norm) creates supplementary difficulties in terms of evaluation, hence my personal reluctance to use the concept, despite its fashionable character. What are we to judge: a likelihood? An estimation of impact? A timing? As already mentioned, the CFR Preventive Priorities Survey did tackle this problem and now (2013) gives detailed results in terms of impact and likelihood.

This underlines how crucial it would be, ideally, to always include, for all results of future(s)-analysis an estimation of likelihood, as done, for example, in the Intelligence Assessments (see p.14 of the ICA on Global Water Security).

In our sample, each prediction, or series thereof, corresponds to one or another methodology. Yet, rather than trying to standardize thoughts, for example by transforming what the authors wanted to write into a sentence easy to evaluate, I chose to keep the text as it was and to evaluate it as such, breaking it down into various paragraphs most of the time, sometimes expanding it mentally as explained above for the CFR and in agreement with their methodology, but not altering it. The exercise was constructive in itself and led to interesting points. We shall see with the next post if the results will also say something about the foresight methodology itself.

When the text was far too removed from something that looked like a judgement on the future, for example when it was only an opinion on what was happening, or when it was a 50/50 possibility, I excluded the sentence or paragraph from the sample (in red in the spreadsheet).

Scenarios, timing and content

As I started the concrete phase of evaluating statements with the fictionalized scenario made by Tick by Tick Team (Finance), it very quickly became clear that I had to make two types of assessment: one regarding the plausibility and logic of the content of the prediction itself, the other the accuracy of the timing. Indeed, some of the predictions made still sounded plausible, had not happened in 2012 but could not be ruled out for the short to medium term, e.g. “Greece leaves the Euro, returns to the Drachma.” (3002 – this number corresponds to the identification given in the database, to facilitate reference). To me there is a large difference with a prediction that is plainly wrong in terms of content and thus impossible in terms of timing: e.g. Syria deals with the “initial post-Assad stages” (2011) or “Obama decides not to run for elections” (3011).

Furthermore, this approach will allow me to test a hunch according to which we are in general much better at explaining phenomena than at timing them, if only because we hardly ever work on timing (outside the hard science realm).

Evaluating content and timing: a difficult, uncertain, never-ending task?

Thus, columns 4 and 5 display marks for content and timing, ranging from 0 (completely wrong) to 1 (completely accurate). There is, however, a major hurdle with this approach. First, by judging the content in terms of plausibility of dynamics, I evaluate one understanding (the author’s) against another (mine). There is little we can do about it as this is the core of research and debates in social science, besides giving evidence (column 7, the sources), developing a coherent argument and/or pointing out flaws in the argument subjected to the evaluation. A commissioned report would need to be more detailed and specified than I could be in the framework of a volunteered experiment.
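To make the two-mark scheme concrete, here is a minimal sketch of how one evaluated prediction could be represented. The class name `Evaluation`, its fields and the `as_binary` helper are illustrative assumptions, and the marks given to predictions 3002 and 3011 are hypothetical, chosen only to mirror the distinction described above, not copied from the spreadsheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    """One evaluated prediction with separate content and timing marks."""

    prediction_id: int
    content: float  # plausibility of the content, 0 (wrong) to 1 (accurate)
    timing: float   # accuracy of the timing, 0 (wrong) to 1 (accurate)

    def __post_init__(self):
        # Reject marks outside the 0-to-1 scale used in the evaluation.
        for name in ("content", "timing"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")

    def as_binary(self) -> int:
        """Collapse to a true/false answer: 1 only if fully right on both counts."""
        return 1 if self.content == 1.0 and self.timing == 1.0 else 0


# Hypothetical marks, for illustration only.
greece = Evaluation(prediction_id=3002, content=0.5, timing=0.0)
obama = Evaluation(prediction_id=3011, content=0.0, timing=0.0)

# Both collapse to the same binary answer, although the content marks differ.
print(greece.as_binary(), obama.as_binary())
print(greece.content, obama.content)
```

The binary view treats the two predictions identically, whereas the two marks keep the difference between a plausible but badly timed prediction and one that is wrong on content.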

Second, by evaluating the plausibility of something happening in the future, I am myself making a judgement on the future, thus a prediction. Ultimately, those challenges should be resolved through the happenstance of events and facts, which suggests that evaluations should themselves be reviewed and followed in time. This is certainly not ideal, but still better than losing the information on timing and content, which would happen if one chose black-and-white, true-or-false, 0-or-1 answers.

Objectivity (as much as biases allow) of the person assessing the predictions is crucial, and the use of teams that would discuss and confront their analyses would be best. Furthermore, the latter would also allow overcoming plain lack of knowledge on one issue or another.

This leads us to a last challenge that is not easily overcome for some predictions: the information that is available to the person doing the evaluation. Still using the CFR example, and more particularly the risk of “a mass casualty attack on the U.S. homeland or on a treaty ally”, some actions taken throughout the year by authorities may have prevented a risk from materializing, thus the prediction could be seen as false. Certainly, had no intelligence, defense and diplomatic actions existed, then such a risk would have materialized. Such state actions are ongoing, and, as outsiders, we can only estimate (without complete certainty) that it is because of them that the threat did not materialize, not because the risk was incorrectly identified. An evaluation made by an insider with access to all classified documents would be made with more confidence. Here, I could only estimate the reality of the risk to the best of my understanding and knowledge, for example with the use of counterfactuals.

Should all those challenges, and the existence of uncertainty even in evaluation, lead us to conclude that trying to evaluate foresight products is useless? My first answer, at this stage, is no, because all the questions that the evaluation forces one to ask are crucial and can only lead to better methodologies and thus to better judgements on the future. It is thus a gauge of quality. We shall see next what the results of the assessment, keeping in mind all their imperfections, may tell us.

—–

“Gauging Top Global Threats in 2012” - Interviewee: Micah Zenko, Fellow for Conflict Prevention,
Interviewer: Robert McMahon, Editor, 
December 8, 2011, The Atlantic.

Useful Rules for Foresight from Taleb’s The Black Swan

This second post on The Black Swan: the impact of the highly improbable by Nassim Nicholas Taleb emphasises some of the author’s points that could be useful to foresight and warning and all “predictive” work. Many of those themes are not really new, but are already integrated in F&W and, more broadly, analysis. Nonetheless, it is always useful to underline them, as it is so easy to forget best practice. (The first post can be accessed here).

Humility

(Notably pp.190-200) Considering uncertainty, but also our imperfect condition as human beings, the complexity of the social world, feedbacks, and our more than insufficient knowledge and understanding, we must be very humble, accept our partial ignorance, our imperfection and mistakes (and make sure those essentially human flaws are accepted by others, which may be more difficult). Yet, we must also struggle to improve ourselves, increase our understanding and our capability to foresee the future. Doubts, humility, real dialogue between those different communities which try to understand the world, reflection upon mistakes – to correct what can be identified as wrong or inefficient – and successes – to reproduce what worked (according to conditions) – are keys for this improvement.

Taleb’s use of and reference to Montaigne’s wisdom also points to the importance of struggling against the loss of memory (institutional, scientific and general) that plagues us. Some things that were understood in the past are now misconstrued or ignored. It would appear, sometimes, that we are part of a race where youth, novelty, fads and shortening of time rule as masters. Yet, shouldn’t we pause for a while and wonder about this behaviour and its origin? Should we not question the results stemming from this new race forward? For example, in science (soft and hard), something does not become wrong merely because it was understood, discovered or written decades or centuries ago. On the contrary, good science starts with knowledge and understanding of past scientific discoveries. Some understandings are outdated, but some are not. Novelty and justness of analysis are not synonymous, while discarding all the past only makes us lose time. Consumerism cannot and should not be applied everywhere.

“Black Swans events” (unpredictability, outliers)

As underlined last week, Taleb makes a distinction between “Mandelbrotian Gray Swans” (rare but expected events that are scientifically tractable, pp. 37, 272-273) and real “Black Swans events,” which are never identified in advance. From that we could make the following “rules”:

  • Making swans gray

Try to imagine as many improbable events as possible, initially suspending disbelief. This is already done; methods, however tentative, exist: wild cards scenarios (e.g. James A. Dewar, “The Importance of “Wild Card” Scenarios”); brainstorming; what if stories and narratives; use of alternative thinkers and thinking.


The key, here, is imagination and allowing oneself to go beyond groupthink, norms (institutional, social, cultural), belief-systems, even if ideas may feel dangerous (read for example “In defense of dangerous ideas” by Steven Pinker, Harvard college professor, cognitive scientist, July 2007). Then, and as suggested and explained by Dewar, because resources are limited and also because even Swans have to follow a few rules, those potential “gray swans” should be examined in the light of all the other rules. The least likely (or the most absurd) should be discarded. For example, we may always assume that gravity on earth could disappear, or that lambs will become carnivorous, yet, the chances are so minimal that we may choose to dispense with these situations. For those events that remain on the gray swans list, potential impacts can be estimated and highly improbable-high impact scenarios developed.

  • The absence of certainty

Because we may assume that the likelihood of the existence of Black Swans is very high, then we must consider them. This will influence our estimation of probability. We may just forget certainty. This may look like nothing, but I suspect that in the world of security and politics where the issues of power and control – including in personal terms – are so crucial, truly accepting uncertainty and insecurity is a major effort.

Continuing the struggle against biases

(pp.1-164) Cognitive biases being an inescapable part of being human, the least we can do is to be aware of them and persist in our struggle against them. Using our increasing knowledge and awareness of cognition, we may continue applying and creating specific training and systematically incorporate related safeguards in methodologies and processes, from organization (for example people joining the exercise at different stages) to team composition (people with different backgrounds, psychological makeup, etc.).

Meanwhile, introspection and reflection should be promoted by and for those who deal with foresight, forecast and prediction, as exemplified (here on the question of ethics and potential biases induced by “conflict of interests”) by this very recent post by Jay Ulfelder. “Introspective phases” could and should be included at different stages of the foresight process.

Those phases should notably fight against the known phenomena of anachronistic projections and cultural projections. Anachronistic projections are usually made with regard to past history (judging past actions from the point of view of today’s moral norms; understanding the past through today’s lenses), which obscures understanding. For example, we currently struggle against drug trafficking, notably because this endangers our societies and is seen as morally bad. Yet, opium was an accepted state activity at least from the 19th century to well into the 20th century. Acknowledging this does not mean going backwards in terms of norms or accepting things that are seen as morally reprehensible or are damaging or hurtful, but it would help in locating phenomena in their proper context, and thus in focusing on dynamics, processes and understanding. This would also (ideally) help us move from an attitude that favours judging and casting blame, with all the power struggles and violence that this implies, towards a much more constructive behaviour, promoting understanding, preventing and healing.

A political scientist* gives somewhere a great trick that we could usefully apply: if you read (we can change it to think/say) somewhere the word “always,” then stop and think.

Not being prey to anachronistic projections would imply considering too the evolution of ideas and norms and setting time-dependent struggles within historical processes. Coming back to foresight, anachronistic projections may as well be done regarding the future. What does this imply? Can we devise methods to try minimizing them? How can we best proceed to include ideas, norms, and beliefs in our models?

Cultural projections are even better known and may be easier to consider. Not falling prey to them will demand knowing our own cultural sets of norms on top of, or even prior to, those of others (e.g., in the anticipation field, Werther, 2008). Just asking ourselves this question during foresight exercises could improve results. Similarly we must struggle against being victims of ideology. This does not mean rejecting this or that belief, just being aware of what influences us.

Finally, the impact of emotions on our cognition, emotional biases, should be considered, as human beings are definitely not rational, emotionless beings.

Falsification rather than confirmation

(Notably pp.55-61) The risks of induction, which are so important to us because so many of our analyses come from collected evidence, are linked to our tendency to seek confirmation rather than falsification (looking for an element, a fact, an event that would prove our hypothesis or explanation wrong). All our analyses – this is valid for all our explanations and understanding, not only for foresight and warning – should include an effort at falsification, without, however, denying confirming facts. We should always wonder: which evidence or fact should I look for to disprove my theory, analysis, estimation or conjecture?

Furthermore, this effort should be mentioned in the final anticipatory product (potentially in an appendix, according to the specificities demanded by the delivery needs of the customer) to allow for follow-up and update. Meanwhile, indicators specifically designed for falsification should be created. For example, in foresight, if we have concurrent scenarios, the happenstance of an event or any indication showing that one scenario is becoming less likely should be considered and the set of scenarios should be revised accordingly.

Careful causality: “silent evidence”


(pp.100-121) Taleb starts by cautioning against the dangers of applying causality when there is none or when “silent evidence” could potentially distort causal reasoning. “Silent evidence” is what we don’t know, cannot know, cannot hear, do not hear. To explain “silent evidence”, Taleb uses Cicero’s story: “Diagoras, a nonbeliever in the Gods, was shown painted tablets bearing the portraits of some worshippers who prayed, then survived a subsequent shipwreck. The implication was that praying protects you from drowning. Diagoras asked, ‘Where were the pictures of those who prayed, then drowned?’” Taleb, however, does not reject causality but encourages us to use it with care and caution, trying to think about the possible existence of “silent evidence”.

Creation of new adapted quantitative tools

(The whole book) The author warns us to distrust, except in the cases where they can properly be applied, correlations (which furthermore do not allow for understanding, according to the famous “correlation does not imply causation”), linear trends and Gaussian distributions. Instead, research involving fractals and scalability, and complexity theory (as done for example by the New England Complex Systems Institute), should be favoured. Building upon this, we can imagine creating new tools allowing for multi-disciplinary research, articulating, when necessary (there is no need to use something very complicated if a simple analysis or logic is sufficient), complex modeling and agent-based models, mixing quantitative and qualitative assessments (notably including processes and feedbacks), and integrating them within foresight methodologies. What should guide us is always the issue or the problem at hand.

* Unfortunately I cannot remember who wrote this, nor in which book or article. Initially I thought it was Benedict Anderson in Imagined Communities but after having skimmed again through the book, I cannot locate the citation, thus it could be Anthony Smith, or Eric Hobsbawm, or… if anyone knows, I would welcome the reference!

————-

References

Dewar, James A. “The Importance of “Wild Card” Scenarios,” Discussion Paper, RAND.

Pinker, Steven, “In defense of dangerous ideas”, July 2007.

Ulfelder, Jay, “Advocascience”, 27 January 2013, Dart-Throwing Chimp.

Werther, Guntram F. A., Holistic Integrative Analysis of International Change: A Commentary on Teaching Emergent Futures, The Proteus Monograph Series, Volume 1, Issue 3, January 2008.

Taleb’s Black Swans: The End of Foresight?

Since Nassim Nicholas Taleb published his bestseller The Black Swan: the impact of the highly improbable back in 2007, “Black Swans” and “Black Swan events” have become part of everyday language. They are used as a catchphrase to mean two different things. First, as was the case recently in Brookings’ interesting interactive “briefing book” Big Bets and Black Swans: Foreign Policy Challenges for President Obama’s Second Term, “black swans” represent high impact, low probability events, which are also known as wild cards.[i] Second, “black swans” refer to events that could absolutely not be predicted, as, for example, for the Economist in “The prediction games: Our winners and losers from last year’s edition”. Unfortunately, in this case, the label “black swans” excuses foresight errors. It tends to stop explanations and evaluation. Similarly, some will make statements along the lines of “oh, but there is no point in doing any foresight (or futures work or forecast), did you not read Taleb’s Black Swan? One cannot predict or foresee anything.”

This is a rather crucial assertion for us and it needs to be investigated.

What exactly are those Black Swans about which Taleb writes? They cannot be both absolutely unpredictable and low probability events at the same time. Thus, what did Taleb really describe? After having read this book, should we just resign, abandon any work related to anticipation, and, instead, do something else? Or is there more to Taleb’s argument than that? Can we use this book to improve our foresight methodologies and consider deep uncertainty, yet without giving up?

This first post will review the whole book and see if it really points to the absurdity of foresight. The next one will outline some of Taleb’s points that could be more than useful to foresight, forecast, warning, and more generally all anticipation methodologies.

Black and Grey Swans

Taleb is interested in better understanding uncertainty and randomness, notably those “Black Swan events” defined as unpredictable (outliers), with an extreme impact, and which are, after the fact, revised as explainable and predictable.

The existence of black swans is logical and may even be seen as obvious, but it does not imply that foresight is doomed, only that it cannot be 100% certain (as an anecdote, the ancient Chinese divination method, the Yi Ching, which used the tossing of yarrow stalks, always removed one stalk before the cast to account for this unpredictability).

In a nutshell, The Black Swan denounces the problem and risks of induction, building upon David Hume and Karl Popper. Extremely briefly, inductive reasoning runs as follows: all the swans observed are white, thus all swans are white… which is proven wrong when one black swan is spotted. Hence the danger of this reasoning, if it is not done cautiously. Incidentally, this is quite crucial for us, as so many of our analyses come from collected evidence, and we should always keep the danger of induction in mind, but more on that in the next post.

The Black Swan attacks quantitative predictions made with a Gaussian distribution (or normal distribution or bell curve) when applied to a world that is increasingly complex, notably because of the evolution of social interactions, and that includes unexpected events. Such a world should rather be understood through fractals, which would then allow us to anticipate those events that Taleb names “Mandelbrotian Gray Swans.” Here I trust the author because of the consistency and logic of his argument, while I don’t have the mathematical knowledge necessary to go more in-depth into fractals.
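
To give a rough feel for the difference between a bell curve and a scalable (power-law) distribution, here is a minimal sketch. It is not taken from the book: the function names and the exponent `alpha = 2.0` are assumptions chosen purely for illustration, and the two distributions are not calibrated to each other or to any real data. It computes the probability of exceeding a threshold `k` under a standard normal distribution and under a Pareto distribution.

```python
import math


def gaussian_tail(k: float) -> float:
    """P(X > k) for a standard normal variable (mean 0, standard deviation 1)."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def pareto_tail(k: float, alpha: float = 2.0, x_min: float = 1.0) -> float:
    """P(X > k) for a Pareto variable with minimum x_min and tail exponent alpha.

    alpha and x_min are hypothetical values picked for the illustration.
    """
    if k <= x_min:
        return 1.0
    return (x_min / k) ** alpha


if __name__ == "__main__":
    for k in (2, 5, 10, 20):
        print(f"k={k:>2}  gaussian={gaussian_tail(k):.3e}  power law={pareto_tail(k):.3e}")
    # Scalability: doubling the threshold always divides the power-law tail
    # probability by the same factor (2 ** alpha), whatever the starting point.
    print(pareto_tail(20) / pareto_tail(10), pareto_tail(10) / pareto_tail(5))
```

Under the Gaussian, the probability of an extreme value collapses so quickly that such events look practically impossible, whereas under the power law it shrinks by a constant factor each time the threshold doubles. Extreme events then remain rare but conceivable, which is the intuition behind calling them “gray” rather than “black” swans.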

The Economist’s usage is thus correct, whilst the Brookings foreign policy experts actually refer to “Gray Swans.”

All doomed: forecasting, prediction, foresight, and also meaning

Taleb’s attack is directed at statistics and quantitative methodologies (correlations, trends, statistical forecasts, etc.), NOT qualitative ones, thus concerns only one part of “foresight.” Nonetheless, we are not safe yet, as Taleb also denounces a “narrative fallacy” and underlines various cognitive biases, which make us believe we can understand history or the present or try to anticipate the future, when, according to him, such endeavours are near impossible as everything is “ruled” by randomness or luck. For Taleb, most historical events with large impact are Black Swans, thus could not be predicted, not even with a low probability of success.

This brief summary would lead us to believe that, indeed, prediction, forecast, foresight, and, even worse, understanding and finally meaning are impossible human endeavours. Those stem from our human needs and cognitive make-up. To believe in their possibility emerges from our lack of introspection and reflection.

This book is thus not only about anticipating the future, and more broadly the philosophy of science and epistemology but also about meaning. The specific narrative Taleb uses – intertwining personal experience and stories as examples, with logic, (sceptical) empiricism and references to philosophers, novelists and scientists – is part and parcel of the demonstration. Don’t even hope to skim through The Black Swan to understand it. It is here in the literary part of the book, in its very construction, that we find the key to Taleb’s work and to answering our question “is foresight doomed?”

The choice of hope over despair

The demonstration made by the author is extremely well done, very consistent, save for one single paragraph. Our very smart author could not be unaware that someone, somewhere, would tell him: “Hey, wait a minute, if you say that nothing can be predicted and understood, that something random and unexpected (the famous black swan) may always happen, then it may also happen that your argument will be proven wrong by something unexpected or something you do not know.”

Taleb had thought about that, but used something quite akin to a sophist argument (pp.192-193, hardback edition). He starts by explaining the “Black Swan asymmetry,” which allows you only to assert, for example, that “all swans are not white” (nothing more but nothing less), stops his reasoning there and then quotes Popper, who, asked about “a possible falsification of falsification,” answered that his questioner was an idiot. In conclusion, the reader no longer dares to question Taleb (or Popper or anyone for that matter) for fear of being an idiot. Well done, but unfair, and somehow a shame to use such a device because some new insight could have emerged. Actually, I don’t really care about being called an idiot, but as a reader, I would very much prefer, if I don’t understand, to have it explained to me why, and if I do, to see the author pushing his argument further.

This does not challenge most of the points made in the book, but it allows us to question its overall conclusion. Thus, foresight might still be possible, should the impossibility of understanding the world, the “narrative fallacy”, be questionable too.

Human cognitive biases are indeed numerous, as witnessed by the amount of research and findings in the cognitive sciences, wonderfully synthesised by Heuer. Our knowledge and understanding of the world through historical, political and more broadly social science are indeed most of the time imperfect and still minimal. Yet Taleb, despite his huge erudition, seems here to be himself prey, as all human beings are, to the problem of induction, to generalization and to seeking confirmation as validation. A single properly foreseen event should be sufficient to show that foresight and understanding are not impossible.

I’ll give an example here (because it is simple, but we could quote many others, such as Jay Ulfelder’s correct 2012 forecast of coups in Mali and Guinea-Bissau or all the correct Economist 2012 predictions, etc.). In 1919, Max Weber explained and underlined the main characteristics of the State, including the crucial importance of the legitimate monopoly of violence.[ii] Armed with this knowledge and common sense, and referring to the second Iraq War, it follows that if one destroys the State, it is very highly likely that civil war will occur. It could thus have been (very easily) anticipated that if one destroyed the Iraqi State (through the Ba’ath party), then civil war would more than probably occur, something that was carefully avoided in Germany after WWII despite the “denazification goal.”[iii] This is NOT a reinterpretation of facts after the events: the anticipation could have been made beforehand. Why this was not done, or rather not heard, is another story; let’s not confuse issues. We have here at least one instance of a potentially accurate anticipation (actually many with the other examples). Thus foresight is not impossible.

Of course, something completely unexpected may always happen, forbidding certainty. Of course, our imperfect knowledge and understanding will only at best help us outline possible futures. Yet, in the meantime, do we have the right to disregard an understanding that could save lives and protect our security (individual, national, and global)? Using our previous example, the knowledge of the State and its processes could definitely be helpful in assessing needs and potential impacts before deciding on austerity policies and IMF-backed structural adjustment policies, as was done worldwide with disastrous consequences, and before entering post-conflict countries and determining strategies. It would be crucial in evaluating what is happening in countries such as those that made the Arab (winter)-spring, and the evolution of situations where the international community intervenes, such as Afghanistan.[iv] Do we have the right not to work towards improving our understanding? We have to live with uncertainty, but isn’t it our job? We have to live with incomplete mastery and imperfection, but isn’t that what it is to be just human?

What seems to me to lie at the core of the book and to lead to Taleb’s overall conclusion is a very deep despair and a revolt when confronted with uncertainty, with the unexpected, with unfairness, with death and war and suffering (hence the importance of the autobiographical and literary part of the book). For the author, the world has become meaningless, and he finds hope only in the miracle of being alive and of building upon the opportunities of “positive Black Swans.”

Someone else could very well use similar valid points (the problem of induction, the risk of anachronistic projections, etc.), but also see and emphasize connections (even hidden ones, or different ones, for example the probably not yet completely understood fractals, where one can also marvel at the self-similarity property) rather than disconnections, meaning rather than meaninglessness and reach a different conclusion. From a physicist such as Omnes, you would get a completely different outlook on life. This is often the same with astrophysicists, who truly wonder (in both meanings of the word) about the world.

Thus, ultimately, there might be an individual choice to be made on the way one approaches the world.

If we choose not to despair but, on the contrary, to wonder and hope and work hard, we may and even must continue our foresight and warning endeavour, assuming that a few rules are respected, those very rules that Taleb underlines (common sense, humility and using the right causality or the right tool for the right problem, etc.), as we shall see next.

—————–

[i]“A wild card is a future development or event with a relatively low probability of occurrence but a likely high impact on the conduct of business,” BIPE Conseil / Copenhagen Institute for Futures Studies / Institute for the Future: Wild Cards: A Multinational Perspective, (Institute for the Future, 1992); then popularised with John L. Petersen, Out of the Blue, Wild Cards and Other Big Surprises, (The Arlington Institute, 1997, 2nd ed. Lanham: Madison Books, 1999).

[ii] For those interested in comprehending the modern State and its processes, here are a few enlightening works by social scientists, taken from a more detailed bibliography:

Ertman, T., (1997), Birth of the Leviathan: Building States and Regimes in Medieval and Early Modern Europe, (Cambridge, Cambridge University Press).

Mann, M., (1986), The sources of social power. 1, A history of power from the beginning to A.D. 1760, (Cambridge: Cambridge University Press).

Tilly, C., (1990), Coercion, capital, and European states, AD 990-1990, (Oxford: Blackwell).

Weber, M. (1919) Le savant et le politique, (Paris: 10/18, 1963). Originally published in German as “Wissenschaft als Beruf” and “Politik als Beruf”, 1919.

[iii] Nina Serafino, Curt Tarnoff, and Dick K. Nanto, (Foreign Affairs, Defense, and Trade Division), U.S. Occupation Assistance: Iraq, Germany and Japan Compared, CRS Report for Congress, March 23, 2006, http://www.fas.org/sgp/crs/natsec/RL33331.pdf

[iv] For example, one may wonder, considering that the modern state is essentially a territorial state, if the necessary resources to build or rebuild a state should not be proportionate to territory. If this is correct and if the amount spent on Germany is a good indication, although approximate, then the amount needed for Afghanistan might have been closer to 76 billion USD rather than to the 14 billion pledged. The way to implement a reconstruction, as well as the time necessary to succeed, would most probably have to be revisited. Hélène Lavoix, “La construction de la paix et l’estimation des besoins et de l’impact,” Policy paper, Centre d’Etudes et de Recherches Internationales (CERI – Sciences Po) programme on “Peace and Human Security” (CERI-CPHS), April 2008.

The Red (team) Analysis Weekly No83, 17 January 2013

Towards a multiplication of increasingly fragile states? This is what the report on the state of infrastructure in the U.S. (and probably in other so-called rich countries?) could mean. It is a crucial weak signal that could trump all others: imagine weak, increasingly fragile “rich countries” against the backdrop of all the other tensions and threats…


Scenarios: Improving the Impact of Foresight Thanks to Biases

Foreseeing the future, whatever the name given to the endeavour*, faces two major tasks. First, we have the analysis, the process according to which the foresight, forecast, or, more broadly, anticipation will be obtained. Second, the result must be delivered to and understood by those who need it because they will act on it, or at least integrate the new knowledge received into the decisions they will take**. A huge challenge runs across both those tasks: overcoming the various natural and constructed biases that limit human understanding.

Much thought is usually given to analytical methodologies, which may be seen as nothing other than ways to overcome biases. Analysts commit themselves to many years of study, and force themselves to struggle against those biases, including through their own research and reflections. Managers look for ways to support them through training and the building of the best teams. Teachers and research institutions contribute to this generalised effort, as with, for example, the ongoing experiment funded by the Intelligence Advanced Research Projects Activity (IARPA), the “Good Judgement project”. Those enterprises are necessary, even crucial, if we want to improve our foresight, as underlined, for example, by political scientist and forecaster Jay Ulfelder.

We tend, unfortunately, to devote less effort to dealing with biases related to the second part of our work, the delivery of the anticipation to the recipients or customers and its understanding by them.

This is certainly no easy task, as, there, we must deal with an “other” or, worse, with others. We have no power over their willingness to make an effort to overcome biases, assuming they accept that they, too, are prey to biases.

Results may also be obtained through the use of participatory methodologies, such as, for example, scenario-building, where classical or analytical ways to mitigate biases are sought. This approach, despite its virtue, is limited because of the often busy agenda of decision-makers, or plainly impossible because of the sheer numbers of recipients. In those cases, only the final product remains, which must, alone, reach the customer, be read, viewed or listened to, and understood. The strategy regarding biases, thus, must change. Rather than only focusing on struggling against biases, we may as well accept them and, even better, use them to our advantage.

The biases detailed in Heuer’s masterwork Psychology of Intelligence Analysis show us that fictionalized scenario narratives*** are perfect products to take advantage of some of those usual human cognitive traits to achieve our objectives, even more so if they are adequately combined with visual tools.

Playing with the “vividness criterion”

Fictionalized narratives obviously make direct use of this bias, which Heuer describes as follows: “Information that is vivid, concrete, and personal has a greater impact on our thinking than pallid, abstract information that may actually have substantially greater value as evidence” (Heuer, p.116). According to Nisbett & Ross (1980), vivid information is information that is concrete, imagery-provoking, and emotionally rich.

Among many, one interesting example is the narrative written by Karl Schroeder for the Directorate of Land Strategic Concepts of National Defense Canada in 2005, Crisis in Zefra. The four fictionalized scenarios of Global Trends 2030 use fictional characters and real or fictional organizations that will be familiar to their main readers, U.S. policy-makers, and a type of narrative as well as a design format that will similarly correspond to something very concrete and real for their clients.

Below are two examples of short fictionalized pieces, created out of material generated during a workshop, and aiming at making real threats related to algorithms.

[Two images: a “Black out” scenario and an intelligence/spy scenario, both fictional narratives of algorithm-based threats.]

Emphasizing “consistency”

Any good narrative will pay attention to consistency and thus will use the human “oversensitivity to the consistency (absence of contradiction) of evidence and insufficient sensitivity to the reliability of evidence.” (Heuer, pp.120-122)

Using our flawed perception of cause and effect

As Heuer describes throughout chapter 11, story-telling, and thus story coherence, is usually wrongly favoured over scientific method and scientific findings/research. Meanwhile, we display a need for causal explanations, which is indeed best served by this story-telling. Thus a fictionalized scenario narrative built upon a proper scientific model will allow us to transform scientific research into a product that can be attractive to and believed by customers.

This is what led me, among other motivations, and once the model for the scenarios on the future of the nation-state was built, to develop the Chronicles of Everstate in a serialized way, rather than to adopt a more classical and shorter form.

Tweaking the “availability rule”

This rule refers to one of the components that leads us to reach flawed estimates for probabilities. Heuer, using work by Tversky and Kahneman (1973), underlines that “’Availability’ refers to imaginability or retrievability from memory. Psychologists have shown that two cues people use unconsciously in judging the probability of an event are the ease with which they can imagine relevant instances of the event and the number or frequency of such events that they can easily remember.” (Heuer p. 147).

Thus, an interesting narrative, when it is read, will most probably influence the ease with which people can imagine, by themselves, future instances of similar events. We could also wonder if becoming aware of the scenario narrative would affect, through memories, the perception of the occurrence of such events.

Heuer (p.149), indeed, underlines that participation in scenario-building exercises impacts participants’ estimations of probabilities. Here we suggest that people reading scenarios, or more broadly being exposed to products derived from those scenarios such as films, theatre pieces, games, etc., would similarly be affected.

Using our weakness in assessing probabilities

As judgments concerning the probability of a scenario are influenced by the amount and nature of detail in the scenario in a way that is unrelated to the actual likelihood of the scenario (Heuer pp. 156-157), narrating a scenario or part of it with details will affect how readily the customer believes in its plausibility. This should prove extremely useful in convincing recipients to pay attention to the futures they consider least likely, and in struggling against prejudice and more generally against all organizational and belief-based biases.

Fictionalized narratives are thus a very useful type of product for the delivery of foresight, one that we should permanently keep in mind, be it to deliver the result of more or less long scenario-building processes, as with The Millennium Project 2020 Global Energy Scenarios or the Global Trends 2030 of the NIC, or in the case of short exercises, as shown above for potential algorithm-related threats. Could they also be used for other methodologies, such as Forecasting?

Comments, ideas and suggestions are welcome!

*The label used signals various assumptions, methodologies, processes, aims and groups of practitioners.

** Even doing nothing is an action.

*** Scenarios are one of the end products of the process known as scenario-building as, for example, presented by Glenn and The Futures Groups and as used here to assess the future of the nation-state. A practical way to write them has been presented with the post “Constructing a foresight scenario’s narrative with Ego Networks.”

—–

Glenn, Jerome C. and The Futures Group International, “Scenarios,” The Millennium Project: Futures Research Methodology, Version 3.0, Ed. Jerome C. Glenn and Theodore J. 2009, Ch 19.

Heuer, Richards J. Jr., Psychology of Intelligence Analysis, Center for the Study of Intelligence, Central Intelligence Agency, 1999.

Ulfelder, Jay, Forecasting Round-Up No. 3, Dart-Throwing Chimp, 6 Dec 2012.

Kahn, Herman, and Wiener, Anthony J. The Year 2000: A Framework for Speculation on the Next Thirty-Three Years. New York, NY: The Macmillan Co., 1967.

Tversky, Amos and Daniel Kahneman, “Availability: A Heuristic for Judging Frequency and Probability,” Cognitive Psychology, 5 (1973), pp. 207-232.

Durance, Philippe and Michel Godet, “Scenario building: Uses and abuses”, Technological Forecasting and Social Change, Volume 77, Issue 9, November 2010, pp. 1488–1492, doi:10.1016/j.techfore.2010.06.007