Beyond hype and hatred, this article focuses on the way Artificial Intelligence (AI) – actually Deep Learning – is integrated into reality, through sensor and actuator.* Operationalising AI demands that we develop a different way of looking at it. The resulting understanding highlights the importance of sensor and actuator, the twin interface between AI and its environment. This interface is a potentially disruptive driver for AI.
Sensor and actuator, the forgotten elements
Sensor and actuator are key to the development of AI at all levels, including in terms of practical applications. Yet, when the expansion and the future of AI are addressed, these two elements are most of the time overlooked. It is notably because of this lack of attention that the interface may become disruptive. Indeed, could an approach to AI through sensor and actuator be key to the generalised boom so many seek? Meanwhile, many subfields of AI could also benefit from such further development. Conversely, failing to fully integrate this approach could lead to unnecessary hurdles, including a temporary bust.
Sensor and actuator, another stake in the race for AI
Furthermore, we are seeing three interacting AI-related dynamics emerge in the world. The twin birth and spread of AI-governance for states and AI-management for private actors interact and feed into an international race for AI-power, i.e. how one ranks in the global relative distribution of power. As a result, AI increasingly influences this very distribution of power (see The New AI-World in the Making). Thus, the drivers for AI are not only forces behind the expansion of AI, but also stakes in the AI-competition. Meanwhile, how public and private actors handle this competition, the resulting dynamics and the defeats and victories they entail also shape the new AI-world in the making.
Thus, if sensor and actuator are crucial in widely operationalising AI, then the ability to best develop AI-governance and AI-management, as well as the position in the international race for AI-power, could also very well depend on the mastery of sensor and actuator.
This article uses two case studies to progressively explain what sensor and actuator are. It thus details the twin interface between the AI-agent and its environment. Third, and as a result, we highlight that AI is best understood as a sequence. That understanding allows us to envision a whole future world of economic activities. That world is, however, not without danger, and we highlight that it will demand a new type of security. Finally, we point out the need to distinguish the types of reality the AI sequence bridges.
The next article will focus on different ways to handle the AI sequence and its twin interface, notably the actuator. We shall look more particularly at the Internet of Things (IoT), Human Beings themselves, and Autonomous Systems, better known as robots. Meanwhile we shall explore further the new activities AI creates.
Looking at the game against AlphaGo differently
We shall examine again (Google) DeepMind’s AlphaGo, the AI-agent that plays Go, trained through supervised and reinforcement learning, and whose victory started the current phase of AI development.
Replaying the game against AlphaGo
Now, let us imagine a new game is set between Mr Fan Hui, the European Go Champion whom AlphaGo defeated 5-0 in October 2015, and the AI-agent (AlphaGo webpage). Mr Fan Hui, as happened in reality, plays first against the AI-agent AlphaGo. In front of him, we can see a goban (the name of the Go board). AlphaGo is connected to the cloud for access to distributed computing power, as it needs a lot of it.
Mr Fan Hui starts and makes his first move, placing a black stone on the goban. Then it is AlphaGo’s turn. How will the AI-agent answer? Will it make a typical move or something original? How quickly will it then play? The suspense is immense, and…
What went wrong?
The (right) way DeepMind did it
If you carefully watch the video below showing the original game, you will notice that the setting is actually not exactly what I described above. A few other crucial elements are present. If DeepMind had put a human and an AI-agent face to face according to my described setting, then their experiment would have gone wrong. Instead, thanks to the elements they added, their game was a success.
You can observe these three elements at 1:19 of the video, as shown in the annotated screenshot below:
- A: a human player
- B: a screen
- C: a human being with a bizarre device on a table.
In our imagined setting, I did not create an interface to tell the AI-agent that Mr Fan Hui had placed a stone, and where. Thus, as far as the AI-agent was concerned, there was no input.
In DeepMind’s real setting we have the human agent (C). We may surmise that the bizarre device on the table in front of her allows her to enter into the computer, for the AI-agent, the moves that Mr Fan Hui makes throughout the game.
More generally, a first input interface must exist between the real world and the AI-agent for the latter to function. Therefore, we need sensors. They will sense the real world for the AI. We also need to communicate to the AI-agent the data the sensors captured, in a way that the AI understands.
Let us assume now that we add agent C and her device – i.e. the sensor system – to our setting.
Again, nothing happens.
Why? The AI-agent proceeds and decides about its move. Yet, the algorithmic result remains within the computer, as a machine output whatever its form. Indeed, there is no interface to act in the real world. What is needed is an actuator.
The interface to the outside world must not only produce an output that our Go Master can understand for each move, but also one that will make sense, for him, during the whole game.
It would not be enough to get just the position of a stone as coordinates on the board. Such a result would demand, first, that Mr Fan Hui have a good visualisation and mapping capability to translate these coordinates onto the goban. It would demand, second, that our Go Champion have a truly excellent memory. Indeed, after a couple of moves, being able to picture and remember the whole game would be challenging.
DeepMind actually used the needed actuators to make the game between human and AI possible.
At (B), we have a screen that displays the whole game. The screen also most probably shows the AI-agent’s move each time the latter plays. Then, at (A), we have a human agent, who translates the virtual game on screen into reality on the goban. To do so, he copies the move of the AI-agent as displayed on the screen by placing the corresponding stone on the board.
It is important to note the presence of this human being (A), even though it was probably not truly necessary for Mr Fan Hui, who could have played in front of the screen. First, it is a communication device to make the whole experiment more fully understandable and interesting for the audience. Then, it is possibly easier for Mr Fan Hui to play on a real goban. The translation from a virtual world to a real world is crucial. It is likely to be a major stake in what will really allow AI to emerge and develop.
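To make the twin interface of this first case study more concrete, here is a minimal Python sketch of it. It is only an illustration under my own assumptions, not DeepMind’s software: the function names (`sense_move`, `place`, `render`) are hypothetical, the AI-agent’s reply is hard-coded, and no rule of Go (captures, ko, etc.) is implemented. The sensor side turns a move typed by an operator, such as agent (C), into coordinates the program can use. The actuator side expresses the whole board, as screen (B) does, rather than a bare coordinate.

```python
from typing import List, Tuple

BOARD_SIZE = 19
# Column letters as commonly used for Go coordinates (the letter I is skipped).
COLUMNS = "ABCDEFGHJKLMNOPQRST"


def sense_move(typed: str) -> Tuple[int, int]:
    """Sensor side: turn a move typed by the operator (e.g. 'D4') into (row, col)."""
    text = typed.strip().upper()
    if len(text) < 2:
        raise ValueError(f"not a move: {typed!r}")
    col = COLUMNS.index(text[0])      # raises ValueError for an unknown letter
    row = int(text[1:]) - 1           # raises ValueError if not a number
    if not 0 <= row < BOARD_SIZE:
        raise ValueError(f"row out of range: {typed!r}")
    return row, col


def new_board() -> List[List[str]]:
    """An empty goban, '.' marking a free intersection."""
    return [["." for _ in range(BOARD_SIZE)] for _ in range(BOARD_SIZE)]


def place(board: List[List[str]], move: Tuple[int, int], stone: str) -> None:
    """Record a stone on the internal board (no Go rules are checked beyond occupancy)."""
    row, col = move
    if board[row][col] != ".":
        raise ValueError("intersection already occupied")
    board[row][col] = stone


def render(board: List[List[str]]) -> str:
    """Actuator side: express the whole game, not just the last coordinate."""
    lines = []
    for row in range(BOARD_SIZE - 1, -1, -1):
        lines.append(f"{row + 1:2d} " + " ".join(board[row]))
    lines.append("   " + " ".join(COLUMNS))
    return "\n".join(lines)


board = new_board()
place(board, sense_move("D4"), "X")    # human move, entered by the operator
place(board, sense_move("Q16"), "O")   # AI-agent reply, hard-coded for the sketch
print(render(board))
```

Without `sense_move`, the program never learns that a stone was placed. Without `render`, its answer stays inside the computer.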
As exemplified above, specifying the process of interaction with an AI-agent highlights the importance of the twin interfaces.
This is actually how DeepMind conceptualised one of its latest AI achievements, to which we shall now turn.
Towards seeing as a human being
In June 2018, DeepMind explained how it had built an AI-agent that can perceive its surroundings very much as human beings do (open access; S. M. Ali Eslami et al., “Neural scene representation and rendering”, Science 15 Jun 2018: Vol. 360, Issue 6394, pp. 1204-1210, DOI: 10.1126/science.aar6170).
“For example, when entering a room for the first time, you instantly recognise the items it contains and where they are positioned. If you see three legs of a table, you will infer that there is probably a fourth leg with the same shape and colour hidden from view. Even if you can’t see everything in the room, you’ll likely be able to sketch its layout, or imagine what it looks like from another perspective.” (“Neural scene representation and rendering“, DeepMind website).
The scientists’ aim was to create an AI-agent with the same capabilities as those of human beings, which they succeeded in doing:
DeepMind uses “sensor and actuator”
What is most interesting for our purpose is that what we described in the first part is exactly the way the scientists built their process and solved the problem of vision for an AI-agent.
They taught their AI-agent to take images from the outside world (in that case still a virtual world) – what we called the sensor system – then to convert them through a first deep learning algorithm – the representation network – into a result, an output – the scene representation. The output, at this stage, is meaningful to the AI-agent but not to us. The last step represents what we called the actuator. It is the conversion from an output meaningful to the AI into something meaningful to us, the “prediction”. For this, DeepMind developed a “generation network”, called a “neural renderer”. Indeed, in terms of 3D computer graphics, rendering is the process of transforming calculation into an image, the render.
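The data flow just described can be sketched in a few lines of Python. This is a toy under loud assumptions: `represent`, `aggregate` and `render` are hypothetical stand-ins for the representation and generation networks, their arithmetic is arbitrary, and nothing is learned. The only design point borrowed from the paper is that per-observation representations are summed into one scene representation. The real generation network is a learned generative model, which this placeholder does not attempt to imitate.

```python
from typing import List, Sequence, Tuple

Image = List[float]               # a flattened toy "image"
Viewpoint = Tuple[float, float]   # a toy camera position (x, y)
Observation = Tuple[Image, Viewpoint]


def represent(image: Image, viewpoint: Viewpoint) -> List[float]:
    """Stand-in for the representation network: one observation -> one vector."""
    x, y = viewpoint
    mean = sum(image) / len(image)
    return [mean, mean * x, mean * y, max(image) - min(image)]


def aggregate(observations: Sequence[Observation]) -> List[float]:
    """Sum the per-observation vectors into a single scene representation."""
    vectors = [represent(image, viewpoint) for image, viewpoint in observations]
    return [sum(components) for components in zip(*vectors)]


def render(scene: List[float], query: Viewpoint, size: int = 4) -> Image:
    """Stand-in for the generation network: representation + query viewpoint -> image."""
    qx, qy = query
    base = scene[0] + qx * scene[1] + qy * scene[2]
    return [base + scene[3] * (i / size) for i in range(size)]


# Sensor system: two observations of the same (virtual) scene.
observations = [
    ([0.0, 1.0], (1.0, 0.0)),
    ([1.0, 1.0], (0.0, 1.0)),
]
scene = aggregate(observations)        # meaningful to the agent, not to us
prediction = render(scene, (1.0, 1.0)) # the "actuator": an image for a new viewpoint
print(scene)
print(prediction)
```

The intermediate `scene` list is just numbers. Only the last step turns it into something a human observer could look at.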
The screenshot below displays the process at work (I added the red circles and arrows to the original screenshot).
The following video demonstrates the whole dynamic:
Developing autonomous sensors for the vision of an AI-agent
In the words of DeepMind’s scientists, the development of the Generative Query Network (GQN) is an effort at creating “a framework within which machines learn to represent scenes using only their own sensors”. Indeed, current artificial vision systems usually use supervised learning. This means that human intervention is necessary to choose and label data. DeepMind’s scientists wanted to overcome this type of human involvement as much as possible.
The experiment here used a “synthetic” environment (Ibid., p5). The next step will need new datasets to allow expansion to “images of naturalistic scenes” (Ibid). Ultimately, we may imagine that the GQN will start with reality, captured by an optical device the AI controls. This implies that the GQN will need to integrate all advances in computer vision. Besides, the sensors of our AI-agent will also have to move through its environment to capture the observations it needs. This may be done, for example, through a network of mobile cameras, such as those being increasingly installed in cities. Drones, also controlled by AI, could possibly supplement the sensing network.
Improving visual actuators for an AI-agent
Researchers will also need to improve the actuator (Ibid.). DeepMind’s scientists suggest that advances in generative modeling capabilities, such as those made through generative adversarial networks (GANs), will allow moving towards “naturalistic scene rendering”.
Meanwhile, GANs could lead to important advances in terms, not only of visual expression, but also of “intelligence” of AI-agents.
When GANs train to represent visual outputs, they also seem to develop the capability to group, alone, similar objects linked by what researchers called “concepts” (Karen Hao, “A neural network can learn to organize the world it sees into concepts—just like we do”, MIT Technology Review, 10 January 2019). For example, the GAN could “group tree pixels with tree pixels and door pixels with door pixels regardless of how these objects changed color from photo to photo in the training set”… It would also “paint a Georgian-style door on a brick building with Georgian architecture, or a stone door on a Gothic building. It also refused to paint any doors on a piece of sky” (Ibid.).
Similar dynamics are observed in the realm of language research.
Using a virtual robotic arm as actuator
In a related experiment, DeepMind’s researchers used a deep reinforcement learning network to control a virtual robotic arm instead of the initial generation network (Ali Eslami et al., Ibid., p.5). The GQN was first trained to represent its observations. Then it was trained to control the synthetic robotic arm.
In the future, we can imagine that a real robotic arm will replace the synthetic one. The final actuator system will thus become an interface between the virtual world and reality.
AI as a sequence between worlds
Let us now generalise our understanding of sensor and actuator, or interfaces for AI-input and AI-output.
Inserting AI in reality means looking at it as a sequence
We can understand processes involving AI-agents as the following sequence.
Environment -> sensing the environment (according to the task) -> doing a task -> output of an AI-intelligible result -> expressing the result according to task and interacting actor
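The sequence above can be written down as a small Python sketch, which makes the three hand-over points explicit. All names here (`AISequence`, `sense`, `do_task`, `actuate`) are hypothetical, and the "AI-agent" is a trivial threshold rule standing in for a trained model. The toy example classifies a tiny digital photograph as "day" or "night".

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

Env = TypeVar("Env")   # the environment as it exists outside the agent
Obs = TypeVar("Obs")   # what the sensor hands to the agent
Res = TypeVar("Res")   # the AI-intelligible result
Out = TypeVar("Out")   # the expression meant for the interacting actor


@dataclass
class AISequence(Generic[Env, Obs, Res, Out]):
    sense: Callable[[Env], Obs]     # sensing the environment (according to the task)
    do_task: Callable[[Obs], Res]   # doing the task
    actuate: Callable[[Res], Out]   # expressing the result for the interacting actor

    def run(self, environment: Env) -> Out:
        observation = self.sense(environment)
        result = self.do_task(observation)
        return self.actuate(result)


def photo_sensor(pixels: List[int]) -> List[float]:
    """Turn raw 0-255 brightness values into the 0-1 floats the agent expects."""
    return [p / 255 for p in pixels]


def toy_agent(observation: List[float]) -> int:
    """Placeholder for a trained model: returns a class index, not a word."""
    return int(sum(observation) / len(observation) > 0.5)


LABELS = ["night", "day"]


def word_actuator(class_index: int) -> str:
    """Express the class index as a word a human being understands."""
    return LABELS[class_index]


sequence = AISequence(sense=photo_sensor, do_task=toy_agent, actuate=word_actuator)
print(sequence.run([250, 240, 200, 255]))
print(sequence.run([0, 10, 20, 5]))
```

Keeping the three steps as separate, swappable functions reflects the point of the sequence: each step can be built, improved or replaced by a different team.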
The emergence of new activities
This sequence, as well as the details on the GAN actuator for example, shows that more than one AI-agent is actually needed if one wants to completely integrate AI in reality. Thus, the development of well-performing AI-agents will involve many teams and labs.
Envisioning the chain of production of the future
As a result, new types of economic activities and functions could emerge in the AI-field. One could have, notably, the assembly of the right operational sequence. Similarly, the initial design of the right architecture, across types of AI-agents and sub-fields, could become a necessary activity.
Breaking down AI integration into a sequence allows us to start understanding the chain of production of the future. We can thus imagine the series of economic activities that can and will emerge. These will go far beyond the current emphasis on IT or consumer analytics, which most early adopters of AI appear to favour so far (Deloitte, “State of Artificial Intelligence in the enterprise”, 2018).
The dizzying multiplication of possibilities
Furthermore, the AI sequence could be customised according to needs. One may imagine that various systems of actuators could be added to a sequence. For example, to use our second case study, a “scene representation” intelligible to the AI-agent could be expressed as a realistic visual render, as a narrative and as a robotic movement. We are here much closer to the way a sensory stimulation would trigger in us, human beings, a whole range of possible reactions. However, compared with the human world, if one adds the cloud, then the various expressions of the “scene representation” could be located anywhere on earth and in space, depending on the available communication infrastructure.
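As a rough illustration of one representation feeding several actuators, consider the Python sketch below. It rests entirely on assumptions of mine: the scene representation is reduced to a short list of numbers, and the three actuators (`as_image`, `as_narrative`, `as_arm_command`) are hypothetical toys, not real renderers, language models or robot controllers. The distribution of these actuators over the cloud is not modelled.

```python
from typing import Callable, Dict, List, Sequence

SceneRepresentation = List[float]   # toy stand-in: one "occupancy" score per region


def as_image(scene: SceneRepresentation) -> str:
    """Visual expression: one character per region."""
    return "".join("#" if value > 0.5 else "." for value in scene)


def as_narrative(scene: SceneRepresentation) -> str:
    """Narrative expression: a sentence a human being can read."""
    occupied = sum(1 for value in scene if value > 0.5)
    return f"{occupied} of {len(scene)} regions are occupied."


def as_arm_command(scene: SceneRepresentation) -> Dict[str, int]:
    """Movement expression: point a (hypothetical) arm at the most occupied region."""
    return {"move_to": scene.index(max(scene))}


ACTUATORS: Dict[str, Callable[[SceneRepresentation], object]] = {
    "image": as_image,
    "narrative": as_narrative,
    "arm": as_arm_command,
}


def express(scene: SceneRepresentation, targets: Sequence[str]) -> Dict[str, object]:
    """Send the same representation to every requested actuator."""
    return {name: ACTUATORS[name](scene) for name in targets}


scene = [0.1, 0.9, 0.7]
for name, output in express(scene, ["image", "narrative", "arm"]).items():
    print(name, "->", output)
```

Adding a new way of expressing the scene then amounts to registering one more function, without touching the sensing or the task itself.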
The possibilities and combinations entailed are amazing and dizzying. And we shall look in the next articles at the incredible possibilities which are being created.
Towards the need to redefine security?
Altering our very reality
In terms of dangers, if we come to rely only or mainly on a world that is sensed, understood, then expressed by an AI sequence, then we also open the door to an alteration of our reality that could be carried out more easily than if we were using our own senses. For example, if we rely on a sequence of AI-agents to recognise and perceive the external world miles away from where we are located, then an unintentional problem or malicious intent could mean that we receive wrong visual representations of reality. A tree could be set where there is no tree. As a result, a self-driving car, trying to avoid it, could leave the road. The behaviour of the users of this very expression of reality will make sense in the AI-world. It will, however, be erratic outside it.
Actors could create decoys in ways never thought of before. Imagine Operation Fortitude, the operation through which the Allies deceived the Nazis during World War II regarding the location of the 1944 invasion, organised with the power of multiple AI-sequences.
Actually, it is our very reality, as we are used to seeing it expressed through photographs, that may become altered in a way that cannot be directly grasped by our visual senses.
Breaking the world-wide-web?
Here we also need to consider the spread of propaganda and of what is now called “Fake News”, and most importantly of the “Fake Internet”, as Max Read masterfully explained in “How Much of the Internet Is Fake? Turns Out, a Lot of It, Actually” (Intelligencer, 26 December 2018). If the spread of “Fake Everything” signals an established and widespread malicious intention, then adding to it the power of AI-agents could break the world-wide-web. The impacts would be immense. To avoid such a disaster, actors will have to devise very strong regulations and to favour and spread new norms.
Artificial Intelligence completely redefines the way security can be breached and thus must be defended.
Integrating AI-agents according to different realities: Virtual-Virtual and Virtual-Material
From virtual to virtual
When the AI-agent’s environment and the other actors are virtual, then the sequence is – to a point – easier to build. Indeed everything takes place in a world of a unique nature.
However, fear and the need to know will most probably mean that human beings will want control at various points of the sequence. Thus, ways to translate the virtual world into something at least perceptible by humans are likely to be introduced. This will increase the complexity of development.
From virtual to material
When the environment is real and when interactions take place between an AI-agent and human beings, the sequence becomes much more complex. The twin interfaces must indeed become bridges between two different types of world, the digital and the real.
Actually, if we look through these lenses at the deep learning ecosystem and its evolution since 2015, researchers devoted a large part of their initial efforts to creating AI-agents able to “do a task” (playing, sorting, labelling, etc.). Meanwhile, scientists have first developed ways to make the real world intelligible to AI-agents. In the meantime, the actuator systems developed have become intelligible to humans, but they remain nonetheless mostly virtual.
Lagging behind in expressing the virtual world in the real one – Visual AI-agents
For example, the real world is translated into digital photographs, which the AI-agent recognises through deep learning algorithms. The AI will sort them or label them in a way that human beings understand. For instance, human beings easily understand words, or images displayed on a screen, which are the result of the actuator part of the sequence. Yet, this output remains virtual. If we want to improve further, then we must create and use other devices to enhance or ease the interface from virtual to real. Object recognition proceeds in a similar way.
In terms of visual AI-related efforts, we may wonder if we have not progressed more in giving vision to AI-agents than in using this vision in a way that is useful enough to human beings in the real world.
From virtual to real, sensing more advanced than expressing?
A similar process is at work in China with sound recognition (Joseph Hincks, “China Is Creating a Database of Its Citizens’ Voices to Boost its Surveillance Capability: Report”, Time, 23 October 2017). Data analytics are also a way to explain to AI-agents what internet users are, according to various criteria. Sensors collecting data, for example from pipelines (e.g. Maria S. Araujo and Daniel S. Davila, “Machine learning improves oil and gas monitoring”, Talking IoT in Energy, 9 June 2017; Jo Øvstaas, “Big data and machine learning for prediction of corrosion in pipelines”, DNV GL, 12 Jun 2017), or from the flight of an aircraft, or from anything actually, are ways to make the world intelligible to an algorithm with a specific design.
Yet, have we made similar progress in the development of actuators that interface between the virtual world of the AI-agent and the reality of human beings? Alternatively, could it be that we did improve the whole sequence but that progress remains limited to the virtual world? In all cases, what are the impacts in terms of security, politics and geopolitics?
This is what we shall see next, looking more particularly at the Internet of Things, Robots and Human Beings, as potential actuator systems of AI.
*Initially, I used the word “expressor” instead of the adequate word, “actuator”. Thanks to Teeteekay Ciar for his help in finding out.
About the author: Dr Helene Lavoix, PhD Lond (International Relations), is the Director of The Red (Team) Analysis Society. Strategic foresight and warning for national and international security issues is her specialisation. Her current focus is on the future Artificial Intelligence and Quantum world and its security.