Human-machine interaction now and in the future

Technologies and computer systems are assuming important tasks in everyday life and industry – visibly or behind the scenes. Sensors and interfaces allow them to be operated. But how do users and computers communicate with and respond to each other? Machines can be controlled by touch, voice, gestures or virtual reality (VR) glasses.

We have long since grown used to interaction between humans and machines: A smartphone user asks the digital assistant what the weather’s going to be like and it replies. At home, the human voice controls smart thermostats or commands Amazon’s intelligent speaker Echo to play “Summer of ’69.” A few gestures on the smartphone’s touch screen are enough to view photos from Kenya and enlarge individual pictures. Chatbots conduct automatic dialogs with customers in messengers. Engineers in industry use VR glasses to enable them to walk through planned factory buildings. For all that to be possible, you need human-machine interaction (HMI) that works.


What is human-machine interaction?

HMI is all about how people and automated systems interact and communicate with each other. That has long ceased to be confined to just traditional machines in industry and now also relates to computers, digital systems or devices for the Internet of Things (IoT). More and more devices are connected and automatically carry out tasks. Operating all of these machines, systems and devices needs to be intuitive and must not place excessive demands on users.

How does human-machine interaction work?

Smooth communication between people and machines requires interfaces: The place where or action by which a user engages with the machine. Simple examples are light switches or the pedals and steering wheel in a car: An action is triggered when you flick a switch, turn the steering wheel or step on a pedal. However, a system can also be controlled by text being keyed in, a mouse, touch screens, voice or gestures.

The devices are either controlled directly: Users touch the smartphone’s screen or issue a verbal command. Or the systems automatically identify what people want: Traffic lights change color on their own when a vehicle drives over the inductive loop in the road’s surface. Other technologies are not so much there to control devices, but rather to complement our sensory organs. One example of that is virtual reality glasses. There are also digital assistants: Chatbots, for instance, reply automatically to requests from customers and keep on learning.

Different cultures expect different manners from chatbots: While users in Japan expect formal information, friendliness is particularly important to users in the US. In Germany, a chatbot may sometimes answer curtly.

Artificial intelligence and chatbots in human-machine interaction

ELIZA, the first chatbot, was invented in the 1960s, but soon ran up against its limitations: It couldn’t answer follow-up questions. That’s different now. Today’s chatbots “work” in customer service and give written or spoken information on departure times or services, for example. To do that, they respond to keywords, examine the user’s input and reply on the basis of preprogrammed rules and routines. Modern chatbots work with artificial intelligence. Digital assistants like Amazon’s Alexa and Google Home or Google Assistant are also chatbots.
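The keyword-and-rules approach described above can be sketched in a few lines. The keywords and canned replies below are purely illustrative, not taken from any real product:

```python
# Minimal sketch of a rule-based chatbot: it scans the user's input
# for known keywords and replies from preprogrammed responses.
# All keywords and answers here are illustrative examples.

RULES = {
    "departure": "The next departure is at 14:35 from platform 2.",
    "opening": "We are open Monday to Friday, 9 am to 6 pm.",
    "price": "A standard ticket costs 3.50 euros.",
}

FALLBACK = "Sorry, I did not understand that. Could you rephrase?"

def reply(user_input: str) -> str:
    text = user_input.lower()
    for keyword, answer in RULES.items():
        if keyword in text:
            return answer
    # No keyword matched: fall back to a default answer,
    # which is exactly where early chatbots like ELIZA hit their limits.
    return FALLBACK

print(reply("When is the next departure?"))
```

Modern, AI-based chatbots replace the fixed keyword table with learned language models, but the basic loop of analyzing input and selecting a response remains the same.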

Organizing appointments, reserving tickets for events and shopping online: Users see advantages in chatbots across a variety of services.

They all learn from the requests and thus expand their repertoire on their own, without direct intervention by a human. They can remember earlier conversations, make connections and expand their vocabulary. Google’s voice assistant can deduce queries from their context with the aid of artificial intelligence, for example. The more chatbots understand and the better they respond, the closer we come to communication that resembles a conversation between two people. Big data also plays a role here: If more information is available to the bots, they can respond in a more specific way and give more appropriate replies.

Chatbots and digital assistants will grow in importance in the years ahead. The market research company IHS predicts a growth rate of 46 percent in the coming years for digital assistants alone, such as Amazon’s smart speaker Echo.

The path to refined voice control

Users control systems such as Alexa, Google Assistant, Google Home or Microsoft’s Cortana with their voice. They no longer have to touch a display – all they need to do is say the codeword that activates the assistant (e.g. “Alexa”) and then, for example “Turn the volume down” or “Reduce the temperature in the bedroom.” That’s less effort for users – and more intuitive. “The human voice is the new interface,” prophesied Microsoft’s CEO Satya Nadella back in 2014.

Yet voice recognition is still not perfect. The assistants do not understand every request because of disturbance from background noise. In addition, they’re often not able to distinguish between a human voice and a TV, for example. The voice recognition error rate in 2013 was 23 percent, according to the U.S. Consumer Technology Association (CTA). In 2016, Microsoft’s researchers brought that down to below six percent for the first time. But that’s still not enough.

Infineon intends to significantly improve voice control together with the British semiconductor manufacturer XMOS. The company supplies voice processing modules for devices in the Internet of Things. A new solution presented by Infineon and XMOS at the beginning of 2017 uses smart microphones. It enables assistants to pinpoint the human voice in the midst of other noises: A combination of XENSIV™ radar and silicon microphone sensors from Infineon identifies the position and the distance of the speaker from the microphones, with far field voice processing technology from XMOS being used to capture speech.

“We can suppress background noise even better – and significantly enhance voice recognition and the accuracy rate as a result,” states Andreas Urschitz, President of the Power Management & Multimarket Division at Infineon. Voice control is thus “taken to a new level.” Urschitz believes that operating smart TVs by voice, for example, will grow in importance in the future. The market could grow to 60 million sets with built-in voice control by 2022 – a five-fold increase.

In addition to making daily life easier in general, users also hope that the growing use of voice assistants will mean shorter waiting times when calling hotlines.

Andreas Urschitz sees major changes happening to smart household appliances, as well. Robovacs, for instance, are now operated from a touch screen. But that’s impractical, because users have to walk over to the appliance to stop it. “I assume that such appliances will work using voice-controlled systems in the future.” Yet even voice operation is only an intermediate step in his view. In the long term, we will control devices by means of gestures: A hand signal will then be enough to stop the robot. “But that’s only the next step,” says Urschitz. “The first is for us to work with XMOS to make voice control more efficient.”

The path to gesture control

Gesture control has a number of advantages over touch screens: Users don’t have to touch the device, for example, and can thus issue commands from a distance. Gesture control is an alternative to voice control, not least in the public sphere. After all, speaking with your smart wearable on the subway might be unpleasant for some and provoke unwanted attention. Gesture control also opens up the third dimension, away from two-dimensional user interfaces.

Google and Infineon have developed a new type of gesture control by the name of “Soli”. They use radar technology for this: Infineon’s radar chip can receive waves reflected from the user’s finger. That means if someone moves their hand, it’s registered by the chip. Google algorithms then process these signals. That even works in the dark, remotely or with dirty fingers. The same uniform hand movements apply to all Soli devices. The Soli chip can be used in all possible devices, such as loudspeakers or smart watches. “Mature algorithms that trace patterns of movement and touch, as well as tiny, highly integrated radar chips, can enable a large range of applications,” says Andreas Urschitz. This technology could dispense with the need for all buttons and switches in the future.
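The signal-processing chain behind radar gesture control can be illustrated with a deliberately simplified toy example. A real system like Soli applies far more sophisticated pattern analysis; the sketch below only classifies a coarse hand trajectory from a sequence of hand-to-sensor distance readings (the thresholds and labels are assumptions for illustration):

```python
# Toy illustration of gesture recognition from radar range samples.
# Not Soli's actual algorithm: it merely labels a hand trajectory
# from successive hand-to-sensor distances (in cm).

def classify_gesture(ranges: list[float], threshold: float = 2.0) -> str:
    """Label a sequence of distance readings as a coarse gesture."""
    net_change = ranges[-1] - ranges[0]
    if net_change < -threshold:
        return "approach"   # hand ended up clearly closer to the sensor
    if net_change > threshold:
        return "retreat"    # hand ended up clearly farther away
    # Hand returned near its starting distance: a pronounced dip
    # in between is treated as a "tap" toward the sensor.
    if min(ranges) < ranges[0] - threshold:
        return "tap"
    return "hold"

print(classify_gesture([20.0, 15.0, 10.0]))   # approach
print(classify_gesture([10.0, 4.0, 10.5]))    # tap
```

In a real device, the raw input is not a clean distance sequence but reflected radar waveforms, from which movement patterns are extracted by trained models.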

Augmented, virtual and mixed reality

Modern human-machine interaction has long been more than just moving a lever or pressing a button. Technologies that augment reality can also be an interface between human and machine.

One example of this is virtual reality (VR) glasses. They immerse users in an artificially created 3D world, allowing them to experience computer games and 360-degree videos as if they were in the thick of the action. An experiment with kindergarten children showed, for example, that such experiences are later remembered as real events rather than as VR simulations. The technology is also useful for professional applications: Planning data for machines, systems or factories can be made tangible in virtual reality. With some VR glasses, the smartphone is inserted into the holder and used as a display. Sensors in the mobile phone or glasses detect wearers’ head movements so that they can look around in the virtual world.

With augmented reality (AR) glasses, the real environment remains in the user’s field of vision, although additional, virtual elements are also projected into it. The smartphone game Pokémon Go proved how successful this mix of both elements can be. Different figures are shown, depending on where the user moves the display.

Mixed reality (MR) glasses like Microsoft HoloLens even go a step further, linking virtual with augmented reality. HoloLens is an independent computer that can position 3D objects precisely in the real space. The glasses are controlled by gestures and voice commands. Mixed reality glasses can present scenarios realistically thanks to their high resolution.

Virtual, augmented and mixed reality are not only used for fun and games, but also in Industry 4.0. Apps for Microsoft HoloLens enable virtual training courses for technicians, for example. The Fraunhofer Institute for Factory Operation and Automation (IFF) rents out its mixed reality lab Elbedome to companies. They can use six laser projectors to display machines, factories or entire cities on a 360-degree surface, giving developers or customers the impression of standing right inside the planned factory.

A three-dimensional, high-quality depiction of the environment is supplied by the 3D image sensor chip REAL3™ from Infineon. It is fitted in mobile devices, such as some smartphones from Asus and Lenovo. The time-of-flight principle is used here: The image sensor chip measures the time an infrared light signal needs to travel from the camera to the object and back. The devices thus enable direct access to augmented reality: They detect a change in position by means of motion tracking, while the distances of objects are measured by depth perception. Spatial learning ensures the devices recognize places they have already captured.
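The time-of-flight principle described above reduces to a simple formula: the measured round-trip time of the light signal, multiplied by the speed of light and halved, gives the distance to the object. A minimal sketch:

```python
# The time-of-flight principle: distance = speed of light
# * round-trip time / 2 (the signal travels there and back).

SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def distance_from_time_of_flight(round_trip_seconds: float) -> float:
    """Distance to the object from the measured round-trip time."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# An infrared pulse returning after about 6.67 nanoseconds
# means the object is roughly one meter away.
d = distance_from_time_of_flight(6.67e-9)
print(round(d, 3))
```

The tiny time scales involved are why this measurement is done in dedicated sensor hardware like the REAL3™ chip rather than in software.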

Opportunities and challenges

Even complex systems will become easier to use thanks to modern human-machine interaction. To enable that, machines will adapt more and more to human habits and needs. Virtual reality, augmented reality and mixed reality will also allow them to be controlled remotely. As a result, humans expand their realm of experience and field of action.

Machines will also keep on getting better at interpreting signals in the future – and that’s necessary, too: A fully autonomous car must respond correctly to hand signals from a police officer at an intersection. Robots used in care must likewise be able to “assess” the needs of people who cannot express these themselves.

The more complex the contribution made by machines is, the more important it is to have efficient communication between them and users. Does the technology also understand the command as it was meant? If not, there’s the risk of misunderstandings – and the system won’t work as it should. The upshot: A machine produces parts that don’t fit, for example, or the connected car strays off the road.

People, with their abilities and limitations, must always be taken into account in the development of interfaces and sensors. Operating a machine must not be overly complex or require too much familiarization. Smooth communication between human and machine also needs the shortest possible response time between command and action, otherwise users won’t perceive the interaction as being natural.

One potential risk arises from the fact that machines depend heavily on sensors to be controlled or to respond automatically. If hackers gain access to that data, they obtain details of the user’s actions and interests. Some critics also fear that learning machines might one day act autonomously and turn against people. Another question that has not yet been clarified is who is liable, and who bears responsibility, for accidents caused by errors in human-machine interaction.


Where is human-machine interaction headed?

Human-machine interaction is far from reaching the end of the line with voice and gesture control and virtual, augmented and mixed reality. In the future, more and more data from different sensors will be combined to capture and control complex processes as well (sensor fusion).
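One simple form of sensor fusion is to combine two noisy readings of the same quantity, weighting each by how reliable its sensor is. The sketch below uses inverse-variance weighting; the sensor types and noise values are illustrative assumptions, not real device specifications:

```python
# Minimal sketch of sensor fusion: two sensors measure the same
# quantity with different noise levels, and their readings are
# combined by inverse-variance weighting, so the more reliable
# sensor contributes more to the result.

def fuse(reading_a: float, var_a: float,
         reading_b: float, var_b: float) -> float:
    """Inverse-variance weighted average of two sensor readings."""
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    return (w_a * reading_a + w_b * reading_b) / (w_a + w_b)

# Illustrative example: a radar sensor estimates a distance of 2.0 m
# (variance 0.04), a camera estimates 2.3 m (variance 0.16).
# The fused estimate leans toward the more precise radar reading.
print(fuse(2.0, 0.04, 2.3, 0.16))
```

Real sensor-fusion pipelines extend this idea to many sensors and to readings that evolve over time, for example with Kalman filters.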

At the same time, there will be fewer of the input devices that are customary at present, such as remote controls, computer keyboards or ON/OFF switches. If computer systems, devices and machines keep on learning and gain access to more data, they will also become more and more like humans: They can then take over the tasks of sensory organs. A camera will allow them to see, a microphone will let them hear, and clothing fitted with sensors will convey touch. Infineon is working to replicate the human senses ever more closely with the aid of sensors, as Andreas Urschitz explains: “A gas sensor will be able to ‘smell’, a sensor can interpret air pressure, a 3D camera enhances the ‘eyesight’ of a device.”

Machines will analyze what is going on around them with the aid of sensors. The result is completely new forms of interaction. Urschitz names one example: The mobile phone with a gas sensor “smells” a burger being grilled nearby. The digital assistant then recommends taking a look at the menu because a certain burger is currently on offer. At the same time, devices can also interpret and respond to the user’s body language thanks to perception-oriented sensors.

Machines will become smarter and smarter thanks to artificial intelligence. In machine learning, computers deduce findings from data on their own. That’s already possible today, as evidenced by digital assistants like Amazon’s Alexa. Yet if the technology is able to process more and more data in a shorter time, the ability of machines to “think” on their own increases.

What types of human-machine interaction are there?

  • Switches, levers, steering wheels and buttons were the main elements used to control machines before the advent of information technology.
  • A new means of operation was added with the invention of the keyboard: Text input in command lines gave an instruction to the system.
  • The mouse permitted a means of graphic control for the first time. It made it possible to click on certain fields on a screen and thus activate them.
  • We are now in the age of the touch screen: People use their fingers to perform actions directly on the device.
  • Multi-touch input is a first step toward gesture control. You spread two fingers to enlarge something on a display.
  • In wearables, body sensors automatically collect data, analyze it and supply information to the user.
  • At the same time, voice control continues to evolve. Digital assistants such as Amazon Alexa, Microsoft Cortana or Google Home carry out commands when the user issues them.
  • More intuitive means of operation are available with gesture control: You simply make a gesture in the air in order to switch on the TV.
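The multi-touch step in the list above follows a simple rule: the zoom factor is the ratio of the current two-finger distance to the distance when the gesture began. A minimal sketch (coordinates in pixels, names illustrative):

```python
import math

# Sketch of the multi-touch pinch gesture: the zoom factor is the
# ratio of the current two-finger distance to the starting distance.

def finger_distance(p1: tuple[float, float], p2: tuple[float, float]) -> float:
    """Euclidean distance between two touch points."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])

def zoom_factor(start_pts, current_pts) -> float:
    """>1 means the fingers spread apart (zoom in), <1 means a pinch."""
    return finger_distance(*current_pts) / finger_distance(*start_pts)

# Fingers start 100 px apart and spread to 150 px:
# the content is enlarged by a factor of 1.5.
print(zoom_factor(((0, 0), (100, 0)), ((0, 0), (150, 0))))
```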
