At FBEC 2021, Li Tengyue, president of Huawei's VR/AR product line, said that Huawei's AR Engine has been installed 1.1 billion times, covering 106 device models and more than 2,000 applications. AR Engine is an engine for building augmented reality applications on Android. By integrating core AR algorithms, it provides basic AR capabilities such as motion tracking, environment tracking, and human body and face tracking.
With AR Engine, you can give your application AR capabilities such as motion tracking, environment tracking, and human body and face tracking, helping it merge the virtual world with the real world and create new visual experiences and interaction methods. AR Engine uses these capabilities to better understand the real world and offer users a new interactive experience that blends virtual and real:
The motion tracking capability uses the camera of the device to identify feature points and track how those feature points move, continuously tracking the position and pose of the device.
The environment tracking capability can identify planes (such as the ground and walls) and shapes (such as cubes, circles, and rectangles), and can also estimate the light intensity around a plane.
Motion tracking continuously tracks the trajectory of the device's position and pose relative to its surroundings, establishing a unified geometric space between the virtual digital world and the real physical world and giving your application an interactive foundation for blending virtual and real. Motion tracking currently includes the following capabilities: motion tracking and hit detection.
Motion tracking
Continuously and stably tracks changes in the device's position and pose relative to its surroundings, while outputting three-dimensional coordinates of features in the environment.
AR Engine identifies feature points through the device camera and tracks their movement. It then fuses these movements with data from the device's inertial sensors to continuously track the device's position and pose.
By aligning the pose of the device camera provided by AR Engine with the pose of the virtual camera that renders 3D content, you can render virtual objects from the observer's perspective and superimpose them on the camera image, fusing virtual and real.
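The alignment described above can be sketched with plain matrix math: given the device camera's pose (rotation and translation in world space), a world point is brought into camera space by applying the inverse of that pose. The class and method names below are illustrative, not the AR Engine API.

```java
// Sketch: aligning a virtual camera with the device camera pose.
// Given the camera pose (rotation R, translation t) in world space, a world
// point is transformed into camera space by p_cam = R^T * (p_world - t).
// (R^T is R's inverse because rotation matrices are orthonormal.)
public class PoseAlign {
    // Multiply the transpose of a row-major 3x3 matrix by a 3-vector.
    static double[] mulTranspose(double[] r, double[] v) {
        return new double[] {
            r[0] * v[0] + r[3] * v[1] + r[6] * v[2],
            r[1] * v[0] + r[4] * v[1] + r[7] * v[2],
            r[2] * v[0] + r[5] * v[1] + r[8] * v[2]
        };
    }

    // Transform a world-space point into the camera's local space.
    public static double[] worldToCamera(double[] rotation, double[] translation,
                                         double[] pWorld) {
        double[] d = new double[] {
            pWorld[0] - translation[0],
            pWorld[1] - translation[1],
            pWorld[2] - translation[2]
        };
        return mulTranspose(rotation, d);
    }

    public static void main(String[] args) {
        double[] identity = {1, 0, 0, 0, 1, 0, 0, 0, 1};
        // Camera sits 5 units along +Z; the world origin lands at z = -5
        // in camera space, i.e. 5 units in front of the camera.
        double[] pCam = worldToCamera(identity, new double[]{0, 0, 5},
                                      new double[]{0, 0, 0});
        System.out.println(pCam[0] + " " + pCam[1] + " " + pCam[2]);
    }
}
```

Feeding the same pose to the renderer's virtual camera guarantees that virtual objects are projected from the observer's viewpoint.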
Hit detection
The user can select points of interest in the real environment by tapping the device screen.
AR Engine uses hit detection to map the tapped screen point to a point of interest in the real environment: it casts a ray from the camera position through the selected point and returns the intersection of that ray with a plane (or feature point). Hit detection lets users interact with virtual objects.
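The geometry behind this is a standard ray-plane intersection. The sketch below shows only the underlying math, not AR Engine's hit-test API; names are illustrative.

```java
// Sketch of hit detection: intersect the tap ray with a detected plane.
public class HitTest {
    // Intersect ray (origin o, direction d) with plane (point p, normal n).
    // Returns the hit point, or null if the ray is parallel to the plane
    // or the plane lies behind the ray origin.
    public static double[] rayPlane(double[] o, double[] d, double[] p, double[] n) {
        double denom = d[0] * n[0] + d[1] * n[1] + d[2] * n[2];
        if (Math.abs(denom) < 1e-9) return null;  // ray parallel to plane
        double t = ((p[0] - o[0]) * n[0]
                  + (p[1] - o[1]) * n[1]
                  + (p[2] - o[2]) * n[2]) / denom;
        if (t < 0) return null;                   // plane behind the camera
        return new double[] {o[0] + t * d[0], o[1] + t * d[1], o[2] + t * d[2]};
    }

    public static void main(String[] args) {
        // Camera at the origin looking down -Z; a plane at z = -2.
        double[] hit = rayPlane(new double[]{0, 0, 0}, new double[]{0, 0, -1},
                                new double[]{0, 0, -2}, new double[]{0, 0, 1});
        System.out.println(hit[0] + " " + hit[1] + " " + hit[2]); // hits (0, 0, -2)
    }
}
```

The returned intersection point is where an application would anchor the virtual object the user tapped on.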
Environment tracking tracks lighting, planes, images, objects, surfaces, and other environmental information around the device, helping your application integrate virtual objects into the real physical world in a realistic, scene-aware way. Environment tracking currently includes the following capabilities: illumination estimation, plane detection, image tracking, environment Mesh, plane semantics, 3D cloud recognition, and target semantics.
Illumination estimation
Tracks the lighting information around the device and estimates the intensity of the ambient light.
AR Engine tracks the light around the device and calculates the average light intensity of the camera image. Illumination estimation lets virtual objects blend into the real lighting environment and look more realistic.
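A minimal sketch of the idea of "average light intensity of the camera image": average the grayscale pixel values of a frame and use the result to scale the brightness of virtual content. This is an assumption-level illustration, not AR Engine's actual estimator.

```java
// Sketch: estimate ambient light as the mean grayscale intensity of a frame.
public class LightEstimate {
    // Average intensity of a grayscale image, pixel values in [0, 255].
    public static double averageIntensity(int[] pixels) {
        long sum = 0;
        for (int p : pixels) sum += p;
        return (double) sum / pixels.length;
    }

    public static void main(String[] args) {
        // A tiny "frame": dark, mid, bright pixels.
        double avg = averageIntensity(new int[]{0, 128, 255});
        // A renderer could scale virtual-object brightness by avg / 255.0.
        System.out.println(avg);
    }
}
```

Scaling a virtual object's material brightness by this estimate is what makes it appear dimmer in a dark room and brighter in daylight.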
Plane detection
Detects horizontal and vertical planes (such as floors or walls).
AR Engine recognizes clusters of feature points on horizontal and vertical planes (such as the ground or a wall) and can recognize a plane's boundary. Your application can use these planes to place virtual objects.
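The core of grouping feature points into a plane is deciding whether points are coplanar. A minimal sketch (illustrative names, not the AR Engine API): derive a plane's normal from three points, then measure how far other feature points lie from that plane.

```java
// Sketch: plane geometry used when clustering feature points into planes.
public class PlaneFit {
    // Plane normal from three points, via the cross product of two edges.
    public static double[] normal(double[] a, double[] b, double[] c) {
        double[] u = {b[0] - a[0], b[1] - a[1], b[2] - a[2]};
        double[] v = {c[0] - a[0], c[1] - a[1], c[2] - a[2]};
        return new double[] {
            u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]
        };
    }

    // Signed distance of point p from the plane through a with normal n.
    public static double distance(double[] p, double[] a, double[] n) {
        double len = Math.sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
        return ((p[0] - a[0]) * n[0]
              + (p[1] - a[1]) * n[1]
              + (p[2] - a[2]) * n[2]) / len;
    }

    public static void main(String[] args) {
        double[] a = {0, 0, 0}, b = {1, 0, 0}, c = {0, 1, 0};
        double[] n = normal(a, b, c);             // the z = 0 ground plane
        // A feature point with near-zero distance belongs to the plane.
        System.out.println(distance(new double[]{2, 3, 0}, a, n));
    }
}
```

A detector would accept points whose distance is below a small threshold as belonging to the plane and use their extent to estimate the plane boundary.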
Image tracking
Recognizes and tracks the position and pose of 2D images.
AR Engine provides image recognition and tracking: it detects whether an image supplied by the user appears in the scene and outputs the image's pose.
With image recognition and tracking, you can build augmented reality around images in real-world scenes, such as posters or covers. You provide a set of reference images; when one of them appears in the device camera's field of view, AR Engine tracks it in real time for your AR application, enriching scene understanding and the interactive experience.
Environment Mesh
Calculates and outputs environment mesh data for the current view in real time, which can be used for virtual-real occlusion and other scenarios.
AR Engine can output an environment Mesh in real time. The output includes the device's pose in space and a three-dimensional mesh of the current camera view. Only device models with a rear depth camera are supported, and only static scenes can be scanned.
With environment Mesh, you can place virtual objects on any reconstructable surface, no longer limited to horizontal and vertical planes. You can also use the reconstructed mesh for virtual-real occlusion and collision detection, so a virtual character accurately perceives the surrounding three-dimensional space, giving you a more immersive AR experience.
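Virtual-real occlusion from a reconstructed mesh boils down to a per-pixel depth comparison: a virtual fragment is drawn only where it is closer to the camera than the real surface. A minimal sketch under that assumption (names illustrative, not the AR Engine API):

```java
// Sketch: depth-based virtual-real occlusion.
// For each pixel, the virtual object is visible only if it lies in front of
// the reconstructed environment surface at that pixel.
public class DepthOcclusion {
    // envDepth: per-pixel distance to the real surface (from the mesh);
    // virtualDepth: distance of the virtual fragment for those pixels.
    public static boolean[] visibleMask(double[] envDepth, double virtualDepth) {
        boolean[] visible = new boolean[envDepth.length];
        for (int i = 0; i < envDepth.length; i++) {
            visible[i] = virtualDepth < envDepth[i];
        }
        return visible;
    }

    public static void main(String[] args) {
        // Real surface at 1.0 m in pixel 0 and 3.0 m in pixel 1;
        // a virtual object at 2.0 m is hidden in pixel 0, visible in pixel 1.
        boolean[] mask = visibleMask(new double[]{1.0, 3.0}, 2.0);
        System.out.println(mask[0] + " " + mask[1]);
    }
}
```

In a real renderer this comparison happens in the depth buffer, with the environment mesh rendered depth-only before the virtual content.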
Plane semantics
Recognizes the semantics of planes. AR Engine can recognize what the current plane is; it currently recognizes desktops, floors, walls, seats, ceilings, doors, windows, and beds.
3D cloud recognition
Recognizes and tracks the position and pose of 3D objects.
AR Engine detects whether 3D objects configured by the user on the cloud side appear in the scene. When such an object enters the device camera's field of view, the cloud returns the recognition result to the device in real time, enabling augmented reality based on 3D objects in real-world scenes.
Target semantics
Identifies the label and shape of an object. AR Engine can currently recognize tables and chairs; the supported shapes include cubes, circles, and rectangles.
2D cloud recognition
Identifies and tracks 2D images configured in the cloud.
AR Engine detects whether 2D images configured by the user on the cloud side appear in the scene. When such an image enters the device camera's field of view, the cloud returns the recognition result to the device in real time, enabling interaction based on 2D images in real-world scenes.
Human body and face tracking
Tracks faces, human bodies, and gestures in real time, helping your application let users interact with virtual objects.
Gesture recognition
Recognizes specific gestures and actions. AR Engine recognizes a variety of specific gestures, outputs the recognized gesture category, and returns the screen coordinates of the palm detection box; both left and right hands are supported. When multiple hands appear in the image, only the result and coordinates of one hand (the clearest, with the highest confidence) are returned. Both front and rear cameras are supported.
With gesture recognition, virtual objects can be superimposed on a person's hand, and state changes can be triggered by different gestures, giving your AR application basic interaction and new gameplay.
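The "only the highest-confidence hand is returned" behavior described above amounts to a simple argmax over detection confidences. A minimal sketch of that selection (illustrative names, not the AR Engine API):

```java
// Sketch: when several hands are detected, keep only the one with the
// highest confidence score, mirroring the single-hand output described above.
public class HandPicker {
    // Index of the detection with the highest confidence, or -1 if none.
    public static int best(double[] confidences) {
        int bestIdx = -1;
        double max = -1.0;
        for (int i = 0; i < confidences.length; i++) {
            if (confidences[i] > max) {
                max = confidences[i];
                bestIdx = i;
            }
        }
        return bestIdx;
    }

    public static void main(String[] args) {
        // Three candidate hands; the second is the clearest.
        System.out.println(best(new double[]{0.4, 0.9, 0.7})); // prints 1
    }
}
```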
Hand bone tracking
Recognizes and tracks the position and pose of 21 hand joints to form a hand skeleton model, and can distinguish the left and right hands.
AR Engine provides single-hand joint and bone recognition, outputting advanced hand features such as fingertip positions and hand bones. When multiple hands appear in the image, only the result and coordinates of one hand (the clearest, with the highest confidence) are returned. Currently, only the front depth cameras of the Mate 20 Pro and Mate 20 RS are supported.
With hand skeleton recognition, you can superimpose virtual objects on more precise hand positions, such as fingertips or the palm; using the hand skeleton, you can drive a virtual hand to make richer, more detailed movements. This gives your AR application enhanced interaction and novel gameplay.
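Attaching a virtual object to a fingertip typically means anchoring it at the tip joint, offset along the direction of the last bone so it sits just beyond the finger. A minimal sketch under that assumption (names illustrative, not the AR Engine API):

```java
// Sketch: anchor a virtual object at a fingertip, offset along the
// direction of the final bone (knuckle -> tip).
public class HandAnchor {
    public static double[] anchor(double[] knuckle, double[] tip, double offset) {
        double[] d = {tip[0] - knuckle[0], tip[1] - knuckle[1], tip[2] - knuckle[2]};
        double len = Math.sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]);
        // Move from the tip joint outward along the normalized bone direction.
        return new double[] {
            tip[0] + offset * d[0] / len,
            tip[1] + offset * d[1] / len,
            tip[2] + offset * d[2] / len
        };
    }

    public static void main(String[] args) {
        // Bone pointing up the Y axis; object floats 0.5 units past the tip.
        double[] a = anchor(new double[]{0, 0, 0}, new double[]{0, 1, 0}, 0.5);
        System.out.println(a[0] + " " + a[1] + " " + a[2]); // (0, 1.5, 0)
    }
}
```

The same bone direction can also orient the object, e.g. a virtual ring aligned with the finger.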
Human body gesture recognition
Recognizes specific human postures; six specific postures are currently supported.
AR Engine provides single-person body posture recognition: it recognizes six static postures and outputs the results, and supports both front and rear cameras.
You can use body posture recognition in scenarios that need to recognize an action and trigger an event, such as controlling an interactive interface or recognizing game actions. It is a core capability for motion-sensing applications, giving your AR application long-distance remote control and collaboration capabilities and enriching its interactive experience.
Human skeleton tracking
Recognizes and tracks the 2D positions of 23 body skeleton points (or the 3D positions of 15 skeleton points), for one or two people.
AR Engine provides body joint recognition for one or two people. It can output 2D skeletons (screen coordinate system) and 3D skeletons (spatial coordinate system, combined with SLAM), and supports both front and rear cameras.
With body skeleton recognition, you can superimpose virtual objects on specified body parts, such as the left shoulder or right ankle; using the skeleton, you can drive a virtual doll to make richer, more detailed movements. This gives your AR application a wide range of interactive features and novel gameplay.
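To overlay a virtual object on, say, the left shoulder, a 2D joint must be converted into pixel coordinates. The sketch below assumes the 2D skeleton is delivered in normalized [0, 1] coordinates; that convention, like the names, is an assumption for illustration, not the documented AR Engine output format.

```java
// Sketch: map a normalized [0, 1] joint coordinate to pixel coordinates
// so a virtual object can be drawn over that body part.
public class JointToScreen {
    // ASSUMPTION: (nx, ny) are normalized with (0, 0) at the top-left.
    public static int[] toPixels(double nx, double ny, int width, int height) {
        return new int[] {
            (int) Math.round(nx * width),
            (int) Math.round(ny * height)
        };
    }

    public static void main(String[] args) {
        // A joint at the center of a 1920x1080 frame.
        int[] p = toPixels(0.5, 0.5, 1920, 1080);
        System.out.println(p[0] + ", " + p[1]); // prints 960, 540
    }
}
```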
Human contour tracking
Identifies and tracks the region of the human body in the current frame and provides depth information for that region.
AR Engine recognizes and tracks the contours of one or two people, outputting a mask of the human contour and the corresponding skeleton points in real time.
With human contour tracking, you can use the contour mask to occlude virtual objects and scenes. For example, you can change the virtual background when taking AR photos, or let a virtual doll hide behind a person; the mask achieves a more natural occlusion effect, further enhancing the realism and viewing experience of AR applications.
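Mask-based occlusion is per-pixel compositing: where the mask marks a person, keep the camera pixel so the person covers the virtual content; elsewhere, show the virtual pixel. A minimal sketch of that rule (illustrative names, not the AR Engine API):

```java
// Sketch: composite a frame using a human-contour mask, so a virtual
// doll appears to stand behind the person.
public class MaskComposite {
    // personMask[i] is true where the pixel belongs to a person.
    public static int[] composite(boolean[] personMask, int[] camera, int[] virtual) {
        int[] out = new int[personMask.length];
        for (int i = 0; i < out.length; i++) {
            // Person pixels stay real; everything else shows virtual content.
            out[i] = personMask[i] ? camera[i] : virtual[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Pixel 0 is on the person, pixel 1 is background.
        int[] out = composite(new boolean[]{true, false},
                              new int[]{10, 20},   // camera pixels
                              new int[]{99, 88});  // virtual pixels
        System.out.println(out[0] + ", " + out[1]); // prints 10, 88
    }
}
```

Swapping which branch uses the camera image gives the opposite effect: a virtual background replacing everything except the person.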
Facial expression tracking
Calculates the pose of the face and the parameter value of each expression in real time, which can be used to directly drive an avatar's expressions.
AR Engine tracks facial expressions: it obtains facial image information, understands a person's expressions in real time, and converts them into expression parameters that can be used to drive an avatar's expressions.
AR Engine provides 64 expression parameters, covering the major facial organs, including the eyes, eyebrows, eyeballs, mouth, and tongue.
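Driving an avatar from expression parameters is commonly done with blend shapes: each avatar vertex is the neutral position plus the weighted sum of per-expression offsets, with the tracked parameters as weights. The sketch below shows that standard technique for a single vertex coordinate; it is an illustration of the approach, not AR Engine's internal implementation.

```java
// Sketch: blend-shape evaluation for one vertex coordinate.
// result = base + sum_i(weight_i * delta_i), where each delta is the
// offset of an expression shape (e.g. "mouth open") and each weight is
// the tracked expression parameter in [0, 1].
public class BlendShape {
    public static double blend(double base, double[] deltas, double[] weights) {
        double v = base;
        for (int i = 0; i < deltas.length; i++) {
            v += weights[i] * deltas[i];
        }
        return v;
    }

    public static void main(String[] args) {
        // Neutral position 1.0; "smile" fully on (+0.5), "frown" half on (-1.0 * 0.5).
        double v = blend(1.0, new double[]{0.5, -1.0}, new double[]{1.0, 0.5});
        System.out.println(v); // prints 1.0
    }
}
```

Applied to every vertex with all 64 parameters, this reproduces the tracked expression on the avatar's mesh each frame.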
Face Mesh
Calculates the pose of the face and Mesh model data in real time; the Mesh model follows the shape and movement of the face in real time.
AR Engine provides high-precision face Mesh modeling and tracking. After acquiring facial image information, it builds a realistic Mesh model in real time; the model changes position and shape as the face moves and deforms, accurately capturing motion in real time.
The Mesh has more than 4,000 vertices and more than 7,000 triangular faces, finely outlining the contours of the face and enhancing the experience.
Face health detection
Calculates face health information in real time, including key health indicators such as heart rate.
AR Engine provides human health detection capabilities, outputting information such as heart rate, respiration rate, facial health status, and heart rate waveform signals.
The advantages of AR Engine include:
Software-hardware optimization: integrates modules, chips, algorithms, and HarmonyOS, and uses hardware acceleration to deliver augmented reality capabilities with better results and lower power consumption.
Differentiated capabilities: based on the unique hardware of Huawei devices, it provides gesture and body recognition and interaction capabilities in addition to basic SLAM positioning and environment understanding.
Multi-device support: HUAWEI AR Engine can be integrated on many Huawei devices, and downloads have exceeded 1.1 billion.