Interest in HoloLens, the mixed reality device from Microsoft, and in digital reality in general, is growing rapidly. A large part of that interest comes from developers wanting to know how to build software for the device. And guess what: it isn't that difficult at all. With some basic C# knowledge and a free copy of Unity 3D you can get started in very little time. The first step is to get comfortable with the five main pillars of mixed reality development: gaze, gestures, voice, audio and spatial mapping. The first three are the key input forms.

Gaze

Gaze works much the same way as a mouse cursor. With HoloLens, your head movement moves the device's cursor, or your gaze. When you gaze at objects, you can interact with them and have them react to being looked at. It is up to you as the developer to manage gaze input and decide when an action is appropriate: the framework provides the tools to handle the interaction, but when to enable it is a decision that belongs to you and your design.

A basic code block to handle gaze input could look like this:

using UnityEngine;

public class WorldCursor : MonoBehaviour
{
    private MeshRenderer meshRenderer;

    // Use this for initialization
    void Start()
    {
        // Grab the mesh renderer that's on the same object as this script.
        meshRenderer = this.gameObject.GetComponentInChildren<MeshRenderer>();
    }

    // Update is called once per frame
    void Update()
    {
        // Do a raycast into the world based on the user's
        // head position and orientation.
        var headPosition = Camera.main.transform.position;
        var gazeDirection = Camera.main.transform.forward;

        RaycastHit hitInfo;

        if (Physics.Raycast(headPosition, gazeDirection, out hitInfo))
        {
            // If the raycast hit a hologram...
            // Display the cursor mesh.
            meshRenderer.enabled = true;

            // Move the cursor to the point where the raycast hit.
            this.transform.position = hitInfo.point;

            // Rotate the cursor to hug the surface of the hologram.
            this.transform.rotation = Quaternion.FromToRotation(Vector3.up, hitInfo.normal);
        }
        else
        {
            // If the raycast did not hit a hologram, hide the cursor mesh.
            meshRenderer.enabled = false;
        }
    }
}

In this case, Physics.Raycast is what detects whether any object lies in the gaze direction. If the raycast returns true (we are looking at something), the cursor mesh is shown and positioned on that object. If it returns false (we aren't looking at anything), the cursor mesh is hidden.

Objects can also detect that they are being gazed at, so they can react to being looked at. This is really useful when there are hundreds or even thousands of objects in the mixed reality experience, in order to highlight which one the cursor is pointing, or gazing, at.
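
As a sketch of that idea, a small component can change an object's color while it has focus. The OnGazeEnter and OnGazeExit message names below are hypothetical; they assume a gaze manager (like the cursor above, extended to track focus changes) that sends these messages to the object the raycast hits.

using UnityEngine;

// Attach to any hologram that should light up while it is gazed at.
// Assumes a gaze manager calls SendMessage("OnGazeEnter") / SendMessage("OnGazeExit")
// on the object the raycast hits; those message names are made up for this sketch.
public class GazeHighlighter : MonoBehaviour
{
    private Material material;
    private Color originalColor;

    void Start()
    {
        material = GetComponent<Renderer>().material;
        originalColor = material.color;
    }

    void OnGazeEnter()
    {
        // Tint the object so the user can see which hologram has focus.
        material.color = Color.yellow;
    }

    void OnGazeExit()
    {
        material.color = originalColor;
    }
}

The gaze manager's Update() would then compare the previous raycast hit with the current one and send OnGazeExit to the old object and OnGazeEnter to the new one.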

Gestures

The second form of input is what most people associate with HoloLens, namely gestures. If you have seen any HoloLens demo, you will have seen the "tap" gesture that is prevalent throughout the experience. This gesture forms the base for all other built-in gestures except one (the "bloom"). As a developer you can handle all of these gestures easily, as they are exposed by the framework: tap, hold, manipulate and navigate. As the HoloLens detects a gesture, it raises an event for you to handle. In code, that could look like this:

using UnityEngine;
using UnityEngine.VR.WSA.Input;

public class GazeGestureManager : MonoBehaviour
{
    public static GazeGestureManager Instance { get; private set; }

    // Represents the hologram that is currently being gazed at.
    public GameObject FocusedObject { get; private set; }

    GestureRecognizer recognizer;

    // Use this for initialization
    void Start()
    {
        Instance = this;

        // Set up a GestureRecognizer to detect Select gestures.
        recognizer = new GestureRecognizer();
        recognizer.TappedEvent += (source, tapCount, ray) =>
        {
            // Send an OnSelect message to the focused object and its ancestors.
            if (FocusedObject != null)
            {
                FocusedObject.SendMessageUpwards("OnSelect");
            }
        };
        recognizer.StartCapturingGestures();
    }

    // Update is called once per frame
    void Update()
    {
        // Figure out which hologram is focused this frame.
        GameObject oldFocusObject = FocusedObject;

        // Do a raycast into the world based on the user's
        // head position and orientation.
        var headPosition = Camera.main.transform.position;
        var gazeDirection = Camera.main.transform.forward;

        RaycastHit hitInfo;
        if (Physics.Raycast(headPosition, gazeDirection, out hitInfo))
        {
            // If the raycast hit a hologram, use that as the focused object.
            FocusedObject = hitInfo.collider.gameObject;
        }
        else
        {
            // If the raycast did not hit a hologram, clear the focused object.
            FocusedObject = null;
        }

        // If the focused object changed this frame,
        // start detecting fresh gestures again.
        if (FocusedObject != oldFocusObject)
        {
            recognizer.CancelGestures();
            recognizer.StartCapturingGestures();
        }
    }
}

In this example we create a GestureRecognizer object, which handles any gestures you want to subscribe to. Here, the TappedEvent is handled by sending an OnSelect message to the focused object and its ancestors in the Unity hierarchy. Those objects can then respond to the message as appropriate and react to the gesture in real time. Note that the Update() function checks every frame which hologram the user is gazing at, so there is always a current target to tap on.
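
For completeness, here is what a receiving object might look like. In the style of the official samples, tapping a sphere simply gives it a Rigidbody so gravity takes over:

using UnityEngine;

public class SphereCommands : MonoBehaviour
{
    // Called by GazeGestureManager when the user performs a Select gesture.
    void OnSelect()
    {
        // If the sphere has no Rigidbody component, add one to enable physics.
        if (!this.GetComponent<Rigidbody>())
        {
            var rigidbody = this.gameObject.AddComponent<Rigidbody>();
            rigidbody.collisionDetectionMode = CollisionDetectionMode.Continuous;
        }
    }
}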

Voice

The last of the three input methods for HoloLens is voice. From a coding perspective, using voice commands is very similar to handling gestures. The HoloLens is extremely good at recognizing speech, which makes the implementation very simple.

using System.Collections.Generic;
using System.Linq;
using UnityEngine;
using UnityEngine.Windows.Speech;

public class SpeechManager : MonoBehaviour
{
    KeywordRecognizer keywordRecognizer = null;
    Dictionary<string, System.Action> keywords = new Dictionary<string, System.Action>();

    // Use this for initialization
    void Start()
    {
        keywords.Add("Reset world", () =>
        {
            // Call the OnReset method on every descendant object.
            this.BroadcastMessage("OnReset");
        });

        keywords.Add("Drop Sphere", () =>
        {
            var focusObject = GazeGestureManager.Instance.FocusedObject;
            if (focusObject != null)
            {
                // Call the OnDrop method on just the focused object.
                focusObject.SendMessage("OnDrop");
            }
        });

        // Tell the KeywordRecognizer about our keywords.
        keywordRecognizer = new KeywordRecognizer(keywords.Keys.ToArray());

        // Register a callback for the KeywordRecognizer and start recognizing!
        keywordRecognizer.OnPhraseRecognized += KeywordRecognizer_OnPhraseRecognized;
        keywordRecognizer.Start();
    }

    private void KeywordRecognizer_OnPhraseRecognized(PhraseRecognizedEventArgs args)
    {
        System.Action keywordAction;
        if (keywords.TryGetValue(args.text, out keywordAction))
        {
            keywordAction.Invoke();
        }
    }
}

You use a KeywordRecognizer, which follows a structure very similar to the GestureRecognizer. You give it the list of keywords to listen for, and its OnPhraseRecognized event is raised whenever one of them is recognized. In this case each keyword is registered together with an Action, which the event handler looks up and invokes.
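
The other half of this contract is the set of objects listening for the OnReset and OnDrop messages. A minimal receiver, modelled on the sphere from the gesture example, might look like the sketch below; storing the start position is an assumption about how you want "Reset world" to behave.

using UnityEngine;

// Responds to the messages broadcast by the SpeechManager above.
public class SphereVoiceCommands : MonoBehaviour
{
    private Vector3 startPosition;

    void Start()
    {
        // Remember where the sphere was placed so "Reset world" can restore it.
        startPosition = this.transform.localPosition;
    }

    // Called via SpeechManager's BroadcastMessage("OnReset").
    void OnReset()
    {
        // Remove any physics state and put the sphere back where it started.
        var rigidbody = this.GetComponent<Rigidbody>();
        if (rigidbody != null)
        {
            Destroy(rigidbody);
        }
        this.transform.localPosition = startPosition;
    }

    // Called via SpeechManager's SendMessage("OnDrop") on the focused object.
    void OnDrop()
    {
        // Same effect as the tap gesture: let gravity take the sphere.
        if (!this.GetComponent<Rigidbody>())
        {
            var rigidbody = this.gameObject.AddComponent<Rigidbody>();
            rigidbody.collisionDetectionMode = CollisionDetectionMode.Continuous;
        }
    }
}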

The easy part of implementing voice commands is the code; the hard part is the design of the commands. Creating effective voice commands starts long before the coding, when you decide what the commands are, how they relate to each other, and how you educate your users about them. In particular, pay attention to these points:

  • What actions can be taken through speech?
  • Is speech input a good option for completing a task?
  • How does a user know when speech input is available?
  • Is the app always listening?
  • What phrases initiate an action or behavior?
  • What is the interaction dialog between app and user?
  • Is network connectivity required?

If you give all of these points proper thought, your voice commands are much more likely to be effective and useful.

Audio

We have gone through the gaze-gesture-voice paradigm of input for HoloLens development, and although audio isn't strictly speaking an input method, it plays a critical part in a successful mixed reality experience. HoloLens uses head-related transfer functions (HRTFs) to simulate binaural audio sources. This means audio can be spatialized (and in most cases should be) so that it sounds as realistic as possible. Digital 3D assets get natural, positional audio, giving the whole experience a much more immersive and genuine feel.

using UnityEngine;

public class SphereSounds : MonoBehaviour
{
    AudioSource audioSource = null;
    AudioClip impactClip = null;
    AudioClip rollingClip = null;

    bool rolling = false;

    void Start()
    {
        // Add an AudioSource component and set up some defaults
        audioSource = gameObject.AddComponent<AudioSource>();
        audioSource.playOnAwake = false;
        audioSource.spatialize = true;
        audioSource.spatialBlend = 1.0f;
        audioSource.dopplerLevel = 0.0f;
        audioSource.rolloffMode = AudioRolloffMode.Logarithmic;
        audioSource.maxDistance = 20f;

        // Load the Sphere sounds from the Resources folder
        impactClip = Resources.Load<AudioClip>("Impact");
        rollingClip = Resources.Load<AudioClip>("Rolling");
    }

    // Occurs when this object starts colliding with another object
    void OnCollisionEnter(Collision collision)
    {
        // Play an impact sound if the sphere impacts strongly enough.
        if (collision.relativeVelocity.magnitude >= 0.1f)
        {
            audioSource.clip = impactClip;
            audioSource.Play();
        }
    }

    // Occurs each frame that this object continues to collide with another object
    void OnCollisionStay(Collision collision)
    {
        Rigidbody rigid = this.gameObject.GetComponent<Rigidbody>();

        // Play a rolling sound if the sphere is rolling fast enough.
        if (!rolling && rigid.velocity.magnitude >= 0.01f)
        {
            rolling = true;
            audioSource.clip = rollingClip;
            audioSource.Play();
        }
        // Stop the rolling sound if rolling slows down.
        else if (rolling && rigid.velocity.magnitude < 0.01f)
        {
            rolling = false;
            audioSource.Stop();
        }
    }

    // Occurs when this object stops colliding with another object
    void OnCollisionExit(Collision collision)
    {
        // Stop the rolling sound if the object falls off and stops colliding.
        if (rolling)
        {
            rolling = false;
            audioSource.Stop();
        }
    }
}

You register and import the audio clips in Unity, but then you can use the code above with relative ease. You set up the AudioSource properties, load the clips, and handle the system events for collisions: OnCollisionEnter, OnCollisionStay and OnCollisionExit. It is that simple.

Just like with voice commands, the easy part is the code; the hard part is the audio design. There are four parts of audio design to get right to achieve great spatial audio in a HoloLens app:

  • Grounding: Just like real objects, you want to be able to hear holograms even when you can't see them, and you want to be able to locate them anywhere around you.
  • User Attention: When you want to direct your user's gaze to a particular place, rather than using an arrow to point them visually, placing a sound in that location is a very natural and fast way to guide them (see the sketch after this list).
  • Immersion: When objects move or collide, we usually hear those interactions between materials. Spatialized sounds make up the "feel" of a place beyond what we can see.
  • Interaction Design: When we press a button in the real world, the sound we hear comes from that button. By spatializing interaction sounds, we again provide a more natural and realistic user experience. 
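
As a sketch of the User Attention point, a one-shot spatialized sound can be spawned at a world position to pull the user's gaze there. The helper below is hypothetical, but it reuses the same AudioSource settings as the SphereSounds example above:

using UnityEngine;

// Hypothetical helper: play a one-shot spatialized sound at a world position
// to draw the user's attention there, then clean up after itself.
public static class AttentionAudio
{
    public static void PlayAt(Vector3 position, AudioClip clip)
    {
        var go = new GameObject("AttentionSound");
        go.transform.position = position;

        // Same spatialization settings as the SphereSounds example.
        var source = go.AddComponent<AudioSource>();
        source.spatialize = true;
        source.spatialBlend = 1.0f;
        source.dopplerLevel = 0.0f;
        source.rolloffMode = AudioRolloffMode.Logarithmic;
        source.clip = clip;
        source.Play();

        // Destroy the temporary object once the clip has finished playing.
        Object.Destroy(go, clip.length);
    }
}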

Spatial Mapping

The HoloLens uses the environment-sensing cameras and depth sensor on the front of the device to map the physical surroundings and build up a 3D model of the real world. As you use the device, it continually creates a spatial mapping model of the space you are in and updates any existing mapping. The really cool thing is that the Windows Device Portal that comes with the developer tools allows you to see this 3D spatial mapping in real time, and it works for physical HoloLens devices as well as the emulator.

Make sure your device is on the same network as your computer, then access the Device Portal by entering your device's IP address in any browser. In the portal, go to the 3D View menu option on the left, which will bring up an empty view. Press the Update button and the current state of the 3D mapping model is drawn into it, as seen below.

[Screenshot: the 3D spatial mapping model rendered in the Device Portal's 3D View]

Spatial mapping is what makes or breaks the mixed reality experience you are creating: it is what lets holograms sit on, collide with, and hide behind real-world surfaces.
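
To bring that spatial mesh into your own Unity scene, the HoloLens-era Unity builds expose a low-level SurfaceObserver API in the same UnityEngine.VR.WSA namespace as the gesture code above. The sketch below shows the overall shape of that API; treat the exact callback signatures and the triangle density value as assumptions to verify against your Unity version.

using System;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.VR.WSA;

public class SpatialMappingObserver : MonoBehaviour
{
    SurfaceObserver observer;
    Dictionary<SurfaceId, GameObject> surfaces = new Dictionary<SurfaceId, GameObject>();
    float lastUpdateTime;

    void Start()
    {
        // Watch a 10 x 10 x 10 meter box around the origin for surface changes.
        observer = new SurfaceObserver();
        observer.SetVolumeAsAxisAlignedBox(Vector3.zero, new Vector3(10f, 10f, 10f));
    }

    void Update()
    {
        // Polling for surface changes is expensive; a couple of times per second is enough.
        if (Time.time - lastUpdateTime > 0.5f)
        {
            lastUpdateTime = Time.time;
            observer.Update(OnSurfaceChanged);
        }
    }

    void OnSurfaceChanged(SurfaceId id, SurfaceChange change, Bounds bounds, DateTime updateTime)
    {
        if (change == SurfaceChange.Added || change == SurfaceChange.Updated)
        {
            // The first time we see this surface, create an object to hold its mesh.
            if (!surfaces.ContainsKey(id))
            {
                var surface = new GameObject("Surface-" + id.handle);
                surface.transform.parent = this.transform;
                surface.AddComponent<MeshFilter>();
                surface.AddComponent<MeshRenderer>();
                surface.AddComponent<MeshCollider>();
                surface.AddComponent<WorldAnchor>();
                surfaces[id] = surface;
            }

            // Ask the system to bake the latest mesh data into our components.
            var target = surfaces[id];
            var data = new SurfaceData(
                id,
                target.GetComponent<MeshFilter>(),
                target.GetComponent<WorldAnchor>(),
                target.GetComponent<MeshCollider>(),
                300f,   // triangles per cubic meter: a guess, tune for your app
                true);  // also bake the collider
            observer.RequestMeshAsync(data, OnDataReady);
        }
        else if (change == SurfaceChange.Removed && surfaces.ContainsKey(id))
        {
            Destroy(surfaces[id]);
            surfaces.Remove(id);
        }
    }

    void OnDataReady(SurfaceData data, bool outputWritten, float elapsedBakeTime)
    {
        // The mesh is now attached to the MeshFilter/MeshCollider created above.
    }
}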

Conclusion

Although this is a very short intro to HoloLens development, it gives you an idea of just how approachable the development paradigm is. If you are already a C# developer, you will only have to get familiar with Unity 3D, which, I must admit, can be daunting. However, the reward is instant, and the best bit is that you don't even need a physical HoloLens device to get started. Included in the free tooling is an emulator that gives a great, quick impression of what your experience is going to be like before you deploy to the real world and real devices.


All code samples are from official Microsoft examples and can be found on the HoloLens Developer Portal.