Data and Machine Learning for Creative Practices Final Project Report

Music can be extremely freeing for the performer as well as the listener, and it is widely regarded as a highly expressive art form that conveys emotion and storytelling. Music has traditionally been performed on physical instruments, requiring precise dexterity of the fingers, hands and feet. It is only with the development of technology in the digital age that digital instruments have become increasingly prevalent within the art.


Motivation

One such artist who thrives on digital instruments and performance is Imogen Heap[1]. She pioneered the MiMu[2] gloves, which deliver music through movement. I was particularly inspired by the MiMu and its ability to produce such fluid music through expressive gestures and movements.

I wanted to adapt this idea for people with loss of motor or sensory function in their limbs, enabling them to freely express their emotions and tell stories through sound. My objective for this project is to create an application that allows users to generate synthetic, digital sounds using facial expressions.

My expectation for the project is not to replicate the technicality and precision seen in the MiMu, but rather to replicate the concept of generating music with expression and movement for people with loss of motor or sensory function in their limbs.


Implementation

At a glance, this project was designed so that a Webcam Facial Expression Detector programme called FaceOSC_Wekinator_14Inputs[3] would send 14 input signals to Wekinator[4], which in turn would train on this data and send 3 output values to an output synthesiser programme called Processing_FMSynth_3ContinuousOutputs[5]. The project pipeline, in order, is: webcam → FaceOSC_Wekinator_14Inputs → Wekinator → Processing_FMSynth_3ContinuousOutputs → audio output.
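
For reference, the two programmes talk to Wekinator over OSC. The minimal Processing sketch below (using the oscP5 library) listens for Wekinator's three outputs, assuming Wekinator's default conventions of receiving inputs on port 6448 at the address /wek/inputs and sending outputs to port 12000 at /wek/outputs; the bundled example programmes already handle all of this, so the sketch is purely illustrative.

  import oscP5.*;

  OscP5 oscP5;

  void setup() {
    oscP5 = new OscP5(this, 12000);   // listen on the port where Wekinator sends its outputs
  }

  void oscEvent(OscMessage msg) {
    if (msg.checkAddrPattern("/wek/outputs")) {
      float out1 = msg.get(0).floatValue();   // the three continuous output values
      float out2 = msg.get(1).floatValue();
      float out3 = msg.get(2).floatValue();
      println(out1, out2, out3);
    }
  }

  void draw() { }   // keeps the sketch running so OSC messages continue to arrive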

After some time experimenting with different Supervised Learning algorithms, I felt it would be most efficient and fitting to use the Linear Regression model. This was because the three outputs were ported into a continuous synthesiser that required constant, real-time numerical data to change its values, and ultimately its sound. This model repeatedly gave me the most precise response and created an accurate connection between the movement of the facial features and the audio output. Fundamentally, the Linear Regression model allows me to provide training data at either end of each movement range; it then fits a line to predict all of the values in between. However, I decided to add more training data in between to ensure the model would accurately learn all the individual movements and gestures.
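
As a toy illustration of that idea (this is not Wekinator's internals, and the numbers are made up), two training examples at either end of a movement range define a line, and any intermediate input is predicted by interpolating along it:

  float inLow  = -0.5;  float outLow  = 0.0;   // e.g. head turned fully left  -> minimum output
  float inHigh =  0.5;  float outHigh = 1.0;   // e.g. head turned fully right -> maximum output

  float predict(float x) {
    float slope = (outHigh - outLow) / (inHigh - inLow);
    float intercept = outLow - slope * inLow;
    return slope * x + intercept;              // y = m*x + b, the form a linear regression learns
  }

  void setup() {
    println(predict(0.0));   // a half-way rotation maps to 0.5, half-way between the trained outputs
  }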

Initially the Webcam Facial Expression Detector provided 14 input features, which resulted in a messy output with lots of noise in the training data. I found that the only features required to control the 3 outputs were the Left-Right Rotation, Up-Down Tilt and Mouth Height. I customised the input and output signal connections in Wekinator so that the Left-Right Rotation, Up-Down Tilt and Mouth Height were each connected to a separate output feature in a one-to-one mapping. This gives the user more control over the instrument, as each input is now directly related to a single output.
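
In the project this routing was configured entirely within Wekinator's interface, but as an illustration the sketch below shows what reducing 14 features to a one-to-one mapping would amount to in code; the listening port and the feature indices are hypothetical, as I have not documented where the three features sit within the 14 FaceOSC values.

  import oscP5.*;
  import netP5.*;

  OscP5 oscIn;
  NetAddress wekinator;

  void setup() {
    oscIn = new OscP5(this, 9000);                  // hypothetical port carrying the raw 14 features
    wekinator = new NetAddress("127.0.0.1", 6448);  // Wekinator's default input port
  }

  void oscEvent(OscMessage msg) {
    if (msg.checkAddrPattern("/wek/inputs")) {
      OscMessage out = new OscMessage("/wek/inputs");
      out.add(msg.get(0).floatValue());   // left-right rotation (hypothetical index)
      out.add(msg.get(1).floatValue());   // up-down tilt (hypothetical index)
      out.add(msg.get(2).floatValue());   // mouth height (hypothetical index)
      oscIn.send(out, wekinator);
    }
  }

  void draw() { }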

Once the instrument was responsive, I experimented to see if I could remove any more noise and further smooth the input values for a more flowing output sound. I tried WekiInputHelper, but soon noticed that both ‘Velocity’ and ‘Acceleration’ were not measures this application required, and that any value computed over a window introduced a delay between the facial movement and the output sound, rendering the instrument unplayable. I decided to leave my input values untouched to preserve a live instrument feel.
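
For context, the kind of window-based smoothing I tried looks roughly like the sketch below (a simple moving average; WekiInputHelper's exact processing may differ, and the window length is a made-up value). Averaging the last few samples does remove jitter, but the result always trails the most recent facial movement, which is the delay that made the instrument feel unplayable.

  int windowSize = 10;                     // hypothetical window length
  float[] history = new float[windowSize];
  int index = 0;

  float smooth(float newValue) {
    history[index] = newValue;
    index = (index + 1) % windowSize;
    float sum = 0;
    for (int i = 0; i < windowSize; i++) {
      sum += history[i];
    }
    return sum / windowSize;               // the average lags behind the newest value
  }

  void setup() {
    println(smooth(0.8));                  // a sudden jump to 0.8 only nudges the smoothed output to 0.08
  }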

The three input values are sent to Wekinator, where the trained model produces 3 output features for the synthesiser: Modulator Frequency, Modulator Amplitude, and Modulator Offset. The clip below is a short demonstration of the application:

Application Demonstration: https://www.youtube.com/watch?v=MYiJyl4pwKU
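
To make those three output parameters concrete, the sketch below shows one possible way of receiving them and mapping them onto a simple FM voice using the Minim library alongside oscP5. This is not the bundled Processing_FMSynth_3ContinuousOutputs programme, and the starting values and the exact role of the offset are assumptions on my part.

  import oscP5.*;
  import ddf.minim.*;
  import ddf.minim.ugens.*;

  OscP5 oscP5;
  Minim minim;
  AudioOutput audioOut;
  Oscil carrier;
  Oscil modulator;

  void setup() {
    size(200, 200);
    minim = new Minim(this);
    audioOut = minim.getLineOut();

    // The modulator drives the carrier's frequency input, so the modulator's
    // offset effectively sets the carrier's centre frequency.
    modulator = new Oscil(4, 20, Waves.SINE);     // placeholder starting values
    modulator.offset.setLastValue(300);
    carrier = new Oscil(300, 0.5, Waves.SINE);
    modulator.patch(carrier.frequency);
    carrier.patch(audioOut);

    oscP5 = new OscP5(this, 12000);               // Wekinator's default output port
  }

  void oscEvent(OscMessage msg) {
    if (msg.checkAddrPattern("/wek/outputs")) {
      modulator.setFrequency(msg.get(0).floatValue());        // Modulator Frequency
      modulator.setAmplitude(msg.get(1).floatValue());        // Modulator Amplitude (depth)
      modulator.offset.setLastValue(msg.get(2).floatValue()); // Modulator Offset
    }
  }

  void draw() {
    background(0);
  }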

Evaluation

This project was a great opportunity for me to delve further into the world of Machine Learning, and the process allowed me to explore the different Supervised Learning Models in a practical way.

The programme used for detecting and tracking facial movements was not robust enough and often stopped tracking the face if it was rotated too far outside its detection range. It could not track the smaller, more precise mouth movements, and its value would often fluctuate between roughly 0 and 0.5 when the mouth was only slightly open. It may be important to note that external conditions, such as lighting and facial hair, may have further degraded the precision of the input programme; if this project were to be developed further, I would recommend researching more advanced facial-tracking programmes that take these variables into consideration.

The most challenging aspect of this project was the time needed to experiment with all the different Supervised Learning algorithms and to decide which was the most efficient for my desired outcome. I have learnt a lot from this process, and I am now able to choose between Supervised Learning Models depending on whether a problem is better solved by classification or regression.

As expected, I was unable to create an application with the same standard and build quality as the MiMu, but I believe this project has demonstrated that it is possible for people with loss of motor or sensory function in their limbs to play and express their emotions through music as an art form.

Further improvements for this project would be to use a more advanced webcam input programme and to build a more customisable output programme with more than 3 outputs. An interesting addition for the next iteration would be to take in more input features, such as eyebrow movements, to detect expression and emotion. The model would then train on this data and output different sounds and notes in different keys according to the detected emotion. For example, a happy, smiling face could trigger an instrument in a major key, whereas a frowning, upset face could trigger one in a minor key, giving the user the ability to handle multiple instruments.
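
As a rough sketch of that emotion-to-key idea (the ‘smile’ feature, its threshold and the scale choices are all hypothetical), a single expression value could select between a major and a minor scale built on the same root note:

  int[] majorScale = {0, 2, 4, 5, 7, 9, 11};   // semitone offsets from the root
  int[] minorScale = {0, 2, 3, 5, 7, 8, 10};   // natural minor

  int pickMidiNote(float smile, int degree, int rootMidi) {
    int[] scale = (smile > 0.5) ? majorScale : minorScale;   // the 0.5 threshold is a placeholder
    return rootMidi + scale[degree % scale.length];
  }

  void setup() {
    println(pickMidiNote(0.9, 2, 60));   // smiling: 64, an E (major third above middle C)
    println(pickMidiNote(0.1, 2, 60));   // frowning: 63, an E flat (minor third)
  }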


Documentation and Instructions

Make sure to download my submission files, which include all of the files listed in the instructions below.

Dependencies:
– Wekinator
– Processing

  1. Run the Input Programme ‘FaceOSC_Wekinator_14Inputs’.
  2. Run the Output Programme ‘Processing_FMSynth_3ContinuousOutputs.pde’.
  3. Open Wekinator.
  4. Select ‘No, just exploring or playing’ and click ‘Done’.
  5. Ignore this new project (you only need to open a ‘new project’ in order to reach the menu for opening an already saved one).
  6. In the Wekinator menu bar, click ‘File’, then ‘Open Project’, and open the file called ‘WekinatorProject.wekproj’.
  7. The Wekinator project will now be listening for messages from the input programme on port 6448, and sending messages to the output programme through port 12000.
  8. Make sure the input webcam programme is tracking your face.
  9. Turn the volume on your device up to an appropriate amount.
  10. On Wekinator, click ‘Run’.
  11. You should now be able to play music through your facial movements.

References

[1] Imogen Heap – https://en.wikipedia.org/wiki/Imogen_Heap
[2] MiMu – https://mimugloves.com/
[3] FaceOSC_Wekinator_14Inputs – http://www.wekinator.org/examples/
[4] Wekinator – http://www.wekinator.org/
[5] Processing_FMSynth_3ContinuousOutputs – http://www.wekinator.org/examples/
