An animatronic figure that can answer questions about itself and its mechanics.
The user speaks into the microphone, and the program on the computer streams the audio to IBM’s speech-to-text service for processing. Once silence is detected, the service returns the user’s speech as a string of text, which is displayed on-screen. This string is then sent to IBM’s Watson Assistant to be matched against a set of intents and dialog nodes. The assistant generates a response and returns it as a string. That string is forwarded to IBM’s text-to-speech engine, which returns a .WAV audio file.
Project Overview #2
The audio response is then streamed through the offline acoustic model before it is played through the computer’s speakers. The model listens to the audio and generates a mouth shape based on what it hears. It then sends that mouth shape data over USB/serial to the micro-controller on the back of the animatronic’s head. The micro-controller maps this value to the following servos: jaw, upper right lip, upper left lip, lower right lip, and lower left lip. This mapping is done using the servos’ endpoints and a three-point mapping function (minimum, maximum, and default positions). The jaw and lip values are shown on-screen as sliders. Although the micro-controller is a single-core processor, all of its functions and systems, from playback of a motion data file to the automated blinking to the mouth shapes, were written as asynchronous functions to maintain the serial connection with the computer. There are no blocking delays in any of the looping processes; instead, each routine checks the system clock, millis(), until its “delay” time is up. All the dialog and questions were created using the Watson Assistant tool.
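As a rough illustration of the three-point mapping described above, here is a minimal sketch in plain C++. It is not the project’s actual code: the struct `ServoLimits`, the function `mapThreePoint`, the normalized input range of -1 to +1, and the example positions are all assumptions made for illustration. The idea is that an input of 0 lands on the servo’s default position, and each side of the range is scaled linearly toward its own endpoint.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical per-servo calibration holding the three points named in the text.
struct ServoLimits {
    int minPos;      // endpoint reached at input -1.0
    int defaultPos;  // resting position at input 0.0
    int maxPos;      // endpoint reached at input +1.0
};

// Piecewise-linear map from a normalized input in [-1, 1] to a servo position.
// Assumption: the input range and the position units (e.g. pulse width in
// microseconds) are illustrative, not taken from the original project.
int mapThreePoint(float input, const ServoLimits& s) {
    // Clamp so an out-of-range value can never drive the servo past an endpoint.
    float x = std::max(-1.0f, std::min(1.0f, input));

    float pos;
    if (x >= 0.0f) {
        // Positive half: interpolate from the default toward the maximum.
        pos = s.defaultPos + x * (s.maxPos - s.defaultPos);
    } else {
        // Negative half: interpolate from the default toward the minimum.
        pos = s.defaultPos + x * (s.defaultPos - s.minPos);
    }
    return static_cast<int>(std::lround(pos));
}
```

Because each half of the range has its own slope, the default position does not need to sit midway between the endpoints, which suits a jaw or lip whose resting pose is close to one end of its travel.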
Sam was modeled after Kismet, a socially aware animatronic project built at MIT by Dr. Cynthia Breazeal. We designed Sam with similar facial attributes and expression range, albeit in a more human-like form. We were experimenting with an animatronic that could communicate not only verbally but also visually, to convey as much emotion as possible. She can smile and laugh, roll her eyes, frown, and get mad, all through her facial expressions. This opened up possibilities to explore, such as helping children who have difficulty with social awareness, for example individuals on the autism spectrum.
IBM Speech to Text with microphone (C#)
IBM Assistant dialog setup and integration (C#)
IBM Text to Speech and streaming audio (C#)
Acoustic modelling and mouth data streaming using “SALSA” and “Uduino” (C++/C#)
Program interface using Unity3D (C#)
Asynchronous servo motion data playback system (C++)
Potentiometer noise reduction using arrays and sorting (C++)
Servo motion data recording system (C++)
Servo ramping system (C++)
Three-point servo endpoint mapping (C++)
Serial connection auto-reconnect using “Uduino” (C++/C#)
State/intent management on the microcontroller (C++)
Asynchronous auto-blinking every 2-6 seconds (C++)
SD card playback/streaming asynchronously (C++)
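The delay-free timing pattern behind items such as the asynchronous auto-blinking can be sketched as follows. This is a hedged illustration in plain C++ rather than the project’s firmware: the class `BlinkScheduler` and its methods are hypothetical names, the caller is assumed to pass in the current clock value (millis() on an Arduino-style board), and a small linear congruential generator stands in for whatever random source the real code uses.

```cpp
#include <cstdint>

// Non-blocking blink scheduler. Nothing here ever waits: update() is meant to
// be called on every pass through the main loop and returns immediately.
class BlinkScheduler {
public:
    BlinkScheduler(uint32_t minMs, uint32_t maxMs, uint32_t nowMs, uint32_t seed = 1)
        : minMs_(minMs), maxMs_(maxMs), state_(seed), last_(nowMs), wait_(nextWait()) {}

    // Returns true once each time the current interval has elapsed,
    // then picks a fresh random interval for the next blink.
    bool update(uint32_t nowMs) {
        // Unsigned subtraction stays correct when the 32-bit clock wraps around.
        if (nowMs - last_ < wait_) return false;
        last_ = nowMs;
        wait_ = nextWait();
        return true;
    }

    uint32_t currentWaitMs() const { return wait_; }

private:
    // Stand-in pseudo-random source (assumption): a simple LCG scaled
    // into the inclusive range [minMs_, maxMs_].
    uint32_t nextWait() {
        state_ = state_ * 1664525u + 1013904223u;
        return minMs_ + (state_ >> 16) % (maxMs_ - minMs_ + 1u);
    }

    uint32_t minMs_;
    uint32_t maxMs_;
    uint32_t state_;
    uint32_t last_;
    uint32_t wait_;
};
```

In a main loop this might be used as `if (blink.update(millis())) { /* start a blink */ }`, with the scheduler constructed as `BlinkScheduler blink(2000, 6000, millis());`. Because the call never blocks, serial handling, motion playback, and mouth updates can all run in the same loop pass.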
The neck mechanism is based on a neck design by Custom Entertainment Solutions. This mechanism allows the head to have pitch and yaw movements and, most importantly, in its default resting position uses almost no power. The bottom servo allows the whole figure to spin with its bearing.
Lips, Mouth, and Jaw
The jaw pivots in the back, and the servo mounted on the underside of the top faceplate controls the mechanism with a connecting rod. The servos on the ends of the lips move up and down to create different mouth shapes, also known as visemes. For example, the “AY” and “EE” sounds are created by moving the lips away from each other, and the “OH” sound is created by moving them closer.
The plastic eyes were purchased from eBay by searching for “doll eyes.”
Both eyes are mounted and held in place with a u-joint, and fish swivels hook onto the eyeballs’ edges to control them. One servo pitches the eyes up and down, and a second servo pivots them left and right. To move both eyes simultaneously, a servo horn is embedded in the faceplate on a bearing so that it spins freely.
For the eyelids, we 3D printed a design from Thingiverse and added a ball connector to the side. Each eyelid has one servo with a connecting rod that opens and closes it. The eyelids are important because they add realism through blinking and squinting, and they can give Sam a tired look or, conversely, a surprised look.
The eyebrows are simply servos that spin 3D printed brows. These are the simplest parts of the entire figure, yet they require the most maintenance, because they loosen over time and their limits have to be redefined.
All of our current and future animatronics use Weigl products with Conductor Pro. That being said, VSA is the best option for beginners, not only because of its price tag but also because of the simplicity of its interface.