NZSM Online

Feature

Playing Musical Computers

Reading musical notation is an art computers find tricky.

David Bainbridge and Tim Bell

Imagine a computer system that could "read" printed music -- you could listen to a piece of printed music without any training in musical notation; a clarinetist could scan a tune and have it transposed automatically; a soloist could have the computer play an accompaniment for rehearsal; a musicologist could feed in a complex piece and have it analysed automatically or stored in a database for later searching; and an editor could read in an old edition and touch it up using a music publishing program.

Personal computers can be teamed up with scanners, allowing direct conversion of the printed word from optical to electronic form. Optical character recognition (OCR) technology has developed to the extent that one can expect to scan a typed or typeset page and have it recognised with very few errors. Optical music recognition (OMR) is the extension of this idea to pages of printed music.

At present musical information is usually entered into a computer using a combination of typing on the computer's keyboard, clicking with a mouse, and playing the music on an electronic keyboard. All of these methods are time-consuming, particularly for entering details such as dynamics, articulation, and lyrics. If the music is already available in printed form, then OMR could accelerate this process significantly.

Commercial programs for OMR of printed music are only just beginning to appear, but there is still much research to be done to improve their performance. The ultimate goal is the reliable recognition of arbitrary quality music, including handwritten music.

Not As Easy As It Looks

OMR might seem a simple extension of OCR -- conventional music notation has a limited number of symbols and a structured framework in the form of the five-line stave. However, in practice, OMR is considerably more difficult.

One of the key differences from OCR is that objects on the page are superimposed on each other, so it is difficult to isolate features automatically. An OCR system matches characters against "templates" that represent the shapes of letters. This technique is not feasible for music because there are no isolated shapes to match against a template.

Most OMR systems use the following steps, once the original image has been scanned:

  • correct distortions in the page, such as rotating a mis-aligned page
  • find and (usually) remove the stave lines
  • recognise the objects on the page
  • interpret the musical significance of the objects

The first step is necessary because it is very difficult to place music squarely on a scanner. It may even be curled at the edge if it is from a tightly bound book.
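To give a flavour of how this first step can be automated, here is a minimal Python sketch (not taken from any particular OMR system) that estimates the skew of a scanned page. It exploits the fact that stave lines produce sharp spikes in the row-by-row black-pixel counts only when they are level; the NumPy and SciPy routines, the five-degree search range and the step size are assumptions made purely for illustration.

    import numpy as np
    from scipy.ndimage import rotate

    def estimate_skew(image, max_angle=5.0, step=0.1):
        """Estimate the rotation (in degrees) that levels the stave lines.

        `image` is a 2-D array with 1 for black pixels and 0 for white.
        Stave lines give the row-wise black counts sharp peaks only when
        they are horizontal, so we pick the trial angle whose projection
        has the greatest variance.
        """
        best_angle, best_score = 0.0, -1.0
        for angle in np.arange(-max_angle, max_angle + step, step):
            trial = rotate(image, angle, reshape=False, order=0)
            projection = trial.sum(axis=1)    # black pixels in each row
            score = projection.var()          # sharp peaks give high variance
            if score > best_score:
                best_angle, best_score = angle, score
        return best_angle

    # deskewed = rotate(image, estimate_skew(image), reshape=False, order=0)

Searching a small range of angles is usually enough, since a page placed by hand on a scanner is rarely more than a few degrees off square.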

The stave lines can be found by counting the amount of black in horizontal cross-sections of the page. If the amount of black in a row exceeds a threshold (such as 50% of the page width), that row very likely lies on a stave line. Removing the lines takes some care. They can't simply be replaced with white, as this would wipe out parts of objects that overlap the lines. Usually the system will work along a line, changing black pixels to white so long as there is nothing immediately above or below the line at that point. With some fine-tuning it is possible to remove most of the lines without damaging the remaining features too much.
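The procedure just described translates almost directly into code. The sketch below, in Python with NumPy, assumes the page has already been converted to a binary array with 1 for black and 0 for white; real systems add refinements, such as following lines that bend slightly.

    import numpy as np

    def find_stave_lines(image, threshold=0.5):
        """Return the rows whose proportion of black pixels exceeds the threshold."""
        height, width = image.shape
        blackness = image.sum(axis=1) / width     # fraction of black in each row
        return [y for y in range(height) if blackness[y] > threshold]

    def remove_stave_lines(image, line_rows):
        """Whiten line pixels, except where a symbol crosses the line.

        A black pixel on the line is kept whenever the pixel immediately
        above or below it is also black, since it probably belongs to a
        note, stem or other object overlapping the stave.
        """
        result = image.copy()
        height, width = image.shape
        for y in line_rows:
            for x in range(width):
                above = image[y - 1, x] if y > 0 else 0
                below = image[y + 1, x] if y < height - 1 else 0
                if result[y, x] == 1 and not (above or below):
                    result[y, x] = 0
        return result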

There are a number of ways of musically classifying the located objects. This is generally done by measuring the similarity of an object to templates representing common musical objects. A simple measure is the similarity in size of the two objects -- some features are much larger than others. More accuracy is obtained by matching the object against simple geometric shapes such as noteheads, stems and beams. These form primitive sub-components of the object, and can be used to deduce what the shape is.
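As an illustration of this kind of matching, the following sketch scores an extracted shape against a small library of templates, using size as a cheap first filter and a crude pixel-overlap count as the finer comparison. The template library, thresholds and scoring function are invented for the sake of the example; practical systems use considerably more sophisticated measures.

    import numpy as np

    def size_similarity(a, b):
        """Ratio of bounding-box areas, in [0, 1]; 1 means identical size."""
        area_a = a.shape[0] * a.shape[1]
        area_b = b.shape[0] * b.shape[1]
        return min(area_a, area_b) / max(area_a, area_b)

    def overlap_score(obj, template):
        """Fraction of pixels that agree once both shapes are padded to a
        common bounding box (a crude stand-in for proper template matching)."""
        h = max(obj.shape[0], template.shape[0])
        w = max(obj.shape[1], template.shape[1])
        a = np.zeros((h, w)); a[:obj.shape[0], :obj.shape[1]] = obj
        b = np.zeros((h, w)); b[:template.shape[0], :template.shape[1]] = template
        return (a == b).mean()

    def classify_object(obj, templates, min_size_similarity=0.5):
        """Return the name of the best-matching template, or None.

        `templates` maps symbol names (e.g. "notehead", "sharp") to binary
        arrays.  Size acts as a cheap first filter before the more
        expensive pixel comparison.
        """
        best_name, best_score = None, 0.0
        for name, template in templates.items():
            if size_similarity(obj, template) < min_size_similarity:
                continue
            score = overlap_score(obj, template)
            if score > best_score:
                best_name, best_score = name, score
        return best_name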

The musical significance of the objects is then interpreted. For example, a sharp sign needs to be associated with a note, or it might be part of a key signature. A dot might be a staccato mark above a note, or it may belong to a note to its left, in which case it alters that note's duration. Other examples include associating lyrics with notes, the numbers in a time signature, and the notes in a chord. In many systems these associations are achieved by "hard-wired" rules in a program. However, a little thought about how music is read reveals just how many of these rules about relationships between objects there are. One of the major areas of research at Canterbury is to develop a language for describing these rules so that they can be specified and modified readily as new musical features are encountered.
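To make the idea concrete, here is a sketch of one such rule, hard-wired in Python in the way most current systems would do it. The coordinates, distance thresholds and function name are invented for illustration; the point of the Canterbury approach is that rules like this could instead be written in a dedicated description language and changed without reprogramming.

    def classify_dot(dot, noteheads, stave_gap):
        """Decide what a detected dot means, from its position alone.

        `dot` and each notehead are (x, y) centre coordinates, and
        `stave_gap` is the spacing between adjacent stave lines, used as
        a distance scale.  The thresholds are illustrative guesses.
        """
        for nx, ny in noteheads:
            dx, dy = dot[0] - nx, dot[1] - ny
            # Level with a notehead and just to its right: a duration dot,
            # lengthening the note by half its value.
            if 0 < dx < 2 * stave_gap and abs(dy) < stave_gap / 2:
                return "duration dot", (nx, ny)
            # Directly above or below a notehead: a staccato mark.
            if abs(dx) < stave_gap / 2 and stave_gap / 2 < abs(dy) < 2 * stave_gap:
                return "staccato", (nx, ny)
        return "unclassified", None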

A Musical Challenge

The OMR researcher faces several other challenges that do not appear in ordinary character recognition. One difference is that musical notation frequently uses different shapes for the same object. For example, a tie may link two adjacent notes, stretch across several bars, or even continue past the end of a line. Another challenging object is the beam, which can sit at any angle, may cover multiple notes, and may join notes whose stems point up or down.

Character recognition systems must be able to recognise different fonts, and may even need to cope with languages that work from top to bottom instead of left to right. However, there is an even greater diversity in music notation. As well as the wealth of symbols used in common music notation, there are several related systems in widespread use, such as the "tablature" notation favoured by guitarists, the single-line staves used by percussionists, and the four-line staves used for medieval music.

When it comes to recognising music in practice there are even more challenges that an OMR system must cope with, due to the quality of the original image.

Publishers and printers frequently depart from "accepted practice", producing music with objects that are not placed where they should be. Sometimes distinct objects touch, so a template-matching system may need to recognise parts of a merged shape rather than whole, isolated objects. This "bleeding" can also be caused by the scanning process. Conversely, if the image is too light, individual objects may be broken into pieces. In either case, extra computation is required to match templates against the merged or fragmented shapes found on the page.

Optical music recognition is just coming of age because of the growth of computational power, the increasing availability of scanners, and the popularity of computers with many musicians. It is likely to have a significant impact on the music publishing industry because of its potential to accelerate the very tedious task of getting music onto a computer, although it is a long way from putting any performers out of work.

Tim Bell and David Bainbridge are both with the Department of Computer Science at the University of Canterbury.
