Well, it's been a while since I last wrote. In the meantime I moved (my new apartment is awful) and by now nearly everything is tidied up here. Nevertheless, I also did some development work ;)
So first, I am glad to mention that a new version of KEmu will soon be officially released. This new version rewrites a lot of the existing code, making it (hopefully) more stable and the GUI more usable and clearer to the user. However, some parts are not yet ported, so you will have to wait a bit longer.
Additionally, I also had a project at university: The digital pen&paper project.
The Problem
Take some form, e.g. the one you fill in at the doctor's. Whatever is written on it basically has to be transferred to the computer by hand (at least until now): Someone spends a lot of time reading the form and typing its contents once again into a computer program. This could easily be solved by using e.g. tablet computers; however, these are not yet widely accepted, so writing on a form printed on a sheet of paper will remain the default for some time. The digital pen&paper approach could help here: While one can use the pen (and the accompanying receiver unit) like a normal pen, the unit can later be connected to a computer to obtain the handwriting data in digital form. Handwriting recognition software can then convert this data to text, which can be stored in a database (or wherever it is needed in the actual use case).
Now, there remains one problem: The files you receive from the unit connected to the pen are like writing on a transparency of the kind you use with e.g. overhead projectors: Imagine you have a sheet with a form that you put on the projector. Then you put another sheet on top of the form sheet and write on it. As long as the two sheets remain in that position, the resulting projection looks like what you expect (a form with filled fields). However, if you remove the form...
This basic problem extends to the following cases (there might be more, but these are the main ones we considered):
- What if either the form sheet or the receiver unit is misplaced (e.g. rotated by a few degrees)?
- What if you have multiple forms (or one form that spreads over several pages of paper)?
These problems show one thing: Having a representation of the bare form and the digital writing (also called "Digital Ink") is not sufficient to automatically convert the information held by the writing to a computer representation that can be stored in a structured system.
The Solution
We considered one solution: We used not only the digital ink gathered from the device, but also a scanned image of the filled-out form. Before you wonder: The image of the form alone is not sufficient to gather all the information. So while it is perfectly possible to recognize the form elements printed on the sheet and even to separate the writing from the background, you currently cannot convert this information to text automatically. This is because handwriting recognition requires not only the actual shape of an object to convert it to text (or a single character); it also needs information about how that shape was created, i.e. the separate strokes. These cannot be reconstructed from the scanned image.
So our approach is: From a set of ink files and scanned images, find the pairs that belong together. Next, find a projection (i.e. calculate a matrix that is applied to the ink data) such that the transparency with the form and the transparency on which you've written are once again placed correctly on top of each other (to stay with the overhead projector example).
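To make the projection step a bit more concrete, here is a minimal sketch in Python. It is not taken from the project; the function names, the representation of ink as lists of (x, y) points per stroke, and the use of a 3x3 homogeneous matrix are all assumptions made for illustration.

```python
import math

def apply_matrix(matrix, point):
    """Apply a 3x3 homogeneous matrix to a single 2D point (x, y)."""
    x, y = point
    xh = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    yh = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
    return (xh / w, yh / w)

def transform_strokes(matrix, strokes):
    """Transform every point of every stroke; the stroke structure is kept."""
    return [[apply_matrix(matrix, p) for p in stroke] for stroke in strokes]

def rotation_translation(angle_deg, dx, dy):
    """Hypothetical helper: rotate by angle_deg, then shift by (dx, dy)."""
    a = math.radians(angle_deg)
    return [[math.cos(a), -math.sin(a), dx],
            [math.sin(a), math.cos(a), dy],
            [0.0, 0.0, 1.0]]

# Two made-up strokes, as they might come from the pen's receiver unit.
strokes = [[(1.0, 0.0), (2.0, 0.0)], [(0.0, 1.0)]]
aligned = transform_strokes(rotation_translation(90, 10, 0), strokes)
```

Because only the coordinates change and the strokes stay separate, the transformed ink could still be handed to a handwriting recognizer afterwards.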
If you are interested, please have a look at http://www.gitorious.org/kpki2010. There you'll find what we developed over the last months. To give you a short overview:
General Structure
The project consists of several (standalone) programs. Each program does one step in the processing chain. This design is pretty useful: First, it allowed us to separate work. Additionally, every part of the chain thus can be replaced by another program. So rewriting a program or using an existing alternative is no problem. The sub-projects we worked on are:
imagefilter
This program is a command-line image filter (well, as the name suggests...) It's used to create a version of the scanned image that contains only the text data (as black strokes) and background. However, it is pretty generic; internally, it provides several filters. Each filter is a plugin (available as a separate library), thus, the program can be extended and used also in other contexts (not only for our project).
The program uses an interesting approach: It can be seen as a "scripting language for image processing". Each filter can be seen as a command. There are commands for file interaction (i.e. loading and saving), and commands that actually do something with the images in memory. This allows you to apply several filters consecutively to create interesting effects, or to generate multiple output images from the same source file in one program call.
As I already said: In our toolchain, the image filter program is used for preparing the scanned images, but it has the potential to be used in other situations as well ;)
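The "scripting language" idea can be illustrated with a tiny interpreter. This is only a sketch in Python under loud assumptions: the command names (load, save, threshold, invert), the script syntax, and the use of an in-memory dictionary instead of real image files are all invented here and are not imagefilter's actual commands.

```python
def cmd_load(state, name):
    """Copy a stored image into the working buffer."""
    state["image"] = [row[:] for row in state["files"][name]]

def cmd_save(state, name):
    """Store a copy of the working buffer under a new name."""
    state["files"][name] = [row[:] for row in state["image"]]

def cmd_threshold(state, level):
    """Turn a grayscale image into pure black (0) and white (255)."""
    level = int(level)
    state["image"] = [[255 if v >= level else 0 for v in row]
                      for row in state["image"]]

def cmd_invert(state):
    """Invert all gray values."""
    state["image"] = [[255 - v for v in row] for row in state["image"]]

# Registry of available commands; a plugin would simply add an entry here.
COMMANDS = {"load": cmd_load, "save": cmd_save,
            "threshold": cmd_threshold, "invert": cmd_invert}

def run_script(script, files):
    """Run one command per line; return all images known at the end."""
    state = {"files": dict(files), "image": None}
    for line in script.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        COMMANDS[parts[0]](state, *parts[1:])
    return state["files"]

files = {"scan": [[10, 200], [130, 90]]}  # a made-up 2x2 grayscale image
script = """
load scan
threshold 128
save binary
invert
save inverted
"""
result = run_script(script, files)
```

Note how one call produces two outputs (binary and inverted) from the same source, which is the kind of use the command approach is meant to allow.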
img2roi
This tool is used to create an XML representation of the filtered images and of rendered versions of the ink files. "roi" is short for region of interest; in our case this is a contiguous region in the image/ink file (i.e. a letter or number, or a whole word if it is handwritten and there is no space between two letters).
Each region is written to the resulting XML file together with some "meta" information (bounding box/circle, center of gravity, ...). These values can later be used to compare two such XML files (usually you compare an XML file generated from a scanned image with one generated from a rendered ink file).
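As a rough illustration of what such a region file could contain, the following Python sketch finds contiguous ink regions in a small binary grid and writes their bounding box and center of gravity to XML. The element and attribute names, the 8-neighbour connectivity, and the grid input are assumptions for this example; the real img2roi format may look quite different.

```python
import xml.etree.ElementTree as ET

def find_regions(grid):
    """Return contiguous ink regions as lists of (x, y) pixels (8-connected)."""
    height, width = len(grid), len(grid[0])
    seen, regions = set(), []
    for y in range(height):
        for x in range(width):
            if not grid[y][x] or (x, y) in seen:
                continue
            stack, pixels = [(x, y)], []
            seen.add((x, y))
            while stack:  # flood fill from the first pixel of the region
                cx, cy = stack.pop()
                pixels.append((cx, cy))
                for nx in (cx - 1, cx, cx + 1):
                    for ny in (cy - 1, cy, cy + 1):
                        if (0 <= nx < width and 0 <= ny < height
                                and grid[ny][nx] and (nx, ny) not in seen):
                            seen.add((nx, ny))
                            stack.append((nx, ny))
            regions.append(pixels)
    return regions

def describe(pixels):
    """Bounding box and center of gravity of one region."""
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    return {"left": min(xs), "top": min(ys), "right": max(xs),
            "bottom": max(ys), "cx": sum(xs) / len(xs), "cy": sum(ys) / len(ys)}

def regions_to_xml(grid):
    """Serialize all regions; the tag and attribute names are hypothetical."""
    root = ET.Element("regions")
    for pixels in find_regions(grid):
        attributes = {key: str(value) for key, value in describe(pixels).items()}
        ET.SubElement(root, "region", attributes)
    return ET.tostring(root, encoding="unicode")

grid = [  # 1 = ink, 0 = background
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
xml_text = regions_to_xml(grid)
```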
roimatcher
This tool is used to calculate a matrix which maps all regions from one input region file to the other. In our processing chain, it maps the ink files' regions to the regions in the scanned files. Additionally, it produces a "difference" value that expresses how much the two input region files differ from each other. This value is required to find the correct ink/scanned file pairs. More on this later.
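One possible way to obtain both a matrix and a difference value is a least-squares fit, sketched below in Python. This is not roimatcher's actual algorithm: the sketch assumes the region centers are already paired up one-to-one, restricts the mapping to rotation, uniform scaling and translation, and uses the root-mean-square residual as the difference value. All names are made up for the example.

```python
import math

def fit_similarity(src, dst):
    """Fit dst ~ a*src + b with complex a (rotation/scale) and b (shift).

    Returns a 3x3 matrix and the RMS distance between the mapped source
    points and their targets.
    """
    zs = [complex(x, y) for x, y in src]
    ws = [complex(x, y) for x, y in dst]
    n = len(zs)
    z_mean = sum(zs) / n
    w_mean = sum(ws) / n
    # Closed-form least-squares solution for a, then b from the means.
    num = sum((w - w_mean) * (z - z_mean).conjugate() for z, w in zip(zs, ws))
    den = sum(abs(z - z_mean) ** 2 for z in zs)
    a = num / den
    b = w_mean - a * z_mean
    matrix = [[a.real, -a.imag, b.real],
              [a.imag, a.real, b.imag],
              [0.0, 0.0, 1.0]]
    residuals = [abs(a * z + b - w) for z, w in zip(zs, ws)]
    difference = math.sqrt(sum(r * r for r in residuals) / n)
    return matrix, difference

ink_centers = [(0, 0), (1, 0), (0, 1)]
scan_centers = [(5, 2), (5, 3), (4, 2)]    # ink rotated by 90 degrees and shifted
other_centers = [(5, 2), (9, 9), (4, 2)]   # regions from an unrelated page

good_matrix, good_difference = fit_similarity(ink_centers, scan_centers)
bad_matrix, bad_difference = fit_similarity(ink_centers, other_centers)
```

For the matching page the difference is practically zero, while the unrelated page leaves a clearly larger residual; a value like this is what allows ranking candidate ink/scan pairs.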
inktransformer
This tool is used to apply transformations to Ink files (or better: Ink XML files, an intermediate format, as the original Ink format is a proprietary, binary one :\ )
Summing it all up: formextract
This program is a "meta" program: As input, it takes a list of ink files and scanned images. It then uses the other programs to filter and process these files. After calling roimatcher on each ink/scanned file pair, it runs an optimization algorithm (namely simulated annealing) to find an optimal list of ink/scanned pairs. Then it goes on to call the other programs to generate transformed ink files and some XML files that contain the texts read from the ink files.
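The pairing step can be sketched as follows in Python. The cost matrix stands for the difference values (entry [i][j] is the difference between ink file i and scan j); the numbers, the swap move, the cooling schedule and all parameter names are assumptions for illustration, and the sketch also assumes there are as many ink files as scans. It is not formextract's actual implementation.

```python
import math
import random

def total_cost(cost, assignment):
    """Sum of differences when ink file i is paired with scan assignment[i]."""
    return sum(cost[i][j] for i, j in enumerate(assignment))

def anneal_pairs(cost, steps=5000, start_temp=1.0, cooling=0.999, seed=0):
    """Simulated annealing over pairings; returns the best pairing seen."""
    rng = random.Random(seed)
    n = len(cost)
    current = list(range(n))
    current_cost = total_cost(cost, current)
    best, best_cost = current[:], current_cost
    if n < 2:
        return best, best_cost
    temp = start_temp
    for _ in range(steps):
        # Propose swapping the scans assigned to two different ink files.
        i, j = rng.sample(range(n), 2)
        candidate = current[:]
        candidate[i], candidate[j] = candidate[j], candidate[i]
        delta = total_cost(cost, candidate) - current_cost
        # Always accept improvements; accept worse pairings with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temp *= cooling
    return best, total_cost(cost, best)

cost = [  # made-up difference values for 3 ink files and 3 scans
    [0.9, 0.8, 0.1],
    [0.2, 0.7, 0.9],
    [0.8, 0.1, 0.7],
]
pairs, pairs_cost = anneal_pairs(cost)
```

For three files one could of course simply try all pairings; the annealing approach becomes interesting when the number of files grows and exhaustive search gets too expensive.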
So, did it work?
Basically: Yes. The results we have achieved so far are satisfying. When the scanned images are of sufficient quality (i.e. scanned with enough DPI and not spilled all over with coffee), the transformed ink files look pretty good and the ink/scanned pairs found are mostly correct, too.
Nevertheless, there's room for improvement. For example, the imagefilter part could also use some recognition of figures for its form recognition: Currently, it searches for the surroundings of the form by looking for something red (the color we used for the forms). However, we basically know what the form delimiters look like (dashes and plus signs), so the filter could check whether what it found looks like such a sign and, if not, continue searching.
Additionally, image filtering is currently rather slow. Some improvements would not harm in this area.
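For illustration, "looking for something red" could be as simple as the following Python sketch. The redness test, its margin of 60, and the representation of the image as rows of (r, g, b) tuples are assumptions made here, not the criteria imagefilter actually uses.

```python
def is_red(pixel, margin=60):
    """Assumed criterion: red channel clearly dominates green and blue."""
    r, g, b = pixel
    return r - max(g, b) >= margin

def red_bounding_box(image):
    """Return (left, top, right, bottom) of all red pixels, or None."""
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if is_red(pixel):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # nothing red found on this page
    return (min(xs), min(ys), max(xs), max(ys))

WHITE = (255, 255, 255)
image = [  # tiny made-up scan with a few reddish pixels
    [WHITE, WHITE, WHITE],
    [WHITE, (220, 30, 40), (200, 20, 20)],
    [WHITE, (210, 40, 30), WHITE],
]
box = red_bounding_box(image)
```

A check of the delimiter shapes, as suggested above, would then run on the candidates found this way instead of trusting the color alone.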
Can I test it myself?
Unfortunately: No. While the programs we wrote are all released under the terms of the GNU GPL v.3, the program chain currently requires two additional programs (namely InkManager and InkRecognizer) that we do not own. If you are interested in testing and want to re-write these missing parts, just e-mail me (or have a look at the sources to learn how the missing parts are assumed to work).
