Blogs

Digital Pen&Paper: A Summary

Well, it's been a while since I last wrote. In the meantime I moved (my new apartment is awful), and by now nearly everything is tidied up here. Nevertheless, I also did some development work ;)

So first, I am glad to mention that a new version of KEmu will soon be officially released. This new version rewrites a lot of the existing code, making it (hopefully) more stable and the GUI clearer and more usable. However, some parts are not yet ported, so you will have to wait a bit longer.

Additionally, I also had a project at university: The digital pen&paper project.

The Problem

Take a form, e.g. one you fill out at the doctor's. Whatever is written on it has to be transferred to the computer by hand (at least until now), meaning: someone spends a lot of time reading the form and typing its contents into a computer program. This could easily be solved by using e.g. tablet computers; however, these are not yet widely accepted, so writing on a form printed on a sheet of paper will remain the default for some time. The digital pen&paper approach could help here: you use the pen (and the accompanying receiver unit) like a normal pen, and later connect the unit to a computer to get the handwriting data in digital form. Handwriting recognition software can then convert it to text, which can be stored in a database (or wherever the actual use case needs it).

Now, one problem remains: the files you receive from the receiver unit are like writing on a transparency of the kind used with overhead projectors. Imagine you put a sheet with a form on the projector. Then you put another sheet on top of the form sheet and write on it. As long as the two sheets remain in that position, the resulting projection looks like what you expect (a form with filled-in fields). However, if you remove the form...

This basic problem extends to the following cases (there might be more, but these are the main ones we considered):

  • What if either the form sheet or the receiver unit is misplaced (e.g. rotated by a few degrees)?
  • What if you have multiple forms (or one form that is spread over several pages)?

These problems show one thing: having a representation of the bare form and the digital writing (also called "Digital Ink") is not sufficient to automatically convert the information held by the writing into a computer representation that can be stored in a structured system.

The Solution

We considered one solution: we used not only the digital ink gathered from the device, but also a scanned image of the filled-out form. Before you wonder: the image of the form alone is not sufficient to gather all the information. While it is perfectly possible to recognize the form elements printed on the sheet, and even to separate the writing from the background, you currently cannot convert this information to text automatically. This is because handwriting recognition needs not only the actual shape of an object to convert it to text (or a single character), but also information about how that shape was created, i.e. the separate strokes. These cannot be reconstructed from the scanned image.

So our approach is: from a set of ink files and scanned images, find the pairs that belong together. Next, find a projection (i.e. calculate a matrix that is applied to the ink data) such that the transparency with the form and the transparency with your writing are correctly placed on top of each other again (to return to the overhead projector example).
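To illustrate what "applying a matrix to the ink data" means, here is a minimal Python sketch. It is not the project's code: the ink layout (a list of strokes, each a list of (x, y) points), the 3x3 homogeneous matrix and the names make_matrix and transform_strokes are all assumptions made for this example.

```python
import math


def make_matrix(angle_deg=0.0, dx=0.0, dy=0.0, scale=1.0):
    """Build a 3x3 homogeneous matrix: rotate/scale about the origin, then shift."""
    a = math.radians(angle_deg)
    c, s = scale * math.cos(a), scale * math.sin(a)
    return [[c, -s, dx],
            [s, c, dy],
            [0.0, 0.0, 1.0]]


def transform_strokes(matrix, strokes):
    """Apply the matrix to every point of every stroke (hypothetical ink layout)."""
    (a, b, tx), (c, d, ty) = matrix[0], matrix[1]
    return [[(a * x + b * y + tx, c * x + d * y + ty) for x, y in stroke]
            for stroke in strokes]


# One stroke with two points, recorded by a receiver that was rotated and shifted.
ink = [[(1.0, 0.0), (2.0, 0.0)]]
corrected = transform_strokes(make_matrix(angle_deg=90, dx=10, dy=5), ink)
```

The stroke structure is kept intact on purpose: only the coordinates change, so the recognition software would still see the same strokes in the same order.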

If you are interested, please have a look at http://www.gitorious.org/kpki2010. There you'll find what we developed over the last months. To give you a short overview:

General Structure

The project consists of several (standalone) programs. Each program does one step in the processing chain. This design is pretty useful: first, it allowed us to split up the work. Additionally, every part of the chain can be replaced by another program, so rewriting a program or using an existing alternative is no problem. The sub-projects we worked on are:

imagefilter

This program is a command-line image filter (well, as the name suggests...) It's used to create a version of the scanned image that contains only the text data (as black strokes) and background. However, it is pretty generic; internally, it provides several filters. Each filter is a plugin (available as a separate library), thus, the program can be extended and used also in other contexts (not only for our project).

The program uses an interesting approach: it can be seen as a "scripting language for image processing". Each filter can be seen as a command. There are commands for file interaction (i.e. loading and saving), and commands that actually do something with the images in memory. This makes it possible to apply several filters one after another to create interesting effects, or to generate multiple output images from the same source file in one program call.
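The "filters as commands" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how imagefilter is actually built: the registry, the filter names threshold and invert and the script format are all made up here, and images are plain nested lists of gray values instead of files.

```python
FILTERS = {}  # hypothetical registry; stands in for the plugin libraries


def register(name):
    """Decorator that makes a function available as a named command."""
    def decorator(func):
        FILTERS[name] = func
        return func
    return decorator


@register("threshold")
def threshold(image, level=128):
    """Turn a grayscale image into pure black (0) and white (255)."""
    return [[0 if px < level else 255 for px in row] for row in image]


@register("invert")
def invert(image):
    """Swap dark and light pixels."""
    return [[255 - px for px in row] for row in image]


def run_script(script, image):
    """Run a list of (command, arguments) pairs one after another."""
    for name, kwargs in script:
        image = FILTERS[name](image, **kwargs)
    return image


script = [("threshold", {"level": 100}), ("invert", {})]
result = run_script(script, [[10, 200], [100, 99]])
```

Adding a new filter then only means registering one more function; the "script" itself does not need to know how any filter works.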

As I already said: In our toolchain, the image filter program is used for preparing the scanned images, but it has the potential to be used in other situations as well ;)

img2roi

This tool is used to create an XML representation of the filtered images and of rendered versions of the ink files. "roi" is short for region of interest; in our case this is a contiguous region in the image/ink file (i.e. a letter or number, or a whole word if it is handwritten and there is no space between two letters).

Each region is written to the resulting XML file together with some "meta" information (bounding box/circle, center of gravity, ...). These values can later be used to compare two such XML files (usually you compare an XML file generated from a scanned image with one generated from a rendered ink file).
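For the curious, this is roughly how such regions could be found. It is a simplified Python sketch and not the img2roi implementation: it assumes a binary image given as nested lists (truthy = ink pixel), treats diagonal neighbours as connected, and the function name find_regions and the result layout are made up for this example.

```python
from collections import deque


def find_regions(image):
    """Return bounding box, center of gravity and size of each connected region."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y in range(height):
        for x in range(width):
            if not image[y][x] or seen[y][x]:
                continue
            # Flood fill (breadth-first) starting from this unvisited ink pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            pixels = []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            xs = [p[0] for p in pixels]
            ys = [p[1] for p in pixels]
            regions.append({
                "bbox": (min(xs), min(ys), max(xs), max(ys)),
                "center": (sum(xs) / len(xs), sum(ys) / len(ys)),
                "size": len(pixels),
            })
    return regions


regions = find_regions([[1, 1, 0, 0],
                        [1, 0, 0, 0],
                        [0, 0, 0, 1]])
```

Writing these dictionaries out as XML elements would then be a straightforward last step.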

roimatcher

This tool is used to calculate a matrix which maps all regions from one input region file to the other. In our processing chain, it maps the ink files' regions to the regions in the scanned files. Additionally, it creates a "difference" value that measures how much the two input region files differ from each other. This value is required to find the correct ink/scanned file pairs. More on this later.
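To give an idea of what such a fit can look like, here is a small Python sketch. Be warned that it is heavily simplified and not what roimatcher really does: it assumes we already know which ink region belongs to which scanned region, it only allows rotation, uniform scaling and shifting, and it represents region centers as complex numbers. The name estimate_similarity is made up.

```python
def estimate_similarity(src, dst):
    """Least-squares fit of dst ~ a * src + b with points as complex numbers.

    'a' encodes rotation and uniform scale, 'b' the shift. Assumes that
    src[i] and dst[i] describe the same region (a strong simplification).
    """
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    num = sum((d - dst_mean) * (s - src_mean).conjugate()
              for s, d in zip(src, dst))
    den = sum(abs(s - src_mean) ** 2 for s in src)
    a = num / den
    b = dst_mean - a * src_mean
    # "Difference" value: mean distance between mapped and target centers.
    diff = sum(abs(a * s + b - d) for s, d in zip(src, dst)) / n
    return a, b, diff


ink_centers = [complex(0, 0), complex(4, 0), complex(0, 2), complex(4, 2)]
# Simulate a receiver that was rotated by 90 degrees and shifted.
scan_centers = [1j * z + complex(3, 1) for z in ink_centers]
a, b, diff = estimate_similarity(ink_centers, scan_centers)
```

For a pair that really belongs together the difference value is close to zero; for a wrong pair the best possible fit still leaves a large residual, which is what makes the value useful for pairing.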

inktransformer

This tool is used to apply transformations to Ink files (or better: Ink XML files, an intermediate format, as the original Ink format is a proprietary, binary one :\ )

Summing it all up: formextract

This program is a "meta" program: as input, it takes a list of ink files and scanned images. It then uses the other programs to filter and process these files. After calling roimatcher on each ink/scanned file pair, it runs an optimization algorithm (namely simulated annealing) to find an optimal list of ink/scanned pairs. Then it goes on to call the other programs to generate transformed ink files and some XML files that contain the text read from the ink files.
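The pairing step can be sketched as follows in Python. Again, this is an illustration under assumptions and not the formextract code: it assumes the same number of ink files and scans, takes the "difference" values as a ready-made table diff[ink][scan], and the cooling schedule, the parameter values and the names total_cost and anneal_pairs are made up.

```python
import math
import random


def total_cost(diff, assignment):
    """Sum of difference values when ink file i is paired with scan assignment[i]."""
    return sum(diff[i][j] for i, j in enumerate(assignment))


def anneal_pairs(diff, steps=5000, start_temp=1.0, cooling=0.999, seed=0):
    """Simulated annealing over pairings; returns the best pairing seen and its cost."""
    rng = random.Random(seed)
    n = len(diff)
    current = list(range(n))
    current_cost = total_cost(diff, current)
    best, best_cost = current[:], current_cost
    temp = start_temp
    for _ in range(steps):
        if n < 2:
            break
        # Neighbour state: swap the scans assigned to two ink files.
        i, j = rng.sample(range(n), 2)
        candidate = current[:]
        candidate[i], candidate[j] = candidate[j], candidate[i]
        candidate_cost = total_cost(diff, candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse states with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temp = max(temp * cooling, 1e-9)
    return best, best_cost


diff = [[0.9, 0.1, 0.8],
        [0.2, 0.7, 0.9],
        [0.8, 0.9, 0.3]]
pairs, cost = anneal_pairs(diff)
```

For a handful of pages you could simply try all pairings, of course; annealing only starts to pay off when the number of possible pairings gets too large for that.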

So, did it work?

Basically: Yes. The results we have achieved so far are satisfying. When the scanned images are of sufficient quality (i.e. scanned with enough DPI and not spilled all over with coffee), the transformed ink files look pretty good and the ink/scanned pairs found are mostly correct, too.

Nevertheless, there's room for improvement. The imagefilter part could use some shape recognition for its form recognition as well: currently, it searches for the outline of the form by looking for something red (the color we used for the forms). However, we basically know what the form delimiters look like (dashes and plus signs), so the filter could check whether what it found looks like such a sign and, if not, continue searching.

Additionally, image filtering is currently rather slow. Some improvements would not harm in this area.

Can I test it myself?

Unfortunately: No. While the programs we wrote are all released under the terms of the GNU GPL v.3, the program chain currently requires two additional programs (namely InkManager and InkRecognizer) that we do not own. If you are interested in testing and want to rewrite these missing parts, just e-mail me (or have a look at the sources to learn how the missing parts are assumed to work).

=-=-=-=-=
Powered by Blogilo

Developing an intelligent agent for RPGBomber2009 - more detailed version

Hi there,

as promised I want to give you a look behind the curtains of intelligent playing. Before that you have to know something 'bout artificial intelligence in general.

Developing an intelligent agent for RPGBomber2009

Hi there,

after a long break (much longer than wanted <.<) from developing RPGBomber2009, I developed some artificial intelligence related stuff. I don't want to talk about that in detail, well, at least not now, 'cause I don't have that much time these days... But I want to post a little video to give you some impression of what you can expect >:-) Anyway, this video contains a big mistake in the behavior of the agent. I think it speaks for itself. Have a look:

Anyway... A more detailed article about the design and implementation of the agent will follow rather soon. So stay tuned ;)

Hardly a better place....

... for a picnic!

Well, actually, these people probably did not select that place because it's the ideal spot for something like that. When I saw the first of them arrive today, I thought they just wanted to have some fun. In fact, they seem to be protesting against what the city is currently planning for its most famous excavation.

The "Wiener Loch"

.... that is the name of that amazing hole in the ground in Dresden's very heart. I guess it's hardly more than 50m from the main station to what remains of some previous plan to make the city more attractive (to whomever). Even Wikipedia knows about it (article is in German). I cannot say too much about it myself, as it was already here when I moved to Dresden. For nearly 3 years, it has been one of the first things I see when I step onto my balcony in the morning.

In the past, there were several plans for how to proceed with the hole. Probably the most attractive one for the city would be to sell the area and let someone build something there. Yeah, that would at least bring some money for the city. However, that has failed several times already. Currently, the city is talking to an investor from Berlin who (still) has an interest in putting up a commercial building there.

However, that is not the only possibility for the Wiener Loch. An (in my opinion) better solution would be to just fill up the hole and plant it. Dresden is already a city with a lot of parks and green places here and there; but especially around the main station, the city looks like every other one: gray.

A park (however small it may be) would be great here, especially as the "Prager Straße" is nearby, where a lot of people pass by each day. It would surely be cool to calm down there after shopping or on the way back home after work. However, that solution would not earn any money for the city.

However, whatever happens, the situation cannot remain as it is. The city pays a lot of money to keep the hole free of water; additionally, the whole area is only surrounded by some fences. Several times I saw children playing around the hole. IMO, the danger that something might happen is too great.

Well, I am very curious what will happen someday with that hole. However, I cannot report "live" anymore, as I'll move this summer. Too bad, because as you can see, some trees and bushes started growing around and in the hole over the last years, and it starts looking attractive and calming ;)


Digital Pen&Paper, AI and a new Interesting Project

So that the feeling doesn't even come up that we might have nothing to present here, I'll blog a bit between work, university and (later that evening) homework.

A new semester began early in April; and like the last one, it comes with a new, really interesting project for me :) But let me explain.

Maybe you have already heard of a neat little toy called digital pen&paper. If not: the idea is that you have a pen. And you write with it. Yes, ordinary writing on a sheet of paper. However, somewhere the "digital" must come in ;) The pen sends information to a device that you can attach to your computer. This device records the things you've written on the paper and saves them (think of it like working with a pen in tools like GIMP). The recorded strokes can then be saved either as an image (so you get a "scanned" version of your handwritten stuff, though without scanning) or as a "strokes" page (i.e. the lines you drew are saved in a fitting format). Cool, isn't it? You can even set that stuff up to work as a HID device (you can then use it similar to a graphics tablet). This way, you can use handwriting recognition software to digitize your writing directly.

Now, if you think about it: does this really make sense? For someone like me - a student - probably not. If I want a copy of my handwritten stuff, I can scan it. If I need it digitized, I take my netbook with me and write there. But there are indeed cases where such a setup makes sense! Currently, things are often still written on paper, for example in a hospital (where the doc has to fill in forms about you) or at a trade fair (for example a contact form). This is due to human nature (and habits, of course). But this information is actually needed in digital form (e.g. the data from the doctor's form might be needed by others in the hospital, so having it available in the local network is essential; similarly, the contact data from the trade fair is further processed and stored on computers anyway). So, after the notes have been taken with (analog) pen&paper, the forms must be digitized by hand in a second step; a time-consuming and boring job (unless the people have really bad handwriting, so that "decoding" their writing becomes kind of a game).

This is where the digital pen&paper idea comes in: using such a device, the conversion step can be omitted. The data - even though written in the familiar way - is simultaneously available in digital form, can easily be transferred into the local network and is thus accessible nearly immediately. Well... at least it could be.

Consider a form as the doctor fills it out. There, we have several problems:

Even if we have the stroke data from the pen, we don't know exactly which page (assuming a form with several pages) a digital page belongs to. We can make assumptions ("the doc will always fill out the forms in the same order"), but as you know, a lot of assumptions tend not to hold when it comes down to the point where you need them. This means for us: we need to find out which digitized page belongs to which page of the form.

This might even be easy, but we have another problem: what if the receiver device is badly positioned, and thus the recorded strokes are transformed compared to what the form actually looks like (e.g. moved a bit to the left/bottom, rotated by a few degrees and so forth)? This makes mapping strokes to pages of the form even more difficult, especially as there is a cyclic dependency: to know which page of strokes belongs to which page of the form, we would need the strokes to be positioned correctly inside their page. On the other hand, to get the correct transformation, we need to know which page of the form to target.

This cyclic dependency can be broken up a bit: by scanning the pages of the form afterwards and evaluating the scanned images, it is possible to map the stroke pages to the pages of the form. This decreases the comfort factor a bit, of course (the point of digital pen&paper is to avoid an additional step); however, we assume that scanners scan faster than humans are able to type in the information by hand (and with technical progress, this assumption is especially likely to hold).

Now, then, AI comes in. Our (yep, this time it's a two-man project ;) ) task is to use algorithms and ideas from AI to extract the information, make the mapping and thus ease digitizing handwritten forms even further. I think this will be really fun, especially as we were asked to realize the project using C#/.NET. So... hey Mono, so we meet again :)


Intelligent Playlists - further thinking

Well, some time back I planned to write a third part of the "Intelligent Playlist" series about my AI project. However, it would've been about programming, and while programming itself is really fun, writing about it is IMAO pretty useless. So I'd rather do some further thinking about how to improve the plugin in the future ;)

So, how does the plugin work so far? Starting with an initial set of tracks, it tries to guess some tracks from your local collection that might fit. This "might fit" is calculated using some parameters you can set freely in the plugin's configuration dialog. Now an interesting question arises: is this sufficient?

Actually, it might be. At least my personal testing resulted in sufficiently good playlists. However, I also discovered some parts that need further work.

Internally, what you can set is which attributes of the tracks to include in the similarity tests and how much each one influences the result (i.e. so-called weights). Enabling certain attributes is rather a question of taste and/or your collection (e.g. when you don't tend to set the year meta tag of your tracks, you can disable its evaluation in the calculations). The really interesting things are the weights. Why?

Well, they are currently not easy to set. Not because of a bad GUI, but because the result might not be what you expect. The plugin already has two solutions for this problem: first, it has a "background learning mode". If you give it regular feedback on how you rate its decisions, it can alter the weights to better reflect your personal taste. Second, you can use its "power learning mode", in which you take a kind of "quiz" once and the plugin then calculates initial weights from your answers. So far, so good.

This works so far; however, I think the playlists can be even better. Why? While the described system works, it is based internally on a random mechanism. Indeed, with the initial settings what you get is rather a random playlist! However, if you observe your own listening habits, you'll notice that humans tend to select tracks in other ways than by picking some random tracks that might somehow fit a given selection.
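To make the role of the weights a bit more tangible, here is a tiny Python sketch of a weighted similarity. It is not the plugin's real formula (the plugin is written in QtScript and its calculation is more involved); the exact-match comparison, the attribute names and the function name similarity are assumptions for this example.

```python
def similarity(track_a, track_b, weights):
    """Weighted share of matching attributes, between 0.0 and 1.0."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    matched = sum(
        weight for attr, weight in weights.items()
        if track_a.get(attr) is not None
        and track_a.get(attr) == track_b.get(attr)
    )
    return matched / total


# A disabled attribute is simply left out of the weights.
weights = {"artist": 3.0, "genre": 2.0, "year": 1.0}
first = {"artist": "X", "genre": "Rock", "year": 1999}
second = {"artist": "X", "genre": "Pop", "year": 1999}
score = similarity(first, second, weights)  # artist and year match: (3 + 1) / 6
```

Even in this toy version you can see the problem: whether a score of about 0.67 should count as "fits" depends entirely on how you chose the weights.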

You don't know what I mean? Here are some examples:

  • Some people (and I certainly belong to them) tend to drop whole albums into the playlist; and at least I don't want to shuffle the tracks but rather hear them as they were arranged on the album.
  • When selecting tracks, e.g. when you sit around with your friends and have a party, you might (again, like me) tend to limit the number of artists to a certain amount. This is not a conscious rule but happens "just somehow".

Currently, the plugin is not able to act this way. It acts - as already said - purely non-deterministically. So, what about a bit more determinism in the inference procedure? A cool idea would be:

  • Analyze the user's behavior, i.e. does he prefer to hear albums in a row or rather a totally shuffled mix? Also, it would be possible to use some statistics (i.e. which artists/genres does the user like most?)
  • Use the gathered information to improve the inference algorithm. So e.g. if the user tends to hear at most 5 different artists in a playlist, don't choose a new artist when this would exceed the limit. When the user likes to hear songs in a row, sometimes generate sequences of songs from one album. Additionally, the plugin could from time to time limit the tracks to choose from to the genres/artists/etc. the user likes most.

While using only such deterministic decisions in the algorithm would tend to generate "boring" lists, together with the non-deterministic part it might be really fun. Additionally, this would take away the burden of setting "correct" weights. Of course, more accurate weights improve the playlists further, but a bad decision by the non-deterministic part would (with some luck) be less bad than with a purely random algorithm (chances are the playlist would contain enough songs the user likes, so that he graciously overlooks a few wrong decisions).
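The artist limit from the second point could look roughly like this in Python. It is only a sketch of the idea, nothing of this exists in the plugin yet; the track layout (dictionaries with an "artist" key) and the name allowed_candidates are made up.

```python
def allowed_candidates(playlist, candidates, max_artists=5):
    """Drop candidates that would push the playlist over the artist limit."""
    artists = {track["artist"] for track in playlist}
    if len(artists) < max_artists:
        # There is still room for a new artist, so everything is allowed.
        return list(candidates)
    return [track for track in candidates if track["artist"] in artists]


playlist = [{"artist": "A"}, {"artist": "B"}]
candidates = [{"artist": "A"}, {"artist": "C"}]
strict = allowed_candidates(playlist, candidates, max_artists=2)
relaxed = allowed_candidates(playlist, candidates, max_artists=3)
```

The random part of the algorithm would then simply pick from the filtered list instead of from the whole collection.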

A further improvement would be to include an additional, logical inference procedure. What does this mean? Well, assume we already have some tracks in the playlist and now a further song shall be appended. The current (non-deterministic) procedure selects some random tracks and calculates a similarity value; if that value is high enough, the track will be selected. Now, how could a logical inference procedure work here? Easy: for example, we could define "rules" such as "For a random track: If the track's album and artist are in the current playlist, select the track. If the track's artist is in the playlist and its rating is at least 3 stars, select it. If ...".
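The two example rules could be written down like this in Python. Again just a sketch of the idea and not existing plugin code; the track fields (artist, album, rating) and all function names are assumptions.

```python
def same_album_and_artist(track, playlist):
    """Rule 1: the track's album and artist already occur in the playlist."""
    return any(t["artist"] == track["artist"] and t["album"] == track["album"]
               for t in playlist)


def known_artist_good_rating(track, playlist):
    """Rule 2: the artist is in the playlist and the rating is at least 3 stars."""
    return (track["rating"] >= 3
            and any(t["artist"] == track["artist"] for t in playlist))


RULES = [same_album_and_artist, known_artist_good_rating]


def should_select(track, playlist, rules=RULES):
    """A track is selected as soon as one rule fires."""
    return any(rule(track, playlist) for rule in rules)


playlist = [{"artist": "A", "album": "First", "rating": 4}]
same_album = {"artist": "A", "album": "First", "rating": 1}
other_album = {"artist": "A", "album": "Second", "rating": 2}
stranger = {"artist": "B", "album": "Other", "rating": 5}
picks = [should_select(t, playlist) for t in (same_album, other_album, stranger)]
```

Since the rules are plain functions in a list, adding, removing or reordering them is easy, which would fit a rule library that the user can configure.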

This possibility is another cool addition. However, we must also consider the user, thus the library of rules to use must be configurable by the user. Ideally, these rules are automatically generated by the plugin, or suggested to the user by observing his listening habits.

You see, there are enough possibilities to improve the plugin a bit ;)


Finally...

it's done!

When I started KI3U half a year ago, I just wanted to learn KDE/Qt4 programming. However: lately, developing it has become a passion 8)

First, the existing GUIs for rsync somehow didn't fit my taste - a good reason to stick with KI3U, even if I am the only one who will ever use it ;) And then, of course, there is the learning factor. The project started pretty simple: "I want to develop something for the KDE desktop environment". Something I hadn't done since the times when I used Windows and Delphi/Lazarus (of course, my wish back then was to develop Windows desktop apps).

When your PC thinks for you

As I am currently working on my AI course's project, I have a lot to do with QtScript resp. JavaScript. Generally a nice language, as it allows pretty fast development; however, I'm a fan of strongly typed languages. But no matter.

Generally, when working with it, it does what one expects. Well.. mostly.

What gave me a real headache lately: the plugin tended to produce some errors. Not reproducible and (much worse) with pretty senseless messages. Now, today I found out why:

On Vacation

... or at least that's what they call it ;)

As a student, I have been on holiday for a week now. However, as the company I'm working for will attend this year's MWC, I kept spinning for another week.

Alive...

Um... yeah... Just realized that my last (public) activities here were quite a while ago (over a month... time flies :( )

Well, nevertheless, I cannot complain that my life's boring:
Currently, I'm forced to use so many different programming/scripting languages that I'm slowly starting to mix them up (note: for ( var i = 0; i < list.count(); i++ ) { ... } is not valid C++ code ). But hey, pretty good training, and that way I can collect some good ideas and impressions e.g. for Langadia (which, by the way, will hopefully see some progress rather soon ;) ).

Copyright (c) RPdev 2008, 2009, 2010