Image resizing and upscaling could learn a few things from the advances made in voice and
text recognition.
One needs to combine multiple processing methods, with occasional human assistance along the
way. It's a dynamic, iterative process that still needs a human to occasionally answer the question, "Um,
we aren't sure what to do here...is the next letter/word/sound supposed to be [a, b, c, d...]?"
One can find limited examples of this in the application of filters in image-processing software: during processing, the software presents a matrix of possible results and asks the user to choose the one that looks most accurate.
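That candidate-matrix pattern can be sketched in a few lines. This is a minimal illustration, not real image-processing code: the `sharpen` and `denoise` filters are placeholder functions, and `choose` stands in for whatever UI actually shows the matrix to the user.

```python
# A minimal sketch of the candidate-selection pattern described above.
# The filters and image data are hypothetical placeholders.

def sharpen(img):   # placeholder "filter": brighten each pixel
    return [p + 1 for p in img]

def denoise(img):   # placeholder "filter": darken each pixel
    return [max(p - 1, 0) for p in img]

def present_candidates(image, filters, choose):
    """Apply each candidate filter, then let a human pick the best result.

    `choose` stands in for the UI step: it receives the list of
    (name, result) pairs and returns the index the user selected.
    """
    candidates = [(f.__name__, f(image)) for f in filters]
    picked = choose(candidates)
    return candidates[picked]

# Simulate a user who always picks the first candidate.
name, result = present_candidates([3, 5, 7], [sharpen, denoise],
                                  choose=lambda cands: 0)
print(name, result)  # sharpen [4, 6, 8]
```

The point is the shape of the loop, not the filters: every candidate is computed, and the human only supplies the final judgment.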
What we currently need is software that is good at identifying the key image details it needs human help
with. Ultimately, patterns, rules and dictionaries need to be built up, integrated and successfully
brought into play, just as has been done with voice and text.
A typical scenario will ask the user an acceptable number of questions during the processing phase,
en route to a result the user will find acceptable. The software will ask for help accurately
identifying details concerning pixels, shapes, objects or context, in relation to information
pre-identified with confidence from data embedded in the image itself, determined with confidence by the
processing software, or supplied by the user earlier in the processing.
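The core of that scenario is a confidence gate: keep the software's guess when it is sure, and route the ambiguous cases to the human. The sketch below assumes a detector that returns scored label guesses; the threshold value and the detector output format are illustrative assumptions, not from any real library.

```python
# A sketch of confidence-gated questioning, assuming a detector that
# returns competing (label, score) guesses for each detail.
ASK_THRESHOLD = 0.85  # assumed cutoff; below this we ask the human

def resolve(detections, ask_user):
    """Keep confident guesses; route uncertain ones to a human.

    `ask_user` stands in for the interactive prompt: given the
    competing labels, it returns the one the user confirms.
    """
    resolved = []
    for label_scores in detections:
        best_label, best_score = max(label_scores, key=lambda ls: ls[1])
        if best_score >= ASK_THRESHOLD:
            resolved.append(best_label)        # confident: no question
        else:
            resolved.append(ask_user([l for l, _ in label_scores]))
    return resolved

guesses = [
    [("E", 0.95), ("F", 0.05)],  # confident: accepted silently
    [("E", 0.55), ("F", 0.45)],  # ambiguous: the human decides
]
print(resolve(guesses, ask_user=lambda opts: opts[-1]))  # ['E', 'F']
```

Tuning the threshold is exactly the "acceptable number of questions" trade-off: raise it and the user is asked more often; lower it and the software guesses more on its own.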
It would be great if image processing could magically figure out what we were looking at and produce
sharper, clearer images on its own. However, we are not yet at the point where developers have identified
sufficient patterns, rules and logic for that to happen.
I'm not sure it can ever occur in one monolithic stand-alone package with no input from anyone, or even
from another system in real time. For example, the processing software could, with the help of a person
if necessary, conclude that a part of the image being processed includes a license plate. However, when
determining whether one of the letters is an "E" or an "F", the processing software may have to leverage
external, region-specific online databases.
For example, I assume that the photo-recognition software used by well-equipped law-enforcement agencies can make out the letters on a license plate from a video still. It would be a lot easier to do this if it dynamically submitted some pre-determined attributes of the image: where and when the image might have been taken, the type
of car, the state where the license plate was issued (via color or VIN, etc.). Logic pre-compiled or accessed in real
time from individual US states would then be consulted to determine the most likely letter-number combination for the plate in the photo. This would be very similar to how credit-card transactions are verified online.
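The "E" vs "F" lookup described above can be sketched as a query against a state-keyed registry. Everything here is invented for illustration: the registry contents, the state key, and the `'?'` convention for the undecided character; a real system would call an actual state database rather than a local dictionary.

```python
# A minimal sketch of resolving an ambiguous character against an
# external, region-specific database. The registry contents and state
# key below are invented for illustration.
REGISTERED_PLATES = {"NY": {"EBC1234", "XYZ9876"}}

def disambiguate(partial, candidates, state, registry):
    """Try each candidate character; keep readings the registry knows.

    `partial` contains a '?' where recognition could not decide between
    the `candidates` (e.g. "E" vs "F").
    """
    readings = [partial.replace("?", c, 1) for c in candidates]
    return [r for r in readings if r in registry.get(state, set())]

print(disambiguate("?BC1234", ["E", "F"], "NY", REGISTERED_PLATES))
# ['EBC1234'] -- only the "E" reading matches a registered plate
```

As with credit-card verification, the local software never has to be certain on its own; it only has to narrow the candidates enough for the authoritative remote source to settle the question.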