How can the information content of multiple images be combined to enhance the image of the subject?
Although a single image of a subject may not contain satisfactory detail, other images of the same subject may contain additional information, allowing us to construct an image with reduced noise or better resolution. I restrict this study to sequences of images of a static subject taken from similar perspectives.
Given such a sequence of images, the first challenge is aligning them. Assuming the pinhole camera model, the differences in perspective between the images are described by projective transformations on homogeneous coordinates. In general, a projective transformation is uniquely specified by 8 real parameters and can map any non-degenerate quadrilateral in the plane to any other. (Spoiler: in my analysis, I found that the quality of the image enhancement is dominated by errors in estimating this alignment.)
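To make the 8-parameter claim concrete, here is a minimal sketch (not the code used in this study) that recovers a projective transformation from four point correspondences, that is, from one quadrilateral and its image. It assumes the common normalization in which the bottom-right entry of the 3x3 matrix is fixed to 1, which leaves 8 unknowns and fails only in the special case where that entry would be zero. The function names `solve_linear`, `homography_from_quads` and `apply_homography`, and the example coordinates, are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix of floats so the caller's lists are not modified.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Use the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def homography_from_quads(src, dst):
    """Return the 3x3 matrix H (with H[2][2] = 1) mapping four src points to four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the 8 unknowns.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, point):
    """Apply H to a 2D point using homogeneous coordinates, then divide by w."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)


# Example: map the unit square onto an arbitrary (made-up) quadrilateral.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
quad = [(0.0, 0.0), (2.0, 0.1), (1.8, 1.5), (-0.2, 1.2)]
H = homography_from_quads(square, quad)
for corner in square:
    print(corner, "->", apply_homography(H, corner))
```

Four correspondences give exactly eight equations, which is why four points in general position determine the transformation. With more (noisy) matches the system becomes overdetermined and has to be fitted robustly instead.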
To estimate the alignment transformations, feature points in each frame were detected using Oriented FAST and Rotated BRIEF (ORB) and then matched between frames. The transformations were then estimated from these matches using random sample consensus (RANSAC), which fits the model while rejecting mismatched points, so that they were determined to a high confidence.
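The consensus idea behind RANSAC can be sketched in a few lines. To keep the example short, the sketch below deliberately swaps in a much simpler model than the one used in this study: a pure 2-parameter translation instead of the 8-parameter projective transformation, so a single match is a minimal sample. The function name `ransac_translation`, the threshold, the iteration count and the synthetic matches are all assumptions made for illustration.

```python
import math
import random


def ransac_translation(matches, threshold=1.0, iterations=200, seed=0):
    """Estimate a translation (dx, dy) from putative matches containing outliers.

    matches: list of ((x, y), (u, v)) pairs, where (u, v) is the claimed
    position in the second frame of the feature at (x, y) in the first.
    """
    if not matches:
        raise ValueError("need at least one match")
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: one match fully determines a translation.
        (x, y), (u, v) = rng.choice(matches)
        dx, dy = u - x, v - y
        # Consensus set: matches whose residual under this model is small.
        inliers = [
            m for m in matches
            if math.hypot(m[1][0] - m[0][0] - dx, m[1][1] - m[0][1] - dy) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the largest consensus set; for a translation the
    # least-squares solution is the mean displacement.
    n = len(best_inliers)
    dx = sum(u - x for (x, y), (u, v) in best_inliers) / n
    dy = sum(v - y for (x, y), (u, v) in best_inliers) / n
    return (dx, dy), best_inliers


# Synthetic data: 40 correct matches with small jitter plus 15 random mismatches.
rng = random.Random(1)
true_shift = (3.25, -1.5)
matches = []
for _ in range(40):
    x, y = rng.uniform(0, 100), rng.uniform(0, 100)
    matches.append(((x, y), (x + true_shift[0] + rng.uniform(-0.3, 0.3),
                            y + true_shift[1] + rng.uniform(-0.3, 0.3))))
for _ in range(15):
    matches.append(((rng.uniform(0, 100), rng.uniform(0, 100)),
                    (rng.uniform(0, 100), rng.uniform(0, 100))))

shift, inliers = ransac_translation(matches)
print("estimated shift:", shift, "inliers:", len(inliers), "of", len(matches))
```

For a projective transformation the structure is the same, except that each minimal sample consists of four matches and the residual is the reprojection error under the fitted transformation.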
Because the transformations are continuous, aligning the frames generally shifts them by non-integer numbers of pixels and therefore produces sub-pixel detail. The images were first upscaled so that this detail could become visible. Once the images were aligned, the frames were simply averaged to produce the enhanced image.
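The averaging step on its own can be illustrated with a small sketch. It assumes the frames are already perfectly aligned and that the noise is independent, zero-mean Gaussian noise, neither of which is guaranteed for real footage. The image is represented as a flat list of pixel values, and the names `average_frames` and `rmse`, the test signal and the noise level are hypothetical.

```python
import math
import random


def average_frames(frames):
    """Pixel-wise mean of equally sized, already aligned frames (flat lists of floats)."""
    n = len(frames)
    return [sum(pixels) / n for pixels in zip(*frames)]


def rmse(image, reference):
    """Root-mean-square error between an image and a reference of the same size."""
    total = sum((a - b) ** 2 for a, b in zip(image, reference))
    return math.sqrt(total / len(reference))


# Synthetic "true" image and 16 noisy observations of it.
rng = random.Random(0)
truth = [0.5 + 0.5 * math.sin(i / 20.0) for i in range(2000)]
sigma = 0.1  # assumed noise standard deviation
frames = [[p + rng.gauss(0.0, sigma) for p in truth] for _ in range(16)]

stacked = average_frames(frames)
print("single-frame RMSE:    ", rmse(frames[0], truth))
print("16-frame average RMSE:", rmse(stacked, truth))
```

Under these assumptions, averaging N frames reduces the noise standard deviation by a factor of about the square root of N, so 16 frames give roughly a fourfold reduction. The sketch says nothing about resolution, since every frame here samples the same pixel grid.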
The results suggest that although this approach is effective at de-noising, it is ultimately not very effective at increasing resolution, because the automatically estimated alignments are not precise enough. This outcome is probably also understandable because the frames contain mostly redundant information, so each additional frame yields diminishing returns in information content.
Resources: