
RDNU - Pre-Training Complete! Upscale Results
Hi everyone! If you haven't seen my original RDNU post in this thread, go check it out first. (Search RDNU in the thread. I'm too lazy to link it).
Results are in! Training completed today and I ran some preliminary tests. Test results were gathered using the M3VIR dataset (search HuggingFace). Keep in mind that the model has not been quantized to INT8 yet, so these are FP16 results. I will make FP16 and INT8 models available.
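For anyone wondering what the FP16 to INT8 step involves, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is not the quantization flow planned for RDNU. The function names and the single per-tensor scale are assumptions for illustration, and real toolchains usually add calibration data and per-channel scales.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale (illustrative only)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Avoid dividing by zero when every weight is 0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    codes, scale = quantize_int8([-1.0, 0.0, 0.25, 1.0])
    print(codes, scale)
    print(dequantize_int8(codes, scale))
```

The round trip loses at most half a quantization step per value, which is why INT8 results can differ slightly from the FP16 results shown here.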
The model has been trained using RAFT optical flow inference to compute motion vectors on the input images. The results are fair, but this will lead to ghosting during use. Estimating motion vectors through an optical flow model is great for preliminary training, but it breaks down when moving objects are occluded (think of background and foreground objects intersecting on screen). The estimate assumes the hidden object keeps moving along its path instead of recognizing that it is occluded and that no motion vector should be calculated for it. I will address this through a round of fine-tuning with a dataset I am creating that includes ground truth motion vectors and G-Buffer data exported from UE5.
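To make the occlusion problem concrete, here is a rough sketch of a forward-backward consistency check, a common heuristic for flagging pixels whose estimated flow cannot be trusted. It is not part of the RDNU pipeline (the plan above is ground truth vectors from UE5). The `Flow` layout, the function name, and the 1-pixel tolerance are all assumptions for illustration.

```python
# flow[y][x] = (dx, dy) displacement in pixels (hypothetical layout)
Flow = list[list[tuple[float, float]]]


def occlusion_mask(forward: Flow, backward: Flow, tol: float = 1.0) -> list[list[bool]]:
    """Return True where the forward and backward flows disagree (likely occluded)."""
    height, width = len(forward), len(forward[0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = forward[y][x]
            # Where this pixel lands in the next frame (nearest neighbour).
            tx, ty = round(x + dx), round(y + dy)
            if not (0 <= tx < width and 0 <= ty < height):
                row.append(True)  # left the frame, so no reliable vector
                continue
            bx, by = backward[ty][tx]
            # For a visible pixel the backward vector should cancel the forward one.
            error = ((dx + bx) ** 2 + (dy + by) ** 2) ** 0.5
            row.append(error > tol)
        mask.append(row)
    return mask


if __name__ == "__main__":
    fwd = [[(1.0, 0.0)] * 4]
    bwd = [[(-1.0, 0.0), (-1.0, 0.0), (2.0, 0.0), (-1.0, 0.0)]]
    print(occlusion_mask(fwd, bwd))
```

Pixels flagged by a check like this are the ones where an optical flow estimate tends to produce ghosting, because the vector points at something that is no longer visible.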
These are the results of 300,000 iterations, 4x upscaling, training on 96x96 to 384x384.
Test #1:
Comparison (Input, Output, Ground Truth)
Test #2:
Comparison (Input, Output, Ground Truth)
Test #3:
Comparison (Input, Output, Ground Truth)
Test #4:
Comparison (Input, Output, Ground Truth)
My Thoughts:
Straight lines look good, foliage looks alright, some textures are blurred, and rounded objects and patterns are not handled well. From the full test results, motion looks quite good with little artifacting from frame to frame. Halton jitter compensation was applied to the input data prior to upscaling to mimic the TSR jittered frames provided by the game engine. I don't see any glaring issues resulting from this, so the model should handle it well and, if nothing else, perform as a good anti-aliasing implementation at this stage. The 4x upscaling training also causes very noticeable over-smoothing of most geometry, resulting in a softened, blurred image. This can be effectively addressed through more training iterations with more realistic scaling factors.
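For reference, this is roughly how a Halton jitter pattern is generated. It is a minimal sketch, not the exact code from my pipeline: bases 2 and 3 are the usual choice for 2D jitter and are assumed here, and the function names and the 8-sample cycle in the demo are illustrative.

```python
def halton(index: int, base: int) -> float:
    """Return element `index` (1-based) of the Halton sequence for `base`, in [0, 1)."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        # Mirror the base-`base` digits of the index around the radix point.
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def jitter_offsets(count: int) -> list[tuple[float, float]]:
    """Sub-pixel (x, y) offsets in [-0.5, 0.5), one per frame (bases 2 and 3 assumed)."""
    return [(halton(i, 2) - 0.5, halton(i, 3) - 0.5) for i in range(1, count + 1)]


if __name__ == "__main__":
    for frame, (jx, jy) in enumerate(jitter_offsets(8)):
        print(f"frame {frame}: jitter = ({jx:+.4f}, {jy:+.4f}) px")
```

Each frame is rendered with its offset applied, and compensation means shifting the samples back by the same offset so that consecutive frames line up before they are fed to the model.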
More training is needed, especially to account for lower scaling factors (720p to 1440p and 1080p to 4K, both 2x), which give the model more input detail to work with. Over the next few days, I will populate my GitHub repo with submodules for the training data, training recipes, and the training pipeline, as well as the integration scaffolding. I may also quantize the current model to INT8 and provide some kind of test release simply as a proof of concept to confirm my implementation targeting RDNA3 hardware.
I have also tweaked the training pipeline to account for different scaling factors and created a rendering pipeline in UE5, using some high quality free scenes, to get accurate motion vectors. I am now running another 600,000 iterations (should take about a week) to see if we can lower the pixel loss. For these results, the model showed a pixel loss between 0.012 and 0.026 and a consistency loss between 0.001 and 0.004.
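To give a feel for what those two numbers measure, here is a hedged sketch. It assumes the pixel loss is a mean absolute error against the ground truth and the consistency loss is a mean absolute error against the previous output warped by the motion vectors. Those definitions and the function names are assumptions for illustration and may not match the actual training recipe.

```python
def l1_loss(a: list[float], b: list[float]) -> float:
    """Mean absolute difference between two equally sized, flattened images."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def training_losses(
    output: list[float],
    target: list[float],
    warped_prev_output: list[float],
) -> tuple[float, float]:
    """Return (pixel_loss, consistency_loss) under the assumed definitions."""
    pixel_loss = l1_loss(output, target)
    # Assumed: the previous output has already been warped into the current frame.
    consistency_loss = l1_loss(output, warped_prev_output)
    return pixel_loss, consistency_loss


if __name__ == "__main__":
    out = [0.5, 0.5, 0.5, 0.5]
    gt = [0.5, 0.5, 0.5, 0.75]
    prev = [0.5, 0.5, 0.5, 0.5]
    print(training_losses(out, gt, prev))
```

With pixel values normalized to [0, 1], a loss under this definition is the average per-pixel error as a fraction of the full brightness range.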
Across all test results, performance was pretty consistent. Straight lines are regularly smoothed effectively, with most geometry being noticeably better than the input but noticeably worse than the ground truth.
Note:
If the output is not considerably improved after further training, I will look into the different models with better architectures that some replies on my previous post suggested. I believe I am currently quite far from the theoretical performance ceiling for the RDG model I am using, but a different model may prove effective if I hit a plateau.