u/ZoronicElysium2012

Hi everyone! If you haven't seen my original RDNU post in this thread, go check it out first. (Search RDNU in the thread. I'm too lazy to link it).

Results are in! Training completed today and I ran some preliminary tests. Test results were gathered using the M3VIR dataset (search HuggingFace). Keep in mind that the model has not been quantized to INT8 yet, so these are FP16 results. I will make FP16 and INT8 models available.

The model has been trained using RAFT optical flow inference to compute motion vectors on input images. The results are fair, but relying on estimated motion vectors will lead to ghosting during use. Estimating motion vectors through an optical flow model is great for preliminary training, but when moving objects in an image are occluded (think of background and foreground objects intersecting on screen), the estimate assumes the objects keep moving along instead of recognizing that they are occluded and that no motion vectors should be calculated. I will address this through a round of fine-tuning with a dataset I am creating that includes ground truth motion vectors and G-Buffer data exported from UE5.
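For anyone wondering how occluded pixels can be spotted in estimated flow at all, one common heuristic is a forward-backward consistency check: follow the forward flow to the next frame, read the backward flow there, and flag the pixel if the two do not cancel out. The sketch below is only an illustration of that heuristic, not part of the RDNU pipeline. The function name `occlusion_mask`, the nested-list flow layout, and the `tol` threshold are all assumptions made for the example.

```python
def occlusion_mask(fwd, bwd, tol=0.5):
    """Flag pixels whose estimated motion is unreliable.

    fwd[y][x] is the (dx, dy) flow from frame t to t+1, in pixels.
    bwd[y][x] is the (dx, dy) flow from frame t+1 back to t.
    Returns mask[y][x] == True where the flow should not be trusted.
    (Hypothetical helper for illustration only.)
    """
    h, w = len(fwd), len(fwd[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = fwd[y][x]
            # Where does this pixel land in the next frame (nearest neighbour)?
            tx, ty = round(x + dx), round(y + dy)
            if not (0 <= tx < w and 0 <= ty < h):
                mask[y][x] = True  # moved off-screen, nothing to compare against
                continue
            bx, by = bwd[ty][tx]
            # If the motion is consistent, the backward flow undoes the forward flow.
            err = ((dx + bx) ** 2 + (dy + by) ** 2) ** 0.5
            mask[y][x] = err > tol
    return mask
```

Pixels flagged by a mask like this could be excluded from temporal accumulation or from a temporal loss term, which is one way to limit ghosting until ground truth motion vectors are available.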

These are the results of 300,000 iterations, 4x upscaling, training on 96x96 to 384x384.

Test #1:

Input (96x96)

Upscaled (384x384)

Ground truth (384x384)

Test #2:

Input (960x540)

Upscaled (3840x2160)

Ground Truth (1920x1080)

Comparison (Input, Output, Ground Truth)

Test #3:

Input (960x540)

Upscaled (3840x2160)

Ground Truth (1920x1080)

Comparison (Input, Output, Ground Truth)

Test #4:

Input (960x540)

Upscaled (3840x2160)

Ground Truth (1920x1080)

Comparison (Input, Output, Ground Truth)

My Thoughts:

Straight lines look good, foliage looks alright, some textures are blurred, and rounded objects and patterns are not handled well. From the full test results, motion looks quite good with little artifacting from frame to frame. Halton jitter compensation was applied to the input data prior to upscaling to mimic the TSR jittered frames provided by the game engine, and I don't see any glaring issues resulting from this, so the model should handle this well and, if nothing else, perform as a good anti-aliasing implementation at this stage. The 4x upscaling training also causes very noticeable over-smoothing of most geometry, resulting in a softened, blurred image. This can be effectively addressed through more training iterations with more realistic scaling factors.
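For context on the jitter part: temporal upscalers usually take their sub-pixel jitter from a Halton sequence with bases 2 and 3, one offset per frame. Here is a minimal sketch of generating those offsets. The function names and the choice of 8 offsets in the usage below are assumptions for the example, and the post does not say which sequence length or compensation method was actually used.

```python
def halton(index, base):
    """Return the index-th element (1-based) of the Halton sequence for a base."""
    result, fraction = 0.0, 1.0
    while index > 0:
        fraction /= base
        result += fraction * (index % base)
        index //= base
    return result


def jitter_offsets(count):
    """Sub-pixel (x, y) offsets in the open interval (-0.5, 0.5), in pixel units.

    Bases 2 and 3 are the usual pairing for 2D jitter. (Hypothetical helper.)
    """
    return [(halton(i, 2) - 0.5, halton(i, 3) - 0.5) for i in range(1, count + 1)]


if __name__ == "__main__":
    for frame, (jx, jy) in enumerate(jitter_offsets(8)):
        print(f"frame {frame}: jitter x={jx:+.4f} y={jy:+.4f}")
```

Compensating for the jitter then means accounting for each frame's known offset before frames are compared or accumulated, so that the jitter itself is not mistaken for motion.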

More training is needed, especially to account for lower scaling factors (720p to 1440p, 1080p to 4K), which provide more detail to be upscaled. Over the next few days, I will populate my GitHub repo with submodules for the training data, training recipes, and the training pipeline, as well as the integration scaffolding. I may also quantize the current model to INT8 and provide some kind of test release simply as a proof of concept to confirm my implementation targeting RDNA3 hardware.

I have also tweaked the training pipeline to account for different scaling factors, created a rendering pipeline through UE5 using some high quality free scenes to get accurate motion vectors, and am running another 600,000 iterations (should take about a week) to see if we can lower the pixel loss. For these results, the model showed a pixel loss between 0.012 and 0.026 and a consistency loss between 0.001 and 0.004.
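The post does not define the two loss terms, so the following is only a guess at what numbers like these typically measure. It assumes the pixel loss is a mean absolute error against the ground truth and the consistency loss is a mean absolute error against the previous output frame after it has been warped to the current frame. The function names, the flat-list image layout, and `consistency_weight` are all hypothetical.

```python
def l1_loss(a, b):
    """Mean absolute difference between two equally sized lists of pixel values."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def upscaler_losses(pred, target, prev_pred_warped, consistency_weight=1.0):
    """Return (pixel, consistency, total) for one frame.

    pred:             current upscaled frame
    target:           ground truth frame at the same resolution
    prev_pred_warped: previous upscaled frame, assumed already warped to the
                      current frame using the motion vectors
    """
    pixel = l1_loss(pred, target)
    consistency = l1_loss(pred, prev_pred_warped)
    return pixel, consistency, pixel + consistency_weight * consistency
```

Under that reading, and assuming pixel values normalized to the 0 to 1 range, a pixel loss of 0.012 would mean the output is off by a bit over 1% of the value range per pixel on average.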

Across all test results, performance was pretty consistent. Straight lines are regularly smoothed effectively, with most geometry being noticeably better than the input but noticeably worse than the ground truth.

Note:

If the output is not considerably improved after more training, some replies on my previous post suggested different models with better architectures to look into. I believe I am currently quite far from the theoretical performance ceiling for the RDG model I am using, but a different model may prove effective if I hit a plateau.

Since AMD is either dragging their feet or straight up not bringing FSR4 to RDNA3, I've decided to take matters into my own hands to improve 4K performance on my RX7900XTX.

Background:

FSR versions prior to FSR4, unlike DLSS and FSR4, do not actually use any of the AI acceleration hardware (WMMA - Wave Matrix Multiply Accumulate) in RDNA3 cards. FSR 3.1 and XeSS also suck and I'm tired of it. I'm a big fan of Gray Zone Warfare and am disappointed in the performance I get, and I don't want to turn graphics settings down, given that the visuals are one of the main draws of the game. It seems like there are no existing solutions that utilize this hardware on RDNA3 either, so I've decided to make one.

I own a small company with some considerable AI hardware and already do a lot of AI training and fine-tuning work, so I've decided to use that hardware and do sort of a "marketing" thing that also improves my gaming experience. Best part is, I'll be open sourcing all of it.

Goal:

Create an AI upscaler targeting RDNA3 hardware (but keeping it as platform agnostic as possible) that utilizes every drop of AI acceleration and uses the same interface as DLSS. Initially, I intend to use Optiscaler to inject this into games that already work with DLSS and hijack the DLSS calls. The same information (motion vectors, G-Buffer, etc.) will be fed to a small VSR (Video Super Resolution) model that targets RDNA3 WMMA hardware. Going through Optiscaler will of course piss off anti-cheat, so unfortunately, no anti-cheat enabled multiplayer games at first. But if this works out well, and given the open nature, it may hopefully be implemented in games officially.

Benefits:

AI upscaling will allow unused hardware on RDNA3 cards to be fully taken advantage of, so better performance than FSR's simple shader-based temporal upscaling. It also provides much higher quality output. The main difference between DLSS and FSR<4 is literally AI and the quality difference is apparent.

Status:

I'm currently training a VSR model through RDG with the same data DLSS takes from engines. Optiscaler already provides a framework for upscaler injection, so I plan to fork it and use my model once it's trained. I'll post the complete technical breakdown on my GitHub page if you're interested. The model will target FP16.

Please let me know if you're interested in this.

Edit: Please excuse the typos and/or dog shit grammar. I need to sleep. It's been a long night of studying and optimizing PyTorch scripts. What asshole decided to use Python for the industry standard ML framework??

Edit: I just did some double checking and I wanna thank you guys for your comments regarding INT8 vs FP8 and DP4A support on RDNA3. INT8 is supported and I'll consider it for performance. RDNA3 does not support FP8 and in my sleepless-zombie state last night, I confused the two. Please correct me if I'm wrong and I'd highly appreciate sources where I can read more about such topics.
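For readers who haven't looked at what going from FP16 to INT8 involves, here is a minimal sketch of symmetric per-tensor quantization, the simplest common scheme. It is an illustration only: the post does not say which quantization method or tooling will be used, and the function names are made up for the example.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] with a single shared scale.

    Returns (quantized_values, scale). Symmetric per-tensor scheme:
    the largest magnitude in the tensor maps to 127.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.3, -0.7, 0.05]
    q, s = quantize_int8(weights)
    print(q, s, dequantize(q, s))
```

Each value is stored with a rounding error of at most half the scale, which is why INT8 models are normally re-tested against the FP16 results before release.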

RDNU - Pre-Training Complete! Upscale Results

RDNU - Radeon Decoupled Neural Upscaler