I'm planning a very similar upgrade from an RX 580 4GB, but I want at least 16 GB of VRAM and I need the PCIe slot next to the card. Even with a riser, the maximum thickness that fits is 45 mm, which leaves a second-hand RX 6800 reference model as my only AMD option: all the partner models have larger coolers taking around 2.5 slots (50 mm), and the lower models have less VRAM.
So I'm also glad I checked the dimensions of new cards before buying, but in my case the result wasn't that positive. :)
Hopefully the RX 7800 XT stays under 250 W (like the RX 6800) so there is a chance for a current-gen 2-slot alternative. (The 7900 GRE is 260 W with 80 CU, so I would hope the 7800 XT with 60 CU draws less power, even if they push the clocks a bit higher... Can't wait for a more detailed leak that includes TBP. :) )
ROCm is basically AMD's answer to CUDA, just (as usual) more open, less polished, and harder to use. Using something called HIP, a CUDA application can be translated to work with ROCm instead (and therefore run on AMD cards without a complete rewrite of the app).
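To give a rough feel for what that translation involves: much of the CUDA runtime API has a HIP counterpart with the same shape and a `hip` prefix, so a large part of porting is mechanical renaming. The Python sketch below is only a toy illustration of that idea, not how AMD's actual hipify tools are implemented. The name pairs in the table are real CUDA/HIP runtime names, but `toy_hipify` and the sample snippet (`d_x`, `h_x`, `scale`) are made up for the example.

```python
import re

# Small, hand-picked subset of CUDA -> HIP runtime name pairs.
# The real tools cover far more of the API than this toy table.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# Match whole identifiers only; longer names are listed first in the alternation.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def toy_hipify(source: str) -> str:
    """Hypothetical helper: rename the known CUDA identifiers to their HIP names."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


# Made-up host-side fragment, used purely as input text for the renaming.
cuda_snippet = """#include <cuda_runtime.h>
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
scale<<<blocks, threads>>>(d_x, n);
cudaDeviceSynchronize();
cudaFree(d_x);
"""

print(toy_hipify(cuda_snippet))
```

The kernel launch line is left untouched because HIP's compiler accepts the same `<<<...>>>` launch syntax. Real code usually needs more than renaming (library calls, inline PTX, warp-size assumptions), which is where the "harder to use" part tends to show up.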
AFAIK they started working on it 6 or 7 years ago as the replacement for OpenCL. Not sure why exactly, but OpenCL apparently wasn't getting enough traction (and I think Blender even recently dropped OpenCL support).
After all this time, the HW support is still spotty (mostly only supporting the Radeon Pro cards, and still having no proper support for RDNA3, I think), and the SW support focuses mainly on Linux (and only three blessed distros, Ubuntu, RHEL and SUSE, get official packages, so it can be a pain to install anywhere else due to missing or conflicting dependencies).
So ROCm basically does work, and keeps getting better, but nVidia clearly has a larger SW dev team that makes the CUDA experience much more polished and painless.