GPU_programming

A programming Lemmy instance focused on GPUs. CUDA, OpenCL, ROCm, DirectX, and Vulkan are all on topic here.

The site is currently getting the Hacker News hug of death, but hopefully the traffic subsides in a few days. From what I could load, it looks like a good article.

Archive.org does have a mirror:

https://web.archive.org/web/20240606103630/https://edw.is/learning-vulkan/

Sorry for the spam; I just did a bit of research and decided I needed to save at least these DirectX 12 links and read up on them later.

Just searching for stuff, and this came up. I figured it was worth saving here.

Looks like the update's big feature is "Work Graphs", which let shaders generate and enqueue work for other shaders on the GPU itself rather than round-tripping through the CPU.
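
Not DirectX, but the closest CUDA analogue I can offer for the idea: with dynamic parallelism, a kernel running on the GPU can launch more GPU work itself, with no CPU round trip. Work Graphs generalize this into typed producer/consumer shader nodes. A minimal sketch (hypothetical kernel names; build with -rdc=true on compute capability 3.5+):

```
#include <cstdio>

// Child grid: processes one work item.
__global__ void childKernel(int item)
{
    printf("processing item %d on thread %d\n", item, threadIdx.x);
}

// Parent grid: each thread that finds pending work launches a child
// grid from device code -- GPU-side work generation, no CPU involved.
__global__ void parentKernel(const int* work, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && work[i] > 0)
        childKernel<<<1, 32>>>(work[i]);
}
```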

Someone wanted a more portable printf implementation, so they created one for themselves.

I'll be giving this article a good look over for sure.
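
For context (my framing, not necessarily the article's): CUDA has long shipped a device-side printf, so a kernel can simply log to the host; the itch a portable implementation scratches is that not every GPU API has an equivalent. The CUDA built-in, for comparison:

```
#include <cstdio>

__global__ void debugKernel(const float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Device-side printf output is buffered on the GPU and flushed to
    // the host at sync points such as cudaDeviceSynchronize().
    if (i < n && i < 4)
        printf("thread %d: data[%d] = %f\n", i, i, data[i]);
}
```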

Saving this .pdf here.

The relational join operator is both memory-intensive and computationally intensive. Though real-life databases can be in the TB range, there are a number of applications for smaller, in-memory databases that could feasibly fit in the 4GB or 8GB of VRAM on smaller GPUs.

It's well known that relational joins (and joins of joins) can be parallelized. Database engines run meticulous planning algorithms to optimize this important operation and parallelize it across cores or even across systems. Seeing research into such a natural GPU application warms my heart, at least!

GPUs are well known to parallelize and speed up sorting (see highly parallel schemes like Bitonic Sort, but also GPU-specific / SIMD-oriented algorithms like MergePath). One of the most common ways to perform a relational join is to sort both relations on the join key, then linearly scan through both, matching rows up (left.blah == right.blah). This paper seems to take that sort-merge approach and measures how well GPUs handle it, at least for data that fits in GPU RAM.
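
A toy sketch of that sort-then-match idea in CUDA/Thrust (my own illustration, not the paper's code; it only counts the output pairs, where a real join would also materialize them):

```
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/binary_search.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <cstdio>

int main()
{
    int hl[] = {3, 1, 7, 5};          // left.blah
    int hr[] = {5, 3, 3, 9, 1};       // right.blah
    thrust::device_vector<int> left(hl, hl + 4);
    thrust::device_vector<int> right(hr, hr + 5);

    // Sort the right relation on the join key.
    thrust::sort(right.begin(), right.end());

    // For every left key, binary-search the range of equal right keys.
    thrust::device_vector<int> lo(left.size()), hi(left.size());
    thrust::lower_bound(right.begin(), right.end(),
                        left.begin(), left.end(), lo.begin());
    thrust::upper_bound(right.begin(), right.end(),
                        left.begin(), left.end(), hi.begin());

    // Join cardinality = sum of per-key match counts (expect 4 here).
    thrust::device_vector<int> counts(left.size());
    thrust::transform(hi.begin(), hi.end(), lo.begin(),
                      counts.begin(), thrust::minus<int>());
    printf("join produces %d pairs\n",
           thrust::reduce(counts.begin(), counts.end()));
    return 0;
}
```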

There's also "Hash-Join", which is investigated in this paper as well.

The abstract stood out to me, and I like dabbling in 3SAT stuff at a hobby level.

The gist is that the researchers use the TensorCores / FP16 matrix-multiplication routines found in neural-network chips and instructions to search for MaxSAT solutions. (MaxSAT is the optimization cousin of 3SAT: rather than satisfying every clause, you try to maximize the number of satisfied clauses. I'll be reading more about this...)
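
I don't know this paper's exact encoding yet, but here's my own toy illustration of why clause evaluation maps onto matrix multiplication at all: encode clauses and candidate assignments as +/-1 matrices, and one GEMM scores every clause against many assignments at once, which is exactly the batched shape tensor cores are built for.

```
// C: m x n clause matrix (+1 positive literal, -1 negated, 0 absent).
// X: n x k matrix of k candidate assignments (+1 true, -1 false).
// For a 3-literal clause, dot(C_row, X_col) = 2*(satisfied literals) - 3,
// so the clause is satisfied iff the product exceeds -3.
__global__ void scoreAssignments(const float* C, const float* X,
                                 int m, int n, int k, int* satisfied)
{
    int clause = blockIdx.y * blockDim.y + threadIdx.y;
    int assign = blockIdx.x * blockDim.x + threadIdx.x;
    if (clause >= m || assign >= k) return;

    float dot = 0.0f;                     // naive GEMM inner product; the
    for (int v = 0; v < n; ++v)           // paper would push this through
        dot += C[clause * n + v] * X[v * k + assign]; // FP16 tensor cores
    if (dot > -3.0f)                      // clause satisfied under X_col
        atomicAdd(&satisfied[assign], 1); // satisfied[] zeroed by host
}
```

The assignment with the highest count is the running MaxSAT candidate; how the paper actually generates and refines candidates is the part I still need to read.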

Found this reference online, figured I'd save it here. Looks like an excellent introduction to fragment/pixel shaders.

Seems like a personal project that is basically a homegrown version of Intel's "ispc" tool. Still, a 2nd or 3rd programming language of this nature isn't a bad thing; if anything, we need more ideas and more implementations to figure out how best to map GPU-style programming onto AVX512 and other CPU SIMD instruction sets.

A solid example of how to perform performance analysis on modern GPUs and video games.

DirectX programmers (and GPU programmers of all kinds) should probably use this article as a template for thinking about GPU performance in relation to the greater task.

You search for the "slowest shader", then analyze it and its parallelism to see if it's adequate, then home in and perform deeper analysis. A programmer has to go one step further and actually think of a solution / improvement (rather than "just" benchmarking).

But this style of analysis is very useful and helpful in the optimization process.

StreamHPC here goes over some concepts to solve N-Queens on a GPU.

N-Queens is a classic "homework problem" for traditional AI courses (back when search algorithms were considered AI, at least). As usual, the GPU version needs a couple of changes if you want it to run as fast as possible.
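
A CUDA-flavored sketch of the usual strategy (my own, not StreamHPC's code): expand the first few rows into partial boards on the host, then let each GPU thread finish one subtree with bitmask backtracking.

```
__device__ unsigned long long finishBoard(int n, unsigned cols,
                                          unsigned ldiag, unsigned rdiag)
{
    unsigned all = (1u << n) - 1;
    if (cols == all) return 1;        // every row placed: one solution
    unsigned long long count = 0;
    // Squares on this row not attacked by a column or either diagonal.
    unsigned avail = all & ~(cols | ldiag | rdiag);
    while (avail) {
        unsigned bit = avail & (0u - avail);   // lowest open square
        avail ^= bit;
        count += finishBoard(n, cols | bit,
                             (ldiag | bit) << 1,  // diagonal masks shift
                             (rdiag | bit) >> 1); // as we move down a row
    }
    return count;
}

// One thread per host-generated partial board; a reduction over
// counts[] afterwards gives the total number of solutions.
__global__ void nqueensKernel(int n, const unsigned* cols, const unsigned* ld,
                              const unsigned* rd, int numPartials,
                              unsigned long long* counts)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numPartials)
        counts[i] = finishBoard(n, cols[i], ld[i], rd[i]);
}
```

Device-side recursion works but diverges heavily between threads; generating enough partial boards on the host is what keeps the GPU saturated.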
