2023
Question 1
- NVCC
- Thrust
omp_get_thread_num
- block, shared
- guided
- loop variables
Question 2
omp_set_num_threads(NUM)
- 12
private(x)
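A minimal C sketch of the two calls in this answer, under stated assumptions: the OpenMP runtime routine that requests a team size is omp_set_num_threads, and private(x) gives every thread its own uninitialized copy of x. The function name count_threads_seen, the value of NUM, and the serial stand-ins used when OpenMP is disabled are all hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption) so the sketch also builds without -fopenmp. */
static void omp_set_num_threads(int n) { (void)n; }
static int omp_get_thread_num(void) { return 0; }
#endif

#define NUM 4 /* hypothetical thread count */

/* Returns how many distinct thread ids took part in the parallel region. */
int count_threads_seen(void) {
    int seen[NUM] = {0};
    int x = -1;

    omp_set_num_threads(NUM); /* request at most NUM threads for later regions */

    #pragma omp parallel private(x)
    {
        x = omp_get_thread_num(); /* each thread writes only its own copy of x */
        seen[x] = 1;              /* distinct slots per thread, so no data race */
    }

    int count = 0;
    for (int i = 0; i < NUM; i++) {
        count += seen[i];
    }
    return count;
}
```

Compiled with OpenMP enabled, the result is between 1 and NUM depending on how many threads the runtime grants; without OpenMP the pragma is ignored and the result is 1.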
Question 3
A
#pragma omp parallel sections
The individual code blocks are distributed over the threads
B
Accessible only inside the parallel block.
C
firstprivate
variables are initialized to their value before the parallel section
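Answers A and C can be combined in one small C sketch. The function name run_sections and its parameters are hypothetical; the point is only that each section is handed to one thread and that a firstprivate variable enters the region with the value it had before it.

```c
/* Runs two independent blocks as OpenMP sections. `offset` is firstprivate,
   so every thread's copy starts with the value passed in by the caller. */
void run_sections(int offset, int *out_a, int *out_b) {
    #pragma omp parallel sections firstprivate(offset)
    {
        #pragma omp section
        {
            *out_a = offset + 1; /* executed by exactly one thread */
        }

        #pragma omp section
        {
            *out_b = offset + 2; /* may be executed by a different thread */
        }
    }
}
```

With plain private(offset) instead, the copies would be uninitialized and both results would be undefined.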
Question 4
Question 5
(i) The serious problem:
- Too many threads
(ii) Why?:
- Excessive parallelism overwhelms resources
2013
Question 1
- Synchronization
- Conditional Variables
- be blocked / wait
- independent
- divide
- merge
- data parallel
- pthread_join
- pthread_exit
Question 2
There are several ways in which a pthread may be terminated:
- The thread returns from its starting routine (the main routine for the initial thread).
- The thread makes a call to the pthread_exit subroutine.
- The thread is canceled by another thread via pthread_cancel.
- The entire process is terminated by a call to either exec() or exit().
- main() finishes first, without calling pthread_exit explicitly itself.
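A short C sketch of the first two termination paths, assuming a single worker thread. The names square_worker and run_and_join are hypothetical; pthread_create, pthread_exit and pthread_join are the standard POSIX calls.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Start routine: ends the thread with pthread_exit and passes back a status. */
static void *square_worker(void *arg) {
    intptr_t v = (intptr_t)arg;
    pthread_exit((void *)(v * v)); /* same effect as returning this value */
}

/* Creates one thread, waits for it with pthread_join, and returns its status.
   Returns -1 if the thread could not be created or joined. */
long run_and_join(long input) {
    pthread_t tid;
    void *status = NULL;

    if (pthread_create(&tid, NULL, square_worker, (void *)(intptr_t)input) != 0) {
        return -1;
    }
    if (pthread_join(tid, &status) != 0) {
        return -1;
    }
    return (long)(intptr_t)status;
}
```

Joining before the caller returns also avoids the last case in the list: if main() returned without joining or calling pthread_exit, the process would end and take the worker with it.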
Question 5
#pragma omp parallel for reduction(+:sum) private(x)
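A sketch of where this pragma would sit, assuming the question is the usual midpoint-rule integration of 4/(1+x^2) to approximate pi (the notes do not say, so this is a guess). The function name approx_pi is hypothetical.

```c
/* Approximates pi by midpoint integration of 4 / (1 + x^2) over [0, 1]. */
double approx_pi(int steps) {
    double step = 1.0 / (double)steps;
    double sum = 0.0;
    double x;
    int i;

    #pragma omp parallel for reduction(+:sum) private(x)
    for (i = 0; i < steps; i++) {
        x = (i + 0.5) * step;        /* scratch value, one copy per thread */
        sum += 4.0 / (1.0 + x * x);  /* per-thread partial sum, combined at the end */
    }
    return sum * step;
}
```

Without private(x) the threads would share one x and race on it; without the reduction clause they would race on sum.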
2018
Question 1
- warp
__global__
- void
__host__
__device__
- shared memory
- block
- private
- Mutual Exclusion
- block
- return
pthread_join
pthread_exit
Question 2
#pragma omp parallel for shared(m, n, a, b, c) private(i, j)
for (i = 0; i < m; i++) {
    int temp = 0;  /* declared inside the loop, so already private to each thread */
    for (j = 0; j < n; j++) {
        temp = temp + b[i*n + j] * c[j];
    }
    a[i] = temp;
}
Question 3
- blockIdx
- threadIdx
- blockDim
- gridDim
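CUDA's built-in index variables are gridDim, blockDim, blockIdx and threadIdx. How they combine into a global thread index can be sketched in plain C. This is an emulation for illustration only, not CUDA code: the loop counters stand in for blockIdx.x and threadIdx.x, the parameters for gridDim.x and blockDim.x, and the function name fill_global_ids is hypothetical.

```c
/* Plain-C emulation of a 1D CUDA launch: visits every (block, thread) pair
   and stores the global index each CUDA thread would compute.
   `out` must have room for grid_dim * block_dim entries. */
int fill_global_ids(int *out, int grid_dim, int block_dim) {
    int count = 0;
    for (int block_idx = 0; block_idx < grid_dim; block_idx++) {
        for (int thread_idx = 0; thread_idx < block_dim; thread_idx++) {
            /* In a kernel: blockIdx.x * blockDim.x + threadIdx.x */
            int global_id = block_idx * block_dim + thread_idx;
            out[global_id] = global_id;
            count++;
        }
    }
    return count; /* total number of emulated threads */
}
```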
Question 4
- round-robin with chunk size n
- n iterations at a time, assigned dynamically
- chunk size shrinks down to a minimum of n
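Assuming the three answers describe schedule(static, n), schedule(dynamic, n) and schedule(guided, n) in that order, the following C sketch runs the same loop under each schedule. CHUNK and the function name sum_with_schedules are hypothetical.

```c
#define CHUNK 4 /* hypothetical chunk size n */

/* Sums v[0..len-1] once per schedule kind and returns the common result,
   or -1 if the three results differ. */
long sum_with_schedules(const int *v, int len) {
    long s_static = 0, s_dynamic = 0, s_guided = 0;
    int i;

    /* static: chunks of CHUNK iterations dealt out round-robin, fixed up front */
    #pragma omp parallel for schedule(static, CHUNK) reduction(+:s_static)
    for (i = 0; i < len; i++) s_static += v[i];

    /* dynamic: an idle thread grabs the next CHUNK iterations at run time */
    #pragma omp parallel for schedule(dynamic, CHUNK) reduction(+:s_dynamic)
    for (i = 0; i < len; i++) s_dynamic += v[i];

    /* guided: chunks start large and shrink, but not below CHUNK
       (the final chunk may be smaller) */
    #pragma omp parallel for schedule(guided, CHUNK) reduction(+:s_guided)
    for (i = 0; i < len; i++) s_guided += v[i];

    /* The schedule changes which thread runs which iteration, not the result. */
    return (s_static == s_dynamic && s_dynamic == s_guided) ? s_static : -1;
}
```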
Question 5
find q1 in the left subarray
binary search for q2 in the right subarray
compute position q3
A[q3] = T[q1]
spawn Par-Merge()
spawn Par-Merge()
sync
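The outline above can be sketched in C, with OpenMP tasks standing in for spawn and taskwait for sync (a substitution; the notes use spawn/sync pseudocode). The names par_merge and lower_bound_idx and the argument order are hypothetical. Both input runs live in t, at inclusive index ranges [p1, r1] and [p2, r2], and the output is written to a starting at p3.

```c
/* Smallest index q in [p, r + 1] with x <= t[q]; r + 1 if there is none. */
static int lower_bound_idx(int x, const int *t, int p, int r) {
    int low = p;
    int high = (p > r + 1) ? p : r + 1;
    while (low < high) {
        int mid = low + (high - low) / 2;
        if (x <= t[mid]) high = mid;
        else low = mid + 1;
    }
    return high;
}

/* Merges sorted runs t[p1..r1] and t[p2..r2] into a[p3..]. */
void par_merge(const int *t, int p1, int r1, int p2, int r2, int *a, int p3) {
    int n1 = r1 - p1 + 1;
    int n2 = r2 - p2 + 1;
    if (n1 < n2) { /* make the first run the longer one */
        int tmp;
        tmp = p1; p1 = p2; p2 = tmp;
        tmp = r1; r1 = r2; r2 = tmp;
        n1 = n2;
    }
    if (n1 == 0) return; /* both runs are empty */

    int q1 = p1 + (r1 - p1) / 2;                  /* median of the longer run */
    int q2 = lower_bound_idx(t[q1], t, p2, r2);   /* split point in the other run */
    int q3 = p3 + (q1 - p1) + (q2 - p2);          /* final position of t[q1] */
    a[q3] = t[q1];

    #pragma omp task
    par_merge(t, p1, q1 - 1, p2, q2 - 1, a, p3);
    #pragma omp task
    par_merge(t, q1 + 1, r1, q2, r2, a, q3 + 1);
    #pragma omp taskwait
}
```

To actually run the tasks in parallel, the first call would be made from inside a `#pragma omp parallel` region by a single thread; called outside any parallel region, the tasks simply run one after another.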
Question 6
- a
- b: cudaMalloc
- c: stencil1D
2017
Question 1
- General Purpose GPU
- device
- host
- execution configuration (block, thread)
- SIMD
- shared
- loop
- private
Question 2
- A: thread_id
- B: pthread status code
Question 3
reduction(+:sum)
Question 4
- if n ≤ an appropriate threshold, sort directly
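Assuming this answer is about coarsening a recursive merge sort (the notes do not say which algorithm), a C sketch might look as follows. CUTOFF, the choice of insertion sort for small inputs, and all function names are hypothetical; OpenMP tasks stand in for spawn/sync.

```c
#include <stdlib.h>
#include <string.h>

#define CUTOFF 32 /* hypothetical threshold below which recursion stops */

/* Simple serial sort used for small subarrays. */
static void insertion_sort(int *a, int n) {
    for (int i = 1; i < n; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

/* Sorts a[0..n-1]; tmp is scratch space of the same length. */
static void merge_sort_rec(int *a, int *tmp, int n) {
    if (n <= CUTOFF) { /* coarsening: small inputs are sorted directly */
        insertion_sort(a, n);
        return;
    }
    int half = n / 2;

    #pragma omp task
    merge_sort_rec(a, tmp, half);
    #pragma omp task
    merge_sort_rec(a + half, tmp + half, n - half);
    #pragma omp taskwait

    /* Serial merge of the two sorted halves into tmp, then copy back. */
    int i = 0, j = half, k = 0;
    while (i < half && j < n) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < half) tmp[k++] = a[i++];
    while (j < n) tmp[k++] = a[j++];
    memcpy(a, tmp, (size_t)n * sizeof(int));
}

/* Returns 0 on success, -1 if scratch memory could not be allocated. */
int coarsened_sort(int *a, int n) {
    if (n <= 1) return 0;
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL) return -1;
    merge_sort_rec(a, tmp, n);
    free(tmp);
    return 0;
}
```

The cutoff keeps the number of tasks manageable: below it, the cost of creating a task would exceed the work the task performs.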
Question 5
- prefix
Question 6
CUDA!
2016
Question 1
- block
- shared
- sections
- shared
- loop
- private
Question 2
Question 3
Question 4
- floor((p1 + r1) / 2)
if n >= 30
sort with another algorithm
else
Question 5
double start = omp_get_wtime();
double end = omp_get_wtime();
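A sketch of how the two calls bracket a measured region, assuming the work being timed is a simple summation loop (the notes do not show it). The function name time_sum and the clock()-based stand-in used when OpenMP is disabled are hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#else
#include <time.h>
/* Serial stand-in (assumption) so the sketch also builds without -fopenmp.
   It reports CPU time rather than wall-clock time. */
static double omp_get_wtime(void) {
    return (double)clock() / CLOCKS_PER_SEC;
}
#endif

/* Sums 0..n-1 into *result and returns the elapsed time in seconds. */
double time_sum(int n, double *result) {
    double sum = 0.0;
    int i;

    double start = omp_get_wtime();

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += (double)i;
    }

    double end = omp_get_wtime();

    *result = sum;
    return end - start; /* wall-clock seconds when built with OpenMP */
}
```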
2015
Question 1
- loop
- undefined
- SIMD
- execution configuration
- called by the host
- executes on the device
- return type void
- block
- pthread_join
- pthread_exit
Question 2
#pragma omp parallel for reduction(+:sum)
Question 3
Question 4
- shared
- destination, source, size, device memory
Question 5
2014
Question 1
- shared memory
- warp
- Scalability
- load balancing
- a power of two
- loop variable