|| [[Image(BMEOR_v3.jpg, 400px, title=BME Training Reactor Geometry)]] \\''BME Training Reactor Geometry'' || [[Image(BME_OR_1e-6s_n2e26_neutrondensity_Guardyan_v3.jpg, 400px, title=Neutron density distribution in the BME Training reactor calculated by GUARDYAN)]] \\''Neutron density distribution in the BME Training reactor calculated by GUARDYAN'' ||
''Figure 4: Geometry of the subcritical verification model''

We found that the vectorized code ran 1.5x slower than the history-based algorithm. To better understand the underlying reasons, we examined the kernel execution times. In the event-based version of GUARDYAN, every energy law is implemented in a separate kernel, so an application profiling tool can reveal which task consumed the most resources. Inspecting the profile shown in Fig. 5, several conclusions can be drawn:
* The main part of the execution time is spent in the "transition kernel". This function transports a particle to the next collision site and selects the reaction type for that particle. The long calculation time is most likely caused by the Woodcock method used for path length selection (a phenomenon termed the heavy absorber problem) and by the slow energy grid search algorithms implemented in GUARDYAN. According to our recent investigation, biased Woodcock algorithms may offer a solution to the heavy absorber problem.

* Memory transaction costs are much greater than the computational costs of simulating the different reactions. The "CUDA memcpy DtoH" and "CUDA memcpy HtoD" tasks stand for communication between host and device, and take up more simulation time than simulating elastic scattering and the ACE laws.

* The "Thrust sort" kernel includes all the computational overhead associated with event-based tracking. Note that sorting is done two orders of magnitude faster than the memory transactions.

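To illustrate why the Woodcock method can dominate the transition step, the rejection loop at its core can be sketched as follows. This is a minimal serial Python sketch of delta tracking in one dimension, not GUARDYAN's CUDA implementation; the function and parameter names are ours:

```python
import math
import random

def woodcock_flight(sigma_t, sigma_maj, x0=0.0, rng=random.random):
    """Sample the next real collision site with Woodcock (delta) tracking.

    sigma_t:   true total cross-section as a function of position
    sigma_maj: constant majorant, with sigma_maj >= sigma_t(x) everywhere
    """
    x = x0
    while True:
        # Sample a flight distance against the majorant cross-section.
        x += -math.log(rng()) / sigma_maj
        # Accept as a real collision with probability sigma_t / sigma_maj;
        # otherwise it is a virtual collision and the flight continues.
        # A localized heavy absorber inflates sigma_maj everywhere, so most
        # trials elsewhere are rejected and this loop dominates the runtime.
        if rng() < sigma_t(x) / sigma_maj:
            return x
```

The heavy absorber problem is visible in the acceptance test: the closer sigma_maj is to the local sigma_t, the fewer virtual collisions are sampled, which is what a biased Woodcock scheme aims to exploit.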
Fig. 5 indicates why history-based tracking may be more effective: most of the calculation time is spent in a single kernel (the "transition kernel"), which is applied to all particles before every collision. In order to simulate any type of reaction, the event-based version must wait for the transition step to finish for all particles. The history-based simulation, on the other hand, can proceed unsynchronized, i.e. threads may diverge (one may execute a transition step while another simulates a collision), but no thread needs to wait for the others. After optimization of the transition step, we may reach a different conclusion.
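The difference in synchronization between the two tracking schemes can be sketched in serial form. This is a hedged Python sketch of the control flow only; the particle attributes and kernel names are our assumptions, not GUARDYAN's API:

```python
def history_based(particles, transition, collide):
    # One thread per history: each particle runs to completion on its own,
    # so no thread waits for the others (though GPU threads may diverge).
    for p in particles:
        while p.alive:
            transition(p)
            collide(p)

def event_based(particles, transition, collide):
    # Lockstep over events: the transition step must finish for *all* live
    # particles before any collision is simulated, and vice versa.  In
    # GUARDYAN, a Thrust sort would group particles by reaction type at
    # this point, before the per-reaction kernels are launched.
    alive = [p for p in particles if p.alive]
    while alive:
        for p in alive:
            transition(p)
        for p in alive:
            collide(p)
        alive = [p for p in alive if p.alive]
```

Both schemes simulate the same histories; the event-based loop merely reorders the work so that identical events are batched, at the cost of a global synchronization point between batches.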