Tomislav Kovacic
DDH in AROME, debugging, verification and manual
Stay in Toulouse, 1-02-2007 to 16-03-2007


The aims of my stay were to:
1. Find the bug which appeared when AROME was run with DDH and the option LONLYVAR=.TRUE.

2. Run some cases to validate the results of DDH in AROME.

3. Write a manual for the usage of DDH in ALADIN and AROME.
A. Work on tora



  1. The cycle

Debugging and modifications were done on a pack based on cycle CY32bf.


  2. The bug

Description:
When AROME was run with DDH and LONLYVAR=.TRUE., the program crashed with the following message:
Activating SIGALRM=14 and calling alarm(10), time = 495.14
JSETSIG: sl->active = 0
signal_drhook(SIGALRM=14): New handler installed at 0x4bdc080; old preserved at 0x0

tid#1 starting drhook traceback, time = 495.15
[myproc#1,tid#1,pid#1281654864]: MASTER
[myproc#1,tid#1,pid#1281654864]: CNT0
[myproc#1,tid#1,pid#1281654864]: CNT1
[myproc#1,tid#1,pid#1281654864]: CNT2
[myproc#1,tid#1,pid#1281654864]: CNT3
[myproc#1,tid#1,pid#1281654864]: CNT4
[myproc#1,tid#1,pid#1281654864]: STEPO
[myproc#1,tid#1,pid#1281654864]: SCAN2H
[myproc#1,tid#1,pid#1281654864]: SCAN2MDM
[myproc#1,tid#1,pid#1281654864]: GP_MODEL
[myproc#1,tid#1,pid#1281654864]: CPG
[myproc#1,tid#1,pid#1281654864]: CPG_DIA
[myproc#1,tid#1,pid#1281654864]: CPCUDDH

tid#1 starting sigdump traceback, time = 495.16
Dump of user stack :
address = 0000000000cb41b0
address = 000000000045cc40
address = 00000000003c58c0
address = 0000000000c32db0
address = 0000000000b5fa40
address = 0000000000c74560
address = 0000000000b9e290
address = 0000000000bbcdc0
address = 0000000000bb5cd0
address = 0000000000bb5060
address = 0000000000bb4b10
address = 0000000000bb4270
address = 0000000000013e60
address = 0000000000000510

[myproc#1,tid#1,pid#1281654864,signal#14(SIGALRM)]: Received signal :: 2325MB (heap), 2325MB (rss), 0MB (stack), 0 (paging), nsigs 2, time 505.53

debug =>[ -a NODE.001_01 ]



Solution:

  1. It was found that the program crashes in the time step in which the historical file is written. Until then DDH works well and its outputs are produced.




  2. The program was crashing in the following two nested DO loops within CPCUDDH:

DO JROF = KSTART, KPROF
  DO JCV = IPLSTA, IPLEND
    HDCVB1(JCV,KDDHI(JROF),ITHREAD) = HDCVB1(JCV,KDDHI(JROF),ITHREAD)&
      & + PDHCV(JROF,JCV-IPLSH)*PDHSF(JROF)
  ENDDO
ENDDO

IF(LDPHY) THEN
  DO JROF = KSTART, KPROF
    DO JCV = 1, IPLEND2
      HDCS1(JCV,KDDHI(JROF),ITHREAD) = HDCS1(JCV,KDDHI(JROF),ITHREAD)&
        & + PDHCS(JROF,JCV)*PDHSF(JROF)
    ENDDO
  ENDDO
ENDIF



  3. The problem with the second DO loop was solved by setting the array ZDHCS to zero in CPG_DIA (a minimal, self-contained sketch of this accumulation and initialization pattern is given at the end of this list).




  4. The problem with the first DO loop is not solved. The program can run only with DR_HOOK=0 and DR_HOOK_IGNORE_SIGNALS=-1.

Numerous tests were done to find the reason why the program crashes in the first DO loop.




  5. Influence of the AROME DDH subroutines

To exclude the possibility that some allocations done in the DDH subroutines specific to AROME were causing the problem, all of them were disabled. For the same reason, all calls to the budget subroutines were disabled as well.

Only the allocation of the array YAPFT was kept and used as usual. This array is needed when writing the DDH files because it contains the descriptions of the fields; it is believed that it cannot cause the problem.

Still, the bug was there: the program crashed in the same way as before. The conclusion is that the new subroutines introduced for DDH in AROME are not causing the bug.




  6. Array HDCVB1

Its dimensions were checked, as were the loops containing it. The loop indices never run outside the array bounds.


  7. Array PDHCV

Its dimensions were checked, as were the loops containing it. The loop indices never run outside the array bounds.

The program crashed only when values from PDHCV were assigned to HDCVB1.




  8. Array PDHSF

The same holds as for PDHCV.
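For illustration, here is a minimal, self-contained sketch of the accumulation pattern of item 2, using hypothetical names, sizes and values rather than the actual AROME arrays. It also shows where the zero-setting of item 3 enters: the per-point contribution array (the role ZDHCS presumably plays when CPCUDDH receives it as PDHCS) is initialized before use, so no uninitialized values can enter the sums. That uninitialized values were indeed the cause of the crash is an assumption on my part, not something these tests proved.

! Minimal, self-contained sketch of the CPCUDDH-like accumulation.
! All names, sizes and values are hypothetical stand-ins.
PROGRAM DDH_ACCUMULATION_SKETCH
  IMPLICIT NONE
  INTEGER, PARAMETER :: KPROF = 8, IPLEND = 3, NDOM = 2
  REAL    :: ZDHCS(KPROF, IPLEND)   ! per-point contributions (role of PDHCS)
  REAL    :: PDHSF(KPROF)           ! horizontal weights
  REAL    :: HDCS1(IPLEND, NDOM)    ! per-domain accumulator
  INTEGER :: KDDHI(KPROF)           ! DDH domain index of each grid point
  INTEGER :: JROF, JCV

  ZDHCS = 0.0                       ! the zero-setting of item 3: no uninitialized values remain
  ZDHCS(:, 1) = 1.0                 ! only part of the contributions is actually filled
  PDHSF = 0.5
  KDDHI = (/ 1, 1, 2, 2, 1, 2, 1, 2 /)
  HDCS1 = 0.0

  DO JROF = 1, KPROF
    DO JCV = 1, IPLEND
      HDCS1(JCV, KDDHI(JROF)) = HDCS1(JCV, KDDHI(JROF)) &
        & + ZDHCS(JROF, JCV) * PDHSF(JROF)
    ENDDO
  ENDDO

  PRINT *, 'Accumulated values per domain:', HDCS1
END PROGRAM DDH_ACCUMULATION_SKETCH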


B. Work on tori


  1. The cycle

Work on tori was based on cycle CY32t0.


  2. First run

The first runs were done with guru's executable. The run with both variables and fluxes in DDH crashed when the budgets were called in subroutine RAIN_ICE. It was found that the budget calls for some processes had been deleted, probably because of problems compiling this subroutine: the deleted lines in RAIN_ICE were longer than 132 characters and the compilation failed on them. A message saying that a line is too long is often seen when compiling on tori (a small illustration of keeping such lines within the limit follows below).

The deleted lines were put back into subroutine RAIN_ICE and, with the setting to zero described before, the program executed well. Three kinds of runs were done: without DDH, DDH with only variables, and DDH with variables and fluxes. All of them were OK on the Midi-Pyrénées domain.
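For illustration only (this is not the RAIN_ICE code and the variable names are invented), a statement that would otherwise exceed the 132-character free-form limit can be split over continuation lines:

PROGRAM LINE_LENGTH_DEMO
  IMPLICIT NONE
  REAL :: ZRVS(10), ZRCS(10), ZRRS(10), ZTOTAL
  ZRVS = 1.0
  ZRCS = 2.0
  ZRRS = 3.0
  ! The '&' continuations keep every physical source line well under 132 characters.
  ZTOTAL = SUM(ZRVS) &
       & + SUM(ZRCS) &
       & + SUM(ZRRS)
  PRINT *, 'ZTOTAL =', ZTOTAL
END PROGRAM LINE_LENGTH_DEMO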




  3. Verification

The outputs in the DDH files were verified for the run with variables and fluxes. The values of some fluxes were suspicious. The probable reason was found in subroutine CPG: in a parallel DO loop, all threads were using the array APFT at the same time. To avoid this, a fourth dimension was given to APFT, indexed by the number of the actual thread, so that each thread now uses its own part of APFT. A minimal sketch of this per-thread technique is given below.
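The sketch below is a minimal, self-contained illustration of the per-thread technique; it is not the actual CPG/APFT code. The array sizes, the block loop and the accumulated values are hypothetical, and OpenMP is assumed as the threading model.

PROGRAM APFT_PER_THREAD_SKETCH
  USE OMP_LIB, ONLY : OMP_GET_MAX_THREADS, OMP_GET_THREAD_NUM
  IMPLICIT NONE
  INTEGER, PARAMETER :: NPROMA = 8, NLEV = 5, NFLD = 3, NBLOCKS = 16
  REAL, ALLOCATABLE :: APFT(:,:,:,:)
  INTEGER :: JBLK, ITHREAD

  ! The fourth dimension is sized with the maximal number of threads.
  ALLOCATE(APFT(NPROMA, NLEV, NFLD, OMP_GET_MAX_THREADS()))
  APFT = 0.0

!$OMP PARALLEL DO PRIVATE(JBLK, ITHREAD)
  DO JBLK = 1, NBLOCKS
    ITHREAD = OMP_GET_THREAD_NUM() + 1          ! 1-based thread index
    ! Each thread accumulates only into its own slice, so there is no race.
    APFT(:, :, :, ITHREAD) = APFT(:, :, :, ITHREAD) + 1.0
  ENDDO
!$OMP END PARALLEL DO

  PRINT *, 'Total accumulated over all threads:', SUM(APFT)
END PROGRAM APFT_PER_THREAD_SKETCH

Protecting a single shared APFT with a critical section would also remove the race, but it would serialize the accumulation; giving each thread its own slice keeps the loop parallel at the cost of some extra memory.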


  4. Compilation problem

During the last few days of my stay I could not compile the pack on tori. The last pack I used is /cnrm/gp/mrpa/mrpa661/pack/aroddh32t0_3.

I could not compile cpg_dia.F90; the message is:


Compile:

sxmpif90 -clear -c -dw -dW -Pstack -Wf,-pvctl,nomsg,loopcnt=1000000,vl=fix256,vwork=stack,-P,nh,-ptr,byte -sx8r -DADDRESS64 -DHIGHRES -DNECSX -DSX4 -DBLAS -DVOCL=CDIR -DOCL=CDIR -DNOVREC=NODEP -C hopt -Wf,-pvctl,chgpwr,noassume -pi auto line=500 -R5 -Wf,-pvctl,fullmsg local/arp/adiab/cpg_dia.F90

/utmp/nqs.59094.tori-batch/ow12396_ppdir/i.cpg_dia.F90:
f90: error(311): i.cpg_dia.F90, line 815: Number of section subscripts does not agree with the rank of the part name.
f90: error(311): i.cpg_dia.F90, line 816: Number of section subscripts does not agree with the rank of the part name.
f90: error(311): i.cpg_dia.F90, line 822: Number of section subscripts does not agree with the rank of the part name.
f90: error(311): i.cpg_dia.F90, line 823: Number of section subscripts does not agree with the rank of the part name.
f90: error(311): i.cpg_dia.F90, line 824: Number of section subscripts does not agree with the rank of the part name.
f90: error(311): i.cpg_dia.F90, line 824: Number of section subscripts does not agree with the rank of the part name.
f90: i.cpg_dia.F90, cpg_dia: There are 6 errors.
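The error means that an array reference does not carry the same number of section subscripts as the declared rank of the array. My assumption, not verified, is that some references to APFT in cpg_dia.F90 were not yet updated to its new four-dimensional form. A minimal, self-contained illustration with hypothetical arrays:

PROGRAM RANK_MISMATCH_DEMO
  IMPLICIT NONE
  REAL :: APFT3(4, 3, 2)        ! rank 3: every reference needs exactly three subscripts
  REAL :: APFT4(4, 3, 2, 2)     ! rank 4: every reference needs exactly four subscripts
  APFT3(:, :, :)    = 0.0
  APFT4(:, :, :, 1) = 1.0
! APFT3(:, :, :, 1) = 1.0       ! would give error(311): four subscripts for a rank-3 array
  PRINT *, SUM(APFT3), SUM(APFT4)
END PROGRAM RANK_MISMATCH_DEMO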

C. Modifications

The main modification corrects an error occurring in the parallel execution of a DO loop in CPG: all threads were using the same array APFT, resulting in incorrect values of the fluxes in the DDH files. A fourth dimension, with extent equal to the maximal number of threads, was added to APFT, so that each thread now uses its own part of it. The modified files and the changes made in each of them are:
adiab/cpg_dia.F90
  1. ZDHCS is set to zero.

dia/sualtdh.F90
  1. The allocation of APFT is put here.

dia/aro_cpphddh.F90
  1. Changes needed for the fourth dimension of APFT.

dia/posddh.F90
  1. A bug reported by M. Hamrud to J.-M. Pirou was corrected.

module/yomphft.F90
  1. Changes needed for the fourth dimension of APFT.

phys_dmn/apl_arome.F90
  1. Changes needed for the fourth dimension of APFT.

phys_dmn/aro_iniapft.F90
  1. Changes needed for the fourth dimension of APFT.

D. Verification

Verification was done on DDH outputs made before the fourth (thread) dimension was given to the array APFT. After this correction there were no further runs with DDH, because of the compilation problems on tori.


E. Manual

Not done.

