Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge gpu branch onto main #21

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

merge gpu branch onto main #21

wants to merge 3 commits into from

Conversation

cguzman95
Copy link
Collaborator

Hello,

This pull request is an update of the GPU part, which has around 2 years of development.

There are minor changes on the CPU part related to quality-of-life improvements (save stats like execution time). Please check that it works as usual in your environments.

The GPU part has a lot of changes. Now most of the CVODE algorithm is translated to CUDA. There is a new folder related to compiling GPU (camp/compile/power9), where the files to compile and run are:

./compile.libs.camp.sh
./compile.camp.sh
./run.sh

You may notice that for GPU I'm using a cvode version from https://github.com/mattldawson/cvode instead of the .tar version, because it has some time counters to compare both CPU and GPU execution times. Also, for the same reason, I'm using MPI to measure those timers, so to use the GPU you should have MPI.

Also, I added a new test to check the GPU. It requires Python and is tested for version Python/3.7.0. I usually develop running a very similar file located at test/monarch/TestMonarch.py.

I will be glad to hear any feedback.

Best Regards,
Christian

@cguzman95 cguzman95 added the enhancement New feature or request label Nov 10, 2023
Copy link
Collaborator

@K20shores K20shores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only real concern is with all of the syncthreads, but maybe they're all needed? When I get time, I'll try to run everything on NCAR's supercomputers

Comment on lines 365 to 367
! Free the interface and the solver
deallocate(camp_interface)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this cause a crash if it's left in?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point it produced a crash, but seems now is not the case. I will leave it as before.

Comment on lines 347 to 349
! Free the interface and the solver
deallocate(camp_interface)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, does this cause a crash?

Comment on lines -1469 to +1612
camp_mpi_pack_size_real_array(this%init_state, l_comm)
camp_mpi_pack_size_real_array(this%init_state_cell, l_comm)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above both init_state and init_state_cell were initialized. Why wouldn't both be sent over mpi? Perhaps I just need to stare at this PR some more. I guess I don't understand the difference between the two yet

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because both arrays are initialized to 1.0 at line 800:

          ! Set the activity coefficients to 1.0 as default
          do i_name = 1, size(unique_names)
            i_state_elem = rep%spec_state_id(unique_names(i_name)%string)
            this%init_state(i_state_elem + i_cell * this%size_state_per_cell) = &
                    real(1.0d0, kind=dp)
            this%init_state_cell(i_state_elem) = &
                    this%init_state(i_state_elem + i_cell * this%size_state_per_cell)
          end do

I created this "init_state_cell" to accelerate MPI communications, since I only send a cell instead of all the cells.

I believe "init_state" can be simplified quite well, but for the moment this patch is enough for me.

Comment on lines +1742 to +1750
call camp_mpi_unpack_real_array(buffer, pos, this%init_state_cell, l_comm)

allocate(this%init_state(this%size_state_per_cell * this%n_cells))
do i_cell = 0, this%n_cells - 1
do i_state_elem = 1, this%size_state_per_cell
this%init_state(i_state_elem + i_cell * this%size_state_per_cell)=&
this%init_state_cell(i_state_elem)
end do
end do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this answer my question. Does this imply that init_state is not initialized before the broadcast?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not always. For example, for "camp/data/CAMP_v1_paper/camp_monarch_interface" it is only initialized for MPI rank=0. Then, most of the variables are sent to the rest of the processes. I detect only 1 variable only present in rank=0, which is unique_names (i.e. the names of the species), but it may be more.

Summarizing, "init_state" is initialized at rank=0, then sent to the other ranks

Comment on lines 599 to 608
//if(i_cell==0){
//print_double(md->grid_cell_env,CAMP_NUM_ENV_PARAM_,"env689");
//print_double(md->grid_cell_state,n_state_var,"state688");
//double *yp = N_VGetArrayPointer(sd->y);
//print_double(yp,md->n_per_cell_dep_var,"y660");
//}
//print_double(md->grid_cell_env,CAMP_NUM_ENV_PARAM_,"env689");
//double *yp = N_VGetArrayPointer(sd->y)+i_cell*md->n_per_cell_dep_var;
//print_double(yp,md->n_per_cell_dep_var,"y660");
//print_double(md->grid_cell_state,md->n_per_cell_state_var,"state688");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//if(i_cell==0){
//print_double(md->grid_cell_env,CAMP_NUM_ENV_PARAM_,"env689");
//print_double(md->grid_cell_state,n_state_var,"state688");
//double *yp = N_VGetArrayPointer(sd->y);
//print_double(yp,md->n_per_cell_dep_var,"y660");
//}
//print_double(md->grid_cell_env,CAMP_NUM_ENV_PARAM_,"env689");
//double *yp = N_VGetArrayPointer(sd->y)+i_cell*md->n_per_cell_dep_var;
//print_double(yp,md->n_per_cell_dep_var,"y660");
//print_double(md->grid_cell_state,md->n_per_cell_state_var,"state688");

If this is no longer needed, please delete, or wrap in an ifdef to check for a debug print mode

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 791 to 793
//print_double(atmp1,86,"atmp1766");
int fflag=cudaDevicef(t_0 + t_j, atmp1, acorr,md,sc,&aux_flag);
//print_double(acorr,86,"acorr721");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//print_double(atmp1,86,"atmp1766");
int fflag=cudaDevicef(t_0 + t_j, atmp1, acorr,md,sc,&aux_flag);
//print_double(acorr,86,"acorr721");
int fflag=cudaDevicef(t_0 + t_j, atmp1, acorr,md,sc,&aux_flag);

please remove

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 1078 to 1098
//print_double(md->dtempv,86,"dtempvN_VLinearSum1");
md->dtempv[i]=sc->cv_rl1*md->dzn[i+md->nrows]+md->cv_acor[i];
//print_double(md->dtempv,86,"dtempvN_VLinearSum2");
md->dtempv[i]=sc->cv_gamma*md->dftemp[i]-md->dtempv[i];
//print_double(md->dtempv,86,"dtempvcv_lsolve1");
solveBcgCudaDeviceCVODE(md, sc);
__syncthreads();
#ifdef CAMP_PROFILE_DEVICE_FUNCTIONS
if(threadIdx.x==0) sc->dtBCG += ((double)(int)(clock() - start))/(clock_khz*1000);
#endif
md->dtempv[i] = md->dx[i];
//print_double(md->dtempv,86,"dtempvcv_lsolve2");
__syncthreads();
cudaDeviceVWRMS_Norm_2(md->dx, md->dewt, &del, md->n_shr_empty);
md->dftemp[i]=md->dcv_y[i]+md->dtempv[i];
__syncthreads();
//print_double(md->dcv_y,86,"dcv_y2994");
//print_double(md->dftemp,86,"cv_ftemplsolve");
int guessflag=CudaDeviceguess_helper(0., md->dftemp,
md->dcv_y, md->dtempv, md->dtempv1,md->dtempv2, &aux_flag, md, sc
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//print_double(md->dtempv,86,"dtempvN_VLinearSum1");
md->dtempv[i]=sc->cv_rl1*md->dzn[i+md->nrows]+md->cv_acor[i];
//print_double(md->dtempv,86,"dtempvN_VLinearSum2");
md->dtempv[i]=sc->cv_gamma*md->dftemp[i]-md->dtempv[i];
//print_double(md->dtempv,86,"dtempvcv_lsolve1");
solveBcgCudaDeviceCVODE(md, sc);
__syncthreads();
#ifdef CAMP_PROFILE_DEVICE_FUNCTIONS
if(threadIdx.x==0) sc->dtBCG += ((double)(int)(clock() - start))/(clock_khz*1000);
#endif
md->dtempv[i] = md->dx[i];
//print_double(md->dtempv,86,"dtempvcv_lsolve2");
__syncthreads();
cudaDeviceVWRMS_Norm_2(md->dx, md->dewt, &del, md->n_shr_empty);
md->dftemp[i]=md->dcv_y[i]+md->dtempv[i];
__syncthreads();
//print_double(md->dcv_y,86,"dcv_y2994");
//print_double(md->dftemp,86,"cv_ftemplsolve");
int guessflag=CudaDeviceguess_helper(0., md->dftemp,
md->dcv_y, md->dtempv, md->dtempv1,md->dtempv2, &aux_flag, md, sc
);
md->dtempv[i]=sc->cv_rl1*md->dzn[i+md->nrows]+md->cv_acor[i];
md->dtempv[i]=sc->cv_gamma*md->dftemp[i]-md->dtempv[i];
solveBcgCudaDeviceCVODE(md, sc);
__syncthreads();
#ifdef CAMP_PROFILE_DEVICE_FUNCTIONS
if(threadIdx.x==0) sc->dtBCG += ((double)(int)(clock() - start))/(clock_khz*1000);
#endif
md->dtempv[i] = md->dx[i];
__syncthreads();
cudaDeviceVWRMS_Norm_2(md->dx, md->dewt, &del, md->n_shr_empty);
md->dftemp[i]=md->dcv_y[i]+md->dtempv[i];
__syncthreads();
int guessflag=CudaDeviceguess_helper(0., md->dftemp,
md->dcv_y, md->dtempv, md->dtempv1,md->dtempv2, &aux_flag, md, sc
);

please remove

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 1116 to 1144
//print_double(md->cv_acor,86,"cv_acor1060");
//print_double(md->dcv_y,86,"dcv_y1060");
if (m > 0) {
sc->cv_crate = SUNMAX(0.3 * sc->cv_crate, del / delp);
}
dcon = del * SUNMIN(1.0, sc->cv_crate) / md->cv_tq[4+blockIdx.x*(NUM_TESTS + 1)];
flag_shr2[0]=0;
__syncthreads();
if (dcon <= 1.) {
//print_double(md->cv_acor,86,"cv_acor1505");
//print_double(md->dewt,86,"dewt1505");
cudaDeviceVWRMS_Norm_2(md->cv_acor, md->dewt, &sc->cv_acnrm, md->n_shr_empty);
//print_double(&sc->cv_acnrm,1,"cv_acnrm1151");
__syncthreads();
sc->cv_jcur = 0;
__syncthreads();
return CV_SUCCESS;
}
m++;
if ((m == md->cv_maxcor) || ((m >= 2) && (del > RDIV * delp))) {
if (!(sc->cv_jcur)) {
return TRY_AGAIN;
} else {
return RHSFUNC_RECVR;
}
}
delp = del;
__syncthreads();
//print_double(md->dcv_y,86,"dcv_y1137");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//print_double(md->cv_acor,86,"cv_acor1060");
//print_double(md->dcv_y,86,"dcv_y1060");
if (m > 0) {
sc->cv_crate = SUNMAX(0.3 * sc->cv_crate, del / delp);
}
dcon = del * SUNMIN(1.0, sc->cv_crate) / md->cv_tq[4+blockIdx.x*(NUM_TESTS + 1)];
flag_shr2[0]=0;
__syncthreads();
if (dcon <= 1.) {
//print_double(md->cv_acor,86,"cv_acor1505");
//print_double(md->dewt,86,"dewt1505");
cudaDeviceVWRMS_Norm_2(md->cv_acor, md->dewt, &sc->cv_acnrm, md->n_shr_empty);
//print_double(&sc->cv_acnrm,1,"cv_acnrm1151");
__syncthreads();
sc->cv_jcur = 0;
__syncthreads();
return CV_SUCCESS;
}
m++;
if ((m == md->cv_maxcor) || ((m >= 2) && (del > RDIV * delp))) {
if (!(sc->cv_jcur)) {
return TRY_AGAIN;
} else {
return RHSFUNC_RECVR;
}
}
delp = del;
__syncthreads();
//print_double(md->dcv_y,86,"dcv_y1137");
if (m > 0) {
sc->cv_crate = SUNMAX(0.3 * sc->cv_crate, del / delp);
}
dcon = del * SUNMIN(1.0, sc->cv_crate) / md->cv_tq[4+blockIdx.x*(NUM_TESTS + 1)];
flag_shr2[0]=0;
__syncthreads();
if (dcon <= 1.) {
cudaDeviceVWRMS_Norm_2(md->cv_acor, md->dewt, &sc->cv_acnrm, md->n_shr_empty);
__syncthreads();
sc->cv_jcur = 0;
__syncthreads();
return CV_SUCCESS;
}
m++;
if ((m == md->cv_maxcor) || ((m >= 2) && (del > RDIV * delp))) {
if (!(sc->cv_jcur)) {
return TRY_AGAIN;
} else {
return RHSFUNC_RECVR;
}
}
delp = del;
__syncthreads();

please remove the prints

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, please remove all of the commented out prints, or wrap them in debug ifdefs.

Also, I feel like some of the __syncthreads may be superfluous, but I could be very wrong.

Copy link
Collaborator Author

@cguzman95 cguzman95 Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to remove 3 quarters of ___syncthreads (from 200 calls to 50)

Comment on lines -431 to -474
/** \brief Calculate the time derivative \f$f(t,y)\f$ for only some specific
* types
*
* \param model_data Pointer to the model data
* \param time_deriv TimeDerivative to use to build derivative array
* \param time_step Current model time step (s)
*/
#ifdef CAMP_USE_SUNDIALS
void rxn_calc_deriv_specific_types(ModelData *model_data,
TimeDerivative time_deriv,
realtype time_step) {
// Get the number of reactions
int n_rxn = model_data->n_rxn;

// Loop through the reactions advancing the rxn_data pointer each time
for (int i_rxn = 0; i_rxn < n_rxn; i_rxn++) {
// Get pointers to the reaction data
int *rxn_int_data =
&(model_data->rxn_int_data[model_data->rxn_int_indices[i_rxn]]);
double *rxn_float_data =
&(model_data->rxn_float_data[model_data->rxn_float_indices[i_rxn]]);
double *rxn_env_data =
&(model_data->grid_cell_rxn_env_data[model_data->rxn_env_idx[i_rxn]]);

// Get the reaction type
int rxn_type = *(rxn_int_data++);

// Call the appropriate function
switch (rxn_type) {
case RXN_HL_PHASE_TRANSFER:
rxn_HL_phase_transfer_calc_deriv_contrib(model_data, time_deriv,
rxn_int_data, rxn_float_data,
rxn_env_data, time_step);
break;
case RXN_SIMPOL_PHASE_TRANSFER:
rxn_SIMPOL_phase_transfer_calc_deriv_contrib(
model_data, time_deriv, rxn_int_data, rxn_float_data, rxn_env_data,
time_step);
break;
}
}
}
#endif

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not use in any part of the code. It is some old code that I added time ago.

Copy link
Collaborator

@mattldawson mattldawson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow, this is a lot of work! I was able to build and run the tests, and they pass (cpu only). I didn't look in detail at the cuda code, but the changes to the rest of the code look mostly ok as far as I can tell. I added some comments, but mostly minor things.

Comment on lines 5 to 8
* Dawson, M. L., Guzman, C., Curtis, J. H., Acosta, M., Zhu, S., Dabdub, D., Conley, A., West, M., Riemer, N., and Jorba, O.: Chemistry Across Multiple Phases (CAMP) version 1.0: an integrated multiphase chemistry model, Geosci. Model Dev., 15, 3663–3689, https://doi.org/10.5194/gmd-15-3663-2022, 2022.
* M. Dawson, C. Guzman, J. H. Curtis, M. Acosta, S. Zhu, D. Dabdub,
A. Conley, M. West, N. Riemer, and O. Jorba (2021),
Chemistry Across Multiple Phases (CAMP) version 1.0: An
Integrated multi-phase chemistry model, in preparation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this overwrote the correct reference

@@ -79,8 +90,11 @@ program box_model
camp_state%state_var( idx_O2 )
end do

deallocate( camp_core )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be deleted?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that line in my last commit and is working fine, so It was fixed somewhere in the way. I will push that last commit with the rest of the changes.

I think it was failing because a missing " call camp_mpi_finalize( )" or something like that

@@ -108,7 +108,6 @@ program box_model
camp_state%state_var( idx_O2 )
end do

deallocate( camp_core )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be deleted?

@@ -170,7 +170,6 @@ program box_model
#endif
!! [output]

deallocate( camp_core )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be deleted?

Comment on lines 1 to 36
@article{Tie2003,
author = {Tie, Xuexi and Emmons, Louisa and Horowitz, Larry and Brasseur, Guy and Ridley, Brian and Atlas, Elliot and Stround, Craig and Hess, Peter and Klonecki, Andrzej and Madronich, Sasha and Talbot, Robert and Dibb, Jack},
title = {Effect of sulfate aerosol on tropospheric NOx and ozone budgets: Model simulations and TOPSE evidence},
journal = {Journal of Geophysical Research: Atmospheres},
volume = {108},
number = {D4},
pages = {},
keywords = {tropospheric aerosol, NOx, ozone},
doi = {https://doi.org/10.1029/2001JD001508},
url = {https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2001JD001508},
eprint = {https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2001JD001508},
abstract = {The distributions of NOx and O3 are analyzed during TOPSE (Tropospheric Ozone Production about the Spring Equinox). In this study these data are compared with the calculations of a global chemical/transport model (Model for OZone And Related chemical Tracers (MOZART)). Specifically, the effect that hydrolysis of N2O5 on sulfate aerosols has on tropospheric NOx and O3 budgets is studied. The results show that without this heterogeneous reaction, the model significantly overestimates NOx concentrations at high latitudes of the Northern Hemisphere (NH) in winter and spring in comparison to the observations during TOPSE; with this reaction, modeled NOx concentrations are close to the measured values. This comparison provides evidence that the hydrolysis of N2O5 on sulfate aerosol plays an important role in controlling the tropospheric NOx and O3 budgets. The calculated reduction of NOx attributed to this reaction is 80 to 90\% in winter at high latitudes over North America. Because of the reduction of NOx, O3 concentrations are also decreased. The maximum O3 reduction occurs in spring although the maximum NOx reduction occurs in winter when photochemical O3 production is relatively low. The uncertainties related to uptake coefficient and aerosol loading in the model is analyzed. The analysis indicates that the changes in NOx due to these uncertainties are much smaller than the impact of hydrolysis of N2O5 on sulfate aerosol. The effect that hydrolysis of N2O5 on global NOx and O3 budgets are also assessed by the model. The results suggest that in the Northern Hemisphere, the average NOx budget decreases 50\% due to this reaction in winter and 5\% in summer. The average O3 budget is reduced by 8\% in winter and 6\% in summer. In the Southern Hemisphere (SH), the sulfate aerosol loading is significantly smaller than in the Northern Hemisphere. As a result, sulfate aerosol has little impact on NOx and O3 budgets of the Southern Hemisphere.},
year = {2003}
}
@article{Wennberg2018,
author = {Wennberg, Paul O. and Bates, Kelvin H. and Crounse, John D. and Dodson, Leah G. and McVay, Renee C. and Mertens, Laura A. and Nguyen, Tran B. and Praske, Eric and Schwantes, Rebecca H. and Smarte, Matthew D. and St Clair, Jason M. and Teng, Alexander P. and Zhang, Xuan and Seinfeld, John H.},
title = {Gas-Phase Reactions of Isoprene and Its Major Oxidation Products},
journal = {Chemical Reviews},
volume = {118},
number = {7},
pages = {3337-3390},
year = {2018},
doi = {10.1021/acs.chemrev.7b00439},
note ={PMID: 29522327},
URL = {https://doi.org/10.1021/acs.chemrev.7b00439},
eprint = {https://doi.org/10.1021/acs.chemrev.7b00439}
}
@techreport{JPL15,
author = {J. B. Burkholder, S. P. Sander, J. Abbatt, J. R. Barker, R. E. Huie, C. E. Kolb, M. J. Kurylo, V. L. Orkin, D. M.
Wilmouth, and P. H. Wine},
title = {Chemical Kinetics and Photochemical Data for Use in Atmospheric Studies, Evaluation No. 18 JPL Publication 15-10},
institution = {Jet Propulsion Laboratory},
location = {Pasadena},
year = {2015},
url = {http://jpldataeval.jpl.nasa.gov}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are still needed for some of the newer reaction types

!> A new MONARCH interface
type(monarch_interface_t), pointer :: new_obj
!> Path to the PartMC-camp configuration file list
function constructor(camp_config_file, output_file_title, &
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why were the comments removed from this file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was easier for me to understand the file. I wanted to just put the minimum and work from there.

!> Interface for the MONACH model and PartMC-camp
module camp_monarch_interface
!> Interface for the MONACH model and CAMP-camp
module camp_monarch_interface_2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this renamed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is not needed, I passed so much time with 2 camp_monarch_interface that I get used to that and don't even think it can be renamed.

Comment on lines 421 to 422
!print*,"MONARCH_conc381",MONARCH_conc(i,j,k,this%map_monarch_id(:))
!print*,"state_var421",this%camp_state%state_var(:)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove commented out code

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 826 to 830
type(camp_monarch_interface_t), intent(inout) :: this
if (associated(this%camp_core)) deallocate(this%camp_core)

end subroutine finalize

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these pointer deallocations removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point it failed, but it seems to be fixed now, so I will left it as before.

Copy link
Collaborator Author

@cguzman95 cguzman95 Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the error: Some variables like "init_conc_camp_id" are set only for rank 0, so it fails when using 40 MPI processes. Expect a new commit, also with another fix for the nGPUs set

!> Number of time steps to integrate over
integer, parameter :: NUM_TIME_STEP = 5
!> Index for water vapor in water_conc()
integer(kind=i_kind), parameter :: NUM_EBI_SPEC = 72
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why were the comments removed from this file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as I commented before: It was easier for me to understand the file. I wanted to just put the minimum and work from there.

@cguzman95
Copy link
Collaborator Author

My only real concern is with all of the syncthreads, but maybe they're all needed? When I get time, I'll try to run everything on NCAR's supercomputers

Only some of them are needed. I didn't remove them because it shouldn't affect performance and are useful for debugging

@cguzman95
Copy link
Collaborator Author

Pushed a new commit with the changes from your suggestions.

@cguzman95
Copy link
Collaborator Author

cguzman95 commented Feb 26, 2024

Run the GPU version requires using CVODE from the GitHub repository (The one zipped is not updated). We plan to upload a "first_install.sh" to download all dependencies.

Copy link
Collaborator

@mattldawson mattldawson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @cguzman95 - I lost track of this, but it if you've addressed all the comments, I'm fine with merging it in. (Although it looks like some of Kyle's comments still need to be addressed)

@cguzman95
Copy link
Collaborator Author

Hi @mattldawson, yes, please merge it. I addressed Kyle'sc comments; the pending comments were about removing some prints, which I already did but forgot to comment on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants