
OutOfMemoryError with PrototypicalCalibrationBlock #73

Open
gladdduck opened this issue Apr 10, 2024 · 5 comments

Comments

@gladdduck

Hello, when I train my dataset using DeFRCN, I encountered an issue. The base training process goes smoothly, but when I attempt K-shot finetuning, I keep getting an OutOfMemoryError.

I tried to solve it and found that when setting PCB_ENABLE to False, this issue doesn't occur.

However, when PCB_ENABLE is set to True, I still hit the OutOfMemoryError even after reducing IMS_PER_BATCH to 1 on an A100-40G.

Has anyone else experienced a similar issue? How was it resolved?

@cnjhh

cnjhh commented Apr 11, 2024

The solution is to locate the PCB module in `/path/defrcn/defrcn/evaluation/calibration_layer.py`. In the `build_prototypes` function, after the line `all_features.append(features.cpu().data)`, add `features = None`.
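In other words, the fix drops the GPU-side reference as soon as the CPU copy has been stored, so the allocator can reuse that memory on the next iteration instead of holding one extra image's features alive throughout the loop. A minimal sketch of the pattern, where `extract` and `to_cpu` are hypothetical stand-ins for `self.extract_roi_features` and `.cpu().data` (not the actual DeFRCN API):

```python
def build_prototypes_loop(batches, extract, to_cpu):
    """Accumulate CPU copies of per-image features, releasing each
    (GPU-side, in DeFRCN) result before the next iteration."""
    all_features = []
    for img, boxes in batches:
        features = extract(img, boxes)        # large GPU tensor in the real code
        all_features.append(to_cpu(features)) # keep only the CPU copy
        features = None                       # drop the reference so the memory is reusable
    return all_features
```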

@gladdduck

> The solution is to locate the PCB module in `/path/defrcn/defrcn/evaluation/calibration_layer.py`. In the `build_prototypes` function, after the line `all_features.append(features.cpu().data)`, add `features = None`.

Thanks for your reply! This works!

@gladdduck

> The solution is to locate the PCB module in `/path/defrcn/defrcn/evaluation/calibration_layer.py`. In the `build_prototypes` function, after the line `all_features.append(features.cpu().data)`, add `features = None`.

However, this error still occurs from time to time.
The code in question is in the `build_prototypes` function of `calibration_layer.py`:

features = self.extract_roi_features(img, boxes)

and in the `extract_roi_features` function:

conv_feature = self.imagenet_model(images.tensor[:, [2, 1, 0]])[

I'm very confused about this, even though I used gc.collect() and torch.cuda.empty_cache().
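This is expected behavior for both calls: `gc.collect()` only frees objects that are unreachable, and `torch.cuda.empty_cache()` only returns cached blocks that no live tensor still references. If a reference to a large tensor survives (for example, held in a list or a still-bound local variable), neither call can reclaim it. A pure-Python illustration of the reachability rule, using a plain class as a stand-in for a large tensor:

```python
import gc
import weakref

class Tensor:
    """Stand-in for a large GPU tensor."""
    pass

keep = []
t = Tensor()
keep.append(t)              # a container still references the object
ref = weakref.ref(t)
t = None                    # clearing the local is not enough
gc.collect()
assert ref() is not None    # still reachable via `keep`, so not collected

keep.clear()                # drop the last reference
gc.collect()
assert ref() is None        # now unreachable, so it has been reclaimed
```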

@cnjhh

cnjhh commented Apr 19, 2024

features = self.extract_roi_features(img, boxes)
boxes = None
img = None
all_features.append(features.cpu().data)
features = None

The features are created from your custom dataset for the novel classes, so you can solve this by reducing the number of novel classes, or by generating the features offline: instead of computing them from the novel data every time the model is validated, save them once via the pickle module, then modify the code to load the saved features directly during validation.
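The offline approach above can be sketched as a small cache-or-build helper. The function name and cache path are hypothetical, not part of DeFRCN; `build_fn` stands in for whatever currently computes the prototypes during validation:

```python
import os
import pickle

def load_or_build_prototypes(cache_path, build_fn):
    """Load prototypes from a pickle cache if present; otherwise build
    them once and save the result for later validation runs."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    prototypes = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(prototypes, f)
    return prototypes
```

On the second and later validation runs the expensive (and memory-hungry) feature extraction is skipped entirely, since the prototypes come straight from disk.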

@cnjhh

cnjhh commented Apr 19, 2024

The device I use is an A800 80G, and the novel data I set is 10-shot with 13 classes. When the model is loaded with the PCB module, GPU memory usage reaches 53G; before the modification, even 80G was not enough.
