Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[luci/pass] introduce quantize weight with GPTQ pass #14285

Closed
wants to merge 6 commits into from

Conversation

BLee-bot
Copy link
Contributor

This introduces quantize weight with GPTQ pass

ONE-DCO-1.0-Signed-off-by: Banseok Lee [email protected]

Related Issue : #13480
Draft PR: #13585

int channel_dim_index, std::vector<float> &scaling_factor,
std::vector<float> &max, std::vector<float> &min)
{
int channel_idx = indices[channel_dim_index];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add some assertion checks for input arguments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indices[channel_dim_index]; may produce out of bounds problem and produce segmentation fault.

Comment on lines 253 to 254
void compute_asym_scale_zp(float min, float max, float &scaling_factor, int64_t &zp,
float &nudged_min, float &nudged_max, int32_t k_max_scale)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like some arguments are input and some are outputs.
it would be nice to add some comments about this.
k_max_scale would be better to group with other inputs.

cholesky_inverse(hessian, size_hessian);
cholesky_decomposition(hessian, size_hessian);

// transpose hessian to make upper trangular
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trangular ?

error[cal_offset(dimension, indices)] =
(data - (quantized_values[cal_offset(dimension, indices)] - zp[channel_idx]) *
scaling_factor[channel_idx]) /
hessian[cal_offset_2d(dimension_hessian, indices_diag_hessian)];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets split hessian[cal_offset_2d(dimension_hessian, indices_diag_hessian)] to

auto offset = cal_offset_2d(dimension_hessian, indices_diag_hessian)
hessian[offset];

private:
void fake_quantize_cwq(luci::CircleConst *weights, std::vector<float> &hessian) const
{
// assert(output_type == loco::DataType::U8); // FIX_CALLER_UNLESS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// assert(output_type == loco::DataType::U8); // FIX_CALLER_UNLESS

auto new_weights = luci::clone(weights);
node->filter(new_weights);

auto hessian = (*hessian_map)[node];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this hessian_map declared?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be better to use class and use _ prefix to member variables.
this class is not simple to declare as struct

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to a class and added _ prefix to member variables.

Comment on lines 7 to 9
{
struct QuantizeWeightsWithGPTQPassTest : public ::testing::Test
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{
struct QuantizeWeightsWithGPTQPassTest : public ::testing::Test
{
{
struct QuantizeWeightsWithGPTQPassTest : public ::testing::Test
{

struct QuantizeWeightsWithGPTQPassTest : public ::testing::Test
{
/**
* nconv graph
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is nconv ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied MakeGraph code blocks from "compiler/luci/pass/src/QuantizeWeightsPass.test.cpp"
It seems that this should change to conv2d

Comment on lines 20 to 26
#include <loco.h>

#include <logo/Pass.h>

#include <luci/Pass/QuantizationParameters.h>
#include <luci/IR/CircleNode.h>
#include <unordered_map>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <loco.h>
#include <logo/Pass.h>
#include <luci/Pass/QuantizationParameters.h>
#include <luci/IR/CircleNode.h>
#include <unordered_map>
#include <luci/Pass/QuantizationParameters.h>
#include <luci/IR/CircleNode.h>
#include <logo/Pass.h>
#include <loco.h>
#include <unordered_map>

#include <iostream>
#include <cmath>
#include <functional>
#include <limits>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is limits necessary?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem necessary. I'll remove it

This patch includes minor refactoring to improve overall code quality.

ONE-DCO-1.0-Signed-off-by: y01000.you <[email protected]>
using HessianMap = std::unordered_map<const luci::CircleNode *, std::vector<float>>;

/**
* @brief Pass to quantize weights
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this brief doesn't give any help. plz remove or give more useful description.


private:
std::unique_ptr<Context> _ctx;
HessianMap *_hessian_map;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
HessianMap *_hessian_map;
HessianMap *_hessian_map = nullptr;

@@ -292,6 +292,11 @@ uint32_t cal_offset(loco::TensorShape &dimension, uint32_t *indices)
indices[2] * dimension.dim(3).value() + indices[3];
}

uint32_t cal_offset_2d(loco::TensorShape &dimension, uint32_t *indices)
{
return indices[0] * dimension.dim(1).value() + indices[1];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know caller is calling with safe index.
It would be better to add some assertion check before accessing ptr and array.

}
}
}
return;
Copy link
Contributor

@seanshpark seanshpark Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you really need to add return here?

Comment on lines +111 to +112
std::cout << "Error: Matrix is not positive definite." << std::endl;
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does the caller know about this error?

{
damp += hessian[i * size_hessian + i];
}
damp /= size_hessian;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check size_hessian != 0

zp = nudged_zero_point;
}

void asymmetric_wquant_per_channel(CircleConst *node, std::vector<float> &min,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is quite long and have many depths that may increase complexity.
plz split this to several methods.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to divide the method into smaller parts, but if there are still parts that should be further divided, please let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to split to small methods under some certain lines, line 20 or 30.

Comment on lines 413 to 415
error[cal_offset(dimension, indices)] =
(data - (quantized_values[cal_offset(dimension, indices)] - zp[channel_idx]) *
scaling_factor[channel_idx]) /
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be better to extract cal_offset(dimension, indices) to some variable and use it like error[index].
and to quantized_values[index] - zp[channel_idx] too

auto _h_offset = cal_offset_2d(dimension_hessian, indices_hessain);

node->at<loco::DataType::FLOAT32>(cal_offset(dimension, indices_channel_first)) -=
error[cal_offset(dimension, indices_error)] * hessian[_h_offset];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto to cal_offset(dimension, indices_error)

y01000.you added 2 commits November 1, 2024 10:23
This commit adds a description to the QuantizeWeightsWithGPTQPass class. Additionally, some variables have been extracted from the function parameters to improve readability and maintainability.

ONE-DCO-1.0-Signed-off-by: y01000.you <[email protected]>
This commit adds support for the GPTQ algorithm and hessian map to the CircleQuantizer class in LUCI.

ONE-DCO-1.0-Signed-off-by: y01000.you <[email protected]>
Comment on lines +254 to +263
/**
* @brief Compute the scale and zero point for the given range of values
* @param min: The minimum value in the range of values to be quantized.
* @param max: The maximum value in the range of values to be quantized.
* @param k_max_scale: The maximum value of the quantization scale.
* @param scaling_factor: The computed scaling factor for the quantization.
* @param zp: The computed zero point for the quantization.
* @param nudged_min: The nudged minimum value after applying the scaling factor and zero point.
* @param nudged_max: The nudged maximum value after applying the scaling factor and zero point.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, it's ok to add explanations for each arguments but as other functions don't provide this, so we don't need to add helps like this.
If you really want to add this much of comments, it would be better add to others too... but I don't recommend that.

As I wrote, some arguments are input and some are outputs and this can't be identified at sight when you previously had k_max_scale at the end but it was used as input.

Please read my comments carefully.

I recommend to remove all these comments if not really necessary understanding this function.

@@ -0,0 +1,697 @@
/*
* Copyright (c) 2024 Samsung Electronics Co., Ltd. All Rights Reserved
* Copyright 2019 The TensorFlow Authors. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what TF code did you reference?

}

luci::QuantizeWeightsWithGPTQPass pass(std::move(ctx), &hessian_map);
EXPECT_NO_THROW(pass.run(&_g));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some value tests? It is difficult for me to ensure that those implementations (cholesky things) are correct.

This commit refactors the QuantizeWeightsWithGPTQPass.cpp file to improve its readability and maintainability.

ONE-DCO-1.0-Signed-off-by: y01000.you <[email protected]>
@seanshpark
Copy link
Contributor

@01000-you , @BLee-bot , there are lots of changes and mixed files
that got lots of comments even I cannot remember what I've left.

Please split this PR to several PRs, like under 50 lines, that I and you guys can manage.

@seanshpark
Copy link
Contributor

you can split PRs with

  • CircleQuantizer.h
  • QuantizationUtils.h and cpp
  • QuantizeWeightsWithGPTQPass.h and cpp

that doesn't exceed 50 lines with single context of change.

context means, introducing something, removing something, changing existing codes for single purpose.

@seanshpark seanshpark closed this Nov 4, 2024
@01000-you
Copy link

@seanshpark, @jinevening

Thank you for your kind reviews. I'll follow your guide and upload it accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants