-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Is there a mistake in script MixtureEstimationMethods.py? #81
Comments
Hello, Thanks for your interest and for closely looking at ipdSummary.py. I would like to try to help answer your question.
The objective function in the optimization is mixModelFn, defined on line 81 of MixtureEstimationMethods.py:
# Return value of mixture model log likelihood function
def mixModelFn(self, p, a0, a1):
tmp = (1 - p) * a0 + p * a1
return -np.log(tmp[np.nonzero(tmp)]).sum()
# return -np.ma.log( tmp ).sum()
The input arguments are:
p = methylated fraction
a0 = value of Gaussian pdf for un-methylated IPDs: N( mu0, sd0 )
a1 = value of Gaussian pdf for methylated IPDs: N( mu1, sd1 )
When fminbound is used to do single-variable optimization for a function like mixModelFn, the optimization is done only over the first input argument – in this case, p. The remaining arguments (a0 and a1) are passed to fminbound as fixed values.
Now, here is estimateSingleFraction, defined in line 96 of MixtureEstimationMethods.py:
The input arguments are:
mu1 = mean of log IPDs for methylated case
data = observed log IPDs
mu0 = mean of log IPDs for un-methylated case
L = number of observations in the input data
optProp = if set to True, returns optimal mixing proportion.
1. First, the observed log IPDs and the fixed values of mu1 and mu0 are used to calculate the values of the Gaussian pdf N( mu0, sd0 ) and N( mu1, sd1 )
2. Now, single-variable optimization is done using the objective function defined in mixModelFn. The values of a0 and a1 are fixed, and the variable is the mixing proportion.
# Return optimum argument (mixing proportion) of mixture model log likelihood function.
def estimateSingleFraction(self, mu1, data, mu0, L, optProp=True):
# NOTE: ignoring the warnings here is sloppy, should be looked at later.
with np.errstate(all="ignore"):
a0 = self.replaceScipyNormPdf(data, mu0)
a1 = self.replaceScipyNormPdf(data, mu1)
# if f'(0) < 0 (equ. a1/a0 < L), then f'(1) < 0 as well and
# solution p-hat <= 0
if np.divide(a1, a0).sum() <= L:
res = 0.0
# if f'(1) > 0 (equ. a0/a1 < L), then f'(0) > 0 as well and
# solution p-hat >= 1
elif np.divide(a0, a1).sum() <= L:
res = 1.0
else:
# unconstrained minimization of convex, single-variable function
res = fminbound(self.mixModelFn, 0.01, 0.99,
args=(a0, a1), xtol=1e-02)
if optProp:
# return the optimal proportion
return res
else:
# return the corresponding log likelihood function value
return self.mixModelFn(res, a0, a1)
Does this help at all? Please feel free to write back if not.
Thanks again for your interest in this,
From: xuyw <[email protected]>
Sent: Monday, November 30, 2020 2:49 AM
To: PacificBiosciences/kineticsTools <[email protected]>
Cc: Subscribed <[email protected]>
Subject: [PacificBiosciences/kineticsTools] Is there a mistake in script MixtureEstimationMethods.py? (#81)
[EXTERNAL MESSAGE] Be mindful when clicking links or attachments
Hi authors,
I was interested in how ipdSummary gives out the frac value, so I read the source code and I was confused about this function:
[Image removed by sender. image]<https://user-images.githubusercontent.com/28685819/100600251-02070180-333c-11eb-81d2-43c4f378619b.png>
Why estimateSingleFraction is optimized to estimate mu1 ( which I guess is the mean IPD value if methylated bases)?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub<#81>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AANKTE7I4BD62YGDL6J6HN3SSN2DJANCNFSM4UHMSQSA>.
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Hi authors,
I was interested in how ipdSummary gives out the
frac
value, so I read the source code and I was confused about this function:Why estimateSingleFraction is optimized to estimate mu1 ( which I guess is the mean IPD value if methylated bases)?
The text was updated successfully, but these errors were encountered: