Question about optimal transport and possibilities of OptimalTransport.jl #157

ignace-computing · 2021-12-22T12:39:54Z

Sorry if this is not the right place to ask this kind of questions.

I am looking for an algorithm that has the following characteristics (see below).
Is there any functionality in OptimalTransport.jl that could be used for this?

Many thanks!

INPUT:

histogram H,
desired mean and variance (M,V)

OUTPUT:

histogram H' that has desired mean and variance (M,V) but that also has a shape that is obtained by adapting H, while somehow minimizing some cost that penalizes large modifications w.r.t. H.

(I can of course give more information about the problem is the description is not very clear to you).

The text was updated successfully, but these errors were encountered:

davibarreira · 2021-12-22T13:22:28Z

When you say histogram, do you mean the sum of diracs?

davibarreira · 2021-12-22T13:25:26Z

Do you have a more formal (mathematical) description of your problem? I don't think there is an easy function that does what you are looking for. It looks like more as an optimization problem, such as Linear Programming.

devmotion · 2021-12-22T13:38:07Z

The p-Wasserstein distance between two univariate histograms H and H' (ie. sum of Diracs) with the same number n of components is just 1/n ||H - H'||_p^p (see e.g. eq 2.33 in Computational Optimal Transport). Thus based on the p-Wasserstein distance it seems you could (try to) solve the optimization problem

min_{H'} 1/n ||H - H'||_p^p
s.t. 1/n \sum_{i=1}^n H'_i = mean
     1/n \sum_{i=1}^n (H'_i - mean)^2 = variance

Of course, the factor 1/n in the objective function could be ignored.

Edit: An important assumption that I forgot to mention is of course that both H and H' are sorted.

devmotion · 2021-12-22T14:18:46Z

It seems this optimization problem can be solved analytically at least for p = 2. Using Langrange multipliers it seems (better check it carefully, I just did a quick sketch) one obtains the possible optima H' = m - (H - mean(H)) / std(H) * s and H' = m + (H - mean(H)) / std(H) * s where m is the desired mean and s = sqrt(variance) is the desired standard deviation. Clearly, both satisfy mean(H') = m and var(H') = s^2 = variance. However, only H' = m + (H - mean(H)) / std(H) * s is sorted in ascending order (H is sorted in ascending order!), the other possible solution is sorted in reverse order. Hence H' = m + (H - mean(H)) / std(H) * s seems to be the unique solution of the optimization problem.

ignace-computing · 2022-01-04T11:15:52Z

Dear @davibarreira end @devmotion, thank you for having answered my question!
And happy new year of course, meanwhile.

I think that what @devmotion suggests is perfectly correct, it looks like a good idea to try to minimize the p-Wasserstein distance, and your solution to the optimization problem also intuitively makes sense. However, it might not work in the setting I am envisaging. Therefore let me clarify:

We have a grid x,
on on that grid we define a (discretized) density function u.
More specifically,
in each grid cell x_i,
u_i is the average value of u (integrated over the grid cell).
So let the pair (x,u) define a histogram H.
The mean value of H is given by mu = u’*x and the variance equals var = u’*(x – mu).^2

Now we wish to obtain a new histogram H’, thus a pair (x’,u’), with the desired properties (namely, mean and variance (M,V)). It is also desired that x’=x (this makes it different from what you suggested.)

ignace-computing · 2022-01-04T11:18:38Z

@davibarreira Is it now more clear what I meant with histogram? It's like, you specify a number of bins and then you evaluate a probability density function on that grid.

davibarreira · 2022-01-04T12:40:28Z

Yes, it's clear. But the package does not have a solution to your problem. Is your grid 1D? If yes, you could use the solution @devmotion proposed. Otherwise, I don't think there is an easy answer.

ignace-computing · 2022-01-04T15:08:05Z

Yeah, so just for completeness let me add the form of the optimization problem that I am wishing to solve.
x and H are vectors of the same dimension carrying the grid an the probability density thereon, respectively. The ' symbol denotes transpose (for clarity never applied to H or H')

min_{H'} ||H - H'||_p^p
s.t. x'*H' = mean
     (x-mean)'.^2*H' = variance

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question about optimal transport and possibilities of OptimalTransport.jl #157

Question about optimal transport and possibilities of OptimalTransport.jl #157

ignace-computing commented Dec 22, 2021

davibarreira commented Dec 22, 2021

davibarreira commented Dec 22, 2021

devmotion commented Dec 22, 2021 •

edited

Loading

devmotion commented Dec 22, 2021 •

edited

Loading

ignace-computing commented Jan 4, 2022 •

edited

Loading

ignace-computing commented Jan 4, 2022

davibarreira commented Jan 4, 2022

ignace-computing commented Jan 4, 2022

Question about optimal transport and possibilities of OptimalTransport.jl #157

Question about optimal transport and possibilities of OptimalTransport.jl #157

Comments

ignace-computing commented Dec 22, 2021

davibarreira commented Dec 22, 2021

davibarreira commented Dec 22, 2021

devmotion commented Dec 22, 2021 • edited Loading

devmotion commented Dec 22, 2021 • edited Loading

ignace-computing commented Jan 4, 2022 • edited Loading

ignace-computing commented Jan 4, 2022

davibarreira commented Jan 4, 2022

ignace-computing commented Jan 4, 2022

devmotion commented Dec 22, 2021 •

edited

Loading

devmotion commented Dec 22, 2021 •

edited

Loading

ignace-computing commented Jan 4, 2022 •

edited

Loading