Add positional_encodings_ layer to Dlib #3019

Open

Cydral wants to merge 13 commits into master

Conversation

Cydral
Contributor

@Cydral Cydral commented Sep 24, 2024

This pull request introduces a new layer, positional_encodings_, to the Dlib library.
The positional_encodings_ layer adds "positional encodings" to the input tensor, which is particularly useful for models processing sequential data, such as transformers. The positional encodings are computed using sine and cosine functions of different frequencies, as described in the paper "Attention is All You Need" by Vaswani et al. This enhancement aims to provide positional information to the model, improving its ability to understand the order and position of elements in a sequence.

The implementation includes methods for setup, forward propagation, and backward propagation, along with serialization support.
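
For reference, the encodings described in "Attention is All You Need" are PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), where pos is the position in the sequence and d_model is the embedding dimension. The sketch below is only a minimal plain-C++ illustration of that formula, not the dlib layer code from this PR; the function name and the std::vector return type are made up for the example.

#include <cmath>
#include <vector>

// Build a (seq_len x d_model) table of sinusoidal positional encodings:
//   PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
//   PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
std::vector<std::vector<float>> make_positional_encodings(long seq_len, long d_model)
{
    std::vector<std::vector<float>> pe(seq_len, std::vector<float>(d_model, 0.0f));
    for (long pos = 0; pos < seq_len; ++pos)
    {
        for (long i = 0; i < d_model; i += 2)
        {
            const double freq = std::pow(10000.0, static_cast<double>(i) / d_model);
            pe[pos][i] = static_cast<float>(std::sin(pos / freq));
            if (i + 1 < d_model)
                pe[pos][i + 1] = static_cast<float>(std::cos(pos / freq));
        }
    }
    return pe;
}

Since the encodings are constants that are simply added to the input, the forward pass reduces to an element-wise addition and the backward pass passes the incoming gradient through unchanged, which is consistent with the forward/backward methods mentioned above.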

Cydral and others added 8 commits September 16, 2024 13:45
* remove using namespace std from headers

* more std::

* more std::

* more std:: on windows stuff

* remove uses of using namespace std::chrono

* do not use C++17 features

* Add Davis suggestion

* revert some more stuff

* revert removing include

* more std::chrono stuff
@Cydral Cydral changed the title from "Add positional encodings layer to Dlib" to "Add positional_encodings_ layer to Dlib" on Sep 24, 2024
@pfeatherstone
Contributor

@Cydral This is all great. Really cool. Can I ask though, why do you use dlib instead of Pytorch for neural nets? Slowly adding transformer support to dlib is a lot of work due to the absence of autograd. Its recursive template API makes it very slow to compile. In torch, adding a SOTA layer can be just a few lines of code. For example, sinusoidal positional embeddings is like 15 lines. You can then train in pytorch, then export, or even compile to something you can run in C/C++. So I'm not sure what's the benefit of adding transformer support to dlib when there are arguably better solutions for DNNs in C++. Also, the author has the burden of maintaining this code. I'm not trying to overly criticise but this is requiring a lot of work for potentially little gain. I'm wondering if these additions belong in a "dlib_contrib" repository.

@Cydral
Contributor Author

Cydral commented Sep 25, 2024

> (quoting @pfeatherstone's comment above)

It's obviously an interesting question, and yes, I can confirm that it's a huge job to add such layers to Dlib. But the idea here isn't necessarily to build a neural network that's exactly the same as what you'd find in other libraries. For example, I'm studying in parallel the impact of using convolution layers to replace certain 'linear'-type layers that you'd actually find in PyTorch.
Of course, the absence of an autograd function also means that we have to design the derivatives for the gradient ourselves, but it's also a way of getting a better handle on what we're doing. So far I've managed to achieve some interesting results (we'd been trying for a while to introduce a mechanism for predicting the 'next' sequence in Dlib using a feed-forward approach).
In particular, I've managed to get the network to memorise sequences of text, which I think is already a good start, and even if I decide not to see this "exercise" through to the end, the new layers I've added could also be used for specific processing on images.
In this respect, I think there are a lot of great things we can do with Dlib. This isn't the first time I've read about developers moving on to PyTorch, but if nobody tries to develop Dlib any more, then we might as well shut down development of this library... even though it's still very efficient for many tasks. Adding new functions does not, in principle, oblige the author to maintain it all; it's also the role of the community around this library to help fix any functional problems it sees (as I sometimes do for parts contributed by other contributors).

@arrufat
Contributor

arrufat commented Sep 25, 2024

Yes, I want to encourage you to keep up with this great work! I can't wait to try training transformer-based networks in dlib :)

And of course, I will try my best to help to maintain this stuff :D

@Cydral
Contributor Author

Cydral commented Sep 25, 2024

I'm continuing and making progress... I've just finished reworking gemm() to take the matrix dimensions of 4D tensors into account.
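
For context, a dlib tensor has num_samples, k, nr and nc dimensions, so handling "the matrix dimensions of 4D tensors" presumably means treating the tensor as a batch of num_samples*k independent nr x nc matrices and multiplying them plane by plane. The snippet below is only a rough, self-contained sketch of that batched behaviour on raw float buffers; it is not dlib's actual gemm() interface, and batched_matmul is a hypothetical name.

// Multiply each ar x ac plane of A with the matching ac x bc plane of B,
// writing the ar x bc result plane into C (row-major storage throughout).
// Purely illustrative of plane-by-plane (batched) matrix multiplication.
void batched_matmul(const float* A, const float* B, float* C,
                    long num_planes, long ar, long ac, long bc)
{
    for (long p = 0; p < num_planes; ++p)
    {
        const float* a = A + p * ar * ac;
        const float* b = B + p * ac * bc;
        float* c = C + p * ar * bc;
        for (long i = 0; i < ar; ++i)
            for (long j = 0; j < bc; ++j)
            {
                float sum = 0;
                for (long t = 0; t < ac; ++t)
                    sum += a[i * ac + t] * b[t * bc + j];
                c[i * bc + j] = sum;
            }
    }
}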

@davisking
Owner

Yeah it's all good. I am a huge fan of pytorch, but there are also things about it I would do differently. Frankly, if I wasn't extremely busy working at a pre-money startup and trying to ensure we are successful long term (and I feel pretty good about our future), I would be working way more on dlib and making more open source ML stuff in particular.

There is also way more going on in dlib than the deep learning stuff. The deep learning tooling is honestly the least interesting part of dlib for me. There are some pretty sweeping changes I would make to the deep learning stuff and will at some point. Anyway, that's all to say, it doesn't matter how many people use the deep learning parts of dlib. There are tons of other things in it that lots of people use on a huge number of projects too.

And the dnn tooling is very low dependency and easy to compile, which is really its selling point. And people still use it, and that's fine.

That's all to say, this PR is cool. So knock yourself out :)

@Cydral
Contributor Author

Cydral commented Sep 30, 2024

There are no more conflicts for this modification. However, the pre-merge build checks fail because the tril_ class (previously committed to the master branch) is not found in the dnn.cpp program... if you merge the modification, it should still pass, shouldn't it?

@arrufat
Contributor

arrufat commented Sep 30, 2024

You should be able to merge the master branch into this one and get the tril_ layer here too, so that the tests pass. Something seems off: you have the tests for that layer, but not the implementation...

@Cydral
Contributor Author

Cydral commented Sep 30, 2024

Something went wrong when I pulled the latest version after the integration of the tril_ layer... I just deleted the references to tril_ in the dnn.cpp file on the positional_encodings branch. I hope that when the merge is done, nothing will disappear from the master branch.

Comment on lines 4456 to 4591
// ----------------------------------------------------------------------------------------

void test_tril()
{
    print_spinner();
    using net_type = tag1<tril_mask<tag2<input<matrix<float>>>>>;
    net_type net;

    // Input tensor
    dlib::rand rnd;
    const int nr = 2, nc = 3;
    constexpr int n_samples = 3, k = 1;
    std::vector<matrix<float>> x(n_samples);
    matrix<float> xtmp(nr, nc);
    for (int ii = 0; ii < n_samples; ++ii) {
        for (int jj = 0; jj < nr; ++jj)
            for (int kk = 0; kk < nc; ++kk)
                xtmp(jj, kk) = rnd.get_random_gaussian();
        x[ii] = xtmp;
    }

    // Convert input matrix to tensor
    resizable_tensor input_tensor;
    net.to_tensor(&x[0], &x[0] + n_samples, input_tensor);
    net.forward(input_tensor);

    // Expected output tensor (manually set for comparison)
    resizable_tensor expected_output;
    expected_output.copy_size(input_tensor);
    tt::copy_tensor(false, expected_output, 0, input_tensor, 0, input_tensor.k());
    for (int ii = 0; ii < n_samples; ++ii) {
        expected_output.host()[tensor_index(expected_output, ii, 0, 0, 1)] = -std::numeric_limits<float>::infinity();
        expected_output.host()[tensor_index(expected_output, ii, 0, 0, 2)] = -std::numeric_limits<float>::infinity();
        expected_output.host()[tensor_index(expected_output, ii, 0, 1, 2)] = -std::numeric_limits<float>::infinity();
    }

    // Compare output tensor with expected output
    auto& net_output = layer<tag1>(net).get_output();
    DLIB_TEST(max(abs(mat(net_output) - mat(expected_output))) < 1e-5);
}

Contributor

This will disappear from master when merging.

Contributor Author

Damn, the problem is that I can't manage to merge the master branch, which contains tril_, into my fork's branches... if only the tril_ tests are missing from dnn.cpp, I can put them back once all the synchronisation has been done.

Contributor Author

Since I can't pull the master version of Dlib (i.e. with the recently integrated tril_ class), and after resolving the conflicts, I'm now trying to add the definitions for tril_ once again. If the tests pass, we'll have to merge to see whether everything works correctly. If so, I'll do the same for all the other PRs.

Contributor Author

Well, that seems to work, so I'm going to do the same with the other PRs.

Owner

Yeah I merge PRs by squashing the PR into one commit and adding it to master, so master will always be fine regardless. You can (and probably should here since things have gotten complicated) squash your branch into one commit too. Having merge commits when you are working with multiple branches is a huge pain, as you are experiencing :D

There are a bunch of ways to squash a branch into one commit. I like to use git reset --soft to just undo all the commits and then recommit them. But there is also git merge --squash, and probably several other ways. Make a backup branch before you do anything, though, if you aren't really familiar with what I'm talking about.

Contributor Author

OK, but normally you can proceed, because I've manually reconciled the 'mixed' code from the different branches. Once everything has been merged into the master branch, I'll remove the branches from my fork to resynchronise the fork and start again from a clean base. All that's left is to rework the last PR, for softmaxm, which I'll do today.

Contributor Author

@davis, all the "alignments" have been made to enable the merge to go ahead.

@Cydral
Contributor Author

Cydral commented Oct 18, 2024

@davis, there were indeed new conflicts following the integration of the last class that you dealt with. It looks good now, but could you please go through the integrations in chronological order and thus consider this new class first? If it's OK and integrated, I'll make the remaining changes to avoid going back and forth, and I'll keep you informed. Thank you in advance for your help and support.

@davisking
Owner

Ah you have a merge error or something. Check the PR contents, it's missing all your code changes :(
