Add positional_encodings_ layer to Dlib #3019

Open

Cydral wants to merge 13 commits into master

Conversation

Cydral
Contributor

@Cydral Cydral commented Sep 24, 2024

This pull request introduces a new layer, positional_encodings_, to the Dlib library.
The positional_encodings_ layer adds "positional encodings" to the input tensor, which is particularly useful for models processing sequential data, such as transformers. The positional encodings are computed using sine and cosine functions of different frequencies, as described in the paper "Attention is All You Need" by Vaswani et al. This enhancement aims to provide positional information to the model, improving its ability to understand the order and position of elements in a sequence.

The implementation includes methods for setup, forward propagation, and backward propagation, along with serialization support.
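
For reference, the encodings described in "Attention is All You Need" are PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), where pos is the position in the sequence and d_model is the embedding dimension. The sketch below is only a minimal plain-C++ illustration of that formula, not the dlib layer code from this PR; the function name and the std::vector return type are made up for the example.

#include <cmath>
#include <vector>

// Build a (seq_len x d_model) table of sinusoidal positional encodings:
//   PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
//   PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
std::vector<std::vector<float>> make_positional_encodings(long seq_len, long d_model)
{
    std::vector<std::vector<float>> pe(seq_len, std::vector<float>(d_model, 0.0f));
    for (long pos = 0; pos < seq_len; ++pos)
    {
        for (long i = 0; i < d_model; i += 2)
        {
            const double freq = std::pow(10000.0, static_cast<double>(i) / d_model);
            pe[pos][i] = static_cast<float>(std::sin(pos / freq));
            if (i + 1 < d_model)
                pe[pos][i + 1] = static_cast<float>(std::cos(pos / freq));
        }
    }
    return pe;
}

Since the encodings are constants that are simply added to the input, the forward pass reduces to an element-wise addition and the backward pass passes the incoming gradient through unchanged, which is consistent with the forward/backward methods mentioned above.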

Cydral and others added 8 commits September 16, 2024 13:45
* remove using namespace std from headers

* more std::

* more std::

* more std:: on windows stuff

* remove uses of using namespace std::chrono

* do not use C++17 features

* Add Davis suggestion

* revert some more stuff

* revert removing include

* more std::chrono stuff
@Cydral Cydral changed the title from "Add positional encodings layer to Dlib" to "Add positional_encodings_ layer to Dlib" on Sep 24, 2024
@pfeatherstone
Contributor

@Cydral This is all great. Really cool. Can I ask though, why do you use dlib instead of Pytorch for neural nets? Slowly adding transformer support to dlib is a lot of work due to the absence of autograd. Its recursive template API makes it very slow to compile. In torch, adding a SOTA layer can be just a few lines of code. For example, sinusoidal positional embeddings is like 15 lines. You can then train in pytorch, then export, or even compile to something you can run in C/C++. So I'm not sure what's the benefit of adding transformer support to dlib when there are arguably better solutions for DNNs in C++. Also, the author has the burden of maintaining this code. I'm not trying to overly criticise but this is requiring a lot of work for potentially little gain. I'm wondering if these additions belong in a "dlib_contrib" repository.

@Cydral
Contributor Author

Cydral commented Sep 25, 2024

> (quoting @pfeatherstone's comment above)

It's obviously an interesting question, and yes, I can confirm that it's a huge job to add such layers to Dlib. But the idea here isn't necessarily to build a neural network that's exactly the same as what you'd find in other libraries. For example, I'm studying in parallel the impact of using convolution layers to replace certain 'linear'-type layers that you'd actually find in PyTorch.
Of course, the absence of an autograd function also means that we have to design the derivatives for the gradient ourselves, but it's also a way of getting a better handle on what we're doing. So far I've managed to achieve some interesting results (we'd been trying for a while to introduce a mechanism for predicting the 'next' sequence in Dlib using a feed-forward approach).
In particular, I've managed to get the network to memorise sequences of text, which I think is already a good start, and even if I decide not to see this "exercise" through to the end, the new layers I've added could also be used for specific processing on images.
In this respect, I think there are a lot of great things we can do with Dlib. This isn't the first time I've read about developers moving on to PyTorch, but if nobody tries to develop Dlib any more, then we might as well shut down development of this library... even though it's still very efficient for many tasks. Adding new functions does not, in principle, oblige the author to maintain it all; it's also the role of the community around this library to help fix any functional problems it sees (as I sometimes do for parts contributed by other contributors).

@arrufat
Contributor

arrufat commented Sep 25, 2024

Yes, I want to encourage you to keep up with this great work! I can't wait to try training transformer-based networks in dlib :)

And of course, I will try my best to help to maintain this stuff :D

@Cydral
Contributor Author

Cydral commented Sep 25, 2024

I'm continuing and making progress... I've just finished reworking gemm() to take the matrix dimensions of 4D tensors into account.
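
For context, a dlib tensor has num_samples, k, nr and nc dimensions, so handling "the matrix dimensions of 4D tensors" presumably means treating the tensor as a batch of num_samples*k independent nr x nc matrices and multiplying them plane by plane. The snippet below is only a rough, self-contained sketch of that batched behaviour on raw float buffers; it is not dlib's actual gemm() interface, and batched_matmul is a hypothetical name.

// Multiply each ar x ac plane of A with the matching ac x bc plane of B,
// writing the ar x bc result plane into C (row-major storage throughout).
// Purely illustrative of plane-by-plane (batched) matrix multiplication.
void batched_matmul(const float* A, const float* B, float* C,
                    long num_planes, long ar, long ac, long bc)
{
    for (long p = 0; p < num_planes; ++p)
    {
        const float* a = A + p * ar * ac;
        const float* b = B + p * ac * bc;
        float* c = C + p * ar * bc;
        for (long i = 0; i < ar; ++i)
            for (long j = 0; j < bc; ++j)
            {
                float sum = 0;
                for (long t = 0; t < ac; ++t)
                    sum += a[i * ac + t] * b[t * bc + j];
                c[i * bc + j] = sum;
            }
    }
}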

@davisking
Owner

Yeah it's all good. I am a huge fan of pytorch, but there are also things about it I would do differently. Frankly, if I wasn't extremely busy working at a pre-money startup and trying to ensure we are successful long term (and I feel pretty good about our future), I would be working way more on dlib and making more open source ML stuff in particular.

There is also way more going on in dlib than the deep learning stuff. The deep learning tooling is honestly the least interesting part of dlib for me. There are some pretty sweeping changes I would make to the deep learning stuff and will at some point. Anyway, that's all to say, it doesn't matter how many people use the deep learning parts of dlib. There are tons of other things in it that lots of people use on a huge number of projects too.

And the dnn tooling is very low dependency and easy to compile, which is really its selling point. And people still use it, and that's fine.

That's all to say, this PR is cool. So knock yourself out :)

@Cydral
Contributor Author

Cydral commented Sep 30, 2024

There are no more conflicts for this modification. However, the pre-merge build checks fail because the tril_ class (previously committed to the master branch) is not found in the dnn.cpp program... if you merge the modification, it should still pass, shouldn't it?

@arrufat
Contributor

arrufat commented Sep 30, 2024

You should be able to merge the master branch into this one and get the tril_ layer here too, so that the tests pass. Something seems off: you have the tests for that layer, but not the implementation...

@Cydral
Contributor Author

Cydral commented Sep 30, 2024

Something went wrong when I pulled the latest version after the integration of the tril_ layer... I just deleted the references to tril_ in the dnn.cpp file on the positional_encodings branch. I hope that when the merge is done, nothing will disappear from the master branch.

Comment on lines 4456 to 4591
// ----------------------------------------------------------------------------------------

void test_tril()
{
    print_spinner();
    using net_type = tag1<tril_mask<tag2<input<matrix<float>>>>>;
    net_type net;

    // Input tensor
    dlib::rand rnd;
    const int nr = 2, nc = 3;
    constexpr int n_samples = 3, k = 1;
    std::vector<matrix<float>> x(n_samples);
    matrix<float> xtmp(nr, nc);
    for (int ii = 0; ii < n_samples; ++ii) {
        for (int jj = 0; jj < nr; ++jj)
            for (int kk = 0; kk < nc; ++kk)
                xtmp(jj, kk) = rnd.get_random_gaussian();
        x[ii] = xtmp;
    }

    // Convert input matrix to tensor
    resizable_tensor input_tensor;
    net.to_tensor(&x[0], &x[0] + n_samples, input_tensor);
    net.forward(input_tensor);

    // Expected output tensor (manually set for comparison)
    resizable_tensor expected_output;
    expected_output.copy_size(input_tensor);
    tt::copy_tensor(false, expected_output, 0, input_tensor, 0, input_tensor.k());
    for (int ii = 0; ii < n_samples; ++ii) {
        expected_output.host()[tensor_index(expected_output, ii, 0, 0, 1)] = -std::numeric_limits<float>::infinity();
        expected_output.host()[tensor_index(expected_output, ii, 0, 0, 2)] = -std::numeric_limits<float>::infinity();
        expected_output.host()[tensor_index(expected_output, ii, 0, 1, 2)] = -std::numeric_limits<float>::infinity();
    }

    // Compare output tensor with expected output
    auto& net_output = layer<tag1>(net).get_output();
    DLIB_TEST(max(abs(mat(net_output) - mat(expected_output))) < 1e-5);
}

Contributor

This will disappear from master when merging.

Contributor Author

Damn, the problem is that I can't manage to merge the master branch, which contains tril_, into my fork's branches... if only the tril_ tests are missing from dnn.cpp, I can put them back once all the synchronisation has been done.

Contributor Author

Since I can't pull the master version of Dlib (i.e. with the recently integrated tril_ class), and after resolving the conflicts, I'm now trying to add the definitions for tril_ once again. If the tests pass, we'll have to merge to see whether everything works correctly. If so, I'll do the same for all the other PRs.

Contributor Author

Well, that seems to work, so I'm going to do the same with the other PRs.

Owner

Yeah I merge PRs by squashing the PR into one commit and adding it to master, so master will always be fine regardless. You can (and probably should here since things have gotten complicated) squash your branch into one commit too. Having merge commits when you are working with multiple branches is a huge pain, as you are experiencing :D

There are a bunch of ways to squash a branch into one commit. I like to use git reset --soft to just undo all the commits and then recommit them. But there is also git merge --squash, and probably several other ways. Make a backup branch before you do anything, though, if you aren't really familiar with what I'm talking about.

Contributor Author

OK, but normally you can proceed, because I've manually reconciled the 'mixed' code from the different branches. Once everything has been merged into the master branch, I'll remove the branches from my fork to resynchronise the fork and start again from a clean base. All that's left is to rework the last PR, for softmaxm, which I'll do today.

Contributor Author

@davis, all the "alignments" have been made to enable the merge to go ahead.

@Cydral
Contributor Author

Cydral commented Oct 18, 2024

@davis, there were indeed new conflicts following the integration of the last class that you dealt with. It looks good now, but could you please go through the integrations in chronological order and thus consider this new class first? If it's OK and integrated, I'll make the remaining changes to avoid going back and forth, and I'll keep you informed. Thank you in advance for your help and support.

@davisking
Owner

Ah you have a merge error or something. Check the PR contents, it's missing all your code changes :(
