Pre-trained models & data sourcing #66
Replies: 16 comments 33 replies
-
@lucidrains Hey, I know you aren't actively open-source researching, but I just wanted to let you know that the linear attention layers really paid off! Off to train the transformer! :) Early results, but while training on a smaller dataset it took around 6 epochs to reach a loss of 3, and using the 14k-model dataset it reached that in about 1 epoch, which is a sign that the transformer scales well with data.
-
Hi @MarcusLoppe, thanks for the great work : )
-
Hi @MarcusLoppe, excellent work. Can you share the dataset on Google Drive? My country's internet is bad. Thank you very much. : )
-
Thank you for your generous sharing! I read all your posts in the discussion area and got a lot of inspiration. I'm especially impressed by your enthusiasm for working on this with pretty limited computing resources. I always take insufficient computing resources as an excuse for my laziness; your persistent trials are really a good example to me!
-
Besides, I wonder if there are any tutorials or reference code on how to simplify a mesh, i.e. reducing the face count from more than 20k to fewer than 800 so that training is possible. Any help on this will be much appreciated!
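For reference, one common approach (not necessarily what was used in this project) is quadric-error decimation; a minimal sketch with Open3D, where "input.obj" and the 800-triangle target are placeholders:

```python
import open3d as o3d

# Hedged sketch: decimate a dense mesh down to a target triangle count.
mesh = o3d.io.read_triangle_mesh("input.obj")                        # placeholder path
mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=800)
mesh.remove_degenerate_triangles()                                   # clean up decimation artifacts
mesh.remove_duplicated_vertices()
print(f"faces after decimation: {len(mesh.triangles)}")
o3d.io.write_triangle_mesh("simplified.obj", mesh)
```

Note that going from 20k+ faces down to under 800 is a very aggressive reduction, so the simplified mesh may lose a lot of detail.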
-
Hi @MarcusLoppe, this is really exciting stuff! Referring to the image in your first post on this page: was it made with models that were part of the training data, or was it from a separate set of validation/test data? I did a comparison with the demo_mesh models (loading the autoencoder from your 0.47 checkpoint) but failed to reproduce the mesh. When training on the (augmented) demo_mesh models, the corresponding result was (as expected) almost flawless.
-
Hey @MarcusLoppe, great work on using this repository to reproduce/improve upon the results from the MeshGPT paper, at least with respect to the autoencoder training. I tried to reproduce your results using your dataset from the Google Drive and the MeshGPT_demo.ipynb notebook (I just hacked in the npz dataset and did not do any data duplication). I managed ~50 epochs in 24 hours; however, I only got to a reconstruction loss around 1.0. Especially after the first 10 epochs, training slows down considerably. Even now, after 48 hours, I'm only looking at a loss around 0.7. Is there any other trick you use to get to 0.36 loss in 24 hours, or are you simply using a more powerful GPU that can do more epochs in the same time?
-
Hi @MarcusLoppe, thanks for your great work!
-
@MarcusLoppe Are you on Discord? It would be great to have a Discord server for all the people interested in extending MeshGPT. My Discord id is
-
Hi @MarcusLoppe, thank you for the great work and the checkpoints. I was wondering what augmentations you used to increase the dataset size.
-
Thank you so much for the wonderful contribution! It's a little bit late to be involved in the discussion, but I am just reproducing the results with your provided pre-trained autoencoder and GPT-2 small/medium, getting something like the figure here with a text prompt only. Am I missing something important, or does it still require category-specific fine-tuning? I currently have 8 free 4090s, and I am wondering whether you would still be interested in collaborating on training it? I would very much like to share the weights as well.
-
I have received your email.
-
Hello @MarcusLoppe, this is a very interesting project, well done. I have tried your model and it is impressive; however, if you input more than a single word it weirdly breaks everything. Any idea why this happens? For example, this is the result for "tree":, this one for "a tall tree":, and this is the result for "a tall tree with small leaves":
-
Hi, I have read part of the MeshAutoencoder code and I would like to ask about the encoder's encode section: how are the input vertices, faces, face_edges, face_mask and face_edges_mask processed? In other words, what happens when you load a mesh model and extract features with encode normally? I tried to emulate your code using PyTorch, but I ran into a lot of difficulties, most importantly calculating the edges and connecting the vertex data to the face data. Please help me. @MarcusLoppe
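As a rough sketch of the idea (not the repository's actual implementation, and with hypothetical function names), the two steps that usually trip people up are gathering per-face vertex coordinates and deriving face_edges, i.e. pairs of faces that share an edge:

```python
import torch

def gather_face_coords(vertices, faces):
    # vertices: (num_vertices, 3) float tensor of xyz coordinates
    # faces:    (num_faces, 3) long tensor of vertex indices
    # returns:  (num_faces, 3, 3) coordinates of each face's three corners
    return vertices[faces]

def derive_face_edges(faces):
    # Treat two faces as connected when they share at least two vertices (a common edge).
    num_faces = faces.shape[0]
    # (F, F, 3, 3): compare every vertex index of face i with every vertex index of face j
    shared = faces.unsqueeze(1).unsqueeze(-1) == faces.unsqueeze(0).unsqueeze(-2)
    shared_count = shared.any(-1).sum(-1)                        # (F, F) shared-vertex counts
    adjacency = (shared_count >= 2) & ~torch.eye(num_faces, dtype=torch.bool)
    return adjacency.nonzero()                                   # (num_edges, 2) face-index pairs

# Two triangles sharing the edge (1, 2)
vertices = torch.tensor([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [1., 1., 0.]])
faces = torch.tensor([[0, 1, 2], [1, 3, 2]])
print(gather_face_coords(vertices, faces).shape)  # torch.Size([2, 3, 3])
print(derive_face_edges(faces))                   # tensor([[0, 1], [1, 0]])
```

face_mask and face_edges_mask are typically just boolean masks marking which entries are real data versus padding when meshes of different sizes are batched together.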
-
Hi, does anyone know if there is a Discord server? Could you please invite me to it? I am also very interested in extending MeshGPT. Thanks very much.
-
There are tons of mesh models that are free for research usage; however, the mesh models are either too big or too hard to download.
In this thread I'll explain where I source my meshes from and share early results of a pre-trained model.
I'm hoping that this will open up a discussion to accelerate this project, please comment with your thoughts and your training process.
Training notebook available in my fork: MarcusLoppe/meshgpt-pytorch
Training data:
Currently I'm only training on meshes with fewer than 250 triangles, due to training on Kaggle's free GPU.
I managed to get around 1184 models from ShapeNet + ModelNet40 but it wasn't enough data for the model to generalize.
I wanted to download and use Objaverse, but it's around 8.9TB, which I don't have space for, so using my script-kiddie skills I managed to download the dataset with file-size limits (Link).
I downloaded all the models between 2 and 40KB, which resulted in 37k models; after filtering out all meshes above 250 faces I was left with a total of 13.5k models!
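A minimal sketch of that face-count filter (the actual filtering script isn't shown in this post; the directory name is a placeholder):

```python
import os
import trimesh

def filter_by_face_count(paths, max_faces=250):
    # Keep only meshes with at most `max_faces` triangles.
    kept = []
    for path in paths:
        mesh = trimesh.load(path, force='mesh')   # force='mesh' flattens scenes into a single mesh
        if len(mesh.faces) <= max_faces:
            kept.append(path)
    return kept

glb_files = [os.path.join('objaverse_downloads', f)               # placeholder directory
             for f in os.listdir('objaverse_downloads') if f.endswith('.glb')]
small_meshes = filter_by_face_count(glb_files)
print(f'{len(small_meshes)} of {len(glb_files)} meshes have <= 250 faces')
```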
Using the 1184 models from ShapeNet & ModelNet and the 13.5k from Objaverse, I augmented them 15 times and got a dataset of 218k meshes.
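The exact augmentations aren't specified in this post; a sketch of typical label-preserving mesh augmentations (random yaw rotation, uniform scaling, small translation), applied 15 times per mesh:

```python
import numpy as np

def augment(vertices, rng):
    # vertices: (V, 3) array of xyz coordinates, assumed roughly normalized to [-1, 1]
    angle = rng.uniform(0, 2 * np.pi)                       # random rotation about the up (y) axis
    cos, sin = np.cos(angle), np.sin(angle)
    rot = np.array([[cos, 0, sin], [0, 1, 0], [-sin, 0, cos]])
    scale = rng.uniform(0.75, 1.0)                          # shrink slightly, stay inside bounds
    shift = rng.uniform(-0.05, 0.05, size=(1, 3))           # small translation
    return (vertices @ rot.T) * scale + shift

rng = np.random.default_rng(0)
vertices = np.random.rand(100, 3) * 2 - 1                   # dummy mesh vertices
augmented = [augment(vertices, rng) for _ in range(15)]     # 15 copies; faces stay unchanged
```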
Results:
I've experimented with different codebook sizes and model setups, and I've uploaded the models I think will perform well.
Scaling up the model results in better performance, but since I've only trained on 186M tokens it's probably an over-parameterized model.
Training times using a single P100:
Auto-encoder: around 30hrs+ (12hrs to 0.6, then it was a bit slower)
Transformers: varies, but I think it was more than 48hrs each; I stopped when the learning progression slowed down and improved by only 0.06 per epoch, which made it unreasonable to train further using a single GPU.
Available models / datasets:
Auto-encoder 51M parameters: mesh-encoder_16k_2_4_0.339.pt
Transformer GPT-2 small - 141M parameters: mesh-transformer.16k_768_12_12_loss_2.335.pt
Transformer GPT-2 small/medium - 321M parameters: mesh-transformer_16k_768_24_16_loss_2.147.pt
Dataset 186M tokens: objverse_shapenet_modelnet_max_250faces_186M_tokens.npz
Finetune dataset 14.1M tokens: shapenet_25_x50_finetune_dataset.npz
https://drive.google.com/drive/folders/1C1l5QrCtg9UulMJE5n_on4A9O9Gn0CC5?usp=sharing
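For anyone loading those .npz files, the key names and array layouts aren't documented here, so a quick inspection sketch:

```python
import numpy as np

# List the arrays stored in the dataset archive before wiring it into a training notebook.
data = np.load('objverse_shapenet_modelnet_max_250faces_186M_tokens.npz', allow_pickle=True)
print(data.files)                                   # names of the stored arrays
for name in data.files:
    arr = data[name]
    print(name, getattr(arr, 'shape', None), getattr(arr, 'dtype', None))
```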
Thoughts:
Since Objaverse contains a variety of meshes, from Minecraft characters to furniture, I'm highly confident that it will be able to encode and decode all kinds of shapes it hasn't seen before.
I've attached an image below of the output & the ground truth; the goal of this model is to ensure that the encoder and the decoder can talk to each other using the codebook.
When the auto-encoder can compress the mesh structure into codes that are generalizable and not over-fitted to a dataset, that's when I believe we'll see some awesome results!
This model is currently training and will do so for a few more 12hr sessions; please let me know if you have access to better GPUs so we can get this project up and running for real!
I decided on using a 16k codebook size since my hypothesis is that a small codebook makes the transformer's job harder. Imagine a 2k vs a 16k codebook while the transformer is generating the tokens for the base of a chair: for each token generated with the 2k codebook it might have only 1 or 2 token options that lead to a good base, but with a 16k codebook it might have 1-6 options that also result in a good base.
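For reference, a sketch of how a 16k codebook might be configured; the codebook_size argument is an assumption about the MeshAutoencoder constructor, so check the signature in the version of meshgpt-pytorch you have installed:

```python
from meshgpt_pytorch import MeshAutoencoder

# Assumed arguments -- verify against your installed meshgpt-pytorch version.
autoencoder = MeshAutoencoder(
    num_discrete_coors = 128,   # coordinate quantization resolution
    codebook_size = 16384,      # the 16k codebook discussed above (assumed keyword)
)
```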
Data sources:
ModelNet40: https://www.kaggle.com/datasets/balraj98/modelnet40-princeton-3d-object-dataset/data
ShapeNet - Extracted model labels Repository: https://huggingface.co/datasets/ShapeNet/shapenetcore-gltf
Objaverse - Downloader Repository: https://huggingface.co/datasets/allenai/objaverse
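The author's actual downloader (the "Link" in the post) isn't reproduced here; a hedged sketch using the objaverse Python package, downloading a subset and then keeping only files in the 2-40KB window mentioned above:

```python
import os
import objaverse  # pip install objaverse

# Download a small subset of Objaverse and filter by file size on disk.
uids = objaverse.load_uids()
objects = objaverse.load_objects(uids=uids[:1000], download_processes=4)   # uid -> local .glb path

small = {uid: path for uid, path in objects.items()
         if 2_000 <= os.path.getsize(path) <= 40_000}
print(f'kept {len(small)} of {len(objects)} downloaded models')
```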
Results: it's about 14k models, so with the limited training time and hardware it's a great result.