This is a project with Hao and Jeeheh.
Our goal is to create storybooks using generative AI models:
- LLMs for story text generation
- Stable Diffusion for illustrations (either pictures or videos)
- (Optional) Text-to-speech narration of the story
We also plan to support:
- putting custom toys or a kid into the story using textual inversion
Currently we have a utility that creates an animated video from a driving video. It depends on the AUTOMATIC1111 web UI API.
On a Mac, start the server with:
bash webui.sh --no-half --api
Then generate the videos as follows:
python video2video.py
python create_video_from_images.py
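For readers unfamiliar with the web UI API, the following is a minimal sketch of a single txt2img request against a server started with `--api`. It is not the code in video2video.py. The default address `http://127.0.0.1:7860` and the helper names (`build_txt2img_payload`, `decode_images`, `txt2img`) are assumptions made for illustration.

```python
import base64
import json
import urllib.request

# Assumption: the web UI was started with --api and listens on its default address.
API_URL = "http://127.0.0.1:7860"


def build_txt2img_payload(prompt, negative_prompt="", steps=20):
    """Build the JSON body for the /sdapi/v1/txt2img endpoint."""
    return {"prompt": prompt, "negative_prompt": negative_prompt, "steps": steps}


def decode_images(response):
    """Decode the base64-encoded images in an API response into raw bytes.

    A "data:image/png;base64," prefix, if present, is dropped first.
    """
    return [
        base64.b64decode(image.split(",", 1)[-1])
        for image in response.get("images", [])
    ]


def txt2img(prompt, negative_prompt="", steps=20, api_url=API_URL):
    """POST a txt2img request and return the generated images as bytes."""
    payload = build_txt2img_payload(prompt, negative_prompt, steps)
    request = urllib.request.Request(
        f"{api_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return decode_images(json.load(reply))
```

With the server running, `txt2img("tiger", "worst-quality")` should return a list of image byte strings that can be written to `.png` files.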
The source/driving video:
example.mp4
With the prompt "tiger" and the negative prompt "worst-quality", the output video when using txt2img:
example_output.mov
The output video when using img2img:
output.mov
python story2illustration.py
This script currently doesn't handle negative prompts. Upon some thought, I think story creation should be interactive: that is, have a chat with ChatGPT to create the story collaboratively, and then use Stable Diffusion to illustrate it.
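One possible way to add negative-prompt support is to treat each paragraph of the story as a scene and attach the same negative prompt to every illustration request. The sketch below is only an illustration of that idea, not how story2illustration.py works; the function names and the optional `style` prefix are hypothetical.

```python
def split_into_scenes(story):
    """Split a story into non-empty paragraphs, one scene per paragraph."""
    return [line.strip() for line in story.splitlines() if line.strip()]


def build_scene_requests(story, negative_prompt="worst-quality", style=""):
    """Return one request dict (prompt plus negative prompt) per scene.

    `style` is an optional prefix, e.g. "children's book illustration",
    that is prepended to every scene prompt.
    """
    requests = []
    for scene in split_into_scenes(story):
        prompt = f"{style}, {scene}" if style else scene
        requests.append({"prompt": prompt, "negative_prompt": negative_prompt})
    return requests
```

Each returned dict uses the `prompt` and `negative_prompt` field names of the web UI txt2img request body, so it can be extended with other generation settings before being sent.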
Prompt: generate a kid story about a tiger
ChatGPT response with Stable Diffusion illustration (breakdomainrealistic_M2050.safetensors [4f0d8aacd3] with negative prompt of "worst-quality"):
Once upon a time, in a lush green forest, there lived a friendly and curious tiger named Tilly. Tilly was unlike any other tiger in the forest. Instead of being fierce and intimidating, she had a gentle nature and a heart full of kindness.
Tilly loved exploring the forest and meeting new friends. She would spend her days hopping from one tree branch to another, carefully observing the vibrant colors of the forest and the playful antics of the animals around her.
One sunny morning, as Tilly was strolling near a sparkling river, she heard a faint cry for help. She followed the sound and discovered a little squirrel named Sammy, stuck on a tall tree branch.
Without hesitation, Tilly sprang into action. She carefully climbed up the tree, using her strong claws, and gently rescued Sammy. From that moment on, Tilly and Sammy became inseparable friends.
News of Tilly's kind-hearted nature spread throughout the forest, and animals from far and wide sought her help. Whenever someone was in trouble or needed guidance, they would approach Tilly, knowing she would offer a helping paw.
One day, a group of lost baby birds found their way to Tilly. They were tired and hungry, unable to find their way back to their nest. Tilly comforted them with her warm presence and led them safely back home. The grateful bird parents chirped joyfully and thanked Tilly for her compassion.
As time passed, Tilly's reputation as a kind and helpful tiger grew. Animals from neighboring forests would come seeking her assistance, and Tilly never turned them away. She dedicated herself to protecting the weak, teaching the young, and spreading love and harmony among all creatures.
The forest became a peaceful haven, thanks to Tilly's unwavering commitment to kindness. Animals of all kinds lived together in harmony, sharing their stories and celebrating their differences.
And so, the story of Tilly, the gentle tiger, became a legend in the forest. Her acts of compassion and bravery were passed down from generation to generation, reminding all the animals of the power of kindness and the importance of helping others.
And as long as there were tigers like Tilly, the forest remained a place where friendship, love, and acceptance flourished, creating a beautiful world for everyone to enjoy.
TODO: text -> image -> image ... -> image
The image-to-image step should ideally account for movement. A simple way is to take an open-source text-to-video model and use Stable Diffusion to enhance its frames.
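The text -> image -> image chain in the TODO can be sketched as a loop in which each frame is generated from the previous one. This is a rough outline under assumptions, not existing project code: `chain_frames` and `build_img2img_payload` are hypothetical names, and the step function is passed in so that any img2img backend (or a text-to-video model followed by enhancement) can be plugged in.

```python
def build_img2img_payload(init_image_b64, prompt, denoising_strength=0.4):
    """Build a request body for the web UI /sdapi/v1/img2img endpoint.

    A lower denoising_strength keeps the new frame closer to the previous one.
    """
    return {
        "init_images": [init_image_b64],
        "prompt": prompt,
        "denoising_strength": denoising_strength,
    }


def chain_frames(first_frame, num_frames, img2img_step):
    """Generate num_frames frames, each derived from the one before it.

    img2img_step is any callable (previous_frame, index) -> next_frame,
    for example a wrapper that sends build_img2img_payload(...) to the server.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    frames = [first_frame]
    for index in range(1, num_frames):
        frames.append(img2img_step(frames[-1], index))
    return frames
```

Because every frame depends only on its predecessor, errors can accumulate over a long chain, which is one reason a motion-aware model may be the better starting point.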
Use the function in lib/utils.py to download some images:
download_ddgs_image_search('jakie chen')
Then align the images with:
python align_faces.py -d output/ddgs_images/jakie_chen/2.jpg -i output/ddgs_images/jakie_chen -o output/jakie_chen
output.mov
Use the "-t" setting to filter out dissimilar images for a better result (the default is None, i.e., no filtering). A good value is something smaller than 0.4.
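To illustrate what such a threshold does, here is a minimal sketch of distance-based filtering. It assumes each image has already been reduced to a face embedding (a list of floats) and that dissimilarity is measured as Euclidean distance to the reference image's embedding. Both are assumptions: the metric and internals of align_faces.py may differ, and `filter_similar` is a hypothetical name.

```python
import math


def filter_similar(reference, candidates, threshold=None):
    """Keep the candidates whose embedding is close to the reference.

    reference:  embedding of the reference image (sequence of floats)
    candidates: dict mapping image name -> embedding
    threshold:  maximum allowed distance; None disables filtering
    """
    if threshold is None:
        return list(candidates)
    return [
        name
        for name, embedding in candidates.items()
        if math.dist(reference, embedding) <= threshold
    ]
```

A smaller threshold keeps fewer, more similar images; with `threshold=None` every image is kept, matching the default described above.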