Current video editing methods rely predominantly on text-driven approaches, altering content through textual descriptions. However, these methods often struggle with semantic homogeneity and temporal inconsistency, particularly when editing video scenescapes. To address these challenges, we introduce "AudioScenic", an audio-driven framework for video scenescape editing. Unlike existing methods, AudioScenic leverages three distinctive properties of audio (semantic diversity, magnitude control, and frequency alignment) to guide the editing process while preserving foreground content. Our framework employs audio semantic embeddings to edit video scenescapes and introduces a mask blending module that restricts the audio embedding's influence to the scenescape regions of a video. We further introduce an audio magnitude-aware module that controls the strength of the editing effect, and an audio frequency fusion module that temporally aligns the edited scenescape with the audio condition, improving the temporal coherence of the synthesized results. Our approach enhances visual diversity while maintaining temporal consistency. Extensive experiments demonstrate the effectiveness of our method, showing significant improvements over existing text-driven and audio-driven models in video scenescape editing.
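No code is released in this repository, but two of the modules described above lend themselves to a compact illustration. Below is a minimal PyTorch sketch of how mask blending (confining audio-driven edits to scenescape regions) and magnitude-aware conditioning (scaling editing strength with audio loudness) could work. All function names, tensor shapes, the `alpha` gain, and the RMS-loudness proxy are assumptions made for illustration, not the authors' implementation.

```python
import torch

def mask_blend(edited: torch.Tensor,
               source: torch.Tensor,
               scene_mask: torch.Tensor) -> torch.Tensor:
    """Restrict audio-driven edits to the scenescape region.

    edited:     (B, C, T, H, W) frames/latents after audio-guided editing
    source:     (B, C, T, H, W) original frames/latents
    scene_mask: (B, 1, T, H, W) soft mask; 1 = scenescape, 0 = foreground
    """
    # Foreground pixels are copied from the source; only scenescape
    # pixels are taken from the audio-conditioned edit.
    return scene_mask * edited + (1.0 - scene_mask) * source

def magnitude_scaled_embedding(audio_embed: torch.Tensor,
                               waveform: torch.Tensor,
                               alpha: float = 1.0) -> torch.Tensor:
    """Scale the audio semantic embedding by the clip's loudness so that
    louder audio produces a stronger editing effect.

    audio_embed: (B, D) semantic embedding of the audio clip
    waveform:    (B, N) raw audio samples
    alpha:       hypothetical gain controlling how much magnitude matters
    """
    # RMS energy as a simple per-clip magnitude proxy (an assumption here).
    rms = waveform.pow(2).mean(dim=-1, keepdim=True).sqrt()  # (B, 1)
    return audio_embed * (1.0 + alpha * rms)

if __name__ == "__main__":
    b, c, t, h, w, d, n = 1, 4, 8, 32, 32, 512, 16000
    edited = torch.rand(b, c, t, h, w)
    source = torch.rand(b, c, t, h, w)
    mask = (torch.rand(b, 1, t, h, w) > 0.5).float()
    blended = mask_blend(edited, source, mask)

    embed = torch.randn(b, d)
    wave = torch.randn(b, n) * 0.1
    cond = magnitude_scaled_embedding(embed, wave, alpha=2.0)
    print(blended.shape, cond.shape)
```

In this sketch the blend is applied per frame, so the foreground is preserved verbatim from the source video, while `alpha` stands in for the editing-strength control that the abstract attributes to the magnitude-aware module.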
Repository: jiajiaxiaoskx/AudioScenic, generated from eliahuhorwitz/Academic-project-page-template.