High memory usage when embedding large texts #222
There might be a few things at play here:
Could you give estimates for three things:
But the main problem I have is that after a batch is processed, the memory is not released; it only keeps growing until the process gets killed. I think the problem is in the ONNX session when it is used continuously without being recreated. When I use the model directly from Hugging Face, there is no memory growth and no problem.
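If the growth really is tied to a long-lived ONNX session, one possible (untested) mitigation is to rebuild the model object periodically so the session and its buffers are released. A minimal sketch, assuming the fastembed `TextEmbedding` API; the model name, recreation interval, and the `batches` placeholder are illustrative:

```python
import gc

from fastembed import TextEmbedding

MODEL_NAME = "intfloat/multilingual-e5-large"
RECREATE_EVERY = 100  # arbitrary interval; tune to your workload


def batches():
    # Stand-in for your real stream of text batches.
    while True:
        yield ["some long document text"] * 32


model = TextEmbedding(model_name=MODEL_NAME)
for i, batch in enumerate(batches(), start=1):
    vectors = list(model.embed(batch))
    # ... hand `vectors` off to your sink (e.g. a vector DB upsert) here ...
    if i % RECREATE_EVERY == 0:
        # Drop the model (and with it the ONNX session) and rebuild it,
        # so any memory the session has accumulated can be reclaimed.
        del model
        gc.collect()
        model = TextEmbedding(model_name=MODEL_NAME)
```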
Thanks for sharing specifics @Barney241 — very helpful.
This'd be among my areas to investigate as well. Thanks for sharing. I'll look into this as energy permits! I'm leaving this issue open and marking it as a bug because of the impact it has on how usable FastEmbed is.
Hi, any updates on this? We've experienced the same issue.
Hi @johnreyev, could you tell us the version of fastembed you're using and maybe share a reproducible code snippet?
Hi @joein, you can try this:
For the device spec, I'm using a 14-inch MacBook Pro with an M3 Pro.
@johnreyev
Try consuming the embeddings lazily rather than collecting them all at once: you are embedding the documents in batches, but saving every vector into a single variable can explode your memory once you increase the document length.
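A minimal sketch of the two patterns, assuming the current fastembed `TextEmbedding` API (the model name, corpus, and batch size here are illustrative):

```python
from fastembed import TextEmbedding

model = TextEmbedding(model_name="intfloat/multilingual-e5-large")
documents = ["some document text"] * 10_000  # placeholder corpus

# Memory-friendly: embed() returns a generator, so consume each vector
# as it arrives and let it be freed once you have stored it.
for embedding in model.embed(documents, batch_size=256):
    ...  # process / persist the single vector here

# Memory-hungry: list() keeps every vector resident at once, so usage
# grows with corpus size and document length.
all_embeddings = list(model.embed(documents, batch_size=256))
```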
Hi, I am experiencing high memory usage, which caused my pod to be killed for exceeding its limits. After some experiments I found it is related to the length of the texts I am trying to embed.
When using models such as e5 large or paraphrase base v2, the program starts at about 1.5 GB of RAM usage, which is expected for these models, but after some iterations the process uses 16 GB of virtual memory and 6 GB of RAM, which is a lot.
I tried other, smaller models as well, and it happens for them too, just on a smaller scale. My intuition was that I was simply using texts that were too long, so I cut them all down to at most 100 characters. That slowed the growth in RAM usage, but when not using a constant batch of text it still kept growing until the program was eventually killed.
For more context, I am building a vector API that returns embeddings for texts, so the model session is active from the start.
To reproduce, you can use this array of texts with the e5 large model in the default example, running in an infinite loop.
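A sketch of such a reproduction, with a placeholder list standing in for the original array of texts (which is not included above) and `psutil` used only to print resident memory each iteration:

```python
import os

import psutil  # third-party; used only to report resident memory
from fastembed import TextEmbedding

# Placeholder: substitute the original array of long texts here.
texts = ["a fairly long passage of text " * 50] * 256

model = TextEmbedding(model_name="intfloat/multilingual-e5-large")
process = psutil.Process(os.getpid())

while True:  # infinite loop, as in the report
    _ = list(model.embed(texts, batch_size=32))
    print(f"RSS: {process.memory_info().rss / 1024 ** 2:.0f} MiB")
```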